Tom Daula points us to this post, “Economic Predictions with Big Data: The Illusion of Sparsity,” by Domenico Giannone, Michele Lenza, and Giorgio Primiceri, and writes:
The paper wants to distinguish between variable selection (sparse models) and shrinkage/regularization (dense models) for forecasting with Big Data. “We then conduct Bayesian inference on these two crucial parameters—model size and the degree of shrinkage.” This is similar to your recent posts on the two-way interaction of machine learning and Bayesian inference, as well as to your multiple comparisons paper. The conclusion is that the data indicate variable selection is bad: a zero coefficient with zero variance is too strong an assumption. My intuition is that the results are not surprising, since it's so unlikely that the data would favor a coefficient of exactly 0, but I assume the paper fleshes out the nuance (or explains why that intuition is wrong). Here is the abstract:
We compare sparse and dense representations of predictive models in macroeconomics, microeconomics, and finance. To deal with a large number of possible predictors, we specify a prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse or dense model, but on a wide set of models. A clearer pattern of sparsity can only emerge when models of very low dimension are strongly favored a priori.
I [Daula] haven’t read the paper yet, but noticed the priors while skimming.
(p.4) “The priors for the low dimensional parameters φ and σ^2 are rather standard, and designed to be uninformative.” The coefficient vector φ has a flat prior, which you’ve shown is not really uninformative, and the prior density of σ^2 is taken to be inversely proportional to σ^2 (no idea where that comes from, but it’s nothing like what you recommend in the Stan documentation).
The overall setup seems reasonable, but I’m curious how you would set it up if you had your druthers.
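(A quick note on that last prior before I respond: p(σ^2) ∝ 1/σ^2 is the standard improper scale-invariant prior for a variance parameter. It is equivalent to a flat prior on log σ^2, since under the change of variables u = log σ^2 the density becomes p(u) ∝ (1/σ^2) · σ^2 = 1. So it’s “standard” in the old textbook sense, even though, as Daula says, it’s not what we’d recommend as a default these days.)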
My quick response is that I’m sympathetic to the argument of Giannone et al., as it’s similar to something I wrote a few years ago, Whither the “bet on sparsity principle” in a nonsparse world? Regarding more specific questions of modeling: Yes, I think they should be able to do better than uniform or other purportedly noninformative priors. When it comes to methods for variable selection and partial pooling, I guess I’d recommend the regularized horseshoe from Juho Piironen and Aki Vehtari.
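For concreteness, here’s a rough sketch of what a regression with the regularized horseshoe prior can look like in Stan, following the non-centered parameterization in Piironen and Vehtari’s paper. Take it as a sketch, not as a drop-in replacement for the model in Giannone et al.: the data inputs scale_global, slab_scale, and slab_df are hyperparameters you have to choose (Piironen and Vehtari give guidance on all three), and the priors on the intercept and residual scale below are placeholder weakly informative choices, not anything from either paper.

```
data {
  int<lower=0> N;              // number of observations
  int<lower=0> P;              // number of candidate predictors
  matrix[N, P] X;              // predictor matrix
  vector[N] y;                 // outcome
  real<lower=0> scale_global;  // scale for the half-Cauchy prior on tau
  real<lower=0> slab_scale;    // scale of the regularizing slab
  real<lower=0> slab_df;       // degrees of freedom of the slab
}
parameters {
  real alpha;                  // intercept
  real<lower=0> sigma;         // residual sd
  vector[P] z;                 // non-centered coefficients
  vector<lower=0>[P] lambda;   // local shrinkage scales
  real<lower=0> tau;           // global shrinkage scale
  real<lower=0> caux;          // auxiliary variable for the slab width
}
transformed parameters {
  real<lower=0> c = slab_scale * sqrt(caux);  // slab width
  vector<lower=0>[P] lambda_tilde
    = sqrt(square(c) * square(lambda) ./ (square(c) + square(tau) * square(lambda)));
  vector[P] beta = z .* lambda_tilde * tau;   // regularized-horseshoe coefficients
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);                      // half-Cauchy via the <lower=0> constraint
  tau ~ cauchy(0, scale_global * sigma);      // half-Cauchy global scale
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
  alpha ~ normal(0, 5);                       // placeholder weakly informative priors,
  sigma ~ student_t(3, 0, 2.5);               // not from either paper
  y ~ normal(alpha + X * beta, sigma);
}
```

The appeal, relative to a spike-and-slab prior, is that the global scale tau and the local scales lambda do the partial pooling continuously: a coefficient can be pulled essentially to zero without the model ever having to put positive probability on exactly zero, and the slab keeps the few large coefficients from escaping regularization entirely.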