Andrew suggested I cross-post these from the Stan forums to his blog, so here goes.
Maximum marginal likelihood and posterior approximations with Monte Carlo expectation maximization: I unpack the goal of max marginal likelihood and of approximate Bayes via MMAP and Laplace approximations. I then go through the basic EM algorithm (with a traditional analytic example in the appendix). Only then do I get to the (Markov chain) Monte Carlo approach to the marginalization, stochastic approximation EM (SAEM), generalized EM, and computing gradients of expectations with Monte Carlo (the trick used in Stan’s variational inference algorithm ADVI). I conclude with Andrew’s new algorithm, gradient-based marginal optimization (GMO). My goal is to define the algorithms well enough to be implemented.

I was just trying to understand MML and the SAEM algorithm (from Monolix) so I could talk to folks like Julie Bertrand and France Mentre here at Paris-Diderot. Eventually, it led me to a much better understanding of GMO and of why Andrew thinks of MMAP not as a Bayesian-motivated estimator, but as the basis of a posterior approximation.
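To give a rough sense of what is being estimated (this is just my shorthand here, not notation from the write-up itself): with data $y$, parameters $\phi$, and latent parameters $\alpha$ to be marginalized out, the max marginal likelihood estimate is

$$ \hat{\phi} \ = \ \textrm{arg max}_{\phi} \ p(y \mid \phi) \ = \ \textrm{arg max}_{\phi} \int p(y \mid \alpha, \phi) \, p(\alpha \mid \phi) \, d\alpha. $$

Monte Carlo EM approximates the E-step with draws $\alpha^{(1)}, \ldots, \alpha^{(M)} \sim p(\alpha \mid y, \phi^{(t)})$ (e.g., via MCMC) and then maximizes the plug-in estimate of the expected complete-data log density,

$$ \phi^{(t+1)} \ = \ \textrm{arg max}_{\phi} \ \frac{1}{M} \sum_{m=1}^{M} \log p(y, \alpha^{(m)} \mid \phi). $$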