Hey! Here’s what to do when you have two or more surveys on the same population!
This problem comes up a lot: We have multiple surveys of the same population and we want a single inference. The usual approach, applied carefully by news organizations such as RealClearPolitics and FiveThirtyEight, and applied sloppily by various attention-seeking pundits every two or four years, is “poll aggregation”: you take the estimate from each poll separately, correct these estimates for bias if necessary, then combine them with some sort of weighted average.
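As a rough illustration of the aggregation recipe described above, here is a minimal sketch in Python. It assumes each poll reports a proportion and a sample size, that any bias correction is a known additive offset, and that the weights are inverse sampling variances. The function name `aggregate_polls`, the weighting scheme, and the example numbers are all assumptions made for illustration; they are not taken from the post or from any particular aggregator.

```python
import math


def aggregate_polls(polls):
    """Combine poll estimates with an inverse-variance weighted average.

    polls: iterable of (estimate, sample_size, bias) tuples, where
    estimate is a proportion, sample_size is the number of respondents,
    and bias is an assumed additive offset to subtract from the estimate.
    The bias-corrected proportion must lie strictly between 0 and 1.
    Returns (combined_estimate, standard_error).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for estimate, sample_size, bias in polls:
        corrected = estimate - bias
        # Binomial sampling variance of a proportion (simple random sampling assumed).
        variance = corrected * (1.0 - corrected) / sample_size
        weight = 1.0 / variance
        weighted_sum += weight * corrected
        total_weight += weight
    combined = weighted_sum / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return combined, standard_error


# Made-up polls: (estimate, sample size, assumed bias).
example_polls = [(0.52, 1000, 0.01), (0.49, 600, 0.0), (0.51, 1500, -0.005)]
combined, standard_error = aggregate_polls(example_polls)
print(f"combined estimate: {combined:.3f} (s.e. {standard_error:.3f})")
```

Inverse-variance weighting is only one choice; an aggregator might also weight by recency or by pollster track record, and the standard error here ignores any uncertainty in the bias corrections.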
What's new on arXiv
Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning
RATest. A Randomization Tests package is available on CRAN
This blog post introduces the RATest package we released a while back on CRAN with my colleague and good friend Mauricio Olivares-Gonzalez. The package contains a collection of randomization tests, data sets and examples. The current version focuses on two testing problems and their implementation in empirical work, mostly related to economics. First, it helps the empirical researcher test particular hypotheses, such as comparisons of means, medians, and variances from k populations, using robust permutation tests whose asymptotic validity holds under very weak assumptions, while retaining the exact rejection probability in finite samples when the underlying distributions are identical. Second, it implements the Canay and Kamat (2017) permutation test for the continuity assumption of the baseline covariates in the sharp regression discontinuity design (RDD).
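To make the idea of a permutation test concrete, here is a minimal two-sample sketch in Python for a difference in means. This is the plain textbook permutation test, not the robust (studentized) variant that RATest provides, and it does not use the package's R interface; the function name `permutation_test_means` and its parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_test_means(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Under the null that both samples come from the same distribution,
    group labels are exchangeable, so we reshuffle the pooled data and
    count how often the permuted statistic is at least as extreme as
    the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if permuted >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive and the test valid.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up samples for illustration.
group_a = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2]
group_b = [3.0, 3.4, 2.9, 3.6, 3.1, 3.3]
print(permutation_test_means(group_a, group_b))
```

The exactness property mentioned above applies when the two distributions are identical; when they differ in other respects (for example, unequal variances), an unstudentized statistic like this one can lose validity, which is the gap the robust versions are meant to close.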
Distilled News
Internet of Things and data mining: From applications to techniques and systems
R Packages worth a look
A Double Bootstrap Method for Analyzing Linear Models with Autoregressive Errors (DBfit) Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method pr …
On receiving the Community Leadership Award at the NumFOCUS Summit 2018
At the end of September I was honoured to receive the Community Leadership Award from NumFOCUS for my work building out the PyData community here in London and at associated events. This was awarded at the NumFOCUS 2018 Summit. I couldn’t attend the New York event, so James Powell gave my speech on my behalf (thanks James!).
On helping to open the inaugural PyDataPrague meetup
A couple of weeks back I had the wonderful opportunity to open the PyDataPrague meetup – this is the second meetup I’ve opened after our PyDataLondon started back in 2014. The core organisers Ondřej Kokeš, Jakub Urban and Jan Pipek asked me to give two short talks on:
Deriving Expectation-Maximization
Consider a model with parameters \(\theta\) and latent variables \(\mathbf{Z}\); the expectation-maximization algorithm (EM) is a mechanism for computing the values of \(\theta\) that, under this model, maximize the likelihood of some observed data \(\mathbf{X}\).
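For reference, the standard textbook form of the iteration can be written as follows. This is the usual statement of EM, assuming discrete latent variables (replace the sum with an integral otherwise); the iteration index \(t\) and the function name \(Q\) are conventional notation introduced here, not taken from the post.

```latex
% Marginal log-likelihood that EM aims to maximize over \theta
\log p(\mathbf{X} \mid \theta) = \log \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \theta)

% E-step: expected complete-data log-likelihood under the current posterior over \mathbf{Z}
Q(\theta \mid \theta^{(t)}) =
  \mathbb{E}_{\mathbf{Z} \mid \mathbf{X}, \theta^{(t)}}
  \left[ \log p(\mathbf{X}, \mathbf{Z} \mid \theta) \right]

% M-step: update the parameters by maximizing Q
\theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})
```

Each iteration cannot decrease \(\log p(\mathbf{X} \mid \theta)\), although in general the procedure is only guaranteed to reach a stationary point (typically a local maximum) rather than the global one.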
Characterizing Online Public Discussions through Patterns of Participant Interactions
New Paper at CSCW 2018 with Justine Zhang, Cristian Danescu-Niculescu-Mizil, and Christy Sauper [link to paper]