Intercausal Reasoning in Bayesian Networks

This might look a little funky for those used to Bayesian parameter estimation for univariate systems - estimating $P(\text{heads})$ given the results of 15 coin flips, for example.
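For reference, that univariate case admits a simple closed-form answer: with a conjugate Beta prior on $P(\text{heads})$, the posterior is just another Beta distribution. A minimal sketch, assuming a flat $\text{Beta}(1, 1)$ prior and some made-up flips:

```python
import numpy as np

# 15 illustrative coin flips: 1 = heads, 0 = tails
flips = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1])

# with a conjugate Beta(1, 1) prior, the posterior over P(heads)
# is Beta(1 + #heads, 1 + #tails) - a simple closed-form update
alpha_post = 1 + flips.sum()
beta_post = 1 + (len(flips) - flips.sum())

print(alpha_post / (alpha_post + beta_post))  # posterior mean of P(heads)
```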

So, how do we do this? Do we filter the data for all unique combinations of (president, accident), i.e. $(0, 0)$, $(0, 1)$, $(1, 0)$, $(1, 1)$, then estimate in the same fashion as above? That way, we could estimate $P(\text{traffic} = 1 \mid \text{president} = 0, \text{accident} = 0)$, $P(\text{traffic} = 1 \mid \text{president} = 1, \text{accident} = 0)$, etc. In fact, this would work. But what if we added more variables? And what if their values were continuous? This would get messy quickly.
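For concreteness, here's a sketch of that combination-counting approach; the toy DataFrame and its column names (president, accident, traffic) are my own, chosen to match the example:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)

# toy data: binary indicators for president-in-town, accident, and traffic
n = 1000
df = pd.DataFrame({
    'president': rng.binomial(1, 0.1, n),
    'accident': rng.binomial(1, 0.2, n),
})
# traffic is more likely when either cause is present
p_traffic = 0.1 + 0.5 * df['president'] + 0.3 * df['accident']
df['traffic'] = rng.binomial(1, p_traffic)

# estimate P(traffic = 1 | president, accident) for each unique combination
# by taking the sample mean of `traffic` within that slice of the data
estimates = df.groupby(['president', 'accident'])['traffic'].mean()
print(estimates)
```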
In fact, modeling $P(\text{Traffic} \mid \text{President}, \text{Accident})$ is none other than logistic regression. Think about it!

Herein, we’ll formulate our model as a vanilla Binomial (logistic) regression, taking the form:

$$T \sim \text{Bernoulli}(p), \qquad p = \frac{1}{1 + e^{-(\beta_0 + \beta_P P + \beta_A A)}}, \qquad \beta_i \sim \text{Normal}(\mu_i, \sigma_i^2)$$

$T$, $P$ and $A$ represent traffic, president and accident respectively. The priors on the coefficients are meant to be uninformative - a mere guard against our sampler wandering off to values that are really big or really small.
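A minimal sketch of this model in PyMC3, reusing the toy df from above; the variable names and prior hyper-parameters (mu=0, sd=10) are illustrative assumptions, not canonical choices:

```python
import pymc3 as pm

with pm.Model() as model:
    # weakly-informative Normal priors on the intercept and coefficients
    beta_0 = pm.Normal('beta_0', mu=0, sd=10)
    beta_P = pm.Normal('beta_P', mu=0, sd=10)
    beta_A = pm.Normal('beta_A', mu=0, sd=10)

    # inverse-logit link maps the linear predictor to a probability
    p = pm.math.sigmoid(beta_0
                        + beta_P * df['president'].values
                        + beta_A * df['accident'].values)

    # Bernoulli likelihood over the observed traffic indicator
    pm.Bernoulli('traffic', p=p, observed=df['traffic'].values)

    trace = pm.sample(2000, tune=1000)

print(pm.summary(trace))
```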

Finally, as this is one of my first times using PyMC3, I fit a baseline logistic regression model from scikit-learn on the same data just to check that our parameter estimates make sense. Note that scikit-learn’s implementation specifies an L2 penalty on the cost by default - whose strength is controlled by an additional parameter $C$, the inverse regularization strength - which is analogous to placing Gaussian priors on model coefficients in the Bayesian case. I did not take care to ensure that the hyper-parameters ($\mu_P$ and $\sigma_P$, for example) of our Normal priors are comparable with the analogous parameter $C$ in the scikit-learn model.
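A sketch of that sanity check, again reusing the toy df; the comparison against the posterior means is loose for the reason just mentioned:

```python
from sklearn.linear_model import LogisticRegression

X = df[['president', 'accident']].values
y = df['traffic'].values

# L2 penalty is the default; C is the *inverse* regularization strength,
# so larger C means weaker regularization (a flatter implied Gaussian prior)
clf = LogisticRegression(C=1.0).fit(X, y)

# roughly comparable to the posterior means of beta_0, beta_P, beta_A above
print(clf.intercept_, clf.coef_)
```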