When I first started to learn about machine learning, specifically supervised learning, I eventually felt comfortable with taking some input $\mathbf{X}$, and determining a function $f(\mathbf{X})$ that best maps $\mathbf{X}$ to some known output value $y$. Separately, I dove a little into time series analysis and thought of this as a completely different paradigm. In time series, we don’t think of things in terms of features or inputs; rather, we have the time series $y$, and $y$ alone, and we look at previous values of $y$ to predict future values of $y$.
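To make the connection concrete, here is a minimal sketch (my own illustration, not code from the post) that reframes an autoregressive forecast as an ordinary supervised problem by using lagged values of $y$ as the input features $\mathbf{X}$; it assumes pandas and scikit-learn are available:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy time series: the only data we have is y itself.
y = pd.Series(np.sin(np.arange(100) / 5.0))

# Build a supervised design matrix X from lagged values of y.
n_lags = 3
X = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, n_lags + 1)}, axis=1).dropna()
target = y.loc[X.index]

# Now it is an ordinary supervised problem: learn f(X) -> y.
model = LinearRegression().fit(X, target)
print(model.predict(X.tail(1)))  # one-step-ahead forecast
```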
TSrepr - Time Series Representations in R
I’m happy to announce a new package that has recently appeared on CRAN, called “TSrepr” (version 1.0.0: https://CRAN.R-project.org/package=TSrepr).
Habits and Tools, Old and New
The Prelude
My notes on (Liang et al., 2017): Generalization and the Fisher-Rao norm
After last week’s post on the generalization mystery, people have pointed me to recent work connecting the Fisher-Rao norm to generalization (thanks!):
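For reference, the Fisher-Rao norm of a parameter vector $\theta$ (as I understand the definition in Liang et al., 2017) is the norm induced by the Fisher information matrix:

$$\|\theta\|_{\mathrm{fr}}^2 = \theta^{\top} I(\theta)\, \theta, \qquad I(\theta) = \mathbb{E}_{(x,y)}\!\left[\nabla_{\theta}\, \ell(\theta; x, y)\, \nabla_{\theta}\, \ell(\theta; x, y)^{\top}\right],$$

where $\ell$ is the per-example loss. Measuring $\theta$ in the geometry of $I(\theta)$, rather than in the raw Euclidean norm, is what ties this quantity to generalization.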
9 new pandas updates that will save you time
Since launching my Python pandas video series in 2016, there have been 10 new releases of the pandas library, including hundreds of new features, bug fixes, and API changes. I was running version 0.18 at the time, but I’ve finally upgraded to version 0.22 (the latest release).
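If you are also overdue for an upgrade, a quick way to see what you are running (a generic sketch, not taken from the post) is:

```python
import pandas as pd

# Check the installed pandas version before and after upgrading,
# e.g. via `pip install --upgrade pandas`.
print(pd.__version__)

# Fuller environment report, useful when filing bug reports.
pd.show_versions()
```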
Lessons learned in my first year as a data scientist
It’s been a year since I started my first job as a data scientist. In that time, I’ve learned a lot, but most of that learning hasn’t been the type I expected. I’ve certainly learned some things about new technologies and techniques, but much of what I’ve learned has been about how to actually make my skills useful to others in the company.
Kernel Feature Selection via Conditional Covariance Minimization
Feature selection is a common method for dimensionality reduction that encourages model interpretability. With large data sets becoming ever more prevalent, feature selection has seen widespread usage across a variety of real-world tasks in recent years, including text classification, gene selection from microarray data, and face recognition. We study the problem of supervised feature selection, which entails finding a subset of the input features that explains the output well. This practice can reduce the computational expense of downstream learning by removing features that are redundant or noisy, while simultaneously providing insight into the data through the features that remain.
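As a point of reference only (this is a generic filter method, not the paper's conditional-covariance-minimization approach), a basic supervised feature-selection pass in scikit-learn scores each feature against the output and keeps the top $k$:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 20 features, only a few of which are informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=4, random_state=0)

# Keep the k features that best explain the output, scored by mutual information.
selector = SelectKBest(score_func=mutual_info_classif, k=4)
X_selected = selector.fit_transform(X, y)

print(selector.get_support(indices=True))  # indices of the retained features
```

Unlike a univariate filter such as this, the paper's kernel approach can account for interactions among features when choosing the subset.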
Motivation in Academia vs Industry
I’ve been in industry working as a data scientist for a year now after leaving academia. By all measures I can think of, it’s been a good year. I got lucky with a great manager; I’ve gained a lot of experience and learned a lot; I work at a place where doing good work means actually helping sick people; and I even lucked out with a promotion, which lets me try out a slightly more managerial position and see how I like those responsibilities while still making my own development contributions.
Java Autonomous Driving – Car Detection
Are you a Java developer eager to learn more about deep learning and its applications, but not feeling like learning another language at the moment? Are you facing a lack of support, or confusion, when it comes to machine learning in Java?
The Generalization Mystery: Sharp vs Flat Minima
I set out to write about the following paper, which I saw people talking about on Twitter and Reddit: