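Tree-based learning algorithms are quite common in data science competitions. These algorithms give predictive models accuracy, stability, and ease of interpretation. Unlike linear models, they can also map nonlinear relationships well. Common tree-based models include decision trees, random forests, and boosted trees.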
Logistic Regression
We review binary logistic regression. In particular, we derive a) the equations needed to fit the algorithm via gradient descent, b) the maximum likelihood fit’s asymptotic coefficient covariance matrix, and c) expressions for model test point class membership probability confidence intervals. We also provide python code implementing a minimal “LogisticRegressionWithError” class whose “predict_proba” method returns prediction confidence intervals alongside its point estimates.
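As a rough illustration of the three pieces listed above, here is a minimal NumPy sketch. It is not the post's "LogisticRegressionWithError" class: the function names, the learning rate, the iteration count, and the choice to build the interval on the logit scale and map it back through the sigmoid are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit coefficients by gradient ascent on the mean log-likelihood.

    lr and n_iter are illustrative defaults, not tuned values.
    """
    X1 = add_intercept(X)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ beta)
        # Gradient of the log-likelihood is X^T (y - p).
        beta += lr * X1.T @ (y - p) / len(y)
    return beta


def coef_covariance(X, beta):
    """Asymptotic covariance: inverse of the Fisher information X^T W X."""
    X1 = add_intercept(X)
    p = sigmoid(X1 @ beta)
    w = p * (1.0 - p)  # diagonal of W
    fisher = X1.T @ (X1 * w[:, None])
    return np.linalg.inv(fisher)


def predict_proba_with_interval(X, beta, cov, z=1.96):
    """Point estimates plus an approximate 95% interval per test point.

    Assumption: the interval is formed on the log-odds scale and then
    passed through the sigmoid, which keeps it inside [0, 1].
    """
    X1 = add_intercept(X)
    eta = X1 @ beta
    # Standard error of each linear predictor: sqrt(x^T cov x).
    se = np.sqrt(np.einsum("ij,jk,ik->i", X1, cov, X1))
    return sigmoid(eta), sigmoid(eta - z * se), sigmoid(eta + z * se)
```

Calling `fit_logistic`, then `coef_covariance`, then `predict_proba_with_interval` on held-out rows returns three arrays: the predicted probability and the lower and upper bounds for each row. Points far from the training data get wider intervals because their `x^T cov x` term is larger.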
Random Dilation Networks for Action Recognition in Videos
Lately, we (TwentyBN) took part in the ActivityNet trimmed action recognition challenge. The dataset, called Kinetics, was released recently. It is a collection of 10-second YouTube videos, each with a single label from among 400 action classes. DeepMind released the dataset with a baseline of 61% Top-1 and 81.3% Top-5 accuracy; for the baseline models, please refer to their dataset paper. Still, it took just 2 months for people to hoist the bar high above that.
More silliness
Back before I had so many followers, and it was less stressful to put goofy stuff “in the wild”, I wrote data science parody lyrics to “Summer of ’69” and “For the Love of Money”. Well, a while ago, another idea popped into my head...
My 10-step path to becoming a remote data scientist with Automattic
About two years ago, I read the book The Year without Pants, which describes the author’s experience leading a team at Automattic (the company behind WordPress.com, among other products). Automattic is a fully distributed company, which means that all of its employees work remotely (hence pants are optional). While the book discusses some of the challenges of working remotely, the author’s general experience was very positive. A few months after reading the book, I decided to look for a full-time position after a period of independent work. Ideally, I wanted a well-paid data science-y remote job with an established distributed tech company that offers a good work-life balance and makes products I care about. Automattic seemed to tick all my boxes, so I decided to apply for a job with them. This post describes my application steps, which ultimately led to me becoming a data scientist with Automattic.
Diffusion of ISIS propaganda on Twitter
My latest work, titled “Contagion dynamics of extremist propaganda in social networks”, has been published in Information Sciences. The study aims to model and understand the diffusion of extremist propaganda, in particular content in support of ISIS, on social media platforms such as Twitter.
Moving On, Looking Back
For a while now, I have been living two lives at once: that of a full-time software engineer working at Oracle, and that of a CS Masters grad student at Stanford. In one life, finishing the first release of a brand new product with a small team of esoteric engineers scattered across a few different states. In the other, trying to keep up with classwork and delve deeper into research in AI. Granted, I was officially only a part-time grad student, but it certainly did not feel like it.
Web scraping the President's lies in 16 lines of Python
Note: This tutorial is available as a video series and a Jupyter notebook, and the dataset of lies is available as a CSV file.
How I Used Deep Learning To Train A Chatbot To Talk Like Me (Sorta)
2 Quick Announcements
- Sorry for the issues with loading images and logging in on this site. I’ve had problems ever since I had Bluehost set me up with HTTPS certificates; apparently those certificates have expired or something, and that is keeping images from loading, among other issues. I’ll look into it, but I also have a busy week at work and wasn’t planning on site maintenance here this week, so it might be this way for a little bit. I am aware of it and a fix is on my to-do list, though! Thanks for your patience.