It’s okay to not be a data scientist

Everyone wants to be a data scientist

Being a data scientist has been hyped a lot in the past few years. Glassdoor has listed it as the #1 Best Job in America 3 years in a row, and it isn’t hard to find blog posts talking about how great being a data scientist is. It’s no surprise, then, that the main data science subreddit seems to be mostly people asking for advice on how to become a data scientist, rather than about data science.

I’ve been feeling pretty ambivalent about a lot of the advice I’ve seen (and previously given), and I worry it’s been doing people a disservice. A lot of the advice out there (and advertising from certain online courses/bootcamps) makes it sound like the only things you need to be a data scientist are a few technical skills - so it’s no surprise that there are questions like this one, asking why the pay gap between data analysts and data scientists is so large if the technical skills each role needs are the same.

Why isn’t everyone a data scientist?

The truth is, there is more to being a data scientist than learning how to import sklearn. I have to tread carefully here. I don’t want to be seen as gatekeeping. But I think there needs to be a reality check on the idea that anyone can do one machine learning project and put it on GitHub to land a job paying a 6-figure income.

According to the 2017 Burtch Works ‘Salaries of Data Scientists’ report, about 90% of data scientists have advanced degrees (~40% PhDs, ~50% Masters). Of the 10% without a graduate degree, my guess (I don’t have stats on this) is that the majority have a lot of experience with either pretty high-level data analytics or software engineering.

There are reasons for the large proportion of graduate degrees. Graduate degrees (especially PhDs) in quantitative disciplines indicate that a person has had a certain amount of experience with exploring abstract concepts, developing intuitions for statistical relationships, devising ways of testing hypotheses, translating data into stories, communicating complex results, etc. It may only take a couple of months to learn Python and how to wrangle data in Pandas and build a model in sklearn, but that’s just the surface-level stuff. The real world has complications that require you to have an intuition for the math and to understand what will and won’t work - and how to show that it will or won’t work.

I don’t mean to imply a PhD is required to be a data scientist - that is empirically false (about 60% of data scientists don’t have one, after all). There are many different paths to becoming a data scientist. But there’s no easy path.

The thing to realize about all of those data science bootcamps and online courses is this: for most of the people who get a data scientist job after them, the skills they learned in those courses were just the icing on the cake. If you have a quantitative PhD, or tonnes of software engineering experience, or years of working closely with data scientists as a data analyst, maybe all you need is a short course to pick up how to use a couple of Python packages and you’ll be competitive. If you don’t, most likely you have a longer road to being a data scientist.

You don’t have to be a data scientist

Gaining new skills is fun and probably good for your career regardless of what title you end up with. It’s worth picking up skills that allow you to do things you enjoy. If you currently don’t work as an analytics professional, learning online or doing a data science course could certainly help land that first data analyst position. Maybe if you’re a current data analyst working at a high level, learning some of those skills and incorporating them into your work could eventually lead to a data science position down the road.

But there are more ways ‘up’ than one. Being a data scientist doesn’t need to be your goal. You can move onto a management track and end up leading a team of data analysts. That may lead to managing a large data analytics group, or even to being Chief Analytics Officer one day. My point is, a lot of people are a bit too obsessed with the title ‘Data Scientist’. It’s a good job, but I think people are overvaluing it and underestimating the expected level of experience for it. There are lots of great analytics jobs (and lots of great non-analytics jobs, for that matter), and I think it would benefit some people to take their eyes off the shiny, hyped-up ‘data scientist’ title and judge their options more objectively.