Developing effective data scientists

Last year I had the honor and pleasure of visiting my graduate school alma mater to speak at the kick-off event for the University of Michigan’s new Institute for Data Science. They invited me to share my work and teaching experiences, and to offer guidance to students and educators of data science. I asked a bunch of my data scientist friends what they could have done differently to better prepare themselves for their careers, and there were a lot of similarities in the responses. I’ve attempted to distill them into three core competencies: communication, coding, and curiosity… and alliteration, when possible.

Effective data scientists communicate well

Communicating technical concepts to non-technical audiences is a huge part of a data scientist’s job. To do that well, you need a solid understanding of the techniques in your toolbox, and should be able to provide intuitions for why the techniques work. Also expect to spend a lot of time communicating technical concepts to other technical people from different disciplines, like software engineers. Communicating effectively with engineers requires that you think about data products in terms of their inputs and outputs. What inputs do your algorithms need in order to work and where will those inputs come from? What does the algorithm return as a result, and what’s the level of uncertainty?
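To make the inputs-and-outputs framing concrete, here is a minimal Python sketch of what such a contract with an engineer might look like. The names `Estimate` and `estimate_mean` are hypothetical, and the normal-approximation interval is just one assumed way of reporting uncertainty, not a description of anything we actually shipped.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


@dataclass(frozen=True)
class Estimate:
    """Output contract: a point estimate plus an interval describing uncertainty."""
    value: float
    lower: float
    upper: float


def estimate_mean(samples: Sequence[float], z: float = 1.96) -> Estimate:
    """Input: numeric observations. Output: their mean with an approximate interval.

    Assumption: a normal approximation (z = 1.96 for roughly 95% coverage).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate uncertainty")
    centre = mean(samples)
    # Standard error of the mean, scaled by the chosen z value.
    half_width = z * stdev(samples) / sqrt(len(samples))
    return Estimate(centre, centre - half_width, centre + half_width)
```

Spelling out the signature this way answers the engineer’s questions up front: what goes in, what comes back, and how much to trust it.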

Effective data scientists gesticulate wildly to drive home key points.

In addition to the technical aspects of communication, data scientists are often expected to evangelize and advocate for their work more than is typical in older disciplines like software engineering. It’s critical for data scientists to understand the business in a broader context than their team or department, and also be able to identify specifically how their contributions fit into the broader context. At Nordstrom we did this with dollars, the ideal unit of measurement. We worked primarily in the area of product recommendations, so we could directly connect recommendation activity with sales. Your work won’t always be easy to quantify, which is why it’s important to establish evaluation metrics before starting projects.

LEARN MORE CODING AND STATISTICS. Meet other people working on data sets not in your field. Learn something not in your field. Don’t sit in a box staring at your thesis project. - Wendy Grus, Technical Data Analyst @ Inrix

Developing communication skills as a student is pretty natural. Try presenting your work, blogging, and teaching others. When I was working on my PhD, my department had a student-run seminar series called Bistro. Bistro was a fantastic way for us to share our work and get feedback in a low-pressure setting. If you want to avoid the random and occasionally terrifying aspects of public speaking, blogging is an excellent way to practice written communication and advance your “personal brand.” Ahem.

Effective data scientists write high-quality code

At Nordstrom my team was split evenly between software developers and data scientists. Working in the Data Lab was the first time in my life that I was not the sole consumer of my own code. Instead I wrote code for a website that got millions of hits every day. I initially didn’t have the skills to write production-quality code, so my first year as a data scientist was a year of intense learning and buggin’ out about how slow I felt.

The single most useful thing I learned in grad school was how to work with Python. - Sarah Guido, Data Scientist @ bit.ly

You don’t have to be a software engineer, but it’s critical that before graduation you’re proficient in at least one programming language, and the sooner you adopt good coding practices the easier the transition will be. Start using automated testing now, and invest in learning to use your tools better (or get better tools).
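If automated testing is new to you, it can start very small. The sketch below is an illustration only: `clean_prices` is a made-up helper, and the tests are plain functions with `assert` statements, written in the style that a test runner such as pytest collects.

```python
def clean_prices(raw):
    """Convert raw price strings to floats, dropping values that cannot be parsed."""
    cleaned = []
    for item in raw:
        # Strip whitespace, currency symbols, and thousands separators.
        text = str(item).strip().replace("$", "").replace(",", "")
        try:
            cleaned.append(float(text))
        except ValueError:
            # Unparseable entries (for example "n/a" or "") are skipped.
            continue
    return cleaned


def test_clean_prices_strips_symbols():
    assert clean_prices(["$1,200.50", " 3 "]) == [1200.5, 3.0]


def test_clean_prices_drops_garbage():
    assert clean_prices(["n/a", "", "7"]) == [7.0]
```

Each test pins down one behavior you care about, so the next time you change the function you find out right away if you broke it.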

I wish there had been more emphasis on real data/messy data/getting my own data/cleaning data. - Trey Causey, Data Scientist @ ChefSteps

Unfortunately, demonstrating language proficiency isn’t enough. As a data scientist you’ll be expected to apply statistical skills in many domains throughout the business. For example, the Data Lab tackles problems ranging from personalization and marketing, to store display testing, to operations. Exposure to lots of different types of data in graduate school–especially large, messy bioinformatics data–prepared me to contribute to lots of different types of projects later.

Effective data scientists are curious, life-long learners

The tools and technologies of data science change at an incredible pace, which is at once exciting and exhausting. To succeed in this field it’s important to have a mindset of life-long improvement and learning. Curiosity is critical too, because even if you work for the same company your entire career, you’ll be exposed to a breadth of problems over the years. In short, effective data scientists are usually people who can find something interesting in anything.

I took Naval Criminal Law & international relations. It was hard to stay awake. By comparison, I looked forward to vector calculus and loved Matlab. Lesson: Don’t be afraid to explore your options while you are still on the track of figuring out what comes next. - Amanda Casari, Senior Data Scientist @ Concur

While you’re a student, developing curiosity is hopefully something you don’t need to work too hard at (you’re so young to be this jaded). Try something simple like grabbing an Oprah Chai™ with someone outside your department and then think about how you might approach their research questions using your computational and quantitative tools. Also, go to a meet-up. Meet-ups can be a great place to learn about tools and techniques that are actually being applied out in the wild (and meet really fun and supportive women who program in Python in the Seattle area).

Boundaries between disciplines are becoming less defined. There are general, powerful patterns of thinking that apply to many disciplines. A good way to become aware of these patterns is to broaden exposure to disciplines. - James Pestrak, Senior Data Scientist @ Nordstrom

I love the last two quotes because they advance the idea of casting a wide net in terms of academic interests. Ultimately, a data scientist is a person with a generalizable set of quantitative skills. As you grow and evolve, and your interests change, data science doesn’t lock you into any particular career. Instead, data science enables you to approach any problem that can be solved with quantitative tools, and that’s extremely powerful.