Most data scientists have to write code to analyze data or build products. While coding, data scientists act as software engineers. Adopting best practices from software engineering is key to ensuring the correctness, reproducibility, and maintainability of data science projects. This post describes some of our efforts in the area.
One of many data science Venn diagrams. Source: Data Science Stack Exchange
Data science is often defined as the intersection of many fields, including software engineering and statistics. However, as demonstrated by the above Venn diagram, viewing it as an intersection tends to be too exclusive – in reality, it’s a union of many fields. Hence, data scientists tend to come from various backgrounds, and it is common to encounter data scientists with no formal training in computer science or software engineering. According to Michael Hochster, data scientists can be classified into two types: