Introducing Python for data scientists - Pt1

If you have decided you want to learn Python but you're not sure where to start, this post will point you in the right direction.

Part 1 of Python for data scientists covers Python in general, before we dive into the specifics for data scientists in Part 2.

What is Python?

The official Python website states ‘Python is a programming language that lets you work quickly and integrate systems more effectively.’ It is often recommended to beginners as a language that makes it quick and easy to get to grips with programming. Python is also a very powerful tool for data science, and its ecosystem of data science tools is growing rapidly.

Python versions

There are two main versions of Python: Python 2 and Python 3. Python 3.0 was released in 2008 and was not backwards compatible with Python 2. Initially the Python community was not keen to start using Python 3. Fast forward to 2018, however, and the majority of people now recommend that beginners learn Python 3. The main reason to use Python 3 is that many of the main Python packages will soon drop support for Python 2.
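
To give a feel for what "not backwards compatible" means in practice, here is a minimal sketch of two well-known differences, written for Python 3. The variable names are just for illustration.

```python
import sys

# In Python 3, print is a function and needs parentheses.
# The Python 2 form `print "hello"` is a syntax error in Python 3.
print("Running Python", sys.version_info.major)

# In Python 3, dividing two integers with / gives a decimal result.
# In Python 2 the same expression rounded down to a whole number.
true_division = 7 / 2

# The // operator rounds down in both versions.
floor_division = 7 // 2

print(true_division, floor_division)
```

Code written for Python 2 often needs small changes like these before it will run on Python 3, which is one reason the move between versions took the community several years.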

Python packages

Python packages are collections of additional tools built on top of the standard Python language. They allow a user to undertake complex tasks with far less effort and fewer lines of code than would be required using standard Python alone. They are also often faster than equivalent standard Python code, as many are written in faster languages such as C. The following is a list of some common Python packages:

  • pip - used to install and manage Python packages.

  • NumPy - adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

  • pandas - adds high-performance, easy-to-use data structures and data analysis tools.

  • SciPy - adds support for science, maths and engineering.

  • Jupyter notebooks - allow you to create easily shareable documents that combine computer code, code output and human-readable text.

You will almost certainly want to make use of Python packages. The easiest way to install them is described in the Installing Python section below.
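
As a rough illustration of the "less effort and fewer lines of code" point, the sketch below does the same small calculation twice: once with a plain Python loop and once with NumPy. The data and the function name `mean_of_doubled` are made up for this example, and the NumPy part only runs if NumPy is already installed.

```python
# Some made-up numbers to work with.
values = [1.0, 2.0, 3.0, 4.0]


def mean_of_doubled(numbers):
    """Double every number and return the average, using plain Python."""
    total = 0.0
    for number in numbers:
        total += number * 2
    return total / len(numbers)


pure_result = mean_of_doubled(values)

# The same calculation with NumPy, skipped if NumPy is not installed.
try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    # One line: NumPy doubles the whole array at once, then averages it.
    numpy_result = float((np.array(values) * 2).mean())
else:
    numpy_result = None

print(pure_result, numpy_result)
```

On four numbers there is no speed difference worth measuring. The benefit shows up on large arrays, where NumPy does the looping in compiled code.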

General learning resources

There are many resources for learning Python available on the web. The following is a list of my favorites:

  • Learn Python the Hard Way - This book has a cult-like fan base and is a very popular learning resource.

  • Learn Python and Codecademy - These are great online tools that allow you to learn Python without having to install it on your computer.

  • Coursera - Coursera offers world-class (often free) tuition for learning Python.

  • edX - edX is similar to Coursera and was set up by Harvard and MIT to offer quality online courses.

  • Udemy - Udemy hosts courses built by independent online content creators.

  • NumPy for MATLAB users - For MATLAB users, this site lists the key differences between using MATLAB and NumPy.

  • Project Euler - This is a personal favorite of mine. Project Euler contains a variety of mathematical problems that you try to solve using a programming language.
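
To give a flavor of Project Euler, its first problem asks for the sum of all the multiples of 3 or 5 below 1000, and gives 23 as the answer for numbers below 10. One possible approach is sketched below. The function name `sum_of_multiples` is my own choice, and there are many other ways to solve it.

```python
def sum_of_multiples(limit):
    """Add up every whole number below `limit` that divides by 3 or 5."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


# Check against the worked example in the problem: 3 + 5 + 6 + 9 = 23.
print(sum_of_multiples(10))
```

Calling `sum_of_multiples(1000)` then gives the answer to the full problem.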

Installing Python

There are a few ways to install Python on your computer, but by far the easiest method for a beginner is to use a distribution called Anaconda. Anaconda is an open-source distribution that installs Python and over 150 common packages automatically, and it includes a package manager for adding more later. This saves considerable time and confusion and only takes a few clicks. Anaconda can be downloaded from the Anaconda website.
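
Once Python is installed, a quick sanity check is to ask it which version is running and whether the packages mentioned above can be found. The sketch below uses only the standard library. The helper name `installed` and the list of packages to check are my own choices, not part of Anaconda.

```python
import importlib.util
import sys


def installed(name):
    """Return True if Python can find a package with this import name."""
    return importlib.util.find_spec(name) is not None


# Import names of some of the packages discussed in this post.
packages = ["numpy", "pandas", "scipy"]
status = {name: installed(name) for name in packages}

print("Python version:", sys.version.split()[0])
for name, found in status.items():
    print(name, "is installed" if found else "is NOT installed")
```

If you installed Python through Anaconda, all three should normally be reported as installed. If one is missing, it can be added with pip or with Anaconda's own package manager.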