Top 10 Python Data Science Libraries

Python continues to lead the way when it comes to Machine Learning, AI, Deep Learning and Data Science tasks. According to builtwith.com, 45% of technology companies prefer to use Python for implementing AI and Machine Learning.

Because of this, we’ve decided to start a series investigating the top Python libraries across several categories:

Top 8 Python Machine Learning Libraries

Top 13 Python Deep Learning Libraries

Top 10 Python Data Science Libraries – this post

Top X Python Reinforcement Learning and evolutionary computation Libraries – COMING SOON!

Of course, these lists are entirely subjective as many libraries could easily place in multiple categories. As always, please feel free to vent your frustrations/disagreements/annoyance in the comments section below!

Top 10 Python Data Science Libraries by GitHub Contributors, Commits and Size (size of the circle)

Now, let’s get onto the list (GitHub figures correct as of November 16th, 2018):

1. pandas (Contributors – 1328, Commits – 18162, Stars – 16890)

“pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.”
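To make the "labeled data" idea concrete, here is a minimal sketch of a group-and-aggregate operation. The `sales` table and its column names are hypothetical, invented for illustration only.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 4, 6, 8],
    }
)

# Group rows by a label column and aggregate, a typical "relational" operation.
units_by_region = sales.groupby("region")["units"].sum()
print(units_by_region)
```

The result is a pandas Series indexed by region, so a single total can be looked up by label, for example `units_by_region["north"]`.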

2. Matplotlib (Contributors – 771, Commits – 27937, Stars – 8224)

“Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell (à la MATLAB or Mathematica), web application servers, and various graphical user interface toolkits.”
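As a rough illustration of the scripting use case mentioned above, the sketch below draws a simple line plot and renders it to an in-memory PNG. The data points are made up, and the non-interactive `Agg` backend is chosen here only so the script runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical data points, invented for illustration.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")

# Render to an in-memory PNG; passing a file path to savefig works the same way.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```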

3. NumPy (Contributors – 708, Commits – 19241, Stars – 8666)

“NumPy is the fundamental package needed for scientific computing with Python. It provides a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code and useful linear algebra, Fourier transform, and random number capabilities.”
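Two of the capabilities named in that description, broadcasting and linear algebra, can be sketched in a few lines. The arrays below are made-up values chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# A 2-D array (3 rows, 2 columns) of made-up values.
matrix = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Broadcasting: the 1-D array of column means is stretched across every row.
column_means = matrix.mean(axis=0)
centered = matrix - column_means

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
solution = np.linalg.solve(A, b)
```

After centering, each column of `centered` has a mean of zero, which is a common preprocessing step before further analysis.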

4. SciPy (Contributors – 670, Commits – 20080, Stars – 5096)

“SciPy (pronounced “Sigh Pie”) is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.”
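As a small, hedged example of the optimization and integration modules listed above, the sketch below minimizes a one-dimensional function and computes a definite integral. The function `parabola` is a hypothetical example, not something from SciPy itself.

```python
from scipy import integrate, optimize


def parabola(x):
    """Hypothetical test function with its minimum at x = 2."""
    return (x - 2.0) ** 2 + 1.0


# Optimization: find the x that minimizes the function.
result = optimize.minimize_scalar(parabola)

# Integration: numerically integrate x**2 from 0 to 3 (exact value is 9).
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 3.0)
```

Here `result.x` holds the location of the minimum, and `quad` returns both the integral and an estimate of its absolute error.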

5. Bokeh (Contributors – 325, Commits – 17365, Stars – 8439)

“Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers. With Bokeh, you can quickly and easily create interactive plots, dashboards, and data applications.”

6. Gensim (Contributors – 299, Commits – 3676, Stars – 8107)

“Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.”

7. Scrapy (Contributors – 295, Commits – 6802, Stars – 30014)

“Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.”

8. StatsModels (Contributors – 164, Commits – 10896, Stars – 3383)

“Statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.”

9. plotly.py (Contributors – 62, Commits – 3291, Stars – 4218)

“plotly.py is an interactive, open-source, and browser-based graphing library for Python. Built on top of plotly.js, plotly.py is a high-level, declarative charting library. plotly.js ships with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more.”

10. pydot (Contributors – 12, Commits – 169, Stars – 267)

“pydot is an interface to Graphviz, can parse and dump into the DOT language used by Graphviz and is written in pure Python.”

Keep an eye out for the final part of this series, which focuses on Reinforcement Learning and evolutionary computation libraries and will be published over the next few weeks!
