DataRobot, the Boston-based Data Science company, enables business analysts to build predictive analytics with no knowledge of Machine Learning or programming. It uses automated ML to build and deploy accurate predictive models in a short span of time.
DataRobot was founded in 2012 in Boston by Jeremy Achin and Tom de Godoy. Both of them come with extensive experience in dealing with data science and ML models. In the most recent funding round, it has raised $54 million in Series C by New Enterprise Associates (NEA). The company has so far raised about $125 million from Accomplice, NEA, IA Ventures and Intel among other investors.
DataRobotSource: DataRobot
AutoML is revolutionizing data and AI domains by bringing the power of predictive analytics to businesses. An analyst who is familiar with mainstream business intelligence tools can leverage AutoML platforms to build and deploy highly sophisticated machine learning models. Experienced data scientists can go multiple levels deeper than a business analyst to customize and optimize the models.
The DataRobot platform has evolved during the last few years to take advantage of the innovations in the public cloud. Enterprises can choose to run the software either in the public cloud or on-premises data center. The hosted version dubbed as DataRobot Cloud Platform currently runs on AWS. At AWS re:Invent last year, the company achieved Amazon Web Services (AWS) ML Competency status. DataRobot claims that customers have built over 500,000,000 models on the DataRobot Cloud on AWS.
DataRobot delivers a wizard-style of user experience to generate Machine Learning models. In just six steps, businesses can deploy a real-time predictive analytics service backed by an accurate Machine Learning model.
It all starts by uploading the dataset to DataRobot platform, which accepts input from a file, a remote URL, a JDBC data source or HDFS.
Once data is ingested, the platform infers the schema by suggesting appropriate data types for each feature. Business analysts and data scientists can perform necessary to advanced data exploratory activities on the ingested dataset. Finally, they need to select the target label which is going be predicted by the model.
During the third step, all the user has to do is to click the start button to kick off the modeling process. This is when DataRobot creates a long queue of algorithms and trains each of them with the uploaded dataset. Depending on the dataset size and the number of algorithms, it may take a few minutes to hours for the job to get done.
Finally, the results of the modeling process are displayed in the model leaderboard, with the best models (based on the chosen performance metric) at the top of the list.
Users can explore each model to understand the methodology and the parameters used in the training process. They can test each model with a test dataset to measure the accuracy.
Once a model has been chosen from the leaderboard, it is deployed as an API. The endpoint becomes ready to deal with production data. Any developer with the API key can invoke this like any other RESTful service.
The beauty of DataRobot lies in its extensibility. Depending on the comfort level and the job function, users can go into the depth of the platform to take control of the workflow. While business analysts can use it as a wizard, experienced data scientists can tweak a lot of parameters to get precise models.
Based on my personal experience, DataRobot elegantly dealt with business problems like sales forecast, customer churn analysis, anomaly detection and time-series modeling.
There are a few limitations to the platform which may be addressed in the future releases. One of the limitations is that it cannot deal with images and unstructured data. I couldn’t train a Convolutional Neural Network (CNN) to classify images and objects. The model collection includes deep learning algorithms but they cannot be used for CNNs at this point.
Customers with premium option enabled can export a fully trained model as a self-contained Java JAR file to host it outside of DataRobot platform. This feature is important in scenarios where the ML models are running in offline mode.
With AutoML becoming the key to democratizing AI, every platform vendor is jumping the bandwagon. Google is all set to launch its AutoML APIs, and Microsoft has already gone live with its custom cognitive services.
DataRobot is one of the unique AutoML platforms that can run in the public cloud as well as on-premise data centers. Acute shortage of data scientists combined with evolving privacy regulations makes DataRobot very attractive for enterprises.
The future of AI is in AutoML. As an early mover in this space, DataRobot has the potential to become a leading AutoML platform.
Update at 01:20 ET - Updated the model export feature and support for deep learning algorithms.