If you want to do statistical analysis or machine learning with data in SQL Server, you can of course extract the data from SQL Server and then analyze it in R or Python. But a better way is to run R or Python within the database, using Microsoft ML Services in SQL Server 2017. Why?
-
It’s faster. Not only to you get to use the SQL Server instance (which is likely to be faster than your local machine), but it also means you no longer have to transport the data over a network, which is likely to be the biggest factor, particularly when working with large data sets.
-
It operationalizes your analysis. Rather than running an ad-hoc analysis on your desktop, you can instead publish it as a stored procedure, and make your analysis available on demand as a SQL query,
-
It’s more powerful. Not only can you use all of open-source R and Python within SQL Server, but you also get access to dedicated Microsoft libraries in R/Python including RevoScaleR/revoscalepy and MicrosoftML/microsoftml, which provide algorithms optimized for large data sets in SQL Server.
If you’d like to learn how to run R within SQL Server, a great resource is the book SQL Server 2017 Machine Learning Services with R, by Tomaž Kaštrun and Julie Koesmarno. At over 300 pages, the book covers every detail, including installation and configuration, an overview of data analysis in R, operationalizing R code in SQL Server and via Power BI, and using the aforementioned extension libraries for R. It’s mostly aimed at R-using data scientists, but does includes a chapter for database administrations to provide an introduction to R for DBAs as well. In short, it contains everything you need to know to build production applications with R and SQL Server 2017 (or 2016), backed up with a wealth of examples from industry.
You can purchase a paperback copy from your favorite bookseller, and you can also get a discounted digital edition from the publisher at the link below.
Packt Publishing: SQL Server 2017 Machine Learning Services with R