Monday, November 20, 2017

Python in SQL Server 2017

MS development team has been added Advanced Analytics Extension or Machine Learning Services in SQL Server 2017 by enabling SQL server to execute Python scripts within TSQL via ‘Machine Learning Services with Python’. In SQL Server 2017, it will allow us to process data in the database by using any Python function or package without needing to export the data from the database. We can use SQL Server itself as an operationalization platform for production applications using Python code.
The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. In this way, Microsoft development team renamed R Services to Machine Learning Services, where R and Python are two main options under this feature.
Now, Microsoft gives us a highly recommended option to use Python within the Machine Learning Services to showing that how a database can trigger an external process to perform an activity on the data which is provided as a parameter.
Python integration in SQL Server Advantages 
After integration of Python in SQL Server, we are getting the following advantages –
  • Enterprise-grade performance and scale: We can use SQL Server’s advanced capabilities like in-memory table and column store indexes with the high-performance scalable APIs in RevoScalePy package.
  • RevoScalePy is modeled after RevoScaleR package in SQL Server R Services. Using these with the latest innovations in the open source Python world allows us to bring unparalleled selection, performance, and scale to our SQL Python applications.
  • Rich extensibility: We can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on our local machine.
  • Elimination of data movement: this is the biggest advantage of Python that we are no longer dependent to move data from the database to our Python application or model because we can build Python applications within the database.
  • This removes fences of security, compliance, governance, integrity, and a host of similar issues related to moving vast amounts of data around.
  • This new capability brings Python to the data and runs code inside secure SQL Server environment by using the proven extensibility mechanism built in SQL Server 2016.
  • Easy deployment: Now we have the Python model ready, deploying it in production is now as easy as implanting it in a T-SQL script and then any SQL client application can take advantage of Python-based models and intelligence by a simple stored procedure call.
  • Wide availability at no additional costs: Python integration is available in all editions of SQL Server 2017, including the Express edition.

R and Python already support loading data into data frame from SQL Server. This integration is about moving the R/Python compute to SQL Server machine to eliminate data movement across machines. If we move millions/billions of rows to the client for modeling or scoring then the network overhead will dominate end-to-end execution time.
Moreover the R/Python integration in SQL Server works with parallel query processing in SQL Server, security & resource governance.
The R / Python processes run outside of the SQL Server address space and share the machine resources because data security is the biggest distress to not allow running R / Python within the SQL Server process or memory space.
By default many of the data structures in R / Python are memory resident objects so the same limitations apply. However, Microsoft ships many algorithms as part of the R Server package (RevoScaleR or revoscalepy) that has a SQL Server data source object which can work with data that doesn’t fit in memory and supports parallel execution.

Conclusion
SQL Server 2017 takes in-database analytics to the next level with support for both Python and R; delivering unparalleled scalability and speed with new deep learning algorithms built in. Using SQL data source object, we can run a parallel query in SQL Server that sends data to many R / Python processes in parallel to compute say linmod/logit/tree model. This can also be used for scoring scenarios with streaming capability.
References- Microsoft

No comments:

Post a Comment