Thursday, November 23, 2017

SQL Server 2017 - Advanced Analytics with Python

SQL Server 2017 is bringing the new capability to deploy Python based machine learning application in enterprise production environment which is already running in SQL Server. This additional capability allows us to bring any open source Python packages in SQL server to execute within T-SQL based stored procedures or codes.
Architecture - Python in SQL Server
Microsoft introduced sp_execute_external_script which is a system stored procedure to execute external scripts. We can submit R or Python script to this stored procedure in the secure manner. SQL server runs this query with the help of external process called Launchpad which launches this query and this concept is known as runtime launcher. After executing, it’s connected back to SQL Server on the SQL Server satellite data channel. It’s working as additional post process.
Python integration in SQL Server gives us the following key advantages-
Eliminate data movement with in-database machine learning means you are free to do machine learning on the Python based compute on the large datasets and there is no need to move your data into another place.
Deploy your machine learning model and Python scripts right in T-SQL stored procedure with will talk more about in the breath. In this case, any app that can talk to SQL server and invoke with T-SQL stored procedures without being aware or knowing that Python package or Python model is running in the background.
It is allowing the performance and scale advantages also. Database analytics integration in SQL Server 2016 introduced new communication channel to exchange the data from SQL server to scripts which are running inside SQL Server which is like columnar based compression channel, it’s much faster than the ODBC. So, Python also uses the same integration because SQL Server also has the in-memory column store indexes and in-memory tables which are very highly recommended for large scale data analytics.
In addition of this integration, SQL Server 2017 is also bring the something called revoscalepy which is equalent to revoscaleR which are not bonded by the memory or scale like that if we are doing the liner regression on the billion rows of dataset then it’s really doable without any memory limiting which means enterprise level scale grade.

Key scenarios which are enables by Python integration in SQL Server
Data scientists are able to do data exploration and data modeling development on their local machines as well as SQL Servers machine. This is the main advantage of the revoscale Python package which makes the local or remote compute more transparent and there is no need to pushing data round because computed results come back to the local work-stations from the remote SQL Server.
Application developers are capable to operationalize the Python code because it’s running inside the stored procedures and they do not worry to care about that which version of Python is running inside T-SQL code. We can say that intelligence is truly with the data and application developers do not worry all about it.

In additional key benefit is for database administrator where admin can choose to say that this kind of logins cannot use 5-6% CPU. So, Python code cannot use 5-6% CPU. A dba can also see the database logs and views to verify that who is executing the Python code as the regular part of the database management.
Conclusion
Native integration of Python in SQL server for in-database machine learning and the high-performance and distributed statistical and machine learning functions from the RevoScaleR, RevoScalePy and MicrosoftML packages in Microsoft R will be available as Python functions for use within SQL Server.
Reference - Microsoft blogs

No comments:

Post a Comment

Popular Posts