In SQL Server 2017, Microsoft extended the Advanced Analytics Extensions feature, now called Machine Learning Services, so that SQL Server can execute Python scripts from within T-SQL through ‘Machine Learning Services with Python’. This lets us process data inside the database with any Python function or package, without first exporting the data. It also lets us use SQL Server itself as an operationalization platform for production applications written in Python.
The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. Accordingly, Microsoft renamed R Services to Machine Learning Services, with R and Python as the two language options under this feature.
Using Python within Machine Learning Services shows how a database can trigger an external process that performs an activity on data passed to it as a parameter.
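In SQL Server 2017 this pattern is exposed through the system stored procedure `sp_execute_external_script` with `@language = N'Python'`: the rows returned by the `@input_data_1` query are handed to the script in `@script` as a data frame named `InputDataSet`, and whatever the script assigns to `OutputDataSet` comes back as a result set. The sketch below only imitates that data contract inside a single Python process so the flow is easy to follow. `run_external_script` is a hypothetical helper, and plain lists of tuples stand in for the pandas DataFrames that SQL Server actually uses.

```python
def run_external_script(script: str, input_rows: list) -> list:
    """Hypothetical stand-in for the sp_execute_external_script data contract.

    The caller supplies the rows (as the @input_data_1 query would), the script
    sees them as InputDataSet, and whatever it assigns to OutputDataSet is
    returned to the caller (as a result set would be returned to T-SQL).
    """
    namespace = {"InputDataSet": input_rows}
    # The real feature runs the script in a separate Python process, not via exec().
    exec(script, namespace)
    # In this sketch, a script that never assigns OutputDataSet yields no rows.
    return namespace.get("OutputDataSet", [])


# The text that would be passed as @script.
doubling_script = """
OutputDataSet = [(row_id, value * 2) for row_id, value in InputDataSet]
"""

rows = [(1, 10.0), (2, 2.5), (3, -4.0)]  # stands in for SELECT id, value FROM ...
print(run_external_script(doubling_script, rows))
```

The point of the contract is that the script never opens its own database connection: SQL Server decides which rows it receives and what happens to the rows it returns.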
Advantages of Python integration in SQL Server
Integrating Python into SQL Server brings the following advantages:
- Enterprise-grade performance and scale: We can use SQL Server's advanced capabilities, such as in-memory tables and columnstore indexes, together with the high-performance, scalable APIs in the revoscalepy package.
- revoscalepy is modeled after the RevoScaleR package in SQL Server R Services. Combining these packages with the latest innovations in the open-source Python world gives our SQL Server Python applications a wide choice of algorithms along with high performance and scale.
- Rich extensibility: We can install and run any of the latest open-source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data. Installing a Python package in SQL Server is as simple as installing one on our local machine.
- Elimination of data movement: This is the biggest advantage. We no longer need to move data from the database to our Python application or model, because we can build Python applications within the database.
- This removes the security, compliance, governance, and integrity barriers, along with a host of similar issues, that come with moving vast amounts of data around.
- This new capability brings Python to the data and runs code inside the secure SQL Server environment by using the proven extensibility mechanism introduced in SQL Server 2016.
- Easy deployment: Once the Python model is ready, deploying it to production is as easy as embedding it in a T-SQL script. Any SQL client application can then take advantage of Python-based models and intelligence through a simple stored procedure call.
- Wide availability at no additional cost: Python integration is available in all editions of SQL Server 2017, including the Express edition.
R and Python could already load data from SQL Server into a data frame. This integration is about moving the R/Python compute to the SQL Server machine to eliminate data movement across machines. If we move millions or billions of rows to the client for modeling or scoring, the network overhead will dominate the end-to-end execution time.
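To make the network argument concrete, the sketch below estimates the raw transfer time for pulling a table to a client. The row count, row width, and link speed are illustrative assumptions, not measurements, and the estimate ignores protocol overhead, compression, and client-side deserialization.

```python
def transfer_seconds(rows: int, bytes_per_row: int, link_gbps: float) -> float:
    """Idealized time to move rows * bytes_per_row over a link of link_gbps gigabits/s."""
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_gbps * 1e9)


# Assumed workload: 1 billion rows of 100 bytes each over a 1 Gbit/s network.
seconds = transfer_seconds(1_000_000_000, 100, 1.0)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min) just to move the data")
```

Under these assumptions the copy alone takes about 800 seconds, and even a 10 Gbit/s link still needs about 80 seconds before any modeling or scoring starts. Running the script next to the data skips this step entirely.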
Moreover, the R/Python integration works with SQL Server's parallel query processing, security, and resource governance.
The R/Python processes run outside the SQL Server address space and share the machine's resources with it. Running R/Python inside the SQL Server process or memory space is not allowed, chiefly because of data-security concerns.
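The isolation described above can be illustrated with an ordinary operating-system process boundary: the parent (playing the role of the database) serializes the input rows, a separate Python interpreter computes on them, and only the result comes back. This is a conceptual sketch using `subprocess` and JSON. SQL Server's real mechanism (the SQL Server Launchpad service and its own data-exchange channel) works differently, and `score_out_of_process` and `CHILD_SCRIPT` are hypothetical names.

```python
import json
import subprocess
import sys

# Script run by the child interpreter: read rows from stdin, write a result to stdout.
CHILD_SCRIPT = """
import json, sys
rows = json.load(sys.stdin)
total = sum(value for _, value in rows)
json.dump({"row_count": len(rows), "total": total}, sys.stdout)
"""


def score_out_of_process(rows):
    """Send rows to a separate Python process and return its JSON result.

    A crash or memory blow-up in the child cannot corrupt the parent's memory;
    the parent only sees the exit status and whatever the child wrote to stdout.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(rows),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
        timeout=30,  # do not wait forever on a misbehaving script
    )
    return json.loads(completed.stdout)


print(score_out_of_process([[1, 10.0], [2, 2.5], [3, -4.0]]))
```

The trade-off is the cost of copying data across the process boundary, which is why that copy stays on the same machine instead of crossing the network.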
By default, many R/Python data structures are memory-resident objects, so the usual memory limits still apply. However, Microsoft ships many algorithms as part of R Server (the RevoScaleR and revoscalepy packages) that include a SQL Server data source object, which can work with data that doesn't fit in memory and supports parallel execution.
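The idea behind a chunked data source is that the algorithm keeps only a small running state in memory and sees the data one block at a time. A minimal sketch, assuming a hypothetical `read_in_chunks` generator that stands in for paging rows out of a table: it computes the mean and sample variance with Welford's online update, so memory use depends on the chunk size rather than the table size. This shows the general technique, not revoscalepy's internals.

```python
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for a data source that yields rows block by block."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, block
        yield chunk


def streaming_mean_variance(chunks: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Welford's online algorithm: only count, mean and M2 are kept in memory."""
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0  # sample variance
    return count, mean, variance


# range() is lazy, so at most one 10,000-value chunk is held in memory at a time.
n, mean, var = streaming_mean_variance(read_in_chunks(range(1, 1_000_001), chunk_size=10_000))
print(n, mean, var)
```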
Conclusion
SQL Server 2017 takes in-database analytics to the next level with support for both Python and R, delivering high scalability and speed with new deep learning algorithms built in. Using the SQL Server data source object, we can run a parallel query in SQL Server that sends data to many R/Python processes at once to compute, say, a linear (linmod), logistic (logit), or tree model. The same mechanism can also be used for scoring scenarios with streaming.