Thursday, November 23, 2017

External Scripts Enabled server configuration option

This configuration setting is applied for SQL Server 2016 R Services (In-Database) and SQL Server 2017 Machine Learning Services (In-Database).

Use the external scripts enabled option to enable the execution of scripts with certain external language extensions (in this case, the R or Python language). This property is OFF by default. When Advanced Analytics Services is installed, setup can optionally set this property to true.

To enable normal functioning of this external stored procedure, we must have administrator access to the SQL Server instance in order to run the sp_configure command and set the following configuration:
EXECUTE sp_configure;
GO
-- To enable execution of external scripts, add the arguments:
EXECUTE sp_configure 'external scripts enabled', 1;
GO
-- And after that, run the reconfiguration:
RECONFIGURE;
GO
-- After configuration, the values should look like this:
name                       minimum   maximum   config_value   run_value
external scripts enabled   0         1         1              1

Note: We must restart SQL Server to make this change effective. It is also a good idea to restart the SQL Server Launchpad (SQLSERVER) service as well.
sp_execute_external_script is a system stored procedure, available from SQL Server 2016 onward, that is responsible for executing a script provided as an argument in the specified external language.
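A minimal sanity check (assuming Machine Learning Services with Python is installed and the option above is enabled) simply passes an input data set through the external runtime and back:

EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS sanity_check'
WITH RESULT SETS ((sanity_check INT));
GO

If everything is configured correctly, this returns a single row with the value 1.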

Conclusion

To set the external scripts enabled server configuration option, we must have administrator privileges on the SQL Server instance. After configuring it, we should restart the SQL Server service and check whether the SQL Server Launchpad service is still running.

SQL Server 2017 - Advanced Analytics with Python

SQL Server 2017 brings a new capability to deploy Python-based machine learning applications in an enterprise production environment that is already running SQL Server. This capability allows us to bring any open source Python package into SQL Server and execute it within T-SQL-based stored procedures or code.
Architecture - Python in SQL Server
Microsoft introduced sp_execute_external_script, a system stored procedure to execute external scripts. We can submit an R or Python script to this stored procedure in a secure manner. SQL Server runs the script with the help of an external process called Launchpad, which launches the external runtime; this concept is known as the runtime launcher. While executing, the runtime connects back to SQL Server over the SQL Server satellite data channel, working as an additional external process.
Python integration in SQL Server gives us the following key advantages:
Eliminate data movement: with in-database machine learning, we are free to run Python-based compute on large datasets where they live; there is no need to move the data to another place.
Deploy machine learning models and Python scripts right inside T-SQL stored procedures. Any app that can talk to SQL Server can invoke those stored procedures without being aware that a Python package or model is running in the background.
It also brings performance and scale advantages. Database analytics integration in SQL Server 2016 introduced a new communication channel to exchange data between SQL Server and the scripts running inside it. This columnar, compressed channel is much faster than ODBC. Python uses the same integration, and SQL Server's in-memory columnstore indexes and in-memory tables are highly recommended for large-scale data analytics.
In addition, SQL Server 2017 also brings something called revoscalepy, the equivalent of RevoScaleR, whose algorithms are not bound by memory or scale; if we are doing linear regression on a billion-row dataset, it is really doable without hitting memory limits, which means enterprise-grade scale.
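As a rough sketch of what this looks like (the dbo.Sales table and its columns are hypothetical, and revoscalepy must be installed with Machine Learning Services), a linear model can be fitted in-database like this:

-- Hypothetical example: fit a linear model with revoscalepy inside SQL Server
EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'
from revoscalepy import rx_lin_mod
# InputDataSet is the data frame produced by @input_data_1
model = rx_lin_mod("amount ~ quantity", data = InputDataSet)
print(model)
',
    @input_data_1 = N'SELECT amount, quantity FROM dbo.Sales';
GO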

Key scenarios enabled by Python integration in SQL Server
Data scientists can do data exploration and model development on their local machines as well as on the SQL Server machine. This is the main advantage of the revoscalepy package, which makes local or remote compute contexts transparent; there is no need to push data around, because computed results come back to the local workstation from the remote SQL Server.
Application developers can operationalize Python code because it runs inside stored procedures, and they do not need to care which version of Python is running behind the T-SQL code. We can say the intelligence truly lives with the data, and application developers do not have to worry about it.
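As a sketch of this operationalization pattern (the table, columns, and scoring logic here are hypothetical), the Python code is wrapped in an ordinary stored procedure that any SQL client can call:

CREATE OR ALTER PROCEDURE dbo.PredictRevenue
AS
BEGIN
    EXECUTE sp_execute_external_script
        @language = N'Python',
        @script = N'
# Hypothetical scoring logic; a real model would be loaded from a table
InputDataSet["predicted"] = InputDataSet["quantity"] * 9.99
OutputDataSet = InputDataSet
',
        @input_data_1 = N'SELECT quantity FROM dbo.Orders'
    WITH RESULT SETS ((quantity INT, predicted FLOAT));
END;
GO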

An additional key benefit is for database administrators: an admin can specify, for example, that a given kind of login cannot use more than 5-6% CPU, so the Python code those logins run is capped accordingly. A DBA can also use database logs and views to verify who is executing Python code, as a regular part of database management.
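This kind of CPU cap is applied through Resource Governor external resource pools; a minimal sketch (the pool name and percentage here are assumptions):

-- Cap external (R/Python) script execution at 20% CPU
CREATE EXTERNAL RESOURCE POOL ml_pool
WITH (MAX_CPU_PERCENT = 20, AFFINITY CPU = AUTO);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO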
Conclusion
Native integration of Python in SQL Server enables in-database machine learning, and the high-performance, distributed statistical and machine learning functions from the RevoScaleR, revoscalepy, and MicrosoftML packages in Microsoft R are available as Python functions for use within SQL Server.
Reference - Microsoft blogs

Wednesday, November 22, 2017

Azure Cosmos DB

Microsoft aims to help us better manage our data with Azure Cosmos DB, Microsoft's globally distributed, multi-model database service, which transparently scales and replicates our data wherever our users are. We can quickly create and query document, key/value, and graph databases, all of which benefit from the global distribution and horizontal scale capabilities at the core of Azure Cosmos DB.
Azure Cosmos DB provides native support for NoSQL choices, offers multiple well-defined consistency models, and guarantees single-digit-millisecond latencies at the 99th percentile, and guarantees high availability with multi-homing capabilities and low latencies anywhere in the world.
Multiple APIs and data models - graphs, documents, columnar, and key-value data are all natively supported because of the ARS (atoms, records, and sequences) design that Cosmos DB is built on. The Microsoft team has exposed APIs for a subset of models so far (DocumentDB, MongoDB, Table, and Graph APIs) and is working to expose other APIs in a similar way:
  1. SQL API - Azure Cosmos DB natively supports the document data model with the familiar SQL API.
  2. MongoDB API - a MongoDB database service built on top of Cosmos DB, compatible with existing MongoDB libraries, drivers, tools, and applications.
  3. Azure Table API - Azure Cosmos DB offers native support for key-value data with the Azure Table API.
  4. Graph API - Azure Cosmos DB offers native support for graphs with the Apache Gremlin API.
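As a sketch, a SQL API query over JSON documents looks like this (the container alias c and the property names are hypothetical):

SELECT c.id, c.city
FROM c
WHERE c.city = "Seattle" AND c.population > 500000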

Cosmos DB has a schema agnostic indexing engine capable of automatically indexing all the data it ingests without requiring any schema or secondary indexes from the developer. The engine relies on a set of logical index layouts (inverted, columnar, tree) which decouple the storage layout from the index and query processing subsystems.
Key capabilities - As a globally distributed database service, Azure Cosmos DB provides the following capabilities to help you build scalable, highly responsive applications:
Turnkey global distribution - Established, globally distributed database system driven by the cloud-computing power of Azure. Data can be automatically replicated in Azure data centers around the world.
Multi-model and multi-API - Customers can use key-value, graph, and document data without worrying about schema or index management. Data is automatically indexed, which allows customers to access their data with the API of their choice, including SQL, JavaScript, Gremlin, MongoDB, and Azure Table Storage.
Limitless elastic scale around the globe - Customers can independently and elastically scale storage and throughput at any time, anywhere across the globe, making Azure Cosmos DB a perfect ally for serverless applications.
Multiple, well-defined consistency choices - Azure Cosmos DB offers five well-defined consistency levels (strong, bounded staleness, consistent prefix, session, and eventual) for an intuitive programming model with low latency and high availability for a planet-scale app.
Guaranteed low latency at the 99th percentile - Serve read and write requests from the nearest region while simultaneously distributing data across the globe. With its latch-free and write-optimized database engine, Azure Cosmos DB guarantees less than 10-ms latencies on reads and less than 15-ms latencies on (indexed) writes at the 99th percentile.
Industry-leading, enterprise-grade SLAs - Offered service level agreements (SLAs) include 99.99% high availability, with latency guarantees at the 99th percentile.
The DocumentDB API supports cross-document transactions expressed as JavaScript stored procedures and triggers. Transactions are scoped to a single partition within each collection and executed with ACID semantics, all or nothing, isolated from other concurrently executing code and user requests. If an exception is thrown during the server-side execution of JavaScript application code, the entire transaction is rolled back.
Conclusion
Azure Cosmos DB is a database for extremely low latency and massively scalable applications anywhere in the world, with native support for NoSQL. With it, enterprises of any size can establish a globally distributed database system that takes advantage of already existing data centers and infrastructure in a matter of minutes. Combined with the administrative tools provided by Azure and Microsoft's intelligent cloud, enterprises can establish a powerful global presence quickly and easily without a major initial expenditure.
References - Microsoft

Monday, November 20, 2017

Python in SQL Server 2017

The MS development team has added an Advanced Analytics Extension, also called Machine Learning Services, in SQL Server 2017, enabling SQL Server to execute Python scripts within T-SQL via 'Machine Learning Services with Python'. SQL Server 2017 allows us to process data in the database by using any Python function or package without needing to export the data from the database. We can use SQL Server itself as an operationalization platform for production applications using Python code.
The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. Accordingly, the Microsoft development team renamed R Services to Machine Learning Services, with R and Python as the two main options under this feature.
Now, Microsoft gives us a highly recommended option to use Python within Machine Learning Services, showing how a database can trigger an external process to perform an activity on the data that is provided as a parameter.
Advantages of Python integration in SQL Server
After the integration of Python in SQL Server, we get the following advantages:
  • Enterprise-grade performance and scale: we can use SQL Server's advanced capabilities, like in-memory tables and columnstore indexes, with the high-performance, scalable APIs in the revoscalepy package.
  • revoscalepy is modeled after the RevoScaleR package in SQL Server R Services. Using these with the latest innovations in the open source Python world allows us to bring unparalleled selection, performance, and scale to our SQL Python applications.
  • Rich extensibility: we can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on our local machine.
  • Elimination of data movement: this is the biggest advantage of Python integration; we no longer need to move data from the database to our Python application or model, because we can build Python applications within the database.
  • This removes hurdles of security, compliance, governance, integrity, and a host of similar issues related to moving vast amounts of data around.
  • This new capability brings Python to the data and runs code inside the secure SQL Server environment, using the proven extensibility mechanism built into SQL Server 2016.
  • Easy deployment: once we have a Python model ready, deploying it to production is as easy as embedding it in a T-SQL script; then any SQL client application can take advantage of Python-based models and intelligence through a simple stored procedure call.
  • Wide availability at no additional cost: Python integration is available in all editions of SQL Server 2017, including the Express edition.

R and Python already support loading data into a data frame from SQL Server. This integration is about moving the R/Python compute to the SQL Server machine to eliminate data movement across machines. If we move millions or billions of rows to the client for modeling or scoring, the network overhead will dominate end-to-end execution time.
Moreover, the R/Python integration in SQL Server works with SQL Server's parallel query processing, security, and resource governance.
The R/Python processes run outside the SQL Server address space and share the machine's resources, because data security is the biggest concern that prevents running R/Python within the SQL Server process or memory space.
By default, many of the data structures in R/Python are memory-resident objects, so the usual memory limitations apply. However, Microsoft ships many algorithms as part of the R Server packages (RevoScaleR and revoscalepy) with a SQL Server data source object that can work with data that doesn't fit in memory and supports parallel execution.
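sp_execute_external_script also accepts a @parallel hint for scripts that can process partitions of the input independently; a minimal sketch (the table name here is hypothetical):

-- Ask SQL Server to stream the input query through the external runtime in parallel
EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT amount FROM dbo.BigFactTable',
    @parallel = 1;
GO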

Conclusion
SQL Server 2017 takes in-database analytics to the next level with support for both Python and R, delivering unparalleled scalability and speed with new deep learning algorithms built in. Using the SQL Server data source object, we can run a parallel query in SQL Server that sends data to many R/Python processes in parallel to compute, say, a linear, logit, or tree model. This can also be used for scoring scenarios with streaming capability.
References- Microsoft

Monday, November 13, 2017

Azure SQL Database vs SQL Server on Azure VMs

Before differentiating between Azure SQL Database and SQL Server on Azure VMs, we have to understand how each option fits into the Microsoft data platform and match the right option to our business requirements. Microsoft's data platform leverages SQL Server technology and makes it available across physical on-premises machines, private cloud environments, third-party hosted private cloud environments, and the public cloud.
Azure SQL Database is the right solution for cloud-designed applications when developer productivity and fast time-to-market are critical whereas SQL Server running on Azure VMs is perfect if our existing or new applications require large databases, interrelated databases, or access to all features in SQL Server or Windows. 
What is Azure SQL Database?
Azure SQL Database is a cloud-based relational database service built on SQL Server technologies, known as a platform-as-a-service (PaaS) database or a database-as-a-service (DBaaS), that is optimized for software-as-a-service (SaaS) app development. It offers compatibility with most SQL Server features and provides a highly available, scalable, multi-tenant database service hosted by Microsoft in the cloud. Azure SQL Database helps ease provisioning and deployment of multiple databases. When using SQL Database, we pay as we go, with options to scale up or out for greater power with no interruption.
What is SQL Server on Azure VMs?
On the other hand, SQL Server on Azure VMs falls into the industry category of infrastructure-as-a-service (IaaS). It allows us to run SQL Server inside a virtual machine in the cloud. Just like SQL Database, SQL Server on Azure VMs is built on standardized hardware that is owned, hosted, and maintained by Microsoft, where we can either pay as we go for a SQL Server license already included in a SQL Server image or easily use an existing license. We can also easily scale up/down and pause/resume the VM as needed.
Main features of Azure SQL Database and SQL Server on Azure VMs
In most cases, software development costs are lower with Azure SQL Database than with SQL Server on an Azure VM. However, these are not the only costs to factor into the total cost of developing and deploying an application. The purpose of the comparison below is to list the considerations involved so that you can make a better decision. Ultimately, the choice of one solution over another is specific to your needs and wishes.
Best for:
- Azure SQL Database: New cloud-designed applications that have time constraints in development and marketing. Teams that need built-in high availability, disaster recovery, and upgrade for the database. Teams that do not want to manage the underlying operating system and configuration settings. Databases of up to 4 TB, or larger databases that can be horizontally or vertically partitioned using a scale-out pattern.
- SQL Server in an Azure Virtual Machine: Existing applications that require fast migration to the cloud with minimal changes. Rapid development and test scenarios when we do not want to buy on-premises non-production SQL Server hardware. Teams that can configure and manage high availability, disaster recovery, and patching for SQL Server; some provided automated features dramatically simplify this. When we need a customized environment with full administrative rights. SQL Server instances with up to 64 TB of storage; the instance can support as many databases as needed.

Resources:
- Azure SQL Database: We do not want to employ IT resources for configuration and management of the underlying infrastructure, but want to focus on the application layer.
- SQL Server in an Azure Virtual Machine: We have some IT resources for configuration and management; some provided automated features dramatically simplify this.

Total cost of ownership:
- Azure SQL Database: Eliminates hardware costs and reduces administrative costs.
- SQL Server in an Azure Virtual Machine: Eliminates hardware costs.

Business continuity:
- Azure SQL Database: In addition to built-in fault tolerance infrastructure capabilities, Azure SQL Database provides features such as automated backups, point-in-time restore, geo-restore, and active geo-replication to increase business continuity.
- SQL Server in an Azure Virtual Machine: Lets us set up a high availability and disaster recovery solution for our database's specific needs, so we can have a system that is highly optimized for our application. We can test and run failovers ourselves when needed.

Hybrid cloud:
- Azure SQL Database: Our on-premises application can access data in Azure SQL Database.
- SQL Server in an Azure Virtual Machine: We can have applications that run partly in the cloud and partly on-premises. In addition, we can store on-premises data files in Azure Storage using SQL Server Data Files in Azure.

Supports:
- Azure SQL Database: Supports SQL Server transactional replication as a subscriber to replicate data.
- SQL Server in an Azure Virtual Machine: Fully supports SQL Server transactional replication, AlwaysOn Availability Groups, Integration Services, and log shipping to replicate data. Traditional SQL Server backups are also fully supported.
Conclusion
Azure SQL Database is the right solution for cloud-designed applications when developer productivity and fast time-to-market are critical. With programmatic DBA-like functionality, it is perfect for cloud architects and developers as it lowers the need for managing the underlying operating system and database, whereas SQL Server running on Azure VMs is perfect if our existing or new applications require large databases, interrelated databases, or access to all features in SQL Server or Windows.
References - https://docs.microsoft.com

Sunday, November 12, 2017

Azure SQL Database

Azure SQL Database is a fully-managed relational cloud database service using the Microsoft SQL Server Engine which supports structures such as relational data, JSON, spatial, and XML.
Azure SQL Database is a managed database service that takes care of scalability, backup, and high availability of the database. It differs from AWS RDS, which is a container service.
We know that SQL Database is a high-performance, reliable, and secure database which can be used to build data-driven applications and websites in the programming language of our choice, without needing to manage infrastructure. 
SQL Database delivers predictable performance at multiple service levels and provides the following with near-zero administration:
1. Dynamic scalability with no downtime
2. Built-in intelligent optimization
3. Global scalability and availability
4. Advanced security options
Azure SQL Database is currently available in 38 data centers around the world, with more data centers coming online regularly, which enables us to run our database in a data center near us.
Scalable performance and pools
With Azure SQL Database, each database is isolated and portable, each with its own service tier and a guaranteed performance level. SQL Database provides different performance levels for different needs and enables databases to be pooled to maximize the use of resources and save money. Basically, it offers four service tiers to support lightweight to heavyweight database workloads: Basic, Standard, Premium, and Premium RS. In this way, we have the option to adjust performance without downtime for our application or our customers. Dynamic scalability enables our database to transparently respond to rapidly changing resource requirements, and enables us to pay only for the resources we need, when we need them.
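As a sketch, changing the service tier is a single T-SQL statement (the database name and target tier here are assumptions):

-- Scale the database to the Standard tier, S2 performance level, online
ALTER DATABASE MyAppDb
MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');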
Built-in intelligence 
Azure SQL Database has built-in intelligence to protect and optimize our databases, which helps us dramatically reduce the costs of running and managing databases while maximizing both the performance and security of our applications. Running countless customer workloads around the clock, SQL Database collects and processes a massive amount of telemetry data behind the scenes, while fully respecting customer privacy. Various algorithms continuously evaluate the telemetry data so that the service can learn and adapt to our application.


Availability capabilities
Azure's industry leading 99.99% availability service level agreement (SLA), powered by a global network of Microsoft-managed datacenters, helps keep our app running 24/7. In addition, SQL Database provides built-in business continuity and global scalability features, including Automatic backups, Point-in-time restores, Active geo-replication and Failover groups.
Advanced security and compliance
Azure SQL Database provides a range of built-in security and compliance features to help our application meet various security and compliance requirements, such as:
1. Data encryption at rest - all newly created Azure SQL databases are automatically protected with transparent data encryption, SQL Server's proven encryption-at-rest technology, which is required by many compliance standards to protect against theft of storage media.
2. Auditing for compliance and security - Azure SQL Database Auditing tracks database events and writes them to an audit log in our Azure storage account. In this way, auditing can help us maintain regulatory compliance, understand database activity, and gain insight into discrepancies and anomalies that could indicate business concerns or suspected security violations.
3. Dynamic data masking - limits sensitive data exposure by masking it for non-privileged users, and helps prevent unauthorized access by letting customers designate how much of the sensitive data to reveal, with minimal impact on the application layer. It is a policy-based security feature that hides sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.
4. Row-level security - enables customers to control access to rows in a database table based on the characteristics of the user executing a query (such as group membership or execution context). Row-level security (RLS) simplifies the design and coding of security in an application.
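As a sketch, dynamic data masking and row-level security are both configured in T-SQL (the table, columns, and predicate function here are hypothetical):

-- Mask an email column for non-privileged users
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
GO
-- Row-level security: users see only their own rows
CREATE FUNCTION dbo.fn_rls_predicate(@OwnerName AS sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS allowed WHERE @OwnerName = USER_NAME();
GO
CREATE SECURITY POLICY dbo.CustomerFilter
ADD FILTER PREDICATE dbo.fn_rls_predicate(OwnerName) ON dbo.Customers;
GO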

Questions & Answers
How do we secure data and prevent employees from sharing it with other companies?
In general, the best way is to protect the data either through encryption (e.g., Always Encrypted) and limit access to the key, or to use data masking/row-level security to limit the data employees can access in a database and to obfuscate the data values they can query and access.

How do we properly restrict access to Azure SQL Database to a specific Azure Web App?

There are two scenarios here: 1. If you use Azure SQL Database Managed Instance, you can use VNETs/subnets to restrict access to your SQL databases to only those web apps that are in the same VNET or in a subnet of the same VNET. 2. If you use singletons/pools, where we don't have VNET/subnet support yet, you can use AAD with conditional access to limit the users/principals who can access databases (including the devices from which they connect).

Conclusion
Azure SQL Database includes built-in intelligence that learns app patterns and adapts to maximize performance, reliability, and data protection. It enables us to centrally manage the identities of database users and other Microsoft services with Azure Active Directory integration. Azure SQL Database participates in regular audits and has been certified against several compliance standards. It supports building applications with Python, Java, Node.js, PHP, Ruby, and .NET on macOS, Linux, and Windows.

References - https://docs.microsoft.com/en-in/azure/sql-database/sql-database-technical-overview

Monday, November 6, 2017

What is Engineering data

Engineering data is the foundation for all of the recent, current, and future data hypes: machine learning, deep learning, big data, data science, etc. The success and adoption of these hypes is predicated on data being structured properly and available. However, when customers (internal and external) are not clear on what their expectations are and/or the big picture of what they are trying to use the data for, data engineers are often blamed. 

Machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. These analytical models allow researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results.

Communication is key! A couple of points: "big data" has always been there, just ask statisticians. As for the data science hype, everyone calls themselves a data scientist now; somehow knowing SQL makes a person a data scientist. Data science students graduate on a daily basis, and many lack basic analytical skills, believing data science and data analysis are all about having basic end-user-level knowledge of some fancy new software. It is a hype, for a fact.
We can have the best people using the data for analytics or modeling, but if we don't have people who know how to build the systems that make our data available in a consistent, reliable manner, then we will just be part of the hype. Having multiple data science teams leads to friction between them. While one, mainly concerned about product delivery, has an open-door policy across the enterprise, the other, concerned about we don't know what, treats everyone else with a sense of superiority and thinks its work is somehow top secret.

The reason for the crashing is most likely that the software you are using is not server software or a service. For example, if you use Excel, everything is processed in memory (as MS products tend to do), whereas a database would have paged to disk when the assigned memory got full (a shortcut explanation).

My understanding of data scientists is that they try to hard-code/program different scenarios to get an answer. If you know something about philosophy, you can start there: you say "what if condition 1 = 1", then add different conditions with different variables. It's not actually that complicated, from my understanding.

The data science hype now is focused on AI. After the dust settles, people will realize that not all big data and data science is what's going on at Google or Facebook. Corporations need a different kind of big data and data science: they need to make sense of their own data and solve their own problems using whatever techniques work. I believe optimization and statistical models are more important than AI for most corporates. They can buy the hard stuff, like speech and text analytics, from Google or Amazon or anyone of that scale and capacity, yet they still have to retain a team to solve their specific problems using science.

The industry does not want to reveal the pressing questions it wants answered, obviously due to competition and conflicts of interest. There are some good concepts behind data science, and modeling (both mathematical and statistical) lies at the heart of obtaining directed insights. But yes, the academia-industry partnership is very much lacking, which has the potential to make data science a hope or a hype.