Monday, July 11, 2016

DW - Microsoft Modern Data Warehouse in SQL Server 2016

In the current era of cloud and data virtualisation, data arrives from every direction, which has put tremendous pressure on the traditional data warehouse. We therefore need to understand how a data warehouse can handle the exponentially growing volume of data, the variety of semi-structured and unstructured data types, and the velocity of real-time data processing. The Microsoft modern data warehouse solution integrates the traditional data warehouse with unstructured big data and is capable of handling data of all sizes and types with real-time performance.

Traditional Data Warehouse
The traditional data warehouse acts as a central repository that stores data from transactional systems, ERP, CRM, and LOB applications after it has been cleansed by an ETL process. Nowadays, it is under pressure from explosive data volumes, an expanding variety of data types, and the real-time velocity at which data is used to grow and operate the business.
Traditional data warehouses are based on Symmetric Multi-Processing (SMP) technology, in which adding capacity involves procuring larger, more powerful hardware and then forklifting the prior data warehouse onto it. Whenever these warehouses approached capacity, the architecture experienced performance issues at a scale where there was no room to add incremental processor power or to keep the cache synchronised between processors.

Key trends breaking the traditional data warehouse
Four key trends in the business environment are putting tremendous pressure on the traditional data warehouse:
  1. Increasing Data Volumes
  2. Real-Time Data
  3. New Data Sources, Data Types
  4. Cloud-Born Data
Increasing Data Volumes
Data volumes are exploding. More data has been created in the past five years than in the entire previous history of the human race, and the rate is growing faster than ever: by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. By 2020, at least a third of all data will pass through the cloud (a network of servers connected over the Internet). As sensors become connected to the internet, the data they generate becomes increasingly important to every aspect of business, transforming old industries into new, relevant entities.
For the modern business, the prospect of bigger, more powerful hardware and ever-larger forklift migrations is not a viable return-on-investment scenario. Enterprises are looking for a way to accommodate volume growth that does not break the budget.
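To put the 1.7 megabytes per second figure in perspective, the short calculation below converts it into a daily volume per person. It is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1,000 MB), which the original figure does not specify, and the function name `daily_gb_per_person` is made up for illustration.

```python
# Back-of-the-envelope conversion of the quoted rate into a daily volume.
MB_PER_SECOND_PER_PERSON = 1.7    # figure quoted in the text above
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds
MB_PER_GB = 1000                  # assumption: decimal units, not 1,024


def daily_gb_per_person(rate_mb_per_second: float = MB_PER_SECOND_PER_PERSON) -> float:
    """Return the gigabytes generated per person per day at the given rate."""
    return rate_mb_per_second * SECONDS_PER_DAY / MB_PER_GB


if __name__ == "__main__":
    print(f"{daily_gb_per_person():.2f} GB per person per day")  # 146.88
```

At roughly 147 GB per person per day, it is easy to see why repeatedly buying a bigger single server does not keep pace.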

Real-Time Data (RTD)
The traditional data warehouse was designed to store and analyse historical information, on the assumption that data would be captured now and analysed later.
Real-time data (RTD) is information that is delivered immediately after collection. Real-time data is often used for navigation or tracking. The idea of real-time data handling is now popular in new technologies such as those that deliver up-to-the-minute information in convenience apps to mobile devices such as phones, laptops and tablets. Real-time data is enormously valuable in things like traffic GPS systems that show drivers what is going on around them. It is helpful for all sorts of analytics projects and for keeping people informed about their natural environment through the power of instant data delivery.
It is important to note that real-time data does not mean that the data gets to the end user instantly. There may be any number of bottlenecks related to the data collection infrastructure, the bandwidth between various parties, or even just the slowness of the end user's computer.
Companies are using real-time data to change, build, or optimise their businesses as well as to sell, transact, and engage in dynamic, event-driven processes like market trading.

New Data Sources and Data Types
Databases are the most traditional kind of data source in BI, and the traditional data warehouse was based on the strategy of a well-structured, sanitised, and trusted repository. Nowadays, more than 80% of data volume comes from a variety of new data types proliferating from mobile and social channels, scanners, sensors, devices, feeds, and other sources outside the business. These data types do not easily fit the business schema model and may not be cost-effective to ETL into the relational data warehouse.

Cloud-Born Data
While cloud used to be an overused marketing term, the real value of leveraging commoditized pooled infrastructure to deliver compute and storage at incredible scale to businesses has surfaced, bringing cloud computing into the mainstream. The proliferation of cloud-based applications continues across all industries, and for all kinds of reasons: time, money, ease of deployment, functionality, and maintenance. An increasing share of the new data is “cloud-born,” such as click-streams, videos, social feeds, GPS, and market, weather, and traffic information. In addition, the prominent trend of moving core business applications like messaging, CRM, and ERP to cloud-based platforms is also increasing the amount of cloud-born relational business data. Simply stated, cloud-born data is changing business and IT strategies about where data should be accessed, analysed, used, and stored.

Evolve to a Modern Data Warehouse
The modern data warehouse lives up to the promise of business intelligence from all data. It serves a business whose data is growing explosively, whose data types and sources keep changing, and whose processing is moving to real time, with a more robust ability to deliver the right data at the right time.



The modern data warehouse combines the following layers:
  1. Infrastructure
  2. Data Management & Processing
  3. Data Enrichment & Federated Query
  4. BI & Analytics
Benefits of Modern Data Warehouse: The modern data warehouse offers the following benefits:
  1. It provides a trusted infrastructure that gives users confidence in the credibility and consistency of the data
  2. It incorporates a wider variety of data sources and data types, which include mobile, social, scanners, photos, videos, sensors, devices, RFID, web logs, advanced analytics, click streams, machine learning, and third-party data sources
  3. PolyBase technology makes it possible to query both traditional relational data and these new data types with common T-SQL commands, and queries that took hours can be reduced to minutes or seconds through in-memory technologies
  4. It scales from tens of terabytes up to multiple petabytes by incrementally adding nodes to the existing infrastructure
  5. It enables users to get results from their queries in near real-time with streaming technologies.
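
The scale-out idea in benefit 4 can be illustrated with a small sketch. It does not show how any Microsoft product is implemented. It only assumes simple hash distribution of rows across nodes, and shows how each node's share of the data shrinks as nodes are added. The function names `node_for_key` and `rows_per_node` and the sample customer keys are hypothetical.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, node_count: int) -> int:
    """Map a distribution key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(keys, node_count: int) -> list:
    """Count how many rows each node would hold for the given keys."""
    counts = Counter(node_for_key(key, node_count) for key in keys)
    return [counts.get(node, 0) for node in range(node_count)]


if __name__ == "__main__":
    # Hypothetical distribution keys; a real table would use a column value.
    keys = [f"customer-{i}" for i in range(100_000)]
    for nodes in (2, 4, 8):
        counts = rows_per_node(keys, nodes)
        print(f"{nodes} nodes: busiest node holds {max(counts)} rows")
```

Each time the node count doubles, the busiest node holds roughly half as many rows, which is the property that lets a scale-out warehouse grow by adding nodes instead of replacing one large server. The sketch ignores the cost of moving existing rows when the node count changes, which a real system has to manage.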
References: Microsoft
