Wednesday, November 16, 2016

Azure Data Lake Store

The data lake is essential for any organization who wants to take full advantage of its data. The data lake arose because new types of data needed to be captured and exploited by the enterprise.


Data Lake is a storage repository for a vast amount of raw data in its native/natural/in-built format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed because data is stored in as-is. We can say that Data Lake is a more organic store of data without regard for the perceived value or structure of the data.
Azure Data Lake is the technology for hyper scale data repository for any data for big data analytics workloads. This technology is based on Bottoms-Up approach for any data. Any data means the underlying storage system is not imposing the limitation and we can store un-structured data, semi structure data and fully structured data in Azure Data lake store. It also enables us to capture data of any size, type and ingestion speed in one single place for operational and exploratory analytics. 
Azure Data Lake comprises three cloud-based services such as HDInsight, Data Lake Analytics, and Data Lake Store that make it easy to handle store an analyze any kind of data in Azure.


Azure Data Lake Store is an Apache Hadoop Distributed File System for the cloud which is compatible with Hadoop Distributed File System (HDFS) and works with the Apache Hadoop ecosystem. The biggest advantage of Azure Data Lake is high durability, availability and reliability and there are no fixed limits on file size as well as any fixed limits on account size. It is fully capable for unstructured and structured data in their native format and massive throughput to increase analytic performance.


The data lake serves as an alternative to multiple information silos typical of enterprise environments and does not care where the data came from or how it was used. It is indifferent to data quality or integrity. It is concerned only with providing a common repository from which to perform in-depth analytics. Only then is any sort of structure imposed upon the data.

Azure Data Lake Store is secured, massively scalable, and built to the open HDFS standard, allowing us to run massively-parallel analytics.

Petabyte size files and Trillions of objects

With the help of Azure Data Lake Store, we are able to analyze all kind of the data (unstructured, semi-structured, and structured data) in a single place where no need of artificial constraints. Interesting and amazing thing is that Data Lake Store supports to store trillions of files where any single file can be greater than a petabyte in size which is 200 times larger than other cloud stores. This specification makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.



Performance-tuned for big data analytics

Another big advantage of Azure Data Lake Store is that it is built for running large scale analytic systems that require massive throughput to query and analyze large amounts of data. The data lake spreads parts of a file over a number of individual storage servers. This improves the read throughput when reading the file in parallel for performing data analytics. Automatically optimise for any throughput and parallel computation over PBs of data.


Always encrypted, Role-based security & Auditing
In term of security, Data Lake Store protects our data assets and extends our on-premises security and governance controls to the cloud easily. Azure Data Lake Store containers for data are essentially folders and files. Data is always encrypted; in motion using SSL, and at rest using service or user managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities is built-in through Azure Active Directory. We can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store enabling role-based access controls. Finally, we can meet security and regulatory compliance needs by auditing every access or configuration change to the system.


Please visit to learn more on -
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse
  4. Data Warehouse - Multidimensional Cube
  5. Data Warehouse - Multidimensional Cube Types
  6. Data Warehouse - Architecture and Multidimensional Model
  7. Data Warehouse - Dimension tables.
  8. Data Warehouse - Fact tables.
  9. Data Warehouse - Conceptual Modeling.
  10. Data Warehouse - Star schema.
  11. Data Warehouse - Snowflake schema.
  12. Data Warehouse - Fact constellations
  13. Data Warehouse - OLAP Servers.
Conclusion
Data Lake Store is a hyper-scale repository for big data analytics workloads. It supports unstructured, semi-structured, and structured data with the ability to run massively parallel analytics. It is secure, massively scalable and built to the open HDFS standard. Data Lake Store does not require a schema to be defined before the data is loaded, leaving it up to the individual analytic framework to interpret the data and define a schema at the time of the analysis. Data Lake Store does not perform any special handling of data based on the type of data it stores.

Reference: https://azure.microsoft.com/en-in/services/data-lake-store/

No comments:

Post a Comment