Monday, May 18, 2020

Technological Benefits of Data Lakes






Data is a business asset that every organisation audits and protects. It can take any form: structured, semi-structured or unstructured. To handle all of these kinds of data, a data lake serves as a centralized repository that stores data as-is (relational data from line-of-business applications, and non-relational data from mobile apps, IoT devices and social media).

The types of raw data that are stored in a data lake can include:

  • Audio, images and video
  • Communications (blogs, emails, social media, click-streams)
  • Operational data (inventory, sales, tickets, tourism)
  • Machine-generated data (log files, IoT sensor readings)
Most importantly, data lakes are designed to run large-scale analytics workloads cost-effectively. Within a data lake, the necessary data is made available to employees at every level, irrespective of their role or designation.

All-around Availability of Data: This is the biggest advantage of implementing a data lake in any organisation, because it ensures that all employees, irrespective of their designation and role, can access data. This is known as data democratization.
Fetches Quality Data: A data lake implementation supports many tools and technologies that provide substantial processing power for retrieving quality data, such as:

Real-time decision analysis: Data lakes can combine large quantities of consistent data with deep learning algorithms, supported by many languages and frameworks, to arrive at real-time decision analytics.




Supports SQL and other languages: Conventional data warehouse technologies support SQL, which is good enough for simple analytics. For advanced analytics, data lakes add other languages and engines such as Pig, Hive, Tachyon and Impala, and Spark MLlib is available for machine learning.




Operational Analytics Monitoring: Data lakes offer many benefits to companies, data managers and data processors. Searching, exploring, filtering, aggregating and visualizing business data in near real time for application monitoring, log analytics and click-stream analytics are straightforward tasks in a data lake. Just as a Twitter user decides whom to connect with and whom not to, a data lake user can choose the data required to meet different business objectives.
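As a rough illustration of the filter-and-aggregate step behind log analytics, here is a minimal Python sketch that assumes the logs are kept in the lake as raw JSON lines. The log records, the field names (`service`, `level`) and the function `errors_per_service` are all hypothetical; in practice an engine such as Spark or Hive would run the same kind of query over files in the lake.

```python
import json
from collections import Counter

# Hypothetical raw log lines, stored as-is (one JSON object per line).
RAW_LOGS = [
    '{"ts": "2020-05-18T10:00:01", "service": "checkout", "level": "ERROR"}',
    '{"ts": "2020-05-18T10:00:02", "service": "search", "level": "INFO"}',
    '{"ts": "2020-05-18T10:00:03", "service": "checkout", "level": "ERROR"}',
    '{"ts": "2020-05-18T10:00:04", "service": "search", "level": "ERROR"}',
    '{"ts": "2020-05-18T10:00:05", "service": "login", "level": "WARN"}',
]


def errors_per_service(lines):
    """Keep only ERROR events and count them per service."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)            # parse each raw line when it is read
        if event.get("level") == "ERROR":   # filter
            counts[event["service"]] += 1   # aggregate
    return dict(counts)


if __name__ == "__main__":
    print(errors_per_service(RAW_LOGS))
```

The raw lines are never reshaped before they are stored; structure is only imposed when the query runs, which is what lets different teams ask different questions of the same log files.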


Scalable, Versatile and Schema Flexibility: This is another major advantage of a data lake. Data volumes are growing exponentially day by day and, unlike a traditional data warehouse, a data lake offers scalability at a low cost. Cloud platforms such as AWS, Azure and Google Cloud provide features like auto-scaling and integration that help you reduce the cost of your compute usage. A data lake can store versatile data such as XML, logs, multimedia, sensor data, chat, social data, binary files and people data from diverse sources. A Hadoop data lake lets us work schema-free, or define multiple schemas for the same data. The schema is thus kept separate from the data, which is good for analytics.
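To make the idea of multiple schemas over the same data concrete, here is a minimal schema-on-read sketch in Python. The sensor records, the field names, the two schemas and the helper `read_with_schema` are hypothetical; a real Hadoop data lake would apply schemas through engines such as Hive rather than a hand-written function.

```python
import json

# Raw records land in the lake as-is; no schema is enforced at write time.
RAW_RECORDS = [
    '{"device": "sensor-1", "temp_c": 21.5, "humidity": 40, "site": "north"}',
    '{"device": "sensor-2", "temp_c": 19.0, "site": "south"}',
    '{"device": "sensor-3", "humidity": 55}',
]

# Two hypothetical schemas for the same data: field name -> type to cast to.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}
SITE_SCHEMA = {"device": str, "site": str}


def read_with_schema(lines, schema):
    """Apply a schema at read time: project the listed fields, cast them,
    and use None for any field a record does not contain."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_with_schema(RAW_RECORDS, TEMPERATURE_SCHEMA):
        print(row)
    for row in read_with_schema(RAW_RECORDS, SITE_SCHEMA):
        print(row)
```

Neither schema changes the stored records, so a third view could be added later without rewriting or migrating anything already in the lake.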

Essential elements of a Data Lake and Analytics solution

Essential Elements of a Data Lake are:
Data Lake Analytics: lets various roles in your organization, such as data scientists, data developers and business analysts, access data with their choice of analytic tools and frameworks.
Data Movement & Governance: covers moving data analytics to the source, to the data lake and to the edge. An interesting development here is that applications (or big data analytics) increasingly move to the edge rather than to a central storage repository, which makes them faster and takes load off the network, among other benefits.
Security, Data Quality and Storage: storage in a data lake lets you keep relational data, such as operational databases and data from line-of-business applications, alongside non-relational data from mobile apps, IoT devices and social media. Stored data does not need to be moved or transformed before you analyse it, and the hierarchical namespace of the stored data further lowers the total cost of ownership.
Data lakes are also highly scalable and flexible: the system and its processes can easily be scaled to handle ever more data. Data quality, in turn, is a necessary condition for consumers to get business value out of the lake.
Machine Learning: lets you apply real-time analytics and machine learning to your data to produce better, actionable insights, from reporting on historical data to building models that forecast likely outcomes and suggest a range of prescribed actions to achieve the optimal result.

Sunday, May 10, 2020

Primary Key vs Foreign Key in SQL

As we know, relationships between two or more tables are a basic concept of any relational database, including SQL Server. For example, just as a family starts with a parent-child relationship, a database typically starts with relationships such as product-item or customer-region.
So, let us try to understand how a relationship between two or more tables is defined in a database. You can think of the parent as the primary key and the children as the foreign key.


In the above diagram, you can see the primary key and foreign key relationships between the Students, Enrollments and Classes tables.
In the Students table, Student ID is the primary key; it establishes a relationship with the Enrollments table, where it acts as a foreign key.

In the Classes table, Class ID is the primary key; it establishes a relationship with the Enrollments table, where it acts as a foreign key.
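The Students, Classes and Enrollments relationship can be sketched with Python's built-in `sqlite3` module. This uses SQLite rather than SQL Server, and the column names and the helpers `create_school_db` and `enroll` are hypothetical, since the diagram's exact columns are not reproduced in the text. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def create_school_db():
    """Create an in-memory SQLite database with Students, Classes and Enrollments.

    Column names are hypothetical; they only mirror the tables named in the text.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Students (
            StudentID INTEGER PRIMARY KEY,   -- parent key
            Name      TEXT NOT NULL
        );
        CREATE TABLE Classes (
            ClassID INTEGER PRIMARY KEY,     -- parent key
            Title   TEXT NOT NULL
        );
        -- Child table: each row links one student to one class.
        CREATE TABLE Enrollments (
            EnrollmentID INTEGER PRIMARY KEY,  -- SQLite assigns a value if omitted
            StudentID    INTEGER REFERENCES Students(StudentID),  -- foreign key
            ClassID      INTEGER REFERENCES Classes(ClassID)      -- foreign key
        );
    """)
    return conn


def enroll(conn, student_id, class_id):
    """Insert an enrollment; raises sqlite3.IntegrityError if either key is unknown."""
    conn.execute(
        "INSERT INTO Enrollments (StudentID, ClassID) VALUES (?, ?)",
        (student_id, class_id),
    )


if __name__ == "__main__":
    db = create_school_db()
    db.execute("INSERT INTO Students (StudentID, Name) VALUES (1, 'Asha')")
    db.execute("INSERT INTO Classes (ClassID, Title) VALUES (10, 'Databases')")
    enroll(db, 1, 10)        # accepted: both parent rows exist
    try:
        enroll(db, 99, 10)   # rejected: there is no student 99
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

Enrollments is the child of both parents, so one table carries two foreign keys, each pointing at a different primary key.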

Primary Key - In a database, a table can have only one primary key, and it cannot contain NULL values. In SQL Server, creating a primary key also creates a clustered index by default (unless the table already has a clustered index or you declare the key NONCLUSTERED), and that index determines the order in which the rows are stored.
A primary key can be referenced by other tables as a foreign key. You can give a primary key an auto-increment value (IDENTITY in SQL Server), but auto-increment is not mandatory.
You can define a primary key constraint on a temporary table and on a table variable.
Note: you cannot delete a primary key value from the parent table while it is still used as a foreign key in a child table. To delete it, you must first delete the referencing rows from the child tables (unless the foreign key is defined with ON DELETE CASCADE).
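The delete rule in the note above can be tried out in SQLite, again via Python's `sqlite3` rather than SQL Server. The tables, the data and the function `demo_delete_parent` are hypothetical. With the default foreign key action, deleting a referenced parent row raises an `IntegrityError`, while deleting the child rows first lets the parent delete succeed.

```python
import sqlite3


def demo_delete_parent():
    """Return (parent_delete_was_blocked, students_left_after_cleanup)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
    conn.executescript("""
        CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Enrollments (
            EnrollmentID INTEGER PRIMARY KEY,
            StudentID    INTEGER REFERENCES Students(StudentID)
        );
        INSERT INTO Students VALUES (1, 'Asha');
        INSERT INTO Enrollments (StudentID) VALUES (1);
    """)
    # Deleting the parent row while a child row still references it fails.
    try:
        conn.execute("DELETE FROM Students WHERE StudentID = 1")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True
    # Remove the child rows first; then the parent row can be deleted.
    conn.execute("DELETE FROM Enrollments WHERE StudentID = 1")
    conn.execute("DELETE FROM Students WHERE StudentID = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
    conn.close()
    return blocked, remaining


if __name__ == "__main__":
    print(demo_delete_parent())
```

Declaring the foreign key with `ON DELETE CASCADE` instead would make the database remove the child rows automatically, which is convenient but easier to trigger by accident.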

Foreign Key - A foreign key is a referential constraint between two tables. When a primary key is referenced from another table, the referencing column in that table is called a foreign key, and it can accept multiple NULL values. A foreign key value must already exist as a primary key value in the parent table, so foreign keys are not auto-incremented. A foreign key column can be part of a clustered or non-clustered index, and a table can have more than one foreign key.
Foreign keys are not indexed automatically; if you want an index on a foreign key column, you must create it manually.
Keep in mind that, unlike a primary key constraint, a foreign key constraint cannot be created on a temporary table or a table variable.
You can delete a row from the child table even though its foreign key value refers to the primary key of the parent table.
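The foreign key properties listed above (duplicates and NULLs allowed, no automatic index, child rows freely deletable) can be checked with a small SQLite sketch. The tables, the index name `IX_Enrollments_ClassID`, the data and the function `demo_foreign_key_properties` are hypothetical, and SQLite behaviour stands in for SQL Server here.

```python
import sqlite3


def demo_foreign_key_properties():
    """Return (row_count_after_child_delete, index_names_on_Enrollments)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
    conn.executescript("""
        CREATE TABLE Classes (ClassID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
        CREATE TABLE Enrollments (
            EnrollmentID INTEGER PRIMARY KEY,
            ClassID      INTEGER REFERENCES Classes(ClassID)
        );
        INSERT INTO Classes VALUES (10, 'Databases');
        -- Duplicate foreign key values are allowed ...
        INSERT INTO Enrollments (ClassID) VALUES (10);
        INSERT INTO Enrollments (ClassID) VALUES (10);
        -- ... and so are multiple NULLs (no parent row is required).
        INSERT INTO Enrollments (ClassID) VALUES (NULL);
        INSERT INTO Enrollments (ClassID) VALUES (NULL);
        -- The foreign key column is not indexed automatically; add one by hand.
        CREATE INDEX IX_Enrollments_ClassID ON Enrollments(ClassID);
    """)
    # Deleting a child row is allowed even though it references a parent row.
    conn.execute("DELETE FROM Enrollments WHERE EnrollmentID = 1")
    rows = conn.execute("SELECT COUNT(*) FROM Enrollments").fetchone()[0]
    # PRAGMA index_list returns (seq, name, unique, origin, partial) per index.
    indexes = [row[1] for row in conn.execute("PRAGMA index_list('Enrollments')")]
    conn.close()
    return rows, indexes


if __name__ == "__main__":
    print(demo_foreign_key_properties())
```

The only index reported on Enrollments is the one created by hand, which mirrors the point that foreign keys do not get an index for free.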

Foreign keys almost always allow duplicates, which usually makes them unsuitable as primary keys.
However, it is perfectly fine to use a foreign key as the primary key when the tables have a one-to-one relationship rather than a one-to-many relationship.
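A one-to-one relationship in which the foreign key is also the primary key might look like this in SQLite. The `StudentProfiles` table and the function `create_one_to_one` are hypothetical examples, not part of the original diagram.

```python
import sqlite3


def create_one_to_one():
    """Build Students plus a one-to-one StudentProfiles table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
    conn.executescript("""
        CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        -- StudentID is both the primary key and a foreign key here, so each
        -- student can have at most one profile row (a one-to-one relationship).
        CREATE TABLE StudentProfiles (
            StudentID INTEGER PRIMARY KEY REFERENCES Students(StudentID),
            Bio       TEXT
        );
        INSERT INTO Students VALUES (1, 'Asha');
        INSERT INTO StudentProfiles VALUES (1, 'Enjoys databases');
    """)
    return conn


if __name__ == "__main__":
    db = create_one_to_one()
    for sql in (
        "INSERT INTO StudentProfiles VALUES (1, 'second profile')",   # duplicate key
        "INSERT INTO StudentProfiles VALUES (2, 'unknown student')",  # no parent row
    ):
        try:
            db.execute(sql)
        except sqlite3.IntegrityError as exc:
            print("rejected:", exc)
```

The primary key rules out a second profile for the same student, and the foreign key rules out a profile for a student who does not exist; together they enforce the one-to-one shape.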