Friday, November 16, 2018

SSIS - Conditional Split Transformation

In this tutorial, we are going to learn how does Conditional Split transformation work in SSIS package. The Conditional Split transformation can route data rows to different outputs depending on the content of the data. The implementation of the Conditional Split transformation is similar to a CASE decision structure in a programming language. 
The transformation evaluates expressions, and based on the results, directs the data row to the specified output. This transformation also provides a default output, so that if a row matches no expression it is directed to the default output.

We can configure the Conditional Split transformation in the following ways:
* Provide an expression that evaluates to a Boolean for each condition you want the transformation to test.
* Specify the order in which the conditions are evaluated. Order is significant, because a row is sent to the output corresponding to the first condition that evaluates to true.
* Specify the default output for the transformation. The transformation requires that a default output be specified.

To live demo, please visit my YouTube Channel at -


Each input row can be sent to only one output, that being the output for the first condition that evaluates to true. 
The Conditional Split transformation includes the FriendlyExpression custom property. This property can be updated by a property expression when the package is loaded. 

Note - The Conditional Split transformation directs each input row to one output only. If you enter multiple conditions, the transformation sends each row to the first output for which the condition is true and disregards subsequent conditions for that row. If you need to evaluate several conditions successively, you may need to concatenate multiple Conditional Split transformations in the data flow

Wednesday, November 14, 2018

SSIS - Expression Task

In this tutorial, we are going to learn how does Expression Task work in SSIS package. The SSIS Expression task will allow you to set variable values at runtime without the need for a script task. This was a new feature when SSIS 2012 rolled out a few years ago. It’s a pretty easy task to use and also helps you avoid having to write any .net code as well. The expression language includes functions and operators.

Things to be Noted: 

  • You can only set 1 variable at a time with this task. If you need to do more you can add additional expression tasks or use one script task to set all of them at once.
  • You must have SSIS 2012 or higher. If you have SSIS 2008R2 and below you can still use this functionality but it’s slightly more complicated because you will have to create a script task to do this for you.
One common usage for the expression task in SSIS is to implement conditional flow of the constraints. In the below screenshot you can see that we implement an expression tasks which follows an error checking activity and evaluates if there were any errors , if yes the execution move to the right else to the left.

To watch the live demo, please visit my YouTube Channel -

Tuesday, November 13, 2018

SSIS - Foreach Loop Container

In this tutorial, we are going to learn how does For Loop Container work in SSIS package. The For Loop container defines a repeating control flow in a package. The loop implementation is similar to the For looping structure in programming languages. For Loop container groups set of tasks together and allows you to looping through the tasks in your package which is similar like for loop in any programming language. In case until a fixed count than you will use the “For Loop container”

Configuration editor has only two options For Loop and Expressions. 
  • InitExpression: This will initialize the loop with specified values.
  • EvalExpression: specifies an expression to stop the loop when the expression evaluates to false. This expression must evaluate to a Boolean. 
  • AssignExpression: specifies an expression that changes a condition in the same way each time the loop iterates.Which is optional one.
Drag and drop “For loop” container and configure as shown below



Difference between For Loop and Foreach Loop Container?
A For Loop container loops till a constant value. So for example if you want to loop until a value is 12 or less than 12 so for up-to fixed count you have to use For Loop container.
Whereas Foreach Loop container loops through collection so this collection can files in a folder or collection can be in record set.
Foreach loop - Foreach loop container actually loop through a collection. Now this collection row in a record set files in a folder.
Foreach Loop Container Task: When there is dependency of number of time task should execute on data such as files, variables we will use this.The Foreach Loop container defines a repeating control flow in a package. The loop implementation is similar to Foreach looping structure in programming languages. In a package, looping is enabled by using a Foreach enumerator. The Foreach Loop container repeats the control flow for each member of a specified enumerator.

In short, a For Loop Container uses a loop condition that is defined by expressions whereas Foreach Loop Container iterates through a collection of objects.

With a Foreach Loop Container, we can perform variable mapping as well, which would be useful for tasks within the loop.

Monday, November 12, 2018

SSIS - How does Aggregate Transformation work

In this tutorial, we are going to learn how does Aggregate Transformation work in SSIS package. The Aggregate transformation is used to perform aggregate operations/functions on groups in a dataset. The aggregate functions available are- Count, Count Distinct, Sum, Average, Minimum and Maximum. The Aggregate transformation has one input and one or more outputs.

The Aggregate transformation handles null values in the same way as the SQL Server relational database engine. The behaviour is defined in the SQL-92 standard. The following rules apply:
  • In a GROUP BY clause, nulls are treated like other column values. If the grouping column contains more than one null value, the null values are put into a single group.
  • In the COUNT (column name) and COUNT (DISTINCT column name) functions, nulls are ignored and the result excludes rows that contain null values in the named column.
  • In the COUNT (*) function, all rows are counted, including rows with null values.
Aggregate Transformation Configuration - You configure the Aggregate transformation at the transformation, output, and column levels. At the transformation level, you configure the Aggregate transformation for performance by specifying the following values:
  1. The number of groups that are expected to result from a Group by operation. They are mostly are string values.
  2. The number of distinct values that are expected to result from a Count distinct operation. These values could be string or numeric.
  3. The percentage by which memory can be extended during the aggregation.
  4. The Aggregate transformation can also be configured to generate a warning instead of failing when the value of a divisor is zero.

For more on Aggregate Transformation - Watch a Live demo here


ConclusionThe Aggregate transformation is asynchronous, which means that it does not consume and publish data row by row. Instead it consumes the whole rowset, performs its groupings and aggregations, and then publishes the results.

Sunday, September 30, 2018

Build and rebuild clustered columnstore indexes online in SQL Server 2019


Microsoft development team works around the SQL Server to make it as a central part of the Microsoft data platform and we cannot deny the truth about SQL Server that is an industry leader in operational database management systems (ODBMS). Microsoft has been introduced SQL Server 2019 preview which builds on previous releases to grow SQL Server as a platform that gives us choices of development languages, data types, on-premises or cloud, and operating systems. 
If we are talking for optimizing database performance then the indexes are one of the most powerful tools for it. This is the biggest fact for the indexes that they can become more fragmented and less effective in case we did any insert, update, and delete operations on the data table. This is a pain for DBAs to take care of indexes and on the regularly basis, index rebuild operations is required.
With the growing sizes of databases, index rebuilds can take a very long time. Build and rebuild clustered columnstore indexes online is the feature of SQL Server 2019 preview where creating clustered columnstore indexes (CCI) was an offline process in the previous versions of SQL Server - requiring all changes stop while the CCI is created. 
With SQL Server 2019 preview and Azure SQL Database you can create or re-create CCI online. Workload will not be blocked and all changes made on the underlying data are transparently added into the target columnstore table.
Examples of new Transact-SQL statements that can be used are:
---- Create Clustered Columnstore Index

CREATE CLUSTERED COLUMNSTORE INDEX cci
  ON <tableName>
 WITH (ONLINE = ON);

---- Altered Clustered Columnstore Index
ALTER INDEX cci
  ON <tableName>
  REBUILD WITH (ONLINE = ON);
Combine that with the business needs for our applications to be always available and performant and this can be an issue. Big OLTP environments with busy workloads often have very short maintenance windows with some too short to execute large index rebuild operations. 

Wednesday, September 26, 2018

Azure Data Studio

Microsoft Development team is going to be introduced Azure Data Studio in SQL Server 2019 which was released under the preview name SQL Operations Studio in previous version. Azure Data Studio is a lightweight, modern, open source, cross-platform database tool for data professionals using the Microsoft family of on-premises and cloud data platforms on Windows, MacOS, and Linux.
Azure Data Studio offers a modern editor experience with Intellisense, code snippets, source control integration, and an integrated terminal. It is engineered with the data platform user in mind, with built in charting of query result sets and customizable dashboards.
With the help of this tool, now that SQL Server can run on all the major platforms, the Mac and Linux users were having to use third party tools to manage SQL server. Currently you can edit whole table, but not limited data that you need to run query, and edit only the results of that query. 
  1. Edit and run queries in a modern development environment with lightning fast Intellisense, code snippets, and source control integration.
  2. Quickly visualize data with built-in charting of your result sets.
  3. Create custom dashboards for your servers and databases using customizable widgets.
  4. Easily manage your broader environment with the built-in terminal.
  5. Analyze data in an integrated notebook experience built on Jupyter.
  6. Enhance your experience with custom theming and extensions.
  7. And explore your Azure resources with a built-in subscription and resource browser.
  8. Supports scenarios using SQL Server Big Data Cluster.
Those of us who don't want to run Microsoft's OS in a VM are really appreciative of the efforts they're making with things like this and VS code in making them cross-platform. Keep in mind, this is in preview, and provides an open source extension platform, so if you want that functionality, you can likely build it if it ends up lacking that in the final release.
Reference:https://docs.microsoft.com

Monday, September 3, 2018

SSRS - Filling blank cells in matrix

In Reporting Services, tables, matrices, and lists are data regions that display paginated report data in cells that are organized into rows and columns. The cells typically contain text data such as text, dates, and numbers but they can also contain gauges, charts, or report items such as images. Collectively, tables, matrices, and lists are frequently referred to as tablix data regions.

Sometimes, when we have created a matrix/tablix/list with data in it, but there are quite a few blank cells where there is no data. In this case, we don't want to show the blank values on the report and we have to replace the blanks with zeroes.
There are some conditional expressions which can be helpful to full-fill this requirements-
  1. Just right click on the text cells and go to the expression
  2. In expression window, check if the cell contains Nothing then set 0 value else fill the actual value as given below- 

    =IIF(Fields!MyColumn.Value IS NOTHING , 0 , Fields!MyColumn.Value)
    OR
    =IIf(IsNothing(Fields!MyColumn.Value),0,Fields!MyColumn.Value)

    If the field is empty or blank then it puts 0, otherwise it uses the actual data from the dataset.
Please watch a live demo at our YouTube Channel-


Friday, August 31, 2018

SSRS - Export SSRS report into multiple excel sheets name

SSRS is the most amazing data visualisation tool which provides many amazing features. One of the best feature of this tools is to export data into multiple sheets in excel and provides Name the worksheets according to group value. We can assign dynamically sheet name / tab name. When SSRS report is exported into excel format then the report data will be divided into multiple sheets and each sheet must have Tablix Header.


To create multi tabbed report, we have to follow the following steps -
  1. Add Row Group to the report and choose Parent Group
  2. In Group Properties window, choose Page Break and checked Between each instance of a group
  3. In Row Group section, click on Group details and go to property window
  4. In property window, open expression window for PageName and assign the group name from the report dataset
  5. To fix tablix header in every new sheet/tab, Click on Advance mode form Row and Column Group and set true property of KeepTogether and RepeatOnNewPage
  6. Preview of the report ---- Done
For actual implementation, please watch the below video -




Tuesday, August 28, 2018

SQL Server - Import raw data file into SQL Server database


In SQL Server Management Studio, SQL Server Import and Export Wizard option is helpful to into your raw data into SQL Server database. Import Flat File provides us to create simple packages that import and export data between many popular data formats including databases, spreadsheets, and text files. The import and export wizard also create the destination database and the tables into which the data is inserted.

Import Text/CSV file into SQL Server follow below mention steps:
  1. In SQL Server Management Studio, right click on the database and choose Import Flat File from tasks options
  2. Specify Data Source as Flat File Source and Text/CSV file Location
  3. Specify Export location Details
  4. Mapping Source and Destination tables (Leave this option as it is)
  5. Summary page to display our current configuration
  6. RUN THE PACKAGE
To watch a demo here-

SSRS - How to creating Dynamic Tabs in SSRS Report

In this video of SSRS Tutorial, we will learn that how to create a SSRS Report with dynamic Tabs and pull the data according to selected tab.

As you know that there is no straight forward way to include tabs in SSRS report builder. We have to achieve this by using different tricks. Here is what we will learn while creating this report by using some normal steps and some logical tricks.

Few steps to follow- 
  1. Create Empty report in SSRS in SSRS Project
  2. Add Report Header so can display the Report Title on each of the Page
  3. Add dataset for dynamic tabs
  4. Add Matrix and assign dataset as dataset name property
  5. Drop tabs value from dataset into matrix column header
  6. Add parameter for dynamic tab to filter the data 
  7. Set colour formatting for tabs based on the condition as per in the below video 
  8. Add action to matrix column and provide the parameter from dataset column
  9. Hide the report parameter
  10. Add another dataset to pull the data from the database and filter the data by passing the value from the parameter
  11. Add table to generate the tablix report and set the dataset name
  12. Drop columns from the report dataset
  13. Do needful formatting to the report
  14. Preview the report







Thursday, July 19, 2018

Preparation for a successful Data Lake in the cloud


A data lake is conceptual data architecture which is not based on any specific technology. So, the technical implementation can vary technology to technology, which means different types of storage can be utilized, which translates into varying features.
The pillars of a data lake also include scalable and durable storage of data, mechanisms to collect and organise that data, and tools to process and analyze the data and share the findings.
If we are talking about architectural point of views for a well-developed cloud-based data lake then it must be capable to serve many corporate audiences, including IT applications, infrastructure, and operations teams, data scientists and even line of business groups.
If you are planning to develop a successful data lake then you should have to consider the cloud service providers which allow organizations to avoid the cost and hassle of managing an on-premises data center by moving storage, compute, and networking to hosted solutions. Cloud services also offer many other advantages such as ease of provisioning, elasticity, scalability, and reduced administration. 
Apart from this, we should have to consider the following things:
Type of storage: Data Lake storage does matter for any organisation because it is directly link with cost and efforts.
The most common data lake implementations utilize:
HDFS (Hadoop Distributed File System)
Proprietary distributed file systems with HDFS compatibility (ex: Azure Data Lake Store)
Object storage (ex: Azure Blob Storage or Amazon S3)
The following options for a data lake are less commonly used due to greatly reduced flexibility:
Relational databases (ex: SQL Server, Azure SQL Database, Azure SQL Data Warehouse)
NoSQL databases (ex: Azure Cosmos DB)

Security capabilities- We have already stated that Data Lake is not based any specific technology. So, implementation of securities, privacy, and governance must be differed to technology to technology.
For example, service such as Azure Data Lake Store implements hierarchical security based on access control lists, whereas Azure Blob Storage implements key-based security. These capabilities are continually evolving in the cloud, so be sure to verify on a frequent basis.
Other hand, AWS has a number of ready-to-roll services here, including AWS Identity and Access Management (IAM) for roles, AWS Key Management Service (KMS) to create and control the encryption keys used to encrypt our data.
Data management services – Data is the most important component for any organisation which is used in different platforms. The data lake analogy is conceived to help bring a common and visual understanding to the benefits of distributed computing systems able to handle multiple types of data, in their native formats, with a high degree of flexibility and scalability.

With the right data captured from a variety of sources, we should be capable to expose that information to data professionals and business decision makers without an oppressive amount of red tape, or bureaucracy from IT.
For example, AWS is introducing AWS Glue as an ETL engine to easily understand data sources, prepare the data, and load it reliably to data stores. Azure Data Lake (ADL) integrations, developers who are required to manage information in those services can use Data Lake Explorer within ADL Tools for Visual Studio Code to get a better and quicker grasp of their cloud-based big data environments.
Data Efficiency and Business Execution - One of the most powerful features of cloud-based deployments is elasticity, which refers to scaling resources up or down depending on demand. Data lakes should be made easily accessible to a wide range of users, and their efforts in implementing and supporting core applications, for any line of business or function and business users are able to utilize this internal data efficiency to help perform core activities more effectively
Disaster recovery - The most critical data from a disaster recovery standpoint is our raw data. The ability to recover our data after a damaging weather event, system error, or human error is crucial.
Azure Data Lake Store provides locally-redundant storage (LRS). Hence, the data in our Azure Data Lake Store account is resilient to transient hardware failures within a region through automated replicas. This ensures durability and high availability, meeting the Azure Data Lake Store SLA.
AWS offers all the tools and capabilities we need to transfer data into the cloud and build comprehensive backup & restore solutions that are compatible with your IT environment. 
Please visit us to learn more on -
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse - Introduction
  4. Data Warehouse - Multidimensional Cube
  5. Data Warehouse - Multidimensional Cube Types
  6. Data Warehouse - Architecture and Multidimensional Model
  7. Data Warehouse - Dimension tables.
  8. Data Warehouse - Fact tables.
  9. Data Warehouse - Conceptual Modeling.
  10. Data Warehouse - Star schema.
  11. Data Warehouse - Snowflake schema.
  12. Data Warehouse - Fact constellations
  13. Data Warehouse - OLAP Servers.
  14. Preparation for a successful Data Lake in the cloud
  15. Why does cloud make Data Lakes Better?

Wednesday, July 18, 2018

Why does cloud make Data Lakes better?

A data lake is conceptual data architecture which is not based on any specific technology. So, the technical implementation can vary technology to technology, which means different types of storage can be utilized, which translates into varying features.
The main focus of a data lake is that it is not going to replace a company’s existing investment in its data warehouse/data marts. In fact, they complement each other very nicely. With a modern data architecture, organizations can continue to leverage their existing investments, begin collecting data they have been ignoring or discarding, and ultimately enable analysts to obtain insights faster. Employing cloud technologies translates costs to a subscription-based model which requires much less up-front investment for both cost and effort.

The most of the organizations are enthusiastically considering cloud for functions like Hadoop, Spark, data bases, data warehouses, and analytics applications. This makes sense to build their data lake in the cloud for a number of reasons such as infinite resources for scale-out performance, and a wide selection of configurations for memory, processors, and storage. Some of the key benefits include:
  1. Pervasive security - A cloud service provider incorporates all the aggregated knowledge and best practices of thousands of organizations, learning from each customer’s requirements.
  2. Performance and scalability - Cloud providers offer practically infinite resources for scale-out performance, and a wide selection of configurations for memory, processors, and storage.
  3. Reliability and availability - Cloud providers have developed many layers of redundancy throughout the entire technology stack, and perfected processes to avoid any interruption of service, even spanning geographic zones.
  4. Economics - Cloud providers enjoy massive economies of scale, and can offer resources and management of the same data for far less than most businesses could do on their own.
  5. Integration - Cloud providers have worked hard to offer and link together a wide range of services around analytics and applications, and made these often “one-click” compatible.
  6. Agility - Cloud users are unhampered by the burdens of procurement and management of resources that face a typical enterprise, and can adapt quickly to changing demands and enhancements. 
Advantages of a Cloud Data Lake – it is already proved that a data lake is a powerful architectural approach to finding insights from untapped data, which brings new agility to the business. The ability to harness more data from more sources in less time will directly lead to a smarter organization making better business decisions, faster. The newfound capabilities to collect, store, process, analyze, and visualize high volumes of a wide variety of data, drive value in many ways. Some of the advantages of cloud data lake is given below –
  • Better security and availability than you could guarantee on-premises
  • Faster time to value for new projects
  • Data sources and applications already cloud-based
  • Faster time to deploy for new projects
  • More frequent feature/functionality updates
  • More elasticity (i.e., scaling up and down resources)
  • Geographic coverage
  • Avoid systems integration effort and risk of building infrastructure/platform
  • Pay-as-you-go (i.e., OpEx vs. CapEx)
A basic premise of the data lake is adaptability to a wide range of analytics and analytics-oriented applications and users, and clearly AWS has an enormous range of services to match any. Many engines are available for many specific analytics and data platform functions. And all the additional enterprise needs are covered with services like security, access control, and compliance frameworks and utilities.
Please visit us to learn more on -
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse - Introduction
  4. Data Warehouse - Multidimensional Cube
  5. Data Warehouse - Multidimensional Cube Types
  6. Data Warehouse - Architecture and Multidimensional Model
  7. Data Warehouse - Dimension tables.
  8. Data Warehouse - Fact tables.
  9. Data Warehouse - Conceptual Modeling.
  10. Data Warehouse - Star schema.
  11. Data Warehouse - Snowflake schema.
  12. Data Warehouse - Fact constellations
  13. Data Warehouse - OLAP Servers.
  14. Preparation for a successful Data Lake in the cloud
  15. Why does cloud make Data Lakes Better?

Tuesday, July 17, 2018

Data Lake Vs Data Warehouse


We know that data is the business asset for any organisation which always keeps secure and accessible to business users whenever it required. 
In current era, two techniques are very popular to store the data for the business insights. Hence, we are going to differentiate them based on some technical terms.

One is Data Warehouse which is highly structured store of the data that is requiring a significant amount of discovery, planning, data modeling, and development work before the data becomes available for analysis by the business users.

Second one is a Data Lake which is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed. We can say that Data Lake is a more organic store of data without regard for the perceived value or structure of the data.

Data Warehouses compared to Data Lakes - Depending on the business requirements, a typical organization will require both a data warehouse and a data lake as they serve different needs, and use cases.
Characteristics
Data Warehouse
Data Lake
Type of data stored
Structured data (most often in columns & rows in a relational database) from transactional systems, operational databases, and line of business applications
Any type of data structure,
any format, including structured, semi-structured, and unstructured data from IoT devices, web sites, mobile apps, social media, and corporate applications
Best way to ingest data
Batch processes
Streaming, micro-batch, or
batch processes
Schema
Designed prior to the DW implementation (schema-on-write)
define the structure of the data at the time of analysis , referred to as schema on reading (schema-on-read)
Typical load pattern
ETL - (Extract, Transform, then Load)
ELT - (Extract, Load, and Transform at the time the data is loaded)
Price/Performance
Fastest query results using higher cost storage
Query results getting faster using low-cost storage
Data Quality
Highly curated data that serves as the central version of the truth
Any data that may or may not be curated (ie. raw data)
Users
Business analysts
Data scientists, Data developers, and Business analysts (using curated data)
Analytics pattern
Determine structure, acquire data, then analyze it; iterate back to change structure as needed.
Batch reporting, BI and visualizations
Acquire data, analyze it, then iterate to determine its final structured form.
Machine Learning, Predictive analytics, data discovery and profiling
During the development of a traditional data warehouse, we should decide a considerable amount of time which is going to spend analyzing data sources, understanding business processes, profiling data, and modeling data.
In contrast, the default expectation for a data lake is to acquire all of the data and retain all of the data.
Please visit us to learn more on -
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse - Introduction
  4. Data Warehouse - Multidimensional Cube
  5. Data Warehouse - Multidimensional Cube Types
  6. Data Warehouse - Architecture and Multidimensional Model
  7. Data Warehouse - Dimension tables.
  8. Data Warehouse - Fact tables.
  9. Data Warehouse - Conceptual Modeling.
  10. Data Warehouse - Star schema.
  11. Data Warehouse - Snowflake schema.
  12. Data Warehouse - Fact constellations
  13. Data Warehouse - OLAP Servers.
  14. Preparation for a successful Data Lake in the cloud
  15. Why does cloud make Data Lakes Better?

Popular Posts

Get Sponsored by Big Brands