Friday, April 26, 2024

TSQL - How to Send Email/Alert with attached files in SQL Server

In this tutorial, you will learn "How to Send Email/Alert with attached files By Using Gmail SMTP Server and T-SQL" in SQL Server.

Database Mail is a very useful feature of SQL Server that allows us to send email through the database engine. This feature can be used in Reporting Services, Integration Services, and in our applications as well.

How to get a Gmail App Password

TSQL - Database Email Configuration || By Using Gmail SMTP Server


By using this feature, we can also easily track the email logs and schedule these email alerts with the help of SQL Server Agent.

Here, we will learn how to enable and configure Database Mail in SQL Server using T-SQL code, without using the Database Mail Configuration Wizard.

Sending an email with an attachment directly from SQL Server can be useful in certain scenarios where automated notifications or reports need to be delivered to stakeholders or systems. Here are a few reasons why you might want to send emails with attachments from SQL Server:

Automated Reporting: SQL Server can generate reports or data extracts based on scheduled tasks or triggers. These reports can be automatically attached to emails and sent to relevant recipients, allowing for streamlined distribution of information.

Alerts and Notifications: SQL Server can monitor database events or conditions and trigger alerts when specific criteria are met (e.g., low disk space, failed jobs, exceeded thresholds). These alerts can be sent via email with relevant information or diagnostic data attached.

Data Export: SQL Server may need to export data in a specific format (e.g., CSV, Excel) for consumption by external systems or users. This exported data can be attached to an email and sent to designated recipients or systems for further processing.

Integration with External Systems: SQL Server may need to interact with other systems or processes that require data exchange via email. Attaching files to emails allows SQL Server to seamlessly integrate with these external systems or processes.

Automated Workflows: Emails with attachments can be part of automated workflows or processes within an organization. For example, exporting data from SQL Server, attaching it to an email, and sending it to a specific department for review or analysis.

Auditing and Compliance: In regulated industries, such as finance or healthcare, sending data via email with attachments may be necessary for auditing and compliance purposes. SQL Server can automate this process to ensure data integrity and security.


The process can be broken down into the following steps (a hedged T-SQL sketch of these steps follows the list) –
🔍Activate & authorize the email account
🔍Configure Database Mail with a profile
🔍Enable the advanced option for CMD (xp_cmdshell)
🔍Generate a CSV file with the BCP command
🔍Send the email with the CSV file attached
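Below is a minimal, hedged T-SQL sketch of these steps. The account, profile, database, table, and file names (GmailAccount, GmailProfile, YourDB.dbo.SalesReport, C:\Temp\SalesReport.csv) are placeholders invented for illustration, not values from this post, and Gmail normally expects an App Password here rather than your regular password.

-- Steps 1-2: enable Database Mail and create an account plus a profile (assumed names)
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'Database Mail XPs', 1;     RECONFIGURE;

EXEC msdb.dbo.sysmail_add_account_sp
     @account_name    = 'GmailAccount',                -- placeholder account name
     @email_address   = 'your.address@gmail.com',      -- placeholder sender address
     @mailserver_name = 'smtp.gmail.com',
     @port            = 587,
     @enable_ssl      = 1,
     @username        = 'your.address@gmail.com',      -- placeholder
     @password        = '<gmail-app-password>';        -- placeholder App Password

EXEC msdb.dbo.sysmail_add_profile_sp @profile_name = 'GmailProfile';
EXEC msdb.dbo.sysmail_add_profileaccount_sp
     @profile_name    = 'GmailProfile',
     @account_name    = 'GmailAccount',
     @sequence_number = 1;

-- Steps 3-4: enable xp_cmdshell and export a CSV with the BCP command
EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;
EXEC master..xp_cmdshell 'bcp "SELECT * FROM YourDB.dbo.SalesReport" queryout "C:\Temp\SalesReport.csv" -c -t, -T';

-- Step 5: send the email with the CSV file attached
EXEC msdb.dbo.sp_send_dbmail
     @profile_name     = 'GmailProfile',
     @recipients       = 'recipient@example.com',      -- placeholder recipient
     @subject          = 'Daily Sales Report',
     @body             = 'Please find the attached report.',
     @file_attachments = 'C:\Temp\SalesReport.csv';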

Live demo on YouTube -



Saturday, March 23, 2024

DataBricks ⏩How to remove NULL values from PySpark arrays?

In this tutorial, you will learn "How to remove NULL values from PySpark arrays?" in DataBricks.

In PySpark, the array_compact function is used to remove null elements from an array. It returns a new array with null elements removed. This function is useful when dealing with arrays in DataFrame columns, especially when you want to clean up or filter out null values from array-type columns.


💎You will frequently want to throw out the NULL values in a PySpark array rather than write logic to deal with these values. array_compact makes getting rid of NULL values quite easy.
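A minimal PySpark sketch (assuming Spark 3.4 or later, where array_compact is available); the sample DataFrame and column names are made up for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_compact

spark = SparkSession.builder.appName("array_compact_demo").getOrCreate()

# Illustrative data: an array column that contains NULL elements
df = spark.createDataFrame(
    [(1, ["a", None, "b"]), (2, [None, "c"]), (3, [None, None])],
    ["id", "letters"],
)

# array_compact returns a new array with the NULL elements removed
df.withColumn("letters_clean", array_compact("letters")).show(truncate=False)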

Thursday, March 21, 2024

Spark Interview Question 2 - Difference between Transformation and Action in Spark?

In Apache Spark, transformations and actions are two fundamental concepts that play crucial roles in defining and executing Spark jobs. Understanding the difference between transformations and actions is essential for effectively designing and optimizing Spark applications.

What are Transformations in Spark?

👉Transformations in Spark are operations that are applied to RDDs (Resilient Distributed Datasets) to create a new RDD.
👉When a transformation is applied to an RDD, it does not compute the result immediately. Instead, it creates a new RDD representing the transformed data but keeps track of the lineage (dependencies) between the original RDD and the transformed RDD.
👉Transformations are lazily evaluated, meaning Spark delays the actual computation until an action is triggered.
👉Examples of transformations include map(), filter(), flatMap(), groupByKey(), reduceByKey(), sortByKey(), etc.



What are Actions in Spark?

👉Actions in Spark are operations that trigger the computation of a result from an RDD and return a non-RDD value.
👉When an action is invoked on an RDD, Spark calculates the result of all transformations leading to that RDD based on its lineage and executes the computation.
👉Actions are eagerly evaluated, meaning they kick off the actual computation in Spark; a small sketch contrasting transformations and actions follows the examples below.
👉Examples of actions include collect(), count(), reduce(), saveAsTextFile(), foreach(), take(), first(), etc.
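A small Scala sketch contrasting the two (the numbers are made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lazy-vs-eager").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 10)

// Transformations: lazily evaluated, nothing is computed yet
val doubled = rdd.map(_ * 2)          // map is a transformation
val bigOnes = doubled.filter(_ > 10)  // filter is a transformation

// Actions: trigger the actual computation of the whole lineage
println(bigOnes.count())                   // count is an action
println(bigOnes.collect().mkString(", "))  // collect is an action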


Wednesday, March 20, 2024

Spark Interview Question 1 - Why is Spark preferred over MapReduce?

Apache Spark is an open-source distributed computing system designed for large-scale data processing and analytics. It offers a unified engine for distributed data processing that prioritizes speed, ease of use, and flexibility. Spark was developed at UC Berkeley's AMPLab and later donated to the Apache Software Foundation. Spark is generally preferred over MapReduce because it performs computations in memory and reuses data across stages, whereas MapReduce writes intermediate results to disk between every map and reduce phase; this makes Spark significantly faster for iterative and interactive workloads and gives it a much richer API than plain map and reduce.


Saturday, March 16, 2024

SQL - How to Use Qualify Clause

In this tutorial, you will learn "How to Use Qualify Clause" in SQL.

The QUALIFY clause in SQL is used in combination with window functions to filter rows based on the results of window functions. It allows you to apply filtering conditions to the result set after window functions have been calculated.
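A hedged example, assuming a dialect that supports QUALIFY (for example Snowflake, Teradata, BigQuery, or Databricks SQL) and a hypothetical employees table:

-- Keep only the highest-paid employee per department,
-- filtering directly on the window function via QUALIFY
SELECT employee_id,
       department,
       salary
FROM employees
QUALIFY ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) = 1;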



Sunday, March 3, 2024

Scala - How to Calculate Running Total Or Accumulative Sum in DataBricks

In this tutorial, you will learn "How to calculate Running Total Or Accumulative Sum by using Scala" in DataBricks.

Scala is a computer language that combines the object-oriented and functional programming paradigms. Martin Odersky invented it, and it was initially made available in 2003. "Scala" is an abbreviation for "scalable language," signifying the language's capacity to grow from simple scripts to complex systems.

Scala is a language designed to be productive, expressive, and compact that can be used for a variety of tasks, from large-scale corporate applications to scripting. It has become more well-liked in sectors like banking, where its robust type system and expressive syntax are very helpful.

To compute a running total in Scala using a DataFrame in Apache Spark, you can use the Window function along with sum aggregation.

To compute a running total within groups in a DataFrame using Scala and Apache Spark, you can still utilize the Window function, but you'll need to partition the data by the group column.

Steps to be followed -
💎 Import necessary classes and functions from Apache Spark.
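A minimal Scala sketch covering both cases (an overall running total and a running total within groups); the sales data and column names are made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("running-total").getOrCreate()
import spark.implicits._

val sales = Seq(
  ("A", "2024-01-01", 100.0),
  ("A", "2024-01-02", 150.0),
  ("B", "2024-01-01", 200.0),
  ("B", "2024-01-03", 50.0)
).toDF("category", "order_date", "amount")

// Overall running total, ordered by date
val overallWindow = Window.orderBy("order_date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// Running total within each category (partitioned window)
val groupWindow = Window.partitionBy("category").orderBy("order_date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

sales
  .withColumn("running_total", sum("amount").over(overallWindow))
  .withColumn("running_total_by_category", sum("amount").over(groupWindow))
  .show(false)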

SQL Window Functions - How to Calculate Running Total || Accumulative Sum

In this tutorial, you are going to learn "How to Calculate Running Total Or Accumulative Sum in SQL" by using SQL Window Functions.

SQL window functions are a powerful feature that allows you to perform calculations across a set of rows related to the current row, without collapsing the result set. These functions operate on a "window" of rows, which is defined by a specific partition of the data and an optional order.

Window functions are commonly used for analytical and reporting tasks. Window functions have a similar syntax to regular aggregate functions, but they include an additional OVER clause that defines the window specification.
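A hedged SQL example, assuming a hypothetical sales table with order_date and amount columns:

-- Running (accumulative) total of amount, ordered by date
SELECT order_date,
       amount,
       SUM(amount) OVER (ORDER BY order_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales
ORDER BY order_date;

Adding PARTITION BY inside the OVER clause would restart the running total for each group.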

Monday, February 26, 2024

Azure Part 2- Interview Questions and Answers

 

Part 1 - Interview Questions and Answers

Scala — Transpose or Pivot | Rows to Columns in Dataframe | Databricks

In this tutorial, you will learn "How to Transpose or Pivot | Rows to Columns in Dataframe by using Scala" in Databricks.

Data integrity refers to the quality, consistency, and reliability of data throughout its life cycle. Data engineering pipelines are methods and structures that collect, transform, store, and analyse data from many sources.

Scala is a computer language that combines the object-oriented and functional programming paradigms. Martin Odersky invented it, and it was initially made available in 2003. "Scala" is an abbreviation for "scalable language," signifying the language's capacity to grow from simple scripts to complex systems.

Scala is a language designed to be productive, expressive, and compact that can be used for a variety of tasks, from large-scale corporate applications to scripting. It has become more well-liked in sectors like banking, where its robust type system and expressive syntax are very helpful.

If you want to transpose or pivot rows to columns in a DataFrame by using Scala in Databricks, you have to follow these steps - 💎 Import the necessary Spark classes for DataFrame operations.
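A minimal hedged Scala sketch of the approach, using a made-up student/subject/score DataFrame:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("pivot-demo").getOrCreate()
import spark.implicits._

// Illustrative input: one row per (student, subject) pair
val marks = Seq(
  ("Alice", "Maths", 90),
  ("Alice", "Science", 85),
  ("Bob", "Maths", 75),
  ("Bob", "Science", 80)
).toDF("student", "subject", "score")

// Pivot rows to columns: each distinct subject value becomes a column
val pivoted = marks
  .groupBy("student")
  .pivot("subject")
  .agg(first("score"))

pivoted.show(false)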

Sunday, February 25, 2024

Azure Part 1- Interview Questions and Answers


❓What are the important Azure Storage services? 

🎤Answer - Azure offers several storage services that cater to different use cases and requirements. Some of the important Azure Storage services include:

  1. Azure Blob Storage: Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data. It is suitable for serving images or documents directly to a browser.
  2. Azure Files: Azure Files offers fully managed file shares in the cloud, accessible via the industry-standard Server Message Block (SMB) protocol. It's suitable for migrating existing applications that use file shares to Azure without significant changes.
  3. Azure Disk Storage: Disk Storage provides durable and high-performance block storage for Azure Virtual Machines. It includes managed disks, which simplify disk management for VMs by handling disk availability and reliability.
  4. Azure Queue Storage: Queue Storage is a messaging store for reliable messaging between application components. It is often used to decouple application components and to provide asynchronous communication.
  5. Azure Table Storage: Table Storage is a NoSQL key-value store that's suitable for semi-structured data. It's well-suited for applications that require scalable and flexible data storage.
  6. Azure Data Lake Storage: Data Lake Storage is a scalable and secure data lake solution for big data analytics. It integrates with Azure Synapse Analytics, Azure Databricks, and other analytics services.
  7. Azure Archive Storage: Archive Storage is a low-cost, secure, and durable storage solution for rarely accessed data. It's suitable for data that needs to be retained for compliance or regulatory reasons.
These are some of the key Azure Storage services, each designed to address specific storage needs and scenarios. Depending on your requirements, you may use one or more of these services in combination to build robust and scalable storage solutions on Azure.

In practice, we primarily used containers as Blob Storage and as a data lake (ADLS Gen2).

Scala — Retrieve matched rows from two Dataframes in Databricks

In this tutorial, you will learn "How to Retrieve matched rows from two Dataframes by using Scala" in Databricks.

Data integrity refers to the quality, consistency, and reliability of data throughout its life cycle. Data engineering pipelines are methods and structures that collect, transform, store, and analyse data from many sources.


Scala is a computer language that combines the object-oriented and functional programming paradigms. Martin Odersky invented it, and it was initially made available in 2003. "Scala" is an abbreviation for "scalable language," signifying the language's capacity to grow from simple scripts to complex systems.

Scala is a language designed to be productive, expressive, and compact that can be used for a variety of tasks, from large-scale corporate applications to scripting. It has become more well-liked in sectors like banking, where its robust type system and expressive syntax are very helpful.

If you want to retrieve matched rows from two DataFrames based on two or more columns, you can still use the join method in the Spark DataFrame API, but you'll need to specify multiple columns in the join condition. 💎 Import the necessary Spark classes for DataFrame operations.
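A minimal hedged Scala sketch, with made-up DataFrames and column names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("matched-rows").getOrCreate()
import spark.implicits._

val orders2023 = Seq((1, "US", 100.0), (2, "UK", 50.0), (3, "IN", 75.0))
  .toDF("customer_id", "country", "amount")
val orders2024 = Seq((1, "US", 120.0), (3, "IN", 60.0), (4, "DE", 90.0))
  .toDF("customer_id", "country", "amount")

// Inner join on two columns keeps only the matched rows;
// Seq("customer_id", "country") avoids duplicated join columns in the output
val matched = orders2023.join(
  orders2024.withColumnRenamed("amount", "amount_2024"),
  Seq("customer_id", "country"),
  "inner"
)

matched.show(false)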

Saturday, February 24, 2024

DataBricks - Change column names from CamelCase to Snake_Case by Scala

In this tutorial, you will learn "How to Change column names from CamelCase to Snake_Case by using Scala" in Databricks.

💡Imagine we have an input Dataframe (as in the image). Our goal is to achieve the desired output Dataframe (also in the image).

Basically, you have to change the column names as follows -
Age -> Age , 
FirstName -> First_Name, 
CityName -> City_Name, 
CountryName -> Country_Name


To create a Dataframe in Scala, you can use Apache Spark's Dataframe API.

In this example (a minimal sketch of the core idea follows the list):
💎Import necessary Spark classes for Dataframe operations.
💎Create a SparkSession, which is the entry point to Spark SQL functionality.
💎Define a schema for our Dataframe using StructType and StructField.
💎Define the data as a sequence of rows, where each row represents a record in the Dataframe.
💎Create the Dataframe using the createDataFrame method of SparkSession, passing in the data and schema.
💎Display the Dataframe using the show() method.
💎Create a variable to store the regex pattern.
💎Create a variable to store the new Snake_Case column names.
💎Create a new Dataframe with the Snake_Case columns.
💎Finally, display the data from the Dataframe.
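A minimal hedged Scala sketch of the core idea (for brevity it builds the sample DataFrame with toDF rather than an explicit StructType schema; the data is made up, and the regex simply inserts an underscore at every lower-to-upper case boundary, producing Age, First_Name, City_Name, Country_Name as above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("camel-to-snake").getOrCreate()
import spark.implicits._

// Illustrative input DataFrame with CamelCase column names
val people = Seq((30, "John", "London", "UK"))
  .toDF("Age", "FirstName", "CityName", "CountryName")

// Regex pattern: a lowercase letter or digit followed by an uppercase letter
val camelBoundary = "([a-z0-9])([A-Z])"

// Build the new Snake_Case column names
val snakeCols = people.columns.map(_.replaceAll(camelBoundary, "$1_$2"))

// Create a new DataFrame with the renamed columns and display it
val renamed = people.toDF(snakeCols: _*)
renamed.show(false)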

SQL - Data cleaning | Handling Missing Values

In this tutorial, you will learn "How to Handling Missing Values? " in SQL as a part of Data Cleaning techniques.

Data cleaning is an essential part of the data preprocessing pipeline to ensure that the data used for analysis or modeling is accurate, consistent, and reliable.



Handling missing values: Missing values can impact the accuracy of analysis. You can identify and handle missing values using the IS NULL predicate or the COALESCE function, and then decide whether to replace them with a default value or remove them. IS NULL and COALESCE are both used for handling null values, but they serve different purposes.
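A hedged example with a hypothetical customers table:

-- Find the rows where a value is missing
SELECT customer_id, email
FROM customers
WHERE email IS NULL;

-- Replace missing values with a default while reading the data
SELECT customer_id,
       COALESCE(email, 'not_provided@example.com') AS email
FROM customers;

-- Or remove the rows that are missing a required value
DELETE FROM customers
WHERE email IS NULL;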

Friday, February 23, 2024

DataBricks - How to find duplicate records in Dataframe by Scala

In this tutorial, you will learn " How to find duplicate records in Dataframe by using Scala?" in Databricks.





In Databricks, you can use Scala for data processing and analysis using Spark. Here's how you can work with Scala in Databricks (a sketch for finding duplicate records follows this list):
💎Interactive Scala Notebooks: Databricks provides interactive notebooks where you can write and execute Scala code. You can create a new Scala notebook from the Databricks workspace.
💎Cluster Setup: Databricks clusters are pre-configured with Apache Spark, which includes Scala API bindings. When you create a cluster, you can specify the version of Spark and Scala you want to use.
💎Import Libraries: You can import libraries and dependencies in your Scala notebooks using the %scala magic command or by specifying dependencies in the cluster configuration.
💎Data Manipulation with Spark: Use Scala to manipulate data using Spark DataFrames and Spark SQL. Spark provides a rich set of APIs for data processing, including transformations and actions.
💎Visualization: Databricks supports various visualization libraries such as Matplotlib, ggplot, and Vega for visualizing data processed using Scala and Spark.
💎Integration with other Languages: Databricks notebooks support multiple languages, so you can integrate Scala with Python, R, SQL, etc., in the same notebook for different tasks.
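Returning to the topic of this post, here is a minimal hedged Scala sketch for finding duplicate records, using a made-up DataFrame:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("find-duplicates").getOrCreate()
import spark.implicits._

val people = Seq(
  ("John", "London"), ("Jane", "Paris"), ("John", "London"), ("Amit", "Delhi")
).toDF("name", "city")

// Group on every column and keep the groups that occur more than once
val duplicates = people
  .groupBy(people.columns.map(col): _*)
  .count()
  .filter($"count" > 1)

duplicates.show(false)

// Alternative: the rows removed by dropDuplicates are exactly the extra copies
people.exceptAll(people.dropDuplicates()).show(false)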

Read CSV file into Dataframe

DataBricks - How to Read CSV into Dataframe by Scala

In this tutorial, you will learn "How to Read CSV into Dataframe by Scala?" in Databricks.
In Databricks, you can use Scala for data processing and analysis using Spark. Here's how you can work with Scala in Databricks (a minimal CSV-reading sketch follows the list):

💎Interactive Scala Notebooks: Databricks provides interactive notebooks where you can write and execute Scala code. You can create a new Scala notebook from the Databricks workspace.

💎 Cluster Setup: Databricks clusters are pre-configured with Apache Spark, which includes Scala API bindings. When you create a cluster, you can specify the version of Spark and Scala you want to use.

💎Import Libraries: You can import libraries and dependencies in your Scala notebooks using the %scala magic command or by specifying dependencies in the cluster configuration.

💎Data Manipulation with Spark: Use Scala to manipulate data using Spark DataFrames and Spark SQL. Spark provides a rich set of APIs for data processing, including transformations and actions.

💎 Visualization: Databricks supports various visualization libraries such as Matplotlib, ggplot, and Vega for visualizing data processed using Scala and Spark.

💎 Integration with other Languages: Databricks notebooks support multiple languages, so you can integrate Scala with Python, R, SQL, etc., in the same notebook for different tasks.
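A minimal hedged Scala sketch of reading a CSV file into a DataFrame (the file path is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-csv").getOrCreate()

// Read a CSV file into a DataFrame
val df = spark.read
  .option("header", "true")       // the first line contains column names
  .option("inferSchema", "true")  // let Spark guess the column data types
  .csv("/FileStore/tables/sample_data.csv")  // placeholder path

df.printSchema()
df.show(10, truncate = false)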

Thursday, February 22, 2024

SQL - Data Cleaning Techniques | How to remove duplicate records

In this tutorial, you will learn "How to remove duplicates records " in SQL as a part of Data Cleaning techniques.
Data cleaning is an essential part of the data preprocessing pipeline to ensure that the data used for analysis or modeling is accurate, consistent, and reliable.

Removing duplicates: Duplicates in a dataset can skew analysis results. You can remove duplicates using the DISTINCT keyword or by using the GROUP BY clause.
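A hedged example with a hypothetical orders table:

-- Read the data without duplicate rows
SELECT DISTINCT customer_id, order_date, amount
FROM orders;

-- Or delete duplicates while keeping one row per group, using ROW_NUMBER()
-- in a CTE (this deletable-CTE form works in SQL Server and similar dialects)
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id, order_date, amount
                              ORDER BY customer_id) AS rn
    FROM orders
)
DELETE FROM ranked
WHERE rn > 1;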

Wednesday, February 21, 2024

SQL - How to find out employees and their manager

In this tutorial, you will learn "How to find out employees and their manager" in SQL.


To achieve this in SQL, we use the self-join technique, which combines rows of a table with other rows of the same table based on a related column.




Self Join:

A self join is a join operation where a table is joined with itself. It's typically used when you want to compare rows within the same table.
This is achieved by aliasing the table with different names in the query so that it can be treated as two separate tables.
Self joins are useful when you need to compare different rows within the same table, such as finding hierarchical relationships or comparing values across different rows.
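A hedged example, assuming a hypothetical employees table in which manager_id references employee_id in the same table:

-- Self join: the table is aliased twice, once as the employee (e)
-- and once as the manager (m); LEFT JOIN keeps employees with no manager
SELECT e.employee_name AS employee,
       m.employee_name AS manager
FROM employees AS e
LEFT JOIN employees AS m
       ON e.manager_id = m.employee_id;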

Please go through our Youtube tutorial video -



To learn more, please follow us -
🔊 http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at —
🔊 http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -
🔊 https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -
🔊 https://twitter.com/macxima

DataBricks - Interview Questions and Answers

Originally, Databricks was a notebook interface for using Spark without having to worry about the infrastructure for distributed computing. All you had to do was specify the desired cluster size, and Databricks took care of the rest. Before distributed computing became the norm, this was a really big deal.

It has grown a lot since then (though I'm not sure in what order), especially by adding a front-end SQL interface similar to that of a data warehouse (which, incidentally, runs Spark underneath). Even though your data is saved as files rather than tables, you can now treat Databricks like a database or data warehouse thanks to the virtual data warehouse interface they have designed. Once Delta Lake was introduced, however, your files effectively became tables and could be used outside of Databricks in other contexts. Additionally, Databricks Workflows, which are integrated directly into Databricks, allow you to orchestrate your Databricks work.

There is also Unity Catalog in Databricks, which allows Databricks to manage and abstract your data access through a single lens, thereby acting as a stand-alone data platform.

 

Friday, February 16, 2024

SQL - SQL Execution Order | 𝗛𝗼𝘄 𝗦𝗤𝗟 𝗤𝘂𝗲𝗿𝗶𝗲𝘀 𝗔𝗿𝗲 𝗘𝘅𝗲𝗰𝘂𝘁𝗲𝗱

In this tutorial, you are going to learn a very basic concept, "SQL Execution Order | How does a SQL statement execute?", in SQL.

The SQL execution order is the sequence in which the different clauses of a SQL query are executed. This order is important because it determines how the data is filtered, grouped, and ordered.



The following is a simplified explanation of the SQL execution order (an annotated example query follows the list):

• 𝐅𝐑𝐎𝐌/𝐉𝐎𝐈𝐍: The FROM clause specifies the tables from which the data will be retrieved. The JOIN clause can be used to combine data from multiple tables.
• 𝐖𝐇𝐄𝐑𝐄: The WHERE clause is used to filter the data based on specific conditions.
• 𝐆𝐑𝐎𝐔𝐏 𝐁𝐘: The GROUP BY clause is used to group the data based on one or more columns.
• 𝐇𝐀𝐕𝐈𝐍𝐆: The HAVING clause is used to filter the grouped data based on specific conditions.
• 𝐒𝐄𝐋𝐄𝐂𝐓: The SELECT clause specifies the columns that will be returned in the result set.
• 𝐃𝐈𝐒𝐓𝐈𝐍𝐂𝐓: The DISTINCT keyword can be used to ensure that only distinct rows are returned in the result set.
• 𝐎𝐑𝐃𝐄𝐑 𝐁𝐘: The ORDER BY clause is used to sort the result set in ascending or descending order.
• 𝐋𝐈𝐌𝐈𝐓: The LIMIT clause can be used to restrict the number of rows returned.
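An annotated example (the customers table and its columns are hypothetical); the comments show the logical order in which the clauses are evaluated, which differs from the order in which they are written:

SELECT   country, COUNT(*) AS customer_count   -- 5. SELECT (then DISTINCT, if present)
FROM     customers                             -- 1. FROM / JOIN
WHERE    is_active = 1                         -- 2. WHERE
GROUP BY country                               -- 3. GROUP BY
HAVING   COUNT(*) > 10                         -- 4. HAVING
ORDER BY customer_count DESC                   -- 6. ORDER BY
LIMIT    5;                                    -- 7. LIMIT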


To learn more, please follow us -
🔊 http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at -
🔊 http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -
🔊 https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -
🔊 https://twitter.com/macxima

Wednesday, February 14, 2024

SQL - Questions for some job interviews

As you are aware, the majority of competent interviewers will be more interested in your approach and problem-solving skills than in meticulously examining your syntax. 

Instead of obsessing over understanding every scenario that may occur, concentrate more on the resources in your toolkit. 

In a recent interview, we examined a schema and answered a few questions about joins, filtering, aggregations (including running totals), CTEs, and rank/window functions.



The most common things you will need in a SQL interview:
  • Simple things like SELECT DISTINCT, COUNT, COUNT DISTINCT, MIN/MAX, SUM.
  • Window functions like LAG, OVER, PARTITION BY.
  • Joins, including putting a filter in the join condition to narrow to one result instead of using WHERE.
  • GROUP BY, ORDER BY.

1. Can you explain the difference between where and having?

Answer - In terms of the order in which operations are executed, "WHERE" is used to filter individual rows from the table before aggregation, while "HAVING" is used in conjunction with aggregate functions like "SUM" or "AVG" to filter the aggregated results.

Example - Fruits Data Table

OrderDate       Fruit     Price
11 Feb. 2024    Apple     $10.00
12 Feb. 2024    Banana    $5.00
13 Feb. 2024    Banana    $7.50
14 Feb. 2024    Mango     $12.00

SQL Query -

SELECT Fruit, SUM(Price) AS Total
FROM Fruits
WHERE Fruit = 'Banana'
GROUP BY Fruit
HAVING SUM(Price) > 10.00;

Google Cloud Platform - How to Create Cloud Functions with Pub/Sub Trigger

In this article, you will learn "How to Create Cloud Functions with Pub/Sub Trigger based events" whenever a new Pub/Sub message is sent to a specific topic in Google Cloud Platform.

The Cloud Function is triggered when a message is published to a Pub/Sub topic, so the event type is google.pubsub.topic.publish.




What is Cloud Pub/Sub?
Google Cloud Pub/Sub is a fully managed, scalable, global, and secure messaging service that allows you to send and receive messages among applications and services. You can use Cloud Pub/Sub to integrate decoupled systems and components hosted on Google Cloud Platform or elsewhere on the Internet.

What is Cloud Functions?
Cloud Functions is a serverless computing service offered by Google Cloud Platform (GCP) and an easy way to run your code in the cloud. It supports Java, Python, Ruby, Node.js, Go, PHP, and .NET. Currently, Google Cloud Functions supports events from the following providers: HTTP, Cloud Storage, Cloud Firestore, Pub/Sub, Firebase, and Stackdriver. Gen1 is more lightweight: one concurrent request per instance, simpler features and fewer knobs to tweak, cheaper, and pretty much deploy-and-forget (it actually runs on the App Engine standard environment). Gen2 runs on Cloud Run (on GKE), giving you more control: up to 1,000 concurrent requests per instance, larger resources, longer timeouts, and so on. If you don't need that, just use Gen1.
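A minimal hedged Python sketch of a 1st-gen, Pub/Sub-triggered Cloud Function (the function name is a placeholder); the message payload arrives base64-encoded in event['data']:

import base64

def process_pubsub_message(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    # The payload is base64-encoded; attributes (if any) are in event['attributes']
    if "data" in event:
        message = base64.b64decode(event["data"]).decode("utf-8")
    else:
        message = "(no data)"

    print(f"Received Pub/Sub message: {message}")

It would typically be deployed with something like gcloud functions deploy process_pubsub_message --runtime python310 --trigger-topic YOUR_TOPIC, where YOUR_TOPIC is a placeholder topic name.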

For the actual demo, please visit us at our YouTube channel at -

To learn more, please follow us -
🔊 http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at -
🔊 http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -
🔊 https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -
🔊 https://twitter.com/macxima

Monday, February 12, 2024

Cloud Function - Load data into Big Query tables against GCS events

In this article, you will learn "How to Fire Cloud Functions on GCS object events to pull the CSV file into BigQuery Data Table" in Google Cloud Platform. Cloud Functions are a serverless computing service offered by Google Cloud Platform (GCP) which are an easy way to run your code in the cloud.


It supports Java, Python, Ruby, Node.js, Go, and .NET. Currently, Google Cloud Functions supports events from the following providers: HTTP, Cloud Storage, Cloud Firestore, Pub/Sub, Firebase, and Stackdriver. Gen1 is more lightweight: one concurrent request per instance, simpler features and fewer knobs to tweak, cheaper, and pretty much deploy-and-forget (it actually runs on the App Engine standard environment). Gen2 runs on Cloud Run (on GKE), giving you more control: up to 1,000 concurrent requests per instance, larger resources, longer timeouts, and so on. If you don't need that, just use Gen1. To complete the tasks outlined above, you must have a GCP account and appropriate access.

To accomplish this task, you can use Google Cloud Functions to trigger on Google Cloud Storage (GCS) object events and then pull the CSV file into a BigQuery data table. Here's a general outline of how you can do this (a hedged Python sketch follows the list):

  • Set up Google Cloud Functions: Create a Cloud Function that triggers on GCS object events. You can specify the event types (e.g., google.storage.object.finalize) to trigger the function when a new file is uploaded to a specific bucket.
  • Configure permissions: Ensure that your Cloud Function has the necessary permissions to access both GCS and BigQuery. You'll likely need to grant the Cloud Function service account permissions to read from GCS and write to BigQuery.
  • Write the Cloud Function code: Write the Cloud Function code to handle the GCS object event trigger. When a new CSV file is uploaded to GCS, the function should read the file, parse its content, and then insert the data into a BigQuery table.
  • Create a BigQuery table: Before inserting data into BigQuery, make sure you have a table created with the appropriate schema to match the CSV file structure.
  • Insert data into BigQuery: Use the BigQuery client library within your Cloud Function code to insert the data parsed from the CSV file into the BigQuery table.
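As noted above, here is a hedged Python sketch of such a function (the project, dataset, table, and bucket details are placeholders); it lets BigQuery load the CSV directly from GCS rather than parsing the file inside the function:

from google.cloud import bigquery

# Placeholder destination table: project.dataset.table
TABLE_ID = "your-project.your_dataset.your_table"

def load_csv_to_bigquery(event, context):
    """Triggered by google.storage.object.finalize when a file lands in the bucket."""
    bucket = event["bucket"]
    name = event["name"]

    # Only process CSV files
    if not name.lower().endswith(".csv"):
        print(f"Skipping non-CSV object: {name}")
        return

    uri = f"gs://{bucket}/{name}"
    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes the CSV has a header row
        autodetect=True,       # or supply an explicit schema matching your table
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for the load job to finish
    print(f"Loaded {uri} into {TABLE_ID}")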
For the actual demo, please visit us at our YouTube channel at -


To learn more, please follow us -
🔊 http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at -
🔊 http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -
🔊 https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -
🔊 https://twitter.com/macxima