Source for the official AWS documentation: https://aws.amazon.com/lambda/

In this video I will use the "Author from scratch" option to demonstrate how AWS Lambda works by passing a string argument to the function and returning a specified output based on the input value. It is essentially a Hello World example, but if you are learning AWS, it is a good place to start. Finally, you will learn how to test your Lambda function by simulating various scenarios with different input parameters. Enjoy!

🚀AWS Lambda is a serverless computing service offered by Amazon Web Services. It enables you to run code without setting up or managing servers, letting you focus on application logic rather than infrastructure.

🔍Serverless Computing: AWS Lambda uses the serverless computing model, which means you pay only for the compute time you use; there are no charges while your code is not running. This makes it extremely cost-effective, particularly for applications with irregular or unpredictable workloads.

🔍Event-Driven Architecture: Lambda functions are triggered by events such as data changes in Amazon S3 buckets, Amazon DynamoDB table updates, HTTP requests through Amazon API Gateway, or custom events from other AWS or third-party services. This event-driven architecture enables you to create responsive, scalable apps.

🔍Support for Multiple Programming Languages: Lambda supports several programming languages, including Node.js, Python, Java, Go, Ruby, and .NET Core. You can write your Lambda functions in the language of your choice, making it flexible for developers with different skill sets.

🔍Auto Scaling: AWS Lambda automatically scales your functions based on incoming traffic. It can handle thousands of requests per second without any manual scaling configuration. Lambda scales resources transparently, ensuring that your functions remain highly available and responsive.

🔍Integration with the AWS Ecosystem: Lambda connects seamlessly with other AWS services, allowing you to construct sophisticated, efficient workflows. For example, you can design serverless applications that process data from Amazon S3, send notifications via Amazon SNS, and store results in Amazon DynamoDB, all without maintaining servers or infrastructure.

🔍Customization and Control: While Lambda abstracts away server management, it still allows you to customize your runtime environment, define memory and timeout settings, and configure environment variables. This lets you fine-tune your functions to satisfy specific needs.

🔍Pay Per Use: You pay only for the compute time you consume; there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service, all with zero administration.

🔍Lambda Responds to Events: Once you create Lambda functions, you can configure them to respond to events from a variety of sources. Try sending a mobile notification, streaming data to Lambda, or placing a photo in an S3 bucket.

🔍AWS Lambda streamlines the process of developing and deploying applications by automating infrastructure management, allowing developers to concentrate on writing code and delivering business value.

⭐To learn more, please follow us - http://www.sql-datatools.com
⭐To learn more, please visit our YouTube channel at - http://www.youtube.com/c/Sql-datatools
⭐To learn more, please visit our Instagram account at - https://www.instagram.com/asp.mukesh/
⭐To learn more, please visit our Twitter account at - https://twitter.com/macxima
⭐To learn more, please visit our Medium account at -
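The "Author from scratch" demo described above can be sketched as a minimal Python handler. This is an illustrative sketch, not the exact code from the video: the `"input"` event key and the response messages are assumed names for demonstration.

```python
# A minimal "Author from scratch"-style handler. The test event is assumed
# to carry the string argument under a hypothetical "input" key.

def lambda_handler(event, context):
    # Read the string argument from the incoming event
    value = event.get("input", "")

    # Return a different output depending on the input value
    if value == "hello":
        message = "Hello from Lambda!"
    elif value:
        message = f"Received unexpected input: {value}"
    else:
        message = "No input provided."

    # Return a status code and body, as an API Gateway integration would expect
    return {"statusCode": 200, "body": message}
```

In the Lambda console's Test tab you would simulate the scenarios by editing the test event JSON (e.g. `{"input": "hello"}`) and re-running the test.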
Easy way to learn and implement the Microsoft technologies.
Monday, June 10, 2024
RedShift — How to Import CSV/JSON Files into RedShift Serverless
Monday, November 6, 2023
Data Engineering — Best ETL Solution
Data engineering teams often struggle with standards and governance, and it is not easy to align a large organization around a single set of governing standards. You must choose the technology stack and tools that are appropriate for you, your company, and your requirements. If you are searching for an enterprise ETL solution, the following are some additional considerations:
- Market availability of skill set
- No code or low code
- Ease of monitoring your pipelines
- Licensing model (e.g., priced by the number of cores, memory, and so on)
Note: In your tooling, split data ingestion, data transformation, and data storage, and evaluate those three parts separately.
For example, you can build a fully open-source data platform with:
1. Airbyte or Airflow for data ingestion,
2. dbt or Dataform for transformation, and
3. a combination of Postgres, MinIO, and ClickHouse for storage.
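The ingestion/transformation/storage split above can be sketched with plain Python functions, where each stage stands in for a real tool. This is illustrative only; the record shapes and function names are invented for the example.

```python
# Illustrative only: each stage stands in for a real tool
# (Airbyte/Airflow for ingestion, dbt for transformation,
#  Postgres/MinIO/ClickHouse for storage).

def ingest():
    # Ingestion: pull raw records from a source system, untyped
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.0"}]

def transform(raw_records):
    # Transformation: cast types and clean fields (a dbt model's job)
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in raw_records
    ]

def store(records, warehouse):
    # Storage: append to the warehouse (here just an in-memory list)
    warehouse.extend(records)
    return warehouse

warehouse = []
store(transform(ingest()), warehouse)
```

Keeping the three stages this loosely coupled is what lets you swap any one tool without touching the other two.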
Tuesday, October 3, 2023
Data Engineering — How to Scale Data Pipelines
If you work as a data engineer, data analyst, or data scientist in an organization that has put you on such a project, and you are using a fairly standard ELT architecture to extract data from several sources into on-premises or cloud-based systems, this is a good fit.

Data Curiosity: Before you begin building your data pipeline, data curiosity is essential in a company that values data. It is a constantly evolving part of data culture that pushes you to seek out new and existing data, challenge it, and use it to make more accurate decisions about data patterns within source systems, such as:
- How much data is in the database?
- How much data is behind the API?
- Are queries to the API deterministic?
- Do they have cases of combinatorial explosion, or is it fairly straightforward?
You can frame this data curiosity by assuming that the data in the database consists of customer-level aggregates at multiple dimensions, which are already quite large in Snowflake, on-premises, or cloud-based databases and will grow linearly with customer growth. The API access consists of both point and range queries; paginated responses are required for range queries. Moving this data to an RDBMS at regular intervals is an option, but it adds complexity in terms of load frequency, database pressure, yet another layer to reconcile, and so on.
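Paginated range queries like the ones described above are typically consumed with a loop that advances an offset until the API returns a short page. A minimal sketch, where `fetch_page` is a hypothetical stand-in for the real API call:

```python
def fetch_page(start, end, offset, limit):
    # Hypothetical stand-in for a range-query API call that returns
    # at most `limit` records starting at `offset` within [start, end).
    data = list(range(start, end))
    return data[offset : offset + limit]

def fetch_range(start, end, page_size=100):
    # Walk the paginated range query; a short (or empty) page
    # signals that the range is exhausted.
    offset = 0
    while True:
        page = fetch_page(start, end, offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size

records = list(fetch_range(0, 250, page_size=100))
```

A generator keeps memory flat regardless of range size, which matters once the customer-level aggregates grow linearly with customer count.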
To read the full story, please see my Medium article here.
Saturday, November 19, 2022
Data Engineering — Scala or Python
If you are a Data Engineer, you will most likely need to know Python anyway. It really depends on what you want to do within data engineering and where you want to work. I agree that SQL and Python are the most important for starting out and give you access to far more opportunities than Scala. The Scala market is super niche and dominated by Spark, which is pretty unpleasant to work with.
Spark runs at roughly the same pace in Scala and Python (except for UDFs), so the performance difference is largely meaningless.
You must keep in mind that the two are vastly different in terms of learning. Python is incredibly simple; instead of learning it, you basically just pick it up. Scala, on the other hand, is a "Scalable Language" with depths worth exploring that will keep you on your toes for years. Then again, if you only learn it to write Spark code, there is not much to learn apart from the Spark DSL.
Practically, Python is a lingua franca and one of the fastest-growing programming languages. Whether it's data manipulation with Pandas, creating visualizations with Seaborn, or deep learning with TensorFlow, Python seems to have a tool for everything. I have never met a data engineer who doesn't know Python.
Apache Beam is a data processing framework that's gaining popularity because it can handle both streaming and batch processing and runs on Spark.
Scala is the superior language; it can do everything Python does and provides type checking at compile time, but it's not used nearly as much as Python and Java.
Scala is built on the JVM and should be relatively easy to get started with, so Scala might be a bit more comfortable for a Java developer within the Spark workflow, but only just a bit.
As you may know, Scala isn't used everywhere. You should also know that in Apache Beam, the language choices are Java, Python, Go, and Scala. So even if you "only" know Java, you can get started with data engineering through Apache Beam.
Some of the technical differences between Python and Scala:
1. Scala is statically typed; Python is dynamically typed.
2. Scala is expression-oriented; Python has both expressions and statements.
3. Partly as a consequence of (2), lambdas in Python are limited to a single expression.
4. Python's OO-based metaprogramming allows only one metaclass per class (I ran into this the one time I used Python professionally).
5. Python has FP pretensions, and the itertools module is nice, but it's full of corner cases and hard to use consistently across the whole range of modules you probably want to use.
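Point 3 above is easy to see in a couple of lines. This sketch shows that a Python lambda may hold only a single expression, so anything involving statements must fall back to a full `def`:

```python
# A Python lambda may contain only a single expression...
square = lambda x: x * x

# ...so logic involving statements (assignment, if-blocks, loops,
# try/except) must be written as a full def instead:
def clipped_square(x):
    result = x * x
    if result > 100:   # a statement block, not allowed inside a lambda
        result = 100
    return result

values = [clipped_square(x) for x in (2, 5, 20)]
```

In Scala, by contrast, any block is an expression, so the equivalent anonymous function could contain the clipping logic directly.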
Our recommendations and suggestions, depending on your requirements or business needs:
1. If you have time and want to improve your software engineering skill set, choose Scala, but go beyond the Spark DSL. Scala is a statically typed language: the compiler knows the type of each variable and expression at compile time.
2. If you just want another tool in your data engineering tool belt, choose Python. Python is dynamically typed: variables are interpreted at runtime and don't require predefined type declarations.
3. Python is an excellent choice if you want to move into other fields such as machine learning or web applications, because it is relatively simple to master even with no prior coding experience.
4. Scala, on the other hand, is a natural next step and may serve as an entry point to more complex languages if you wish to improve your coding skills.
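The static-vs-dynamic typing distinction in points 1 and 2 can be made concrete with a short sketch: in Python, a type mismatch is accepted until the offending line actually executes, whereas Scala's compiler would reject the equivalent call before the program ever runs.

```python
def add(a, b):
    return a + b

# Fine: both arguments are ints
ok = add(2, 3)

# Also accepted by the interpreter until it actually runs,
# then it fails with a TypeError at runtime. Scala's compiler
# would have rejected the equivalent call at compile time.
try:
    add(2, "3")
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
```

This is exactly the "type-related runtime issues" trade-off discussed later in this post.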
It is strongly suggested to go the Python route, because you can use Python for other use cases beyond Databricks in the future. In plain terms, Python is like learning English: you'll find it in most places in the world, whereas Scala is more like learning German.
It also depends on your situation: if you are a beginner, Python is easy to learn, and you can easily find learning materials on the internet.
1. Python is the fastest-growing language, with one of the biggest communities.
2. Python can easily connect with almost any technology to pull or push data via various APIs.
3. Python can fit almost every requirement and will make your life easier in your career path if you are in a DE, DA, or DS role.
4. Python can run in almost every environment after installing a few supporting libraries or packages.
In my job, I have used it to bring in data from sources such as Salesforce, Salesforce Marketing Cloud, SharePoint, cloud platforms (Azure, AWS, GCP), databases (SQL Server, MySQL, Postgres, ClickHouse, Oracle, Teradata, etc.), Amazon Marketplace, and various social media platforms, and to scrape data from websites.
If you have the time, you might also start with pure Scala to study functional programming, particularly immutability and lazy evaluation, as well as the fundamentals of Spark. Of course, Python is required for job opportunities, but if you are familiar with Scala Spark, the transition to PySpark should be rather simple.
The most significant Python disadvantages, which are Scala advantages, are:
· The type system: Python is fine if you can remember all the types, but it becomes extremely difficult to iterate and refactor on a big project without encountering type-related runtime issues.
· Parallelism: Python threads are only parallelized in the rare circumstances where the GIL can be avoided. Processes are parallelized, but the amount of memory that can be shared or serialized among processes is limited. Async/await is fantastic, but only if there is no local processing. Scala has well-established concurrency primitives that completely outperform Python's.
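The async/await caveat above is worth illustrating: it shines when tasks spend their time waiting on I/O rather than doing local processing. A minimal sketch, using `asyncio.sleep` to simulate I/O waits:

```python
import asyncio

async def fetch(name, delay):
    # Simulate an I/O wait (network call, DB query, ...); while one task
    # is awaiting, the event loop is free to run the other.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Both waits overlap instead of running back to back. Replace the
    # sleeps with CPU-bound work and this concurrency disappears, because
    # local processing blocks the single-threaded event loop.
    return await asyncio.gather(fetch("a", 0.05), fetch("b", 0.05))

results = asyncio.run(main())
```

So for CPU-bound pipeline steps, neither threads (GIL) nor async/await help; only processes do, with the serialization limits noted above.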
If you have any experience with C# or Java, then you can also choose Scala.

Furthermore, Python is more popular than Scala, even in data engineering, where Scala excels. When you use the majority language, you don't notice the others; when you use a more niche language, seeing and hearing about the mainstream language everywhere can be bothersome.