Monday, October 30, 2023

AWS Lambda Is a Bad Idea for Running Memory-Intensive Computations

In today's data-driven world, businesses of all sizes must be able to properly acquire, analyse, and interpret data in order to make intelligent decisions and stay ahead of the curve. 

Digital transformation refers to the use of digital technology to improve company operations and create new value for customers. The importance of data in digital transformation cannot be overstated. Businesses must be able to collect and evaluate data in order to identify trends, opportunities, and areas for improvement.

AWS Lambda is a serverless computing service provided by Amazon Web Services (AWS). It allows you to run code without provisioning or managing servers, automatically scaling based on the number of requests or events. 

Key features of AWS Lambda:

  1. Serverless Computing: AWS Lambda follows the serverless computing paradigm, where you only pay for the compute time that you consume, without the need to manage server infrastructure.
  2. Event-Driven Execution: Functions in AWS Lambda are triggered by events. Events can include HTTP requests via API Gateway, changes in Amazon S3, updates in Amazon DynamoDB, messages from Amazon Simple Notification Service (SNS), etc.
  3. Supported Runtimes: AWS Lambda supports multiple programming languages and runtimes, including Node.js, Python, Java, Go, Ruby, .NET (C#), and custom runtimes.
  4. Stateless Execution: Functions in AWS Lambda are designed to be stateless. They don't retain information between invocations. If you need to maintain state, you can use other AWS services like Amazon RDS or DynamoDB.
  5. Automatic Scaling: AWS Lambda automatically scales based on the number of incoming requests or events. It can handle varying workloads without manual intervention.
  6. Memory and Execution Time Limits: Lambda functions have memory and execution time limits. The amount of memory allocated to a function ranges from 128 MB to 10,240 MB (10 GB), and the maximum execution time per invocation is 15 minutes.
  7. Event Sources: Lambda functions can be triggered by various event sources, including HTTP requests, file uploads to Amazon S3, changes in DynamoDB tables, updates in Amazon Kinesis, and more.
  8. Integration with AWS Services: Lambda integrates seamlessly with other AWS services, allowing you to build fully serverless applications. It can be used in conjunction with services like Amazon S3, Amazon DynamoDB, Amazon API Gateway, AWS Step Functions, and more.
  9. Cold Starts: AWS Lambda has a concept known as "cold starts," where the first invocation of a function after a period of inactivity may experience additional latency. This is something to consider for time-sensitive workloads.
  10. Environment Variables and Configuration: You can configure environment variables for Lambda functions, allowing you to parameterize your code and adjust behavior without changing code.
AWS Lambda is widely used for various use cases, including real-time file processing, data transformations, backend services for mobile and web applications, and more. Its serverless nature allows developers to focus on writing code and building features without managing the underlying infrastructure.
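
To make the event-driven model above concrete, here is a minimal sketch of a Python handler for an S3 upload notification. The `OUTPUT_PREFIX` environment variable and the returned summary are hypothetical and only illustrate how an event, configuration, and a stateless function fit together; a real function would do its processing where the comment indicates.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical environment variable, set in the function's configuration.
OUTPUT_PREFIX = os.environ.get("OUTPUT_PREFIX", "processed/")


def handler(event, context):
    """Entry point that Lambda calls once per S3 event notification."""
    items = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Real processing (resize, transform, etc.) would happen here.
        items.append({
            "source": f"s3://{bucket}/{key}",
            "target": f"s3://{bucket}/{OUTPUT_PREFIX}{key}",
        })
    # Nothing is kept in memory between invocations; the result is returned.
    return {"processed": len(items), "items": items}
```

The function holds no state of its own, so any number of copies can run in parallel, which is what lets Lambda scale it automatically.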

Benefits of using AWS Lambda, taking as an example a hypothetical company, XYZ Imagery, that processes images as soon as they are uploaded:
  • Scalability: Lambda scales automatically, handling spikes in image uploads without manual intervention.
  • Cost-Efficiency: XYZ Imagery only pays for the compute time consumed during image processing, avoiding the costs of maintaining idle servers.
  • Low Maintenance: With a serverless architecture, XYZ Imagery doesn't need to manage server infrastructure, operating systems, or application runtime environments.
  • Real-time Processing: Image processing occurs in near real time as soon as the image is uploaded, enhancing user experience.
  • Flexibility: The serverless architecture allows XYZ Imagery to focus on application logic and features without worrying about infrastructure management.
Although AWS Lambda has several advantages, it remains a poor choice for memory-intensive computations.

To identify the issues, consider the following case study:
You have a Lambda function that takes in big datasets and uses ML to scan for particular characteristics. Processing currently takes around 650 seconds and uses more than 8 GB of memory. You then write the result to a file in S3 and implement a separate API to feed the results to a bespoke front end.
Keep in mind that Lambda is relatively expensive per GB of memory and per vCPU-hour. Its advantage is that it starts quickly and handles bursty demand well. This is ultimately the trade-off you must strike.
  • If your workload is heavy and predictable enough, choosing another compute option will save you money.
  • If the workload is small and unpredictable, Lambda may be more cost-effective.
On paper, this case study falls into the former category, so moving it to EKS (or another more controllable, cheaper compute option) is probably the better solution.
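
A rough back-of-the-envelope comparison for the case study might look like the sketch below. The prices are illustrative assumptions (approximate on-demand x86 rates for one region) and the Fargate task size of 4 vCPUs with 8 GB is a hypothetical choice; check the current AWS pricing pages before relying on any of these numbers. The sketch also ignores request charges, container start-up time, and storage.

```python
# Illustrative on-demand prices in USD (assumptions, not authoritative).
LAMBDA_USD_PER_GB_SECOND = 0.0000166667
FARGATE_USD_PER_VCPU_HOUR = 0.04048
FARGATE_USD_PER_GB_HOUR = 0.004445


def lambda_cost(duration_s, memory_gb):
    """Compute charge for one Lambda invocation (request fee ignored)."""
    return duration_s * memory_gb * LAMBDA_USD_PER_GB_SECOND


def fargate_cost(duration_s, vcpus, memory_gb):
    """Compute charge for one Fargate task running for duration_s seconds."""
    hourly = vcpus * FARGATE_USD_PER_VCPU_HOUR + memory_gb * FARGATE_USD_PER_GB_HOUR
    return hourly * duration_s / 3600


if __name__ == "__main__":
    # Case study: roughly 650 seconds at 8 GB of memory.
    print(f"Lambda:  ${lambda_cost(650, 8):.4f} per run")
    print(f"Fargate: ${fargate_cost(650, 4, 8):.4f} per run")
```

Under these assumed prices a single run costs roughly $0.087 on Lambda and $0.036 on Fargate, so the gap only starts to matter when the job runs often and predictably.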

Lambda is best suited for tasks that are certain to finish within 15 minutes and must be completed in near real time. However, if you have a computationally intensive task that you don't need in real time or that might take more than 15 minutes to complete, AWS Batch is the way to go.
Remember that Lambda allocates CPU in proportion to memory, so at 8 GB you'll have around 4.5 vCPUs. Assuming your process is multithreaded, if you normally run it on a standard machine with, say, 8 vCPUs, you should expect it to run slower on Lambda.
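
The vCPU figure can be estimated with a small sketch. It assumes CPU scales linearly with memory, using the documented reference point of one vCPU at 1,769 MB and a ceiling of 6 vCPUs; the exact allocation is up to AWS, so treat the result as an approximation.

```python
MB_PER_VCPU = 1769  # AWS documents one vCPU-equivalent at 1,769 MB
MAX_VCPUS = 6       # documented ceiling, reached at the 10,240 MB maximum


def estimate_vcpus(memory_mb):
    """Approximate vCPUs for a memory setting, assuming linear scaling."""
    return min(memory_mb / MB_PER_VCPU, MAX_VCPUS)


if __name__ == "__main__":
    print(f"{estimate_vcpus(8192):.2f} vCPUs at 8 GB")
```

For 8,192 MB this gives about 4.6 vCPUs, which is well short of an 8-vCPU machine for a multithreaded workload.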

Solutions: Lambda is designed for ease of use, but it is pricey. That's fine if you can afford it. If you cannot, or you think it is too expensive, there are cheaper alternatives.
  1. For long-running workloads, use ECS on Fargate. The compute will be significantly cheaper.
  2. SageMaker Processing jobs are also worth considering.
  3. AWS Batch is underused yet excellent for a variety of workloads. People tend to avoid it because of its occasionally slow spin-up time.
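
As an illustration of the AWS Batch option, the sketch below submits the case-study job with boto3. The job name, queue, job definition, and the `DATASET_URI` environment variable are hypothetical placeholders that would have to exist in your account, and the 8 vCPU / 16 GiB sizing is just an example (Fargate-backed queues only accept certain vCPU and memory combinations).

```python
def build_batch_job_request(dataset_uri, vcpus=8, memory_mib=16384):
    """Build keyword arguments for the boto3 Batch client's submit_job call."""
    return {
        "jobName": "ml-scan",             # hypothetical job name
        "jobQueue": "ml-scan-queue",      # hypothetical, must already exist
        "jobDefinition": "ml-scan-job",   # hypothetical, must already exist
        "containerOverrides": {
            # Batch expects resource values as strings; memory is in MiB.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
            "environment": [{"name": "DATASET_URI", "value": dataset_uri}],
        },
    }


def submit(dataset_uri):
    """Submit the job and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("batch")
    response = client.submit_job(**build_batch_job_request(dataset_uri))
    return response["jobId"]
```

Unlike Lambda, the job here is not bound by the 15-minute limit, and CPU and memory are chosen independently.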
