From the course: Complete Guide to Serverless Web App Development on AWS
Understand Lambda scaling for synchronous invocations
- [Instructor] One question you might have now is about cold starts. You've probably heard about them and worry they might affect your web app. Let's look at what cold starts are and how Lambda behaves when handling synchronous invocations. Synchronous invocations are the ones coming from API Gateway, Application Load Balancer, or directly from SDK calls. In these cases, the caller waits for a response. This is different from asynchronous invocations, where the caller sends the request and moves on. So if your Lambda function is behind API Gateway, the client is waiting for a response, and you need it fast.

Before diving into how synchronous invocations scale, let's first understand the lifecycle of a Lambda execution environment. A Lambda function goes through three main stages. First, initialization, or the init phase. This is where the function is set up: the deployment package or container image is downloaded, extensions are initialized, the runtime starts, and the function's initialization code runs. This phase can be optimized by using smaller packages, faster runtimes, and avoiding heavy init logic. Then we have invocation. This is when the function is actually triggered, for example, by API Gateway, an S3 event, or something else. Lambda keeps the execution environment alive for a while, so future invocations can reuse the same environment. This avoids running the init phase again. When your function is reused like this, it's called a warm invocation. If the environment is not available and needs to be created from scratch, it's called a cold start. Cold starts take longer because they include both the init and invoke phases. Finally, we have the shutdown phase. Lambda eventually shuts down the environment when it's no longer needed. You cannot control when this happens, and it really shouldn't bother you.
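To make the init and invoke phases concrete, here is a minimal sketch of a Python handler. The function name, response shape, and variable names are just illustrative assumptions, not code from this course; the point is that anything at module scope runs only during a cold start, while warm invocations jump straight to the handler.

```python
import json
import time

# Everything at module scope runs during the init phase, once per execution
# environment (a cold start). Warm invocations reuse the environment and
# skip straight to the handler below.
ENVIRONMENT_CREATED_AT = time.time()
_is_cold_start = True  # becomes False after the first invocation in this environment


def handler(event, context):
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False

    # The invoke phase: only this body runs again on warm invocations.
    return {
        "statusCode": 200,
        "body": json.dumps({
            "coldStart": was_cold,
            "environmentAgeSeconds": round(time.time() - ENVIRONMENT_CREATED_AT, 2),
        }),
    }
```

If you put something like this behind API Gateway and call it a few times, you should see coldStart flip to false once the environment is warm.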
Now let's talk about how Lambda scales for synchronous calls. When a synchronous invocation hits a cold function, the environment may need to initialize first, and the caller has to wait for that. If the environment is warm, the response is much faster. That part is easy to understand. But to fully understand how Lambda scales, there are a few more concepts you need to know. Lambda can handle multiple concurrent invocations for the same function or for different ones. Each parallel request will spin up a new execution environment or reuse an existing one if it's available. There's also something called the account concurrency limit. By default, a new AWS account gets 1,000 concurrent executions. If you hit this limit, your function invocations will be throttled. You can request a higher limit through AWS Support. Now, here is the key point for synchronous scaling: a Lambda function can scale by 1,000 concurrent executions every 10 seconds, until the account concurrency limit is reached. Each function in your account scales independently, no matter how it's invoked.

So let's go through an example. Imagine you have a function that receives bursts of requests every 10 seconds. Your account concurrency limit is set to 7,000 concurrent executions. This limit is shared across all functions in the account. The scaling rate is 1,000 concurrent executions every 10 seconds per function. Here is how it looks. At 9:00 in the morning, the function is already running with 1,000 concurrent executions. At 9:00 and 10 seconds, a burst of 1,000 new requests arrives. The function handles them, scaling by another 1,000. Ten seconds later, another 1,000 requests arrive, no problem. Ten seconds later, 1,500 new requests arrive. Lambda handles the first 1,000, but 500 are throttled. By 9:01, the function is handling 4,500 requests at the same time. A burst of 3,000 more arrives. Lambda processes the first 1,000, and 2,000 get throttled. Ten seconds after that, another 2,000 requests arrive; Lambda can handle 1,000 more, and the rest get throttled. Ten seconds later, the function is at 6,500 concurrent executions. A new burst of 1,000 requests comes, but only 500 are processed. The other 500 are throttled because the account limit is set at 7,000 and it has been reached.

If you have more than one function, each one can scale at the same 1,000 per 10 seconds rate independently, but all functions share the same account concurrency limit. Once that limit is hit, new invocations across all functions will be throttled. Now you have a better idea of how your application scales with Lambda. Even in a serverless world, you need to be aware of limits and performance under load. Next, I will show you how to run load tests for your serverless application so you can be ready when traffic comes.
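As a quick sanity check on the arithmetic in that example, here is a minimal sketch in plain Python. It is not an AWS API, just a model of the two rules described above, and like the example it assumes the in-flight executions have not finished yet; the constants and function names are made up for this sketch.

```python
# Model of the two scaling rules from the example:
#  - a function can add at most 1,000 concurrent executions every 10 seconds
#  - total concurrency is capped by the account concurrency limit (7,000 here)
ACCOUNT_LIMIT = 7000   # shared by every function in the account
SCALING_RATE = 1000    # max new concurrent executions per function per 10 seconds


def apply_burst(running, burst):
    """Return (new_running, throttled) for one 10-second window of a single function."""
    headroom = ACCOUNT_LIMIT - running            # room left under the account limit
    accepted = min(burst, SCALING_RATE, headroom)  # limited by rate and by the account cap
    return running + accepted, burst - accepted


# Replay the example from 9:01, when the function is already at 4,500:
running = 4500
for burst in (3000, 2000, 1000):
    running, throttled = apply_burst(running, burst)
    print(f"burst {burst} -> running {running}, throttled {throttled}")
```

Running it prints 5,500, then 6,500, then 7,000 running executions, with 2,000, 1,000, and 500 requests throttled, matching the numbers in the example.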