5 Common Amazon Kinesis Issues

Taavi Rehemägi - May 26 '22 - Dev Community

Amazon Kinesis is the real-time stream processing service of AWS. Whether you have video, audio, or IoT streaming data to handle, Kinesis is the way to go.

Kinesis is a serverless managed service that integrates nicely with other services like Lambda or S3. Often, you will use it when SQS or SNS is too low-level.  

But as with all the other services on AWS, Kinesis is a professional tool that comes with its share of complications. This article will discuss the most common issues and explain how to fix them. So, let's get going!

This article was written by Kay Plößer and originally posted on the Dashbird blog.

1. What Limits Apply when AWS Lambda is Subscribed to a Kinesis Stream?

If your Kinesis stream only has one shard, the Lambda function won't be called in parallel by default, even if multiple records are waiting in the stream. To scale up to numerous parallel invocations, you need to add more shards to the Kinesis stream.

Kinesis will strictly serialize all your invocations per shard. This is a nice feature for controlling your parallel Lambda invocations. But it can slow down overall processing if the function takes too long to execute.

If you aren't relying on previous events, you can use more shards, and Lambda will automatically scale up to more concurrent invocations. But keep in mind that Lambda itself has a soft limit of 1,000 concurrent executions, which is shared by all functions in your account. You can reach out to AWS to get this limit lifted. There isn't an explicitly defined hard limit above that, but AWS mentions that it can be raised to tens of thousands.
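To make the arithmetic concrete, here is a minimal sketch of the upper bound described above. It assumes the default behavior of one concurrent invocation per shard and the default account limit of 1,000; the function name and parameters are made up for illustration.

```python
def max_concurrent_invocations(shard_count, account_limit=1000):
    """Rough upper bound on parallel Lambda invocations for one stream.

    Assumes one concurrent invocation per shard (the default) and that
    no other function in the account is using up the concurrency limit.
    """
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return min(shard_count, account_limit)


print(max_concurrent_invocations(1))     # 1
print(max_concurrent_invocations(8))     # 8
print(max_concurrent_invocations(2000))  # 1000
```

In practice the real number is often lower, because other functions in the same account draw from the same concurrency pool.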

2. Data Loss with Kinesis Streams and Lambda

If you call put_record in a loop to publish records from a Lambda function to a Kinesis stream, this can fail mid-loop. To fix this, make sure you catch any errors the put_record method throws; otherwise, your function will crash and only partially publish the list of records.

If one Lambda invocation is responsible for publishing multiple records to a Kinesis stream, you have to make sure a crash of the Lambda function doesn't lose data. Depending on your use case, this could mean you need to use retries or another queue in front of your Lambda function. 

You can also try to catch any errors instead of crashing and then put the missing records somewhere else to ensure they don't get lost.
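One way to approach this is sketched below. Instead of calling put_record in a loop, it uses the batch call put_records, retries only the records that failed, and returns whatever is still unpublished so you can store it somewhere else. The function name publish_all, the retry counts, and the broad except clause are assumptions for illustration; `client` is expected to behave like boto3.client("kinesis").

```python
import time


def publish_all(client, stream_name, records, max_attempts=3,
                backoff_seconds=0.5, sleep=time.sleep):
    """Publish records in batches and return the ones that still failed.

    Each record is a dict with "Data" (bytes) and "PartitionKey" (str).
    """
    unpublished = []
    # PutRecords accepts at most 500 records per request.
    for start in range(0, len(records), 500):
        pending = records[start:start + 500]
        for attempt in range(max_attempts):
            try:
                response = client.put_records(StreamName=stream_name,
                                              Records=pending)
            except Exception:
                # The whole request failed (throttling, network, ...).
                # Keep the batch as-is and retry. Real code would likely
                # catch botocore.exceptions.ClientError more narrowly.
                pass
            else:
                # Results come back in request order; failed entries
                # carry an "ErrorCode" key.
                pending = [rec for rec, res in zip(pending, response["Records"])
                           if "ErrorCode" in res]
            if not pending:
                break
            if attempt < max_attempts - 1:
                sleep(backoff_seconds * 2 ** attempt)  # exponential backoff
        unpublished.extend(pending)
    return unpublished


# Hypothetical usage inside a Lambda handler:
# import boto3
# leftover = publish_all(boto3.client("kinesis"), "my-stream", records)
# if leftover:
#     ...  # e.g. write them to another queue or an S3 bucket
```

The important design choice is that the function never raises halfway through the list. The caller always gets back the exact records that didn't make it and can decide what to do with them.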

3. InvokeAccessDenied Error When Pushing Records from Firehose to Lambda

You're trying to push a record from Kinesis Firehose to a Lambda function but get an error. This is usually a permission issue with IAM roles. To fix this, make sure to assign the correct IAM role to your Firehose delivery stream.

In the Resource section of your policy document, you need to make sure all your Lambda functions' ARNs are listed. You achieve this with either a wildcard in the ARN or an array of ARNs. 

But there can be many other permission problems that prevent invocation. Some of them are:

  • Missing the "Action": ["lambda:InvokeFunction"]
  • Having an "Effect": "Deny" somewhere
  • Assigning the wrong role to the firehose
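As an illustration, the following sketch builds a policy document that lists a function ARN explicitly and with a version wildcard, and then runs a very rough check for the problems listed above. The ARN, the account ID, and the helper invoke_problems are hypothetical, and the check is a simplification that does not reproduce IAM's real evaluation logic.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical function ARN, replace with your own.
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:transform-records"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["lambda:InvokeFunction", "lambda:GetFunctionConfiguration"],
            # The function itself plus any version or alias of it.
            "Resource": [FUNCTION_ARN, FUNCTION_ARN + ":*"],
        }
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def invoke_problems(policy, function_arn):
    """Return a list of likely reasons why invoking function_arn would fail."""
    problems = []
    allowed = False
    for statement in _as_list(policy["Statement"]):
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        covers_invoke = any(fnmatchcase("lambda:InvokeFunction", a) for a in actions)
        covers_arn = any(fnmatchcase(function_arn, r) for r in resources)
        if not (covers_invoke and covers_arn):
            continue
        if statement.get("Effect") == "Deny":
            problems.append("an explicit Deny covers lambda:InvokeFunction")
        elif statement.get("Effect") == "Allow":
            allowed = True
    if not allowed:
        problems.append("no Allow statement grants lambda:InvokeFunction on this function")
    return problems


print(json.dumps(policy, indent=2))
print(invoke_problems(policy, FUNCTION_ARN))  # []
```

A check like this won't catch a wrong role being attached to the delivery stream, so it's still worth confirming in the console or CLI which role Firehose actually uses.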

4. Error When Trying to Update the Shard Count

You tried to update the shard count too often in a given period. The UpdateShardCount method has rather tight limits. To get around this issue, you can call other functions like SplitShard and MergeShards, with more generous quotas.

Often, you don't know how many shards are sufficient to handle your load, so you have to update their number over time. AWS limits how much you can meddle with the shard count. To quote the docs here, you can't:

  • Scale more than ten times per rolling 24-hour period per stream
  • Scale up to more than double your current shard count for a stream
  • Scale down below half your current shard count for a stream
  • Scale up to more than 10000 shards in a stream
  • Scale a stream with more than 10000 shards down unless the result is less than 10000 shards
  • Scale up to more than the shard limit for your account

If you use other methods, such as SplitShard and MergeShards, you can get around some of these limitations, which gives you more flexibility around sharding.
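The doubling and halving rules mean that a big change in shard count has to happen in several steps. The sketch below plans those steps using the limits quoted above. The function name plan_shard_updates is made up for illustration, and the numbers are the ones from this article, so check the current quotas before relying on them.

```python
import math


def plan_shard_updates(current, target, max_updates=10):
    """List the shard counts to request, one UpdateShardCount call each.

    Each step at most doubles the count or reduces it to no less than
    half. Raises ValueError if the plan needs more than max_updates
    calls, the rolling 24-hour limit quoted in the article.
    """
    if current < 1 or target < 1:
        raise ValueError("shard counts must be at least 1")
    steps = []
    while current != target:
        if target > current:
            current = min(target, current * 2)
        else:
            current = max(target, math.ceil(current / 2))
        steps.append(current)
    if len(steps) > max_updates:
        raise ValueError(
            f"{len(steps)} updates needed, but only {max_updates} are allowed per 24 hours"
        )
    return steps


print(plan_shard_updates(4, 20))  # [8, 16, 20]
print(plan_shard_updates(10, 2))  # [5, 3, 2]
```

Each planned count would then be passed to something like client.update_shard_count(StreamName=..., TargetShardCount=count, ScalingType="UNIFORM_SCALING"), waiting for the stream to become active again between calls. If the plan raises, that's a hint to spread the change over more than one day or to look at SplitShard and MergeShards instead.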

5. Shard is Not Closed

You interacted with a stream too soon after creating it. Creating a new stream can take up to 10 minutes to complete. To fix this, you can wait with a timeout after creating a stream or retry your call a few times.

Creating new streams or shards isn't an instant action. It usually happens quickly, but you might have to wait for minutes in the worst case. As with any distributed system, you have to keep latencies in mind. Otherwise, your logs will be littered with errors.
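A simple way to handle this is to poll the stream status until it reports ACTIVE before doing anything else with it. The following is a minimal sketch; the function name wait_until_active, the poll interval, and the 10-minute default timeout are assumptions, and `client` is expected to behave like boto3.client("kinesis").

```python
import time


def wait_until_active(client, stream_name, timeout_seconds=600, poll_seconds=10,
                      sleep=time.sleep, clock=time.monotonic):
    """Block until the stream is ACTIVE or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    while True:
        summary = client.describe_stream_summary(StreamName=stream_name)
        status = summary["StreamDescriptionSummary"]["StreamStatus"]
        if status == "ACTIVE":
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"{stream_name} is still {status} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)


# Hypothetical usage:
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.create_stream(StreamName="my-stream", ShardCount=1)
# wait_until_active(kinesis, "my-stream")
```

boto3 also ships a built-in waiter for this, client.get_waiter("stream_exists"), which may be enough if you don't need control over the polling yourself.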

Summary 

If you have to process your data or media in real-time, it's best to go for Kinesis on AWS.

Sadly, it's not as straightforward as SQS and SNS, but it's also more flexible than those services.

Your best course of action is to learn about the limitations of the service so your logs aren't littered with avoidable error messages. Also, make sure to program your Lambda functions robustly so they don't crash with half your data not processed yet.

Monitoring Kinesis with Dashbird

Dashbird will monitor all your Kinesis streams out of the box. Additionally, Dashbird will evaluate all your Kinesis logs according to the Well-Architected Framework. So, it's not just metrics and errors en masse, but actionable information to improve your architecture with AWS best practices. 

Try Dashbird now for free, or check out our product tour!

At Dashbird, we understand that serverless's core idea and value is to focus on the customer and the ability to avoid heavy lifting. That's precisely what we provide. Finally, we allow developers to think about the end-user again and not be distracted by debugging and alarm management or worry about whether something is working.

