Introduction:

Amazon EC2 instances are virtual servers for running applications on AWS infrastructure. They provide scalable computing capacity, allowing users to deploy and manage applications efficiently in the cloud. EC2 instances are widely used for hosting websites, running databases, and handling various computing workloads.

Managing EC2 instances manually can be a daunting task, especially when dealing with multiple instances and varying usage patterns. Automating this process not only saves time but also ensures that your resources are used efficiently, leading to significant cost savings. By leveraging AWS Lambda, EventBridge, and Terraform, you can create an automated solution that starts and stops your EC2 instances based on a schedule, ensuring optimal resource utilization and cost efficiency.

In this guide, we'll take you through the entire process of setting up this automation, from creating the EC2 instances to configuring the Lambda functions and EventBridge rules using Terraform. Let's dive in and unlock the potential of automated cloud resource management!

Architecture:

EC2: An EC2 instance is a virtual server used for running applications on AWS infrastructure.

Lambda: AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. It automatically scales applications by running code in response to events. Lambda is widely used for event-driven applications, real-time file processing, and backend services.

EventBridge: Amazon EventBridge is a serverless event bus service that makes it easy to connect applications using data from your own apps, SaaS apps, and AWS services. It simplifies event-driven architecture by routing events between services and allowing you to build scalable, event-driven workflows for various use cases such as application integration, automation, and observability.

IAM Role: An IAM (Identity and Access Management) role in AWS defines permissions for entities like AWS services or users, ensuring secure access to AWS resources without needing long-term credentials. Roles are used to delegate permissions across AWS services and are integral for managing security and access control within cloud environments.
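
To make the idea of scoped permissions concrete, here is a minimal sketch of a policy document that limits start and stop calls to instances carrying a specific tag. It is not the policy used in Step-1 of this guide (that one allows ec2:Start*, ec2:Stop*, and ec2:Describe* on all resources). The function name build_scheduler_policy and its parameters are hypothetical, chosen only for illustration, and the tag defaults assume the Environment=Dev tag that this guide applies to its instances.

```python
import json


def build_scheduler_policy(tag_key="Environment", tag_value="Dev"):
    """Build an IAM policy document limited to instances carrying one tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lambda needs these actions to write its own logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
            {
                # Describe actions do not support resource-level restrictions,
                # so this statement has to use "*".
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {
                # Start and stop are allowed only on instances whose tag matches.
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Print the document as JSON so it can be compared with a Terraform policy.
    print(json.dumps(build_scheduler_policy(), indent=2))
```

The same structure could be written inside a Terraform jsonencode() call. Whether the tighter scope is worth the extra statement depends on how many unrelated instances share the account.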

Pre-requisites:
Before we dive into the steps, let's ensure you have the following prerequisites in place:

AWS Account: If you don't have one, sign up for an AWS account.
Terraform Installed: Download and install Terraform from the official website.
AWS CLI Installed: Install the AWS CLI by following the official installation instructions.
AWS Credentials Configured: Configure your AWS CLI with your credentials by running aws configure.

Step-By-Step Procedure:

We'll walk you through the entire process of setting up this automation using Terraform. The steps include configuring the AWS provider, creating the EC2 instances, setting up IAM roles and policies, defining the Lambda functions, and creating the EventBridge rules.

Step-1: Create a main.tf file. This file defines the three EC2 instances, the IAM role and policy that let the Lambda functions access those instances, the two Lambda functions that start and stop the instances, and the EventBridge rules that trigger each function.



provider "aws" {
  region = "ap-south-1"
}

resource "aws_instance" "ec2" {
  count                  = var.instance_count
  ami                    = "ami-02a2af70a66af6dfb" # AMI IDs are region-specific; update for your region
  instance_type          = "t2.micro"              # Update with your desired instance type
  vpc_security_group_ids = [var.security_group_id]
  subnet_id              = var.subnet_id
  key_name               = var.key
  tags = merge(var.default_ec2_tags,
    {
      Name = "${var.name}-${count.index + 1}"
    }
  )
}

resource "aws_iam_role" "lambda_role" {
  name = "lambda_role"

  # Terraform's "jsonencode" function converts a
  # Terraform expression result to valid JSON syntax.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      },
    ]
  })

  tags = {
    tag-key = "tag-value"
  }
}

resource "aws_iam_policy" "lambda_policy_start_stop_instance" {
  name        = "lambda_policy_start_stop_instance"
  path        = "/"
  description = "Allows Lambda to write CloudWatch logs and to start, stop, and describe EC2 instances"

  # Terraform's "jsonencode" function converts a
  # Terraform expression result to valid JSON syntax.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      },
      {
        Effect = "Allow"
        Action = [
          "ec2:Start*",
          "ec2:Stop*",
          "ec2:Describe*"
        ]
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "test-attach" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = aws_iam_policy.lambda_policy_start_stop_instance.arn
}


resource "aws_lambda_function" "stop_ec2_instance" {
  # If the file is not in the current working directory you will need to include a
  # path.module in the filename.
  filename         = "stopec2instance.zip"
  function_name    = "stop_ec2_instance"
  role             = aws_iam_role.lambda_role.arn
  handler          = "stopec2instance.lambda_handler"
  source_code_hash = filebase64sha256("stopec2instance.zip")

  runtime = "python3.11"
}

resource "aws_lambda_function" "start_ec2_instance" {
  # If the file is not in the current working directory you will need to include a
  # path.module in the filename.
  filename         = "startec2instance.zip"
  function_name    = "startec2instance"
  role             = aws_iam_role.lambda_role.arn
  handler          = "startec2instance.lambda_handler"
  source_code_hash = filebase64sha256("startec2instance.zip")

  runtime = "python3.11"
}


resource "aws_cloudwatch_event_rule" "stop_ec2_schedule" {
  name                = "stop_ec2_schedule"
  description         = "Schedule to trigger Lambda to stop EC2 instances every 2 minutes"
  schedule_expression = "rate(2 minutes)"
}


resource "aws_cloudwatch_event_target" "stop_ec2_target" {
  rule      = aws_cloudwatch_event_rule.stop_ec2_schedule.name
  target_id = "lambda"
  arn       = aws_lambda_function.stop_ec2_instance.arn
}

resource "aws_lambda_permission" "allow_cloudwatch_stop" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.stop_ec2_instance.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.stop_ec2_schedule.arn
}

resource "aws_cloudwatch_event_rule" "start_ec2_schedule" {
  name                = "start_ec2_schedule"
  description         = "Schedule to trigger Lambda to start EC2 instances every 1 minute"
  schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "start_ec2_target" {
  rule      = aws_cloudwatch_event_rule.start_ec2_schedule.name
  target_id = "lambda"
  arn       = aws_lambda_function.start_ec2_instance.arn
}

resource "aws_lambda_permission" "allow_cloudwatch_start" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.start_ec2_instance.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.start_ec2_schedule.arn
}
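
The rate(1 minute) and rate(2 minutes) expressions above make both functions fire almost continuously, which is convenient for watching the automation work but is not a realistic schedule. EventBridge also accepts cron(...) expressions with six fields (minutes, hours, day-of-month, month, day-of-week, year), evaluated in UTC. The helper below is a hypothetical convenience, not part of the original setup: it converts a local wall-clock time into such an expression. The default offset of 330 minutes assumes India Standard Time (UTC+5:30), chosen only because this guide deploys to ap-south-1. The function refuses conversions that cross midnight UTC, because the day-of-week field would then need a manual shift.

```python
def eventbridge_cron(hour, minute, utc_offset_minutes=330, days="MON-FRI"):
    """Return an EventBridge cron() expression for a local wall-clock time.

    hour, minute        local time of day (24-hour clock)
    utc_offset_minutes  local offset from UTC in minutes (330 = UTC+5:30)
    days                day-of-week field, passed through unchanged
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")

    # Shift the local time of day to UTC, measured in minutes since midnight.
    utc_total = hour * 60 + minute - utc_offset_minutes
    if not 0 <= utc_total < 24 * 60:
        raise ValueError(
            "conversion crosses midnight UTC; adjust the day-of-week field by hand"
        )

    utc_hour, utc_minute = divmod(utc_total, 60)
    # Day-of-month is '?' because day-of-week is specified instead.
    return f"cron({utc_minute} {utc_hour} ? * {days} *)"


if __name__ == "__main__":
    print(eventbridge_cron(9, 0))   # start at 09:00 local -> cron(30 3 ? * MON-FRI *)
    print(eventbridge_cron(19, 0))  # stop at 19:00 local  -> cron(30 13 ? * MON-FRI *)
```

Either string can replace the rate(...) value of schedule_expression in the corresponding aws_cloudwatch_event_rule resource.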

Step-2: Create a variables.tf file that declares the input variables referenced in main.tf.

variable "instance_count" {
  type        = number
  description = "Number of EC2 instances to create"
  default     = 3
}

variable "security_group_id" {
  type        = string
  description = "ID of the security group for EC2 instances"
}

variable "subnet_id" {
  type        = string
  description = "ID of the subnet for EC2 instances"
}

variable "key" {
  type        = string
  description = "Name of the SSH key pair for EC2 instances"
}

variable "name" {
  type        = string
  description = "Name prefix for EC2 instances"
}

variable "default_ec2_tags" {
  type        = map(string)
  description = "(optional) default tags for EC2 instances"
  default = {
    managed_by   = "terraform"
    Environment  = "Dev"
  }
}

Step-3: Create a terraform.tfvars file that sets the values for these variables: the number of instances, the security group ID, the subnet ID, the key pair name, and the instance name prefix.

instance_count     = 3
security_group_id  = "sg-0944b5d5471b421fb"
subnet_id          = "subnet-0582feff6651618d4"
key                = "mynewkeypair"
name               = "EC2-Test-Instance"

Step-4: Create two Python files, stopec2instance.py and startec2instance.py, which contain the code for the Lambda functions. Zip each file into stopec2instance.zip and startec2instance.zip, and place the archives in the same directory as main.tf.
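
The packaging step can be done with any zip tool. As one possible approach, the sketch below uses Python's standard zipfile module to produce the archive names that main.tf expects. The function name package_handler is hypothetical. It assumes each handler is a single .py file with no third-party dependencies beyond boto3, which the Lambda Python runtime already provides.

```python
import zipfile
from pathlib import Path


def package_handler(source, archive=None):
    """Zip one handler file so that it sits at the root of the archive.

    source   path to the .py file, e.g. "stopec2instance.py"
    archive  optional output path; defaults to the source name with .zip
    """
    source = Path(source)
    archive = Path(archive) if archive else source.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        # arcname drops any directory prefix, so a handler setting such as
        # "stopec2instance.lambda_handler" resolves to a top-level module.
        bundle.write(source, arcname=source.name)
    return archive


# Example, run from the directory that holds main.tf once both files exist:
#   package_handler("stopec2instance.py")
#   package_handler("startec2instance.py")
```

Re-run the packaging whenever a handler changes; the source_code_hash argument in main.tf then lets Terraform detect the new archive on the next apply.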

# stopec2instance.py
import boto3


def is_dev(instance):
    """Return True if the instance carries the tag Environment=Dev."""
    return any(
        tag['Key'] == 'Environment' and tag['Value'] == 'Dev'
        for tag in instance.get('Tags', [])
    )


def is_running(instance):
    """Return True if the instance is currently in the 'running' state."""
    return instance['State']['Name'] == 'running'

def lambda_handler(event, context):
    ec2 = boto3.client('ec2', region_name='ap-south-1')

    try:
        response = ec2.describe_instances()
        reservations = response['Reservations']

        for reservation in reservations:
            for instance in reservation['Instances']:
                if is_dev(instance) and is_running(instance):
                    instance_id = instance['InstanceId']
                    ec2.stop_instances(InstanceIds=[instance_id])
                    print(f'Stopping instance: {instance_id}')

    except Exception as e:
        print(f'Error stopping instances: {str(e)}')

    return {
        'statusCode': 200,
        'body': 'Function executed successfully'
    }

# startec2instance.py
import boto3


def is_dev(instance):
    """Return True if the instance carries the tag Environment=Dev."""
    return any(
        tag['Key'] == 'Environment' and tag['Value'] == 'Dev'
        for tag in instance.get('Tags', [])
    )


def is_stopped(instance):
    """Return True if the instance is currently in the 'stopped' state."""
    return instance['State']['Name'] == 'stopped'

def lambda_handler(event, context):
    ec2 = boto3.client('ec2', region_name='ap-south-1')

    try:
        response = ec2.describe_instances()
        reservations = response['Reservations']

        for reservation in reservations:
            for instance in reservation['Instances']:
                if is_dev(instance) and is_stopped(instance):
                    instance_id = instance['InstanceId']
                    ec2.start_instances(InstanceIds=[instance_id])
                    print(f'Starting instance: {instance_id}')

    except Exception as e:
        print(f'Error starting instances: {str(e)}')

    return {
        'statusCode': 200,
        'body': 'Function executed successfully'
    }
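
Both handlers above fetch every instance in the region and then filter by tag and state in Python. A possible variation, sketched below, lets EC2 do the filtering through the Filters parameter of describe_instances and uses a boto3 paginator so that large result sets are read page by page. The function names find_dev_instance_ids and stop_dev_instances are hypothetical. The client is passed in as an argument (for example boto3.client('ec2', region_name='ap-south-1')), which is an assumption made here so that the logic can be exercised without AWS access.

```python
def find_dev_instance_ids(ec2, state):
    """Return the IDs of Environment=Dev instances that are in `state`.

    ec2    a boto3 EC2 client
    state  an instance state name such as 'running' or 'stopped'
    """
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['Dev']},
            {'Name': 'instance-state-name', 'Values': [state]},
        ]
    )
    # Flatten pages -> reservations -> instances into a list of IDs.
    return [
        instance['InstanceId']
        for page in pages
        for reservation in page['Reservations']
        for instance in reservation['Instances']
    ]


def stop_dev_instances(ec2):
    """Stop all running Dev instances with one stop_instances call."""
    instance_ids = find_dev_instance_ids(ec2, 'running')
    if instance_ids:  # skip the API call when nothing matches
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```

A start_dev_instances counterpart would look the same with 'stopped' as the state and ec2.start_instances as the call. Sending all IDs in one request also reduces the number of API calls compared with stopping instances one at a time.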

Step-5: Run the Terraform commands.

terraform init: Initializes the working directory. Terraform checks which provider the configuration uses (AWS in our case) and downloads that provider and its dependencies.

terraform plan: Shows which resources Terraform will create, change, or destroy, and how many of each.

terraform apply: Creates the resources described in the plan from the previous step.

Once all the resources are created, you can confirm them in the AWS console: the EC2 instances, the Lambda functions, and the EventBridge rules.

Each time an EventBridge rule triggers one of the Lambda functions, the function prints the ID of every instance it starts or stops. These messages appear in the function's CloudWatch Logs.

To delete the resources, run terraform destroy.

Conclusion:
By automating the start and stop of EC2 instances using Lambda, EventBridge, and Terraform, we've created an efficient and cost-effective solution for managing our cloud resources. This setup can be easily adapted to suit different schedules and requirements.

Happy automating!

Automating EC2 Instance Management with AWS Lambda and EventBridge Using Terraform