5 tips to efficiently manage AWS security groups using Terraform

Discover 5 proven strategies for scalable and stress-free security group rule management on AWS using Terraform.

Introduction

As a DevOps professional, I've helped numerous customers build landing zones on AWS and develop Infrastructure as Code (IaC) using Terraform. One of the questions I hear most often is how to manage infrastructure security following the least privilege principle. For network security such as firewalls and security groups, this means having to define very specific rules for ports and CIDR ranges, which can quickly get out of hand as the number of resources and applications increases. Over the years, I have learned different strategies to minimize the maintenance overhead. In this blog post, I share these tips with a focus on security groups in the hope that they help my fellow AWS engineers and architects. Let's dive right in!

Tip 1: Use the standalone resources to manage security group rules

While the aws_security_group resource supports in-line rule definitions using the ingress and egress configuration blocks, the AWS provider documentation discourages this usage due to various legacy limitations. The recommended approach nowadays is to define ingress and egress rules using the aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule resources respectively, and not to mix them with in-line rules on the same security group. These standalone resources allow more fine-grained control over the security group rules. The following is a basic example for managing a security group for Linux-based bastion hosts:

resource "aws_security_group" "bastion" {
  name        = "app-prod-sg-use1-bastion"
  description = "Security group for bastion hosts"
  vpc_id      = aws_vpc.this.id
}

# Ingress rule that allows SSH access from the office's external gateway IP
resource "aws_vpc_security_group_ingress_rule" "bastion_ssh_office" {
  security_group_id = aws_security_group.bastion.id
  cidr_ipv4         = "1.2.3.4/32"
  from_port         = 22
  ip_protocol       = "tcp"
  to_port           = 22
}

# Egress rule that allows all outbound traffic
resource "aws_vpc_security_group_egress_rule" "bastion_all" {
  security_group_id = aws_security_group.bastion.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1" # all protocols and ports, so from_port and to_port are left unset
}
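
One practical benefit of the standalone resources is that every rule can carry its own description and tags. The following is a minimal sketch, assuming a second source that also needs SSH access; the resource name, the description, and the CIDR (a placeholder from the documentation address range) are hypothetical:

```hcl
# Hypothetical second ingress rule for the same bastion security group.
# 203.0.113.10/32 is a placeholder address, not a real gateway.
resource "aws_vpc_security_group_ingress_rule" "bastion_ssh_vpn" {
  security_group_id = aws_security_group.bastion.id
  description       = "SSH from the corporate VPN gateway"
  cidr_ipv4         = "203.0.113.10/32"
  from_port         = 22
  ip_protocol       = "tcp"
  to_port           = 22

  # Tags are set per rule, which makes individual rules easier to find in the console
  tags = {
    Name = "bastion-ssh-vpn"
  }
}
```

Because each rule is its own resource, adding or removing this rule shows up in the plan as a single, isolated change.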
💡 If you are curious about the naming convention I use for resources in the examples, such as app-prod-sg-use1-bastion, check out my blog post My quest to finding the perfect AWS resource naming scheme!

Tip 2: Utilize map variables and for_each to minimize boilerplate configuration

Any sane person would find it unmanageable to maintain one big Terraform configuration file with tens or hundreds of security group rule resource blocks that look largely the same. This is where the for_each meta-argument can drastically reduce boilerplate code.

for_each takes a map or a set of strings and creates one instance of a resource for each item in it. For managing security group rules, you can store the rule information in a map and use for_each with the aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule resources. Here is an example for managing a security group for a web server that is accessed by bastion hosts and an Application Load Balancer, both of which reside in a public subnet:

# Terraform configuration (.tf)

variable "web_security_group_rules" {
  description = "The security group rules for the web servers."
  type = object({
    ingress = optional(map(object({
      cidr_ipv4   = string
      from_port   = number
      ip_protocol = string
      to_port     = number
    })), {})
    egress = optional(map(object({
      cidr_ipv4   = string
      from_port   = number
      ip_protocol = string
      to_port     = number
    })), {})
  })
}

resource "aws_security_group" "web" {
  name        = "app-prod-sg-use1-web"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.this.id
}

resource "aws_vpc_security_group_ingress_rule" "web" {
  for_each          = var.web_security_group_rules.ingress
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = each.value.cidr_ipv4
  from_port         = each.value.from_port
  ip_protocol       = each.value.ip_protocol
  to_port           = each.value.to_port
}

resource "aws_vpc_security_group_egress_rule" "web" {
  for_each          = var.web_security_group_rules.egress
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = each.value.cidr_ipv4
  from_port         = each.value.from_port
  ip_protocol       = each.value.ip_protocol
  to_port           = each.value.to_port
}

# Variable definition (.tfvars)

# The keys must match the variable type above (ingress and egress);
# attributes that the type does not declare are discarded.
web_security_group_rules = {
  ingress = {
    "http-public-subnet" = {
      cidr_ipv4   = "10.0.0.0/24"
      from_port   = 80
      ip_protocol = "tcp"
      to_port     = 80
    }
    "https-public-subnet" = {
      cidr_ipv4   = "10.0.0.0/24"
      from_port   = 443
      ip_protocol = "tcp"
      to_port     = 443
    }
    "ssh-public-subnet" = {
      cidr_ipv4   = "10.0.0.0/24"
      from_port   = 22
      ip_protocol = "tcp"
      to_port     = 22
    }
  }
  egress = {
    "all" = {
      cidr_ipv4   = "0.0.0.0/0"
      from_port   = null # ports are left unset when ip_protocol is "-1" (all protocols)
      ip_protocol = "-1"
      to_port     = null
    }
  }
}

The example can be extended to support other rule attributes as follows:

resource "aws_vpc_security_group_ingress_rule" "web" {
  for_each                     = var.web_security_group_rules.ingress
  security_group_id            = aws_security_group.web.id
  cidr_ipv4                    = try(each.value.cidr_ipv4, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6, null)
  prefix_list_id               = try(each.value.prefix_list_id, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}

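This way, you can provide any one of the four source/destination types (IPv4 CIDR, IPv6 CIDR, prefix list, security group) in your map variable and have the rules dynamically created. Note that the variable's type must also declare these attributes as optional, because Terraform drops any attribute that an object type does not list.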
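
For the map variable to carry these extra attributes, its type has to list them. The following is a sketch of how the variable type from the earlier example might be widened, assuming Terraform 1.3 or later for optional(); the ports are made optional here as well so that an all-protocols rule can omit them:

```hcl
variable "web_security_group_rules" {
  description = "The security group rules for the web servers."
  type = object({
    ingress = optional(map(object({
      # Exactly one of the four source types is expected per rule
      cidr_ipv4                    = optional(string)
      cidr_ipv6                    = optional(string)
      prefix_list_id               = optional(string)
      referenced_security_group_id = optional(string)
      from_port                    = optional(number)
      ip_protocol                  = string
      to_port                      = optional(number)
    })), {})
    egress = optional(map(object({
      # Exactly one of the four destination types is expected per rule
      cidr_ipv4                    = optional(string)
      cidr_ipv6                    = optional(string)
      prefix_list_id               = optional(string)
      referenced_security_group_id = optional(string)
      from_port                    = optional(number)
      ip_protocol                  = string
      to_port                      = optional(number)
    })), {})
  })
}
```

Optional attributes that are not set default to null, which the rule resources treat the same as an omitted argument.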

Meanwhile, those with more Terraform experience may have realized that this won't work very well in practice because variable values defined in tfvars files must be literals (that is, they cannot reference resources, data sources, or other variables). This means you cannot specify, for instance, the ID of a prefix list resource created in the same configuration in your map variable. To address this, we could instead use a local value to define the rule configuration like this (replace var. with local. in the above configuration):

locals {
  web_security_group_rules = {
    # The keys match the references in the resources (ingress and egress)
    ingress = {
      "http-alb" = {
        referenced_security_group_id = aws_security_group.alb.id
        from_port                    = 80
        ip_protocol                  = "tcp"
        to_port                      = 80
      }
      "https-alb" = {
        referenced_security_group_id = aws_security_group.alb.id
        from_port                    = 443
        ip_protocol                  = "tcp"
        to_port                      = 443
      }
      "ssh-public-subnet" = {
        cidr_ipv4   = aws_subnet.public.cidr_block
        from_port   = 22
        ip_protocol = "tcp"
        to_port     = 22
      }
    }
    egress = {
      "all" = {
        cidr_ipv4   = "0.0.0.0/0"
        from_port   = null # ports are left unset when ip_protocol is "-1" (all protocols)
        ip_protocol = "-1"
        to_port     = null
      }
    }
  }
}

With this design, all rule details are compactly defined in one place, thus improving maintenance and readability.

Tip 3: Use managed prefix lists to group CIDR blocks

In an enterprise environment, an AWS landing zone may have a more sophisticated setup with:

  • Multiple availability zones to support high availability

  • Hybrid connectivity via VPN or Direct Connect

  • IP whitelisting for workload access

These features could increase the number of CIDR blocks that apply to security group rules and the complexity of your Terraform configuration. To combat this, you can take a consolidation approach using an often-overlooked VPC feature called the managed prefix list. A managed prefix list is a set of one or more CIDR blocks. You can use prefix lists to make it easier to configure and maintain your security groups and route tables.

The following is an example for a managed prefix list that groups the CIDR blocks for the public subnets that span multiple AZs:

resource "aws_ec2_managed_prefix_list" "public" {
  name           = "app-prod-pl-use1-public"
  address_family = "IPv4"
  max_entries    = length(var.azs)
}

resource "aws_ec2_managed_prefix_list_entry" "public" {
  # var.azs contains the list of AZs where public subnets exist (us-east-1a, us-east-1b, etc.)
  for_each       = toset(var.azs)
  cidr           = aws_subnet.public[each.key].cidr_block
  description    = "CIDR block for the public subnet in AZ ${each.key}"
  prefix_list_id = aws_ec2_managed_prefix_list.public.id
}

You can then define security group ingress or egress rule resources with the prefix_list_id attribute set to aws_ec2_managed_prefix_list.public.id. As you can imagine, this is more manageable than having to define different sets of rules for however many AZs you are using. The solution is also useful for other aforementioned use cases such as keeping track of on-premises CIDR blocks or IP whitelists for restricted workloads.
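
As a sketch of what such a rule might look like, the following allows SSH to the web servers from every public subnet with a single resource. It assumes the aws_security_group.web resource from Tip 2; the rule name and description are hypothetical:

```hcl
# One rule covers the public subnets in all AZs through the prefix list
resource "aws_vpc_security_group_ingress_rule" "web_ssh_public" {
  security_group_id = aws_security_group.web.id
  description       = "SSH from the public subnets in all AZs"
  prefix_list_id    = aws_ec2_managed_prefix_list.public.id
  from_port         = 22
  ip_protocol       = "tcp"
  to_port           = 22
}
```

When a subnet is added or removed, only the prefix list entries change; the rule itself stays untouched.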

Be aware of service quota issues related to the use of managed prefix lists. As described in the aws_ec2_managed_prefix_list resource documentation, the managed prefix list size is defined by the max_entries attribute, and a rule that references the prefix list counts as that many rules in the security group regardless of the actual number of entries. Ensure that you set max_entries to the exact number of entries, or at least to a conservative number. Otherwise, you may quickly hit the "Inbound or outbound rules per security group" service quota limit.
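
One way to keep max_entries exact is to derive it from the same collection that drives the entries. The following is a minimal sketch for the on-premises use case; the local value, the CIDR blocks, and the descriptions are hypothetical placeholders:

```hcl
locals {
  # Hypothetical on-premises ranges (CIDR => description); replace with your own
  onprem_cidrs = {
    "192.168.10.0/24" = "Head office LAN"
    "192.168.20.0/24" = "Data center"
  }
}

resource "aws_ec2_managed_prefix_list" "onprem" {
  name           = "app-prod-pl-use1-onprem"
  address_family = "IPv4"
  # Sized to the exact number of entries, so a rule that references
  # this list counts as two rules rather than an inflated number
  max_entries = length(local.onprem_cidrs)
}

resource "aws_ec2_managed_prefix_list_entry" "onprem" {
  for_each       = local.onprem_cidrs
  cidr           = each.key
  description    = each.value
  prefix_list_id = aws_ec2_managed_prefix_list.onprem.id
}
```

With this design, adding a CIDR block to the local value updates both the entries and the list size in the same apply.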

Tip 4: Consider roles and responsibilities when organizing security group resources

Your Terraform stack will inevitably become more complex and involve more cross-functional collaboration over time. In the context of security groups, you may encounter the following scenarios:

  • An IT security team manages and scrutinizes all that relates to firewalls, including security groups.

  • Different teams manage different aspects of a workload as a vertical. For example, DBAs may manage RDS resources while the application development team manages EC2 and EKS resources.

Each scenario may necessitate different organizational strategies for your Terraform configuration for optimal efficiency. For example, you might want to organize resources by type (EC2, RDS, etc.) or by workload (Tableau, in-house application, etc.). In both cases, it makes sense to maintain the security group resources alongside the resources to which they are tied. Meanwhile, you might want to separate security group resources into their own file, so that the IT security team can focus on just that one file. Or you can combine both strategies to define a file structure like:

ec2.tf
ec2-sg-rules.tf
rds.tf
rds-sg-rules.tf
fs.tf
fs-sg-rules.tf

Having a good Terraform structure will drastically improve the developer experience for the end-users who are often not as well-versed in Terraform as a typical DevOps engineer.

Tip 5: Use a Terraform module from the community

For those who are looking for a turnkey solution to better manage Terraform configurations, you may fancy the Terraform Registry and the plethora of modules that are available in it. Simply put, modules are containers for multiple resources that are used together. The registry hosts several popular community-developed modules for managing security groups on AWS.

These modules offer quality-of-life improvements by abstracting common configurations into simpler constructs and providing support for more complex scenarios. Chances are, these modules already employ some of the design patterns described in the earlier tips. Documentation for these modules is also decent, so you should be able to figure out how they work quickly.

On the flip side, you might find over time that these modules are not flexible enough to support your requirements because they are either too limited or too opinionated. You could extend or work around module limitations, but that adds complexity, which negates the benefits of using a module in the first place.

Because of these caveats, experienced Terraform practitioners may prefer building their own modules using vanilla Terraform constructs and resources. In practice, many of them have accumulated enough experience and reusable artifacts that developing custom modules is not terribly time-consuming.
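
To give a sense of the effort involved, here is a rough sketch of what a small custom module might look like when it reuses the for_each pattern from Tip 2. The module path, variable names, output name, and the example database rule are all hypothetical:

```hcl
# modules/security-group/main.tf (hypothetical module)
variable "name" { type = string }
variable "description" { type = string }
variable "vpc_id" { type = string }

variable "ingress" {
  description = "Ingress rules keyed by a descriptive name."
  type = map(object({
    cidr_ipv4   = string
    from_port   = number
    ip_protocol = string
    to_port     = number
  }))
  default = {}
}

resource "aws_security_group" "this" {
  name        = var.name
  description = var.description
  vpc_id      = var.vpc_id
}

resource "aws_vpc_security_group_ingress_rule" "this" {
  for_each          = var.ingress
  security_group_id = aws_security_group.this.id
  cidr_ipv4         = each.value.cidr_ipv4
  from_port         = each.value.from_port
  ip_protocol       = each.value.ip_protocol
  to_port           = each.value.to_port
}

output "security_group_id" {
  value = aws_security_group.this.id
}

# Root configuration, for example in rds-sg-rules.tf
module "db_security_group" {
  source      = "./modules/security-group"
  name        = "app-prod-sg-use1-db"
  description = "Security group for database instances"
  vpc_id      = aws_vpc.this.id

  ingress = {
    "postgres-private-subnet" = {
      cidr_ipv4   = "10.0.1.0/24" # placeholder CIDR for the application subnet
      from_port   = 5432
      ip_protocol = "tcp"
      to_port     = 5432
    }
  }
}
```

A module like this stays small because it only wraps the resources you already use, and it can grow with egress rules or other source types as the need arises.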

I would recommend keeping an open mind and giving these modules a try first to see if they fit your needs. They may very well be a big timesaver if you have mostly typical requirements or don't have a lot of R&D bandwidth. The key is to find the right balance and, more importantly, figure out what works best for you and your team.

Bonus tip: Use a CSV-based solution

If you are interested in adopting a CSV-based solution to manage security groups in Terraform, take a look at my blog post Building a dynamic AWS security group solution with CSV in Terraform.
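
For a rough idea of the approach, the following sketch reads ingress rules from a CSV file using the built-in csvdecode and file functions. It is not necessarily the design used in that blog post; the file name, the column names, and the resource name are assumptions, and it relies on the aws_security_group.web resource from Tip 2:

```hcl
# Assumed contents of web-ingress-rules.csv (hypothetical file):
#
#   name,cidr_ipv4,from_port,ip_protocol,to_port
#   http-public-subnet,10.0.0.0/24,80,tcp,80
#   https-public-subnet,10.0.0.0/24,443,tcp,443

locals {
  # csvdecode returns a list of objects whose attribute values are all strings
  web_ingress_rules_csv = csvdecode(file("${path.module}/web-ingress-rules.csv"))
}

resource "aws_vpc_security_group_ingress_rule" "web_csv" {
  # Key each rule by its name column so that resource addresses stay stable
  for_each          = { for rule in local.web_ingress_rules_csv : rule.name => rule }
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = each.value.cidr_ipv4
  from_port         = tonumber(each.value.from_port)
  ip_protocol       = each.value.ip_protocol
  to_port           = tonumber(each.value.to_port)
}
```

The name column must be unique, since it becomes the for_each key for each rule.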

Summary

Effective use of Terraform features such as for_each and community modules, and AWS features such as managed prefix lists, will help you better manage security groups in AWS using Terraform.

As you gain more experience, you will also identify better ways to structure and develop your Terraform configuration based on your team's and organization's needs. Many of these tips also apply to general Terraform usage, so I hope you find this blog post helpful and can put the tips into practice.

Please also check out my other blog posts or let me know what you'd like to learn more about!