Building a dynamic AWS security group solution with CSV in Terraform

Learn step-by-step how to build a solution to dynamically manage AWS security groups in Terraform using a CSV file.

Introduction

In my recent blog post about effective AWS security group management in Terraform, I shared tips based on my experience. Writing it reignited my interest in a shelved side project from a past DevOps engagement.

The concept revolves around using a comma-separated value (CSV) file to manage security group settings, which offers a streamlined approach to deploying security groups via Terraform. This not only facilitates a centralized GitOps deployment model but also caters to security analysts less familiar with Terraform, simplifying their operational workflow.

Over the weekend, I spent some time designing and developing this solution. The journey was not without its challenges, which required some creative problem-solving and experimentation. I believe fellow DevOps engineers can benefit from this approach, which motivated me to document and share my insights in this blog post.

To enhance the post's readability, I'll explain concepts using code snippets. You'll find the fully runnable Terraform configurations in the accompanying GitHub repository, with each step/section conveniently highlighted. Ready to embark on this mini journey towards a CSV-based solution for security group management? Let's dive right into the design and implementation!

Defining the solution requirements

The main goal of the solution is to provide a simpler way of managing AWS security groups (and specifically the rules) with a CSV file instead of Terraform variables and configuration. Here are some specific requirements to ensure flexibility of the solution:

  • Use a single CSV file to manage all security groups of the Terraform stack.

  • Support for all source/destination types - IPv4 CIDRs, IPv6 CIDRs, prefix lists, and security groups.

  • Support for dynamic values (for example, using the ID from a security group resource that is provisioned in the same Terraform configuration).

For the purpose of explaining the solution, let's consider a target workload that is a traditional three-tier Linux, Apache, MySQL, PHP (LAMP) web application running on AWS with the following architecture:

[Figure: Infrastructure architecture for the example workload]

To keep things simple, each security group for the resources (ALB, web server, MySQL instance) will have an egress rule that allows all outbound traffic. As for ingress rules, the requirements are as follows:

| Resource | Source | Protocol and port |
| --- | --- | --- |
| ALB | Internet (0.0.0.0/0) | TCP 443 (HTTPS) |
| Web server | ALB (10.0.0.0/24) | TCP 80 (HTTP) |
| MySQL instance | Web server (10.0.1.0/24) | TCP 3306 (MySQL) |

As we develop the Terraform configuration in this blog post, we will focus only on creating the security group resources and a VPC resource which the security groups can be associated with. If you are interested in seeing a full solution in action, feel free to add the configuration to provision the network and workload resources at your leisure.
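For reference, the VPC and security group resources that the rule resources attach to could look roughly like the sketch below. The security group names alb, web, and db match the references used in the snippets in this post. The VPC resource name (this), the 10.0.0.0/16 CIDR range, and the descriptions are assumptions made for illustration, and the configuration in the repository may differ.

```hcl
# Assumed VPC CIDR range; adjust it to match your own network design.
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

# One security group per tier. Rules are intentionally not defined inline
# because they are managed by separate rule resources driven by the CSV file.
resource "aws_security_group" "alb" {
  name        = "alb"
  description = "Security group for the ALB"
  vpc_id      = aws_vpc.this.id
}

resource "aws_security_group" "web" {
  name        = "web"
  description = "Security group for the web servers"
  vpc_id      = aws_vpc.this.id
}

resource "aws_security_group" "db" {
  name        = "db"
  description = "Security group for the MySQL instance"
  vpc_id      = aws_vpc.this.id
}
```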

Defining the security group rule CSV file format

As per the requirements, the CSV file must support both ingress and egress rule definitions of all types for resources in the stack. The file format must adhere to the RFC 4180 specification, which is required by the Terraform csvdecode function that we will use in our Terraform configuration later. Considering the required attributes for the aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule resources, the CSV file schema can be defined as follows:

| Column | Description | Example |
| --- | --- | --- |
| resource_name | The logical name of the resource to which the security group rules apply. | web |
| type | One of: ingress, egress | ingress |
| name | The name of the rule. | http-alb |
| description | The description of the rule. | Allow HTTP access from the public subnet CIDRs |
| cidr_ipv4 | The source or destination IPv4 CIDR range. | 10.0.0.0/24 |
| cidr_ipv6 | The source or destination IPv6 CIDR range. | 2001:db8::/32 |
| prefix_list_id | The ID of the source or destination prefix list. | pl-07b7b831714d4596a |
| referenced_security_group_id | The source or destination security group that is referenced in the rule. | sg-0023839dc98251128 |
| ip_protocol | The IP protocol name or number. | -1 (all protocols), tcp |
| from_port | The start of port range for the TCP and UDP protocols, or an ICMP/ICMPv6 type. | 443 |
| to_port | The end of port range for the TCP and UDP protocols, or an ICMP/ICMPv6 code. | 443 |

Since a security group rule expects only one of the four source or destination types, the other three are left unset for each rule. In the CSV file, we will leave those values empty, which will resolve to an empty string as you will see later.
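To make the empty-string behavior concrete, here is a minimal sketch that decodes a small inline CSV instead of a file. The local value names (example_csv, example_rows, example_cidr_ipv6) and the reduced set of columns are hypothetical and only serve the illustration.

```hcl
locals {
  # Inline CSV for illustration only; the real solution reads sg-rules.csv.
  example_csv = <<-EOT
    resource_name,type,name,cidr_ipv4,cidr_ipv6
    web,ingress,http-public,10.0.0.0/24,
  EOT

  # csvdecode returns a list of objects whose attributes are all strings.
  example_rows = csvdecode(local.example_csv)

  # The cidr_ipv6 column was left empty, so it decodes to "" rather than null.
  # The conditional converts that empty string to null.
  example_cidr_ipv6 = local.example_rows[0].cidr_ipv6 != "" ? local.example_rows[0].cidr_ipv6 : null
}

# Inspect the decoded rows with `terraform output` or `terraform console`.
output "example_rows" {
  value = local.example_rows
}
```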

To test the solution, let's define a file called sg-rules.csv with the following content that specifies all required rules for the ALB, web server, and MySQL instance. We will also start simple and use static values, specifically the subnet CIDR ranges, for the inbound rule sources.

resource_name,type,name,description,cidr_ipv4,cidr_ipv6,prefix_list_id,referenced_security_group_id,ip_protocol,from_port,to_port
db,ingress,mysql-web,Allow MySQL access from the private (web) subnet CIDRs,10.0.1.0/24,,,,tcp,3306,3306
db,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1
web,ingress,http-public,Allow HTTP access from the public subnet CIDRs,10.0.0.0/24,,,,tcp,80,80
web,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1
alb,ingress,https-all,Allow HTTPS access from the internet,0.0.0.0/0,,,,tcp,443,443
alb,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1

Note: Terraform will not parse a CSV file correctly if it starts with a UTF-8 byte order mark (BOM). Software such as Excel may add a BOM when saving CSV files. Make sure that the CSV file is saved as UTF-8 without a BOM before providing it to Terraform.
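If you cannot control how the file is saved, one defensive option is to strip a leading BOM before decoding. The sketch below is an assumption on my part rather than something taken from the accompanying repository, and the local value name sg_rules_raw is hypothetical. It relies on the built-in trimprefix function, which leaves the string untouched when the prefix is absent.

```hcl
locals {
  # "\ufeff" is the Unicode byte order mark. trimprefix removes it only if
  # the file content actually starts with it.
  sg_rules_raw = trimprefix(file("${path.module}/sg-rules.csv"), "\ufeff")

  # Decode the cleaned content as usual.
  sg_rules_csv = csvdecode(local.sg_rules_raw)
}
```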

Developing the basic Terraform configuration

Now that we have defined the CSV file format, let's write the Terraform configuration. For the base resource definitions, we will use something similar to what is explained in tip #2 of my previous blog post, for example:

# TODO: Adapt this to CSV input
resource "aws_vpc_security_group_ingress_rule" "web" {
  for_each                     = var.web_security_group_rules.ingress
  security_group_id            = aws_security_group.web.id
  cidr_ipv4                    = try(each.value.cidr_ipv4, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6, null)
  prefix_list_id               = try(each.value.prefix_list_id, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}

We first need to load the CSV file, which can be done using the csvdecode function and the file function in a local value. The loaded content is a list of objects, each representing a row in the CSV file. Column values that are not specified are loaded as empty strings, which we will need to convert to null for some arguments. We then convert the list to a map, which is easier to supply to the for_each meta-argument:

locals {
  sg_rules_csv = csvdecode(file("${path.module}/sg-rules.csv"))
  sg_rules     = { for e in local.sg_rules_csv : "${e.resource_name}-${e.type}-${e.name}" => e }
}

To ensure uniqueness, we will use a combination of the resource name, rule type, and rule name as the map key. With a friendlier structure to for_each, let's update the rule resource definition to create rules based on the map. Here is an example of the web server security group ingress rule resources:

resource "aws_vpc_security_group_ingress_rule" "web" {
  for_each                     = { for k, v in local.sg_rules : v.name => v if v.resource_name == "web" && v.type == "ingress" }
  security_group_id            = aws_security_group.web.id
  cidr_ipv4                    = try(each.value.cidr_ipv4 != "" ? each.value.cidr_ipv4 : null, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6 != "" ? each.value.cidr_ipv6 : null, null)
  prefix_list_id               = try(each.value.prefix_list_id != "" ? each.value.prefix_list_id : null, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id != "" ? each.value.referenced_security_group_id : null, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}

The value of for_each is the result of a for expression that filters for the ingress rules relevant to the web server. The source attribute values are also updated to check for an empty string and set the value to null in that case.
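The egress rules follow the same pattern. As a sketch, and assuming the egress resource simply mirrors the ingress one with a different type filter, the web server egress rules could be defined like this:

```hcl
resource "aws_vpc_security_group_egress_rule" "web" {
  # Same filter as the ingress resource, but selecting the egress rows.
  for_each                     = { for k, v in local.sg_rules : v.name => v if v.resource_name == "web" && v.type == "egress" }
  security_group_id            = aws_security_group.web.id
  # For egress rules, these four attributes describe the destination.
  cidr_ipv4                    = try(each.value.cidr_ipv4 != "" ? each.value.cidr_ipv4 : null, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6 != "" ? each.value.cidr_ipv6 : null, null)
  prefix_list_id               = try(each.value.prefix_list_id != "" ? each.value.prefix_list_id : null, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id != "" ? each.value.referenced_security_group_id : null, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}
```

The ALB and MySQL instance need equivalent ingress and egress resource blocks, with the resource name in the filter and the security group reference changed accordingly.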

You can find the complete Terraform stack up to this point in the basic directory of the GitHub repository that accompanies this blog post.

Now you can apply the Terraform configuration and see that it completes successfully. For good measure, verify in the AWS Management Console that the security groups are created with the correct set of rules (particularly the inbound rules that refer to the subnet CIDRs).
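If you prefer to check the result from the command line as well, an output along these lines lists the created rule IDs per rule name. The output name web_ingress_rule_ids is hypothetical and the block is not part of the accompanying repository. It uses the security_group_rule_id attribute exported by the rule resource.

```hcl
# Maps each web ingress rule name from the CSV file to the ID of the
# security group rule that was created for it.
output "web_ingress_rule_ids" {
  value = { for k, r in aws_vpc_security_group_ingress_rule.web : k => r.security_group_rule_id }
}
```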

Adding variable support - first attempt

While referring to subnet CIDRs works, it does not follow the principle of least privilege. For instance, future workloads that are deployed to the web private subnet would also be able to access the MySQL instance. As an improvement, the source of the existing security group ingress rules should instead point to the appropriate workload security group.

Since the security groups are provisioned when the Terraform configuration is applied, their IDs are only known after they are created. It is certainly not desirable to manually copy the IDs into the CSV file afterwards, so we need to find a way to dynamically inject them. To address this, we can consider employing variable substitution.

As many Terraform practitioners know, there is a templatefile function that can read a file while replacing template variables in the file content. (There is also a template_file data source but it is now considered deprecated.) Let's update the CSV file to use template variables to inject security group IDs at runtime like so:

resource_name,type,name,description,cidr_ipv4,cidr_ipv6,prefix_list_id,referenced_security_group_id,ip_protocol,from_port,to_port
db,ingress,mysql-web,Allow MySQL access from the web server security group,,,,${web_sg_id},tcp,3306,3306
db,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1
web,ingress,http-public,Allow HTTP access from the ALB security group,,,,${alb_sg_id},tcp,80,80
web,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1
alb,ingress,https-all,Allow HTTPS access from the internet,0.0.0.0/0,,,,tcp,443,443
alb,egress,all,Allow all outgoing traffic,0.0.0.0/0,,,,-1,-1,-1

We also need to update the local value that loads the CSV file as follows:

locals {
  sg_rules_csv = csvdecode(templatefile("${path.module}/sg-rules.csv", {
    "alb_sg_id" = aws_security_group.alb.id
    "web_sg_id" = aws_security_group.web.id
  }))
  sg_rules     = { for e in local.sg_rules_csv : "${e.resource_name}-${e.type}-${e.name}" => e }
}

You can find the complete Terraform stack up to this point in the dynamic_attempt_1 directory of the GitHub repository that accompanies this blog post.

All is seemingly well. However, when we apply the Terraform configuration, it fails with a few errors similar to the one below:

Error: Invalid for_each argument
│
│   on main.tf line 38, in resource "aws_vpc_security_group_ingress_rule" "db":
│   38:   for_each                     = { for k, v in local.sg_rules : "${v.name}" => v if v.resource_name == "db" && v.type == "ingress" }
│     ├────────────────
│     │ local.sg_rules will be known only after apply
│
│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of   
│ keys that will identify the instances of this resource.
│
│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the   
│ map values.
│
│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time  
│ to fully converge.

So what exactly are these errors about, and how do we fix them?

Fixing the for_each key issue and finalizing the solution

As the error message explains, for_each requires that the map keys be known at plan time. It is a frustratingly common problem that many Terraform practitioners have encountered. The limitation is also explained in the for_each documentation.
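The limitation can be reproduced in isolation. The following minimal sketch uses the built-in terraform_data resource, which assumes Terraform 1.4 or later, and all resource and local value names in it are hypothetical. It contrasts a map whose keys are static, which for_each accepts, with a map whose key is only known after apply, which for_each rejects.

```hcl
# The id of this resource is only known after it has been created.
resource "terraform_data" "source" {
  input = "example"
}

locals {
  # Accepted by for_each: the key is a literal known at plan time, and only
  # the value depends on an apply-time result.
  static_keys = {
    first = terraform_data.source.id
  }
}

resource "terraform_data" "works" {
  for_each = local.static_keys
  input    = each.value
}

# Rejected on the first apply with "Invalid for_each argument", because the
# map key itself is derived from a value that is unknown until apply:
#
# resource "terraform_data" "fails" {
#   for_each = { (terraform_data.source.id) = "value" }
#   input    = each.value
# }
```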

The problem is that the sg_rules map is derived from the sg_rules_csv local value, which is loaded using the templatefile function with template variable replacement. Due to the replacement, the loaded CSV file content is no longer considered static (for instance, I could substitute entire rows into the content). From our perspective this should have been fair game, because we are only replacing some values in a row rather than entire rows, but Terraform has no way of knowing that.

Since we are sure that most of the contents are static, particularly the fields that comprise the key for sg_rules, we can build a static list of keys from them for looping, while using the dynamically loaded CSV file content for all other information. For this, we will need some new local values:

locals {
  sg_rules_csv = csvdecode(templatefile("${path.module}/sg-rules.csv", {
    "web_sg_id" = aws_security_group.web.id
    "alb_sg_id" = aws_security_group.alb.id
  }))

  sg_rule_names = [for e in csvdecode(file("${path.module}/sg-rules.csv")) : "${e.resource_name}-${e.type}-${e.name}"]
  sg_rules      = { for e in local.sg_rules_csv : "${e.resource_name}-${e.type}-${e.name}" => e }
}

Notice that there is now a new list called sg_rule_names, which contains the map key names. It is built from the CSV file content loaded using the vanilla file function, which performs no variable substitution. Meanwhile, we keep the sg_rules_csv and sg_rules values the same. What's important is that sg_rule_names must contain exactly the keys in sg_rules, which is why both use the same key expression.

We can now update the for_each value in the rule resources to iterate using the static list of keys in local.sg_rule_names while fetching the rule settings from local.sg_rules, as shown below:

resource "aws_vpc_security_group_ingress_rule" "web" {
  for_each                     = { for k in local.sg_rule_names : k => local.sg_rules[k] if startswith(k, "web-ingress") }
  security_group_id            = aws_security_group.web.id
  cidr_ipv4                    = try(each.value.cidr_ipv4 != "" ? each.value.cidr_ipv4 : null, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6 != "" ? each.value.cidr_ipv6 : null, null)
  prefix_list_id               = try(each.value.prefix_list_id != "" ? each.value.prefix_list_id : null, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id != "" ? each.value.referenced_security_group_id : null, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}

Note that we also need to use static values in the if condition of the for expression instead of attributes in the map values. While it is not the most elegant solution, the hardcoding is still passable because we know specifically the resource and type to which the resource block is applicable.
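If the hardcoded prefix bothers you, one possible variation is to keep a static copy of the whole rows rather than only the key names, and to filter on its attributes. This is a sketch of an alternative, not the approach used in the accompanying repository, and the local value name sg_rules_static is hypothetical. It would replace the web ingress resource block shown above.

```hcl
locals {
  # Static copy of the rows, read without template substitution, so every
  # attribute (not only the key) is known at plan time. Template placeholders
  # such as ${web_sg_id} appear here as literal text and are never used.
  sg_rules_static = {
    for e in csvdecode(file("${path.module}/sg-rules.csv")) :
    "${e.resource_name}-${e.type}-${e.name}" => e
  }
}

resource "aws_vpc_security_group_ingress_rule" "web" {
  # Filter on the static rows, but take the values from the templated rows.
  for_each = {
    for k, v in local.sg_rules_static : k => local.sg_rules[k]
    if v.resource_name == "web" && v.type == "ingress"
  }
  security_group_id            = aws_security_group.web.id
  cidr_ipv4                    = try(each.value.cidr_ipv4 != "" ? each.value.cidr_ipv4 : null, null)
  cidr_ipv6                    = try(each.value.cidr_ipv6 != "" ? each.value.cidr_ipv6 : null, null)
  prefix_list_id               = try(each.value.prefix_list_id != "" ? each.value.prefix_list_id : null, null)
  referenced_security_group_id = try(each.value.referenced_security_group_id != "" ? each.value.referenced_security_group_id : null, null)
  from_port                    = each.value.from_port
  ip_protocol                  = each.value.ip_protocol
  to_port                      = each.value.to_port
}
```

The trade-off is that the CSV file is decoded twice either way, but the filter condition reads the same as in the basic configuration and does not depend on how the key string is composed.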

You can find the complete Terraform stack up to this point (which is also the final solution) in the final directory of the GitHub repository that accompanies this blog post.

With the updated for_each expressions, the Terraform configuration can now be applied successfully! As before, make sure that you verify the security groups and their rules in the AWS Management Console.

Maintaining the solution

In terms of maintenance, whenever you need to refer to new values that are derived from new resources such as:

  • Subnet CIDR blocks (for example, aws_subnet.private.cidr_block)

  • Managed prefix lists (for example, aws_ec2_managed_prefix_list.office_vpn.id)

  • Security groups (for example, aws_security_group.msk.id)

You will need to do the following, while taking care to use unique variable names:

  1. Define new template replacement variables (for example, ${subnet_private_cidr}, ${office_vpn_prefix_list_id}, and ${msk_sg_id} per above).

  2. Update the list of variables in the templatefile argument of the sg_rules_csv local value.
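As an illustration of these two steps, the sketch below extends the variable map with the three example values above. It assumes that the aws_subnet.private, aws_ec2_managed_prefix_list.office_vpn, and aws_security_group.msk resources exist in your configuration, and the sample CSV row in the comment is hypothetical.

```hcl
locals {
  sg_rules_csv = csvdecode(templatefile("${path.module}/sg-rules.csv", {
    # Existing variables
    "alb_sg_id" = aws_security_group.alb.id
    "web_sg_id" = aws_security_group.web.id

    # New variables, each with a unique name
    "subnet_private_cidr"       = aws_subnet.private.cidr_block
    "office_vpn_prefix_list_id" = aws_ec2_managed_prefix_list.office_vpn.id
    "msk_sg_id"                 = aws_security_group.msk.id
  }))
}

# A row in sg-rules.csv can then refer to a new variable in the matching
# column, for example in prefix_list_id:
#
# db,ingress,mysql-vpn,Allow MySQL access from the office VPN,,,${office_vpn_prefix_list_id},,tcp,3306,3306
```

Because sg_rule_names is derived from the same file, a new row is picked up automatically as long as its combination of resource name, type, and rule name is unique.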

Summary

Congratulations, you have just built a CSV-based solution for managing AWS security groups using Terraform! By employing a sensible design and naming scheme, as well as thoughtfully using Terraform functions and constructs, all rule settings can be maintained in a single CSV file. The same design can be extended to other cloud providers' corresponding concepts, such as network security groups in Azure.

If you find this solution helpful, please check out my other blog posts or let me know what you'd like to learn more about!