AWS VPC Security Groups & Network Policies
Design and manage AWS VPC security groups, NACLs, and network segmentation policies for secure and compliant cloud networking.
Prerequisites
- Expert knowledge of VPC concepts (subnets, route tables, internet gateways, NAT gateways).
- Advanced proficiency with IAM and networking concepts (CIDR notation, ports, protocols).
- Strong experience with Terraform for deploying network infrastructure.
- Familiarity with AWS Organizations and AWS Firewall Manager.
Network Security: A Layered Defense
Effective network security in AWS is not about a single firewall; it's about a layered defense-in-depth strategy. **Security Groups** act as stateful firewalls at the instance level, while **Network ACLs (NACLs)** provide a stateless guardrail at the subnet level. Mastering the interplay between these two, along with centralized management and endpoint policies, is the key to building a secure and resilient network architecture.
Security Groups vs. NACLs: A Deep Dive
While both control traffic, their behavior and position in the network stack are fundamentally different. Understanding this is critical for proper design and troubleshooting.
Security Groups (Stateful)
- Scope: Acts at the ENI/instance level.
- Rules: Supports "allow" rules only. Everything else is implicitly denied.
- State: Stateful. If you allow inbound traffic, the return outbound traffic is automatically allowed.
- Use Case: Primary firewall for your instances, allowing application-specific traffic.
Network ACLs (Stateless)
- Scope: Acts at the subnet level.
- Rules: Supports both "allow" and "deny" rules, evaluated in ascending rule-number order; the first matching rule wins.
- State: Stateless. You must explicitly allow traffic in both directions, including return traffic to the client's ephemeral ports.
- Use Case: A broad, blunt guardrail to block unwanted traffic from ever reaching your subnet.
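To make the stateful/stateless difference concrete, here is a minimal Terraform sketch that admits inbound HTTPS through each mechanism. It assumes a recent AWS provider that includes `aws_vpc_security_group_ingress_rule`; the names `aws_security_group.example` and `aws_network_acl.example` are hypothetical placeholders, and the rule numbers are arbitrary.

```hcl
# Stateful: a single inbound rule is enough. Reply packets for an allowed
# connection are permitted automatically.
resource "aws_vpc_security_group_ingress_rule" "https_in" {
  security_group_id = aws_security_group.example.id # hypothetical group
  description       = "HTTPS from anywhere"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_ipv4         = "0.0.0.0/0"
}

# Stateless: the NACL needs a matching pair of rules, one per direction.
resource "aws_network_acl_rule" "https_in" {
  network_acl_id = aws_network_acl.example.id # hypothetical NACL
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 443
  to_port        = 443
}

resource "aws_network_acl_rule" "https_reply_out" {
  network_acl_id = aws_network_acl.example.id
  rule_number    = 100
  egress         = true # outbound: replies go to the client's ephemeral port
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 1024
  to_port        = 65535
}
```

If the outbound NACL rule is left out, the inbound request still reaches the instance, but the reply is dropped at the subnet boundary and the client sees a timeout.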
Advanced Security Group Design Patterns
A key feature of Security Groups is their ability to reference other Security Groups as a source or destination. This allows you to create dynamic, tightly-scoped rules for multi-tier applications without hardcoding IP addresses.
Pattern: Three-Tier Web Application
In this classic architecture, we create three distinct security groups (one for each layer) that only allow traffic from the preceding layer.
HCL: Multi-Tier Security Groups in Terraform
This Terraform code defines a secure, three-tier network model where only the Web layer is accessible from the internet, and the Database layer is only accessible from the App layer.
# Note: Terraform removes the default allow-all egress rule that AWS adds to new
# security groups, so these groups have no outbound rules until egress is defined.

# 1. Web Tier Security Group (public facing)
resource "aws_security_group" "web_sg" {
  name   = "web-tier-sg"
  vpc_id = aws_vpc.main.id

  # Separate rules for 80 and 443. A single 80-443 range would open every port in between.
  ingress {
    description = "Allow HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "Allow HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# 2. App Tier Security Group (private)
resource "aws_security_group" "app_sg" {
  name   = "app-tier-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "Allow traffic from the Web Tier"
    from_port       = 8080 # Application port
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web_sg.id] # <-- Reference to Web SG
  }
}

# 3. DB Tier Security Group (private)
resource "aws_security_group" "db_sg" {
  name   = "db-tier-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "Allow traffic from the App Tier"
    from_port       = 5432 # PostgreSQL port
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app_sg.id] # <-- Reference to App SG
  }
}
Implementing a Defense-in-Depth NACL Strategy
NACLs are your first line of defense. While you should generally keep them open (as Security Groups provide finer control), they are extremely effective for blocking known bad actors or denying entire protocols at the subnet boundary.
HCL: NACL for Blocking Bad IPs & Unwanted Protocols
This NACL explicitly denies traffic from a known malicious IP address and blocks all UDP traffic, regardless of Security Group rules.
resource "aws_network_acl" "main" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = [aws_subnet.private.id]

  # INBOUND RULES (evaluated in ascending rule_no order; first match wins)
  # Inline rules use rule_no/action; rule_number/rule_action belong to the
  # separate aws_network_acl_rule resource.
  ingress {
    rule_no    = 100
    protocol   = "-1" # all protocols, so every packet from this IP is denied
    action     = "deny"
    cidr_block = "203.0.113.5/32" # Known malicious IP
    from_port  = 0
    to_port    = 0
  }

  ingress {
    rule_no    = 200
    protocol   = "udp"
    action     = "deny"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 65535 # full UDP port range
  }

  # Remember to add allow rules for necessary traffic after the denies.
  # A custom NACL ends in an implicit deny-all, so with no allow rule
  # nothing reaches the subnet.

  # OUTBOUND RULES (must allow return traffic)
  egress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024 # ephemeral port range used for replies
    to_port    = 65535
  }
}
Centralized Management with AWS Firewall Manager
Manually ensuring every new account and VPC has the correct baseline security groups is not scalable. **AWS Firewall Manager** is a security management service that allows you to centrally configure and deploy firewall rules across multiple accounts and resources in your AWS Organization.
Firewall Manager Policy Types
You can create a **Firewall Manager security group policy** that defines a set of baseline rules (e.g., "all security groups must block inbound SSH from the internet"). With a content audit policy of this kind, Firewall Manager continuously checks the in-scope security groups across your organization and reports any that are non-compliant. It can optionally remediate them automatically as well.
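As a rough illustration of the "no public SSH" baseline, the sketch below defines a content audit policy with the Terraform `aws_fms_policy` resource. Treat it as an outline under stated assumptions rather than a drop-in configuration: it assumes it is applied from the Firewall Manager administrator account, the resource labels and names are hypothetical, and the `managed_service_data` JSON follows the content audit format as I understand it from the Firewall Manager API, so verify the keys against the current documentation before use.

```hcl
# Audit security group describing the rule pattern that should NOT appear.
# Assumption: with securityGroupAction "DENY", in-scope groups containing a
# rule that matches this group are reported as non-compliant.
resource "aws_security_group" "fms_audit_no_public_ssh" {
  name        = "fms-audit-no-public-ssh" # hypothetical name
  description = "Firewall Manager audit group: SSH from the internet"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "Pattern to flag: SSH from anywhere"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_fms_policy" "no_public_ssh" {
  name                  = "no-public-ssh" # hypothetical name
  exclude_resource_tags = false
  remediation_enabled   = false # report only; do not modify groups
  resource_type         = "AWS::EC2::SecurityGroup"

  security_service_policy_data {
    type = "SECURITY_GROUPS_CONTENT_AUDIT"

    managed_service_data = jsonencode({
      type                = "SECURITY_GROUPS_CONTENT_AUDIT"
      securityGroups      = [{ id = aws_security_group.fms_audit_no_public_ssh.id }]
      securityGroupAction = { type = "DENY" }
    })
  }
}
```

Starting with `remediation_enabled = false` lets you review the compliance report before allowing Firewall Manager to change any rules.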
Troubleshooting Common Network Connectivity Issues
Network access issues are common and can be complex to debug. Here's a systematic approach to solving them.
Connection Timeout Error
- Symptom: An application on Instance A cannot connect to a database on Instance B and the request times out.
- Root Cause Checklist:
  - Security Group (Outbound): Does the Security Group for Instance A have an outbound rule allowing traffic to the database port (e.g., 5432) and destination (the IP or Security Group of Instance B)?
  - Security Group (Inbound): Does the Security Group for Instance B have an inbound rule allowing traffic on the database port from the source (the IP or Security Group of Instance A)?
  - Network ACLs: Check the NACLs on both the source and destination subnets. Is there an inbound rule on the destination subnet's NACL that allows the traffic? Is there an outbound rule on the source subnet's NACL that allows it? *Crucially, is there an outbound rule on the destination NACL and an inbound rule on the source NACL to allow the return traffic?*
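One Terraform-specific cause of a failed outbound check is worth a sketch. When Terraform creates a security group, it removes the allow-all egress rule that AWS normally adds, so a group declared with only `ingress` blocks (like the three-tier groups earlier in this guide) has no outbound rules until you add them. The example below is a minimal illustration, assuming a recent AWS provider that includes `aws_vpc_security_group_egress_rule`; the label `app_to_db` is hypothetical.

```hcl
# Outbound rule on the source side: app tier -> DB tier on PostgreSQL.
# A standalone rule is used because an inline egress block on app_sg that
# references db_sg, combined with db_sg's inline ingress reference to app_sg,
# would create a dependency cycle between the two groups.
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app_sg.id
  description                  = "PostgreSQL to the DB tier"
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  referenced_security_group_id = aws_security_group.db_sg.id
}
```

The provider documentation advises against mixing standalone rule resources with inline `ingress`/`egress` blocks on the same group, so if you adopt this approach, consider converting the inline rules to standalone resources as well.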
"Connection Refused" Error
- Symptom: An application gets an immediate "connection refused" response, not a timeout.
- Cause: This typically means the network path is open (Security Groups and NACLs are likely correct), but there is no process listening on the destination port on the target instance.
- Solution: Verify that the application (e.g., the database server) is running on the target instance and is configured to listen on the correct port and network interface.
Expert-Level VPC Security Best Practices
- Reference Security Groups, Not IPs: For internal traffic between tiers, always reference other security groups as the source. This is more secure and dynamically adapts as instances are added or removed.
- Keep NACLs Simple: Use NACLs as a broad shield. For most subnets, the default "ALLOW ALL" is fine. Only use custom NACLs to block specific, known-bad traffic at the edge.
- Use Firewall Manager for Baselines: Define your mandatory organization-wide rules (e.g., no public SSH) in a Firewall Manager policy and apply it to all accounts to ensure a consistent baseline.
- Automate Everything with IaC: All VPCs, subnets, route tables, security groups, and NACLs should be defined as code in Terraform for auditability and repeatability.