expert 50 min read aws Updated: 2025-07-23

Detective Controls (AWS Config)

Implement automated compliance monitoring with AWS Config, custom Lambda rules, and automated remediation actions for continuous cloud governance.

📋 Prerequisites

  • Expert knowledge of AWS Config, IAM roles, and S3.
  • Advanced proficiency in Python for writing Lambda functions.
  • Strong experience with Infrastructure as Code (Terraform) for deploying governance controls.
  • Familiarity with AWS Systems Manager (SSM) Automation and AWS Organizations.

💡 From Snapshot Audits to Continuous Compliance

Detective controls are essential for any governance framework, answering the question, "Is my environment compliant *right now*?" AWS Config is the core service for this, providing a detailed inventory of your AWS resources and continuously evaluating their configurations against desired policies. At an expert level, Config becomes a powerful engine for automated compliance validation and self-healing remediation at scale.

🏷️ Topics Covered

aws config custom rules, aws compliance as code, automated remediation aws config, aws config aggregator setup, lambda for aws config, continuous compliance aws

Multi-Account, Multi-Region Aggregation Architecture

To achieve a complete compliance picture, you must aggregate AWS Config data from all member accounts and regions into a single, central audit account. This is accomplished using a **Configuration Aggregator**.

🏗️ HCL: Deploying a Configuration Aggregator with Terraform

This Terraform code, run in your central audit account, creates an aggregator that pulls in data from your entire AWS Organization.

# Run this configuration in the designated central audit/security account.

# 1. The IAM Role the aggregator will assume to collect data.
resource "aws_iam_role" "config_aggregator_role" {
  name = "ConfigAggregatorRole"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = { Service = "config.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "config_aggregator_policy" {
  role       = aws_iam_role.config_aggregator_role.name
  # This managed policy lets AWS Config read account details from AWS Organizations.
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSConfigRoleForOrganizations"
}

# 2. The Configuration Aggregator itself.
resource "aws_config_configuration_aggregator" "organization_aggregator" {
  name = "OrganizationAggregator"

  organization_aggregation_source {
    all_regions = true
    role_arn    = aws_iam_role.config_aggregator_role.arn
  }

  # Note: An organization aggregator needs no per-account role or authorization.
  # Create it in the management account or a delegated administrator account, with
  # trusted access for config.amazonaws.com enabled in AWS Organizations. Each member
  # account still needs its own configuration recorder turned on.
}
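
Once the aggregator is collecting data, the organization-wide view can be condensed into a per-account summary. The following is a minimal sketch rather than a definitive implementation: `noncompliant_rules_by_account` is a hypothetical helper, the input is assumed to have the shape of the `AggregateComplianceByConfigRules` list returned by boto3's `describe_aggregate_compliance_by_config_rules`, and the sample records are invented for illustration.

```python
from collections import defaultdict


def noncompliant_rules_by_account(aggregate_results):
    """Group NON_COMPLIANT rule names by (account ID, region).

    `aggregate_results` is assumed to look like the
    'AggregateComplianceByConfigRules' list from the Config API.
    """
    grouped = defaultdict(set)
    for item in aggregate_results:
        compliance = item.get("Compliance", {})
        if compliance.get("ComplianceType") == "NON_COMPLIANT":
            key = (item["AccountId"], item["AwsRegion"])
            grouped[key].add(item["ConfigRuleName"])
    # Sort rule names so the summary is stable across runs.
    return {key: sorted(rules) for key, rules in grouped.items()}


# Invented sample data, for illustration only.
SAMPLE_RESULTS = [
    {"ConfigRuleName": "ApprovedAmiRule", "AccountId": "111111111111",
     "AwsRegion": "us-east-1",
     "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
    {"ConfigRuleName": "ApprovedAmiRule", "AccountId": "222222222222",
     "AwsRegion": "eu-west-1",
     "Compliance": {"ComplianceType": "COMPLIANT"}},
    {"ConfigRuleName": "AnotherRule", "AccountId": "111111111111",
     "AwsRegion": "us-east-1",
     "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
]

if __name__ == "__main__":
    for (account, region), rules in noncompliant_rules_by_account(SAMPLE_RESULTS).items():
        print(account, region, ", ".join(rules))
```

In practice the input list would come from calling `describe_aggregate_compliance_by_config_rules` with `ConfigurationAggregatorName="OrganizationAggregator"` and following `NextToken` until the results are exhausted.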

Advanced Custom Rules with AWS Lambda

While AWS provides many managed rules, you'll often need custom logic for organization-specific policies. Custom rules are powered by Lambda functions that contain your unique evaluation logic.

Example: Check for Approved AMIs

This expert rule checks if an EC2 instance is using an AMI from an approved list of golden, hardened AMIs. It marks any instance using a public or unapproved AMI as `NON_COMPLIANT`.

🐍 Python: Custom Lambda Rule for Approved AMIs

import boto3
import json

config = boto3.client('config')

# This list should be managed centrally (e.g., in SSM Parameter Store)
APPROVED_AMI_IDS = ["ami-0123456789abcdef0", "ami-0fedcba9876543210"]

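def evaluate_compliance(config_item):
    """Return a dict with 'compliance_type' and, when relevant, an 'annotation'."""
    # Other resource types and deleted resources cannot be evaluated.
    if (config_item['resourceType'] != 'AWS::EC2::Instance'
            or config_item.get('configurationItemStatus') == 'ResourceDeleted'):
        return {"compliance_type": "NOT_APPLICABLE"}

    ami_id = config_item['configuration']['imageId']

    if ami_id in APPROVED_AMI_IDS:
        return {
            "compliance_type": "COMPLIANT",
            "annotation": f"Instance is using an approved AMI ({ami_id})."
        }
    return {
        "compliance_type": "NON_COMPLIANT",
        "annotation": f"Instance is using unapproved AMI ({ami_id}). Only approved AMIs are allowed."
    }

def lambda_handler(event, context):
    invoking_event = json.loads(event['invokingEvent'])
    # Oversized configuration items arrive without an inline 'configurationItem';
    # this example does not handle that case.
    config_item = invoking_event['configurationItem']

    result = evaluate_compliance(config_item)

    evaluation = {
        'ComplianceResourceType': config_item['resourceType'],
        'ComplianceResourceId': config_item['resourceId'],
        'ComplianceType': result['compliance_type'],
        'OrderingTimestamp': config_item['configurationItemCaptureTime']
    }
    # The annotation is optional and is only sent when the evaluation produced one.
    if 'annotation' in result:
        evaluation['Annotation'] = result['annotation']

    # Report the result back to AWS Config using the token from the invoking event.
    config.put_evaluations(
        Evaluations=[evaluation],
        ResultToken=event['resultToken']
    )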
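
The comment in the listing suggests managing the approved list centrally, for example in SSM Parameter Store. One way that could look is sketched below, under stated assumptions: the parameter name `/governance/approved-amis` is hypothetical, the parameter is assumed to be a `StringList`, and `load_approved_amis` and `FakeSsmClient` are illustrative names that do not appear in the rule above. The client is passed in as an argument so the logic can be exercised without AWS access; inside the Lambda it would be `boto3.client("ssm")`.

```python
import time

# Hypothetical parameter name; a StringList parameter is assumed.
APPROVED_AMI_PARAMETER = "/governance/approved-amis"
CACHE_TTL_SECONDS = 300

# Module-level cache survives between invocations of a warm Lambda container.
_cache = {"expires_at": 0.0, "ami_ids": frozenset()}


def load_approved_amis(ssm_client, now=time.time):
    """Return the approved AMI IDs, re-reading the parameter after the TTL."""
    if now() < _cache["expires_at"]:
        return _cache["ami_ids"]
    response = ssm_client.get_parameter(Name=APPROVED_AMI_PARAMETER)
    # StringList values come back as a single comma-separated string.
    raw_value = response["Parameter"]["Value"]
    ami_ids = frozenset(
        part.strip() for part in raw_value.split(",") if part.strip()
    )
    _cache["ami_ids"] = ami_ids
    _cache["expires_at"] = now() + CACHE_TTL_SECONDS
    return ami_ids


class FakeSsmClient:
    """Stand-in for boto3's SSM client, used only to exercise the sketch."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def get_parameter(self, Name):
        self.calls += 1
        return {"Parameter": {"Name": Name, "Value": self.value}}
```

Caching keeps a burst of evaluations from calling Parameter Store on every invocation, at the cost of the list being up to five minutes stale. The Lambda's execution role would also need `ssm:GetParameter` on the parameter.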

Automated Remediation with SSM Automation

You can configure AWS Config to trigger an SSM Automation document as a remediation action for a non-compliant rule. This enables a self-healing infrastructure.

Example: Terminate EC2 Instance with Unapproved AMI

This SSM Automation document will be triggered when the custom rule above finds a non-compliant EC2 instance. It safely terminates the instance to prevent unhardened images from running.

⚙️ YAML: SSM Automation Document for Remediation

description: >-
  Terminates an EC2 instance found to be non-compliant by an AWS Config rule.
schemaVersion: '0.3'
assumeRole: '{{ AutomationAssumeRole }}'
parameters:
  AutomationAssumeRole:
    type: String
    description: (Required) ARN of the role for Automation to perform actions.
  InstanceId:
    type: String
    description: (Required) The ID of the non-compliant EC2 instance.
mainSteps:
  - name: VerifyInstanceState
    action: 'aws:assertAwsResourceProperty'
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds: [ '{{ InstanceId }}' ]
      PropertySelector: '$.Reservations[0].Instances[0].State.Name'
      DesiredValues: [ 'running', 'stopped' ]
    onFailure: Abort
  
  - name: TerminateInstance
    action: 'aws:executeAwsApi'
    inputs:
      Service: ec2
      Api: TerminateInstances
      InstanceIds:
        - '{{ InstanceId }}'
    description: Terminates the non-compliant EC2 instance.
    
  - name: VerifyInstanceTerminated
    action: 'aws:waitForAwsResourceProperty'
    timeoutSeconds: 300
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds: [ '{{ InstanceId }}' ]
      PropertySelector: '$.Reservations[0].Instances[0].State.Name'
      DesiredValues: [ 'terminated' ]
    onFailure: Abort
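
Because this document is destructive, it can help to lint it before deployment. The sketch below checks one narrow property: every `{{ Name }}` placeholder must be declared under `parameters`. It is an illustration only: `undeclared_parameters` is a hypothetical helper, the document is assumed to be already parsed into a Python dict (for instance with a YAML library), and `SAMPLE_DOCUMENT` is a trimmed copy of the document above.

```python
import re

# Matches '{{ Name }}' placeholders used in Automation documents.
PLACEHOLDER = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")


def _strings(node):
    """Yield every string found anywhere inside nested dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from _strings(value)


def undeclared_parameters(document):
    """Return placeholder names that are not declared under 'parameters'."""
    declared = set(document.get("parameters", {}))
    body = {key: value for key, value in document.items() if key != "parameters"}
    referenced = set()
    for text in _strings(body):
        for name in PLACEHOLDER.findall(text):
            # Skip step outputs ('step.Output') and system variables ('global:DATE').
            if "." not in name and ":" not in name:
                referenced.add(name)
    return sorted(referenced - declared)


# Trimmed copy of the document above, as a parsed dict.
SAMPLE_DOCUMENT = {
    "schemaVersion": "0.3",
    "assumeRole": "{{ AutomationAssumeRole }}",
    "parameters": {
        "AutomationAssumeRole": {"type": "String"},
        "InstanceId": {"type": "String"},
    },
    "mainSteps": [
        {
            "name": "TerminateInstance",
            "action": "aws:executeAwsApi",
            "inputs": {
                "Service": "ec2",
                "Api": "TerminateInstances",
                "InstanceIds": ["{{ InstanceId }}"],
            },
        }
    ],
}
```

A check like this catches typos in parameter names early, but it does not replace running the document against a disposable test instance.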

Deploying Compliance-as-Code with Terraform

To manage compliance at scale, you must define your rules, Lambda functions, and remediation actions as code. This ensures they are versioned, auditable, and consistently deployed.

🏗️ HCL: Deploying the Custom Rule and Remediation with Terraform

# 1. The Custom AWS Config Rule
resource "aws_config_config_rule" "approved_ami_rule" {
  name = "ApprovedAmiRule"

  source {
    owner             = "CUSTOM_LAMBDA"
    source_identifier = aws_lambda_function.approved_ami_check_lambda.arn

    # Evaluate whenever the configuration of an in-scope resource changes.
    source_detail {
      message_type = "ConfigurationItemChangeNotification"
    }
  }

  scope {
    compliance_resource_types = ["AWS::EC2::Instance"]
  }

  # AWS Config must be allowed to invoke the function before the rule is created.
  depends_on = [aws_lambda_permission.allow_config]
}

# Grants AWS Config permission to invoke the evaluation Lambda.
resource "aws_lambda_permission" "allow_config" {
  statement_id  = "AllowExecutionFromConfig"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.approved_ami_check_lambda.function_name
  principal     = "config.amazonaws.com"
}

# 2. The SSM Automation Document for Remediation
resource "aws_ssm_document" "terminate_instance_remediation" {
  name            = "TerminateNonCompliantInstance"
  document_type   = "Automation"
  document_format = "YAML" # The default is JSON, so YAML content must be declared.
  content         = file("ssm_documents/terminate_instance.yml")
}

# 3. The Remediation Configuration linking the rule to the SSM document
resource "aws_config_remediation_configuration" "ami_rule_remediation" {
  config_rule_name = aws_config_config_rule.approved_ami_rule.name

  target_id   = aws_ssm_document.terminate_instance_remediation.name
  target_type = "SSM_DOCUMENT"

  automatic                  = true
  maximum_automatic_attempts = 2
  retry_attempt_seconds      = 120

  # RESOURCE_ID is replaced at runtime with the ID of the non-compliant resource.
  parameter {
    name           = "InstanceId"
    resource_value = "RESOURCE_ID"
  }

  parameter {
    name         = "AutomationAssumeRole"
    static_value = aws_iam_role.remediation_role.arn
  }
}

Troubleshooting Advanced Config Issues

As your Config setup grows, you may encounter complex issues. Here's how to debug them.

❌ Custom Rule Lambda is Failing or Timing Out

  • Symptom: Your custom rule shows "No results" or evaluations are failing. CloudWatch logs for the Lambda show timeout errors or exceptions.
  • Root Cause:
    1. **IAM Permissions:** The Lambda's execution role may lack permission to describe the resource it's evaluating (e.g., `ec2:DescribeImages`) or to report results with `config:PutEvaluations`.
    2. **Throttling:** If the rule is triggered by many resources at once, it might be getting throttled by downstream service APIs. Implement error handling with backoff and retry logic.
    3. **Timeouts:** Lambda functions time out after 3 seconds by default. Complex evaluations or those in a VPC may need a longer timeout (e.g., 30-60 seconds).
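
The backoff and retry logic mentioned for throttling can be sketched as follows. This is a generic illustration, not part of the rule above: `call_with_backoff` and its parameters are hypothetical names, and the delay values are arbitrary starting points. It retries with exponential backoff and full jitter, and takes the sleep function as an argument so the behavior can be checked without waiting.

```python
import random
import time


def call_with_backoff(operation, is_retryable, max_attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `operation`, retrying retryable errors with backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or when the error is not retryable.
            if attempt == max_attempts or not is_retryable(error):
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

With boto3, `is_retryable` could inspect `error.response["Error"]["Code"]` on a `ClientError` for throttling codes. botocore also has built-in retry modes, configurable through `botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"})`, which may be enough on their own. Either way, the total retry time has to fit inside the Lambda timeout.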

⚙️ Remediation Action Fails to Execute

  • Symptom: A resource is marked `NON_COMPLIANT`, but the remediation action shows "Failed".
  • Cause: The IAM role that the remediation action assumes (`AutomationAssumeRole`) lacks the permissions to perform the required action (e.g., `ec2:TerminateInstances`). This role is separate from the Config service role.
  • **Solution:** In the Systems Manager console, open Automation and inspect the failed execution. The output of the failed step typically contains an `AccessDenied` message that names the missing permission. Add that permission to the remediation role.
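
The same information can be pulled out programmatically. The sketch below assumes the response shape of boto3's `get_automation_execution` (an `AutomationExecution` dict containing `StepExecutions`); `failed_steps` is a hypothetical helper and the sample response is invented for illustration.

```python
def failed_steps(execution_response):
    """Return (step name, failure message) pairs for steps that did not succeed."""
    execution = execution_response.get("AutomationExecution", {})
    failures = []
    for step in execution.get("StepExecutions", []):
        if step.get("StepStatus") in {"Failed", "TimedOut", "Cancelled"}:
            failures.append((step.get("StepName"), step.get("FailureMessage", "")))
    return failures


# Invented sample response, trimmed to the fields the helper reads.
SAMPLE_RESPONSE = {
    "AutomationExecution": {
        "AutomationExecutionStatus": "Failed",
        "StepExecutions": [
            {"StepName": "VerifyInstanceState", "StepStatus": "Success"},
            {"StepName": "TerminateInstance", "StepStatus": "Failed",
             "FailureMessage": "AccessDenied: not authorized to perform ec2:TerminateInstances"},
        ],
    }
}

if __name__ == "__main__":
    for name, message in failed_steps(SAMPLE_RESPONSE):
        print(f"{name}: {message}")
```

In a real account the response would come from `boto3.client("ssm").get_automation_execution(AutomationExecutionId=...)`, using the execution ID recorded for the remediation attempt.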

🔑 Expert-Level AWS Config Best Practices

  • **Centralize via Aggregators:** Always use a Configuration Aggregator in a dedicated audit account for an organization-wide view of compliance.
  • **Codify Everything:** Manage all Config rules, remediation actions, and supporting resources (Lambdas, IAM roles) in an IaC framework like Terraform.
  • **Remediate with Caution:** Before enabling *automatic* remediation, test your SSM documents thoroughly. For destructive actions, consider a workflow that requires manual approval.
  • **Use Conformance Packs:** For standard frameworks like PCI-DSS or HIPAA, use AWS Config Conformance Packs to deploy a curated collection of rules and remediations as a single unit.

You've Mastered Detective Controls!

Now that you can detect and remediate misconfigurations, your next step is to focus on real-time threat detection and incident response.