AWS S3 Security Policies & Data Governance
Secure AWS S3 with bucket policies, access controls, encryption, lifecycle management, and automated data governance for enterprise environments.
📋 Prerequisites
- Expert knowledge of S3 features (Bucket Policies, ACLs, CORS, Lifecycle Rules).
- Advanced proficiency with IAM policies, especially the interaction with resource-based policies.
- Strong experience with Terraform for deploying S3 buckets and associated governance controls.
- Skills in Python for writing Lambda functions for automated data governance.
💡 S3: More Than Storage, It's Your Data Foundation
Amazon S3 is the foundational data layer for countless applications. Securing it goes far beyond simply blocking public access. A comprehensive S3 governance strategy requires a multi-layered approach: a strict, preventive bucket policy as the primary guardrail, continuous detective controls to find misconfigurations, and automated workflows to manage the data lifecycle and remediate non-compliant objects.
Preventive Controls: The Ultimate S3 Bucket Policy
A bucket policy is the primary resource-based control for S3. A robust policy for a private, sensitive data bucket should enforce multiple security layers simultaneously.
📜 JSON: Multi-Layered S3 Bucket Policy
This expert-level bucket policy combines several best practices:
- Enforces encryption in transit by denying any request made over plain HTTP (non-TLS).
- Denies any `PutObject` request that does not explicitly specify server-side encryption (SSE-S3/AES256 or SSE-KMS).
- Restricts access to principals within your AWS Organization via `aws:PrincipalOrgID`.
- Denies access from anywhere except a specific VPC endpoint, keeping data access inside your network.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceTLSTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-secure-data-lake/*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    },
    {
      "Sid": "EnforceServerSideEncryption",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-secure-data-lake/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    },
    {
      "Sid": "AllowAccessFromWithinOrganization",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-secure-data-lake/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-xxxxxxxxxxx"
        }
      }
    },
    {
      "Sid": "AllowAccessFromVpcEndpointOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-secure-data-lake/*",
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpce": "vpce-0123456789abcdef0"
        }
      }
    }
  ]
}
Detective Controls for Data Compliance
While bucket policies are powerful, you need detective controls to verify compliance over time. AWS Config can monitor S3 bucket settings, but for object-level checks (like verifying encryption on existing objects), a custom rule is needed.
🐍 Python: Custom Config Rule to Find Unencrypted Objects
This Lambda function backs a custom AWS Config rule evaluation. It reads the bucket name and object key from the rule's invoking event, calls `HeadObject` to verify the object was stored with server-side encryption, and reports the object to AWS Config as `NON_COMPLIANT` if it was not (or if the check cannot be completed).
import boto3
import json

s3 = boto3.client('s3')
config = boto3.client('config')

def lambda_handler(event, context):
    # The invoking event carries the configuration item that triggered the evaluation.
    invoking_event = json.loads(event['invokingEvent'])
    config_item = invoking_event['configurationItem']
    bucket_name = config_item['supplementaryConfiguration']['BucketName']
    object_key = config_item['configuration']['Key']

    # Assume non-compliance until encryption is confirmed.
    compliance_status = 'NON_COMPLIANT'
    annotation = 'Object is not encrypted with server-side encryption.'

    try:
        head = s3.head_object(Bucket=bucket_name, Key=object_key)
        if 'ServerSideEncryption' in head:
            compliance_status = 'COMPLIANT'
            annotation = 'Object is correctly encrypted.'
    except Exception as e:
        print(f"Could not head object {object_key} in {bucket_name}. Error: {e}")
        # Mark as non-compliant if we can't verify
        compliance_status = 'NON_COMPLIANT'

    # Report the evaluation result back to AWS Config.
    config.put_evaluations(
        Evaluations=[{
            'ComplianceResourceType': 'AWS::S3::Object',
            'ComplianceResourceId': f'{bucket_name}/{object_key}',
            'ComplianceType': compliance_status,
            'Annotation': annotation,
            'OrderingTimestamp': config_item['configurationItemCaptureTime']
        }],
        ResultToken=event['resultToken']
    )
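🏗️ HCL: Registering the Custom Config Rule (Sketch)
If the rule and its Lambda are managed with Terraform (as the prerequisites assume), registering the function as a Lambda-backed custom rule might look like the sketch below. The resource name `aws_lambda_function.s3_object_compliance` is a placeholder for wherever the evaluation function above is defined, and the scope is set to `AWS::S3::Bucket`, the S3 resource type AWS Config records; adapt both to your setup.
# Sketch only: registers the evaluation Lambda above as a custom Config rule.
# aws_lambda_function.s3_object_compliance is a hypothetical resource name.
resource "aws_lambda_permission" "allow_config_invoke" {
  statement_id  = "AllowConfigInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.s3_object_compliance.function_name
  principal     = "config.amazonaws.com"
}

resource "aws_config_config_rule" "s3_object_encryption" {
  name = "s3-object-encryption-check"

  source {
    owner             = "CUSTOM_LAMBDA"
    source_identifier = aws_lambda_function.s3_object_compliance.arn

    source_detail {
      message_type = "ConfigurationItemChangeNotification"
    }
  }

  scope {
    compliance_resource_types = ["AWS::S3::Bucket"]
  }

  depends_on = [aws_lambda_permission.allow_config_invoke]
}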
Automated Remediation for Non-Compliant Data
Detecting a non-compliant object is good; automatically remediating it is better. You can use S3 Event Notifications to trigger a Lambda function that takes action on objects that don't meet your governance standards.
Automation Pattern: Tag-on-Upload Enforcement
This workflow enforces a policy that all uploaded data must have a `DataClassification` tag. If an object is uploaded without this tag, a Lambda function is triggered to automatically apply a default "quarantine" tag, flagging it for review.
🐍 Python: Auto-Tagging Remediation Lambda
import boto3
import urllib.parse
s3 = boto3.client('s3')
def lambda_handler(event, context):
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
try:
response = s3.get_object_tagging(Bucket=bucket, Key=key)
tags = {tag['Key']: tag['Value'] for tag in response.get('TagSet', [])}
if 'DataClassification' in tags:
print(f"Object {key} is correctly tagged. No action needed.")
return
# If we reach here, the tag is missing.
print(f"Object {key} is missing DataClassification tag. Applying quarantine tag.")
# Add a new tag to the existing set, or create a new set
tags['SecurityStatus'] = 'Quarantine-MissingClassification'
new_tag_set = {'TagSet': [{'Key': k, 'Value': v} for k, v in tags.items()]}
s3.put_object_tagging(
Bucket=bucket,
Key=key,
Tagging=new_tag_set
)
print("Successfully applied quarantine tag.")
except Exception as e:
print(f"Error processing object {key} in bucket {bucket}. Error: {e}")
raise e
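🏗️ HCL: Wiring the Remediation Lambda to S3 Events (Sketch)
The function above only runs if the bucket's event notifications invoke it. A minimal Terraform sketch of that wiring, assuming the remediation function is defined elsewhere as `aws_lambda_function.auto_tagger` (a hypothetical name) and the bucket is the `aws_s3_bucket.data_lake` used in the lifecycle example that follows:
# Sketch: invoke the auto-tagging Lambda for every object creation event.
# aws_lambda_function.auto_tagger is a hypothetical resource name.
resource "aws_lambda_permission" "allow_s3_invoke" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.auto_tagger.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.data_lake.arn
}

resource "aws_s3_bucket_notification" "auto_tag_on_upload" {
  bucket = aws_s3_bucket.data_lake.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.auto_tagger.arn
    events              = ["s3:ObjectCreated:*"]
  }

  depends_on = [aws_lambda_permission.allow_s3_invoke]
}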
Data Lifecycle & Governance Automation
Managing data retention and storage costs is a critical governance task. S3 Lifecycle Policies automate the process of transitioning objects to cheaper storage tiers and eventually deleting them.
🏗️ HCL: Advanced Lifecycle Policy with Terraform
This policy implements a sophisticated lifecycle for a data lake bucket, moving data through different storage classes based on access patterns and deleting old, incomplete uploads.
resource "aws_s3_bucket_lifecycle_configuration" "data_lake_lifecycle" {
bucket = aws_s3_bucket.data_lake.id
rule {
id = "log-files-transition-and-expire"
filter {
prefix = "logs/"
}
status = "Enabled"
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 90
storage_class = "GLACIER_IR" # Instant Retrieval
}
expiration {
days = 365
}
}
rule {
id = "cleanup-incomplete-multipart-uploads"
status = "Enabled"
filter {
prefix = "/"
}
abort_incomplete_multipart_upload {
days_after_initiation = 7
}
}
} Troubleshooting S3 Access & Policy Issues
S3 security is a complex interplay of IAM policies, bucket policies, ACLs, and KMS permissions. Debugging access issues requires a systematic approach.
❌ `Access Denied` Despite Allow Policy
- Symptom: An IAM role has `s3:GetObject` allowed in its policy, but still gets Access Denied when trying to download an object.
- Root Cause Checklist:
- **Explicit Deny:** Is there an explicit `Deny` statement in the IAM policy, the Bucket Policy, or an SCP? A deny always wins.
- **Bucket Policy:** Does the bucket policy actually allow the role? For cross-account access, the bucket policy must explicitly allow the principal; for same-account access, an IAM `Allow` is sufficient as long as neither policy contains an explicit deny.
- **KMS Encryption:** Is the object encrypted with a customer-managed KMS key? If so, the role needs `kms:Decrypt` on that key, and the key policy must permit it (directly, or by delegating to IAM policies). This is one of the most common causes of S3 `Access Denied` errors.
- **VPC Endpoint:** Is the request coming from a VPC? Does the VPC Endpoint Policy allow the role to access the bucket?
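🏗️ HCL: Restrictive S3 VPC Endpoint Policy (Sketch)
For the last check, the endpoint policy attached to the S3 gateway endpoint must allow the traffic. A minimal Terraform sketch that limits the endpoint to the data lake bucket; `aws_vpc.main` and the `us-east-1` region are placeholder assumptions:
# Sketch: an S3 gateway endpoint whose policy only allows the data lake bucket.
# aws_vpc.main and the region are placeholder assumptions.
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowDataLakeBucketOnly"
      Effect    = "Allow"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        "arn:aws:s3:::my-secure-data-lake",
        "arn:aws:s3:::my-secure-data-lake/*"
      ]
    }]
  })
}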
🔐 KMS Decryption Fails for S3 Object
- Symptom: `GetObject` fails with a KMS access denied error.
- Cause: The IAM role has `s3:GetObject`, but lacks `kms:Decrypt` permission on the specific key used to encrypt the object.
- Solution: Add a statement to the IAM role's policy allowing `kms:Decrypt` on the key's ARN. Also, verify the KMS key policy itself grants this permission to the role or to the account root, allowing IAM policies to take effect.
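🏗️ HCL: Granting kms:Decrypt to the Role (Sketch)
As an illustration, a Terraform sketch that attaches the missing permission; `aws_iam_role.s3_reader` and `aws_kms_key.data_lake` are hypothetical resource names for the role and the bucket's key:
# Sketch: allow the reader role to decrypt objects protected by the bucket's KMS key.
# aws_iam_role.s3_reader and aws_kms_key.data_lake are hypothetical names.
resource "aws_iam_role_policy" "allow_kms_decrypt" {
  name = "allow-s3-kms-decrypt"
  role = aws_iam_role.s3_reader.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DecryptDataLakeKey"
      Effect   = "Allow"
      Action   = ["kms:Decrypt"]
      Resource = aws_kms_key.data_lake.arn
    }]
  })
}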
🔑 Expert-Level S3 Governance Best Practices
- Always Enable Block Public Access: Enable it at the account level and on every bucket containing sensitive data; there are very few legitimate reasons to disable it (see the Terraform sketch after this list).
- Use Bucket Policies as Your Primary Control: Prefer strict, resource-based bucket policies over broad, identity-based IAM policies for data access control.
- Enforce Encryption Everywhere: Use a bucket policy to deny any unencrypted uploads (`s3:x-amz-server-side-encryption` condition) and enforce TLS (`aws:SecureTransport` condition).
- Automate Data Lifecycle: Don't rely on manual cleanup. Implement comprehensive S3 Lifecycle Policies to manage data retention and control storage costs automatically.
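🏗️ HCL: Baseline Guardrails with Terraform (Sketch)
A minimal sketch of the account-wide and bucket-level Block Public Access settings plus default at-rest encryption, complementing the bucket-policy denies shown earlier; `aws_kms_key.data_lake` is a hypothetical key resource:
# Sketch: baseline governance guardrails for the data lake bucket.
# aws_kms_key.data_lake is a hypothetical key resource.
resource "aws_s3_account_public_access_block" "account" {
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_public_access_block" "data_lake" {
  bucket                  = aws_s3_bucket.data_lake.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data_lake.arn
    }
  }
}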