Chapter 58: Compliance Automation & Continuous Assurance¶
Chapter Goal
Transform compliance from a painful quarterly fire drill into a continuous, automated, evidence-rich program. By the end of this chapter, you will have the patterns, tools, and code to build a compliance-as-code pipeline that auditors love and engineers stop avoiding.
Prerequisites
- Chapter 13: Security Governance, Privacy & Risk -- governance foundations
- Chapter 29: Vulnerability Management -- vuln evidence feeds compliance
- Chapter 35: DevSecOps Pipeline -- CI/CD gates
- Chapter 56: Privacy Engineering -- privacy-by-design and DPIA
58.1 The Compliance Automation Problem¶
Ask any security engineer what they hate most about their job and "compliance" will rank in the top three. Ask any CISO what keeps them awake at night and "the auditor is arriving next Tuesday" will get a laugh of recognition. Ask any auditor what they see when they visit clients and they will tell you the same story: frantic screenshot collection, stale spreadsheets, missing evidence, controls that exist on paper but not in reality, and a team that will not sleep until the engagement ends.
This chapter is a rejection of that entire model.
Compliance done right is not an annual event. It is not a team of humans manually screenshotting AWS consoles. It is not a SharePoint folder full of Word documents no one has read in two years. It is a living, breathing, automated system that continuously proves your controls are working, collects evidence without human effort, blocks non-compliant changes before they ship, and produces audit-ready reports on demand.
This is compliance automation and continuous assurance. It is the difference between passing an audit and being audit-ready every single day.
58.1.1 Manual vs Automated Compliance¶
Manual compliance is the default state in most organizations. The pattern looks something like this: an auditor sends a sample request for, say, 25 user access reviews from the last quarter. A poor GRC analyst opens tickets, chases managers, takes screenshots of Okta, copies them into a document, uploads that document to a SharePoint folder, and emails a link to the auditor. This cycle repeats for every control, every audit, every year.
Automated compliance changes every step. The evidence is collected continuously by a system that queries the Okta API nightly. The review metadata is stored in an immutable evidence locker. The auditor has read-only access to a portal that shows every access review performed in the last 12 months, with timestamps, approver identity, and cryptographic hash. There is no chase. There is no screenshot. There is no human in the critical path.
| Dimension | Manual Compliance | Automated Compliance |
|---|---|---|
| Evidence collection | Human screenshots, manual queries | API-driven, continuous, scheduled |
| Evidence storage | SharePoint, email, local drives | Versioned evidence locker with hash chain |
| Control testing | Quarterly or annual | Every commit, every deploy, every hour |
| Drift detection | Discovered during audit | Detected within minutes of occurrence |
| Remediation | Manual ticket, chased by GRC | Auto-remediation or auto-ticket |
| Auditor experience | Request-response cycle, delays | Self-service portal, read-only access |
| Cost per audit | 400-1000 hours of GRC + engineer time | 40-80 hours of audit coordination |
| Confidence in controls | "We think we are compliant" | "We know we are compliant, here is the proof" |
The Compliance Theater Trap
Many organizations automate the reporting of compliance without automating the underlying control. This is compliance theater. A beautiful dashboard that reports "100% of servers patched" based on a CMDB that has not been updated since 2023 is worse than no dashboard at all. Automation must start from the source of truth (the actual infrastructure), not from the compliance spreadsheet.
58.1.2 Continuous Compliance vs Point-in-Time Audits¶
A traditional SOC 2 Type II audit covers a window, typically 6 or 12 months. The auditor samples evidence across that window and forms an opinion about whether your controls operated effectively during the period. The opinion is binary (clean opinion or qualified opinion) and it is issued months after the period ends. By the time the report is in the CEO's hands, half your environment has changed.
Continuous compliance inverts this. Instead of sampling evidence at the end, the system collects evidence continuously throughout the period. Instead of a single binary opinion, you have a time series of control states. Instead of "we passed SOC 2 Type II" you can say "here is a second-by-second history of every control, and here are the 14 minutes in March when MFA was briefly disabled on a dev account due to a misconfigured Terraform apply, and here is the automated remediation that restored it."
Continuous compliance is not a replacement for external audits. Auditors still need to issue the opinion. But it transforms the audit from an evidence collection exercise into a validation exercise. The auditor spot-checks the automated pipeline rather than sampling the underlying controls.
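The "time series of control states" idea is concrete enough to sketch: given timestamped compliant/non-compliant events for one control, you can derive the non-compliant windows and total downtime. A minimal illustration in Python (the event shape here is hypothetical):

```python
from datetime import datetime, timedelta

def non_compliant_windows(events, period_end):
    """Given (timestamp, compliant) events sorted by time, return
    [(start, end), ...] windows during which the control was failing."""
    windows = []
    failing_since = None
    for ts, compliant in sorted(events):
        if not compliant and failing_since is None:
            failing_since = ts  # control just degraded
        elif compliant and failing_since is not None:
            windows.append((failing_since, ts))  # control restored
            failing_since = None
    if failing_since is not None:  # still failing at period end
        windows.append((failing_since, period_end))
    return windows

events = [
    (datetime(2025, 3, 1, 0, 0), True),
    (datetime(2025, 3, 12, 9, 0), False),   # MFA disabled by a bad apply
    (datetime(2025, 3, 12, 9, 14), True),   # auto-remediation restored it
]
windows = non_compliant_windows(events, datetime(2025, 4, 1))
downtime = sum((end - start for start, end in windows), timedelta())
print(downtime)  # → 0:14:00
```

This is exactly the "14 minutes in March" answer: an interval list per control, rather than a single pass/fail bit per audit period.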
58.1.3 Integration with GRC Platforms¶
Governance, Risk, and Compliance (GRC) platforms have traditionally been document repositories with workflow engines bolted on. Modern GRC platforms are evolving into control orchestration layers that pull evidence from technical systems via API and push it to auditor portals.
| GRC Platform | Strengths | Weaknesses | Best For |
|---|---|---|---|
| ServiceNow GRC | Deep ITSM integration, workflow depth | Heavy, expensive, slow to configure | Large enterprises with existing ServiceNow |
| Archer (RSA) | Mature risk modeling, extensive frameworks | Legacy UX, licensing complexity | Regulated industries, financial services |
| Drata | SOC 2 / ISO focused, strong integrations | Narrow framework coverage, SMB focus | Startups and SMBs seeking SOC 2 |
| Vanta | Fast time-to-value, good for startups | Less flexibility for custom controls | Pre-IPO tech companies |
| AuditBoard | Internal audit focus, SOX depth | Less cloud-native integration depth | Public companies with SOX obligations |
| Hyperproof | Multi-framework harmonization | Younger platform, smaller ecosystem | Mid-market seeking multiple frameworks |
| LogicGate | Flexible risk-based workflows | Requires configuration expertise | Custom compliance program builders |
| Open-source (Eramba, Comp-Track) | Free, customizable | Requires internal engineering | Cost-constrained orgs with engineering capacity |
GRC Platform Selection Heuristic
Choose your GRC platform based on your integration story, not your framework story. Every platform supports SOC 2. Not every platform has a native integration with your cloud provider, your SIEM, your IAM, and your ticketing system. The integrations are the automation. Without them, the GRC platform is just a fancier SharePoint.
58.2 Policy-as-Code Fundamentals¶
Policy-as-code is the practice of expressing policy rules in machine-readable, version-controlled, testable code rather than in prose documents. It is to compliance what infrastructure-as-code is to operations: the difference between artisanal hand-crafted exceptions and reproducible automated enforcement.
58.2.1 Why Policy-as-Code?¶
Consider a simple policy: "All S3 buckets must have encryption enabled." In a prose world, this lives in a PDF called "Cloud Security Policy v3.2" that 40 people signed off on in 2024 and no one has read since. When a developer creates a new bucket, they may or may not remember the policy. Enforcement happens months later, if at all, during a compliance review.
In a policy-as-code world, the same rule exists as a Rego policy evaluated at deploy time. Every single bucket creation passes through the policy engine. If encryption is not enabled, the Terraform plan fails. The developer sees the failure in CI within 60 seconds, fixes the code, and the bucket ships encrypted. The policy cannot be forgotten because the policy is the gate.
Core benefits of policy-as-code:
- Deterministic enforcement -- the same input always produces the same decision
- Version control -- policies live in Git, with history, review, and rollback
- Testability -- you can unit test your policies with known inputs
- Reusability -- policies compose and share across teams
- Auditability -- every decision can be logged with the exact policy version that made it
- Shift-left -- enforcement moves from production to development
- Human-readable -- when Rego is written well, it reads like English
58.2.2 Open Policy Agent (OPA) and Rego¶
OPA is the CNCF-graduated general-purpose policy engine that has become the de facto standard for cloud-native policy-as-code. Rego is its policy language, a declarative language derived from Datalog and optimized for querying structured data.
Rego Policy 1: Enforce S3 Encryption
# package: compliance.s3.encryption
# Policy: All S3 buckets in Terraform plans must have server-side
# encryption enabled with KMS or AES256.
package compliance.s3.encryption
import future.keywords.if
import future.keywords.in
# Default deny -- policies are opt-in allow
default allow := false
# Collect all S3 bucket resources from the Terraform plan
s3_buckets[resource] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.actions[_] == "create"
}
# Check each bucket has encryption
bucket_has_encryption(bucket) if {
    enc := input.resource_changes[_]
    enc.type == "aws_s3_bucket_server_side_encryption_configuration"
    # join on bucket name; `id` is unknown at plan time for a new bucket
    enc.change.after.bucket == bucket.change.after.bucket
    algo := enc.change.after.rule[_].apply_server_side_encryption_by_default.sse_algorithm
    algo in {"aws:kms", "AES256"}
}
# Deny bucket creation if encryption missing
deny[msg] {
    bucket := s3_buckets[_]
    not bucket_has_encryption(bucket)
    msg := sprintf(
        "S3 bucket '%s' must have server-side encryption enabled (KMS or AES256)",
        [bucket.address]
    )
}
# Allow only if no deny messages
allow if {
    count(deny) == 0
}
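If you want to sanity-check this logic without standing up OPA, the same rule can be mirrored in plain Python over the plan JSON produced by `terraform show -json`. This is a sketch for local testing, not a replacement for the Rego gate; the helper name and simplified matching are mine:

```python
APPROVED_ALGOS = {"aws:kms", "AES256"}

def s3_encryption_violations(plan: dict) -> list:
    """Pure-Python mirror of the Rego policy: flag created S3 buckets that
    have no associated server-side encryption configuration resource."""
    changes = plan.get("resource_changes", [])
    encrypted_buckets = set()
    for rc in changes:
        if rc["type"] != "aws_s3_bucket_server_side_encryption_configuration":
            continue
        after = rc["change"].get("after") or {}
        for rule in after.get("rule", []):
            algo = rule.get("apply_server_side_encryption_by_default", {}).get("sse_algorithm")
            if algo in APPROVED_ALGOS:
                encrypted_buckets.add(after.get("bucket"))
    violations = []
    for rc in changes:
        if rc["type"] == "aws_s3_bucket" and "create" in rc["change"]["actions"]:
            name = (rc["change"].get("after") or {}).get("bucket") or rc.get("address")
            if name not in encrypted_buckets:
                violations.append(f"S3 bucket '{name}' must have server-side encryption enabled")
    return violations

plan = {"resource_changes": [
    {"type": "aws_s3_bucket", "address": "aws_s3_bucket.logs",
     "change": {"actions": ["create"], "after": {"bucket": "logs"}}},
]}
print(s3_encryption_violations(plan))  # → ["S3 bucket 'logs' must have server-side encryption enabled"]
```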
Rego Policy 2: Kubernetes Pod Security
# package: kubernetes.admission.podsecurity
# Policy: Pods must not run as root, must have resource limits,
# must not use host network, and must pull only from approved registries.
package kubernetes.admission.podsecurity
import future.keywords.if
import future.keywords.in
import future.keywords.contains
approved_registries := {
    "registry.example.com",
    "ghcr.io/example-org",
    "public.ecr.aws/example",
}
deny contains msg if {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    container.securityContext.runAsUser == 0
    msg := sprintf("Container '%s' must not run as root (UID 0)", [container.name])
}
deny contains msg if {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("Container '%s' missing memory limit", [container.name])
}
deny contains msg if {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not container.resources.limits.cpu
    msg := sprintf("Container '%s' missing CPU limit", [container.name])
}
deny contains msg if {
    input.request.kind.kind == "Pod"
    input.request.object.spec.hostNetwork == true
    msg := "Host network usage is forbidden"
}
deny contains msg if {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    image := container.image
    not image_from_approved_registry(image)
    msg := sprintf(
        "Container '%s' uses image '%s' from unapproved registry",
        [container.name, image]
    )
}
image_from_approved_registry(image) if {
    registry := approved_registries[_]
    # require a trailing "/" so a prefix like "ghcr.io/example-org-evil" cannot match
    startswith(image, concat("", [registry, "/"]))
}
Rego Policy 3: IAM Least Privilege
# package: compliance.iam.leastprivilege
# Policy: IAM policies must not grant wildcard actions on sensitive services
# or wildcard resources for sensitive actions.
package compliance.iam.leastprivilege
import future.keywords.if
import future.keywords.in
sensitive_services := {"iam", "kms", "secretsmanager", "organizations", "sts"}
dangerous_actions := {
    "iam:PassRole",
    "iam:CreateAccessKey",
    "iam:AttachUserPolicy",
    "kms:Decrypt",
    "secretsmanager:GetSecretValue",
}
# Note: these rules assume Action and Resource are JSON arrays; IAM also
# permits bare strings, which should be normalized before evaluation.
deny[msg] {
    statement := input.PolicyDocument.Statement[_]
    statement.Effect == "Allow"
    action := statement.Action[_]
    action == "*"
    resource := statement.Resource[_]
    resource == "*"
    msg := "Policy grants Action:* on Resource:* -- this is admin equivalent and forbidden"
}
deny[msg] {
    statement := input.PolicyDocument.Statement[_]
    statement.Effect == "Allow"
    action := statement.Action[_]
    parts := split(action, ":")
    service := parts[0]
    service in sensitive_services
    parts[1] == "*"
    msg := sprintf(
        "Policy grants %s:* -- sensitive service requires explicit action list",
        [service]
    )
}
deny[msg] {
    statement := input.PolicyDocument.Statement[_]
    statement.Effect == "Allow"
    action := statement.Action[_]
    action in dangerous_actions
    resource := statement.Resource[_]
    resource == "*"
    msg := sprintf(
        "Dangerous action '%s' must be scoped to specific resource ARN, not '*'",
        [action]
    )
}
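One quirk worth knowing: in real IAM policy JSON, `Statement`, `Action`, and `Resource` may each be a bare string rather than a list. A small pre-normalizer in Python (a sketch; the function names are illustrative) handles both forms before applying the same checks:

```python
SENSITIVE_SERVICES = {"iam", "kms", "secretsmanager", "organizations", "sts"}

def as_list(value):
    """IAM JSON allows a scalar wherever a list is expected; normalize."""
    return value if isinstance(value, list) else [value]

def iam_violations(policy_doc: dict) -> list:
    findings = []
    for stmt in as_list(policy_doc.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            findings.append("Action:* on Resource:* is admin-equivalent and forbidden")
        for action in actions:
            service = action.split(":", 1)[0]
            if service in SENSITIVE_SERVICES and action.endswith(":*"):
                findings.append(f"{action} -- sensitive service requires explicit action list")
    return findings

# A bare-string statement that the array-only Rego rules would silently skip:
doc = {"Statement": {"Effect": "Allow", "Action": "kms:*", "Resource": "*"}}
print(iam_violations(doc))  # flags kms:*
```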
58.2.3 AWS Config Rules¶
AWS Config provides a managed policy engine for AWS resources. Config rules can be AWS-managed (pre-built) or custom (Lambda-based or Guard-based). Custom rules give you the flexibility to encode organization-specific policy.
# Lambda-backed AWS Config rule: Detect EC2 instances without required tags
import boto3
import json
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter", "DataClassification"}
def lambda_handler(event, context):
    invoking_event = json.loads(event["invokingEvent"])
    config_item = invoking_event["configurationItem"]
    if config_item["resourceType"] != "AWS::EC2::Instance":
        return put_evaluation(event, config_item, "NOT_APPLICABLE",
                              "Not an EC2 instance")
    # configurationItem delivers tags as a {key: value} dict
    tags = config_item.get("tags") or {}
    missing = REQUIRED_TAGS - set(tags.keys())
    if missing:
        return put_evaluation(
            event, config_item, "NON_COMPLIANT",
            f"Missing required tags: {sorted(missing)}"
        )
    if tags.get("DataClassification") not in {"Public", "Internal", "Confidential", "Restricted"}:
        return put_evaluation(
            event, config_item, "NON_COMPLIANT",
            f"DataClassification '{tags.get('DataClassification')}' is invalid"
        )
    return put_evaluation(event, config_item, "COMPLIANT",
                          "All required tags present with valid values")

def put_evaluation(event, config_item, compliance_type, annotation):
    client = boto3.client("config")
    client.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": config_item["resourceType"],
            "ComplianceResourceId": config_item["resourceId"],
            "ComplianceType": compliance_type,
            "Annotation": annotation[:256],
            "OrderingTimestamp": config_item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
    return {"compliance": compliance_type, "annotation": annotation}
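The tag rules are easier to unit test when factored out of the Lambda handler into a pure function with no AWS client dependency. A refactoring sketch:

```python
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter", "DataClassification"}
VALID_CLASSIFICATIONS = {"Public", "Internal", "Confidential", "Restricted"}

def evaluate_tags(tags: dict) -> tuple:
    """Return (compliance_type, annotation) for a resource's tag dict.
    Pure function: no boto3, so it can be unit tested directly."""
    missing = REQUIRED_TAGS - set(tags)
    if missing:
        return "NON_COMPLIANT", f"Missing required tags: {sorted(missing)}"
    if tags["DataClassification"] not in VALID_CLASSIFICATIONS:
        return "NON_COMPLIANT", f"DataClassification '{tags['DataClassification']}' is invalid"
    return "COMPLIANT", "All required tags present with valid values"

print(evaluate_tags({"Environment": "prod", "Owner": "team-a",
                     "CostCenter": "cc-1", "DataClassification": "Internal"}))
# → ('COMPLIANT', 'All required tags present with valid values')
```

The Lambda handler then shrinks to parsing the event, calling `evaluate_tags`, and reporting the result via `put_evaluations`.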
58.2.4 Azure Policy¶
Azure Policy provides native policy enforcement for Azure resources. Policies are defined in JSON and assigned at management group, subscription, or resource group scope.
{
  "properties": {
    "displayName": "Storage accounts must require HTTPS and TLS 1.2+",
    "description": "Enforces secure transfer and minimum TLS 1.2 on all storage accounts.",
    "mode": "All",
    "metadata": {
      "category": "Storage",
      "version": "1.0.0"
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Storage/storageAccounts"
          },
          {
            "anyOf": [
              {
                "field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
                "notEquals": "true"
              },
              {
                "field": "Microsoft.Storage/storageAccounts/minimumTlsVersion",
                "notIn": ["TLS1_2", "TLS1_3"]
              }
            ]
          }
        ]
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}
58.2.5 Kubernetes Gatekeeper¶
Gatekeeper is the OPA-based admission controller for Kubernetes. It lets you enforce Rego policies at the API server level, blocking non-compliant resources before they enter the cluster.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner-and-env
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner", "environment", "cost-center"]
58.2.6 Terraform Compliance¶
Terraform compliance testing can happen at multiple layers: terraform validate for syntax, tflint for best practices, tfsec / checkov for security, and OPA conftest for organizational policy.
# conftest workflow: evaluate Terraform plan against Rego policies
terraform init
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test --policy ./policies/compliance tfplan.json
58.3 Continuous Compliance Monitoring¶
Policy-as-code enforces policy at change time. Continuous monitoring detects drift after deployment: when someone bypasses the pipeline, when a resource is modified in the console, or when a previously compliant control degrades over time.
58.3.1 Drift Detection¶
Drift is any divergence between the declared state (your IaC) and the actual state (your cloud). Drift can be benign (someone added a tag in the console) or catastrophic (someone disabled MFA on the root account).
flowchart LR
A[Declared State<br/>Terraform/GitOps] -->|Deploy| B[Actual State<br/>Cloud APIs]
B -->|Periodic Query| C[Drift Detector]
A -->|Compare| C
C -->|Drift Found| D{Drift Type}
D -->|Benign| E[Auto-reconcile<br/>Update IaC]
D -->|Security| F[Alert + Auto-remediate]
D -->|Manual Change| G[Create Ticket<br/>Require Approval]
F --> H[Evidence Locker]
G --> H
E --> H
# Drift detector for S3 bucket encryption
import boto3
import json
import hashlib
from datetime import datetime, timezone
def detect_s3_encryption_drift(expected_config: dict, evidence_bucket: str):
    s3 = boto3.client("s3")
    drift_findings = []
    buckets = s3.list_buckets()["Buckets"]
    for bucket in buckets:
        bucket_name = bucket["Name"]
        expected = expected_config.get(bucket_name)
        if not expected:
            drift_findings.append({
                "type": "UNTRACKED_BUCKET",
                "bucket": bucket_name,
                "severity": "MEDIUM",
                "message": "Bucket exists but not declared in IaC",
            })
            continue
        try:
            actual = s3.get_bucket_encryption(Bucket=bucket_name)
            rules = actual["ServerSideEncryptionConfiguration"]["Rules"]
            algo = rules[0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]
        except s3.exceptions.ClientError:
            drift_findings.append({
                "type": "ENCRYPTION_DISABLED",
                "bucket": bucket_name,
                "severity": "CRITICAL",
                "message": "Expected encryption enabled, found none",
            })
            continue
        if algo != expected["encryption_algorithm"]:
            drift_findings.append({
                "type": "ENCRYPTION_ALGO_MISMATCH",
                "bucket": bucket_name,
                "severity": "HIGH",
                "expected": expected["encryption_algorithm"],
                "actual": algo,
            })
    # Store evidence
    evidence = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "control": "CC-6.7-S3-Encryption",
        "findings_count": len(drift_findings),
        "findings": drift_findings,
    }
    evidence_bytes = json.dumps(evidence, sort_keys=True).encode()
    evidence_hash = hashlib.sha256(evidence_bytes).hexdigest()
    key = f"drift/s3-encryption/{evidence['timestamp']}-{evidence_hash[:8]}.json"
    s3.put_object(
        Bucket=evidence_bucket,
        Key=key,
        Body=evidence_bytes,
        ContentType="application/json",
        Metadata={"evidence-hash": evidence_hash},
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime(2033, 1, 1, tzinfo=timezone.utc),
    )
    return drift_findings
58.3.2 Real-Time Compliance Dashboards¶
Dashboards should show control state over time, not just current state. A green tile that says "100% compliant" is less useful than a time-series graph showing the last 90 days of compliance percentage.
Dashboard Design Principles
- Lead with trends, not snapshots -- time series over point-in-time
- Show the worst, not the average -- min over last 30 days matters more than mean
- Drill-downs must be one click -- from aggregate to specific non-compliant resource
- Attribute everything -- every non-compliant item should have an owner
- Age matters -- how long has this been non-compliant?
- Segregate by severity -- CRITICAL / HIGH / MEDIUM / LOW
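The "show the worst" and "age matters" principles reduce to simple aggregations over stored compliance snapshots. A sketch in Python (the data shapes are hypothetical):

```python
from datetime import date, timedelta

def dashboard_metrics(daily_pct: dict, noncompliant_since: dict, today: date) -> dict:
    """daily_pct: {date: compliance %}; noncompliant_since: {resource: first-seen date}."""
    window = [pct for d, pct in daily_pct.items() if d >= today - timedelta(days=30)]
    ages = {res: (today - seen).days for res, seen in noncompliant_since.items()}
    return {
        "worst_30d_pct": min(window),           # lead with the worst, not the mean
        "current_pct": daily_pct[max(daily_pct)],
        "oldest_finding_days": max(ages.values()) if ages else 0,
    }

metrics = dashboard_metrics(
    {date(2025, 6, 1): 99.2, date(2025, 6, 15): 94.0, date(2025, 6, 30): 100.0},
    {"s3://legacy-logs": date(2025, 5, 20)},
    date(2025, 6, 30),
)
print(metrics)  # worst_30d_pct 94.0, current_pct 100.0, oldest_finding_days 41
```

A dashboard that leads with `worst_30d_pct` and `oldest_finding_days` is harder to game than one that leads with today's green percentage.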
58.3.3 Automated Remediation Workflows¶
Remediation automation must be tiered by risk. Auto-remediate trivial drift. Auto-ticket medium drift. Alert humans for critical drift.
# Tiered remediation dispatcher
def remediate(finding: dict):
    severity = finding["severity"]
    finding_type = finding["type"]
    auto_remediable = {
        ("LOW", "MISSING_TAG"): add_missing_tag,
        ("LOW", "UNENCRYPTED_VOLUME_DEFAULT"): enable_default_encryption,
        ("MEDIUM", "PUBLIC_S3_BLOCK_DISABLED"): enable_public_access_block,
        ("MEDIUM", "MFA_NOT_ENFORCED_GROUP"): enforce_mfa_group,
    }
    requires_approval = {
        ("HIGH", "IAM_WILDCARD_POLICY"),
        ("HIGH", "SECURITY_GROUP_0_0_0_0_0"),
    }
    key = (severity, finding_type)
    if key in auto_remediable:
        handler = auto_remediable[key]
        result = handler(finding)
        log_remediation(finding, "AUTO", result)
        return result
    if key in requires_approval:
        ticket = create_approval_ticket(finding)
        page_on_call_if_critical(finding)
        log_remediation(finding, "TICKETED", ticket)
        return ticket
    if severity == "CRITICAL":
        page_on_call(finding)
        create_incident(finding)
        log_remediation(finding, "INCIDENT", None)
    else:
        create_jira_ticket(finding)
        log_remediation(finding, "TICKETED", None)
58.4 Regulatory Framework Mapping¶
The hardest problem in compliance is not any single framework. It is running six frameworks simultaneously without doing six times the work. Framework mapping is the practice of maintaining a single control library in which each internal control maps upward to the multiple external requirements it satisfies.
58.4.1 The Unified Control Framework Pattern¶
Instead of maintaining SOC 2 controls, ISO 27001 controls, and NIST controls as separate libraries, maintain a single library of internal controls and map each control to the external requirements it satisfies.
| Internal Control ID | Description | SOC 2 CC | ISO 27001 | NIST 800-53 | PCI-DSS 4.0 | HIPAA | GDPR |
|---|---|---|---|---|---|---|---|
| IC-AC-01 | MFA enforced for all users | CC6.1 | A.9.4.2 | IA-2(1), IA-2(2) | 8.4.2 | 164.312(a)(2)(i) | Art 32 |
| IC-AC-02 | Privileged access reviewed quarterly | CC6.2 | A.9.2.5 | AC-6(7) | 7.2.4 | 164.308(a)(4) | Art 32 |
| IC-CM-01 | Encryption at rest for production data | CC6.7 | A.10.1.1 | SC-28 | 3.5 | 164.312(a)(2)(iv) | Art 32 |
| IC-CM-02 | Encryption in transit (TLS 1.2+) | CC6.7 | A.13.2.3 | SC-8 | 4.2.1 | 164.312(e)(1) | Art 32 |
| IC-CH-01 | Changes reviewed before production | CC8.1 | A.14.2.2 | CM-3 | 6.5.1 | 164.308(a)(1) | Art 32 |
| IC-IR-01 | Incident response plan tested annually | CC7.3 | A.16.1.5 | IR-3 | 12.10.2 | 164.308(a)(6) | Art 33 |
| IC-LO-01 | Security logs retained 12+ months | CC7.2 | A.12.4.1 | AU-11 | 10.5.1 | 164.312(b) | Art 30 |
| IC-VM-01 | Vulnerability scanning weekly | CC7.1 | A.12.6.1 | RA-5 | 11.3.1 | 164.308(a)(1)(ii)(A) | Art 32 |
| IC-DP-01 | Data classification enforced | CC6.1 | A.8.2 | RA-2 | 3.4 | 164.308(a)(1) | Art 30 |
| IC-BC-01 | Backups tested quarterly | A1.2 | A.17.1.3 | CP-9(1) | 9.4.1 | 164.308(a)(7) | Art 32 |
58.4.2 Crosswalk Automation¶
Maintaining these mappings by hand is a losing battle. Automate with a structured control library in YAML or JSON:
# controls/IC-CM-01-encryption-at-rest.yaml
id: IC-CM-01
name: Encryption at Rest for Production Data
description: |
  All production data stores (databases, object storage, block storage,
  backups) must be encrypted at rest using approved algorithms (AES-256,
  KMS-managed or HSM-managed keys).
owner: security-engineering@example.com
testing_frequency: continuous
automation_status: fully-automated
implementation:
  - AWS: s3-encryption-enabled Config rule + KMS CMK required
  - Azure: encryption-at-rest Azure Policy
  - GCP: CMEK required via Organization Policy
mappings:
  soc2:
    - CC6.7
  iso27001_2022:
    - "A.8.24"  # Use of cryptography
    - "A.5.33"  # Protection of records
  nist_800_53_r5:
    - SC-28
    - SC-28(1)
  pci_dss_v4:
    - "3.5.1"
    - "3.5.1.1"
  hipaa:
    - "164.312(a)(2)(iv)"
  gdpr:
    - "Article 32"
  fedramp_moderate:
    - SC-28
evidence_sources:
  - aws_config_rule: encrypted-volumes
  - aws_config_rule: s3-bucket-server-side-encryption-enabled
  - azure_policy: storage-account-encryption
  - custom_script: scripts/evidence/encryption_inventory.py
tests:
  - id: T-IC-CM-01-01
    description: Query all S3 buckets and confirm encryption enabled
    automation: scripts/tests/s3_encryption_test.py
    frequency: daily
  - id: T-IC-CM-01-02
    description: Query all RDS instances and confirm storage encrypted
    automation: scripts/tests/rds_encryption_test.py
    frequency: daily
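Given controls stored in this shape, the crosswalk is a mechanical inversion: from control-to-framework mappings, derive framework-requirement-to-controls, which answers the auditor question "show me everything that satisfies PCI 3.5.1." A sketch over an in-memory library (the same structure as the YAML above, loaded however you prefer):

```python
from collections import defaultdict

def build_crosswalk(controls: list) -> dict:
    """Invert control -> framework mappings into
    (framework, requirement) -> [control ids]."""
    crosswalk = defaultdict(list)
    for control in controls:
        for framework, requirements in control["mappings"].items():
            for req in requirements:
                crosswalk[(framework, req)].append(control["id"])
    return dict(crosswalk)

controls = [
    {"id": "IC-CM-01", "mappings": {"soc2": ["CC6.7"], "pci_dss_v4": ["3.5.1"]}},
    {"id": "IC-CM-02", "mappings": {"soc2": ["CC6.7"], "nist_800_53_r5": ["SC-8"]}},
]
xwalk = build_crosswalk(controls)
print(xwalk[("soc2", "CC6.7")])  # → ['IC-CM-01', 'IC-CM-02']
```

The same inversion also exposes coverage gaps: any framework requirement you claim in scope that appears in no control's mappings has no implementing control.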
58.4.3 Framework-Specific Considerations¶
Each framework has quirks that automation must respect:
GDPR
GDPR is not primarily a security framework -- it is a privacy and data protection framework. Compliance automation for GDPR is heavily about data mapping, consent management, subject rights automation, and DPIA tracking. See Chapter 56: Privacy Engineering for depth.
HIPAA
HIPAA compliance hinges on the Business Associate Agreement (BAA) and the scope of Protected Health Information (PHI). Automation must know which systems process PHI. Tag every resource with data_classification=PHI where applicable.
PCI-DSS v4.0
PCI-DSS v4.0 introduced the concept of "customized approach" which allows alternative implementations if you can prove equivalent risk reduction. This requires a Targeted Risk Analysis (TRA) document per customized control. Automate the TRA tracking.
SOX
SOX ITGC testing is narrower than most frameworks -- focus on financial reporting systems only. Scope carefully. A SOX auditor does not care about your marketing website.
FedRAMP
FedRAMP Moderate requires 325 controls from NIST 800-53. FedRAMP High requires 421. The continuous monitoring burden is real -- monthly POA&M updates, monthly vulnerability scans, annual reassessment. Automation is not optional at FedRAMP scale.
58.5 Compliance-as-Code Pipelines¶
The compliance pipeline is a CI/CD pattern where policy evaluation gates code promotion. See also Chapter 35: DevSecOps Pipeline.
58.5.1 Pipeline Architecture¶
flowchart TB
A[Developer commits Terraform] --> B[CI triggered]
B --> C[terraform fmt/validate]
C --> D[tflint]
D --> E[tfsec / checkov]
E --> F[terraform plan]
F --> G[conftest: OPA policies]
G --> H{All gates pass?}
H -->|No| I[Block merge<br/>Post PR comment]
H -->|Yes| J[Human review]
J -->|Approved| K[terraform apply]
K --> L[Post-apply drift check]
L --> M[Evidence locker]
M --> N[Compliance dashboard update]
I -.->|Developer fixes| A
58.5.2 GitHub Actions Example¶
name: compliance-pipeline
on:
  pull_request:
    paths:
      - 'terraform/**'
      - 'kubernetes/**'
jobs:
  compliance-gates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.0
      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init -backend=false
      - name: Terraform Validate
        working-directory: ./terraform
        run: terraform validate
      - name: tflint
        uses: terraform-linters/setup-tflint@v4
      - run: tflint --recursive
      - name: tfsec
        uses: aquasecurity/tfsec-action@v1.0.3
        with:
          soft_fail: false
      - name: Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: ./terraform
          framework: terraform
          soft_fail: false
      - name: Terraform Plan
        working-directory: ./terraform
        run: |
          terraform plan -out=tfplan.binary
          terraform show -json tfplan.binary > tfplan.json
      - name: OPA Conftest
        uses: instrumenta/conftest-action@master
        with:
          files: terraform/tfplan.json
          policy: policies/compliance
      - name: Post compliance report to PR
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            // assumes an earlier step aggregated gate results into compliance-report.md
            const report = fs.readFileSync('compliance-report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });
58.5.3 Pre-Deployment Gates vs Post-Deployment Monitoring¶
Gates and monitors are complementary, not redundant. Gates catch what they can before production. Monitors catch what gates miss (manual console changes, emergency bypasses, new policies applied retroactively).
| Control Dimension | Pre-Deploy Gate | Post-Deploy Monitor |
|---|---|---|
| Latency | Blocks deploy (minutes) | Detects drift (minutes to hours) |
| Coverage | Code changes only | All changes including manual |
| Enforcement | Hard block | Alert + auto-remediate |
| Performance impact | Adds to CI time | Continuous background load |
| False positive cost | Developer friction | Alert fatigue |
| Primary owner | Platform engineering | Security operations |
58.6 Evidence Collection Automation¶
Evidence is the currency of audit. The volume of evidence a modern audit requires has grown exponentially. Manual collection does not scale.
58.6.1 Evidence Locker Design¶
An evidence locker is an immutable, versioned, hash-chained store of compliance evidence. Critical properties:
- Immutable -- use S3 Object Lock in COMPLIANCE mode, or equivalent WORM storage
- Hash-chained -- each evidence artifact references the hash of the previous, creating an audit trail that detects tampering
- Cryptographically signed -- evidence is signed by the collector service
- Retained -- retention policy aligned with longest applicable audit window (typically 7 years for SOX, indefinitely for some frameworks)
- Indexed -- searchable by control, timestamp, resource, framework
- Access-controlled -- auditors get read-only access, engineers get write-only
# Evidence locker writer with hash chain
import boto3
import hashlib
import json
import uuid
from datetime import datetime, timezone
class EvidenceLocker:
    def __init__(self, bucket: str, kms_key_id: str):
        self.s3 = boto3.client("s3")
        self.kms = boto3.client("kms")
        self.bucket = bucket
        self.kms_key_id = kms_key_id

    def _get_last_hash(self, control_id: str) -> str:
        """Retrieve the hash of the most recent evidence for this control."""
        prefix = f"control/{control_id}/"
        response = self.s3.list_objects_v2(
            Bucket=self.bucket,
            Prefix=prefix,
            MaxKeys=1000,  # NOTE: paginate once a control accumulates >1000 artifacts
        )
        if "Contents" not in response:
            return "0" * 64  # genesis hash
        latest = sorted(response["Contents"], key=lambda o: o["LastModified"])[-1]
        head = self.s3.head_object(Bucket=self.bucket, Key=latest["Key"])
        return head["Metadata"].get("evidence-hash", "0" * 64)

    def _sign(self, digest: bytes) -> str:
        # KMS Sign accepts at most 4 KiB of raw message; sign the SHA-256
        # digest instead so large evidence envelopes are not rejected.
        response = self.kms.sign(
            KeyId=self.kms_key_id,
            Message=digest,
            MessageType="DIGEST",
            SigningAlgorithm="ECDSA_SHA_256",
        )
        return response["Signature"].hex()

    def put_evidence(self, control_id: str, evidence_type: str,
                     payload: dict, collected_by: str) -> dict:
        prev_hash = self._get_last_hash(control_id)
        ts = datetime.now(timezone.utc).isoformat()
        envelope = {
            "evidence_id": str(uuid.uuid4()),
            "control_id": control_id,
            "evidence_type": evidence_type,
            "collected_at": ts,
            "collected_by": collected_by,
            "previous_hash": prev_hash,
            "payload": payload,
        }
        envelope_bytes = json.dumps(envelope, sort_keys=True).encode()
        evidence_hash = hashlib.sha256(envelope_bytes).hexdigest()
        signature = self._sign(hashlib.sha256(envelope_bytes).digest())
        key = f"control/{control_id}/{ts}-{envelope['evidence_id'][:8]}.json"
        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=envelope_bytes,
            ContentType="application/json",
            Metadata={
                "evidence-hash": evidence_hash,
                "previous-hash": prev_hash,
                "signature": signature,
                "control-id": control_id,
            },
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=datetime(2033, 1, 1, tzinfo=timezone.utc),
            ServerSideEncryption="aws:kms",
            SSEKMSKeyId=self.kms_key_id,
        )
        return {
            "key": key,
            "evidence_hash": evidence_hash,
            "previous_hash": prev_hash,
            "signature": signature,
        }
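A hash chain is only as good as its verifier. The sketch below replays the chain over stored envelopes (oldest first) and recomputes each link; it deliberately omits S3 and KMS so the core check is testable in isolation, and the envelope dicts are simplified examples:

```python
import hashlib
import json

def verify_chain(envelopes: list) -> bool:
    """envelopes: evidence dicts for one control, oldest first. Recompute each
    SHA-256 over canonical JSON and confirm every previous_hash link holds."""
    prev_hash = "0" * 64  # genesis, matching _get_last_hash()
    for env in envelopes:
        if env["previous_hash"] != prev_hash:
            return False  # broken link: tampering, deletion, or reordering upstream
        prev_hash = hashlib.sha256(
            json.dumps(env, sort_keys=True).encode()
        ).hexdigest()
    return True

e1 = {"control_id": "IC-CM-01", "payload": {"ok": True}, "previous_hash": "0" * 64}
e2 = {"control_id": "IC-CM-01", "payload": {"ok": True},
      "previous_hash": hashlib.sha256(json.dumps(e1, sort_keys=True).encode()).hexdigest()}
print(verify_chain([e1, e2]))  # → True
e1["payload"]["ok"] = False    # tamper with history
print(verify_chain([e1, e2]))  # → False: e2's previous_hash no longer matches
```

Note the chain protects history, not the newest entry: the latest artifact has no successor linking to it yet, which is why each envelope is also individually signed.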
58.6.2 Automated Evidence Types¶
| Evidence Type | Source | Frequency | Example Payload |
|---|---|---|---|
| User access review | Okta / AD / IAM | Quarterly | List of users, last login, manager approval |
| Config snapshot | AWS Config, Azure Resource Graph | Daily | Full resource inventory with compliance state |
| Vulnerability scan results | Qualys / Tenable / Wiz | Weekly | CVE list, affected assets, remediation SLA |
| Patch status | OS management agents | Daily | Per-host patch level, last update timestamp |
| MFA enforcement | IdP logs | Continuous | User auth events with MFA factor |
| Backup verification | Backup system | Daily | Last successful backup, restore test results |
| Security training completion | LMS | Quarterly | User, course, completion date, score |
| Change approval records | Change management | Per change | Change ID, approver, CAB meeting minutes |
| Incident response tests | IR platform | Annually | Tabletop minutes, findings, remediation |
| Penetration test | Third party | Annually | Executive summary, findings, remediation |
| Log retention | SIEM | Daily | Index size, retention policy, oldest event |
| Encryption key rotation | KMS | Per event | Key ID, rotation date, previous version |
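A lightweight scheduler can drive these collections. The sketch below decides whether a given evidence type is due again based on its last collection time; the frequency names mirror the table above, but the staleness windows and the `is_due` helper are illustrative assumptions, not part of the chapter's toolchain.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative staleness window per collection frequency (assumed values)
FREQUENCY_WINDOWS = {
    "continuous": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "quarterly": timedelta(days=92),
    "annually": timedelta(days=366),
}

def is_due(frequency: str, last_collected: Optional[datetime],
           now: Optional[datetime] = None) -> bool:
    """Return True when evidence of this type should be collected again."""
    if last_collected is None:
        return True  # never collected: always due
    now = now or datetime.now(timezone.utc)
    return now - last_collected > FREQUENCY_WINDOWS[frequency]
```

A cron-style driver can then iterate the evidence catalog, call `is_due` per row, and dispatch only the collectors that have gone stale.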
58.6.3 Screenshot and Walkthrough Automation¶
Some evidence still requires a visual or procedural walkthrough. Automate with headless browsers:
# Automated evidence screenshot via Playwright
from playwright.sync_api import sync_playwright
import base64
import hashlib
from datetime import datetime, timezone

def capture_control_evidence(url: str, control_id: str, locker: EvidenceLocker):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (compliance-evidence-collector/1.0)",
        )
        page = context.new_page()
        page.goto(url, wait_until="networkidle")
        # Capture full-page screenshot
        screenshot_bytes = page.screenshot(full_page=True, type="png")
        screenshot_hash = hashlib.sha256(screenshot_bytes).hexdigest()
        # Capture the accessibility tree for text-based audit
        snapshot = page.accessibility.snapshot()
        browser.close()
        return locker.put_evidence(
            control_id=control_id,
            evidence_type="screenshot",
            payload={
                "url": url,
                "screenshot_sha256": screenshot_hash,
                "screenshot_bytes_base64": base64.b64encode(screenshot_bytes).decode(),
                "accessibility_tree": snapshot,
                "captured_at": datetime.now(timezone.utc).isoformat(),
            },
            collected_by="automated-evidence-collector",
        )
58.6.4 Audit Trail Integrity¶
The audit trail itself must be tamper-evident. Techniques:
- Hash chain -- each record includes hash of previous (blockchain pattern without the blockchain)
- External timestamping -- hash the daily evidence manifest and submit to an RFC 3161 timestamping authority
- Write-only IAM -- the collector service can write but not delete or modify
- Object Lock -- AWS S3 Object Lock in COMPLIANCE mode prevents even root deletion
- Separate audit account -- evidence locker in a dedicated AWS account with different access controls
- Alerting on deletion attempts -- CloudTrail alert on any DeleteObject or PutObjectLockConfiguration event
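The hash chain above is only useful if someone actually walks it. This sketch recomputes each envelope's hash and checks the `previous_hash` linkage end to end; the `verify_chain` helper and its input shapes are illustrative, assuming envelopes like those written by `put_evidence` earlier in the chapter.

```python
import hashlib
import json

def verify_chain(envelopes: list, stored_hashes: list) -> bool:
    """Verify a hash-chained evidence series for one control.

    envelopes: evidence dicts in collection order, each carrying a
    'previous_hash' field. stored_hashes: the hash recorded alongside
    each envelope (e.g. in S3 object metadata).
    """
    prev = "0" * 64  # genesis hash
    for envelope, stored in zip(envelopes, stored_hashes):
        # Recompute the hash over the canonical serialization
        recomputed = hashlib.sha256(
            json.dumps(envelope, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != stored:
            return False  # payload altered after it was stored
        if envelope["previous_hash"] != prev:
            return False  # linkage broken: record removed or reordered
        prev = recomputed
    return True
```

Running this nightly over each control's series, and alerting on any `False`, turns the chain from a storage convention into an active tamper detector.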
58.7 Control Testing Automation¶
Policy-as-code gates prevent bad changes. Control testing verifies that controls remain effective over time.
58.7.1 Test Types¶
| Test Type | Description | Automation Level |
|---|---|---|
| Presence | Does the control exist? | Fully automated |
| Configuration | Is it configured correctly? | Fully automated |
| Effectiveness | Does it actually work? | Partially automated |
| Coverage | Does it cover all in-scope assets? | Fully automated |
| Exception | Are exceptions documented and approved? | Partially automated |
58.7.2 Effectiveness Testing Example¶
Testing that MFA enforcement is effective requires more than checking the setting. You must verify that authentication without MFA actually fails:
# Synthetic MFA enforcement test
import hashlib
import requests
from datetime import datetime, timezone

def test_mfa_enforcement(idp_url: str, test_user: str, test_password: str) -> dict:
    """
    Synthetic test: attempt authentication WITHOUT MFA token.
    Expected result: HTTP 401 or 403, auth fails.
    If auth succeeds, MFA enforcement is broken.
    Test account: testuser@example.com / REDACTED (isolated test tenant)
    """
    response = requests.post(
        f"{idp_url}/auth",
        json={
            "username": test_user,
            "password": test_password,
            # Deliberately omit MFA token
        },
        timeout=10,
    )
    result = {
        "test": "mfa_enforcement_effectiveness",
        "tested_at": datetime.now(timezone.utc).isoformat(),
        "expected_status": [401, 403],
        "actual_status": response.status_code,
        "response_body_hash": hashlib.sha256(response.content).hexdigest(),
    }
    if response.status_code in {401, 403}:
        result["outcome"] = "PASS"
        result["message"] = "MFA enforcement working -- auth without MFA was rejected"
    elif response.status_code == 200:
        result["outcome"] = "FAIL"
        result["severity"] = "CRITICAL"
        result["message"] = "MFA BYPASS -- authentication succeeded without MFA token"
    else:
        result["outcome"] = "INCONCLUSIVE"
        result["message"] = f"Unexpected status {response.status_code}"
    return result
Effectiveness Tests are Sensitive
Effectiveness tests often involve attempting the malicious action. Run them against isolated test tenants with synthetic accounts (testuser@example.com / REDACTED). Never run them against production user accounts. Tag all synthetic test activity so SOC knows to suppress the alerts.
58.7.3 Gap Analysis¶
Gap analysis compares controls required by a framework against controls implemented. Automate the comparison:
# Gap analysis: Which NIST 800-53 Moderate controls are not covered?
import yaml
from pathlib import Path

def load_control_library(path: Path) -> list[dict]:
    controls = []
    for yaml_file in path.glob("**/*.yaml"):
        with open(yaml_file) as f:
            controls.append(yaml.safe_load(f))
    return controls

def required_controls_nist_moderate() -> set[str]:
    """NIST 800-53 Rev 5 Moderate baseline controls."""
    # Simplified -- full list would have ~290 controls
    return {
        "AC-1", "AC-2", "AC-2(1)", "AC-3", "AC-4", "AC-5", "AC-6",
        "AU-1", "AU-2", "AU-3", "AU-4", "AU-5", "AU-6", "AU-7",
        "CA-1", "CA-2", "CA-3", "CA-5", "CA-6", "CA-7", "CA-9",
        "CM-1", "CM-2", "CM-3", "CM-4", "CM-5", "CM-6", "CM-7",
        "IA-1", "IA-2", "IA-2(1)", "IA-2(2)", "IA-3", "IA-4",
        "IR-1", "IR-2", "IR-3", "IR-4", "IR-5", "IR-6", "IR-7", "IR-8",
        "RA-1", "RA-2", "RA-3", "RA-5", "RA-7",
        "SC-1", "SC-2", "SC-4", "SC-5", "SC-7", "SC-8", "SC-12", "SC-13",
        "SC-28",
        "SI-1", "SI-2", "SI-3", "SI-4", "SI-5", "SI-7", "SI-10",
        # ... hundreds more
    }

def gap_analysis(library_path: Path, framework: str) -> dict:
    controls = load_control_library(library_path)
    required = required_controls_nist_moderate()
    covered = set()
    partial = set()
    for control in controls:
        nist_mappings = control.get("mappings", {}).get("nist_800_53_r5", [])
        if control.get("automation_status") == "fully-automated":
            covered.update(nist_mappings)
        elif control.get("automation_status") == "partial":
            partial.update(nist_mappings)
    gaps = required - covered - partial
    return {
        "framework": framework,
        "required_count": len(required),
        "fully_covered": sorted(covered & required),
        "partially_covered": sorted(partial & required),
        "gaps": sorted(gaps),
        "coverage_pct": round(len(covered & required) / len(required) * 100, 1),
    }
58.8 Compliance Reporting¶
Different audiences need different reports. One size does not fit all.
58.8.1 Executive Dashboard¶
Executives care about risk, trends, and incidents. Five metrics maximum:
- Overall compliance score -- weighted average across frameworks
- Trend -- 90-day direction
- Critical gaps -- count of controls in CRITICAL state
- Audit readiness -- days until next audit + readiness percentage
- Open findings -- count by age bucket (<30d, 30-90d, >90d)
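These five metrics reduce to a few lines of aggregation over data the pipeline already has. In the sketch below, the weighting scheme and the bucket boundaries are illustrative choices, not prescribed values:

```python
from datetime import datetime, timezone

def overall_score(framework_scores: dict, weights: dict) -> float:
    """Weighted average compliance score across frameworks (0-100)."""
    total_weight = sum(weights[f] for f in framework_scores)
    return round(
        sum(framework_scores[f] * weights[f] for f in framework_scores)
        / total_weight, 1
    )

def age_buckets(findings: list, now: datetime) -> dict:
    """Count open findings by age bucket: <30d, 30-90d, >90d."""
    buckets = {"<30d": 0, "30-90d": 0, ">90d": 0}
    for f in findings:
        age = (now - f["opened_at"]).days
        if age < 30:
            buckets["<30d"] += 1
        elif age <= 90:
            buckets["30-90d"] += 1
        else:
            buckets[">90d"] += 1
    return buckets
```

For example, SOC 2 at 90 weighted 2 and ISO 27001 at 80 weighted 1 yields an overall score of 86.7.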
58.8.2 Auditor Report¶
Auditors want evidence and traceability. The auditor portal should expose:
- Control library with current state
- Evidence browser (read-only, full text search)
- Sampling support (random selection across the period)
- Export to PDF / Excel for audit workpapers
- Q&A workflow for auditor requests
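Sampling support benefits from a seeded RNG: if the seed derives from the engagement ID, auditor and auditee can independently regenerate the identical sample. The seed-from-engagement-ID convention and the `audit_sample` helper below are illustrative assumptions, not a standard:

```python
import hashlib
import random

def audit_sample(items: list, sample_size: int, engagement_id: str) -> list:
    """Select a reproducible random sample for one audit engagement.

    Seeding the RNG from the engagement ID makes the selection
    deterministic and independently verifiable by both parties.
    """
    seed = int(hashlib.sha256(engagement_id.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return sorted(rng.sample(items, min(sample_size, len(items))))
```

A typical call would pull 25 access reviews out of the quarter's population, matching the sample-request scenario from the start of the chapter.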
58.8.3 Regulatory Submission¶
Some frameworks require formal submissions:
- FedRAMP -- monthly POA&M (Plan of Action and Milestones), annual SAR (Security Assessment Report)
- PCI-DSS -- annual ROC (Report on Compliance) or SAQ (Self-Assessment Questionnaire)
- HIPAA -- annual risk analysis (45 CFR 164.308(a)(1)(ii)(A))
- SOC 2 -- annual Type II attestation
- ISO 27001 -- three-year certification cycle with annual surveillance audits
Automation can generate the data. Humans still write the narrative.
58.9 Audit Preparation¶
The perfect audit is boring. No surprises, no fire drills, no "we will get back to you." Boring audits require preparation.
58.9.1 Pre-Audit Self-Assessment¶
Run the audit yourself, 30 days before the auditor arrives:
# Pre-audit self-assessment checklist
class PreAuditCheck:
    def __init__(self, framework: str, audit_date: str):
        self.framework = framework
        self.audit_date = audit_date
        self.checks = []

    def run(self):
        self.check_evidence_continuity()
        self.check_control_exceptions_documented()
        self.check_walkthrough_scripts_ready()
        self.check_evidence_access_provisioned()
        self.check_recent_incidents_documented()
        self.check_policy_approvals_current()
        self.generate_report()

    def check_evidence_continuity(self):
        """Verify no gaps in evidence for the audit period."""
        # For each control, ensure evidence exists for every day of audit window
        pass

    def check_control_exceptions_documented(self):
        """Every non-compliant state must have an approved exception."""
        pass

    def check_walkthrough_scripts_ready(self):
        """Process owners have updated their walkthrough scripts."""
        pass

    def check_evidence_access_provisioned(self):
        """Auditor accounts provisioned, 2FA enrolled, access tested."""
        pass

    def check_recent_incidents_documented(self):
        """Incidents in the audit period have post-incident reviews on file."""
        pass

    def check_policy_approvals_current(self):
        """Policies reviewed and re-approved within the required cycle."""
        pass

    def generate_report(self):
        """Summarize check results for the GRC team."""
        pass
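The evidence-continuity check is the one most worth making concrete: given the days on which evidence exists for a control, find the gaps in the audit window. The standalone helper below is one way to sketch it (the function name and input shapes are illustrative):

```python
from datetime import date, timedelta

def evidence_gaps(evidence_days: set, window_start: date,
                  window_end: date) -> list:
    """Return audit-window days with no evidence, as ISO date strings."""
    gaps = []
    day = window_start
    while day <= window_end:
        if day not in evidence_days:
            gaps.append(day.isoformat())
        day += timedelta(days=1)
    return gaps
```

An empty result means the control's evidence series is continuous; anything else is a finding to fix before the auditor finds it for you.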
58.9.2 Walkthrough Scripts¶
For every control, maintain a walkthrough script -- the 5-minute narrative an engineer gives the auditor:
# Walkthrough: IC-AC-01 (MFA Enforcement)
**Control owner**: IAM team (iam-team@example.com)
**Last reviewed**: 2026-01-15
**Typical duration**: 8 minutes
## Scope
All human users accessing production systems via our Okta IdP.
Service accounts excluded (covered by IC-AC-05 separately).
## Walkthrough steps
1. **Open the Okta admin console** (okta.example.com)
2. **Navigate to Security > Authentication > Authentication Policies**
3. **Demonstrate the "Production Access" policy**:
- Show the rule: "Require MFA for any access to apps tagged production"
- Show the app assignments (AWS SSO, GitHub Enterprise, Snowflake, Datadog)
4. **Open evidence locker at evidence.example.com/control/IC-AC-01**
5. **Show 90-day MFA enforcement evidence**:
- Daily samples of authentication events
- Zero non-MFA authentications in window
6. **Show the effectiveness test results**:
- Synthetic MFA-bypass test runs nightly
- Last 90 days: 90/90 PASS (auth without MFA correctly rejected)
7. **Exception handling**:
- Two documented exceptions (testuser-emergency, testuser-breakglass)
- Both require quarterly review (next review 2026-04-30)
## Common auditor questions
**Q**: How do you handle emergency access when MFA fails?
**A**: Break-glass procedure in [runbook link]. Two-person control required.
**Q**: What happens when MFA enforcement breaks?
**A**: The nightly effectiveness test would catch it within 24 hours.
Security team paged. Incident ticket created automatically.
**Q**: Who can modify the authentication policy?
**A**: Only members of the IAM-Admin group (5 members). Changes require
PR review and produce CloudTrail events alerting the security team.
58.9.3 Evidence Readiness¶
Two weeks before the audit, pre-stage evidence in the auditor portal:
- Provision auditor accounts in the evidence locker with read-only access
- Run the full evidence query for the audit period
- Verify no gaps in the time series
- Export the control library crosswalk for the specific framework
- Test the auditor experience end-to-end
- Send access instructions
58.10 Risk-Based Compliance¶
Not every non-compliant finding is equally urgent. Risk-based compliance prioritizes remediation by actual business impact.
58.10.1 Risk Scoring Integration¶
# Risk score = severity x likelihood x asset value x exposure
def calculate_risk_score(finding: dict, asset_registry: dict) -> float:
    severity_weights = {"CRITICAL": 10, "HIGH": 7, "MEDIUM": 4, "LOW": 1}
    exposure_weights = {"INTERNET": 10, "VPN": 5, "INTERNAL": 2, "AIR_GAPPED": 1}
    classification_weights = {"RESTRICTED": 10, "CONFIDENTIAL": 7, "INTERNAL": 3, "PUBLIC": 1}

    asset = asset_registry.get(finding["resource_id"], {})
    severity = severity_weights[finding["severity"]]
    likelihood = finding.get("likelihood_score", 5)  # 1-10
    exposure = exposure_weights.get(asset.get("exposure", "INTERNAL"), 2)
    value = classification_weights.get(asset.get("data_classification", "INTERNAL"), 3)

    raw_score = severity * likelihood * exposure * value
    # Normalize to 0-1000 range
    return min(raw_score, 10000) / 10
58.10.2 Compensating Controls¶
When a primary control cannot be implemented for technical, business, or cost reasons, compensating controls reduce the residual risk to an acceptable level.
Example: a legacy system cannot support MFA. Compensating controls:
- Network isolation (system accessible only from jump host)
- Jump host requires MFA
- All actions on legacy system logged and reviewed daily
- Password length minimum 20 characters, rotated every 30 days
- Session timeout 10 minutes
- Compensating control package approved by CISO, reviewed annually
Document compensating controls formally. Auditors accept them when documented, reject them when verbal.
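To sanity-check whether a package like the one above is sufficient, some teams model each compensating control as a multiplicative risk-reduction factor and compare the result to their acceptance threshold. Everything in this sketch is illustrative arithmetic (the factors, the helper, and any threshold are assumptions, not from this chapter):

```python
def residual_risk(inherent: float, remaining_fractions: list) -> float:
    """Apply compensating controls as multiplicative reductions.

    Each fraction is the share of risk REMAINING after that control
    (e.g. 0.5 means the control halves the risk). Illustrative only --
    real risk models are rarely this tidy.
    """
    risk = inherent
    for fraction in remaining_fractions:
        risk *= fraction
    return round(risk, 1)
```

For example, an inherent score of 800 with three controls retaining 50%, 50%, and 80% of the risk leaves a residual of 160, which can then be judged against the organization's acceptance threshold.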
58.11 Multi-Framework Harmonization¶
The enterprise running SOC 2, ISO 27001, PCI-DSS, HIPAA, GDPR, and FedRAMP simultaneously does not run six programs. It runs one.
58.11.1 Test Once, Satisfy Many¶
The unified control library enables test-once-satisfy-many. A single test of encryption at rest satisfies SOC 2 CC6.7, ISO 27001 A.8.24, NIST SC-28, PCI-DSS 3.5.1, HIPAA 164.312(a)(2)(iv), and GDPR Article 32.
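Mechanically, the fan-out is a lookup from one internal control to the framework requirements it satisfies. This sketch reuses the citations from the sentence above; the internal control ID `IC-DS-01` and the record shapes are illustrative assumptions:

```python
# One internal control mapped to every framework requirement it satisfies
CROSSWALK = {
    "IC-DS-01": [  # encryption at rest (hypothetical internal control ID)
        ("soc2", "CC6.7"), ("iso27001", "A.8.24"), ("nist_800_53", "SC-28"),
        ("pci_dss", "3.5.1"), ("hipaa", "164.312(a)(2)(iv)"), ("gdpr", "Art. 32"),
    ],
}

def fan_out(control_id: str, test_result: dict) -> list:
    """Turn one control test result into per-framework evidence entries."""
    return [
        {"framework": fw, "requirement": req, **test_result}
        for fw, req in CROSSWALK.get(control_id, [])
    ]
```

One nightly encryption-at-rest test thus produces six evidence entries, one per framework, with no additional testing effort.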
flowchart LR
T[Single Control Test<br/>Encryption at Rest]
T --> S1[SOC 2 CC6.7]
T --> S2[ISO 27001 A.8.24]
T --> S3[NIST SC-28]
T --> S4[PCI-DSS 3.5.1]
T --> S5[HIPAA 164.312]
T --> S6[GDPR Art 32]
58.11.2 Framework-Specific Overlays¶
Some framework requirements do not map cleanly. Handle these as overlays on top of the core control library:
- FedRAMP-specific: FIPS 140-2 validated cryptographic modules (not just any AES-256)
- PCI-DSS-specific: Segmentation testing of the CDE (cardholder data environment)
- HIPAA-specific: BAA tracking with all vendors processing PHI
- GDPR-specific: DPIA for high-risk processing, DPO appointment, 72-hour breach notification
58.11.3 Crosswalk Maintenance¶
Frameworks evolve. ISO 27001 moved from 2013 to 2022 and restructured Annex A. PCI-DSS moved from 3.2.1 to 4.0 with substantial new requirements. NIST 800-53 is on Rev 5. Your crosswalk must track these revisions.
Pin framework versions explicitly:
frameworks_in_scope:
- name: soc2
version: "2017 (TSC 2017)"
criteria_revision: "2022"
- name: iso27001
version: "2022"
- name: nist_800_53
revision: "5"
- name: pci_dss
version: "4.0"
- name: hipaa
effective_date: "2013-09-23"
- name: gdpr
effective_date: "2018-05-25"
- name: fedramp
baseline: "moderate_rev5"
58.12 Continuous Assurance Program¶
Tools alone do not make a program. A mature continuous assurance program is a governance structure, a team, a toolchain, and a maturity roadmap.
58.12.1 Governance Model¶
flowchart TB
Board[Board / Audit Committee]
CEO[CEO / CFO]
CISO[CISO]
CCO[Chief Compliance Officer]
GRC[GRC Team]
SE[Security Engineering]
Plat[Platform Engineering]
App[Application Teams]
IA[Internal Audit]
EA[External Auditors]
Board --> CEO
CEO --> CISO
CEO --> CCO
CISO --> SE
CCO --> GRC
GRC --> IA
SE --> Plat
Plat --> App
IA -.Independence.-> Board
EA -.Engagement.-> Board
Key principles:
- Three lines of defense -- operational owners, risk/compliance, internal audit
- Internal audit independence -- reports to audit committee, not to CISO
- External auditor rotation -- SOX mandates lead audit partner rotation every five years; periodic firm rotation is a widely adopted best practice
58.12.2 Team Structure¶
A mid-size (500-2000 employee) continuous assurance team:
| Role | FTE | Responsibility |
|---|---|---|
| Chief Compliance Officer | 1 | Program ownership, board reporting |
| Compliance Program Manager | 2-3 | Framework management, audit coordination |
| GRC Engineer | 3-5 | Policy-as-code, automation, evidence pipelines |
| Control Testing Engineer | 2-3 | Effectiveness testing, gap analysis |
| Auditor Liaison | 1-2 | External auditor management |
| Risk Analyst | 2-3 | Risk scoring, compensating controls |
58.12.3 Tooling Stack Reference Architecture¶
| Layer | Function | Example Tools |
|---|---|---|
| Policy-as-code | Preventive controls | OPA/Gatekeeper, AWS Config, Azure Policy, Sentinel |
| CSPM | Cloud posture monitoring | Wiz, Prisma Cloud, Lacework, CrowdStrike Falcon Cloud |
| CWPP | Workload protection | Aqua, Sysdig, Lacework |
| SIEM | Log aggregation, detection | Splunk, Sentinel, Elastic, Chronicle |
| Vulnerability | Vuln scanning | Qualys, Tenable, Rapid7, Wiz |
| Secrets | Secret scanning | GitGuardian, TruffleHog, GitHub secret scanning |
| GRC | Control orchestration | Drata, Vanta, ServiceNow GRC, Hyperproof |
| Evidence locker | Immutable evidence store | S3 Object Lock, Azure Immutable Blob, custom |
| Change mgmt | Change approvals | ServiceNow, Jira Service Management |
| IAM governance | Access reviews | SailPoint, Saviynt, Okta Identity Governance |
58.12.4 Maturity Roadmap¶
| Level | Name | Characteristics |
|---|---|---|
| 1 | Reactive | Audit-driven, manual evidence, point-in-time testing |
| 2 | Managed | Documented controls, scheduled testing, some automation |
| 3 | Defined | Unified control library, policy-as-code for critical controls |
| 4 | Quantified | Risk-scored findings, KPIs/KRIs tracked, continuous monitoring |
| 5 | Optimized | Full automation, auto-remediation, real-time audit readiness |
The 18-month target
Most organizations can reach Level 3 in 12 months and Level 4 in 18-24 months with focused investment. Level 5 requires sustained commitment over 3+ years and deep engineering culture. Do not skip levels -- each builds on the previous.
58.13 KQL and SPL Detection Queries for Compliance Violations¶
Your SIEM is a compliance goldmine. These queries detect common compliance violations in real time.
58.13.1 KQL (Microsoft Sentinel / Log Analytics)¶
// Detect: S3 bucket made public within the last hour
AWSCloudTrail
| where TimeGenerated > ago(1h)
| where EventName in ("PutBucketPolicy", "PutBucketAcl", "DeletePublicAccessBlock")
| extend RequestParams = parse_json(RequestParameters)
| where RequestParams has "AllUsers" or RequestParams has "AuthenticatedUsers"
or EventName == "DeletePublicAccessBlock"
| project TimeGenerated, UserIdentityUserName, EventName,
BucketName=tostring(RequestParams.bucketName),
SourceIpAddress, UserAgent
| where SourceIpAddress !in ("192.0.2.10", "192.0.2.11") // known admin jump hosts
// Detect: MFA disabled on a user
AWSCloudTrail
| where EventName in ("DeactivateMFADevice", "DeleteVirtualMFADevice")
| project TimeGenerated, UserIdentityUserName, EventName,
TargetUser=tostring(parse_json(RequestParameters).userName),
SourceIpAddress
// Detect: Root account usage (any use = violation of IC-AC-03)
AWSCloudTrail
| where UserIdentityType == "Root"
| where EventName != "ConsoleLogin" or ResponseElements contains "Failure"
| project TimeGenerated, EventName, EventSource, SourceIpAddress,
UserAgent, ErrorCode, ErrorMessage
// Detect: Encryption disabled on data store
AzureActivity
| where OperationNameValue has_any ("storageAccounts/write", "databases/write",
"disks/write")
| where ActivityStatusValue == "Success"
| extend Properties = parse_json(Properties)
| where Properties has "encryption" and tostring(Properties.encryption.services.blob.enabled) == "false"
| project TimeGenerated, Caller, ResourceId, OperationNameValue
// Detect: Privileged role assignment without approval ticket
SigninLogs
| join kind=inner (
    AuditLogs
    | where OperationName == "Add member to role"
    | where Result == "success"
    | extend RoleName = tostring(TargetResources[0].modifiedProperties[1].newValue)
    | where RoleName has_any ("Global Administrator", "Privileged Role Administrator",
        "Security Administrator")
    // Join keys must be columns, so materialize the nested field first
    | extend InitiatorUpn = tostring(InitiatedBy.user.userPrincipalName)
) on $left.UserPrincipalName == $right.InitiatorUpn
| project TimeGenerated, UserPrincipalName, RoleName, IPAddress
| join kind=leftouter (
    // Correlate with approval ticketing system logs
    ServiceNow_CL
    | where Category_s == "PrivilegedAccessRequest"
    | where State_s == "Approved"
    | project TicketId_s, RequestedUser_s, ApprovedAt_t
) on $left.UserPrincipalName == $right.RequestedUser_s
| where isempty(TicketId_s) // No matching approval ticket
58.13.2 SPL (Splunk)¶
# Detect: Configuration drift from IaC baseline
index=aws sourcetype=aws:cloudtrail
| search eventName IN ("CreateBucket", "PutBucketAcl", "PutBucketPolicy",
    "ModifyDBInstance", "CreateSecurityGroup")
| eval requested_by=coalesce('userIdentity.arn', 'userIdentity.userName')
| lookup iac_managed_resources resource_id AS requestParameters.bucketName OUTPUT managed_by_iac
| where isnull(managed_by_iac) OR managed_by_iac="false"
| where 'userIdentity.type'!="AssumedRole" OR 'userIdentity.sessionContext.sessionIssuer.userName'!="terraform-runner"
| table _time requested_by eventName requestParameters.bucketName sourceIPAddress
# Detect: Password policy weakened
index=aws sourcetype=aws:cloudtrail eventName=UpdateAccountPasswordPolicy
| eval new_min_length='requestParameters.minimumPasswordLength'
| eval new_max_age='requestParameters.maxPasswordAge'
| where new_min_length < 14 OR new_max_age > 90 OR 'requestParameters.requireSymbols'="false"
| table _time userIdentity.arn new_min_length new_max_age requestParameters.requireSymbols
# Detect: KMS key policy allowing public access
index=aws sourcetype=aws:cloudtrail eventName IN ("PutKeyPolicy", "CreateKey")
| rex field=requestParameters.policy "\"Principal\":\s*\"(?<principal>[^\"]+)\""
| where principal="*" OR principal="AWS:*"
| table _time userIdentity.arn requestParameters.keyId principal
# Detect: Network ACL allows 0.0.0.0/0 on sensitive port
index=aws sourcetype=aws:cloudtrail eventName=AuthorizeSecurityGroupIngress
| spath path=requestParameters.ipPermissions{} output=perms
| mvexpand perms
| spath input=perms
| where 'ipRanges{}.cidrIp'="0.0.0.0/0"
| search fromPort IN (22, 3389, 1433, 3306, 5432, 27017, 6379, 9200)
| table _time userIdentity.arn requestParameters.groupId fromPort ipRanges{}.cidrIp
58.14 Practical Implementation Roadmap¶
A suggested 12-month roadmap for an organization starting from Level 1:
Months 1-3: Foundation¶
- [ ] Build unified control library (start with 50 highest-value controls)
- [ ] Deploy OPA/Gatekeeper in one non-production environment
- [ ] Stand up evidence locker (S3 Object Lock + KMS)
- [ ] Implement 5 Rego policies covering highest-risk controls
- [ ] Deploy CSPM tool (Wiz, Prisma, Lacework, or equivalent)
Months 4-6: Automation¶
- [ ] Extend policy-as-code to production
- [ ] Automate evidence collection for 20 controls
- [ ] Build compliance dashboard (Grafana, Looker, or GRC platform native)
- [ ] Implement drift detection for 10 critical resource types
- [ ] Complete first framework mapping (typically SOC 2 or ISO 27001)
Months 7-9: Scale¶
- [ ] Extend to additional frameworks (second and third)
- [ ] Deploy auto-remediation for LOW and MEDIUM findings
- [ ] Implement effectiveness testing for 15 controls
- [ ] Onboard auditors to evidence locker portal
- [ ] First continuous compliance audit with external auditor
Months 10-12: Optimize¶
- [ ] Full framework crosswalk for all in-scope frameworks
- [ ] Risk-based remediation prioritization live
- [ ] Pre-audit self-assessment automation
- [ ] 90% of controls fully automated
- [ ] Level 4 maturity achieved
58.15 Anti-Patterns to Avoid¶
Compliance Anti-Patterns
Automation theater -- dashboards report 100% compliant while the underlying data is stale or fake.
Framework sprawl -- adopting every framework that a customer mentions without strategic prioritization.
Policy paralysis -- maintaining 600 policies when 60 well-enforced ones would serve better.
Evidence hoarding -- collecting everything, indexing nothing. If you cannot answer an auditor question in 60 seconds, your evidence is useless.
Compliance by exception -- every control has 40 documented exceptions. At that point, the control is not really a control.
Shadow compliance -- engineering builds its own compliance tools without GRC involvement. Auditors will not trust unvalidated systems.
Gate avoidance -- emergency bypass used routinely. Every bypass must be documented, justified, time-boxed, and reviewed.
Crosswalk drift -- framework mappings not updated as frameworks evolve. SOC 2 TSC 2022 is different from 2017.
One-time implementation -- standing up the program is 10% of the work. Keeping it running is 90%.
No humans -- full automation with no humans means no one understands the controls anymore. You need both.
58.16 Summary¶
Compliance automation is not about passing audits faster. It is about building a continuously assured operating system for your security program. When done right:
- Engineers stop hating compliance because compliance stops interrupting them
- Auditors stop chasing because evidence is continuously fresh
- Executives stop guessing because dashboards show real state
- Regulators stop penalizing because violations are rare and documented
- Customers stop asking because your public trust center answers their questions
The path from Level 1 (reactive) to Level 5 (optimized) is a 3-5 year journey. Start with policy-as-code for your top 10 risks. Add continuous monitoring. Build the evidence locker. Unify the control library. Harmonize frameworks. Automate remediation. Measure. Improve. Repeat.
Compliance should be boring. Boring is the goal.
58.17 Cross-References¶
- Chapter 13: Security Governance, Privacy & Risk -- governance foundations, risk frameworks, policy lifecycle
- Chapter 29: Vulnerability Management -- vuln data feeds compliance evidence for RA-5 / CC7.1
- Chapter 35: DevSecOps Pipeline -- CI/CD gates where policy-as-code executes
- Chapter 56: Privacy Engineering -- privacy-by-design, DPIA, consent management for GDPR
58.18 Further Reading¶
- NIST SP 800-53 Rev 5 -- Security and Privacy Controls for Information Systems and Organizations
- NIST SP 800-53A Rev 5 -- Assessing Security and Privacy Controls
- ISO/IEC 27001:2022 -- Information Security Management Systems Requirements
- PCI-DSS v4.0 -- Payment Card Industry Data Security Standard
- AICPA Trust Services Criteria (2017, revised 2022) -- SOC 2 criteria
- CSA Cloud Controls Matrix v4
- CIS Controls v8
- OPA Documentation -- openpolicyagent.org/docs
- Gatekeeper Policy Library -- open-policy-agent.github.io/gatekeeper-library
Chapter 58 Checklist
- [ ] Unified control library defined (YAML per control)
- [ ] OPA/Gatekeeper deployed in Kubernetes
- [ ] 10+ Rego policies in production
- [ ] CSPM tool deployed and covering all cloud accounts
- [ ] Evidence locker operational (S3 Object Lock)
- [ ] Hash-chained evidence collection for 20+ controls
- [ ] CI/CD compliance gates blocking non-compliant changes
- [ ] Drift detection running hourly
- [ ] Auto-remediation live for LOW/MEDIUM findings
- [ ] Compliance dashboard with 90-day trends
- [ ] Framework crosswalk for all in-scope frameworks
- [ ] Effectiveness testing for 15+ controls
- [ ] Pre-audit self-assessment automation
- [ ] Auditor portal provisioned for external auditors
- [ ] Quarterly program review with CISO and CCO