CCSK Domain 11: Incident Response & Resilience

11.1 Incident Response

Definitions:
- Event: Any observable occurrence that may indicate a security or availability problem.
- Incident: An event that violates security policies or threatens operations and requires immediate attention.
- Breach: A successful circumvention of security controls, resulting in unauthorised access or data exfiltration.
Incident Response Lifecycle (NIST/CSA):
1. Preparation: Build an IR capability, assign roles, train the team, establish communication channels, ensure access to environments and tools, document assets, evaluate infrastructure, and subscribe to threat intelligence.
2. Detection & Analysis: Detect incidents (using CSPM, SIEM, and workload/network monitoring), validate alerts, estimate scope, assign an incident manager, build an attack timeline, determine impact, and communicate status.
3. Containment, Eradication & Recovery: Isolate affected systems, eliminate the root cause, restore systems, document the incident, and preserve evidence.
4. Post-Incident Analysis: Learn from the incident, document lessons, improve processes, and share indicators of compromise.
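The lifecycle can be read as a small state machine. The sketch below assumes the transitions drawn in the classic NIST SP 800-61 Rev. 2 diagram: responders loop between detection/analysis and containment as new scope emerges, and post-incident lessons feed back into preparation. The names `Phase`, `TRANSITIONS` and `advance` are illustrative only, not part of any standard.

```python
from enum import Enum

class Phase(Enum):
    """The four lifecycle phases listed above."""
    PREPARATION = 1
    DETECTION_ANALYSIS = 2
    CONTAINMENT_ERADICATION_RECOVERY = 3
    POST_INCIDENT = 4

# Allowed moves (assumption: modelled on the NIST SP 800-61 Rev. 2 cycle).
TRANSITIONS = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT_ERADICATION_RECOVERY},
    # Containment can reveal new scope, sending responders back to analysis.
    Phase.CONTAINMENT_ERADICATION_RECOVERY: {Phase.DETECTION_ANALYSIS, Phase.POST_INCIDENT},
    # Lessons learned feed back into preparation.
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}

def advance(current: Phase, nxt: Phase) -> Phase:
    """Return the next phase, or raise if the move skips a step in the cycle."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {nxt.name}")
    return nxt
```

Rejecting a jump straight from preparation to post-incident work makes the point that every incident should pass through analysis and containment before it is closed.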

Cloud-specific preparation:
- Understand CSP contractual agreements and support options (paid/free).
- Record incident support contacts in a cloud deployment registry.
- Plan for incidents that affect the CSP itself (e.g., publicly disclosed vulnerabilities in the provider's platform, denial-of-service attacks against the provider).
- Coordinate with business continuity planning.
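A deployment registry does not need to be elaborate to be useful during an incident. The sketch below assumes a simple in-memory model; `Deployment`, `DeploymentRegistry` and their fields are hypothetical names chosen for illustration, and a real registry would more likely live in a CMDB or a version-controlled file.

```python
from dataclasses import dataclass, field

@dataclass
class Deployment:
    """One entry in a hypothetical cloud deployment registry."""
    name: str
    provider: str
    owner: str                  # accountable application/cloud owner
    support_tier: str           # e.g. "free" or "paid" CSP support plan
    incident_contacts: list[str] = field(default_factory=list)

class DeploymentRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, Deployment] = {}

    def register(self, dep: Deployment) -> None:
        self._entries[dep.name] = dep

    def contacts_for(self, name: str) -> list[str]:
        """Who to call during an incident: the owner first, then support contacts."""
        dep = self._entries[name]
        return [dep.owner, *dep.incident_contacts]

    def missing_contacts(self) -> list[str]:
        """Deployments with no recorded incident contact: a preparation gap."""
        return sorted(n for n, d in self._entries.items() if not d.incident_contacts)
```

Running `missing_contacts()` as part of routine preparation surfaces gaps before an incident rather than during one.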
Training:
- Cloud IR requires understanding of cloud-specific processes and technologies.
- Responders need persistent read access to deployments (metadata/configurations).
- Full-read access (data review) should require multiple approvals (“break glass” process).
- Access to deployment registry, CI/CD pipelines, and code repositories may be needed.
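The "break glass" rule for full-read access can be expressed as a distinct-approver check. This is a minimal sketch assuming a two-person rule and that requesters may not approve themselves; `BreakGlassRequest` and its fields are hypothetical, and in practice the check would be enforced by the identity or privileged-access system rather than application code.

```python
from dataclasses import dataclass, field

@dataclass
class BreakGlassRequest:
    requester: str
    justification: str
    required_approvals: int = 2           # assumption: two-person rule
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # A requester cannot approve their own escalation.
        if approver == self.requester:
            raise PermissionError("requester cannot self-approve")
        self.approvers.add(approver)      # a set ignores repeat approvals

    @property
    def granted(self) -> bool:
        """Full-read (data) access only once enough distinct people approve."""
        return len(self.approvers) >= self.required_approvals
```

Storing approvers in a set means one person approving twice still counts once, which is the property a multi-approval process depends on.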

Cloud-specific challenges:
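- New telemetry sources to collect and interpret.
- An expanded attack surface, notably the management plane.
- Rapid change in deployed resources.
- No traditional network perimeter.
- A large IAM blast radius: one compromised identity can reach many resources.
- API-driven, ephemeral resources that may disappear before they can be examined.
- Decentralised management across teams and accounts.
- Pervasive automation.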
Incident analysis focus: Management plane logs are crucial for identifying unauthorised access and misconfigurations.
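Management-plane log review often starts by pulling denied calls and security-sensitive API calls into a time-ordered list. The sketch below assumes logs exported as AWS CloudTrail JSON records, whose fields include `eventTime`, `eventName`, `errorCode` and `userIdentity.arn`; the watch-list of sensitive calls and the function names are illustrative assumptions, not a complete detection rule set.

```python
from datetime import datetime

# Illustrative watch-list of management-plane calls worth reviewing.
SENSITIVE_CALLS = {"StopLogging", "DeleteTrail", "CreateAccessKey",
                   "AttachUserPolicy", "PutBucketPolicy"}
DENIED_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}

def _ts(record: dict) -> datetime:
    # Assumes eventTime in the form "2024-05-01T12:00:00Z".
    return datetime.strptime(record["eventTime"], "%Y-%m-%dT%H:%M:%SZ")

def suspicious_timeline(records: list[dict]) -> list[tuple[str, str, str, str]]:
    """Return (time, principal, action, reason) rows sorted by time."""
    rows = []
    for r in records:
        if r.get("errorCode") in DENIED_CODES:
            reason = "denied"
        elif r.get("eventName") in SENSITIVE_CALLS:
            reason = "sensitive"
        else:
            continue
        principal = r.get("userIdentity", {}).get("arn", "unknown")
        rows.append((_ts(r), principal, r["eventName"], reason))
    rows.sort(key=lambda row: row[0])
    return [(t.isoformat(), p, a, why) for t, p, a, why in rows]
```

Denied calls are a common sign of an attacker probing stolen credentials, while calls such as `StopLogging` suggest an attempt to blind responders; putting both on one timeline helps establish scope.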
Forensics:
- Take snapshots of affected VMs/containers for offline analysis.
- Acquiring volatile memory may require specialised tools.
- Analyse logs: management plane, system, application, and user activity.
- Evidence preservation: Understand backup/data retention policies and chain of custody.
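Chain of custody usually rests on hashing each piece of evidence when it is acquired and logging every hand-off. A minimal sketch, assuming the evidence (for example a snapshot exported to an image file) is available as a local file; the JSON-lines log format and the function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash evidence in chunks so large disk images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def record_custody(log_path: str, evidence_path: str, handler: str, action: str) -> dict:
    """Append one JSON line: who handled which evidence, when, and its hash."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "evidence": evidence_path,
        "sha256": sha256_file(evidence_path),
        "handler": handler,
        "action": action,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry

def verify(evidence_path: str, expected_sha256: str) -> bool:
    """True if the evidence still matches the hash recorded at acquisition."""
    return sha256_file(evidence_path) == expected_sha256
```

Re-running `verify` before each analysis step shows the evidence has not changed since acquisition; the custody log itself should also be stored somewhere write-protected.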
Containers/serverless:
- Containers are ephemeral; send their logs to external storage so the evidence survives when a container is terminated.
- Serverless forensics relies mainly on function logs, since there is no persistent host to examine.

Containment:
- Engage cloud/application owners for containment plans.
- IAM and management plane containment are top priorities (may require changes at identity provider and relying party).
- Network containment is easier with software-defined networking (SDN), because rules can be changed through the API or web console.
- Prioritise resources that have been made public or shared with unknown external parties.
- Escalate quickly for critical data, even if it risks breaking functionality.
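The prioritisation rule can be sketched as a sort over a resource inventory. This assumes a simplified, hypothetical inventory model (in practice the data would come from a CSPM tool or provider APIs) and an ordering policy of public exposure first, then sharing with unknown accounts, with critical data first within each group.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    public: bool = False
    shared_with: set[str] = field(default_factory=set)  # external account IDs
    critical_data: bool = False

def containment_queue(resources: list[Resource], known_accounts: set[str]) -> list[str]:
    """Order exposed resources for containment; unexposed ones are left out."""
    scored = []
    for r in resources:
        unknown = r.shared_with - known_accounts
        if not (r.public or unknown):
            continue  # shared only with known accounts, or not shared at all
        # Lower keys sort first: public before unknown shares, critical data first.
        key = (0 if r.public else 1, 0 if r.critical_data else 1)
        scored.append((key, r.name))
    return [name for _, name in sorted(scored)]
```

Ties are broken by resource name, so the queue is deterministic; a real policy might weigh data classification more finely.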
Eradication:
- Remove attacker from management plane (credential rotation, MFA, policy changes).
- Delete outdated or tampered images, serverless code versions, and IaC so they cannot be redeployed and reintroduce the compromise.
Recovery:
- Use IaC, autoscaling, and automation to redeploy hardened/clean environments.
- Analyse all recovery resources to ensure root cause is eliminated.
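One check on recovered resources is that nothing was redeployed from an artifact known to be compromised. A minimal sketch, assuming each workload records the image digest and IaC commit it was built from; the data shapes and the `recovery_findings` name are hypothetical.

```python
def recovery_findings(workloads: dict[str, dict[str, str]],
                      bad_images: set[str],
                      bad_iac_commits: set[str]) -> list[tuple[str, str]]:
    """List (workload, reason) pairs for anything redeployed from a compromised artifact."""
    findings = []
    for name, meta in sorted(workloads.items()):
        if meta.get("image") in bad_images:
            findings.append((name, "compromised image"))
        if meta.get("iac_commit") in bad_iac_commits:
            findings.append((name, "compromised IaC version"))
    return findings
```

An empty result does not prove the root cause is gone, but a non-empty one shows that recovery has reintroduced it.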

Lessons learned:
- Include cloud deployment teams in analysis.
- Create new runbooks/playbooks for new incident types.
- Focus on systemic issues (“Just Culture”) rather than blame.
- Use scanners to identify IAM issues; consider just-in-time entitlements and strong authentication.
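A very small IAM scanner can flag overly broad grants. The sketch assumes AWS-style JSON policy documents (`Effect`, `Action`, `Resource`, where fields may be a string or a list); dedicated scanners check far more, such as conditions, `NotAction` and resource-based policies, and the function names here are illustrative.

```python
def _as_list(value) -> list:
    # IAM policy fields may hold a single value or a list of values.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements that grant wildcard actions or resources."""
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements only narrow access
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions:
            findings.append(f"statement {i}: all actions allowed")
        elif any(a.endswith(":*") for a in actions):
            findings.append(f"statement {i}: service-wide wildcard action")
        if "*" in resources:
            findings.append(f"statement {i}: all resources allowed")
    return findings
```

Findings like these are candidates for replacement with narrowly scoped, just-in-time entitlements.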
