MONITORING & LOGGING

Security Monitoring

Logging and monitoring for security - how to track activities in your systems, detect threats, and respond to security incidents.

Why Do We Need Security Monitoring?

Even with the strongest security measures, we must be prepared for situations where an attack is attempted or actually carried out. Logging is like security cameras in your building - they record what happens. Monitoring is the alarm system - it alerts you when something suspicious occurs. Without these tools, you won't know that someone stole your credentials until it's too late.

Physical Security Analogy

Imagine a luxury villa with valuable items inside. You have high walls, locked doors, and a safe for the most precious belongings. But what if someone still manages to get inside?

  • Cameras (Logging): Record everything that happens - who arrived, what they did, where they went
  • Alarm (Monitoring): Immediately alerts when someone enters a secured area
  • Security Service (Incident Response): The team that responds to alarms

Security is Layered

We protect the system at every stage of an attack - before, during, and after it. Prevention covers the "before"; monitoring and logging are what make detection and response possible during and after an attack.

AWS Monitoring & Logging Services

Audit & Compliance

AWS CloudTrail

Continuous recording and auditing of API activity in your AWS account.

  • Records management API calls by default; data events are opt-in
  • Who, what, when, from where
  • Management and Data events
  • Multi-region trails
  • Log file integrity validation

Monitoring & Alerting

Amazon CloudWatch

Monitoring and management service for metrics, logs, and alarms.

  • Metrics from AWS services
  • Custom metrics
  • Log groups and log streams
  • Alarms and notifications
  • Dashboards

Threat Detection

Amazon GuardDuty

Intelligent threat detection using machine learning.

  • Anomaly detection
  • Threat intelligence feeds
  • Automated findings
  • Integration with Security Hub

Centralized Overview

AWS Security Hub

Central dashboard for security findings from multiple services.

  • Findings aggregation
  • Compliance checks
  • Priority scoring
  • Automated remediation
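
GuardDuty and Security Hub both have to be switched on before they produce findings. A minimal sketch of enabling them with Terraform follows; the resource labels ("main") and the publishing frequency are illustrative assumptions, not values taken from this guide.

```hcl
# Enable GuardDuty in the provider's current region.
# GuardDuty is regional, so this would need repeating per region in use.
resource "aws_guardduty_detector" "main" {
  enable = true

  # Assumption: export updated findings every 15 minutes
  finding_publishing_frequency = "FIFTEEN_MINUTES"
}

# Enable Security Hub for the account in the current region.
# GuardDuty findings are forwarded to Security Hub once both are active.
resource "aws_securityhub_account" "main" {}
```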

CloudTrail - Basic Configuration

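CloudTrail records API activity across your AWS account: console sign-ins, instance launches, security group changes, and so on. Management events are logged by default; data events, such as access to individual S3 objects, must be enabled explicitly.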

    # Terraform - Multi-region CloudTrail with CloudWatch integration

    # Account ID lookup, used below to make the bucket name globally unique
    data "aws_caller_identity" "current" {}

    resource "aws_cloudtrail" "security_trail" {
      name                          = "security-audit-trail"
      s3_bucket_name                = aws_s3_bucket.cloudtrail_logs.id
      include_global_service_events = true
      is_multi_region_trail         = true
      enable_log_file_validation    = true

      # Send logs to CloudWatch.
      # aws_iam_role.cloudtrail_cloudwatch must be defined separately.
      cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.cloudtrail.arn}:*"
      cloud_watch_logs_role_arn  = aws_iam_role.cloudtrail_cloudwatch.arn

      # Management events (API calls)
      event_selector {
        read_write_type           = "All"
        include_management_events = true
      }

      # Data events for S3 (optional)
      event_selector {
        read_write_type           = "All"
        include_management_events = false

        data_resource {
          type = "AWS::S3::Object"
          # Trailing slash = all objects in this bucket
          values = ["arn:aws:s3:::sensitive-bucket/"]
        }
      }

      tags = {
        Name        = "security-audit-trail"
        Environment = "production"
      }
    }

    # S3 bucket for CloudTrail logs.
    # The bucket also needs a bucket policy that lets cloudtrail.amazonaws.com write to it.
    resource "aws_s3_bucket" "cloudtrail_logs" {
      bucket = "company-cloudtrail-logs-${data.aws_caller_identity.current.account_id}"

      tags = {
        Name = "cloudtrail-logs"
      }
    }

    # Encryption for CloudTrail logs
    resource "aws_s3_bucket_server_side_encryption_configuration" "cloudtrail_logs" {
      bucket = aws_s3_bucket.cloudtrail_logs.id

      rule {
        apply_server_side_encryption_by_default {
          sse_algorithm = "aws:kms"
        }
      }
    }

    # CloudWatch Log Group
    resource "aws_cloudwatch_log_group" "cloudtrail" {
      name              = "/aws/cloudtrail/security-audit"
      retention_in_days = 365

      tags = {
        Name = "cloudtrail-logs"
      }
    }
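
The trail above refers to aws_iam_role.cloudtrail_cloudwatch and writes into the log bucket, but neither the role nor a bucket policy is defined there. The following is a sketch of what those supporting resources might look like. The data source labels, the role name "cloudtrail-to-cloudwatch", and the exact policy scope are assumptions chosen for illustration.

```hcl
# Bucket policy: let the CloudTrail service check the ACL and write log files
data "aws_iam_policy_document" "cloudtrail_bucket" {
  statement {
    sid       = "AWSCloudTrailAclCheck"
    actions   = ["s3:GetBucketAcl"]
    resources = [aws_s3_bucket.cloudtrail_logs.arn]

    principals {
      type        = "Service"
      identifiers = ["cloudtrail.amazonaws.com"]
    }
  }

  statement {
    sid       = "AWSCloudTrailWrite"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.cloudtrail_logs.arn}/AWSLogs/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudtrail.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "s3:x-amz-acl"
      values   = ["bucket-owner-full-control"]
    }
  }
}

resource "aws_s3_bucket_policy" "cloudtrail_logs" {
  bucket = aws_s3_bucket.cloudtrail_logs.id
  policy = data.aws_iam_policy_document.cloudtrail_bucket.json
}

# Role that CloudTrail assumes to deliver events to CloudWatch Logs
data "aws_iam_policy_document" "cloudtrail_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["cloudtrail.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "cloudtrail_cloudwatch" {
  name               = "cloudtrail-to-cloudwatch" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.cloudtrail_assume.json
}

# Permissions limited to writing into the trail's log group
data "aws_iam_policy_document" "cloudtrail_logs_write" {
  statement {
    actions   = ["logs:CreateLogStream", "logs:PutLogEvents"]
    resources = ["${aws_cloudwatch_log_group.cloudtrail.arn}:*"]
  }
}

resource "aws_iam_role_policy" "cloudtrail_cloudwatch" {
  name   = "cloudtrail-to-cloudwatch"
  role   = aws_iam_role.cloudtrail_cloudwatch.id
  policy = data.aws_iam_policy_document.cloudtrail_logs_write.json
}
```

In practice the trail should be created only after the bucket policy exists, for example by adding depends_on = [aws_s3_bucket_policy.cloudtrail_logs] to the trail resource.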

CloudWatch Alarms for Security Events

Example: Failed Login Alarm

    # Metric filter for failed console logins
    resource "aws_cloudwatch_log_metric_filter" "failed_console_login" {
      name           = "FailedConsoleLogin"
      pattern        = "{ ($.eventName = ConsoleLogin) && ($.errorMessage = \"Failed authentication\") }"
      log_group_name = aws_cloudwatch_log_group.cloudtrail.name

      metric_transformation {
        name      = "FailedConsoleLoginCount"
        namespace = "SecurityMetrics"
        value     = "1"
      }
    }

    # Alarm when more than 3 failed logins within 5 minutes
    resource "aws_cloudwatch_metric_alarm" "failed_login_alarm" {
      alarm_name          = "FailedConsoleLoginAlarm"
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 1
      metric_name         = "FailedConsoleLoginCount"
      namespace           = "SecurityMetrics"
      period              = 300 # 5 minutes
      statistic           = "Sum"
      threshold           = 3
      # The filter emits data only on a match, so treat "no data" as healthy
      treat_missing_data = "notBreaching"
      alarm_description  = "More than 3 failed login attempts within 5 minutes"
      alarm_actions      = [aws_sns_topic.security_alerts.arn]

      tags = {
        Name = "failed-login-alarm"
      }
    }

    # SNS topic for security alerts
    resource "aws_sns_topic" "security_alerts" {
      name = "security-alerts"
    }

    # Email subscription (the recipient must confirm it before alerts are delivered)
    resource "aws_sns_topic_subscription" "security_email" {
      topic_arn = aws_sns_topic.security_alerts.arn
      protocol  = "email"
      endpoint  = "security-team@company.cz"
    }

Other Important Alarms

    # Metric filter and alarm on root user activity
    resource "aws_cloudwatch_log_metric_filter" "root_activity" {
      name           = "RootAccountUsage"
      pattern        = "{ $.userIdentity.type = \"Root\" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != \"AwsServiceEvent\" }"
      log_group_name = aws_cloudwatch_log_group.cloudtrail.name

      metric_transformation {
        name      = "RootAccountUsageCount"
        namespace = "SecurityMetrics"
        value     = "1"
      }
    }

    resource "aws_cloudwatch_metric_alarm" "root_usage_alarm" {
      alarm_name          = "RootAccountUsageAlarm"
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 1
      metric_name         = "RootAccountUsageCount"
      namespace           = "SecurityMetrics"
      period              = 60
      statistic           = "Sum"
      threshold           = 0
      treat_missing_data  = "notBreaching"
      alarm_description   = "Root account was used!"
      alarm_actions       = [aws_sns_topic.security_alerts.arn]
    }

    # Metric filter for Security Group changes
    # (only the metric is defined here; it still needs an alarm like the one above)
    resource "aws_cloudwatch_log_metric_filter" "security_group_changes" {
      name           = "SecurityGroupChanges"
      pattern        = "{ ($.eventName = AuthorizeSecurityGroupIngress) || ($.eventName = AuthorizeSecurityGroupEgress) || ($.eventName = RevokeSecurityGroupIngress) || ($.eventName = RevokeSecurityGroupEgress) || ($.eventName = CreateSecurityGroup) || ($.eventName = DeleteSecurityGroup) }"
      log_group_name = aws_cloudwatch_log_group.cloudtrail.name

      metric_transformation {
        name      = "SecurityGroupChangeCount"
        namespace = "SecurityMetrics"
        value     = "1"
      }
    }

    # Metric filter for IAM policy changes
    # (only the metric is defined here; it still needs an alarm like the one above)
    resource "aws_cloudwatch_log_metric_filter" "iam_policy_changes" {
      name           = "IAMPolicyChanges"
      pattern        = "{ ($.eventName = DeleteGroupPolicy) || ($.eventName = DeleteRolePolicy) || ($.eventName = DeleteUserPolicy) || ($.eventName = PutGroupPolicy) || ($.eventName = PutRolePolicy) || ($.eventName = PutUserPolicy) || ($.eventName = CreatePolicy) || ($.eventName = DeletePolicy) || ($.eventName = AttachRolePolicy) || ($.eventName = DetachRolePolicy) || ($.eventName = AttachUserPolicy) || ($.eventName = DetachUserPolicy) }"
      log_group_name = aws_cloudwatch_log_group.cloudtrail.name

      metric_transformation {
        name      = "IAMPolicyChangeCount"
        namespace = "SecurityMetrics"
        value     = "1"
      }
    }

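The SecurityGroupChangeCount and IAMPolicyChangeCount metrics above only count events; nothing is notified until an alarm watches them. One possible way to attach alarms to both metrics in a single resource is sketched below. The local value, the alarm names, and the 5-minute period are assumptions for illustration.

```hcl
locals {
  # Metric name => alarm description (hypothetical helper map)
  change_alarms = {
    SecurityGroupChangeCount = "A security group was created, modified, or deleted"
    IAMPolicyChangeCount     = "An IAM policy was created, modified, attached, or detached"
  }
}

# One alarm per entry in the map; fires on the first matching event
resource "aws_cloudwatch_metric_alarm" "change_alarm" {
  for_each = local.change_alarms

  alarm_name          = "${each.key}Alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = each.key
  namespace           = "SecurityMetrics"
  period              = 300 # 5 minutes
  statistic           = "Sum"
  threshold           = 0
  treat_missing_data  = "notBreaching"
  alarm_description   = each.value
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
}
```
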
What to Monitor?

Key Security Events

Authentication & Authorization

  • Failed login attempts (brute force detection)
  • Root account usage
  • MFA disabled
  • New IAM users/roles created
  • Permission changes
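
Two of the items above, MFA being disabled and new IAM identities being created, can be expressed as metric filters on the CloudTrail log group. This is a sketch only: the filter and metric names are invented here, and the event names cover the common IAM calls rather than every possible path.

```hcl
# Counts removal of an MFA device from an IAM user
resource "aws_cloudwatch_log_metric_filter" "mfa_deactivated" {
  name           = "MFADeactivated"
  pattern        = "{ ($.eventSource = iam.amazonaws.com) && (($.eventName = DeactivateMFADevice) || ($.eventName = DeleteVirtualMFADevice)) }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = "MFADeactivatedCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

# Counts creation of IAM users, roles, and long-lived access keys
resource "aws_cloudwatch_log_metric_filter" "iam_identity_created" {
  name           = "IAMIdentityCreated"
  pattern        = "{ ($.eventSource = iam.amazonaws.com) && (($.eventName = CreateUser) || ($.eventName = CreateRole) || ($.eventName = CreateAccessKey)) }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = "IAMIdentityCreatedCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}
```

The eventSource check keeps similarly named calls from other services out of the count.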

Network & Infrastructure

  • Security group changes
  • VPC changes
  • Route table modifications
  • Network ACL changes
  • New EC2 instances
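
Because the network events above all follow the same "match any of these event names" shape, the filters can be generated from a map instead of written out one by one. The sketch below assumes the local name network_events and the listed event names as a reasonable starting set; adjust the lists to what matters in your environment.

```hcl
locals {
  # Filter name => CloudTrail event names to count (illustrative selection)
  network_events = {
    VpcChanges = [
      "CreateVpc", "DeleteVpc", "ModifyVpcAttribute",
      "CreateVpcPeeringConnection", "AcceptVpcPeeringConnection",
      "DeleteVpcPeeringConnection", "RejectVpcPeeringConnection",
    ]
    RouteTableChanges = [
      "CreateRoute", "CreateRouteTable", "ReplaceRoute",
      "ReplaceRouteTableAssociation", "DeleteRouteTable",
      "DeleteRoute", "DisassociateRouteTable",
    ]
    NetworkAclChanges = [
      "CreateNetworkAcl", "CreateNetworkAclEntry", "DeleteNetworkAcl",
      "DeleteNetworkAclEntry", "ReplaceNetworkAclEntry",
      "ReplaceNetworkAclAssociation",
    ]
    NewInstances = ["RunInstances"]
  }
}

resource "aws_cloudwatch_log_metric_filter" "network" {
  for_each = local.network_events

  name           = each.key
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  # Builds "{ ($.eventName = A) || ($.eventName = B) || ... }"
  pattern = "{ ${join(" || ", [for e in each.value : "($.eventName = ${e})"])} }"

  metric_transformation {
    name      = "${each.key}Count"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}
```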

Data Access

  • S3 bucket policy changes
  • S3 public access
  • KMS key deletion/disable
  • RDS snapshot sharing
  • Secrets Manager access
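
For data-access events it usually helps to match on the calling service as well as the event name. The following sketch covers S3 bucket policy or ACL changes and KMS key disabling or deletion; the resource labels and metric names are assumptions, and the remaining items in the list would follow the same pattern.

```hcl
# S3 bucket policy and ACL changes (a common route to accidental public access)
resource "aws_cloudwatch_log_metric_filter" "s3_policy_changes" {
  name           = "S3BucketPolicyChanges"
  pattern        = "{ ($.eventSource = s3.amazonaws.com) && (($.eventName = PutBucketAcl) || ($.eventName = PutBucketPolicy) || ($.eventName = DeleteBucketPolicy)) }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = "S3BucketPolicyChangeCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

# KMS keys being disabled or scheduled for deletion
resource "aws_cloudwatch_log_metric_filter" "kms_key_disabled" {
  name           = "KMSKeyDisabledOrDeleted"
  pattern        = "{ ($.eventSource = kms.amazonaws.com) && (($.eventName = DisableKey) || ($.eventName = ScheduleKeyDeletion)) }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = "KMSKeyDisabledOrDeletedCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}
```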

Billing & Account

  • CloudTrail disabled
  • AWS Config rule changes
  • Unusual billing spikes (cryptomining)
  • Account settings changes
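
Two items from this list are sketched below: changes to CloudTrail itself, and a threshold on estimated charges. Several things here are assumptions: the provider alias, both variables, and the default limit are hypothetical. Billing metrics are published only in us-east-1 and only after billing alerts are enabled in the account, so the alarm and its SNS topic must live in that region. EstimatedCharges is a month-to-date total, so this alarm catches an overall overrun rather than a sudden rate change.

```hcl
# Counts attempts to create, alter, stop, or delete a trail
resource "aws_cloudwatch_log_metric_filter" "cloudtrail_changes" {
  name           = "CloudTrailChanges"
  pattern        = "{ ($.eventName = CreateTrail) || ($.eventName = UpdateTrail) || ($.eventName = DeleteTrail) || ($.eventName = StartLogging) || ($.eventName = StopLogging) }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = "CloudTrailChangeCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

# Hypothetical provider alias for the region that hosts billing metrics
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

variable "billing_alert_topic_arn" {
  description = "ARN of an SNS topic in us-east-1 (hypothetical input)"
  type        = string
}

variable "monthly_charge_limit_usd" {
  description = "Month-to-date charge that triggers the alarm (hypothetical value)"
  type        = number
  default     = 1000
}

resource "aws_cloudwatch_metric_alarm" "estimated_charges" {
  provider            = aws.us_east_1
  alarm_name          = "EstimatedChargesAlarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "EstimatedCharges"
  namespace           = "AWS/Billing"
  period              = 21600 # 6 hours
  statistic           = "Maximum"
  threshold           = var.monthly_charge_limit_usd
  alarm_description   = "Month-to-date estimated charges exceeded the configured limit"
  alarm_actions       = [var.billing_alert_topic_arn]

  dimensions = {
    Currency = "USD"
  }
}
```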

Security Monitoring Best Practices

Logging

  • Enable CloudTrail in all regions
  • Retain logs for at least 1 year
  • Encrypt logs at rest (KMS)
  • Enable log file integrity validation
  • VPC Flow Logs for network traffic
  • Centralize logs into a single account (Security Account)
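
Some of the logging practices above translate directly into configuration. The sketch below adds VPC Flow Logs, blocks public access to the CloudTrail bucket, and expires objects only after the retention period. The two variables and the 400-day expiry are assumptions made for the example; the destination bucket for flow logs is expected to exist already.

```hcl
variable "vpc_id" {
  description = "VPC to capture traffic for (hypothetical input)"
  type        = string
}

variable "flow_log_bucket_arn" {
  description = "ARN of an existing S3 bucket for flow logs (hypothetical input)"
  type        = string
}

# VPC Flow Logs: accepted and rejected traffic, delivered to S3
resource "aws_flow_log" "vpc" {
  vpc_id               = var.vpc_id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = var.flow_log_bucket_arn
}

# The CloudTrail log bucket should never be publicly reachable
resource "aws_s3_bucket_public_access_block" "cloudtrail_logs" {
  bucket                  = aws_s3_bucket.cloudtrail_logs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Keep log files for just over a year, then expire them
resource "aws_s3_bucket_lifecycle_configuration" "cloudtrail_logs" {
  bucket = aws_s3_bucket.cloudtrail_logs.id

  rule {
    id     = "expire-old-logs"
    status = "Enabled"

    filter {} # applies to every object in the bucket

    expiration {
      days = 400 # assumption: 1 year of retention plus a safety margin
    }
  }
}
```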

Monitoring & Alerting

  • Alarms on critical security events
  • Real-time notifications (SNS, Slack, PagerDuty)
  • Dashboards for the security team
  • GuardDuty for threat detection
  • Security Hub for centralized overview
  • Regular review of findings
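
Real-time notification of GuardDuty findings can be wired up with an EventBridge rule that forwards high-severity findings to the existing security_alerts topic. This is a sketch under assumptions: the severity cut-off of 7 and the resource labels are illustrative, and the topic policy shown here replaces the topic's default access policy, so it also keeps CloudWatch alarms able to publish.

```hcl
# Match GuardDuty findings with severity 7.0 or higher
resource "aws_cloudwatch_event_rule" "guardduty_high" {
  name        = "guardduty-high-severity"
  description = "GuardDuty findings with severity >= 7"

  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
    detail = {
      severity = [{ numeric = [">=", 7] }]
    }
  })
}

# Forward matching findings to the security alerts topic
resource "aws_cloudwatch_event_target" "guardduty_to_sns" {
  rule = aws_cloudwatch_event_rule.guardduty_high.name
  arn  = aws_sns_topic.security_alerts.arn
}

# Allow EventBridge and CloudWatch alarms to publish to the topic
data "aws_iam_policy_document" "security_alerts_publish" {
  statement {
    sid       = "AllowServicePublish"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.security_alerts.arn]

    principals {
      type = "Service"
      identifiers = [
        "events.amazonaws.com",
        "cloudwatch.amazonaws.com",
      ]
    }
  }
}

resource "aws_sns_topic_policy" "security_alerts" {
  arn    = aws_sns_topic.security_alerts.arn
  policy = data.aws_iam_policy_document.security_alerts_publish.json
}
```

The same rule could target a chat or paging integration instead of SNS; only the target resource would change.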

Incident Response

  • Defined playbooks for different types of incidents
  • Automated remediation where possible
  • Regular incident response drills
  • Post-incident reviews
  • Documentation of all incidents

What NOT to Do

  • Disable CloudTrail (even temporarily)
  • Ignore alarms (tune noisy ones instead, so alert fatigue does not hide real incidents)
  • Delete logs before the retention period ends
  • Make logs accessible to everyone (apply least privilege to log access)
  • Leave production unmonitored
  • Operate without an incident response plan
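
Disabling CloudTrail and deleting logs early can be made harder with an explicit deny. The sketch below defines an IAM policy that could be attached to operator roles; the policy name and the choice of denied actions are assumptions. It should not be attached to the role Terraform itself runs under, since that role needs to manage the trail. In a multi-account setup, an Organizations service control policy would be the stronger variant of the same idea.

```hcl
data "aws_iam_policy_document" "protect_audit_logs" {
  # Block stopping, deleting, or reconfiguring the audit trail
  statement {
    sid    = "DenyCloudTrailTampering"
    effect = "Deny"
    actions = [
      "cloudtrail:StopLogging",
      "cloudtrail:DeleteTrail",
      "cloudtrail:UpdateTrail",
      "cloudtrail:PutEventSelectors",
    ]
    resources = [aws_cloudtrail.security_trail.arn]
  }

  # Block manual deletion of delivered log files
  statement {
    sid    = "DenyLogDeletion"
    effect = "Deny"
    actions = [
      "s3:DeleteObject",
      "s3:DeleteObjectVersion",
    ]
    resources = ["${aws_s3_bucket.cloudtrail_logs.arn}/*"]
  }
}

resource "aws_iam_policy" "protect_audit_logs" {
  name   = "protect-audit-logs" # hypothetical name
  policy = data.aws_iam_policy_document.protect_audit_logs.json
}
```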