Multi-Region Resiliency with Active-Active Setup
Global E-Commerce Platform
Imagine you have an e-commerce website that serves customers worldwide. To ensure high availability, low latency, and disaster recovery, you decide to implement an Active-Active AWS Architecture across multiple regions.
How It Works
Duplicate Infrastructure in Two or More AWS Regions
Your application is deployed in Region A (e.g., us-east-1) and Region B (e.g., us-west-2).
Each region has EC2, ECS, RDS, DynamoDB, S3, API Gateway, and Lambda to handle requests.
Traffic Distribution with Amazon Route 53
A global customer visits your website.
Amazon Route 53 (DNS) uses latency-based routing to direct them to the healthy region (A or B) that offers the lowest network latency, which is usually, but not always, the geographically closest one.
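As a rough sketch of what latency-based routing looks like in configuration, the snippet below builds a Route 53 change batch with one latency record per Region. The domain name, endpoint DNS names, and hosted zone IDs are placeholders invented for illustration. The resulting dictionary has the shape that boto3's `change_resource_record_sets` call expects, but nothing here contacts AWS.

```python
def latency_record(domain, region, target_dns_name, target_zone_id):
    """Build one latency-based alias record for a regional endpoint."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": domain,
            "Type": "A",
            # SetIdentifier must be unique among records sharing Name and Type.
            "SetIdentifier": f"{region}-endpoint",
            # Setting Region is what makes this a latency-based record.
            "Region": region,
            "AliasTarget": {
                "HostedZoneId": target_zone_id,   # placeholder value
                "DNSName": target_dns_name,       # placeholder value
                "EvaluateTargetHealth": True,
            },
        },
    }


# All names below are hypothetical examples, not real resources.
change_batch = {
    "Comment": "Latency-based routing across two Regions",
    "Changes": [
        latency_record("shop.example.com", "us-east-1",
                       "api-east.example.com", "ZEXAMPLEEAST"),
        latency_record("shop.example.com", "us-west-2",
                       "api-west.example.com", "ZEXAMPLEWEST"),
    ],
}
```

Both records share the same name, so Route 53 answers each query with the record whose Region gives the caller the lowest measured latency.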
Data Synchronization
If a user updates their shopping cart in Region A, changes must reflect in Region B.
You can achieve this via multi-region databases like DynamoDB Global Tables, Amazon Aurora Global Database, or data replication methods.
Load Balancing and Auto Scaling
Application Load Balancers (ALBs) distribute traffic across healthy targets within each region, and Auto Scaling Groups (ASGs) adjust the number of instances to match demand.
If traffic spikes in one region, that region scales out automatically.
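To make the scale-out behaviour concrete, here is a simplified proportional rule in the spirit of target tracking. It is an illustrative approximation, not the exact algorithm AWS uses, and the function name, metric values, and size limits are assumptions chosen for the example.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Estimate how many instances keep the metric near its target.

    Simplified rule: scale capacity in proportion to metric / target,
    round up, then clamp to the group's configured bounds.
    """
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    raw = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, raw))


# A spike pushes average CPU to 90% against a 60% target: scale out.
spike = desired_capacity(current=4, metric_value=90.0, target_value=60.0,
                         min_size=2, max_size=10)

# Quiet period at 15% CPU: scale in, but never below min_size.
quiet = desired_capacity(current=4, metric_value=15.0, target_value=60.0,
                         min_size=2, max_size=10)
```

Each region runs this decision independently, so a spike in one region does not change capacity in the other.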
Failover and Disaster Recovery
If Region A goes down (e.g., due to an outage), traffic is automatically rerouted to Region B via Route 53 health checks.
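Because both regions already serve traffic, there is no standby to warm up; the interruption is limited to the time Route 53 needs to detect the failure and for cached DNS answers to expire.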
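The failover decision can be illustrated with a small simulation. This is a toy model of "lowest latency among healthy regions", not Route 53's implementation; the class name, latencies, and the choice to return None when nothing is healthy are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class RegionEndpoint:
    name: str
    latency_ms: float   # latency as seen by one particular client
    healthy: bool       # result of the latest health check


def pick_region(endpoints):
    """Return the healthy region with the lowest latency, or None."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # simplification for this sketch
    return min(healthy, key=lambda e: e.latency_ms).name


endpoints = [
    RegionEndpoint("us-east-1", latency_ms=20.0, healthy=True),
    RegionEndpoint("us-west-2", latency_ms=80.0, healthy=True),
]

before_outage = pick_region(endpoints)   # client is nearer to us-east-1

endpoints[0].healthy = False             # health check for Region A fails
after_outage = pick_region(endpoints)    # traffic shifts to Region B
```

The client pays a higher latency after the outage but keeps being served, which is the trade-off an active-active design accepts.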
Multi-Region Active/Active Architecture
The following architecture demonstrates a Multi-Region active/active setup using AWS Regions as active sites. While the example shows two Regions, the architecture can scale to include more Regions.
Multi-Region Design Highlights
Traffic Distribution: Route 53 ensures requests are routed based on latency or geolocation for optimal performance and compliance.
High Availability: Each Region operates independently, supporting both compute and database operations locally.
Low Latency
DynamoDB Global Tables handle local read/write operations in each Region.
Aurora Global Database ensures low-latency reads via replicas in secondary Regions.
Disaster Recovery: Supports active/active configuration for high resilience, with mechanisms to route traffic away from impacted Regions.
Benefits
Low Recovery Time Objective (RTO): Minimal downtime in case of failure.
Low Recovery Point Objective (RPO): Minimal data loss during recovery.
Global Low-Latency Access: Optimized for geographically distributed users.
Site Independence: Each Region operates independently, providing separation between sites.
Challenges
Increased Complexity: Managing data synchronization, traffic routing, and read/write patterns.
Higher Costs: Running active resources in multiple Regions.
Services
Route 53
Acts as the DNS service for highly available and scalable traffic routing.
Directs user requests to the appropriate API Gateway based on configured routing policies (e.g., latency or geolocation).
Although failover routing is not explicitly configured in an active-active setup, Route 53 health checks remove an unhealthy Region's records from DNS responses, which provides failover in practice.
Route 53 health checks can be driven by CloudWatch alarms based on
API Gateway metrics: aggregated performance metrics such as 5XXError detect high-level issues across all APIs.
CloudWatch Synthetics canaries: these probe a custom health check endpoint for each API behind the API Gateway.
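A minimal sketch of the alarm-backed health checks described above follows. It only builds configuration dictionaries in the shape used for the `HealthCheckConfig` argument of boto3's `create_health_check`. The alarm names are hypothetical, and the alarms themselves (one on 5XXError, one on a canary metric) are assumed to exist already.

```python
def alarm_backed_health_check(alarm_name, alarm_region):
    """Health check config whose status follows a CloudWatch alarm."""
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
        # When the alarm has too little data, keep the last known status.
        "InsufficientDataHealthStatus": "LastKnownStatus",
    }


# Alarm names below are placeholders for illustration only.
health_checks = {
    "api-gateway-5xx": alarm_backed_health_check(
        "api-gateway-5xx-high", "us-east-1"),
    "orders-api-canary": alarm_backed_health_check(
        "orders-api-canary-failing", "us-east-1"),
}
```

The aggregated 5XXError alarm catches broad failures, while the per-API canary alarm catches a single endpoint failing even when overall error rates look normal.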
API Gateway
Serves as the primary entry point for application traffic in each Region.
Routes requests to
AWS Lambda: For serverless workloads requiring minimal infrastructure management.
Application Load Balancer (ALB): For workloads hosted on ECS Fargate.
Universal across Single-Region and Multi-Region Setups: The distinction between single-region and multi-region setups comes from how other AWS services (e.g., Route 53) are integrated with API Gateway to manage global traffic distribution and failover. These services add layers of functionality, but the API Gateway itself remains unchanged.
Application Load Balancer (ALB)
Distributes incoming traffic to ECS Fargate tasks within its Region.
Provides fault tolerance and scalability for containerized workloads.
Universal across Single-Region and Multi-Region Setups: The distinction between single-region and multi-region setups comes from how other AWS services (e.g., Route 53) are integrated with ALBs to manage global traffic distribution and failover. These services add layers of functionality, but the ALB itself remains unchanged.
DynamoDB Global Tables
Supports the Read-Local/Write-Local pattern, allowing each Region to handle reads and writes locally for low-latency access.
Data is asynchronously replicated across Regions to maintain eventual consistency.
Provides high availability and fault tolerance for distributed workloads.
Supports Read-Local/Write-Local Pattern: In this pattern, requests routed to a Region are handled entirely within that Region for both reads and writes. This approach minimizes latency and potential network errors.
DynamoDB global tables enable live data replication across Regions within seconds, supporting the read-local/write-local model. However, concurrent updates to the same item in different Regions may lead to write contention, with the most recent update prevailing (the last writer wins). If this behavior is unsuitable, alternative write strategies may be necessary.
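The effect of last-writer-wins reconciliation can be shown with a toy model. This is not DynamoDB's internal mechanism; the `ItemVersion` type, the millisecond timestamps, and the cart contents are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    region: str          # Region that accepted the write
    written_at_ms: int   # write timestamp used to order versions
    cart: tuple          # cart contents after this write


def resolve_last_writer_wins(a, b):
    """Keep the version with the later timestamp; the other is discarded."""
    return a if a.written_at_ms >= b.written_at_ms else b


# The same cart is updated in two Regions before replication catches up.
east = ItemVersion("us-east-1", written_at_ms=1000, cart=("book",))
west = ItemVersion("us-west-2", written_at_ms=1005, cart=("lamp",))

winner = resolve_last_writer_wins(east, west)
# The later write from us-west-2 wins, so the "book" update is lost.
```

If losing one side of a concurrent update is unacceptable, one option is to route all writes for a given item to a single Region so that concurrent cross-Region writes do not occur.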
Aurora Global Database
Implements the Read-Local/Write-Global pattern
Region 1 contains the primary cluster for global writes.
Region 2 hosts a read-only secondary cluster for low-latency reads.
Data replication between Regions occurs with a typical latency of less than a second.
Backups are maintained to guard against
Accidental deletions
Data corruption
Backups allow restoration to the last known good state.
Supports Read-Local/Write-Global Pattern
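Aurora Global Database provides a primary cluster for global writes and read-only secondary clusters in other Regions. With Aurora's write-forwarding feature enabled, write requests sent to a secondary cluster are forwarded to the primary cluster over the AWS network, so the application does not need its own connection to the primary Region. Forwarded writes still incur cross-Region latency.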
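The read-local/write-global pattern can be sketched at the application level as a router that picks an endpoint per statement. This shows the pattern without write forwarding; the class name and endpoint hostnames are hypothetical, and the verb check is deliberately naive.

```python
class ReadLocalWriteGlobalRouter:
    """Send reads to the local cluster and writes to the primary cluster."""

    READ_VERBS = {"SELECT"}

    def __init__(self, local_reader, primary_writer):
        self.local_reader = local_reader      # reader endpoint in this Region
        self.primary_writer = primary_writer  # writer endpoint in the primary Region

    def endpoint_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Anything that is not clearly a read goes to the primary.
        if verb in self.READ_VERBS:
            return self.local_reader
        return self.primary_writer


# Hostnames are placeholders, not real cluster endpoints.
router = ReadLocalWriteGlobalRouter(
    local_reader="reader.region2.example.internal",
    primary_writer="writer.region1.example.internal",
)

read_target = router.endpoint_for("SELECT * FROM orders WHERE id = 42")
write_target = router.endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 42")
```

With write forwarding enabled, the application could instead send both kinds of statements to the local cluster and let Aurora forward the writes.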