AWS Architected Best Practice
Well-Architected
Can you confidently answer these questions when reviewing your team’s systems or applications? Sometimes you design a system, but you’re unsure if it’s well-designed — that’s where this knowledge comes in handy.
What defines a good design?
- Core elements
- Benefits
- General design principles
- Cost
- Performance
- CPU/memory configuration based on RPS (Requests Per Second)
Operational Excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
- Sustainability
Design Principles
- Faster development and deployment
- Risk mitigation or reduction
- Is it safe to open port 22 on a server?
- Should we rely on a Bash script or online guides to open well-known ports?
- With AWS Systems Manager Session Manager, you can access servers without opening port 22
SCALE UP / SCALE OUT
- SCALE UP: Vertical scaling — upgrading hardware specs like CPU/memory (UP/DOWN)
- SCALE OUT: Horizontal scaling — adding more servers with the same specifications (IN/OUT)
Best Practices for EC2
Infrastructure Perspective
- Estimate capacity first → this leads to cost estimation
- Automate response to security events: trigger automatic actions based on event or condition-based alerts
Operational Excellence
- Core focus: How the organization supports its business objectives
- Effectively running workloads, gaining operational insights, and continuously improving processes and procedures to deliver business value
Design Principles
- Over-provisioning = waste (based on peak traffic estimates)
- Under-provisioning = overload risk
- Test systems at production scale
- Use identical environments for testing to ensure stability
- Since it’s cloud-based, you can terminate unused resources (e.g., Blue/Green deployment)
- Architecture experimentation becomes easier with automation
Enabling Innovative Architectures
- MSA (Microservices) ↔ Monolith
- Use PoCs to identify better migration strategies — don’t get stuck
- Data-driven architecture
- Don’t rely on gut feeling; base decisions on data
Improvements Through Real-World Testing
- Failing to prepare for failure can lead to expensive recoveries
- Human resources should be included as part of the workload
Infrastructure as Code (IaC)
Manage infrastructure using code
Make small, reversible changes frequently
- Like Merge Requests
- Build CI/CD workflows using CodePipeline
- Canary Deployment: Deploy to a subset of servers, test and monitor, then roll out to the rest
Re-defining Operational Procedures Frequently
- Opening SSH = opening port 22
- Opening well-known ports increases the risk of hacking attempts
- AWS Session Manager allows browser-based management
- You can perform prompt-based operations just like with SSH
Failure Prediction and Response
- Build failure-tolerant architectures
- Perform health checks before routing traffic
- If a health check fails, isolate the traffic → prevent failure propagation
e.g., Case Study: Any Company
[Issue]
When Availability Zone A goes down, all services go down
→ Split services across multiple Availability Zones
Installing a database directly on EC2 increases the management burden
→ Use redundancy and dedicated instances to ensure high availability
[Best Practice Review]
- Most operations were performed manually
- Product catalog application needed a highly available architecture
- Security was the top priority
Database Replication
- Use Active/Standby setup
- Data is synchronized in real-time
- RPO (Recovery Point Objective): How often data is backed up — RPO can be zero
Reliability
Key Elements
- Recover from infrastructure or service failures
- Dynamically acquire computing resources based on demand
- Mitigate interruptions due to misconfigurations or temporary network issues
Auto Recovery From Failures
- Use Rolling, Canary, or Blue-Green deployment strategies
- Configure Auto Scaling with min:max:desired capacity settings
Horizontal Scaling
- Scale across multiple Availability Zones
- Elastic Load Balancer adds capacity automatically when traffic increases (Auto Scaling)
Security
Security involves protecting your data, systems, and assets using cloud technology and improving your overall security posture.
- Multiple teams sharing a single account
- Use separate AWS accounts for each function
Apply Security at Every Layer
Strong Identity and Access Management
- Like Git commits, changes can be tracked similarly to ChangeLogs — know when and how changes occurred
Protect Data In Transit and At Rest
- Encryption: Amazon Macie
- For key management, use AWS KMS for common resources, and AWS CloudHSM for team-specific keys
Cost Optimization
- Continuously monitor usage and operate systems that deliver business value at the lowest possible cost
- Pay-as-you-go architecture
- Minimize resources for development/test environments
Efficient Cost Management
- Measure value as workloads evolve
- Consider switching to serverless architectures
- Use tags for cost tracking: Identify where usage or costs are increasing