The AWS Well-Architected Framework was developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. It helps architects understand the business impact of their design decisions and apply what they learn to future architectures.
This framework provides:
- a blueprint to build resilient, scalable and secure architecture.
- a consistent approach for customers to evaluate their architecture.
- reusable architecture patterns targeting specific business outcomes.
- six main pillars which describe the key concepts, design principles and best practices to consider when operating in the cloud, although the guidance is largely applicable to both cloud and on-premises environments.
- a collection of workshops and hands-on labs to help you learn, measure and build using architectural best practices.
The Well-Architected Framework has identified a set of design principles to facilitate good design in the cloud:
- General design principles
- Pillar-specific design principles
General Design Principles
- stop guessing your capacity needs
- test systems at production scale
- automate with architectural experimentation in mind
- consider evolutionary architectures
- drive architectures using data
- improve through game days (i.e., simulated events)
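To make "stop guessing your capacity needs" and "drive architectures using data" a little more concrete, here is a minimal sketch of a proportional scaling rule: it sizes a fleet from an observed metric instead of a fixed guess. This is a simplified illustration, not the actual algorithm used by any AWS scaling service; the function name `desired_capacity` and its default limits are assumptions made for this example.

```python
import math


def desired_capacity(current_instances: int,
                     observed_metric: float,
                     target_metric: float,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Return a fleet size that would bring the metric back to its target.

    Hypothetical helper: scales the fleet in proportion to how far the
    observed per-instance metric (e.g. average CPU %) is from the target,
    then clamps the result to the configured limits.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    # Proportional rule: more load per instance -> more instances.
    raw = math.ceil(current_instances * observed_metric / target_metric)

    # Never scale below the floor or above the ceiling.
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # 4 instances at 90% average CPU with a 60% target.
    print(desired_capacity(4, 90.0, 60.0))
```

The point of the sketch is the feedback loop: capacity follows measured demand, so there is no need to provision for a guessed peak up front.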
In addition to the framework, AWS provides the AWS Well-Architected Tool (AWS WA Tool), which offers a consistent process for measuring your architecture against AWS best practices.
- use the tool to perform an initial review of your architecture and identify improvements
- available in the AWS Management Console
- provides a framework to evaluate the state of your applications and workloads against architectural best practices
- gain insight into your overall architectural health
- available at no additional charge
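As a rough mental model of what a review produces, the sketch below records one risk rating per self-assessment question, counts the ratings per pillar, and lists the flagged questions with high risk first. It is not the AWS WA Tool's API; the `Risk`, `Answer`, `summarize`, and `improvement_plan` names and the sample answers are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    """Hypothetical risk ratings for an answered question."""
    HIGH = "high"
    MEDIUM = "medium"
    NONE = "none"


@dataclass(frozen=True)
class Answer:
    question_id: str  # e.g. "SEC 8"
    pillar: str       # e.g. "Security"
    risk: Risk


def summarize(answers):
    """Count risk ratings per pillar."""
    summary = {}
    for answer in answers:
        summary.setdefault(answer.pillar, Counter())[answer.risk] += 1
    return summary


def improvement_plan(answers):
    """Return question IDs that need work, high risk before medium risk."""
    priority = {Risk.HIGH: 0, Risk.MEDIUM: 1}
    flagged = [a for a in answers if a.risk in priority]
    # sorted() is stable, so questions keep their order within a risk level.
    return [a.question_id for a in sorted(flagged, key=lambda a: priority[a.risk])]


if __name__ == "__main__":
    # Invented sample review, for illustration only.
    review = [
        Answer("OPS 4", "Operational Excellence", Risk.MEDIUM),
        Answer("SEC 8", "Security", Risk.HIGH),
        Answer("REL 9", "Reliability", Risk.NONE),
        Answer("SEC 2", "Security", Risk.HIGH),
    ]
    print(improvement_plan(review))
    print(summarize(review)["Security"][Risk.HIGH])
```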
You must understand this framework to prepare for the AWS Certified Solutions Architect – Associate exam.
Six Main Pillars
Each of the six pillars has:
- an official definition
- design principles
- best practices by categories
- a prescriptive guide (links below under the references section)
Operational Excellence
The ability to support development and run workloads effectively, gain insight into their operations, and continuously improve supporting processes and procedures to deliver business value.
Design Principles
- perform operations as code
- make frequent, small, reversible changes
- refine operations procedures frequently
- anticipate failure
- learn from all operational failures
- use managed services
- implement observability for actionable insights
Self-Assessment Questions
OPS 1: How do you determine what your priorities are?
OPS 2: How do you structure your organization to support your business outcomes?
OPS 3: How does your organizational culture support your business outcomes?
OPS 4: How do you implement observability in your workload?
OPS 5: How do you reduce defects, ease remediation, and improve flow into production?
OPS 6: How do you mitigate deployment risks?
OPS 7: How do you know that you are ready to support a workload?
OPS 8: How do you utilize workload observability in your organization?
OPS 9: How do you understand the health of your workload?
OPS 10: How do you manage workload and operations events?
OPS 11: How do you evolve operations?
Security
The ability to take advantage of cloud technologies to protect data, systems and assets while delivering business value through risk assessments and mitigation strategies.
Design Principles
- implement a strong identity foundation
- maintain traceability
- apply security at all layers
- automate security best practices
- protect data in transit and at rest
- keep people away from data
- prepare for security events
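One way to picture "implement a strong identity foundation" and "automate security best practices" is a small automated check for overly broad permissions. The sketch below scans an IAM-style policy document (the standard `Version` / `Statement` / `Effect` / `Action` / `Resource` JSON layout) and flags Allow statements that use a bare `*`. It is a deliberately crude illustration and no substitute for a real policy analysis tool; the function names and the `example-bucket` ARN are assumptions made for this example.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def broad_statements(policy: dict) -> list:
    """Return indexes of Allow statements whose Action or Resource is '*'.

    Hypothetical helper: only catches the bare wildcard, not partial
    wildcards such as 's3:*'.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid
        statements = [statements]

    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Illustrative policy: the first statement is scoped, the second is not.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::example-bucket/*",
            },
            {"Effect": "Allow", "Action": "*", "Resource": "*"},
        ],
    }
    print(broad_statements(policy))
```

Running a check like this in a deployment pipeline is one small example of treating least privilege as an automated control rather than a manual review step.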
Self-Assessment Questions
SEC 1: How do you securely operate your workload?
SEC 2: How do you manage authentication for people and machines?
SEC 3: How do you manage permissions for people and machines?
SEC 4: How do you detect and investigate security events?
SEC 5: How do you protect your network resources?
SEC 6: How do you protect your compute resources?
SEC 7: How do you classify your data?
SEC 8: How do you protect your data at rest?
SEC 9: How do you protect your data in transit?
SEC 10: How do you anticipate, respond to, and recover from incidents?
SEC 11: How do you incorporate and validate the security properties of applications throughout the design, development and deployment lifecycle?
Reliability
The ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to recover from infrastructure or service failures and to operate and test the workload through its total lifecycle.
Design Principles
- automatically recover from failure
- test recovery procedures
- scale horizontally to increase aggregate workload availability
- stop guessing capacity
- manage change through automation
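The principle "scale horizontally to increase aggregate workload availability" can be illustrated with some simple arithmetic. Assuming components fail independently (a simplification that rarely holds exactly in practice), redundant copies of a component raise availability, while hard dependencies chained in series lower it. The function names below are made up for this sketch.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of N redundant copies where any one copy is enough.

    Assumes independent failures: the group is down only if every copy is down.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(*dependencies: float) -> float:
    """Availability of a chain where every dependency must be up."""
    result = 1.0
    for availability in dependencies:
        result *= availability
    return result


if __name__ == "__main__":
    # One 99% instance versus two and three redundant instances.
    for n in (1, 2, 3):
        print(n, round(parallel_availability(0.99, n), 6))
    # Two 99.9% hard dependencies in series.
    print(round(serial_availability(0.999, 0.999), 6))
```

Under these assumptions, two 99% components in parallel reach about 99.99%, whereas two 99.9% dependencies in series drop to about 99.8%. That is why spreading a workload across several small resources tends to be more resilient than relying on one large one.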
Self-Assessment Questions
REL 1: How do you manage service quotas and constraints?
REL 2: How do you plan your network topology?
REL 3: How do you design your workload service architecture?
REL 4: How do you design interactions in a distributed system to prevent failures?
REL 5: How do you design interactions in a distributed system to mitigate or withstand failures?
REL 6: How do you monitor workload resources?
REL 7: How do you design your workload to adapt to changes in demand?
REL 8: How do you implement change?
REL 9: How do you back up data?
REL 10: How do you use fault isolation to protect your workload?
REL 11: How do you design your workload to withstand component failures?
REL 12: How do you test reliability?
REL 13: How do you plan for disaster recovery (DR)?
Performance Efficiency
The ability to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve.
Design Principles
- democratize advanced technologies
- go global in minutes
- use serverless architectures
- experiment more often
- consider mechanical sympathy
Self-Assessment Questions
PERF 1: How do you select appropriate cloud resources and architecture patterns for your workload?
PERF 2: How do you select and use your compute resources in your workload?
PERF 3: How do you store, manage and access data in your workload?
PERF 4: How do you select and configure networking resources in your workload?
PERF 5: How do your organizational practices and culture contribute to performance efficiency in your workload?
Sustainability
The ability to continually improve sustainability impacts by reducing energy consumption and increasing efficiency across all components of a workload by maximizing the benefits from the provisioned resources and minimizing the total resources required.
Design Principles
- understand your cloud workload impact
- establish sustainability goals
- maximize utilization
- anticipate and adopt new, more efficient hardware and software offerings
- use managed services
- reduce the downstream impact of your cloud workloads
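To see why "maximize utilization" matters, consider that a host draws some baseline power even when it is idle. The sketch below uses a simple linear power model with invented wattage figures (100 W idle, 200 W at full load) to compare two lightly loaded hosts against one busier host doing the same total work. The model, the numbers, and the function name are all assumptions for illustration, not measurements.

```python
def host_power_watts(utilization: float,
                     idle_watts: float = 100.0,
                     peak_watts: float = 200.0) -> float:
    """Estimate power draw with a linear model between idle and peak.

    The default wattages are hypothetical placeholders.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_watts + (peak_watts - idle_watts) * utilization


if __name__ == "__main__":
    two_hosts = 2 * host_power_watts(0.30)   # same work spread over two hosts
    one_host = host_power_watts(0.60)        # same work consolidated on one
    print(f"two hosts at 30%: {two_hosts:.0f} W")
    print(f"one host at 60%:  {one_host:.0f} W")
```

Because each host pays the idle baseline, consolidating the same work onto fewer, busier hosts lowers the total draw in this model.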
Self-Assessment Questions
SUS 1: How do you select Regions for your workloads?
SUS 2: How do you align cloud resources to your demand?
SUS 3: How do you take advantage of software and architecture patterns to support your sustainability goals?
SUS 4: How do you take advantage of data access and usage patterns to support your sustainability goals?
SUS 5: How do you select and use cloud hardware and services in your architecture to support your sustainability goals?
SUS 6: How do your organizational processes support your sustainability goals?
Cost Optimization
The ability to avoid or eliminate unneeded cost or suboptimal resources and deliver business value at the lowest price point.
Design Principles
- implement cloud financial management
- adopt a consumption model
- measure overall efficiency
- stop spending money on undifferentiated heavy lifting
- analyze and attribute expenditure
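A quick calculation shows what "adopt a consumption model" can mean in practice. A development environment that is only needed during a 40-hour work week but runs all 168 hours is mostly paying for idle time. The sketch assumes cost is strictly proportional to running hours, which ignores charges such as storage that continue while compute is stopped; the function name is made up for this example.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def potential_savings(hours_needed_per_week: float) -> float:
    """Fraction of an always-on bill avoided by running only when needed.

    Simplifying assumption: cost scales linearly with running hours.
    """
    if not 0 <= hours_needed_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours must be between 0 and 168")
    return 1.0 - hours_needed_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    savings = potential_savings(40)
    print(f"Potential savings: {savings:.0%}")
```

For a 40-hour week this comes to roughly three quarters of the always-on cost, which is why stopping unused non-production resources is a common first optimization.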
Self-Assessment Questions
COST 1: How do you implement cloud financial management?
COST 2: How do you govern usage?
COST 3: How do you monitor your cost and usage?
COST 4: How do you decommission resources?
COST 5: How do you evaluate cost when you select services?
COST 6: How do you meet cost targets when you select resource type, size and number?
COST 7: How do you use pricing models to reduce cost?
COST 8: How do you plan for data transfer charges?
COST 9: How do you manage demand, and supply resources?
COST 10: How do you evaluate new services?
COST 11: How do you evaluate the cost of effort?
References:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Sustainability
- Cost Optimization
- Well-Architected Tool
- Well-Architected Labs