AWS Fundamentals: RDS + Aurora + ElastiCache
I. Amazon RDS Overview
- RDS stands for Relational Database Service
- It's a managed database service for databases that use SQL as a query language
- It allows you to create databases in the cloud that are managed by AWS
- Postgres
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
- Aurora (AWS proprietary database)
Advantages of using RDS versus deploying a DB on EC2
- RDS is a managed service:
- Automated provisioning, OS patching
- Continuous backups and restore to specific timestamp (Point in Time Restore)!
- Monitoring dashboards
- Read replicas for improved read performance
- Multi AZ setup for DR (Disaster Recovery)
- Maintenance windows for upgrades
- Scaling capability (vertical and horizontal)
- Storage backed by EBS (gp2 or io1)
- BUT you can't SSH into your instances
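As a rough sketch of what this managed provisioning looks like from code, here is a minimal boto3 example; the identifier, region, credentials, and sizes are placeholder values:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder region

rds.create_db_instance(
    DBInstanceIdentifier="demo-mysql",       # placeholder instance name
    Engine="mysql",                          # any supported engine: postgres, mariadb, oracle-ee, sqlserver-ex, ...
    DBInstanceClass="db.t3.micro",           # free-tier eligible class
    AllocatedStorage=20,                     # GiB, backed by EBS
    MasterUsername="admin",
    MasterUserPassword="change-me",          # placeholder; use Secrets Manager in practice
    BackupRetentionPeriod=7,                 # automated backups enable Point in Time Restore
    MultiAZ=False,                           # can be switched to True later for DR
)
```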
RDS - Storage Auto Scaling
- Helps you increase storage on your RDS DB instance dynamically
- When RDS detects you are running out of free database storage, it scales automatically
- Avoid manually scaling your database storage
- You have to set a Maximum Storage Threshold (maximum limit for DB storage)
- Automatically modify storage if:
- Free storage is less than 10% of allocated storage
- Low-storage lasts at least 5 minutes
- 6 hours have passed since last modification
- Useful for applications with unpredictable workloads
- Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
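A hedged sketch of turning Storage Auto Scaling on with boto3 - the Maximum Storage Threshold corresponds to the MaxAllocatedStorage parameter; the instance name and limit are placeholders:

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="demo-mysql",   # placeholder instance name
    MaxAllocatedStorage=1000,            # Maximum Storage Threshold (GiB): RDS can grow storage up to this limit
    ApplyImmediately=True,
)
```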
II. RDS Read Replicas & Multi AZ
1. RDS Read Replicas for read scalability
- Up to 15 Read Replicas
- Within AZ, Cross AZ or Cross Region
- Replication is Async, so reads are eventually consistent
- Replicas can be promoted to their own DB
- Applications must update the connection string to leverage read replicas
- You have a production Database that is taking on normal load
- You want to run a reporting application to run some analytics
- You create a Read Replica to run the new workload there
- The production application is unaffected
- Read replicas are used for SELECT (=read) only kind of statements (not INSERT, UPDATE, DELETE)
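As a rough illustration of the reporting use case above, a minimal boto3 sketch that creates a read replica; all names and the endpoint in the comment are placeholders:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="demo-mysql-replica-1",   # placeholder name for the new replica
    SourceDBInstanceIdentifier="demo-mysql",       # the production (source) instance
    # AvailabilityZone="us-east-1b" would place it in another AZ; for Cross Region,
    # call this API in the destination region with the source instance's ARN
)

# The reporting application then connects to the replica's own endpoint (SELECT only),
# e.g. "demo-mysql-replica-1.xxxxxxxx.us-east-1.rds.amazonaws.com" (placeholder)
```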
2. RDS Read Replicas - Network Cost
- In AWS there's a network cost when data goes from one AZ to another
- For RDS Read Replicas within the same region, you don't pay that fee
3. RDS Multi AZ (Disaster Recovery)
- Sync replication
- One DNS name - automatic app failover to standby
- Increase availability
- Failover in case of loss of AZ, loss of network, instance or storage failure
- No manual intervention in apps
- Not used for scaling
- Note: The Read Replicas can be set up as Multi AZ for Disaster Recovery (DR)
4. RDS - From Single-AZ to Multi-AZ
- Zero downtime operation (no need to stop the DB)
- Just click on "modify" for the database
- The following happens internally:
- A snapshot is taken
- A new DB is restored from the snapshot in a new AZ
- Synchronization is established between the two databases
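A small boto3 sketch of that zero-downtime modification, assuming a placeholder instance name:

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="demo-mysql",  # placeholder instance name
    MultiAZ=True,                       # triggers snapshot -> restore in a new AZ -> synchronization
    ApplyImmediately=True,              # apply now instead of waiting for the maintenance window
)
```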
III. Amazon RDS Hands on
Set up a database in the free tier
Modify the database
Delete the database when done
IV. Amazon Aurora
- Aurora is a proprietary technology from AWS (not open source)
- Postgres and MySQL are both supported as Aurora DB (that means your drivers will work as if Aurora was a Postgres or MySQL database)
- Aurora is "AWS cloud optimized" and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of Postgres on RDS
- Aurora storage automatically grows in increments of 10GB, up to 128TB
- Aurora can have up to 15 replicas and the replication process is faster than MySQL (sub-10 ms replica lag)
- Failover in Aurora is instantaneous. It's HA (High Availability) native
- Aurora costs more than RDS (20% more) - but is more efficient
Aurora High Availability and Read Scaling
- 6 copies of your data across 3 AZ:
- 4 copies out of 6 needed for writes
- 3 copies out of 6 needed for reads
- Self healing with peer-to-peer replication
- Storage is striped across 100s of volumes
- One Aurora instance takes writes (master)
- Automatic failover for master in less than 30 seconds
- Master + up to 15 Aurora Read Replicas serve reads
- Support for Cross Region Replication
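To show what read scaling looks like from the application side, here is a hedged sketch using the standard MySQL driver (pymysql), since Aurora MySQL is wire-compatible with MySQL; the cluster endpoints, credentials, and the orders table are placeholders:

```python
import pymysql

WRITER = "demo-aurora.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"     # placeholder writer endpoint
READER = "demo-aurora.cluster-ro-xxxxxxxx.us-east-1.rds.amazonaws.com"  # placeholder reader endpoint

writes = pymysql.connect(host=WRITER, user="admin", password="change-me", database="app")
reads = pymysql.connect(host=READER, user="admin", password="change-me", database="app")

# Writes go to the single master (writer endpoint)
with writes.cursor() as cur:
    cur.execute("INSERT INTO orders (item) VALUES (%s)", ("book",))
writes.commit()

# Reads go to the reader endpoint, load-balanced across the Read Replicas
with reads.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders")
    print(cur.fetchone())
```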
Features of Aurora
- Automatic fail-over
- Backup and recovery
- Isolation and security
- Industry compliance
- Push-button scaling
- Automated patching with Zero Downtime
- Advanced Monitoring
- Routine Maintenance
- Backtrack: restore data at any point of time without using backups
V. Amazon Aurora - Hands on
Setup is similar to other RDS databases
VI. RDS & Aurora Security
- At-rest encryption:
- Database master & replicas encryption using AWS KMS - must be defined at launch time
- If the master is not encrypted, the read replicas cannot be encrypted
- To encrypt an un-encrypted database, go through a DB snapshot & restore it as encrypted
- In-flight encryption: TLS-ready by default, use the AWS TLS root certificates client-side
- IAM authentication: IAM roles to connect to your database (instead of username/pw)
- Security Groups: Control network access to your RDS / Aurora DB
- No SSH available except on RDS Custom
- Audit Logs can be enabled and sent to CloudWatch Logs for longer retention
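A sketch combining IAM authentication and in-flight encryption from a Python client; the hostname, DB user, database name, and certificate file name are placeholders:

```python
import boto3
import psycopg2

HOST = "demo-postgres.xxxxxxxx.us-east-1.rds.amazonaws.com"  # placeholder hostname
USER = "iam_app_user"                                        # DB user configured for IAM auth (placeholder)

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(DBHostname=HOST, Port=5432, DBUsername=USER)

conn = psycopg2.connect(
    host=HOST,
    port=5432,
    user=USER,
    password=token,                    # short-lived IAM auth token instead of a static password
    dbname="app",
    sslmode="verify-full",             # in-flight encryption (TLS)
    sslrootcert="rds-ca-bundle.pem",   # downloaded AWS root certificate bundle (placeholder filename)
)
```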
VII. RDS Proxy
- Fully managed database proxy for RDS
- Allows apps to pool and share DB connections established with database
- Improves database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimizing open connections (and timeouts)
- Serverless, auto scaling, highly available (multi-AZ)
- Reduces RDS & Aurora failover time by up to 66%
- Supports RDS (MySQL, PostgreSQL, MariaDB, MS SQL Server) and Aurora (MySQL, PostgreSQL)
- No code changes required for most apps
- Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
- RDS Proxy is never publicly accessible (must be accessed from VPC)
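A hedged sketch of an app connecting through RDS Proxy with credentials pulled from Secrets Manager; the proxy endpoint and secret name are placeholders:

```python
import json
import boto3
import pymysql

PROXY_ENDPOINT = "demo-proxy.proxy-xxxxxxxx.us-east-1.rds.amazonaws.com"  # placeholder proxy endpoint

secrets = boto3.client("secretsmanager")
secret = json.loads(secrets.get_secret_value(SecretId="demo/db-credentials")["SecretString"])

# Same driver and code as a direct connection; only the host changes,
# which is why most apps need no code changes
conn = pymysql.connect(
    host=PROXY_ENDPOINT,
    user=secret["username"],
    password=secret["password"],
    database="app",
)
```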
VIII. ElastiCache Overview
- The same way RDS is used to get managed Relational Databases...
- ElastiCache is to get managed Redis or Memcached
- Caches are in-memory databases with really high performance, low latency
- Helps reduce load off of databases for read intensive workloads
- Help make your application stateless
- AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
- Using ElastiCache involves heavy application code changes
ElastiCache - Solution Architecture - DB Cache
- Application queries ElastiCache; if the data is not there, get it from RDS and store it in ElastiCache
- Helps relieve load on RDS
- Cache must have an invalidation strategy to make sure only the most current data is used in there
ElastiCache - Solution Architecture - User Session Store
- User logs into any instance of the application
- The application writes the session data into ElastiCache
- The user hits another instance of our application
- The instance retrieves the data and the user is already logged in
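A minimal redis-py sketch of that session store flow; the cluster endpoint, key format, and TTL are placeholder choices:

```python
import json
import redis

cache = redis.Redis(host="demo-redis.xxxxxx.cache.amazonaws.com", port=6379)  # placeholder endpoint

def save_session(session_id, data, ttl_seconds=1800):
    # Written by whichever instance handled the login
    cache.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id):
    # Read by any other instance the user hits next; no sticky sessions needed
    raw = cache.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```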
IX. ElastiCache - Hands on
Serverless
X. ElastiCache - Strategies
1. Caching Implementation Considerations
- AWS documentation: Caching Best Practices
- Is it safe to cache data? Data may be out of date, eventually consistent
- Is caching effective for that data?
- Pattern: data changing slowly, few keys are frequently needed
- Anti patterns: data changing rapidly, all of a large key space frequently needed
- Is data structured well for caching?
- example: key value caching, or caching of aggregation results
- Which caching design pattern is the most appropriate?
2. Lazy Loading / Cache-Aside / Lazy Population
- Pros:
- Only requested data is cached ( the cache isn't filled up with unused data)
- Node failures are not fatal (just increased latency to warm the cache. It means that all the reads have to go to RDS and then be cached)
- Cons:
- Cache miss penalty that results in 3 round trips, noticeable delay for that request
- Stale data: data can be updated in the database and outdated in the cache
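A short redis-py sketch of Lazy Loading as described above; query_database is a hypothetical stand-in for the real RDS query, and the endpoint and TTL are placeholders:

```python
import json
import redis

cache = redis.Redis(host="demo-redis.xxxxxx.cache.amazonaws.com", port=6379)  # placeholder endpoint

def query_database(user_id):
    # Hypothetical helper: run the SELECT against RDS and return the row as a dict
    ...

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                      # cache hit: served from memory
        return json.loads(cached)
    user = query_database(user_id)              # cache miss: read from RDS (the miss penalty)
    cache.setex(key, 300, json.dumps(user))     # populate the cache with a TTL to limit staleness
    return user
```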
3. Write Through - Add or Update cache when database is updated
- Pros:
- Data in cache is never stale, reads are quick
- Write penalty vs Read penalty (each write requires 2 calls)
- Cons:
- Missing Data until it is added/ updated in the DB. Mitigation is to implement Lazy Loading strategy as well
- Cache churn - a lot of the data will never be read
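A matching Write Through sketch; update_database is again a hypothetical stand-in for the RDS write:

```python
import json
import redis

cache = redis.Redis(host="demo-redis.xxxxxx.cache.amazonaws.com", port=6379)  # placeholder endpoint

def update_database(user_id, data):
    # Hypothetical helper: write the row to RDS
    ...

def save_user(user_id, data):
    update_database(user_id, data)                   # call 1: write to the DB
    cache.set(f"user:{user_id}", json.dumps(data))   # call 2: keep the cache fresh (never stale)
```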
4. Cache Evictions and TTL (Time To Live)
- Cache eviction can occur in three ways:
- You delete the item explicitly in the cache
- Item is evicted because the memory is full and it's not recently used (least recently used LRU)
- You set an item time-to-live (TTL)
- TTL is helpful for any kind of data:
- Leaderboards
- Comments
- Activity streams
- TTL can range from a few seconds to hours or days
- If too many evictions happen due to memory, you should scale up or out
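For reference, setting and checking a TTL with redis-py; the key name and durations are arbitrary placeholders:

```python
import redis

cache = redis.Redis(host="demo-redis.xxxxxx.cache.amazonaws.com", port=6379)  # placeholder endpoint

cache.set("leaderboard:weekly", "<serialized payload>", ex=60)  # expire after 60 seconds
print(cache.ttl("leaderboard:weekly"))    # remaining lifetime in seconds
cache.expire("leaderboard:weekly", 3600)  # extend the TTL to one hour
```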
5. Final words of wisdom
- Lazy loading / Cache aside is easy to implement and works for many situations as a foundation, especially on the read side
- Write-through is usually combined with Lazy Loading, targeted at the queries or workloads that benefit from this optimization
- Setting a TTL is usually not a bad idea, except when you are using Write-through. Set it to a sensible value for your application
- Only cache the data that makes sense (user profiles, blogs, etc, ...)
- Quote: There are only two hard things in Computer Science: cache invalidation and naming things
XI. Amazon MemoryDB for Redis - Overview
- Redis-compatible, durable, in memory database service
- Ultra-fast performance with over 160 million requests per second (RPS)
- Durable in-memory data storage with Multi-AZ transactional log
- Scale seamlessly from tens of GBs to hundreds of TBs of storage
- Use cases: web and mobile apps, online gaming, media streaming, ...
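Because MemoryDB is Redis-compatible, a standard Redis client connects as-is; this sketch assumes a placeholder cluster endpoint and TLS (which MemoryDB endpoints require):

```python
import redis

db = redis.Redis(
    host="clustercfg.demo-memorydb.xxxxxx.memorydb.us-east-1.amazonaws.com",  # placeholder cluster endpoint
    port=6379,
    ssl=True,  # MemoryDB endpoints require TLS
)

db.set("game:score:player1", 4200)   # write is persisted to the Multi-AZ transaction log
print(db.get("game:score:player1"))
```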