Stack Overflow receives more than 670 million page views per month and achieves sub-second performance on all of its pages. This is truly commendable. Even more admirable is that they achieve this with just a handful of servers. The following is the technical stack they use:
- ASP.NET MVC
- SQL Server
- Elasticsearch
Below is their hardware configuration:
- 4 SQL servers (2 clusters; SO is on one cluster, everything else on the other) – 384 GB RAM, 2.4 TB SSD storage
- 9 Web Servers – 64 GB RAM
- 2 Redis servers (master/slave) – 128 GB
- 3 Tag Engine servers – 64 GB
- 3 Elasticsearch servers – 196 GB, load balanced
You can see that they follow a scale-up strategy rather than a scale-out one, and it is clearly working well for them. They attribute this entirely to the efficiency of their code. I like their performance mottos: “Performance is a feature” and “Make performance a matter of (public) pride”. They also monitor the performance of every request and track it continuously and in detail.
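To make the idea of timing every request concrete, here is a minimal Python sketch. It is not Stack Overflow's tooling (their stack is .NET); the names `timed`, `timings`, `percentile`, and `report` are hypothetical, and a real system would ship these numbers to a monitoring store rather than keep them in memory.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: route name -> list of durations in milliseconds.
timings = defaultdict(list)


@contextmanager
def timed(route):
    """Record the wall-clock duration of one request under the given route."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the handler raises, so slow failures are visible too.
        timings[route].append((time.perf_counter() - start) * 1000.0)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def report():
    """Return {route: (count, median_ms, p95_ms)} for every route seen so far."""
    return {
        route: (len(s), percentile(s, 50), percentile(s, 95))
        for route, s in timings.items()
    }
```

A request handler would wrap its body in `with timed("/questions"):` and a dashboard would poll `report()`. Tracking a high percentile alongside the median matters because a healthy average can hide a slow tail.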
Here is what I take away from their approach:
- Scale up and then scale out: You need to find the right point at which to stop scaling up and start scaling out. From a manageability perspective, fewer servers are better than more, but there is always a point beyond which performance will not improve even if you scale up. Finding that point is crucial.
- Use the right tool for the right job: Use the DB for storage and straightforward reads and writes only, and use a search engine where necessary. Putting all the business logic in the DB is a common mistake I have seen in software architectures, and it is a clear path to scalability and performance nightmares. Similarly, search services are optimized for high-volume search use cases; adopting one will not only put less load on the DB but also optimize your search workload.
- Cache wisely to keep DB hits low: The right caching is mandatory for performance. In my opinion, a combination of local and distributed caching yields the best performance. The most frequently used and least volatile data can reside in a local cache on the application server itself. This can be kept in sync through the distributed cache whenever a change happens. You need to be careful about memory growth here because, if you keep everything local, all your app servers will need that much memory. Striking a balance is important.
- SSDs work great for write-heavy loads on DBs (SO’s write latency is almost 0 seconds): DB storage needs to be optimized for both writes and reads. SSD is a proven choice that many adopt as the default storage for DBs.
- Keep the HTML and CSS footprint low and use a CDN where required: This is the easiest thing to do, but many forget it. A small footprint consumes less bandwidth and transfers faster over the internet. Minification and compression can be used to reduce the footprint. If you have a global presence, adopting a CDN is highly recommended.
- Profile right from development and monitor in production: Measure, tune, measure, and tune. This cycle is the mantra for achieving performance. If you are the project manager of a product requiring high performance and you don’t have enough time in the plan for performance benchmarking and tuning, you are going to be in trouble. Do not expect your best performance in the first cut. You need to tune to get there.
- Make performance a culture for developers: This is the toughest part to achieve. Developers should not consider a task done until they have tested it and confirmed that it meets the required performance. This comes through proper process and practice.
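The local-plus-distributed caching idea from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stack Overflow's implementation: `DistributedCache` is a hypothetical in-process stand-in for a shared cache such as Redis with change notifications, `TwoTierCache` and `load_from_db` are invented names, and the bounded local tier addresses the memory-growth concern by evicting the least recently used entries.

```python
from collections import OrderedDict


class DistributedCache:
    """Hypothetical stand-in for a shared cache; a dict plus change notifications."""

    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        for notify in self._subscribers:
            notify(key)  # tell every app server its local copy may be stale


class TwoTierCache:
    """Local LRU cache in front of a shared cache, in front of the DB."""

    def __init__(self, shared, load_from_db, max_local_items=1000):
        self._local = OrderedDict()
        self._shared = shared
        self._load_from_db = load_from_db
        self._max = max_local_items
        shared.subscribe(self._invalidate)

    def _invalidate(self, key):
        self._local.pop(key, None)

    def get(self, key):
        if key in self._local:               # tier 1: this server's memory
            self._local.move_to_end(key)
            return self._local[key]
        value = self._shared.get(key)        # tier 2: shared cache
        if value is None:
            value = self._load_from_db(key)  # last resort: hit the DB
            self._shared.set(key, value)
        self._local[key] = value
        while len(self._local) > self._max:  # cap local memory growth
            self._local.popitem(last=False)
        return value

    def update(self, key, value):
        # A real system would also write to the DB; omitted in this sketch.
        self._shared.set(key, value)
```

Each app server would own one `TwoTierCache` sharing the same distributed cache. An update on any server drops the stale local copies everywhere, and the next read repopulates them from the shared tier without touching the DB.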
For those who think that performance and scale are just a matter of hardware, Stack Overflow is certainly a lesson. Performance and scale are a team effort and, if done well, can be achieved without spending a lot of money.