Scalability at the database layer has always been a challenging task for architects. Unfortunately, it is also one of the areas that is typically underestimated, leading to major issues later. My experience with databases is that everything may look fine until you cross a certain threshold of data volume, which is when you start seeing things fall apart.

Therefore, it's very important, from a database architecture point of view, to design a scalable solution that addresses not just the current need (or load) but also future needs.

The primary bottlenecks for databases are input/output (I/O) operations, CPU, and memory. Therefore, it's essential to isolate these bottlenecks by distributing the data and the resources required to perform these operations.

There are broadly two approaches to achieving this:

Horizontal Scaling – In this approach, the data in a table is split horizontally (that is, by rows). For example, suppose a table has 1,000 records, where Dept A has 200, Dept B has 300, and Dept C has 500. We can logically separate the data by, say, Dept, in which case we will have three sets of data, each containing its respective department's records. It is then much easier to operate on one department's records than on the whole table. Note that all the sets of data have the same set of columns; they are just grouped/separated by rows. This technique is referred to as 'database sharding'.
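The row-splitting idea above can be sketched in a few lines of Python. This is a minimal, application-level illustration of sharding, not tied to any particular database product; the shard names and routing table are made up for the example.

```python
# In-memory stand-ins for three physical shards. Every shard stores rows
# with the same columns; only the subset of rows differs, which is what
# makes this horizontal (row-wise) partitioning.
SHARDS = {"dept_a": [], "dept_b": [], "dept_c": []}

def shard_for(dept: str) -> str:
    """Map a department key to the shard that holds its rows."""
    routing = {"A": "dept_a", "B": "dept_b", "C": "dept_c"}
    return routing[dept]

def insert(record: dict) -> None:
    """Route a new row to its shard based on the Dept column."""
    SHARDS[shard_for(record["dept"])].append(record)

def query_by_dept(dept: str) -> list:
    """A query for one department touches only one shard,
    instead of scanning the whole table."""
    return SHARDS[shard_for(dept)]
```

The real overhead lies in the routing logic (`shard_for` here): every read and write must first decide which shard to touch, which is exactly the bookkeeping the vendor features discussed below take off the architect's plate.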

While architects can design the above themselves, it is a lot of overhead to build your own technique for separating the data, identifying which logical group to operate on for each request, and so on.

Oracle came up with RAC (Real Application Clusters), which completely takes care of horizontal scaling and has provisions for implementing sharding techniques. Recently, Microsoft released the SQL Azure Federation feature, which supports live sharding of data.

The general-purpose database workloads that are a good fit for sharding include:

  • High-transaction database applications
  • Mixed workload database usage
    • Frequent reads, including complex queries and joins
    • Write-intensive transactions (INSERT, UPDATE, and DELETE statements)
    • Contention for common tables and/or rows
  • General Business Reporting
    • Typical “repeating segment” report generation
    • Some data analysis (mixed with other workloads)

Vertical Scaling – In this approach, the data grouping happens in a vertical fashion (by columns). For example, take an insurance table, which may contain hundreds of columns. Only certain columns may be accessed frequently, and therefore more processing is required for them. In vertical scaling you logically group the data by columns – less-used columns vs. frequently used columns. This way you can associate different processing capability with each group (obviously more processing for the frequently used columns). You can view this more or less as a form of denormalization. As far as I know, there isn't any direct support from database vendors for this feature, so the architect will have to put together their own solution to make it work.
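A minimal sketch of the column-grouping idea, again in Python rather than any vendor feature: a wide row is split into a "hot" slice of frequently accessed columns and a "cold" slice of the rest, with the primary key duplicated into both so the full row can be reassembled. The column names (`policy_id`, `status`, etc.) are made up for the insurance example.

```python
# Columns assumed (for illustration) to be queried frequently.
HOT_COLUMNS = {"policy_id", "status", "premium"}

def split_row(row: dict) -> tuple:
    """Split one wide row into (hot, cold) column groups.
    Both slices keep the primary key so they can be re-joined."""
    hot = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    cold = {k: v for k, v in row.items() if k not in HOT_COLUMNS}
    cold["policy_id"] = row["policy_id"]  # shared key for re-joining
    return hot, cold

def rejoin(hot: dict, cold: dict) -> dict:
    """Reassemble the full row for the rare query that needs every column."""
    return {**cold, **hot}
```

In a real database this split would be two physical tables (or tablespaces on different storage), with most queries hitting only the hot table and a key join pulling in the cold columns on demand.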

We will get into more detail on the specific horizontal-scaling features from Oracle and Microsoft SQL in coming blogs.