Learning AWS

Designing for multi-tenancy

The major benefit of multi-tenancy is cost saving due to infrastructure sharing and the operational efficiency of managing a single instance of the application across multiple customers or tenants. However, multi-tenancy introduces complexity. Issues can arise when a tenant's action or usage affects the performance and availability of the application for other tenants on the shared infrastructure. In addition, security, customization, upgrades, recovery, and many more requirements of one tenant can create issues for other tenants as well.

Multi-tenancy models can lie anywhere on the continuum from share-nothing to share-everything. While technical ease may be a key factor from the IT department's perspective, the cloud architect should never lose sight of the business implications and costs of the chosen approach to multi-tenancy.

Whatever the multi-tenancy model, the data architecture needs to ensure robust security, extensibility, and scalability in the data tier. For example, storing a particular customer's data in a separate database leads to the simplest design and development approach. Having data isolation is the easiest and the quickest to both understand and explain to your customers.

Note

It is very tempting to offer tenant-specific customizations when each tenant's data is stored in separate databases. However, this is primarily done to separate data and associated operations, and not to arbitrarily allow dramatic changes to the database schema per tenant.

In this model, suitable metadata is maintained to link each database with the correct tenant. In addition, appropriate database security measures are implemented to prevent tenants from accessing other tenants' data. From an operations perspective, backups and restores are simpler for separate databases, as they can be executed without impacting other customers. However, this approach can and will lead to higher infrastructure costs.
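The tenant-to-database metadata described above can be sketched as follows. This is only an illustration under stated assumptions: the `TenantCatalog` class, the `orders` table, and the tenant names are hypothetical, and in-memory SQLite databases stand in for separate database instances.

```python
import sqlite3


class TenantCatalog:
    """Hypothetical metadata store linking each tenant ID to its own database."""

    def __init__(self):
        self._connections = {}

    def provision(self, tenant_id):
        # Each tenant gets a private in-memory SQLite database; a real system
        # would more likely record a connection string or endpoint here.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
        self._connections[tenant_id] = conn

    def connection_for(self, tenant_id):
        # Unknown tenants are rejected rather than routed to a default database.
        if tenant_id not in self._connections:
            raise KeyError(f"unknown tenant: {tenant_id}")
        return self._connections[tenant_id]


catalog = TenantCatalog()
catalog.provision("acme")
catalog.provision("globex")

# Data written for one tenant never reaches the other tenant's database.
catalog.connection_for("acme").execute("INSERT INTO orders (item) VALUES ('widget')")
acme_rows = catalog.connection_for("acme").execute("SELECT item FROM orders").fetchall()
globex_rows = catalog.connection_for("globex").execute("SELECT item FROM orders").fetchall()
```

Because every request resolves its connection through the catalog, a backup or restore for one tenant only needs to touch that tenant's database.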

Typically, you would offer this approach to your bigger customers who might be more willing to pay a premium to isolate their data. Larger enterprise customers prefer database isolation for higher security, or in some cases, to comply with their security policies. Such customers might also have a higher need for customizations.

Tip

While architecting multi-tenant applications, pay particular attention to the expected number of tenants, storage per tenant, expected number of concurrent users, regulatory and policy requirements, and similar factors. If any of these parameters is heavily skewed toward a particular tenant, then it might be advisable to isolate that tenant's data.

For applications that have a limited number of database tables, we can define a separate database schema for each tenant within the same database server instance. This approach is relatively simple to implement, and offers the flexibility to define custom tables per tenant. It reduces costs while still separating each tenant's data. However, restoring data for a particular tenant can impact other tenants hosted on the same database instance.

In the shared database, shared schema approach, costs are minimized, but the complexity of the application is much higher. This model works well for cost-conscious customers. However, restoring a customer's data is complicated, because you have to restore only the specific rows that belong to that tenant. This operation can impact all other tenants using the shared database.

In cloud architectures, the main factors to consider while designing for multi-tenancy are security, extensibility, and scalability. In addition, multi-tenancy brings additional complexity from a DevOps perspective: we need to ensure that we can effectively manage upgrades, troubleshoot bugs, and maintain high service levels and operational efficiency.

Data security

There are two levels of security to be considered: the tenant level (typically, an organization) and the end-user level (a member or an employee of a given tenant). In order to implement a security model, you need to create a database access account at the tenant level. This account can specify (using ACLs) the database objects accessible to a specific tenant. Then at the application level, you can prevent users from accessing any data they are not entitled to. A security token service can be used to implement the access at the tenant level.

When multi-tenancy is realized by having separate databases or separate schemas per tenant, you can restrict access at the database or the schema level for a particular tenant. The following diagram depicts a very common scenario, where both these models are present in a single database server instance:

If database tables are shared across tenants, then you need to filter data access by tenant. This is accomplished by having a column that stores a tenant ID per record (to clearly identify the records that belong to a specific tenant). In such a schema, a typical SQL statement will contain a WHERE clause that requires the tenant ID to equal the security ID of the user account, namely an account belonging to the tenant.
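A minimal sketch of this tenant ID filter is shown here, using Python's built-in `sqlite3` module. The `invoices` table, its columns, and the `invoices_for` helper are assumptions made for illustration and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
    [("acme", 100.0), ("acme", 250.0), ("globex", 75.0)],
)


def invoices_for(tenant_id):
    # The tenant ID is assumed to come from the authenticated account.
    # It is bound as a parameter, so it cannot change the shape of the query.
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [amount for (amount,) in rows]
```

Routing every query through a helper of this kind keeps the filter in one place, which makes it harder to forget the WHERE clause in an individual statement.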

Aside from database level security, organizational policies or regulatory requirements can mandate securing your data at rest. There are several options available from the cloud service provider and third-party vendors for implementing encryption to protect your data. These range from manual ones implemented on the client-side to fully automated solutions. This topic will be discussed in detail in Chapter 6, Designing for and Implementing Security.

Regardless of the approach, it is a good practice to encrypt sensitive data fields in your cloud database and storage. Encryption ensures that the data remains secure, even if an unauthorized user accesses it. This is even more critical for the shared database or shared schema model. However, encrypting a database column that is part of an index can, in many cases, lead to full table scans. Hence, try not to encrypt everything in your database, as it can lead to poor performance. Instead, carefully identify the sensitive information fields in your database and encrypt them selectively. This will result in the right balance between security and performance.

Tip

It is a good idea to store a tenant ID for all records in the database and encrypt sensitive data regardless of which approach you take for implementing data multi-tenancy. A customer willing to pay a premium for having a separate database might want to shift to a more economical shared model later. Having a tenant ID and encryption already in place can simplify such a migration.

Data extensibility

Having a rigid database schema will not work for you across all your customers. Customers have their specific business rules and supporting data requirements. They will want to introduce their own customizations to the database schema. You must ensure that you don't change your schema for a tenant so much that your product no longer fits into the SaaS model. But you do want to bake in sufficient flexibility and extensibility to handle custom data requirements of your customers (without impacting subsequent product upgrades or patch releases).

One approach to achieve extensibility in the database schema is to preallocate a bunch of extra fields in your tables, which can then be used by your customers to implement their own business requirements. All these fields can be defined as string or varchar fields. You also create an additional metadata table to further define a field label, data type, field length, and so on for each of these fields on a per tenant basis. You can choose to create a metadata table per field or have a single metadata table for all the extra fields in the table. Alternatively, you can introduce an additional column for the table name, to have a common table describing all custom fields (for each tenant) across all the tables in the schema.

This approach is depicted in the following figure. Fields 1 to 4 are defined as extra columns in the customer table. Further, the metadata table defines the field labels and data types:
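The preallocated-field layout can also be sketched in code. The following is an illustration only: the table names (`customer`, `custom_field_meta`), the column names, the data type labels, and the sample values are assumptions, and it uses the single common metadata table keyed by table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows column access by name
conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    name TEXT,
    field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT  -- preallocated extras
);
CREATE TABLE custom_field_meta (
    tenant_id TEXT NOT NULL,
    table_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    label TEXT NOT NULL,
    data_type TEXT NOT NULL,
    PRIMARY KEY (tenant_id, table_name, column_name)
);
INSERT INTO custom_field_meta VALUES ('acme', 'customer', 'field1', 'Loyalty tier', 'string');
INSERT INTO custom_field_meta VALUES ('acme', 'customer', 'field2', 'Credit limit', 'int');
""")
jane_id = conn.execute(
    "INSERT INTO customer (tenant_id, name, field1, field2) VALUES (?, ?, ?, ?)",
    ("acme", "Jane", "gold", "5000"),
).lastrowid

# Converters for the hypothetical data type labels stored in the metadata.
CASTS = {"string": str, "int": int, "float": float}


def custom_fields(tenant_id, customer_id):
    """Return the tenant's custom fields as {label: typed value}."""
    row = conn.execute(
        "SELECT * FROM customer WHERE tenant_id = ? AND id = ?",
        (tenant_id, customer_id),
    ).fetchone()
    if row is None:
        return {}
    meta = conn.execute(
        "SELECT column_name, label, data_type FROM custom_field_meta "
        "WHERE tenant_id = ? AND table_name = 'customer' ORDER BY column_name",
        (tenant_id,),
    ).fetchall()
    result = {}
    for m in meta:
        raw = row[m["column_name"]]
        if raw is not None:
            # Values are stored as text and cast using the metadata.
            result[m["label"]] = CASTS[m["data_type"]](raw)
    return result
```

Here `field3` and `field4` stay unused for this tenant, which is the storage waste the name-value approach avoids.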

A second approach uses name-value pairs: a main data table points to an intermediate table that contains the value of the field and a pointer to a metadata table, which holds the field label, data type, and similar information. Unlike the first approach, this cuts out potential waste and does not limit the number of fields available for customization, but it is more complicated to implement.
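One possible shape for the name-value pair approach is sketched here. The three tables (`customer` as the main table, `field_value` as the intermediate table, `field_def` as the metadata table) and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT
);
CREATE TABLE field_def (          -- metadata: one row per custom field
    field_id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    label TEXT NOT NULL,
    data_type TEXT NOT NULL
);
CREATE TABLE field_value (        -- intermediate table: one row per value
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    field_id INTEGER NOT NULL REFERENCES field_def(field_id),
    value TEXT,
    PRIMARY KEY (customer_id, field_id)
);
INSERT INTO customer VALUES (1, 'acme', 'Jane');
INSERT INTO field_def VALUES (10, 'acme', 'Region', 'string');
INSERT INTO field_def VALUES (11, 'acme', 'Seats', 'int');
INSERT INTO field_value VALUES (1, 10, 'EMEA');
INSERT INTO field_value VALUES (1, 11, '25');
""")

# Converters for the hypothetical data type labels stored in the metadata.
CASTS = {"string": str, "int": int, "float": float}


def custom_values(tenant_id, customer_id):
    """Join value rows to their definitions and cast each value."""
    rows = conn.execute(
        """
        SELECT d.label, d.data_type, v.value
        FROM field_value AS v
        JOIN field_def AS d ON d.field_id = v.field_id
        JOIN customer AS c ON c.id = v.customer_id
        WHERE c.tenant_id = ? AND d.tenant_id = ? AND c.id = ?
        ORDER BY d.field_id
        """,
        (tenant_id, tenant_id, customer_id),
    )
    return {label: CASTS[dtype](value) for label, dtype, value in rows}
```

Adding a new custom field for a tenant is then a single insert into the metadata table, at the cost of an extra join on every read.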

A variation on the preceding two approaches is to define an extra field per table, and store all custom name-value pairs per tenant in an XML or JSON format, as shown in the following figure:
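In code, the JSON form of this variation might look like the following sketch. The `custom_data` column and the `save_customer` and `load_custom` helpers are assumed names chosen for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT, "
    "custom_data TEXT)"  # single extra field holding all custom pairs
)


def save_customer(tenant_id, name, custom):
    # All tenant-specific name-value pairs are serialized into one column.
    cur = conn.execute(
        "INSERT INTO customer (tenant_id, name, custom_data) VALUES (?, ?, ?)",
        (tenant_id, name, json.dumps(custom)),
    )
    return cur.lastrowid


def load_custom(tenant_id, customer_id):
    row = conn.execute(
        "SELECT custom_data FROM customer WHERE tenant_id = ? AND id = ?",
        (tenant_id, customer_id),
    ).fetchone()
    return json.loads(row[0]) if row and row[0] else {}


jane_id = save_customer(
    "acme", "Jane", {"loyalty_tier": "gold", "credit_limit": 5000}
)
```

JSON preserves basic types such as numbers and strings, so no separate cast step is needed, but the custom values are opaque to ordinary SQL filters and indexes unless the database offers JSON functions.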

A third approach is to add columns per tenant as required. This approach is more suitable in the separate database or separate schema per tenant models. However, it should generally be avoided, as it leads to complexity in the application code, which has to handle an arbitrary number of columns in a table per tenant. Further, it can lead to operational headaches during upgrades.

Note

You will need to design your database schema carefully for providing custom extensions to your tenants, as this can have a ripple effect on the application code and the user interface.

In this section, we have primarily covered multi-tenant approaches for relational databases. Depending on your particular application requirements, for instance, type and volume of data, and types of database operations, a NoSQL database can be a good data storage solution. NoSQL databases use nontabular structures, such as key-value pairs, graphs, documents, and so on to store data. The design techniques in such cases would depend on your choice of NoSQL database.

Application multi-tenancy

In addition to introducing a tenant ID column in the database, if the application has web service interfaces, then these services should also include the tenant ID parameter in their request and/or response schemas. To ensure a smooth transition between shared and isolated application instances, it is important to maintain tenant IDs in the application tier. In addition, tenant-aware business rules can be encoded in a business rules engine, and tenant-specific workflows can be modeled in a multi-tenanted workflow engine, using Business Process Execution Language (BPEL) process templates.

In cases where you end up creating a tenant-specific web service, you will need to design it in a manner that least impacts your other tenants. A mediation proxy service that contains routing rules can help in this case. This service can route the requests from a particular tenant's users (specified by the tenant ID in the request) to the appropriate web service implemented for that tenant.
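The routing idea behind such a mediation proxy can be sketched as follows. The quote services, the request format, and the `ROUTING_RULES` table are all hypothetical, and plain functions stand in for real web service endpoints.

```python
def default_quote_service(request):
    # Shared implementation used by tenants without a custom service.
    return {"handled_by": "shared", "price": request["quantity"] * 10}


def acme_quote_service(request):
    # Hypothetical tenant-specific implementation with different pricing.
    return {"handled_by": "acme", "price": request["quantity"] * 9}


# Routing rules: tenant ID -> tenant-specific service.
ROUTING_RULES = {"acme": acme_quote_service}


def mediate(request):
    """Route a request to the service registered for its tenant ID."""
    tenant_id = request.get("tenant_id")
    if not tenant_id:
        raise ValueError("request is missing tenant_id")
    handler = ROUTING_RULES.get(tenant_id, default_quote_service)
    return handler(request)
```

Tenants without an entry in the routing rules keep using the shared service, so adding a custom service for one tenant does not change the path taken by the others.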

Similarly, the frontend or UI can be configured for each tenant to provide a customized look and feel, for example, per-tenant CSS files, tenant-specific logos, and color schemes. Where tenant UIs differ, portal servers can be used to serve up the appropriate portlets.

If different service levels need to be supported across tenants, then an instance of the application can be deployed on separate infrastructure for your high-end customers. The isolation provided at the application layer (and the underlying infrastructure) helps avoid tenants impacting each other, by consuming more CPU or memory resources than originally planned.

Logging also needs to be tenant-aware (that is, use the tenant ID in your record format). You can also provision other resources such as queues, file directories, directory servers, caches, and so on for each of your tenants. These can be set up in dedicated or separate application stacks (per tenant). In all cases, make use of the tenant ID filter for maximum flexibility.
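Tenant-aware logging can be sketched with Python's standard `logging` module, as shown here. The logger name, the record format, and the `logger_for` helper are assumptions, and an in-memory stream stands in for a real log destination.

```python
import io
import logging

stream = io.StringIO()  # stands in for a log file or log service
handler = logging.StreamHandler(stream)
# The tenant ID is part of the record format, so every line carries it.
handler.setFormatter(
    logging.Formatter("%(levelname)s tenant=%(tenant_id)s %(message)s")
)

base_logger = logging.getLogger("saas_app")
base_logger.setLevel(logging.INFO)
base_logger.addHandler(handler)
base_logger.propagate = False  # keep the example's output in one place


def logger_for(tenant_id):
    # LoggerAdapter attaches tenant_id to every record it emits.
    return logging.LoggerAdapter(base_logger, {"tenant_id": tenant_id})


logger_for("acme").info("invoice created")
logger_for("globex").warning("quota nearly exhausted")
```

With the tenant ID present in every record, log lines can later be filtered or split per tenant regardless of whether the tenants share an application stack.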

Other application multi-tenancy-related issues include tenant-specific notifications, new tenant provisioning and decommissioning, and so on.