Example: Adapting a Database Schema for SQLFire

If you have an existing database design that you want to deploy to SQLFire, translate the entity-relationship model into a physical design that follows SQLFire design principles.

The goal is to identify tables to partition or replicate in the SQLFire cluster, and to determine the partitioning key(s) for partitioned tables. Arriving at an optimal design is generally an iterative process.

Procedure
  1. Read Guidelines for Adapting a Database to SQLFire and the preceding sections under Designing vFabric SQLFire Databases.
  2. Evaluate data access patterns to define entity groups that are candidates for partitioning. See Step 1: Determine the entity groups.

  3. Identify all tables in the entity groups. See Step 2: Identify the tables in each entity group.

  4. Identify the "partitioning key" for each partitioned table. The partitioning key is the column or set of columns that are common across a set of related tables. See Step 3: Define the partitioning key for each group.

  5. Identify the tables that are candidates for replication. You can replicate table data for high availability, or to co-locate table data that is necessary to execute joins. See Step 4: Identify replicated tables.

Guidelines for Adapting a Database to SQLFire

Follow these guidelines for designing a SQLFire database or adapting an existing database to SQLFire:
  • Focus your efforts on commonly joined entities. Remember that all join queries must be performed on data that is co-located; in this release, SQLFire supports joins only where the data is co-located. Co-located data is also important for transactional updates, because the transaction can execute without requiring distributed locks in a multi-phase commit protocol.
  • After you locate commonly joined entities, look for parent-child relationships in the joined tables. The primary key of a root entity is generally also the best choice for the partitioning key.
  • Understand the trade-offs associated with different partitioning and co-location strategies. The steps that follow describe how to evaluate a customer order management system.

This example shows tables from the Microsoft Northwind Traders sample database.



Step 1: Determine the entity groups

Entity groups are generally coarse-grained entities that have children, grandchildren, and so forth, and that are commonly used in queries. This example chooses these entity groups:
Entity group   Description
Customer       This group uses the customer identity along with orders and order details as the children.
Product        This group uses product details along with the associated supplier information.

Step 2: Identify the tables in each entity group

Identify the tables that belong to each entity group. In this example, entity groups use the following tables.
Entity group   Tables
Customer       Customers, Orders, Order Details, Shippers
Product        Products, Suppliers, Categories

Step 3: Define the partitioning key for each group

In this example, the partitioning keys are:

Entity group   Partitioning key
Customer       CustomerID
Product        ProductID

This example uses customerID as the partitioning key for the Customer group. The customer row and all associated orders will be colocated in a single partition. To explicitly colocate Orders with its parent customer row, use the colocate with clause in the create table statement:

create table orders (<column definitions, constraints>)
partition by (customerID)
colocate with (customers);
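
For this statement to succeed, the parent Customers table must already be partitioned on the same key, and the OrderDetails child is colocated in the same way. The following is a minimal sketch with hypothetical column definitions (the real Northwind columns differ; note that OrderDetails must carry the customerID column, an assumed denormalization, in order to be partitioned and colocated on it):

create table customers
(
  customerID varchar(5) not null primary key,
  companyName varchar(40)
  -- remaining columns omitted
)
partition by (customerID);

create table order_details
(
  orderID int not null,
  productID int not null,
  -- customerID is carried on each line item so the rows can be partitioned
  -- and colocated on the same key as their parent order (assumed denormalization)
  customerID varchar(5) not null,
  quantity int
)
partition by (customerID)
colocate with (orders);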

Create the OrderDetails table in a similar fashion, as in the sketch above. In this way, SQLFire supports any query that joins Customers, Orders, and OrderDetails. The following join query is distributed to all partitions and executed in parallel, with the results streamed back to the client:

select * from customers c, orders o where c.customerID = o.customerID;

By contrast, a query such as the following would be pruned to the single partition that stores "customer100" and executed only on that SQLFire member:

select * from customers c, orders o where c.customerID = o.customerID
and c.customerID = 'customer100';

The optimization provided when queries are highly selective comes from engaging the query processor and indexing on a single member rather than on all partitions. With all customer data managed in memory, query response times are very fast. Consider how the above query would execute if the primary key were not used to partition the table: the query would be routed to every partition member and an index lookup performed on each, even though only a single member might hold any data associated with the query.
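
For example, a local index on the join column lets each member resolve its part of the join with an index lookup rather than a table scan. A minimal sketch (the index name is illustrative):

-- local index on the join/partitioning column of the Orders table
create index orders_customerID_idx on orders (customerID);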

Finally, consider a case where an application needs to access customer order data for several customers:

select * from customers c, orders o
where c.customerID = o.customerID and c.customerID IN ('cust1', 'cust2', 'cust3');

Here, SQLFire prunes the query execution to only those partitions that host 'cust1', 'cust2', and 'cust3'. The union of the results is then returned to the caller.

Step 4: Identify replicated tables

If we assume that the number of categories and suppliers rarely changes, those tables can be replicated in the SQLFire cluster (that is, replicated to all of the SQLFire members that host the entity group). If we assume that the Products table changes often and can be relatively large, then partitioning is a better strategy for that table.

So for the product entity group, table Products is partitioned by ProductID, and the Suppliers and Categories tables are replicated to all of the members where Products is partitioned.
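
A minimal sketch of this layout, assuming hypothetical column definitions (the real Northwind columns differ); the replicate clause creates a copy of the table on every member that hosts data:

-- partitioned on its primary key
create table products
(
  productID int not null primary key,
  supplierID int,
  categoryID int,
  productName varchar(40)
)
partition by (productID);

-- small, rarely changing tables are replicated
create table suppliers
(
  supplierID int not null primary key,
  companyName varchar(40)
)
replicate;

create table categories
(
  categoryID int not null primary key,
  categoryName varchar(15)
)
replicate;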

Applications can now join Products, Suppliers, and Categories. For example:

select * from Products p, Suppliers s, Categories c
where c.categoryID = p.categoryID and p.supplierID = s.supplierID
and p.productID IN ('someProductKey1', 'someProductKey2', 'someProductKey3');

In the above query, SQLFire prunes the query execution to only those partitions that host 'someProductKey1', 'someProductKey2', and 'someProductKey3'.