Implement a Partitioning Strategy

In this procedure you drop all tables in the ToursDB schema and then recreate them using a new partitioning and replication strategy.

The ToursDB schema in this tutorial is similar to a 'STAR' schema, having only a few fact tables and several dimension tables. Dimension tables are generally small and change infrequently, but are commonly used in join queries. Dimension tables are good candidates for replicating across SQLFire members, because join queries can execute in parallel.

The AIRLINES, CITIES, COUNTRIES, and MAPS tables are treated as dimension tables, and are replicated across the SQLFire cluster. In the tutorial it is assumed that applications frequently join these related tables based on the FLIGHT_ID column, which is chosen as the partitioning column.

FLIGHTS, FLIGHTS_HISTORY, and FLIGHTAVAILABILITY are fact tables, and they will be partitioned. You will co-locate these tables to ensure that all rows that are associated with FLIGHT_ID are maintained in a single partition. This step ensures that frequent join queries based on a selected flight are pruned to a single member and executed efficiently.

Procedure
  1. In a separate terminal window or GUI editor, open the create_colocated_schema.sql file in the vFabric_SQLFire_10x/quickstart directory to examine the included DDL commands. The SQL script begins by dropping the existing tables in the schema:

    DROP TABLE AIRLINES;
    DROP TABLE CITIES;
    DROP TABLE COUNTRIES;
    DROP TABLE FLIGHTAVAILABILITY;
    DROP TABLE FLIGHTS;
    DROP TABLE MAPS;
    DROP TABLE FLIGHTS_HISTORY;
    

    Dimension tables can be replicated using the same basic CREATE statement from the preceding section of the tutorial. However, this script explicitly adds the REPLICATE keyword for clarity. For example:

    CREATE TABLE AIRLINES
       (
          AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY,
          AIRLINE_FULL VARCHAR(24),
          BASIC_RATE DOUBLE PRECISION,
          DISTANCE_DISCOUNT DOUBLE PRECISION,
          BUSINESS_LEVEL_FACTOR DOUBLE PRECISION,
          FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION,
          ECONOMY_SEATS INTEGER,
          BUSINESS_SEATS INTEGER,
          FIRSTCLASS_SEATS INTEGER
       ) REPLICATE;
    

    The FLIGHTS table is partitioned based on the FLIGHT_ID column:

    CREATE TABLE FLIGHTS
       (
          FLIGHT_ID CHAR(6) NOT NULL ,
          SEGMENT_NUMBER INTEGER NOT NULL ,
          ORIG_AIRPORT CHAR(3),
          DEPART_TIME TIME,
          DEST_AIRPORT CHAR(3),
          ARRIVE_TIME TIME,
          MEAL CHAR(1),
          FLYING_TIME DOUBLE PRECISION,
          MILES INTEGER,
          AIRCRAFT VARCHAR(6),
          CONSTRAINT FLIGHTS_PK PRIMARY KEY (
                                FLIGHT_ID,
                                SEGMENT_NUMBER),
          CONSTRAINT MEAL_CONSTRAINT
              CHECK (meal IN ('B', 'L', 'D', 'S'))
    
       )
       PARTITION BY COLUMN (FLIGHT_ID);
    

    The remaining facts tables are also partitioned, and also colocated with the FLIGHTS table. For example:

    CREATE TABLE FLIGHTAVAILABILITY
       (
          FLIGHT_ID CHAR(6) NOT NULL ,
          SEGMENT_NUMBER INTEGER NOT NULL ,
          FLIGHT_DATE DATE NOT NULL ,
          ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
          BUSINESS_SEATS_TAKEN INTEGER DEFAULT 0,
          FIRSTCLASS_SEATS_TAKEN INTEGER DEFAULT 0,
          CONSTRAINT FLIGHTAVAIL_PK PRIMARY KEY (
                                      FLIGHT_ID,
                                      SEGMENT_NUMBER,
                                      FLIGHT_DATE),
          CONSTRAINT FLIGHTS_FK2 Foreign Key (
                FLIGHT_ID,
                SEGMENT_NUMBER)
             REFERENCES FLIGHTS (
                FLIGHT_ID,
                SEGMENT_NUMBER)
    
       )
       PARTITION BY COLUMN (FLIGHT_ID)
       COLOCATE WITH (FLIGHTS);
    
  2. In the sqlf session, execute the create_colocated_schema.sql script to drop the existing tables and recreate them with the new partitioning and replication strategy. Execute loadTables.sql to populate the tables with data:

    run 'create_colocated_schema.sql';
    run 'loadTables.sql';
  3. Confirm that the tables are created:

    show tables in APP;
  4. Verify whether individual tables are replicated or partitioned:
    select tablename, datapolicy from sys.systables where tableschemaname='APP';
  5. The FLIGHTS table and others are now partitioned across the SQLFire cluster. Query the sys.members table again to display the member IDs that host the table:

    select id from sys.systables s, sys.members where s.tablename='FLIGHTS';
  6. Now use the DSID Function to see how many rows of the partitioned FLIGHT table are stored on that SQLFire server. For example:

    select count(*) memberRowCount, dsid() from flights group by dsid();
  7. Execute a join on both partition members in parallel.

    select * from flights f, flightavailability fa 
         where f.flight_id = fa.flight_id and f.flight_id = 'AA1116';
    The combined results are returned. Because the table is partitioned by FLIGHT_ID, the execution of the join is pruned to the partition that stores the value 'AA1116.' You can verify that flight_id 'AA1116' is located on only one data store using the query:
    select count(*), dsid() from flights where flight_id = 'AA1116';