How Custom Partitioning and Data Co-location Work

You can customize how vFabric GemFire groups your partitioned region data with custom partitioning and data co-location.

Custom partitioning and data co-location can be used separately or in conjunction with one another.

Custom Partitioning

Use custom partitioning to group like entries into buckets within a region. By default, GemFire assigns new entries to buckets based on the entry key contents. With custom partitioning, you can assign your entries to buckets in whatever way you want.

You can generally get better performance if you use custom partitioning to group similar data within a region. For example, a query run on all accounts created in January runs faster if all January account data is hosted by a single member. Grouping all data for a single customer can improve the performance of data operations that work on customer data. Data-aware function execution also takes advantage of custom partitioning.
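As a rough illustration of the difference between default and custom placement, the following standalone Java sketch derives a bucket from the whole key in one case and from a routing value (the customer ID) in the other. It does not use the GemFire API. The class name, the OrderKey type, and the bucket count are assumptions made for the example.

```java
import java.util.Objects;

// Standalone sketch, not the GemFire API. All names are hypothetical.
final class BucketRoutingSketch {
    // Illustrative bucket count, chosen only for this example.
    static final int TOTAL_BUCKETS = 113;

    // Hypothetical composite key: one customer can have many orders.
    static final class OrderKey {
        final String customerId;
        final int orderId;

        OrderKey(String customerId, int orderId) {
            this.customerId = customerId;
            this.orderId = orderId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof OrderKey)) return false;
            OrderKey other = (OrderKey) o;
            return customerId.equals(other.customerId) && orderId == other.orderId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(customerId, orderId);
        }
    }

    // Default-style placement: the bucket depends on the whole key,
    // so two orders of the same customer may land in different buckets.
    static int defaultBucket(OrderKey key) {
        return Math.floorMod(key.hashCode(), TOTAL_BUCKETS);
    }

    // Custom placement: the bucket depends only on the routing value
    // (the customer ID), so all of a customer's entries share one bucket.
    static int customBucket(OrderKey key) {
        return Math.floorMod(key.customerId.hashCode(), TOTAL_BUCKETS);
    }

    public static void main(String[] args) {
        OrderKey first = new OrderKey("cust-42", 1);
        OrderKey second = new OrderKey("cust-42", 2);
        System.out.println("same bucket with custom routing: "
                + (customBucket(first) == customBucket(second)));
    }
}
```

The design point is that the routing value, not the full key, decides the bucket. Any attribute shared by the entries you want grouped (customer, account month, region code) could serve as the routing value.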

This figure shows a region with customer data that is grouped into buckets by customer.

With custom partitioning, you have two choices:
  • Standard custom partitioning. With standard partitioning, you group entries into buckets, but you do not specify where the buckets reside. GemFire always keeps the entries in the buckets you have specified, but may move the buckets around for load balancing.
  • Fixed custom partitioning. With fixed partitioning, you group entries into buckets as in standard partitioning, and you also specify the exact member where each data entry resides. You do this by assigning the data entry to a bucket and to a partition, and by naming specific members as primary and secondary hosts of each partition.

    This gives you complete control over the locations of your primary and any secondary buckets for the region. This can be useful when you want to store specific data on specific physical machines or when you need to keep data close to certain hardware elements.

    Fixed partitioning has these requirements and caveats:
    • GemFire cannot rebalance fixed partition region data because it cannot move the buckets around among the host members. You must carefully consider your expected data loads for the partitions you create.
    • With fixed partitioning, the region configuration differs between host members. Each member identifies the named partitions it hosts and whether it hosts the primary copy or a secondary copy of each. You then program a fixed partition resolver to return the partition ID, so that each entry is placed on the right members. Only one member can be primary for a particular partition name, and that member cannot also be the partition's secondary.
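The fixed partitioning rules above can be sketched in standalone Java. This is not the GemFire API. The partition names (Q1 to Q4), the member names, and the month-based routing are assumptions chosen for illustration. The sketch shows a resolver-style step that maps an entry attribute to a partition name, a fixed partition-to-member layout, and a check that no member is both primary and secondary for the same partition.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch, not the GemFire API. All names are hypothetical.
final class FixedPartitionSketch {
    // Assumed layout: each named partition has exactly one primary host
    // (a map key can hold only one value) and one secondary host.
    static final Map<String, String> PRIMARY = new HashMap<>();
    static final Map<String, String> SECONDARY = new HashMap<>();

    static {
        PRIMARY.put("Q1", "member-A"); SECONDARY.put("Q1", "member-B");
        PRIMARY.put("Q2", "member-B"); SECONDARY.put("Q2", "member-C");
        PRIMARY.put("Q3", "member-C"); SECONDARY.put("Q3", "member-D");
        PRIMARY.put("Q4", "member-D"); SECONDARY.put("Q4", "member-A");
    }

    // Resolver-style step: map an entry attribute (month 1..12)
    // to the name of the partition that should hold the entry.
    static String partitionNameFor(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12: " + month);
        }
        return "Q" + ((month - 1) / 3 + 1);
    }

    // Because the layout is fixed, the primary host follows directly
    // from the partition name.
    static String primaryMemberFor(int month) {
        return PRIMARY.get(partitionNameFor(month));
    }

    // A member must not be both primary and secondary for one partition.
    static boolean layoutIsValid() {
        for (Map.Entry<String, String> e : PRIMARY.entrySet()) {
            if (e.getValue().equals(SECONDARY.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("May goes to partition " + partitionNameFor(5)
                + " on " + primaryMemberFor(5));
        System.out.println("layout valid: " + layoutIsValid());
    }
}
```

Because nothing in this layout can move, an uneven spread of entries across the partition names translates directly into an uneven load on the members, which is why the expected data load per partition needs to be considered up front.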

Data Co-location Between Regions

With data co-location, GemFire stores entries that are related across multiple data regions in a single member. GemFire does this by storing all of the regions' buckets with the same ID together in the same member. During rebalancing operations, GemFire moves these bucket groups together or not at all.

So, for example, if you have one region with customer contact information and another region with customer orders, you can use co-location to keep all contact information and all orders for a single customer in a single member. This way, any operation done for a single customer uses the cache of only a single member.

This figure shows two regions with data co-location where the data is partitioned by customer type.

Data co-location requires the same data partitioning mechanism for all of the co-located regions. You can use the default partitioning provided by GemFire or custom partitioning.

You must use the same high availability settings across your co-located regions.
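To make the co-location mechanism concrete, here is a minimal standalone Java sketch. It is not the GemFire API. The region names, the key format "customerId|orderId", the bucket count, and the member list are all assumptions for the example. Both regions derive the bucket ID from the customer ID, and a single bucket-to-member assignment is shared by both regions, so a customer's contact entry and order entries resolve to the same member.

```java
// Standalone sketch, not the GemFire API. All names are hypothetical.
final class ColocationSketch {
    static final int TOTAL_BUCKETS = 8;
    static final String[] MEMBERS = {"member-A", "member-B", "member-C"};

    // "customers" region: the key is the customer ID itself.
    static int bucketForCustomer(String customerId) {
        return Math.floorMod(customerId.hashCode(), TOTAL_BUCKETS);
    }

    // "orders" region: keys look like "cust-42|order-7" (assumed format).
    // Routing uses only the customer ID part, so the region follows the
    // same partitioning mechanism as the "customers" region.
    static int bucketForOrder(String orderKey) {
        int separator = orderKey.indexOf('|');
        if (separator < 0) {
            throw new IllegalArgumentException("expected customerId|orderId: " + orderKey);
        }
        return bucketForCustomer(orderKey.substring(0, separator));
    }

    // One assignment shared by both regions: buckets with the same ID
    // are hosted together on the same member.
    static String memberForBucket(int bucketId) {
        return MEMBERS[bucketId % MEMBERS.length];
    }

    public static void main(String[] args) {
        String contactHost = memberForBucket(bucketForCustomer("cust-42"));
        String orderHost = memberForBucket(bucketForOrder("cust-42|order-7"));
        System.out.println("co-located: " + contactHost.equals(orderHost));
    }
}
```

If the two regions used different routing rules, the same customer could map to different bucket IDs in each region, and the shared bucket-to-member assignment would no longer keep the related entries together.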