Transactions by Region Type

A transaction is managed on a per-cache basis, so multiple regions in the cache can participate in a single transaction. The data scope of a vFabric GemFire cache transaction is the cache that hosts the transactional data. For partitioned regions, this may be a remote host to the host running the transaction application. Any transaction that includes one or more partitioned regions is run on the member storing the primary copy of the partitioned region data. Otherwise, the transaction host is the same one running the application.
  • The member running the transaction code is called the transaction initiator.
  • The member that hosts the data—and the transaction—is called the transactional data host.

So the transactional data host may be local or remote to the transaction initiator. In either case, when the transaction commits, data distribution is done from the transactional data host in the same way.

Transactions and Partitioned Regions

In partitioned regions, transaction operations are done first on the primary data store then distributed to other members from there, regardless of which member initializes the cache operation. This is the same as is done for normal cache operations on partitioned regions.

In this figure, M1 runs two transactions.
  • The first, T1, works on data whose primary buckets are stored in M1, so M1 is both initiator and data host for the transaction.
  • The second transaction, T2, works on data whose primary buckets are stored in M2, so M1 is the transaction initiator and M2 is the transactional data host.
Transaction on a Partitioned Region:

The transaction is managed on the data host. This includes the transactional view, all operations, and all local cache event handling. In the figure, when T2 is committed, the cache on M2 is updated and the transaction events distributed throughout the system, exactly as if the transaction had originated on M2.

The first region operation in the transaction determines the transactional data host. All other operations must also work with that as their transactional data host:
  • All partitioned region data managed inside the transaction must use the transactional data host as their primary data store. In the figure above, if transaction T2 tried to put entry W or transaction T1 tried to put entry Z, they would get a TransactionDataNotColocatedException. For information on partitioning your data so it is grouped properly for your transactions, see How Custom Partitioning and Data Co-location Work. In addition, the data must not be moved during the transaction. Plan any partitioned region rebalancing to avoid rebalancing while transactions are running. See Rebalancing Partitioned Region Data.
  • All non-partitioned region data managed inside the transaction must be available on the transactional data host and must be distributed. Operations on regions with local scope are not allowed in transactions with partitioned regions.

The next figure shows a transaction that uses two partitioned regions and one replicated region. As with the single region example, all local event handling is done on the transactional data host.

For a transaction in these data keys to work, the first operation must be on one of the partitioned regions, to establish M2 as the transactional data host. Running the first operation on a key in the replicated region would establish M1 as the transactional data host, and subsequent operations on the partitioned region data would fail with a TransactionDataNotColocated exception.

Transaction on a Partitioned Region with Other Regions:

Transactions and Replicated Regions

For replicated regions, the transaction and its operations are applied to the local member and the resulting transaction state is distributed to other members according to the attributes of each region.

Note: If possible, use distributed-ack scope for your regions where you will run transactions. The REPLICATE region shortcuts use distributed-ack scope.
The region’s scope affects how data is distributed during the commit phase. Transactions are supported for these region scopes:
  • distributed-ack. Handles transactional conflicts both locally and between members. The distributed-ack scope is designed to protect data consistency. This scope provides the highest level of coordination among transactions in different members. When the commit call returns for a transaction run on all distributed-ack regions, you can be sure that the transaction’s changes have already been sent and processed. In addition, any callbacks in the remote member have been invoked.
  • distributed-no-ack. Handles transactional conflicts locally, less coordination between members. This provides the fastest transactions with distributed regions, but doesn't work for all situations. This scope is appropriate for:
    • Applications with only one writer
    • Applications with multiple writers that write to different data sets
  • local. No distribution, handles transactional conflicts locally. Transactions on regions with local scope have no distribution, but they perform conflict checks in the local member. You can have conflict between two threads when their transactions change the same entry, like object Y in this figure.

Conflicting Transactions in Distributed-Ack Regions

In this series of figures, even after the commit operation is launched, the transaction continues to exist during the data distribution (step 3). The commit does not complete until the changes are made in the remote caches and the application in M1 receives the callbacks to verify that the tasks are complete.
  • Step 1: Before commit, Transactions T1 and T2 each change the same entry in Region B within their local cache. T1 also makes a change to Region A.

  • Step 2: Conflict detected and eliminated. The distributed system recognizes the potential conflict from Transactions T1 and T2 using the same entry. T1 started to commit first, so it is allowed to continue. T2's commit fails with a conflict.

  • Step 3: Changes are in transit. T1 commits and its changes are merged into the local cache. The commit does not complete until GemFire distributes the changes to the remote regions and acknowledgment is received.

  • Step 4: After commit. Region A in M2 and Region B in M3 reflect the changes from transaction T1 and M1 has received acknowledgment. Results may not be identical in different members if their region attributes (such as expiration) are different.

Conflicting Transactions in Distributed-No-Ack Regions

These figures show how using the no-ack scope can produce unexpected results. These two transactions are operating on the same data point, in region B. Since they use no-ack scope, the conflicting changes cross paths and leave the data in an inconsistent state.
  • Step 1: As in the previous example, Transactions T1 and T2 each change the same entry in Region B within their local cache. T1 also makes a change to Region A. However, neither commit fails.

  • Step 2: Changes are in transit. Transactions T1 and T2 commit and merge their changes into the local cache. GemFire then distributes changes to the remote regions.

  • Step 3: Distribution is complete. The non-conflicting changes in Region A have been distributed to M2 as expected. For Region B however, T1 and T2 have traded changes, which is probably not the intended result.

Conflicting Transactions with Local Scope

When encountering conflicts with local scope, the first transaction to start the commit process "wins." The other transaction’s commit fails with a conflict, and its changes are dropped. In the diagram below, the resulting value for entry "Y" depends on which transaction commits first.