How Multi-site Event Distribution Works

vFabric GemFire distributes a subset of cache events between distributed systems, with a minimum impact on each system's performance. Events are distributed only for the regions that you configure for gateway distribution.

Operation Distribution Through the Gateways

In regions that are configured with enable-gateway set to true, events are automatically forwarded to the gateway hub for distribution to other sites. The events are placed in a gateway queue and distributed asynchronously to remote sites. Ordering of events sent between sites is preserved.

If the queue becomes too full, it is overflowed to disk to keep the member from running out of memory. You can optionally configure the queue to be persisted to disk (with the enable-persistence gateway-queue property). With persistence, if the member managing the queue goes down, it will pick up where it left off once it is restarted.

The multi-site installation is designed for minimal impact on distributed system performance, so only the farthest-reaching entry operations are distributed between sites.

These operations are distributed:
  • entry create
  • entry put
  • entry distributed destroy, providing the operation is not an expiration action
These operations are not distributed:
  • get
  • invalidate
  • local destroy
  • expiration actions of any kind
  • region operations

If you are using a GatewayEventListener, it will also not process any of the above listed non-distributable events.

How the Gateway Processes Its Queue

Each primary gateway contains a processor thread that reads messages from the queue, batches them, and distributes them to the connected site. To process the queue, the thread takes the following actions:
  1. Reads messages from the queue
  2. Creates a batch of the messages
  3. Synchronously distributes the batch to the other site and waits for a reply
  4. Removes the batch from the queue once the other site has successfully replied

Because the batch is not removed from the queue until after the other site has replied, the message cannot get lost. On the other hand, in this mode a message could be processed more than once. If a site goes offline in the middle of processing a batch of messages, then that same batch will be sent again once the site is back online. The batch is configurable via the batch size and batch time interval settings. A batch of messages is processed by the queue when either the batch size or the time interval is reached. In an active network, it is likely that the batch size will be reached before the time interval. In an idle network, the time interval will most likely be reached before the batch size. This may result in some network latency that corresponds to the time interval.

How the Gateway Handles Batch Processing Failure

Exceptions can occur during batch processing.
  • The receiving gateway fails with acknowledgment. If processing fails while the receiving gateway is processing a batch, it replies with a failure acknowledgment containing the exception, including the identity of the message that failed, and the ID of the last message successfully processed. The sending gateway removes the successfully processed messages and the failed message from the queue and logs the exception and the failed message information. The sender then continues processing the messages remaining in the queue.
  • The receiving gateway fails without acknowledgment. If the receiving gateway does not acknowledge a sent batch, the sender does not know which messages were successfully processed, so it re-sends the entire batch.
  • No remote gateways are available. If a batch processing exception occurs because there are no remote gateways available, the batch remains on the queue. The sending gateway waits for a time, then attempts to re-send the batch. The time periods between attempts is five seconds. The existing server monitor continuously attempts to connect to the remote gateway, so eventually a connection is made and queue processing continues. Messages build up in the queue and possibly overflow to disk in the meantime.