How Multi-site (WAN) Systems Work

The vFabric GemFire multi-site implementation connects disparate distributed systems. The systems act as one when they are coupled, and as independent systems when communication between sites fails. The coupling is tolerant of weak or slow links between distributed system sites.

A wide-area network (WAN) is the main use case for the multi-site topology. The multi-site topology makes systems at disparate geographical locations appear as one coherent system at all locations. It also ensures independence of the systems, so if any site is lost from view, the remaining sites continue to operate.

The GemFire multi-site configuration allows you to share data between independent distributed systems:
  • Each site manages its own VMware® vFabric™ GemFire® distributed system, but region data is shared across all sites.
  • Data is consistent between sites, provided the update key sets are unique to each site. If two sites are updating data for the same keys at the same time, there is no guarantee of consistency.
  • The queue used for messaging between sites is overflowed to disk as needed. If the queue becomes too big, the sending member overflows its updates to disk to avoid running out of memory. In addition, you can configure the queue to be persisted to disk (enable-persistence gateway-queue property), so if the member managing the queue goes down it will pick up where it left off once it is restarted.
  • To increase throughput, gateways can be configured to use multiple concurrent queues. You can also configure the maximum amount of memory to be used by each queue.
  • Ordering of events sent between sites is preserved. For gateways using multiple concurrent queues, you can configure the ordering policy that gateways use to distribute events. The valid policies are key and thread. Key ordering preserves order per key: updates to the same key are added to the same gateway queue. Thread ordering preserves order per initiating thread: updates by the same thread are added to the same gateway queue. The default ordering is key.
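The queue behavior described above is configured on the gateway in cache.xml. The following sketch shows one possible combination: a gateway with three concurrent queues using key ordering, and a persistent queue with a memory cap. All ids, host names, port numbers, and the disk store name are illustrative assumptions; verify attribute names against the cache.xml DTD for your GemFire release.

```xml
<!-- Hypothetical cache.xml fragment (all names and numbers illustrative). -->
<gateway-hub id="NY-Hub" port="11111">
  <!-- Three concurrent queues, events distributed to queues by key. -->
  <gateway id="LN" concurrency-level="3" order-policy="key">
    <gateway-endpoint id="LN-1" host="london-host1" port="22222"/>
    <!-- Persist the queue so a restarted member picks up where it left off;
         cap in-memory queue size (MB) so excess entries overflow to disk. -->
    <gateway-queue enable-persistence="true"
                   disk-store-name="gatewayQueueStore"
                   maximum-queue-memory="150"/>
  </gateway>
</gateway-hub>
```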

Multi-site Caching Overview

A multi-site installation consists of two or more distributed systems that are loosely coupled. Each site is its own distributed system and has one or more logical connections to other sites over one or more physical connections. The logical connections are known as gateway hubs. In a client/server installation, the gateway hubs are configured in the server layer.

The gateway hubs are defined at startup in the distributed system member caches. Each hub is usually hosted by more than one system member, with one member hosting the primary hub instance and the others hosting backup secondaries. The primary is the only instance that communicates with remote sites. Hubs are identified within the distributed system by their hub-id. For any single hub, the configuration must be consistent across the distributed system.



Each gateway hub defines:
  • One or more gateways for outgoing communication to remote hubs
  • A port where the hub listens for incoming communication from remote gateways
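A hub definition of this shape might look like the following cache.xml sketch. The hub id, gateway id, hosts, and ports are illustrative assumptions, not values from this document:

```xml
<!-- Hypothetical hub definition in a member's cache.xml. -->
<gateway-hub id="Site1Hub" port="11111">
  <!-- Outgoing communication to a remote site's hub. -->
  <gateway id="Site2">
    <gateway-endpoint id="Site2-ep1" host="site2-host1" port="22222"/>
    <gateway-endpoint id="Site2-ep2" host="site2-host2" port="22222"/>
  </gateway>
</gateway-hub>
```

The port attribute (11111 here) is where this hub listens for incoming communication from remote gateways; each gateway-endpoint names a listener at the remote site.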



Multiple Hub Configuration

In a multiple hub configuration, you define multiple hubs and split the location of your primary hubs between members. You then split outgoing region event messaging between hubs. This spreads the gateway event processing load between members.

This figure shows a typical multiple-hub configuration. There may be any number of members in this distributed system in addition to the ones that host the gateway hubs.

In this configuration:
  • Two members host gateway hubs.
  • Each member has two gateway hubs. One hub is primary in VM1 and the other hub is primary in VM2. Only the primaries send data to remote sites.
  • Each hub has some of the region traffic pointing to it.
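The split of region traffic between hubs is configured per region, by pointing each region at a hub id. A sketch, assuming the hub ids Site1A and Site1B from the figure and hypothetical region names A and B:

```xml
<!-- Hypothetical cache.xml fragment: region A routes its events through
     hub Site1A, region B through hub Site1B. -->
<region name="A">
  <region-attributes enable-gateway="true" hub-id="Site1A"/>
</region>
<region name="B">
  <region-attributes enable-gateway="true" hub-id="Site1B"/>
</region>
```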



This next figure shows event flow for region A in the multiple-hub configuration:
  • Region events originating in VM1 are stored in the local secondary Hub Site1A and are also distributed to VM2. In VM2, these events arriving in the local Region are sent to the JVM’s primary Hub Site1A, which then forwards them to remote sites.
  • Region events originating in VM2 are sent directly out of the local primary Hub Site1A. They are also distributed to VM1 for cache update and storage in the secondary Hub Site1A.
  • Hub Site1B never sees this region’s outgoing events.



How Hub Startup Policy Affects Hub Startup Behavior

This applies to multiple-hub systems. In a multiple-hub system, every hub must have one primary instance running to maintain communication with remote sites. The other instances are secondaries that act as backups to the primary.

You can specify which hub instances should start as primary and which as secondary in your distributed system or you can let the system handle it. When a hub is initialized, it attempts to assume the role specified, but will start up even if it must assume a different role.

These are the startup policies and their effects on startup behavior:
  • none (default). If no primary is running for the id, the hub starts as primary. Otherwise it starts as secondary.
  • primary. Same behavior as for none with the addition that the hub logs a warning if it starts as secondary.
  • secondary. If a primary is present, the hub starts as secondary. If there is no primary, the hub waits for up to one minute for a primary to start. If no primary starts in that time, the hub starts as primary and logs a warning.
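The policy is specified per hub instance, for example through the gateway-hub element's startup-policy attribute in each member's cache.xml. The hub id and port below are illustrative:

```xml
<!-- In VM1: ask this instance of Site1A to start as primary. -->
<gateway-hub id="Site1A" port="11111" startup-policy="primary"/>

<!-- In VM2: ask this instance of Site1A to start as secondary. -->
<gateway-hub id="Site1A" port="11111" startup-policy="secondary"/>
```

As described above, these are preferences rather than guarantees: each hub starts in the other role if it must, logging a warning when it does.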