Configuring a Multi-site (WAN) System

Plan and configure your multi-site topologies and configure the regions that will be shared between systems.

Prerequisites

Before you start, you should understand how to configure membership and communication in peer-to-peer systems. See Configuring Peer-to-Peer Discovery and Configuring Peer Communication.

If you are using client/server, you should understand how to configure communication between client and server systems. In client/server installations, you configure the server systems for multi-site communication. See Configuring a Client/Server System.

High Level Process

Use the following steps to configure a multi-site system:
  1. Plan the topology of your multi-site system. See Multi-site (WAN) Topologies for a description of different multi-site topologies.
  2. Configure membership and communication for each distributed system in your multi-site system. This configuration will depend on the type of distributed system. See Configuring Peer-to-Peer Discovery, Configuring Peer Communication or Configuring a Client/Server System as appropriate.
  3. Define and configure the gateway hubs for communication between distributed systems. The required configuration steps are described in Define and Configure Gateway Hubs.
  4. Create all the data regions that you want to participate in the multi-site system. See Create Data Regions for Multi-site Communication.
  5. Enable gateway communication in the data regions you want to have participate in the multi-site system. This procedure is described in Enable Data Regions for Multi-site Communication.
  6. Make sure you start each distributed system process in the correct order (data nodes first, gateways afterwards) to avoid errors in gateway communications. See Correct Order for Starting Up Nodes in Multi-Site Systems.

Define and Configure Gateway Hubs

For a reference of the available settings, see Gateway Configuration Properties.
  1. For each system, choose the members that will act as gateway-hubs for communication between sites. Using multiple gateway-hubs provides high availability and allows you to spread event queue processing across the gateway-hubs. Each gateway-hub usually has a port where it listens for incoming communication and one or more gateways defined for outgoing communication to remote hubs. If a gateway-hub has zero gateways defined, it means the gateway-hub is acting as a receiver only.
  2. For each gateway, pair up remote gateway-hub ids and locations to local gateway endpoints, following this template:
    <cache>
        <gateway-hub ... port="<thisHubsPort>"> 
            <gateway ...> 
                <gateway-endpoint id="<remoteSiteID1>" 
                    host="<remoteSiteAddress1>" port="<remoteSitePort1>"/> 
                <gateway-endpoint id="<remoteSiteID2>" 
                    host="<remoteSiteAddress2>" port="<remoteSitePort2>"/> 
            </gateway> 
        </gateway-hub> ... 
    Note: For this hub to receive communication from other sites, this member's host address and the port specified for this gateway hub must correspond to the host and port specified in at least one remote site’s gateway endpoint. For this hub to send communication to other sites, it must have gateway endpoints defined in its gateways with address and port pairs corresponding to the address and port of remote hubs.

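  3. Configure your gateway hubs and gateways using either cache.xml or Java APIs. For example, you would create the following configurations for a two-site topology in which Site 1 runs gateway hubs on hosts fred and ethel and Site 2 runs gateway hubs on hosts lucy and ricky: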


    • cache.xml configuration
      For Site 1:
      <!-- Hub listening at port 22222, with gateways to other
           hubs listening at port 22222 on their hosts -->
      <cache>
        <gateway-hub id="Site1" port="22222" socket-buffer-size="125000"> 
          <gateway id="Site2" socket-buffer-size="125000"> 
            <gateway-endpoint id="Site2Lucy" host="lucy" port="22222"/> 
            <gateway-endpoint id="Site2Ricky" host="ricky" port="22222"/> 
          </gateway> 
        </gateway-hub> ... 
      For Site 2:
      <cache>
        <gateway-hub id="Site2" port="22222" socket-buffer-size="125000"> 
          <gateway id="Site1" socket-buffer-size="125000"> 
            <gateway-endpoint id="Site1Fred" host="fred" port="22222"/> 
            <gateway-endpoint id="Site1Ethel" host="ethel" port="22222"/> 
          </gateway> 
        </gateway-hub> ... 
      
      

      Notice that in the second configuration the gateway-hub id is Site2, which corresponds to the gateway id configured for Site1. The gateway id is Site1, which corresponds to the gateway-hub id in Site1's configuration.

    • Java configuration
      // Create or obtain the GemFire cache
      Cache cache = ... ;
      // Create the Gateway Hub
      GatewayHub site1Hub = cache.addGatewayHub("Site1", 22222);
      site1Hub.setSocketBufferSize(125000);
      
      // Add the Site2 gateway
      Gateway site2Gateway = site1Hub.addGateway("Site2");
      site2Gateway.setSocketBufferSize(125000);
      
      // Create the Site2 endpoints
      // Each endpoint needs its own id, matching the cache.xml example
      site2Gateway.addEndpoint("Site2Lucy", "lucy", 22222);
      site2Gateway.addEndpoint("Site2Ricky", "ricky", 22222);
  4. Based on the needs of your applications, determine how to configure the gateway queue or queues on your gateway. Things you will need to consider:

    • Whether to enable disk persistence. If you enable disk persistence, you need to specify the disk store and the maximum amount of memory that an individual queue can use before overflowing to disk. See Configuring Highly Available Gateway Queues.
    • Whether to use a single queue or multiple concurrent queues to process events in parallel on the gateway. The number of queues used by a gateway is configured in the concurrency-level attribute of the gateway. See Configuring Gateway Queue Concurrency Levels and Order Policy.
    • If you have configured a concurrency-level, decide which event ordering policy to use on the gateway: key-based, thread-based, or partitioning key-based. The default is key-based ordering. See Configuring Gateway Event Ordering Policy.
    • Determine whether you should conflate events in the queue. See Conflating a Multi-Site Gateway Queue.

    Example Configuration: The following configuration provides an example of a gateway with multiple queues and disk persistence enabled.
    • cache.xml configuration:
      <gateway-hub id="LN" port="22221">
        <gateway id="NY" concurrency-level="5" order-policy="key">
          <gateway-endpoint id="NY-1" host="localhost" port="11111"/>
          <gateway-endpoint id="NY-2" host="localhost" port="11112"/>
          <gateway-queue batch-size="1000" enable-persistence="true"
            disk-store-name="gateway-disk-store" maximum-queue-memory="200"/>
        </gateway>
        <gateway id="TK" concurrency-level="10" order-policy="thread">
          <gateway-endpoint id="TK-1" host="localhost" port="33331"/>
          <gateway-endpoint id="TK-2" host="localhost" port="33332"/>
          <gateway-queue batch-size="1000" enable-persistence="true"
            disk-store-name="gateway-disk-store" maximum-queue-memory="100"/>
        </gateway>
      </gateway-hub>
Note: The gateway hub configuration should be identical between members in the same distributed system, with two exceptions: each hub must listen at a different host and port location, and the startup-policy setting can differ between members.
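
The pairing rule in the note under step 2 can be checked mechanically before you deploy. The following is a minimal sketch in plain Java and is not part of the GemFire API: the HubPairingCheck class and its methods are hypothetical names introduced only for illustration. It assumes that each hub and each gateway endpoint can be reduced to a "host:port" string, and it uses the host names from the Site 1 and Site 2 example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the GemFire API.
final class HubPairingCheck {

    // Formats a host and port as a single "host:port" key.
    static String address(String host, int port) {
        return host + ":" + port;
    }

    // Returns the local gateway endpoint addresses that no remote hub
    // listens on. A non-empty result means this site has at least one
    // endpoint that cannot reach a remote hub.
    static Set<String> unmatchedEndpoints(List<String> localEndpoints,
                                          List<String> remoteHubs) {
        Set<String> unmatched = new LinkedHashSet<>(localEndpoints);
        unmatched.removeAll(remoteHubs);
        return unmatched;
    }

    // Returns true if at least one remote gateway endpoint targets this
    // hub's host and port, so the hub can receive from the remote site.
    static boolean canReceive(String localHub, List<String> remoteEndpoints) {
        return remoteEndpoints.contains(localHub);
    }
}
```

For the example topology, the Site 1 endpoints lucy:22222 and ricky:22222 would be compared against the addresses where the Site 2 hubs listen. An empty result from unmatchedEndpoints suggests the outgoing side is paired correctly.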

Create Data Regions for Multi-site Communication

Any region that might receive events from a remote site must be created before the gateway starts. Otherwise, batches of events could arrive from remote sites before the regions for those events are created, and the local site will throw exceptions because the receiving region does not exist yet. Note that if you define your regions in cache.xml, the startup order is handled properly.

Enable Data Regions for Multi-site Communication

Set the enable-gateway region attribute to true for every region that will be shared between sites.

Note: Set this configuration option the same for all caches where the region is defined regardless of whether a hub is running in the cache.

Enable gateway communication using one of the following:

XML:
<region name="gatewayRegion"> 
  <region-attributes ... enable-gateway="true"/> 
  ... 
</region> 
Java:
RegionFactory factory = ...;
factory.setEnableGateway(true);
Region multiSiteRegion = factory.create("gatewayRegion");

You can also configure gateway events to be sent to one or more specific gateway hubs. To configure the gateway hub or hubs that should receive the events, supply the hub id or a comma-separated list of gateway hub ids to the region configuration or to the region creation API.

XML:
<region name="gatewayRegion"> 
  <region-attributes ... enable-gateway="true" hub-id="DB1,DB2"/> 
  ... 
</region> 
Java:
RegionFactory factory = ...;
factory = factory.setGatewayHubId("DB1,DB2");
Note: For multiple-hub systems, you should specify the hub ids of the destination hubs that are intended to receive events. If you do not, region events go to all available hubs, which can result in duplicate sends to remote sites.
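
Because the hub-id value is a plain comma-separated string, a misspelled hub id is easy to miss. The following is a minimal sketch in plain Java that splits such a value and reports ids that do not match any hub you have defined. The HubIdList class and its methods are hypothetical names, not GemFire API, and trimming whitespace around each id is an assumption of this sketch rather than documented GemFire behavior.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration; not part of the GemFire API.
final class HubIdList {

    // Splits a hub-id attribute value such as "DB1,DB2" into individual
    // ids. Assumption: surrounding whitespace is not significant.
    static List<String> parse(String hubIdAttribute) {
        List<String> ids = new ArrayList<>();
        for (String part : hubIdAttribute.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Returns the ids in the attribute value that match no configured hub.
    static List<String> unknownIds(String hubIdAttribute,
                                   Collection<String> configuredHubs) {
        List<String> unknown = parse(hubIdAttribute);
        unknown.removeAll(configuredHubs);
        return unknown;
    }
}
```

With hubs DB1 and DB2 defined, the value "DB1,DB2" would produce an empty list from unknownIds, while "DB1,DB3" would report DB3.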

(Optional) Give Each System a Unique ID

If you will use GemFire's Portable Data eXchange (PDX) serialization for data distribution, choose a unique integer between 0 (zero) and 255 for each system in your WAN installation, and set it as the distributed-system-id in the gemfire.properties file of every member of that system.

See vFabric GemFire PDX Serialization and gemfire.properties and gfsecurity.properties (vFabric GemFire Property Files) for more information.
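
As an illustration of the ID assignment described above, the following minimal sketch in plain Java validates an ID and builds the matching property entry with java.util.Properties. The DistributedSystemIdConfig class and its methods are hypothetical names, not GemFire API. The sketch assumes that the range is inclusive at both ends, which is one reading of the text above.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper for illustration; not part of the GemFire API.
final class DistributedSystemIdConfig {

    // Builds the gemfire.properties entry for one site.
    // Assumption: the 0 to 255 range is inclusive at both ends.
    static Properties forSite(int distributedSystemId) {
        if (distributedSystemId < 0 || distributedSystemId > 255) {
            throw new IllegalArgumentException(
                "distributed-system-id must be between 0 and 255: "
                    + distributedSystemId);
        }
        Properties props = new Properties();
        props.setProperty("distributed-system-id",
            Integer.toString(distributedSystemId));
        return props;
    }

    // Returns true if no two sites were assigned the same ID.
    static boolean allUnique(int... siteIds) {
        Set<Integer> seen = new HashSet<>();
        for (int id : siteIds) {
            if (!seen.add(id)) {
                return false;
            }
        }
        return true;
    }
}
```

The Properties object returned by forSite can be written out with Properties.store. Every member of the same site would receive the same value, and allUnique can confirm that no two sites share an ID.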

Correct Order for Starting Up Nodes in Multi-Site Systems

Some deployments use multiple JVMs in a distributed system. You may have nodes dedicated to data and nodes that are dedicated to functioning as gateways.

In this situation, you must start and initialize all data nodes before starting the gateway JVMs. Otherwise, events received from remote sites will cause exceptions on the local site because the regions hosted on the data nodes do not exist yet.
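
If a script or launcher starts your JVMs, the ordering rule can be applied before anything is started. The following is a minimal sketch in plain Java: the StartupOrder class, its Role enum, and its Node type are hypothetical names, not GemFire API. It only computes the order in which nodes would be started. Waiting for each data node to finish initializing is left to the launcher.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the GemFire API.
final class StartupOrder {

    // Declared in startup order: data nodes first, gateways afterwards.
    enum Role { DATA, GATEWAY }

    static final class Node {
        final String name;
        final Role role;

        Node(String name, Role role) {
            this.name = name;
            this.role = role;
        }
    }

    // Returns the node names with every data node ahead of every gateway
    // node. List.sort is stable, so nodes with the same role keep their
    // original relative order.
    static List<String> startupSequence(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparing((Node n) -> n.role));
        List<String> names = new ArrayList<>();
        for (Node node : sorted) {
            names.add(node.name);
        }
        return names;
    }
}
```

Sorting by an enum whose constants are declared in the required order keeps the rule in one place, so a new role could be added by extending the enum.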