How Network Partitioning Management Works

GemFire handles network outages by using a weighting system to determine whether the remaining available members have a sufficient quorum to continue as a distributed system. Individual members are each assigned a weight, and quorum is determined by comparing the total weight of currently responsive members to the previous total weight of responsive members.

Your distributed system can split into separate running systems when members lose the ability to see each other. The typical cause of this problem is a failure in the network. To guard against this, vFabric GemFire offers network partitioning detection so that when a failure occurs, only one side of the system keeps running and the other side automatically shuts down.

The overall process for detecting a network partition is as follows:
  1. The distributed system starts up. When you start up a distributed system, you typically start the locators first, then the cache servers and then other members such as applications or processes that access distributed system data.
  2. After the locators have started up, the oldest locator assumes the role of the membership coordinator. Peer discovery occurs as members come up and locators generate a membership discovery list for the distributed system. Locators hand out the membership discovery list as each member process starts up. This list typically contains a hint on who the current membership coordinator is
  3. Member processes make a request to the coordinator to join the distributed system. If authenticated, the coordinator hands the new member the current membership view and begins the process of updating the current view (to add the new member or members) by sending out a view preparation message to existing members in the view.
  4. While members are joining the system, it is also possible that members are being removed through the normal failure detection process. This process removes unresponsive or slow members. See Managing Slow Receivers and Failure Detection and Membership Views for descriptions of the failure detection process.
  5. Whenever the coordinator is alerted of a membership change (a member either joins or leaves the distributed system), it generates a new membership view. The membership view is generated by a two-phase protocol. In the first phase, the membership coordinator sends out view preparation message to all members and waits 12 seconds for a view preparation ack return message from each member.
  6. If the coordinator does not receive an ack message from a member within 12 seconds, the coordinator attempts to connect to the member's failure-detection socket and then attempts to connect to its direct-channel socket. If the coordinator cannot connect to the member's sockets, it declares the member dead and starts the process over again with a new view.
  7. The coordinator sends out the new membership view to all members that acknowledged the view preparation message. The coordinator waits another 12 seconds for an acknowledgment of receiving the new view from each member. Any members that fail to acknowledge the view are removed from the view.
  8. The membership coordinator calculates the total weight of members in the current membership view and compares it to the total weight of the previous membership view. Some conditions to note:
    • When the first membership view is sent out, there are no accumulated losses. The first view only has additions and usually contains the initial locator/coordinator.
    • A new coordinator may have a stale view of membership if it did not see the last membership view sent by the previous (failed) coordinator. If new members were added during that failure, then the new members may be ignored when the first new view is sent out.
    • If members were removed during the failover to the new coordinator, then the new coordinator will have to determine these losses during the view preparation step.
  9. When GemFire detects that the total membership weight has dropped by a configured percentage within a single membership view change, GemFire declares a network partition. The coordinator sends a network-partitioned-detected UDP message to all members (even non-responsive ones) and then closes the distributed system with a ForcedDisconnectException. If a member fails to receive the message before the coordinator closes the system, the member is responsible for detecting the event on its own.

The presumption is that when a network partition is declared, the members that can generate a quorum of membership will continue operations and assume the role of the "surviving" side. The surviving members will elect a new coordinator, designate a lead member and so on.

Note that it is possible that a member can fail during view transmission and that some other process will reuse its fd-sock or direct-channel port, causing a false positive in the member verification step. This is acceptable because it means that the machine that hosted the process is still reachable. No network failure occurred and the member that didn't acknowledge the view preparation message will be removed in a subsequent view.