Memory Requirements for Cached Data

GemFire solutions architects need to estimate resource requirements for meeting application performance, scalability, and availability goals. These include estimates for memory, sockets, and threads, within the CPU and network constraints of the operating environment.

The information here is only a guideline and assumes a basic understanding of GemFire. While no two applications or use cases are exactly alike, this information should be a solid starting point based on real-world experience. As with physical database design, the right configuration and physical topology for deployment ultimately depend on the performance requirements, application data access characteristics, and resource constraints (memory, CPU, and network bandwidth) of the operating environment.


Memory Usage Overview

The following guidelines should provide a rough estimate of the amount of memory consumed by your system. A worksheet is available to help calculate your capacity using this information.

For data placed in partitioned regions, the memory calculated for keys, entries (objects), and their region overhead can be divided by the number of members in the distributed system. For other region types, calculate the full amount for each member that hosts the region. Memory used by sockets, threads, and GemFire's small application overhead is per member.

For each entry added to a region, the GemFire cache API consumes a certain amount of memory to store and manage the data. This overhead is required even when an entry is overflowed or persisted to disk, so entries continue to take up some JVM memory even after they are paged out. The Java cache overhead introduced by a region, using a 32-bit JVM, can be approximated as listed below.

Actual memory use varies based on a number of factors, including the JVM you are using and the platform you are running on. For 64-bit JVMs, usage will usually be larger than with 32-bit JVMs; as much as 80% more memory may be required, due in large part to the fact that addresses and object references are 64 bits rather than 32. The companion spreadsheet provides estimates for 64-bit JVMs, but they are necessarily approximate because of the platform and JVM variations mentioned above.

Note: Objects in GemFire are serialized for storage into partitioned regions and for all distribution activities, including overflow and persistence to disk. For optimum performance, GemFire tries to reduce the number of times an object is serialized and deserialized. Because of this, your objects may be stored in serialized or non-serialized form in the cache. To do capacity planning for your data, therefore, use the larger of the serialized and deserialized sizes. If your object classes are DataSerializable, the non-serialized form will generally be the larger of the two. See Data Serialization.

There are several additional considerations for calculating your memory requirements:


  • Size of your stored data. Use the larger of the serialized and deserialized forms for each data object type. For DataSerializable object classes, the non-serialized form will generally be the larger of the two.

    Objects in GemFire are serialized for storage into partitioned regions and for all distribution activities, including moving data to disk for overflow and persistence. For optimum performance, GemFire tries to reduce the number of times an object is serialized and deserialized, so your objects may be stored in serialized or non-serialized form in the cache.

  • Application object overhead for your data. These are the estimated values for 32-bit JVMs and 64-bit JVMs. Sizes may vary slightly between JVMs and platforms.
    • Object header. On 32-bit JVMs, 12 bytes. (The object header is actually only 8 bytes, but an extra 4 bytes padding is added if the total object size is not a multiple of 8, as is true roughly half the time.) On 64-bit JVMs, 20 bytes. Make sure to count the key as well as the value, and to count every object if the key and/or value is a composite object.
    • Field. On 32-bit JVMs, 8 bytes for fields of type double or long, 4 bytes per field for all others. On 64-bit JVMs, the size is the same as 32-bit except for fields that are references to objects, which take 8 bytes.
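As a worked example, the header and field figures above can be turned into a quick size estimate. The value object sized below is hypothetical; only the per-header and per-field byte figures come from the text, and any referenced objects (such as a String and its backing array) must be counted separately with the same rules.

```java
// Rough per-object size arithmetic on a 32-bit JVM, using the figures above.
public class ObjectSizeEstimate {
    static final int HEADER_32 = 12;   // 8-byte header plus average 4-byte padding
    static final int FIELD_NARROW = 4; // int, float, object reference, and similar fields
    static final int FIELD_WIDE = 8;   // long and double fields

    // Hypothetical value object with one long, one double, and one object
    // reference; the referenced object itself must be sized separately.
    static int hypotheticalValueSize() {
        return HEADER_32 + FIELD_WIDE + FIELD_WIDE + FIELD_NARROW;
    }

    public static void main(String[] args) {
        System.out.println("Estimated object size: " + hypotheticalValueSize() + " bytes");
    }
}
```

Remember to run the same arithmetic for the key object, and for every nested object the key or value references.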

Cache Overhead

This table gives estimates for the cache overhead in a 32-bit JVM. The overhead is required even when an entry is overflowed or persisted to disk. Actual memory use varies based on a number of factors, including the JVM type and the platform you run on. For 64-bit JVMs, the usage will usually be larger than with 32-bit JVMs and may be as much as 80% more.

For each entry, add the following amounts as they apply:
  • For each region: 87 bytes per entry. This value can vary because memory consumption for object headers and object references differs for 64-bit JVMs, different JVM implementations, and different JDK versions.
  • If statistics are enabled for the member: 16 bytes per entry
  • If the region is partitioned: 16 bytes per entry
  • If the region is persisted and/or overflowed: 40 bytes per entry
  • If the region has an LRU eviction controller: 16 bytes per entry
  • If the region has global scope: 90 bytes per entry
  • If the region has entry expiration configured: 147 bytes per entry
  • For each optional user attribute: 52 bytes per entry
For indexes used in querying, the overhead varies greatly depending on the type of data you are storing and the type of index you create. You can roughly estimate the overhead for some types of indexes as follows:
  • If the index has a single value per region entry for the indexed expression, the index introduces at most 243 bytes per region entry. An example of this type of index is: fromClause="/portfolios", indexedExpression="id". The maximum of 243 bytes per region entry is reached if each entry has a unique value for the indexed expression. The overhead is reduced if the entries do not have unique index values.
  • If each region entry has more than one value for the indexed expression, but no two region entries have the same value for it, then the index introduces at most 236 C + 75 bytes per region entry, where C is the average number of values per region entry for the expression.
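The per-entry overheads listed above can be summed mechanically. The sketch below is illustrative only: the byte figures are the 32-bit values from the list, index and user-attribute overhead are omitted, and the method name is an assumption.

```java
// Sum of the 32-bit per-entry cache overheads listed above (illustrative sketch;
// excludes index and user-attribute overhead).
public class CacheOverheadEstimate {
    static int perEntryOverhead(boolean statisticsEnabled, boolean partitioned,
                                boolean persistedOrOverflowed, boolean lruEviction,
                                boolean globalScope, boolean entryExpiration) {
        int bytes = 87;                          // base overhead for each region entry
        if (statisticsEnabled) bytes += 16;
        if (partitioned) bytes += 16;
        if (persistedOrOverflowed) bytes += 40;
        if (lruEviction) bytes += 16;
        if (globalScope) bytes += 90;
        if (entryExpiration) bytes += 147;
        return bytes;
    }

    public static void main(String[] args) {
        // Example: a partitioned region with overflow and LRU eviction.
        System.out.println(perEntryOverhead(false, true, true, true, false, false)
                + " bytes per entry");
    }
}
```

Multiply the result by the expected entry count (and, for partitioned regions, divide by the number of hosting members) to get a per-member figure.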

Sockets

Servers always maintain two outgoing connections to each of their peers. So for each peer a server has, there are four total connections: two going out to the peer and two coming in from the peer.

The server threads that service client requests also communicate with peers to distribute events and forward client requests. If the server's GemFire connection property conserve-sockets is set to true (the default), these threads use the already-established peer connections for this communication.

If conserve-sockets is false, each thread that services clients establishes two of its own individual connections to its server peers, one to send, and one to receive. Each socket uses a file descriptor, so the number of available sockets is governed by two operating system settings:
  • maximum open files allowed on the system as a whole
  • maximum open files allowed for each session
In servers with many threads servicing clients, if conserve-sockets is set to false, the demand for connections can easily overrun the number of available sockets. Even with conserve-sockets set to false, you can cap the number of these connections by setting the server's max-threads parameter.

Each client connection takes one server socket and a thread to handle it. For partitioned regions, the server acts as a proxy, fetching results or executing functions on behalf of the client. If conserve-sockets is set to false, this also opens a new socket from the server to each peer: N sockets, where N is the number of peers. A large number of clients simultaneously connecting to a large set of peers hosting a partitioned region, with conserve-sockets set to false, can therefore consume a huge amount of memory for sockets. Set conserve-sockets to true in these cases.

Note: There is also JVM overhead for the thread stack of each client connection being processed, typically 256 KB or 512 KB for most JVMs. On some JVMs you can reduce it to 128 KB. You can use the GemFire max-threads property or the GemFire max-connections property to limit the number of client threads, and thus both thread overhead and socket overhead.
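The peer-to-peer socket counts described above can be sketched as simple arithmetic. The method names below are illustrative; the multipliers are the ones given in this section.

```java
// Rough count of peer-to-peer sockets per member, following the figures above.
public class SocketEstimate {
    // With conserve-sockets=true, each member keeps 4 sockets per peer
    // (two outgoing connections and two incoming).
    static int socketsConserved(int peers) {
        return 4 * peers;
    }

    // With conserve-sockets=false, each client-servicing thread opens its own
    // send/receive pair to each peer hosting the region.
    static int socketsUnconserved(int peersHostingRegion, int threads) {
        return 4 * peersHostingRegion * threads;
    }

    public static void main(String[] args) {
        System.out.println("conserved, 10 peers: " + socketsConserved(10) + " sockets");
        System.out.println("unconserved, 10 peers, 16 threads: "
                + socketsUnconserved(10, 16) + " sockets");
    }
}
```

The gap between the two figures grows with the thread count, which is why conserve-sockets=true is recommended when many clients connect to many peers.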

The following table lists the memory requirements based on connections.

  • Per socket: 32,768 bytes per socket buffer (configurable). The socket buffer size should be set to a number greater than 100 + sizeof(largest object in the region) + sizeof(largest key).
  • If the member is a server (that is, clients connect to it): (the lesser of the server's max-threads and max-connections properties) * (socket buffer size + thread overhead for the JVM).
  • Per member of the distributed system, if conserve-sockets is set to true: 4 * number of peers.
  • Per member, if conserve-sockets is set to false: 4 * number of peers hosting that region * number of threads.
  • If the member hosts a partitioned region, conserve-sockets is set to false, and the member is a server (cumulative with the above): at most max-threads * 2 * number of peers. At any moment it is 2 * current number of clients connected * number of peers, since each connection spawns a thread.
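As an illustration of the server entry above, the following sketch multiplies the connection cap by the per-connection cost. The settings passed in main are hypothetical; substitute your own max-threads, max-connections, socket buffer size, and thread stack size.

```java
// Server-side client-connection memory, per the formula above:
// (lesser of max-threads or max-connections) * (socket buffer + thread stack).
public class ServerConnectionMemory {
    static long estimateBytes(int maxThreads, int maxConnections,
                              long socketBufferBytes, long threadStackBytes) {
        long connections = Math.min(maxThreads, maxConnections);
        return connections * (socketBufferBytes + threadStackBytes);
    }

    public static void main(String[] args) {
        // Hypothetical settings: max-threads=100, max-connections=800,
        // default 32,768-byte socket buffer, 256 KB thread stacks.
        long bytes = estimateBytes(100, 800, 32_768L, 256L * 1024);
        System.out.println("Approximate client-connection memory: " + bytes + " bytes");
    }
}
```

With those hypothetical numbers, 100 connections at roughly 288 KB each is on the order of tens of megabytes, which is why capping max-threads matters on busy servers.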
Subscription Queues

Per server, memory use depends on whether you limit the queue size. If you do, you can specify the number of megabytes or the number of entries the queue may hold before it overflows to disk. Wherever possible, entries on the queue are references, to minimize memory impact. The queue consumes memory not only for the key and the entry but also for the client ID or thread ID, as well as for the operation type. Since you can limit the queue to as little as 1 MB, this number is completely configurable and there is no simple formula; budget at least 1 MB plus whatever you allow the queue to hold.

GemFire classes and JVM overhead: roughly 50 MB.

Thread overhead: Each concurrent client connection into a server results in a thread being spawned, up to the max-threads setting. After that, a thread services multiple clients, up to the max-clients setting. There is thread stack overhead per connection (at a minimum 256 KB to 512 KB; you can reduce it to 128 KB on many JVMs). The accompanying spreadsheet uses worst-case assumptions; you can change them.

To calculate capacity, you can use the worksheet Capacity Plan Guide.