Before you begin the vFabric Data Director for Hadoop deployment tasks, make sure that your system meets all of the prerequisites.

vFabric Data Director for Hadoop requires that you install and configure vSphere, and that your environment meets minimum resource requirements. You must also make sure that you have licenses for the VMware components of your deployment.

vSphere Requirements

Before you can install Data Director for Hadoop, you must have set up the following VMware products.

Install vSphere 5.0 (or later) Enterprise or Enterprise Plus.

Enable Network Time Protocol (NTP) on the ESXi hosts. The NTP daemon keeps time-dependent processes synchronized across hosts.
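On an ESXi 5.x host, NTP can be enabled from the ESXi Shell (or through the vSphere Client under Host > Configuration > Time Configuration). A minimal sketch, assuming shell access to the host; the time server pool.ntp.org is an example placeholder, not a requirement:

```shell
# Point the ntpd daemon at a time source (pool.ntp.org is an example).
echo "server pool.ntp.org" >> /etc/ntp.conf

# Start ntpd now and enable it across reboots.
/etc/init.d/ntpd restart
chkconfig ntpd on
```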

Resource Requirements for the vSphere Management Server and Templates

Resource pool with at least 4GB RAM.

Port group (or dvportgroup) with at least 6 uplink ports and connectivity to the dvportgroups used to deploy your Hadoop clusters.

At least 30GB (recommended) of disk space for the management server and Hadoop template virtual disks.

Resource Requirements for the Hadoop Cluster

By default, when you deploy the Data Director for Hadoop OVF, the deployment process allocates resources for the Hadoop cluster from the resource pool that you create in vSphere. If you deploy using this default resource pool, the Hadoop cluster you create must not exceed the resources allocated to the vSphere Management Server and Templates (see above). Alternatively, you can prevent the deployment process from creating a default resource pool, and manually create resource pools with differing resource allocations for your clusters as needed.

The resource pool limit must be no less than the total memory required by the Hadoop cluster.

The datastore free space must be no less than the total size required by the Hadoop cluster, plus a swap disk for each Hadoop node equal to that node's requested memory size.

The dvportgroup must be configured across all relevant ESXi hosts and have connectivity with the dvportgroups used by the management server.

vSphere HA must be enabled for the master node if HA protection is needed.
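The resource pool and datastore sizing rules above can be sketched as a small pre-deployment check. This is a hypothetical helper, not part of the product; the node memory and disk figures are illustrative:

```python
# Hypothetical pre-deployment check for the sizing rules above.
# Each node spec gives memory and virtual disk size in GB.

def check_cluster_fit(nodes, pool_limit_gb, datastore_free_gb):
    """Verify resource-pool and datastore sizing for a Hadoop cluster.

    nodes: list of dicts with 'memory_gb' and 'disk_gb' per Hadoop node.
    Rule 1: the resource pool limit must cover the cluster's total memory.
    Rule 2: datastore free space must cover every node's disk plus a
            swap disk per node equal to that node's memory size.
    """
    total_memory = sum(n["memory_gb"] for n in nodes)
    total_disk = sum(n["disk_gb"] + n["memory_gb"] for n in nodes)  # disk + swap
    return pool_limit_gb >= total_memory and datastore_free_gb >= total_disk

# Example: three identical 4GB-memory, 20GB-disk nodes need a pool limit
# of at least 12GB and at least 72GB of datastore free space (disk + swap).
nodes = [{"memory_gb": 4, "disk_gb": 20}] * 3
print(check_cluster_fit(nodes, pool_limit_gb=16, datastore_free_gb=80))  # True
```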


Licensing Requirements

You must have a vSphere Enterprise edition license because VMware High Availability (HA) and VMware Distributed Resource Scheduler (DRS) must be enabled.

One license is required per Hadoop worker node virtual machine. Hadoop nodes that contain only master roles (hadoop_namenode, hadoop_jobtracker) or client services (hadoop_client, pig, hive, hive_server) are exempt from licensing requirements. For example, if a cluster contains 1 master node, 10 worker nodes, and 1 client node, 10 licenses are required.
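The per-node licensing rule can be sketched as a simple count: a virtual machine needs a license only if it carries at least one non-exempt role. The exempt role names come from the text above; the worker role names (hadoop_datanode, hadoop_tasktracker) are illustrative assumptions:

```python
# Count per-VM licenses for a cluster. Exempt role names are from the
# documentation; worker role names below are illustrative assumptions.
EXEMPT_ROLES = {"hadoop_namenode", "hadoop_jobtracker",
                "hadoop_client", "pig", "hive", "hive_server"}

def licenses_required(cluster):
    """cluster: list of role lists, one list per virtual machine.

    A VM needs a license if any of its roles is not exempt.
    """
    return sum(1 for roles in cluster
               if any(r not in EXEMPT_ROLES for r in roles))

# The example from the text: 1 master + 10 workers + 1 client -> 10 licenses.
cluster = ([["hadoop_namenode", "hadoop_jobtracker"]]           # 1 master
           + [["hadoop_datanode", "hadoop_tasktracker"]] * 10   # 10 workers
           + [["hadoop_client", "pig", "hive"]])                # 1 client
print(licenses_required(cluster))  # 10
```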

You manage Data Director for Hadoop licenses from vSphere. You cannot manage the licenses with the Data Director licensing interface.