Deploying the Data Director for Hadoop vApp is the first step in getting your Hadoop cluster up and running.

Install and configure vSphere.

Verify that you have the required licenses. You manage these licenses in vSphere, not in the Data Director license management interface.

vSphere license. In a production environment, an Enterprise license is required to protect the master node with vSphere High Availability (HA) and Fault Tolerance (FT).

One license for each Hadoop worker node virtual machine. For example, if a cluster contains 1 master node, 10 worker nodes, and 1 client node, you need 10 licenses.

Download the Data Director for Hadoop OVA from the VMware download site.

Verify that you have at least 11 GB of storage available for the OVA. The Hadoop cluster needs additional resources beyond this.

See Prerequisites for a complete list.
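The license and storage prerequisites above can be turned into a quick planning calculation. This is a minimal Python sketch, not a VMware tool: the `NodeGroup` structure and the per-node disk sizes are hypothetical values chosen for illustration. Only the 11 GB OVA minimum and the one-license-per-worker rule come from this section.

```python
from dataclasses import dataclass

OVA_MIN_GB = 11  # minimum storage for the OVA, from the prerequisites


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical description of one group of Hadoop nodes."""
    role: str     # "master", "worker", or "client"
    count: int    # number of virtual machines in the group
    disk_gb: int  # hypothetical disk size per virtual machine, in GB


def worker_licenses(groups: list[NodeGroup]) -> int:
    # Only worker node virtual machines need a per-VM license.
    return sum(g.count for g in groups if g.role == "worker")


def total_storage_gb(groups: list[NodeGroup]) -> int:
    # OVA minimum plus the disks of every node, shared and local combined.
    return OVA_MIN_GB + sum(g.count * g.disk_gb for g in groups)


# The example cluster from the licensing prerequisite, with made-up disk sizes.
cluster = [
    NodeGroup("master", 1, 50),
    NodeGroup("worker", 10, 100),
    NodeGroup("client", 1, 50),
]
print(worker_licenses(cluster))   # 10
print(total_storage_gb(cluster))  # 1111
```

Replace the disk sizes with the values you plan to use. The storage total does not separate shared from local datastores, so treat it as a lower bound for capacity planning.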

1. In the vSphere Client, select File > Deploy OVF Template.

2. Click Browse and select the location to which you downloaded the OVA.

3. View the OVF Template Details page and click Next.

4. Accept the license agreement and click Next.

5. Specify a name for the vApp, select a target datacenter for the OVA, and click Next.

6. Select a vSphere resource pool for the OVA and click Next.

You must select a top-level resource pool. Child resource pools are not supported.

7. If possible, select shared storage for the OVA, and click Next.

If shared storage is not available, local storage is acceptable.

Note

For master nodes, shared storage is the best choice. With shared storage, you can protect the master node's virtual machine using vMotion, HA, and FT. For worker nodes, use local storage. With local storage, throughput scales with the number of nodes and the cost of storage is lower.

8. Keep the default network setting (DHCP), or select static IP and provide the network settings. Click Next.

9. Make sure the Initialize Resources check box is selected and click Next.

If the check box is selected, the resource pool, datastore, and network assigned to the vApp are added to the Serengeti server for use by the Hadoop clusters you create. If the check box is deselected, they are not added.

If you choose not to add the resource pool, datastore, and network automatically when you deploy the vApp, you must use the Data Director for Hadoop Web console or the CLI console to specify resource pool, datastore, and network information before you create a Hadoop cluster. See Managing Resources for Hadoop Clusters to learn how to add resources for use by Hadoop clusters.

10. Make sure the vCenter Extension Service is selected in the Configure Service Bindings screen and click Next.

The Management Server establishes a password-protected connection to the vCenter Server to perform operations on the virtual machine.

The vCenter Server starts deploying the Serengeti server. This process may take several minutes. When deployment completes, the vApp contains two virtual machines:

The Management Service virtual machine, which is started as part of the OVA deployment.

The Hadoop Template virtual machine, which is not started. Serengeti clones Hadoop nodes from this template when provisioning a cluster. You do not need to start or stop this virtual machine. The template does not include a Hadoop distribution.
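If you select static IP in step 8, a mistyped gateway or netmask is easy to miss in the wizard. The following minimal Python sketch checks a static IPv4 configuration before you enter it. It assumes the settings consist of an IP address, a netmask, and a gateway; the exact fields the wizard asks for may differ, and the function `check_static_ip` is hypothetical, not part of Data Director for Hadoop.

```python
import ipaddress


def check_static_ip(ip: str, netmask: str, gateway: str) -> list[str]:
    """Return problems found in a static IPv4 configuration (empty if none).

    Raises ValueError if an address or the netmask is malformed.
    Intended for ordinary subnets; /31 and /32 networks are not handled.
    """
    problems = []
    iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    net = iface.network
    gw = ipaddress.IPv4Address(gateway)

    # The vApp cannot use the subnet's network or broadcast address.
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    # The gateway must be reachable on the same subnet.
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    # The gateway must be a different host from the vApp.
    if gw == iface.ip:
        problems.append("gateway and vApp address are the same")
    return problems


print(check_static_ip("192.168.1.10", "255.255.255.0", "192.168.1.1"))  # []
print(check_static_ip("192.168.1.10", "255.255.255.0", "192.168.2.1"))
```

The second call reports that the gateway lies outside 192.168.1.0/24. Python's standard `ipaddress` module accepts a dotted netmask after the slash, so the values can be passed in the same form you would type into the wizard.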

Log in to the Data Director for Hadoop Web console. See Log in to Data Director for Hadoop Web Console.

If you deselected the Initialize Resources check box, you must add resources to the Serengeti server before you create a Hadoop cluster. See Managing Resources for Hadoop Clusters.
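The effect of the Initialize Resources check box can be summarized as a pre-flight check: a Hadoop cluster needs a resource pool, a datastore, and a network registered with Serengeti. The following Python sketch models that rule only. The dictionary shape and the resource names (`hadoop-pool`, `shared-ds`, `VM Network`) are hypothetical stand-ins for what the Web console or CLI reports, not a Serengeti API.

```python
REQUIRED_KINDS = ("resource pool", "datastore", "network")


def registered_after_deploy(initialize_resources: bool, pool: str,
                            datastore: str, network: str) -> dict[str, list[str]]:
    """Model the resources Serengeti holds right after the vApp is deployed."""
    if initialize_resources:
        # Check box selected: the vApp's own resources are added automatically.
        return {"resource pool": [pool], "datastore": [datastore], "network": [network]}
    # Check box deselected: the vApp's resources are not added.
    return {kind: [] for kind in REQUIRED_KINDS}


def missing_kinds(registered: dict[str, list[str]]) -> list[str]:
    """Return the resource kinds to add before creating a Hadoop cluster."""
    return [kind for kind in REQUIRED_KINDS if not registered.get(kind)]


after = registered_after_deploy(False, "hadoop-pool", "shared-ds", "VM Network")
print(missing_kinds(after))  # ['resource pool', 'datastore', 'network']
```

With the check box selected, `missing_kinds` returns an empty list and you can create a cluster right away; otherwise each listed kind must be added first.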