After you deploy a Hadoop distribution, you can create a Hadoop cluster to process data. You can create multiple clusters in your vFabric Data Director for Hadoop environment, provided that you first complete the following prerequisites.

Deploy the Data Director for Hadoop vApp.

Ensure that you have adequate resources allocated to run the Hadoop cluster.

Configure one or more Hadoop distributions.

1. Log in to the Data Director for Hadoop Web Console.

2. Select Clusters, and click the Add icon.

3. Specify the information for the cluster that you want to create.

Option

Description

Cluster name

The name to identify the cluster.

Master Node Group

The master node is a virtual machine that runs the Hadoop NameNode and JobTracker services. This node manages HDFS data and assigns tasks to the Hadoop TaskTracker services deployed in the worker node group.

Select a resource template from the drop-down menu, or click Customize to customize a resource template.

For the master node, use shared storage so that you can protect this virtual machine with VMware HA and FT.

Worker Node Group

Worker nodes are virtual machines that run the Hadoop DataNode and TaskTracker services. These nodes store HDFS data and execute tasks.

Select the number of nodes and the resource template from the drop-down menu, or click Customize to customize a resource template.

For worker nodes, use local storage.

Note: You can add nodes to the worker node group by using Scale Out Cluster. You cannot reduce the number of nodes.

Client Node Group

A client node is a virtual machine that contains Hadoop client components. From this virtual machine you can access HDFS, submit MapReduce jobs, run Pig scripts, or run Hive queries.

Select the number of nodes and a resource template from the drop-down menu, or click Customize to customize a resource template.

Note: You can add nodes to the client node group by using Scale Out Cluster. You cannot reduce the number of nodes.

Hadoop distro

Select the Hadoop distribution.

Network

Select the network that you want the cluster to use.
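The options in this table can be pictured as a small data structure. The following Python sketch is illustrative only: the class names, field names, distribution name, and network name are hypothetical and are not part of any Data Director for Hadoop or Serengeti API. It models the node groups, checks a request against the storage guidance above, and shows the scale-out-only rule for worker and client groups.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeGroup:
    """One node group from the cluster form (hypothetical model)."""
    name: str
    services: List[str]
    node_count: int
    storage: str  # "shared" or "local"


@dataclass
class ClusterRequest:
    """The values entered in step 3 (field names are illustrative)."""
    cluster_name: str
    distro: str
    network: str
    master: NodeGroup
    workers: NodeGroup
    clients: NodeGroup


def review(request: ClusterRequest) -> List[str]:
    """Return warnings where the request departs from the guidance above."""
    warnings = []
    if not request.cluster_name.strip():
        warnings.append("cluster name is empty")
    if request.master.storage != "shared":
        warnings.append("master node should use shared storage for HA/FT")
    if request.workers.storage != "local":
        warnings.append("worker nodes should use local storage")
    return warnings


def scale_out(group: NodeGroup, new_count: int) -> None:
    """Grow a worker or client group; a smaller count is rejected."""
    if new_count < group.node_count:
        raise ValueError(
            f"cannot reduce {group.name} from {group.node_count} to {new_count} nodes"
        )
    group.node_count = new_count


def example_request() -> ClusterRequest:
    """Build a sample request; every value here is a placeholder."""
    return ClusterRequest(
        cluster_name="demo-cluster",
        distro="apache",            # assumed distribution name
        network="default-network",  # assumed network name
        master=NodeGroup("master", ["NameNode", "JobTracker"], 1, "shared"),
        workers=NodeGroup("worker", ["DataNode", "TaskTracker"], 3, "local"),
        # Client storage is not covered by the guidance above; "shared" is an assumption.
        clients=NodeGroup("client", ["Hadoop client", "Pig", "Hive"], 1, "shared"),
    )
```

Calling `review(example_request())` returns an empty list because the sample follows the storage recommendations, and `scale_out` raises `ValueError` when asked for fewer nodes than the group already has.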

The management server clones the template virtual machine to create the nodes in the cluster. When each virtual machine starts, the agent on that virtual machine pulls the appropriate Serengeti software components to that node and deploys the software.