When you deploy the Data Director for Hadoop OVA, the Management Server does not include a Hadoop distribution. You must add and configure a Hadoop distribution from the command line. You can configure multiple Hadoop distributions.

In most cases, you download the Hadoop distribution from the Internet. If you are behind a firewall, you may need to modify your proxy settings to allow the download.

If you have a local Hadoop distribution and the Management Server does not have access to the Internet, you can upload the distribution manually. See Work with Different Hadoop Distros in the Serengeti User's Guide.

Deploy the Data Director for Hadoop vApp. See Deploy the Hadoop for Data Director vApp.

Set the password for the Management Server. See Create a New Password for the Management Server.

1. Log in to the Management Server using PuTTY or another SSH client.

2. Run the config-distro.rb Ruby script, which is in the /opt/serengeti/sbin directory, as follows.

cd /opt/serengeti/sbin
./config-distro.rb --name distro_name --hadoop hadoop_package_url --pig pig_package_url --hive hive_package_url

The script downloads the specified Hadoop, Pig, and Hive packages.
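As an illustration only, the following Ruby sketch assembles the config-distro.rb command line from a set of package URLs and prints it in shell-safe form, so that you can review the placeholders before you run the script. It uses only the --name, --hadoop, --pig, and --hive options shown above. The distribution name and all URLs are hypothetical placeholders; substitute the real locations of the packages you want to download.

```ruby
require "shellwords"

# Hypothetical values: replace these with your distribution name and the
# real locations of the Hadoop, Pig, and Hive packages.
DISTRO = {
  name:   "apache",
  hadoop: "http://example.com/distros/hadoop.tar.gz",
  pig:    "http://example.com/distros/pig.tar.gz",
  hive:   "http://example.com/distros/hive.tar.gz"
}.freeze

# Build the argument vector for config-distro.rb, using only the options
# shown in this procedure (--name, --hadoop, --pig, --hive).
def config_distro_argv(distro)
  ["./config-distro.rb",
   "--name",   distro.fetch(:name),
   "--hadoop", distro.fetch(:hadoop),
   "--pig",    distro.fetch(:pig),
   "--hive",   distro.fetch(:hive)]
end

argv = config_distro_argv(DISTRO)

# Print the command in shell-safe form so that you can review it before
# running it from /opt/serengeti/sbin.
puts Shellwords.join(argv)
```

To run the script from this sketch instead of only printing the command, you could call system(*argv), which passes each argument to the script directly rather than through a shell.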

3. When the download completes, examine the newly created /opt/serengeti/www/distros directory, which contains the following directories and files.

name: Directory that is named after the distribution, for example, apache.

manifest: Manifest file generated by config-distro.rb that is used to download the Hadoop distribution.

manifest.example: Example manifest file. This file is available before you perform the download. The manifest file is a JSON file with three sections: name, version, and packages. The Serengeti User's Guide includes information about the manifest.
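This guide describes the manifest only as a JSON file with name, version, and packages sections. The following Ruby sketch parses a manifest of that general shape and prints one line per package. The example content is hypothetical: the assumption that the file holds an array of distributions, the roles and tarball keys inside each package, and the version and paths are all illustrative. Compare the structure against the manifest.example file on your Management Server before relying on it.

```ruby
require "json"

# Hypothetical manifest content. Only the name, version, and packages
# sections come from this guide; the array wrapper and the structure
# inside each package ("roles" and "tarball") are assumptions.
MANIFEST_JSON = <<~MANIFEST
  [
    {
      "name": "apache",
      "version": "1.0.1",
      "packages": [
        { "roles": ["hadoop_namenode", "hadoop_datanode"],
          "tarball": "apache/1.0.1/hadoop.tar.gz" },
        { "roles": ["pig"],  "tarball": "apache/1.0.1/pig.tar.gz" },
        { "roles": ["hive"], "tarball": "apache/1.0.1/hive.tar.gz" }
      ]
    }
  ]
MANIFEST

# Parse the manifest and return one summary line per package:
# "<name> <version>: <roles> -> <tarball>".
def summarize_manifest(json_text)
  JSON.parse(json_text).flat_map do |distro|
    distro.fetch("packages").map do |pkg|
      "#{distro.fetch('name')} #{distro.fetch('version')}: " \
        "#{pkg.fetch('roles').join(', ')} -> #{pkg.fetch('tarball')}"
    end
  end
end

puts summarize_manifest(MANIFEST_JSON)
```

Using fetch rather than [] makes the sketch fail loudly if a section it expects is missing, which is useful when you adapt it to a real manifest whose layout differs from this assumed one.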

4. Return to the Data Director for Hadoop Web Console and click Distros to verify that the distribution is configured.

The distribution and the corresponding roles are displayed.

The distribution is added to the Management Server but is not installed in the template virtual machine. During Hadoop cluster creation, the agent that is preinstalled on each virtual machine copies the distribution components that you specify from the Management Server to the nodes.
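The selection the agent performs can be pictured as filtering the distribution's packages by the roles assigned to a node. The following Ruby sketch is a hypothetical model of that step, not the agent's actual implementation; the package structure (roles and tarball keys), the role names, and the paths are all assumptions made for illustration.

```ruby
require "set"

# Hypothetical package list for one distribution, modeled on the
# manifest's packages section. Keys, role names, and paths are assumptions.
PACKAGES = [
  { "roles" => %w[hadoop_namenode hadoop_datanode hadoop_client],
    "tarball" => "apache/1.0.1/hadoop.tar.gz" },
  { "roles" => %w[pig],  "tarball" => "apache/1.0.1/pig.tar.gz" },
  { "roles" => %w[hive], "tarball" => "apache/1.0.1/hive.tar.gz" }
].freeze

# Return the tarballs a node needs: every package that provides at least
# one of the roles assigned to that node, without duplicates.
def tarballs_for(node_roles, packages)
  wanted = node_roles.to_set
  packages
    .select { |pkg| pkg["roles"].any? { |role| wanted.include?(role) } }
    .map { |pkg| pkg["tarball"] }
    .uniq
end

# A hypothetical client node that should also run Pig.
p tarballs_for(%w[hadoop_client pig], PACKAGES)
# prints ["apache/1.0.1/hadoop.tar.gz", "apache/1.0.1/pig.tar.gz"]
```

In this model, a node that is assigned no Hive role never receives the Hive package, which mirrors the idea that only the components you specify are copied from the Management Server.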

You can now create and deploy a Hadoop cluster using the Hadoop distribution. See Creating Hadoop Clusters.

If you did not choose to automatically assign resources to Serengeti when you deployed the Data Director for Hadoop vApp, you must add a resource pool, datastore, and network for the Hadoop cluster before you create it. See Managing Resources.