The VMware vSphere storage architecture consists of layers of abstraction that hide and manage the complexity and differences among physical storage subsystems.

The Storage Architecture figure illustrates these layers.

Storage Architecture
The storage architecture displays how virtual storage can be allocated without exposure to physical storage technologies.

To the applications and guest operating systems inside each virtual machine, the storage subsystem appears as a virtual SCSI controller connected to one or more virtual SCSI disks, as shown in Storage Architecture. These virtual controllers are the only types of SCSI controllers that a virtual machine can see and access. They include BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual.
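As a small illustration, the four controller types can be modeled as a closed set, so that any other controller type is rejected. This is a Python sketch. The `virtualDev` strings are the identifiers commonly seen for these controllers in a virtual machine's `.vmx` file. Both that mapping and the `parse_controller` helper are assumptions made for illustration, not a VMware API.

```python
from enum import Enum


class VirtualScsiController(Enum):
    """The virtual SCSI controller types a guest can see.

    Values are the .vmx ``virtualDev`` identifiers commonly used for each
    type (an assumption for illustration, not an API reference).
    """
    BUSLOGIC_PARALLEL = "buslogic"
    LSI_LOGIC_PARALLEL = "lsilogic"
    LSI_LOGIC_SAS = "lsisas1068"
    VMWARE_PARAVIRTUAL = "pvscsi"


def parse_controller(virtual_dev: str) -> VirtualScsiController:
    """Map a virtualDev string to a controller type; reject anything else."""
    try:
        return VirtualScsiController(virtual_dev.strip().lower())
    except ValueError:
        allowed = ", ".join(c.value for c in VirtualScsiController)
        raise ValueError(
            f"unsupported virtual SCSI controller {virtual_dev!r}; "
            f"expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(parse_controller("pvscsi").name)  # VMWARE_PARAVIRTUAL
```

Modeling the controllers as an enum makes the "only these types" rule explicit: any value outside the set fails loudly instead of being passed through.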

The virtual SCSI disks are provisioned from datastore elements in the datacenter. A datastore is like a storage appliance that delivers storage space for virtual machines across multiple physical hosts.

The datastore abstraction is a model that assigns storage space to virtual machines while insulating the guest from the complexity of the underlying physical storage technology. The guest virtual machine is not exposed to Fibre Channel SAN, iSCSI SAN, direct attached storage, or NAS.

Each virtual machine is stored as a set of files in its own directory in the datastore. The disk storage associated with each virtual machine is a set of files within that directory, so you can operate on guest disk storage as you would on ordinary files: it can be copied, moved, or backed up. New virtual disks can be added to a virtual machine without powering it down. In that case, either a new virtual disk file (.vmdk) is created in VMFS to provide storage for the added disk, or an existing virtual disk file is associated with the virtual machine.
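The file-based layout can be sketched with a toy in-memory model. `Datastore`, `VirtualMachine`, `create_vm`, and `hot_add_disk` are hypothetical names. The `<vm>/<vm>_N.vmdk` naming scheme is an assumption chosen for illustration. The sketch only shows that hot-adding a disk either creates a new `.vmdk` file in the VM's directory or attaches one that already exists, while the VM stays powered on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Hypothetical model: a datastore as a flat set of file paths."""
    name: str
    files: set[str] = field(default_factory=set)


@dataclass
class VirtualMachine:
    """Hypothetical model of a VM whose disks are files in its own directory."""
    name: str
    datastore: Datastore
    powered_on: bool = False
    disks: list[str] = field(default_factory=list)

    def hot_add_disk(self, existing_vmdk: str | None = None) -> str:
        """Attach a virtual disk without changing the power state.

        With no argument, a new .vmdk is created in the VM's directory;
        otherwise an existing .vmdk on the datastore is associated with the VM.
        """
        if existing_vmdk is not None:
            if existing_vmdk not in self.datastore.files:
                raise FileNotFoundError(existing_vmdk)
            path = existing_vmdk
        else:
            # Assumed naming scheme: <vm>/<vm>_N.vmdk for additional disks.
            path = f"{self.name}/{self.name}_{len(self.disks)}.vmdk"
            self.datastore.files.add(path)
        self.disks.append(path)
        return path


def create_vm(ds: Datastore, name: str) -> VirtualMachine:
    """Create a VM directory holding a config file and a first disk."""
    vm = VirtualMachine(name, ds)
    ds.files.update({f"{name}/{name}.vmx", f"{name}/{name}.vmdk"})
    vm.disks.append(f"{name}/{name}.vmdk")
    return vm


if __name__ == "__main__":
    ds = Datastore("datastore1")
    vm = create_vm(ds, "web01")
    vm.powered_on = True
    vm.hot_add_disk()  # creates web01/web01_1.vmdk while the VM runs
    print(sorted(p for p in ds.files if p.startswith("web01/")))
    # ['web01/web01.vmdk', 'web01/web01.vmx', 'web01/web01_1.vmdk']
```

Because every disk is just a path on the datastore, copying, moving, or backing up a VM's storage reduces to ordinary file operations on its directory.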

Each datastore is a physical VMFS volume on a storage device; NAS datastores are NFS volumes with VMFS characteristics. Datastores can span multiple physical storage subsystems. As shown in Storage Architecture, a single VMFS volume can contain one or more LUNs from a local SCSI disk array on a physical host, a Fibre Channel SAN disk farm, or an iSCSI SAN disk farm. New LUNs added to any of the physical storage subsystems are detected and made available to all existing or new datastores. Storage capacity on a previously created datastore can be extended without powering down physical hosts or storage subsystems. If any of the LUNs within a VMFS volume fails or becomes unavailable, only virtual machines that touch that LUN are affected; virtual machines whose virtual disks reside on other LUNs continue to function normally. The exception is the LUN that holds the first extent of the spanned volume, whose failure is not confined to the virtual machines that use it.
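The failure-isolation rule for a spanned volume can be expressed as a short function. This is a hypothetical model, not VMFS internals. `SpannedVolume`, its fields, and the LUN names are invented. As a simplification, the model assumes that losing the first (head) extent makes every virtual machine on the volume unavailable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpannedVolume:
    """Hypothetical model of a VMFS volume spanning several LUN extents.

    extents[0] is the first (head) extent. Simplifying assumption: if the
    head extent fails, every VM on the volume is affected.
    """
    extents: list[str]
    vm_disks: dict[str, set[str]] = field(default_factory=dict)  # VM -> LUNs its disks use

    def extend(self, lun: str) -> None:
        """Add a LUN as a new extent; nothing is powered down in this model."""
        if lun in self.extents:
            raise ValueError(f"{lun} is already an extent of this volume")
        self.extents.append(lun)

    def affected_vms(self, failed_lun: str) -> set[str]:
        """Return the VMs impacted when `failed_lun` fails or becomes unavailable."""
        if failed_lun not in self.extents:
            return set()
        if failed_lun == self.extents[0]:
            return set(self.vm_disks)  # head extent: the exception in the text
        return {vm for vm, luns in self.vm_disks.items() if failed_lun in luns}


if __name__ == "__main__":
    vol = SpannedVolume(["lun0", "lun1"], {"vmA": {"lun0"}, "vmB": {"lun1"}})
    vol.extend("lun2")
    vol.vm_disks["vmC"] = {"lun1", "lun2"}
    print(sorted(vol.affected_vms("lun2")))  # ['vmC']
    print(sorted(vol.affected_vms("lun0")))  # ['vmA', 'vmB', 'vmC']
```

The `extend` method mirrors growing a datastore in place: existing VMs keep their disk placement, and the new LUN simply becomes another extent.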

VMFS is a clustered file system that leverages shared storage to allow multiple physical hosts to read and write to the same storage simultaneously. VMFS provides on-disk locking to ensure that the same virtual machine is not powered on by multiple servers at the same time. If a physical host fails, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical hosts.
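One way to picture on-disk locking is as a lock record on shared storage that a host keeps alive with periodic heartbeats. When the owning host fails and stops heartbeating, the lock goes stale and another host may take it over. This sketch is a simplification built on assumptions and is not the actual VMFS locking protocol. `SharedVolume`, `OnDiskLock`, and the 10-second `LEASE_SECONDS` value are hypothetical. Time is passed in explicitly to keep the example deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEASE_SECONDS = 10.0  # hypothetical lease length, not a VMFS constant


@dataclass
class OnDiskLock:
    owner: str | None = None
    last_heartbeat: float = 0.0


@dataclass
class SharedVolume:
    """Lock records that every host can read and write (the shared storage)."""
    locks: dict[str, OnDiskLock] = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # A live host periodically refreshes every lock it owns.
        for lock in self.locks.values():
            if lock.owner == host:
                lock.last_heartbeat = now

    def power_on(self, vm: str, host: str, now: float) -> bool:
        """Take the VM's lock if it is free, already ours, or stale."""
        lock = self.locks.setdefault(vm, OnDiskLock())
        stale = lock.owner is not None and now - lock.last_heartbeat > LEASE_SECONDS
        if lock.owner in (None, host) or stale:
            lock.owner, lock.last_heartbeat = host, now
            return True
        return False  # another live host already runs this VM

    def power_off(self, vm: str, host: str) -> None:
        lock = self.locks.get(vm)
        if lock is not None and lock.owner == host:
            lock.owner = None


if __name__ == "__main__":
    vol = SharedVolume()
    print(vol.power_on("vm1", "esx-a", now=0.0))   # True
    print(vol.power_on("vm1", "esx-b", now=5.0))   # False: esx-a holds a fresh lock
    vol.heartbeat("esx-a", now=8.0)
    # esx-a fails and stops heartbeating; after the lease expires, esx-b restarts vm1.
    print(vol.power_on("vm1", "esx-b", now=30.0))  # True
```

A lease is used here because a failed host cannot release anything itself; expiry is what makes its lock reclaimable by a surviving host.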

VMFS also features failure consistency and recovery mechanisms, such as distributed journaling, a failure-consistent virtual machine I/O path, and virtual machine state snapshots. These mechanisms help identify the cause of, and recover quickly from, virtual machine, physical host, and storage subsystem failures.

VMFS also supports raw device mapping (RDM). RDM provides a mechanism for a virtual machine to have direct access to a LUN on the physical storage subsystem (Fibre Channel or iSCSI only). RDM is useful for supporting two typical types of applications:

SAN snapshot or other layered applications that run in the virtual machines. RDM better enables scalable backup offloading systems using features inherent to the SAN.

Microsoft Clustering Services (MSCS) spanning physical hosts and using virtual-to-virtual clusters as well as physical-to-virtual clusters. Cluster data and quorum disks must be configured as RDMs rather than files on a shared VMFS.

Raw Device Mapping
This image illustrates how raw device mapping provides the virtual machine direct access to the LUN via the datastore.

An RDM is a symbolic link from a VMFS volume to a raw LUN. The mapping makes LUNs appear as files in a VMFS volume. The mapping file, not the raw LUN, is referenced in the virtual machine configuration.

When a LUN is opened for access, the mapping file is read to obtain the reference to the raw LUN. Thereafter, reads and writes go directly to the raw LUN rather than going through the mapping file.
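The open-then-bypass behavior of an RDM can be modeled in a few lines. In this hypothetical sketch, `MappingFile` stands in for the mapping file that the virtual machine configuration references. `RawLun` is a LUN simulated as an in-memory block array. The `reads` counter shows that the mapping is consulted once at open time, while subsequent reads and writes go straight to the LUN. None of these names correspond to a VMware API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RawLun:
    """A raw LUN simulated as an in-memory array of fixed-size blocks."""
    lun_id: str
    block_size: int = 512
    num_blocks: int = 8
    data: bytearray = field(init=False, repr=False, default_factory=bytearray)

    def __post_init__(self) -> None:
        self.data = bytearray(self.block_size * self.num_blocks)


@dataclass
class MappingFile:
    """Stand-in for the RDM mapping file stored on the VMFS volume."""
    path: str
    target_lun_id: str
    reads: int = 0  # how many times the mapping has been consulted

    def resolve(self) -> str:
        self.reads += 1
        return self.target_lun_id


class RdmHandle:
    """An opened RDM: resolve the mapping once, then do I/O on the LUN directly."""

    def __init__(self, mapping: MappingFile, luns: dict[str, RawLun]) -> None:
        self.lun = luns[mapping.resolve()]  # the only read of the mapping file

    def _span(self, lba: int) -> slice:
        if not 0 <= lba < self.lun.num_blocks:
            raise IndexError(f"LBA {lba} out of range")
        bs = self.lun.block_size
        return slice(lba * bs, (lba + 1) * bs)

    def write_block(self, lba: int, payload: bytes) -> None:
        if len(payload) != self.lun.block_size:
            raise ValueError("payload must be exactly one block")
        self.lun.data[self._span(lba)] = payload

    def read_block(self, lba: int) -> bytes:
        return bytes(self.lun.data[self._span(lba)])


if __name__ == "__main__":
    mapping = MappingFile("vm1/vm1_rdm.vmdk", "lun-7")
    luns = {"lun-7": RawLun("lun-7")}
    handle = RdmHandle(mapping, luns)
    handle.write_block(2, b"\xab" * 512)
    handle.read_block(2)
    print(mapping.reads)  # 1: I/O never went back through the mapping file
```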