This article was first published on SearchServerVirtualization.TechTarget.com.

Imagine living in a crowded apartment with a bunch of people who think they own the place. Operating systems and applications can be quite inconsiderate at times: when they're running on physical machines, these pieces of software are designed to monopolize hardware resources. Add virtualization to the picture, and you get a lot of selfish tenants competing for the same resources. In the middle sits the virtualization layer, acting as a sort of landlord or superintendent, trying to keep everyone happy (while still generating a profit). Such is the case with disk I/O on virtualization host servers. In this tip, I'll discuss some options for addressing this common bottleneck.

Understanding Virtualization I/O Requirements

Perhaps the most important thing to keep in mind is that not all disk I/O is the same. When designing storage for virtualization host servers, you need to get an idea of the actual disk access characteristics you will need to support. Considerations include:

  • Ratio of read vs. write operations
  • Frequency of sequential vs. random reads and writes
  • Average I/O transaction size
  • Disk utilization over time
  • Latency constraints
  • Storage space requirements (including space for backups and maintenance operations)

Collecting this information on a physical server is fairly simple. On the Windows platform, for example, you can collect data using Performance Monitor and save it to a binary file or database for later analysis. When working with VMs, you'll need to measure each workload's I/O requirements and combine them to define disk performance goals for the host, as in the sketch below. The focus of this tip is on choosing methods for storing virtual hard disk files based on cost, administration, and scalability requirements.
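
To illustrate the "measure and combine" step, here is a minimal Python sketch that rolls per-VM measurements up into host-level targets. The VM names, the numbers, the `VmIoProfile` fields, and the sum-of-peaks sizing rule are all illustrative assumptions, not a prescribed methodology:

```python
from dataclasses import dataclass

@dataclass
class VmIoProfile:
    """Measured I/O characteristics for one VM (all figures hypothetical)."""
    name: str
    reads_per_sec: float    # average read IOPS
    writes_per_sec: float   # average write IOPS
    avg_io_size_kb: float   # average I/O transaction size
    peak_factor: float      # peak utilization divided by average

def host_requirements(profiles):
    """Roll up per-VM measurements into aggregate host-level targets."""
    total_iops = sum(p.reads_per_sec + p.writes_per_sec for p in profiles)
    # Summing every VM's peak is deliberately conservative; in practice,
    # peaks rarely coincide, so this gives a worst-case sizing target.
    peak_iops = sum((p.reads_per_sec + p.writes_per_sec) * p.peak_factor
                    for p in profiles)
    throughput_mb = sum((p.reads_per_sec + p.writes_per_sec) * p.avg_io_size_kb
                        for p in profiles) / 1024
    read_ratio = (sum(p.reads_per_sec for p in profiles) / total_iops
                  if total_iops else 0.0)
    return {"average_iops": total_iops,
            "peak_iops": peak_iops,
            "throughput_mb_per_sec": throughput_mb,
            "read_ratio": read_ratio}

vms = [VmIoProfile("web01", 220, 40, 16, 2.5),
       VmIoProfile("db01", 450, 300, 8, 1.8)]
for key, value in host_requirements(vms).items():
    print(f"{key}: {value:,.2f}")
```

However you do the arithmetic, the point is the same: the host's storage must be sized for the combined profile of all its guests, not for any single workload.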

Local / Direct-Attached Storage

The default storage option in most situations is local storage. The most common connection methods include PATA, SATA, SCSI, and SAS, each with its own performance and cost considerations. RAID configurations can provide fault tolerance and can also improve performance.

· Pros:

  • Generally cheaper than other storage options
  • Low latency, high bandwidth connections that are reserved for a single physical server

· Cons:

  • Potential waste of storage space (since disk space is not shared across computers)
  • Limited total storage space and scalability due to physical disk capacity constraints (especially when implementing RAID)
  • Difficult to manage, as storage is decentralized

Storage Area Networks (SANs) / Fibre Channel

SANs are based on Fibre Channel connections rather than copper-based Ethernet. SAN protocols are designed to provide high throughput and low latency, but they require an optical network infrastructure. Generally, storage arrays provide raw, block-level connections to carved-out portions of disk space.

· Pros:

  • Can provide high performance connections
  • Improved compatibility – storage appears as local disks to the host server
  • Centralizes storage management

· Cons:

  • Expensive to implement – requires Fibre Channel-capable host bus adapters, switches, and cabling
  • Expensive to administer – requires expertise to manage a second “network” environment

Network-Based Storage

Network-based storage devices are designed to provide disk resources over a standard (Ethernet) network connection. They most often support protocols such as Server Message Block (SMB) and Network File System (NFS), both of which are designed for file-level disk access. The iSCSI protocol adds the ability to perform raw, block-level disk access over a standard network; iSCSI-attached volumes appear to the host server as if they were local storage.

· Pros:

  • Lower implementation and management costs (vs. SANs), thanks to standard copper-based (Ethernet) connections
  • Storage can be accessed at the host or guest level, based on specific needs
  • Higher scalability (arrays can contain hundreds of disks) and throughput (dedicated, redundant I/O controllers)
  • Simplified administration (vs. direct-attached storage), since disks are centralized

· Cons:

  • Storage traffic competes with other network traffic unless it is isolated on a dedicated network
  • Applications and virtualization platforms must support either file-based access or iSCSI

Storage Caveats: Compatibility vs. Capacity vs. Cost

In many real-world implementations of virtualization, storage performance is an important bottleneck. Organizations can use well-defined methods to increase CPU and memory performance, but what about the hard disks? Direct-attached, network-based, and SAN-based storage all offer viable options. Once you've outgrown local storage (from a capacity, performance, or administration standpoint), consider implementing iSCSI or file-based network storage servers. The primary requirement, of course, is that your virtualization layer must support the hardware and software you choose. SANs are a great option for organizations that have already made the investment, but some studies show that iSCSI devices can provide similar levels of performance at a fraction of the cost.

The most important thing to remember is to test your solution thoroughly before deploying it into production. Operating systems can be very sensitive to disk-related latency, and disk contention can cause unforeseen traffic patterns. Once the systems are deployed, you should be able to monitor and manage throughput, latency, and other storage-related parameters; a rough monitoring sketch follows.
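
As one illustration, here is a minimal Python sketch of ongoing throughput monitoring. It assumes the third-party psutil library is installed and samples host-wide disk counters; a production setup would more likely rely on Performance Monitor, your virtualization platform's tools, or your storage vendor's management software:

```python
import time
import psutil  # third-party library: pip install psutil

def sample_disk_io(interval_s=5.0):
    """Print host-wide disk throughput and IOPS over repeated intervals."""
    prev = psutil.disk_io_counters()
    while True:
        time.sleep(interval_s)
        cur = psutil.disk_io_counters()
        read_mb = (cur.read_bytes - prev.read_bytes) / (1024 * 1024)
        write_mb = (cur.write_bytes - prev.write_bytes) / (1024 * 1024)
        iops = ((cur.read_count - prev.read_count)
                + (cur.write_count - prev.write_count)) / interval_s
        print(f"read {read_mb / interval_s:6.1f} MB/s | "
              f"write {write_mb / interval_s:6.1f} MB/s | "
              f"{iops:7.1f} IOPS")
        prev = cur

sample_disk_io()
```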

Overall, providing storage for virtual environments can be a tricky technical task. The right solution, however, can result in happy landlords and tenants, whereas the wrong one results in a seriously overcrowded apartment.