This article was first published on

In the early days of virtualization, it was common for users to run a few VMs in test and development environments. These VMs were important, but only to a small set of users. Now, it’s common for organizations to run mission-critical production workloads on their virtual platforms. Downtime and data loss can affect dozens or hundreds of users, and the rule is to ensure that virtual machines are at least as well protected as their physical counterparts. So how can this be done? In this article, I’ll present some information related to developing a backup strategy for virtual machines. In a related article, “Implementing Disaster Recovery for Virtual Machines,” I’ll look at some additional options for performing host-based backups.

Determining Recovery Requirements

If there’s a golden rule for implementing backups, it’s to start by enumerating your recovery requirements. After all, that’s the goal of performing backups: to allow for recovery. Considerations should include:

  • Data loss: What is an acceptable amount of data loss, in a worst-case scenario? For some applications and services, it might be acceptable to lose several hours’ worth of data if it can lower backup costs. In other cases, near-real-time backups might be required.
  • Downtime windows: What is an acceptable amount of downtime? Some workloads will require rapid recovery in the case of the failure of a host. In other cases, a longer recovery time might be acceptable.
  • Virtual machine configuration details: What are the CPU, memory, disk, and network requirements for the VM? These details can help prepare you for moving a workload to another physical host.
  • Identifying important data: Which information really needs to be backed up? In some cases, full VHD backups might make sense. More often, critical data such as web server content, data files, and related information is sufficient.
  • Budget and resources: Organizations have limits based on the amount of available storage space, bandwidth, human resources, and technical expertise. These details must be factored into any technical solution.
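
One way to keep these requirements actionable is to record them per VM and check a proposed backup schedule against them. The following Python sketch is only an illustration under stated assumptions: the RecoveryRequirements class, its field names, and the sample values for "web01" are hypothetical and do not come from any particular backup product.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class RecoveryRequirements:
    """Recovery requirements for one VM (all field names are illustrative)."""
    vm_name: str
    max_data_loss: timedelta   # acceptable data loss in a worst-case scenario
    max_downtime: timedelta    # acceptable downtime
    cpu_count: int             # configuration details for moving the workload
    memory_mb: int
    disk_gb: int
    networks: List[str] = field(default_factory=list)
    important_paths: List[str] = field(default_factory=list)


def backup_interval_ok(req: RecoveryRequirements, interval: timedelta) -> bool:
    """In the worst case everything written since the last backup is lost,
    so the backup interval must not exceed the acceptable data loss."""
    return interval <= req.max_data_loss


# Hypothetical example values for a single web server VM.
web01 = RecoveryRequirements(
    vm_name="web01",
    max_data_loss=timedelta(hours=4),
    max_downtime=timedelta(hours=1),
    cpu_count=2,
    memory_mb=4096,
    disk_gb=80,
    networks=["production"],
    important_paths=["/var/www", "/srv/data"],
)

print(backup_interval_ok(web01, timedelta(hours=24)))  # nightly backups: False
print(backup_interval_ok(web01, timedelta(hours=1)))   # hourly backups: True
```

With a four-hour tolerance for data loss, a nightly backup falls short while an hourly one is sufficient. The same record also keeps the CPU, memory, disk, and network details at hand if the workload has to be moved to another host.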

Once you have the business-related requirements in mind, it’s time to look at technical details.

Backups for Guest OS’s

One common approach to performing backups for VMs is to treat virtual machines as if they were physical ones. Most organizations have invested in some form of centralized backup solution for their physical servers. Since VMs will often be running a compatible guest OS, it’s usually easy to install and configure a backup agent within them. Configuration details will include the frequency of backups, which data to protect, and associated monitoring jobs.

The technical details can vary significantly, based on the needs of the environment. Some examples might include:

  • Small Environments: When managing a few virtual machines (such as in development and test environments), simple scripting or automation might be enough to meet backup requirements. For example, test results and data files might be stored on a shared network drive so they can be reviewed even when the VMs are unavailable.
  • Medium-Sized Environments: The job of supporting dozens or hundreds of virtual machines will require the use of a centralized, automated backup solution. Data is usually sent over a dedicated backup network and stored in one or more network locations.
  • Large Environments: When scaling to support many hundreds of virtual machines, managing direct-attached storage becomes nearly impossible. Organizations often invest in Storage Area Network (SAN) technology to support the increased bandwidth and disk space requirements. It may become difficult to identify important data when working with a vast array of different types of VMs. Organizations that can afford the storage resources may consider backing up the entire contents of their virtual hard disks to ensure that they can quickly recover them.

Again, regardless of the approach, the goal should be to meet business-level recovery requirements. Technical constraints such as limited storage space and limited bandwidth will play a role in the exact configuration details.
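
For the small-environment case, the "simple scripting" mentioned above might look something like the following Python sketch, which would run inside the guest OS. It is a minimal example rather than a complete backup tool: the back_up_folder function and the folder names are hypothetical, and temporary folders stand in for the real data folder and the shared network drive so that the sketch can run anywhere.

```python
import shutil
import tempfile
import time
from pathlib import Path


def back_up_folder(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # raises an error if target already exists
    return target


# Demonstration with temporary folders. In practice, `share` would be the
# mount point or path of a shared network drive (an assumption of this sketch).
with tempfile.TemporaryDirectory() as work:
    results = Path(work) / "test-results"
    results.mkdir()
    (results / "run1.log").write_text("all tests passed\n")

    share = Path(work) / "share"
    share.mkdir()

    copy = back_up_folder(results, share)
    print(sorted(p.name for p in copy.iterdir()))
```

A script like this could be scheduled with whatever job scheduler the guest OS provides. It makes full copies each time, so it suits small sets of data files rather than entire virtual hard disks.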

Benefits of iSCSI

An important virtualization management concern is keeping track of virtual hard disks. The default option in many environments is to rely upon local storage. The problem is that it can quickly become difficult to enumerate and back up all of these different servers. For many environments, SAN-based resources are too costly for supporting all virtual machines. The iSCSI standard provides an implementation of SCSI that runs over standard Ethernet (copper-based) networks. To a host computer or a guest OS, an iSCSI-attached volume appears as a local physical volume. Block-level operations such as formatting or even defragmenting the volume are possible.

From a backup standpoint, systems administrators can configure their host and/or guest OS’s to use network-attached storage for storing virtual hard disk data. For example, on the host system, virtual hard disks may be created on iSCSI volumes. Since the actual data resides on a network-based storage server, this approach lends itself to performing centralized backups. One important caveat is that organizations should thoroughly test the performance and reliability of their iSCSI infrastructures before relying on them for production workloads. Problems such as high latency can cause reliability issues.
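
As a rough first check of latency, a short script can time small synchronous writes to a folder on the volume in question. The Python sketch below is only a starting point and not a substitute for a proper storage benchmark: the measure_write_latency function, the sample count, and the block size are all illustrative choices, and the system's temporary folder stands in for the mount point of an iSCSI volume.

```python
import os
import statistics
import tempfile
import time


def measure_write_latency(directory: str, samples: int = 50,
                          block_size: int = 4096) -> dict:
    """Time small synchronous writes to a scratch file in `directory`.

    Returns the median and maximum latency in milliseconds.
    """
    block = os.urandom(block_size)
    timings_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to flush the data to the storage device
            timings_ms.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return {
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


# Replace tempfile.gettempdir() with a folder on the volume to be tested.
print(measure_write_latency(tempfile.gettempdir(), samples=20))
```

Comparing the results for a local disk and for an iSCSI-attached volume, at different times of day, gives an early indication of whether network latency is likely to be a problem. Large gaps between the median and the maximum are worth investigating before any production workload is moved.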

Other Options

In this article, I presented details related to performing virtual machine backups from within guest OS’s. Of course, this is only one option. Another useful approach is to perform backups at the level of the host OS. I’ll cover that topic in my next article, “Implementing Disaster Recovery for Virtual Machines.”