ZFS explained and demoed (briefly)

2011-10-26 4076 words, 20 minutes
clone
compression
deduplication
replication
snapshot
solaris
zfs

ZFS may be Yet Another File System. To me, ZFS is Zee File System, brought to us by Sun Solaris. It is remarkable because it comes with numerous interesting features, to name a few: redundancy, checksums, compression, deduplication and snapshots.

This article is a kind of cheat sheet on what ZFS is, how it is organized and how to build a consistent storage area using this particular file system. It is mostly an attempt to clarify and write down what I read about ZFS and how it has to be used to achieve various storage goals.

About storage and file system

The most common way of storing data is to get some storage hardware, present it to some operating system and organize the data from there.

Most people have a single internal disk drive with Windows installed on their PC, running over NTFS. Sometimes, they plug removable USB storage in and out, still managing their data from NTFS.

On the industry side, you often see several disks plugged into a customized PC, named a server, through a dedicated piece of hardware that enables redundancy: a RAID controller. In some rare cases, the RAID is done on the software side. Anyway, at some point, the storage is seen by the operating system as a global protected area in which the data is formatted in a dedicated manner: the file system. On a Windows Server, you’ll get an NTFS partition. On Linux, you’ll get an EXT? partition. On OpenBSD, you’ll get a UFS slice. Etc.

There are cases when the storage zone is not attached inside the server. Sometimes, a huge set of disks is presented to a large number of servers in an independent manner, either unformatted (Fibre Channel, iSCSI…) through a storage network (SAN) or externally directly attached (DAS). Sometimes, it is presented through the network in an already organized manner (NAS) via a network filesystem (NFS, CIFS…).

ZFS is a kind of mix. It provides redundancy, which RAID provides, and ensures that a disk failure doesn’t corrupt the data. It provides compression, which some third-party software provides, in a transparent on-the-fly manner so that applications using the storage don’t have to deal with it. It provides deduplication so that redundant stored data is organized to use less disk space. It provides snapshots and replication, which a SAN provides, so that data states can be kept and stored on some external data space. There is no NAS feature in ZFS itself. But since your operating system can publish the ZFS volumes over the network, you get that feature too.

The ZFS way of managing data

First of all, you’ll need an operating system that knows about ZFS. You can look at Solaris, FreeBSD or (probably) any Linux distribution. In my example, I’m going to use Solaris 11 64-bit in trial mode. The OS is booted in VMware Fusion using 3 extra virtual disks of 20GB each.

Should you wonder which ZFS features you have, you can go with:

# zpool upgrade -v
This system is currently running ZFS pool version 31.
(...)
For more information on a particular version, including supported releases,
see the ZFS Administration Guide.

Regarding the file system, ZFS is like onions… Onions have layers.

ZFS Storage Pool

The pool is the first layer of ZFS. It groups the most basic components of storage: disks. Whether they are real disks, slices or files, a pool is defined on top of a group of such objects to create a redundant storage space.

If you know about RAID systems, think about pools as RAID groups.

Pool creation is just a matter of selecting a bunch of disks and grouping them in your preferred manner. First of all, get the list of the available disks:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c8t0d0 <VMware,-VMware Virtual -1.0  cyl 2085 alt 2 hd 255 sec 63>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c8t1d0 <VMware,-VMware Virtual S-1.0-20.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c8t2d0 <VMware,-VMware Virtual S-1.0-20.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c8t3d0 <VMware,-VMware Virtual S-1.0-20.00GB>
          /pci@0,0/pci15ad,1976@10/sd@3,0
       4. c8t4d0 <VMware,-VMware Virtual -1.0  cyl 1022 alt 2 hd 64 sec 32>
          /pci@0,0/pci15ad,1976@10/sd@4,0
       5. c8t5d0 <VMware,-VMware Virtual -1.0  cyl 1022 alt 2 hd 64 sec 32>
          /pci@0,0/pci15ad,1976@10/sd@5,0
Specify disk (enter its number): ^D

You can concatenate all the disks into a single pool (RAID-0 like):

# zpool create pool0 c8t1d0 c8t2d0 c8t3d0
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
pool0  59,6G    91K  59,6G     0%  1.00x  ONLINE  -
(...)
# zpool status pool0
  pool: pool0
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          c8t1d0    ONLINE       0     0     0
          c8t2d0    ONLINE       0     0     0
          c8t3d0    ONLINE       0     0     0

errors: No known data errors
# zpool destroy pool0

You can secure the pool using mirrored disks at the cost of “losing” half of the disk set’s space (RAID-1 or RAID-10 like):

# zpool create pool1 mirror c8t1d0 c8t2d0
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
pool1  19,9G    97K  19,9G     0%  1.00x  ONLINE  -
(...)
# zpool status pool1
  pool: pool1
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0

errors: No known data errors
# zpool destroy pool1

You can finally get a big secured storage array using RAID-Z. Basically, RAID-Z is an improved RAID-5 system that solves the “write hole” issue. You can also use RAID-Z2 or RAID-Z3, which add a second and a third level of parity (tolerating two or three disk failures) at the cost of losing more usable space:

# zpool create puddle raidz c8t1d0 c8t2d0 c8t3d0
# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
puddle  59,5G   174K  59,5G     0%  1.00x  ONLINE  -
(...)
# zpool status puddle
  pool: puddle
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        puddle      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0

errors: No known data errors
# zpool destroy puddle
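
To get a feel for the space trade-off between these layouts, the usable capacity can be estimated from the number of disks and their size. The following is only back-of-the-envelope arithmetic, assuming equally sized disks and ignoring metadata overhead; `usable_gb` is a made-up helper name, not a ZFS command.

```shell
#!/bin/sh
# Rough usable capacity (in GB) for a pool of N equally sized disks.
# usable_gb is a hypothetical helper; real pools lose a little more space
# to metadata and reserved blocks.
usable_gb() {
    layout=$1; disks=$2; size_gb=$3
    case "$layout" in
        stripe) echo $(( disks * size_gb )) ;;        # no redundancy
        mirror) echo "$size_gb" ;;                    # n-way mirror: one disk's worth
        raidz1) echo $(( (disks - 1) * size_gb )) ;;  # one disk's worth of parity
        raidz2) echo $(( (disks - 2) * size_gb )) ;;  # two disks' worth of parity
        raidz3) echo $(( (disks - 3) * size_gb )) ;;  # three disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# The three 20GB disks used in this article:
for layout in stripe mirror raidz1; do
    echo "$layout: $(usable_gb "$layout" 3 20) GB"
done
```

Note that "zpool list" reports the raw size of a RAID-Z pool, parity included (59,5G above), whereas the space actually available to filesystems is closer to the 40GB estimated here.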

There are many options for pools. You can improve a pool’s reliability with spare disks, and its performance with log or cache disks. A nice thing to do is to get really fast storage, like NVRAM or SSD, for log and caching. I’m not sure how much is required to get a decent performance upgrade. But if you look at WD Enterprise-class disks, they have 32MB of cache for 2TB of data. So I guess getting an extra 16GB of RAM or a 64GB SSD coupled with 10Krpm disks, rather than “only” 15Krpm disks, would be a nice deal.

The cache is used to accelerate read operations:

# zpool add puddle cache c8t4d0
# zpool status puddle
  pool: puddle
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        puddle      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
        cache
          c8t4d0    ONLINE       0     0     0

errors: No known data errors

The log is used to acknowledge write operations quickly. This enables faster synchronous write operations:

# zpool add puddle log c8t5d0
# zpool status puddle
  pool: puddle
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        puddle      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
        logs
          c8t5d0    ONLINE       0     0     0
        cache
          c8t4d0    ONLINE       0     0     0

errors: No known data errors

Should you require a lot of space for log operations, you may use a mirror of cheap disks rather than a single expensive SSD. Or use protected stripes of SSDs for massive performance needs.

Finally, let’s have a look at all the available options of the pool:

# zpool get all puddle
NAME    PROPERTY       VALUE       SOURCE
puddle  size           59,5G       -
puddle  capacity       0%          -
puddle  altroot        -           default
puddle  health         ONLINE      -
puddle  guid           10123729866998312523  default
puddle  version        31          default
puddle  bootfs         -           default
puddle  delegation     on          default
puddle  autoreplace    off         default
puddle  cachefile      -           default
puddle  failmode       wait        default
puddle  listsnapshots  off         default
puddle  autoexpand     off         default
puddle  dedupditto     0           default
puddle  dedupratio     1.00x       -
puddle  free           59,5G       -
puddle  allocated      184K        -
puddle  readonly       off         -

Creating a pool also creates a root ZFS file system of the same name, usable as-is. So you can get other properties:

# zfs get all puddle
NAME    PROPERTY              VALUE                    SOURCE
puddle  type                  filesystem               -
puddle  creation              lun. oct. 10 11:12 2011  -
puddle  used                  123K                     -
puddle  available             39,0G                    -
puddle  referenced            34,6K                    -
puddle  compressratio         1.00x                    -
puddle  mounted               yes                      -
puddle  quota                 none                     default
puddle  reservation           none                     default
puddle  recordsize            128K                     default
puddle  mountpoint            /puddle                  default
puddle  sharenfs              off                      default
puddle  checksum              on                       default
(...)

ZFS Dataset

In the ZFS literature, you’ll often read about “datasets”, and it was quite an opaque layer to me.

I didn’t get what this would exactly refer to. In Sun’s documentation (not page 3 ;-), you can read “A generic name for the following ZFS entities: clones, file systems, snapshots, or volumes”. I ended up thinking of datasets as abstract ZFS storage objects: a file system (including the root file system of a pool), a volume, a clone or a snapshot. To me, “a dataset” would mean “a ZFS object”.

ZFS Filesystem

In ZFS, a filesystem is a subsystem that inherits properties from its pool and can override some particular ones. A ZFS pool can contain multiple filesystems. Each filesystem is independent from the others and can share or use different options. A filesystem can deal with quotas, compression, deduplication, mount points… and can be shared over the network. By default, a filesystem is automatically mounted under the root of its pool.

Creation of a filesystem is quite straightforward:

# zfs create puddle/mudd
# zfs list -r puddle
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle        170K  39,0G  36,0K  /puddle
puddle/mudd  34,6K  39,0G  34,6K  /puddle/mudd
# zfs destroy puddle/mudd

Configuring features is a matter of (un)setting parameters. Some parameters, like compression and deduplication, don’t apply to already existing data. For example, if you copy data to “puddle/mudd” and then activate compression, only new data will be compressed, not the data that was present before compression was activated.

# zfs create puddle/basic
# zfs create -o compress=gzip puddle/compress
# zfs create puddle/dedup
# zfs set dedup=on puddle/dedup
# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
puddle                    265K  39,0G  38,6K  /puddle
puddle/basic             34,6K  39,0G  34,6K  /puddle/basic
puddle/compress          34,6K  39,0G  34,6K  /puddle/compress
puddle/dedup             34,6K  39,0G  34,6K  /puddle/dedup
# zfs list -o name,used,dedup,compression,dedup
NAME                      USED          DEDUP  COMPRESS          DEDUP
puddle                    265K            off       off            off
puddle/basic             34,6K            off       off            off
puddle/compress          34,6K            off      gzip            off
puddle/dedup             34,6K             on       off             on
# df -h
Filesystem            Size  Used Avail Use% Mounted on
(...)
puddle/basic           40G   35K   40G   1% /puddle/basic
puddle/compress        40G   35K   40G   1% /puddle/compress
puddle/dedup           40G   35K   40G   1% /puddle/dedup

By default, a filesystem requires minimal storage and can grow up to the pool size. This means that filling one filesystem with data can prevent another filesystem on the same pool from being filled. Should you want to carefully manage your disk space, you may use the “quota” and “reservation” features of ZFS:

“quota” ensures that a dataset won’t grow more in size than the applied quota ;
“reservation” ensures that a dataset will get at least that particular storage size.

Note that sub-datasets inherit disk space management settings from their parents.
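
As an illustration, here is how both properties could be set on the filesystems created above. This is a dry-run sketch: the `run` helper and the sizes are assumptions of mine, and the script only prints the commands unless DRY_RUN=0 is set on a host that has ZFS.

```shell
#!/bin/sh
# Dry-run sketch: prints the zfs commands instead of executing them.
# Set DRY_RUN=0 on a ZFS host to really run them.
# The run helper and the sizes are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Cap puddle/basic at 10GB so that it cannot starve its siblings.
run zfs set quota=10G puddle/basic
# Guarantee that puddle/compress always gets at least 5GB.
run zfs set reservation=5G puddle/compress
# Check the result.
run zfs get quota,reservation puddle/basic puddle/compress
```

Setting either property back to "none" removes the limit.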

ZFS snapshot

A snapshot is a (read-only) image of a filesystem at a particular time. It happens at the filesystem layer. At any time, provided that you have enough disk space on the pool, you can keep a consistent copy of the filesystem and apply modifications to it that can be rolled back.

Create a snapshot from a dataset:

# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data  19,7M  12,8G  19,7M  /puddle/data
# zfs snapshot puddle/data@1632
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632                0      -  19,7M  -

From here, the snapshot uses no additional space since no modification has been done.

If you add data to the filesystem, you can see that both datasets are changed:

# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data  20,3M  12,8G  20,3M  /puddle/data
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632            20,6K      -  19,7M  -

The small amount of data that has changed in the snapshot dataset is (AFAIK) a list of pointers to data that is new to “puddle/data” and that would be deleted if the snapshot were to be rolled back.

If you delete your data from your file system, you’ll see that:

# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data  19,7M  12,8G  34,6K  /puddle/data
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632            19,7M      -  19,7M  -

As you can see, the filesystem dataset still uses the space but refers to nearly nothing whereas the snapshot refers to the initial amount of data of the filesystem.

If you copy some more data to the filesystem, you’ll get:

# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data   129M  12,7G   109M  /puddle/data
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632            19,7M      -  19,7M  -

You can see that I just added 109MB of data to puddle/data, which is what the dataset refers to. You can also see that it uses a bit more than this (129MB), which is the old data, referenced by the snapshot, plus the new data. Of course, if I copy data from “/puddle/data”, only the “actual” data will be copied.

Should you want to restore a single file that exists in the snapshot, you may browse to “/puddle/data/.zfs/snapshot/1632/” which is named according to the snapshot name.

If you plan to create a nightly snapshot, you can also rename the previous one so that it can be referred to by a generic name:

# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632            19,7M      -  19,7M  -
# zfs snapshot puddle/data@1656
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1632            19,7M      -  19,7M  -
puddle/data@1656                0      -   110M  -
# zfs rename puddle/data@1632 puddle/data@yesterday
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@yesterday       19,7M      -  19,7M  -
puddle/data@1656            21,3K      -   110M  -

When you have enough snapshots to go back to, you can delete old ones using:

# zfs destroy puddle/data@yesterday
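
Put together, a nightly rotation with generic names could look like the sketch below. The `rotate_snapshots` function, the `run` helper and the “today”/“yesterday” names are my own assumptions; the script only prints the commands unless DRY_RUN=0 is set on a host that has ZFS.

```shell
#!/bin/sh
# Dry-run sketch of a nightly rotation using generic snapshot names.
# Set DRY_RUN=0 on a ZFS host to really run the commands.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# rotate_snapshots is a hypothetical helper, not a ZFS command.
rotate_snapshots() {
    dataset=$1
    run zfs destroy "$dataset@yesterday"                  # drop the oldest copy
    run zfs rename "$dataset@today" "$dataset@yesterday"  # age the previous one
    run zfs snapshot "$dataset@today"                     # take tonight's snapshot
}

rotate_snapshots puddle/data
```

On the first nights, the destroy and rename steps simply fail with an error because the snapshots don’t exist yet; the final snapshot is still taken.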

Finally, to restore the whole content of the filesystem at the time of the snapshot (AKA rolling-back):

# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data   129M  12,7G   129M  /puddle/data
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1701            28,0K      -   110M  -
# zfs rollback puddle/data@1701
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@1701            1,33K      -   110M  -
# zfs list puddle/data
NAME          USED  AVAIL  REFER  MOUNTPOINT
puddle/data   110M  12,7G   110M  /puddle/data

The data has been restored and the snapshot is kept. Should you not want to use it anymore, you’d have to destroy it manually.

ZFS clone

A clone is a (read-write) image of a filesystem at a particular time. Based on a snapshot, a clone is a filesystem that can be used as a “normal” filesystem. Its initial data is shared with its parent snapshot and can evolve independently. A clone may be used either to apply modifications that differ from the snapshotted filesystem or to access the data from usual software, like backups. A clone can be promoted so that it replaces the initial filesystem from which the snapshot was taken.

# zfs snapshot puddle/basic@now
# zfs clone puddle/basic@now puddle/clone
# zfs list -r puddle
NAME              USED  AVAIL  REFER  MOUNTPOINT
puddle           26,3G  12,7G  40,0K  /puddle
puddle/basic     14,0G  12,7G  14,0G  /puddle/basic
puddle/clone     1,33K  12,7G  14,0G  /puddle/clone
(...)

From here, the contents of “/puddle/basic” and “/puddle/clone” are the same but may evolve independently. One could back up “puddle/clone” to tape or low-price storage to keep an image of “puddle/basic” at the time of the snapshot. One could also attach “puddle/clone” to some development system to run some tests on production data.

Cleaning happens as usual:

# zfs destroy puddle/clone
# zfs destroy puddle/basic@now
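
Instead of cleaning up, the clone can be promoted to take over from the original filesystem. The sketch below assumes that the snapshot and the clone created above still exist; the `promote_clone` function, the `run` helper and the “.old” suffix are illustrative assumptions, and the script only prints the commands unless DRY_RUN=0 is set on a host that has ZFS.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands; set DRY_RUN=0 on a ZFS host to run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# promote_clone is a hypothetical helper, not a ZFS command.
promote_clone() {
    origin=$1; clone=$2
    run zfs promote "$clone"                # the clone now owns the shared snapshot
    run zfs rename "$origin" "$origin.old"  # move the original out of the way
    run zfs rename "$clone" "$origin"       # the clone takes over the original name
}

promote_clone puddle/basic puddle/clone
```

After the promotion, the dependency is reversed: the original filesystem depends on the snapshot now owned by the former clone, so the “.old” filesystem can be destroyed once it is no longer needed.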

ZFS replication

Data transfer is usually done using standard UNIX tools and network protocols. But ZFS can also replicate its data as streams. The send/receive operations are based on snapshots and can be done either locally or through the network. The main differences with a clone are that:

the replicated data is not linked to the snapshot anymore ;
the replicated data can be located on some other pool.

One can send the data to some local hardware, for example a tape system:

# zfs snapshot puddle/data@now
# zfs send puddle/data@now > /dev/tape0

One can restore the data from the local hardware onto a non-existent dataset:

# zfs receive puddle/restored < /dev/tape0
# zfs list puddle/data@now puddle/restored puddle/restored@now
NAME                  USED  AVAIL  REFER  MOUNTPOINT
puddle/data@now          0      -   110M  -
puddle/restored       110M  12,5G   110M  /puddle/restored
puddle/restored@now      0      -   110M  -

Note that the restore process created a snapshot instance that you may wish to delete.
If the destination already exists, the “-F” flag can be used to force update.

One can duplicate or move a dataset to another local pool:

# zfs send puddle/data@now | zfs recv rpool/data_exported
# zfs list puddle/data puddle/data@now rpool/data_exported rpool/data_exported@now
NAME                      USED  AVAIL  REFER  MOUNTPOINT
puddle/data               110M  12,5G   110M  /puddle/data
puddle/data@now              0      -   110M  -
rpool/data_exported       110M  7,74G   110M  /rpool/data_exported
rpool/data_exported@now      0      -   110M  -

One can send the data, through the network, to another ZFS system, no matter what the storage hardware is, as long as both ZFS versions are compatible. This can be used to secure the data at another location:

# zfs send puddle/data@now | ssh remotehost zfs recv pool/data

The previous command lets you send the whole content of the snapshot. This would initiate the disaster recovery data copy. But as time passes and modifications occur on the initial storage zone, you may want to update the remote dataset. To optimize the data transfer, you would only send the modifications made between the previous synchronization (the previous snapshot) and now (the current snapshot):

# zfs snapshot puddle/data@monday
# zfs send puddle/data@monday | zfs recv puddle/disaster
(... modifications happen on /puddle/data ...)
# zfs snapshot puddle/data@tuesday
(... modifications happen on /puddle/data ...)
# zfs send -i puddle/data@monday puddle/data@tuesday | zfs recv puddle/disaster
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
puddle/data@now                 0      -   110M  -
puddle/data@monday              0      -   110M  -
puddle/data@tuesday             0      -   787M  -
puddle/disaster@monday       290K      -   110M  -
puddle/disaster@tuesday         0      -   787M  -
# zfs list puddle/data puddle/disaster
NAME              USED  AVAIL  REFER  MOUNTPOINT
puddle/data       787M  11,1G   787M  /puddle/data
puddle/disaster   787M  11,1G   787M  /puddle/disaster

Send the incremental data over the network and you can replicate the data daily to another datacenter, achieving disaster data protection, provided that you have enough bandwidth to transfer the data that changed between two snapshots.
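
Combining the incremental send with the ssh transport shown earlier gives something like the sketch below. The `replicate_incremental` function is a hypothetical helper of mine; “remotehost” and “pool/data” are the placeholder names used above. The script only prints the pipeline unless DRY_RUN=0 is set on a host that has ZFS.

```shell
#!/bin/sh
# Dry-run sketch: prints the pipeline; set DRY_RUN=0 on a ZFS host to run it.
DRY_RUN=${DRY_RUN:-1}

# replicate_incremental is a hypothetical helper, not a ZFS command.
# It sends the changes between two snapshots of a dataset to a remote pool.
replicate_incremental() {
    dataset=$1; prev=$2; curr=$3; host=$4; target=$5
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ zfs send -i $dataset@$prev $dataset@$curr | ssh $host zfs recv $target"
    else
        zfs send -i "$dataset@$prev" "$dataset@$curr" | ssh "$host" zfs recv "$target"
    fi
}

replicate_incremental puddle/data monday tuesday remotehost pool/data
```

The remote dataset must already hold the previous snapshot (here “monday”) for the incremental stream to apply.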

The ZFS solutions

Now that we know how ZFS works, let’s try to match I.T. issues with ZFS solutions.

Securing storage data

When you have storage for sensitive data, you need to ensure that hardware failure won’t trash your data. The easy way to do that is to use RAID-Z. Depending on your needs, you may use the mirroring or the striping method. When striping, remember that RAID-Z improves on the RAID-5 write penalty. When mirroring, don’t forget that data trashed by an application will be faithfully copied to the mirrored disk too.

To ensure that losing a disk won’t impact your service and data, use RAID-Z.

Provide storage to applications

ZFS and Solaris will allow you to remotely provide storage as iSCSI, CIFS or NFS. Depending on the application that needs to access the data, you have to choose the best option. On the server itself, applications will access the data as a ZFS filesystem.

To provide storage to VMware ESX servers, export ZFS filesystems using iSCSI or NFS.
To provide storage to Windows servers, export ZFS filesystems using iSCSI or CIFS.
To provide storage to Windows workstations, export ZFS filesystems using CIFS.
To provide storage to UNIX or Linux hosts, export ZFS filesystems using iSCSI or NFS.

Freeze data state

Before upgrading software (patches, updates…) or applying modifications to data (version upgrade…), one may wish to keep the data safe. The usual way is to back up the data, apply the modifications and then either revert the data or delete the backup (depending on the operation’s result). With ZFS, you should make use of the snapshot feature. Provided that you have the storage available and that you can tell your application to dump a stable state of your data (to ensure the snapshot is coherent), just take a snapshot of the dataset, apply the modifications, check the results and commit or roll back.

To keep data state to be able to roll-back in a fast way, use the ZFS snapshot feature.

Provide independent working data sets

Either for testing or development purposes, you may have to provide access to the same data sets to various people or teams while ensuring that their modifications won’t overlap each other. The usual way of doing it is to give access to a data repository that users copy locally to apply their modifications to, hence using (too) much storage. Using ZFS and the clone feature, you will be able to freeze a data set’s content and provide it to various teams so that they use it without duplicating too much storage. On the application level, you may provide the clone filesystem so that a development instance of your application runs with real production data in an isolated environment.

To provide data copy without duplicating the storage, use the ZFS clone feature.

Minimize backup impact

There are times when a backup requires the application to be stopped for the whole duration of the backup. Should you want to minimize the offline duration, you may want to use the snapshot and clone features of ZFS. Put your application in backup mode, take a snapshot, then bring the application back into business; this should take less than a minute. Then clone the snapshot and provide the clone to the backup system. Your application will continue to work while the backup system deals with the data. When the backup is done, delete the clone and the snapshot.

To minimize the impact of backup on application’s availability, combine ZFS snapshot and clone features.
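
The sequence could be scripted along the lines of the sketch below. Everything here is an assumption for illustration: the `backup_via_clone` function, the `run` helper, the snapshot and clone names, the default mount point of the clone, and the use of tar to “/dev/tape0” as a stand-in for your real backup tool. The script only prints the commands unless DRY_RUN=0 is set on a host that has ZFS.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands; set DRY_RUN=0 on a ZFS host to run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# backup_via_clone is a hypothetical helper, not a ZFS command.
# The application only needs to be in backup mode during the snapshot step.
backup_via_clone() {
    dataset=$1; tag=$2; clone=$3
    run zfs snapshot "$dataset@$tag"
    run zfs clone "$dataset@$tag" "$clone"
    # Stand-in for the real backup tool; assumes the clone is mounted on /$clone.
    run tar cf /dev/tape0 -C "/$clone" .
    run zfs destroy "$clone"
    run zfs destroy "$dataset@$tag"
}

backup_via_clone puddle/data backup puddle/data_backup
```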

Have the data remotely secured

In the “old” times, critical data were backed up on tapes; those tapes were sent to some other location to ensure that a major disaster in the datacenter wouldn’t impact the backups. Nowadays, with a remote connected site, a secondary ZFS storage system and a well-sized network connection, you can automatically send your data from one site to the other using the ZFS replication feature. In the same manner as you would run backups, make sure your data is coherent, take a snapshot of its storage dataset and replicate it to the remote secondary ZFS system. When done, delete the snapshot. There are two important things to keep in mind: the overall volume and the modification rate. The initial data replication will represent the initial total volume of data; every subsequent replication will depend on the amount of modification since the previous replication step.

Keep in mind that transferring 100MB of data over a 1Mbps network link takes about 15 minutes, whereas transferring 50GB takes more than 4 days; transferring 1TB of data takes about 2 hours on a 1Gbps link, whereas it takes about three months on a 1Mbps network line.
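
These orders of magnitude come from simple arithmetic: the size in bits divided by the link speed. The sketch below assumes decimal units (1MB = 10^6 bytes), a fully available link and no protocol overhead; `transfer_seconds` is a made-up helper name.

```shell
#!/bin/sh
# Transfer time in seconds for a payload (in bytes) over a link (in bits per second).
# Decimal units are assumed and protocol overhead is ignored.
# transfer_seconds is a hypothetical helper name.
transfer_seconds() {
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%d\n", bytes * 8 / bps }'
}

transfer_seconds 100000000 1000000         # 100MB over 1Mbps
transfer_seconds 50000000000 1000000       # 50GB over 1Mbps
transfer_seconds 1000000000000 1000000000  # 1TB over 1Gbps
transfer_seconds 1000000000000 1000000     # 1TB over 1Mbps
```

This prints 800, 400000, 8000 and 8000000 seconds, that is roughly 13 minutes, 4.6 days, 2.2 hours and 93 days. Real transfers will be slower than these best-case figures.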

To remotely secure your data, use the ZFS snapshot and replication features.

Optimize data storage

Depending on the data type you’re storing and the server’s CPU, you may wish to compress and/or deduplicate the data on-line. Using those ZFS features, your mileage may vary. To store flat text files, you may use compression to achieve massive storage savings. To store virtual machine disk images, you should use deduplication. An efficient way to set up a file server for Windows or UNIX workstations is to configure a ZFS dataset with the compression option set. To provide storage to your virtualization environment, it is recommended to configure your dataset with the deduplication option set. There are no magic spells; only testing will tell you what the best options are.

To optimize storage for flat data, use the ZFS on-line compression feature.
To optimize storage for virtualized environment, use the ZFS on-line deduplication feature.

Conclusion

ZFS is quite a complete filesystem. It is really feature-rich and can be used in various situations. I hope this little ZFS tour will spark your interest in the filesystem.

Source

http://download.oracle.com/docs/cd/E19082-01/817-2271/index.html
http://dlc.sun.com/osol/docs/content/ZFSADMIN/docinfo.html
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide