Tuesday, October 23, 2007

The Future of File Systems: Jeff Bonwick and Bill Moore Explain ZFS

In this interview, ACM Queue speaks with two Sun engineers who are
bringing file systems into the 21st century. Jeff Bonwick, CTO for
storage at Sun, led development of the ZFS file system, which is now
part of Solaris. Bonwick and his co-lead, Sun Distinguished Engineer
Bill Moore, developed ZFS to address many of the problems they saw
with current file systems, such as data integrity, scalability, and
administration. Bonwick and Moore explain what makes ZFS such a big
leap forward. "One of the design principles we set for ZFS was: never,
ever trust the underlying hardware. As soon as an application generates
data, we generate a checksum for the data while we're still in the same
fault domain where the application generated the data, running on the
same CPU and the same memory subsystem. Then we store the data and the
checksum separately on disk so that a single failure cannot take them
both out. When we read the data back, we validate it against that
checksum and see if it's indeed what we think we wrote out before. If
it's not, we employ all sorts of recovery mechanisms. Because of that,
we can, on very cheap hardware, provide more reliable storage than you
could get with the most reliable external storage. It doesn't matter
how perfect your storage is, if the data gets corrupted in flight --
and we've actually seen many customer cases where this happens -- then
nothing you can do can recover from that. With ZFS, on the other hand,
we can actually authenticate that we got the right answer back and,
if not, enact a bunch of recovery scenarios. That's data integrity.
Another design goal we had was to simplify storage management. When
you're thinking about petabytes of data and hundreds, maybe even thousands
of disk drives, you're talking about something that no human would ever
willingly take care of. ZFS is composed of several layers,
architecturally, but the core of the whole thing is a transactional
object store. The bulk of ZFS, the bulk of the code, is providing a
transactional store of objects. You can have up to 2^64 objects, each
2^64 bytes in size, and you can perform arbitrary atomic transactions
on those objects. Moreover, a storage pool can have up to 2^64 sets of
these objects, each of which is a logically independent file system.
Given this foundation, a lot of the heavy lifting of writing a POSIX
file system is already done for you..."

More information: see also 'What is ZFS?'
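
The checksum idea in the excerpt can be made concrete with a small sketch. This is not ZFS code: the `MirroredStore` class, its dictionary-backed "disks", and the choice of SHA-256 from Python's `hashlib` are all assumptions made for illustration. It only shows the general pattern the quote describes: compute a checksum when the data is written, keep that checksum apart from the data, verify on every read, and fall back to another copy (repairing the bad one) when verification fails.

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror (hypothetical, for illustration only).

    Each "disk" is a dict mapping a block address to bytes. Checksums are
    NOT stored on the disks; write() hands the checksum back to the caller,
    who keeps it somewhere else, so one failure cannot damage both.
    """

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, addr, data):
        # Checksum is computed before the data leaves the writer's hands.
        checksum = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[addr] = data
        return checksum

    def read(self, addr, expected_checksum):
        failed = []  # indexes of disks whose copy did not verify
        for index, disk in enumerate(self.disks):
            data = disk.get(addr)
            if data is not None and hashlib.sha256(data).digest() == expected_checksum:
                # Good copy found: overwrite any bad copies seen so far.
                for bad_index in failed:
                    self.disks[bad_index][addr] = data
                return data
            failed.append(index)
        raise IOError("no copy of block %r matches its checksum" % (addr,))


store = MirroredStore()
checksum = store.write(7, b"hello zfs")

# Simulate silent corruption on the first disk.
store.disks[0][7] = b"hellO zfs"

recovered = store.read(7, checksum)
print(recovered == b"hello zfs")           # the read returns the verified copy
print(store.disks[0][7] == b"hello zfs")   # the corrupted copy was repaired
```

Handing the checksum back to the caller, rather than writing it next to the block, is the part that mirrors the quote's point about storing data and checksum separately. If every copy fails verification, the sketch raises an error instead of returning data it cannot vouch for.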
