Title

Extendable storage framework for reliable clustered storage systems

Date of Completion

January 2010

Keywords

Engineering, Computer; Engineering, Electronics and Electrical

Degree

Ph.D.

Abstract

The total amount of information stored on disks has increased tremendously in recent years, and data storage, sharing and backup have become more important than ever. The demand for storage has changed not only in size but also in speed, reliability and security. These requirements create a major challenge not only for storage administrators, who must set storage policy for backup provisioning, retention, redundancy, security, performance, and so on, but also for storage system architects, who must aim for a one-system-fits-all design. Storage policies such as backup and security are typically set by system administrators for an entire file system, logical volume or storage pool. However, this granularity is too coarse and can sacrifice storage efficiency and performance, particularly because different files have different storage requirements. Similarly, clustered storage systems, which are typically used for data storage or as file servers, provide very high performance and scalability by striping data across multiple nodes. However, the large number of storage nodes in such systems also raises reliability concerns: data can be lost to failed nodes or to corrupt blocks on disk drives. RAID-like redundancy across these nodes is uncommon because of the high overhead of computing parity for every file on the file system. Likewise, data integrity checks are often omitted or disabled in file systems to guarantee high throughput from the storage system. Because not all files require redundancy or data protection, higher throughput is judged to outweigh the need for these computationally expensive routines.

In this thesis, we begin by studying the I/O access patterns of applications that typically use clustered storage, to understand what these applications require from a file system. We then propose a framework for an attribute-based extendable storage system that allows storage policy decisions to be made at file-level granularity and at every level of the storage stack, including the file system and device managers. To do this, we use a file's extended attributes to trigger defined tasks, implemented as plugins or functions at various levels of the storage stack. Applications can set extended attributes on their files or directories and thereby obtain complete content-aware storage functionality from the storage stack. We present a stackable user-space file system that makes it easier to write and install these plugins. Using our stackable file system technique, these plugins can be programmed in user space and mounted by non-privileged users. Finally, we present two scenarios in which our framework can be used to improve the overall performance of a reliable clustered storage system.
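To make the attribute-driven idea concrete, the following is a minimal sketch, not the thesis's implementation: a file's extended attributes select which plugins run on its write path, so only files that request redundancy pay for parity and only files that request integrity pay for a checksum. The attribute names `user.redundancy` and `user.integrity`, their values, the plugin registry and the three-node striping are all hypothetical. On Linux an application could set such an attribute with `os.setxattr(path, "user.redundancy", b"parity")`; here the attributes are kept in a plain dict so the sketch runs without an xattr-capable file system.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical attribute names; the abstract does not specify any.
REDUNDANCY_ATTR = "user.redundancy"
INTEGRITY_ATTR = "user.integrity"


def stripe(data: bytes, nodes: int) -> List[bytes]:
    """Split data into `nodes` equal-sized stripe units, zero-padding the tail."""
    unit = max(1, -(-len(data) // nodes))  # ceiling division
    padded = data.ljust(unit * nodes, b"\0")
    return [padded[i * unit:(i + 1) * unit] for i in range(nodes)]


def xor_parity(units: List[bytes]) -> bytes:
    """RAID-5-style parity: byte-wise XOR of equal-length units."""
    parity = bytearray(len(units[0]))
    for u in units:
        for i, b in enumerate(u):
            parity[i] ^= b
    return bytes(parity)


# A plugin receives the whole file and its stripe units and returns extra metadata.
Plugin = Callable[[bytes, List[bytes]], Dict[str, bytes]]


def parity_plugin(data: bytes, units: List[bytes]) -> Dict[str, bytes]:
    return {"parity": xor_parity(units)}


def checksum_plugin(data: bytes, units: List[bytes]) -> Dict[str, bytes]:
    return {"sha256": hashlib.sha256(data).digest()}


# Hypothetical registry: attribute name -> attribute value -> plugin.
PLUGINS: Dict[str, Dict[bytes, Plugin]] = {
    REDUNDANCY_ATTR: {b"parity": parity_plugin},
    INTEGRITY_ATTR: {b"sha256": checksum_plugin},
}


def write_file(data: bytes, xattrs: Dict[str, bytes], nodes: int = 3) -> Dict[str, object]:
    """Stripe a file across nodes, running only the plugins its attributes request."""
    units = stripe(data, nodes)
    extras: Dict[str, bytes] = {}
    for attr, value in xattrs.items():
        plugin = PLUGINS.get(attr, {}).get(value)
        if plugin is not None:
            extras.update(plugin(data, units))
    return {"units": units, **extras}


def recover_unit(units: List[bytes], parity: bytes, lost: int) -> bytes:
    """Rebuild the unit on a failed node by XOR-ing the survivors with the parity."""
    survivors = [u for i, u in enumerate(units) if i != lost]
    return xor_parity(survivors + [parity])


plain = write_file(b"scratch data", {})
important = write_file(
    b"payroll records",
    {REDUNDANCY_ATTR: b"parity", INTEGRITY_ATTR: b"sha256"},
)
rebuilt = recover_unit(important["units"], important["parity"], lost=1)
```

In this sketch the scratch file is only striped, while the tagged file also gets a parity unit and a checksum, and `rebuilt` equals the stripe unit that was held by the "failed" node 1. The per-file choice is the point: the parity and hashing cost is paid only where an attribute asks for it.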