Sources ------- Text: 'Wide-area cooperative storage with CFS' by Frans Dabek, et al. Chord HOWTO, http://www.pdos.lcs.mit.edu/chord/howto.html. Practical: installation of CFS from the CVS repository. Summary ------- A CFS network consists of a number of CFS nodes. CFS users publish (upload) file systems on the CFS network. The file system is signed by a private key belonging to the publisher. The corresponding public key is used as an identifier for readers. CFS provides read-only storage. To update a previously published file system, a publisher must replace the entire file system with the new version. CFS has no explicit delete operation: instead, CFS stores data for an agreed-upon finite interval. A publisher must periodically ask CFS for an extension. To remove data, a publisher stops asking for extensions. For reading, a user of CFS is presented with a regular, read-only file system interface, provided by a CFS 'client'. File systems that have been published on the CFS network are accessed by mounting them in a local directory, using the public key obtained from the publisher to identify the file system. Files, directories and meta-data in a published file system are split into blocks and distributed over the CFS nodes. The CFS network deals with matters such as replication and caching of the blocks. Concepts -------- Node A peer in the CFS network. Somes nodes act as servers, others are part of a client. Server A node that is used to store and serve CFS content. Virtual servers A server can be split into a number of virtual servers in order to control how much data it serves. This is explained later. Client Provides read access to the CFS network. A client consists of two components: a node and an NFS server process that provides the user with a regular, read-only file system interface. Publishing Tool I'm calling it the 'publishing tool' since it is otherwise only known as 'sfsrodb'. This is a tool with which data can be published on the CFS network. It takes a local directory tree, and publishes it (uploads it) as a file system on CFS. File System A directory tree published on a CFS network. Like a regular file system it has files, meta-data (such as directories) and a root directory. Block File systems are split into blocks when published on the CFS network. Each block has a maximum size in the order of tens of kilobytes. Protocol -------- File System Structure A file system is split into a number of blocks, and each block is uploaded independently to the CFS network under its own block id. Most of the CFS network is unaware of file systems, only of blocks. The publishing tool and the client software are responsible for interpreting blocks as file system data. The file system format is similar to that of the UNIX V7 file system, but uses CFS blocks and block identifiers in place of disk blocks and disk addresses. When a file system is published, its files and directories are each split into blocks. The publishing tool inserts the file system's block into the CFS system, using a hash of a block's content (content-hash) as the block's identifier. A parent block contains the identifiers of its children. The root block is signed with the private key (presumably 'a' private key???) of the publisher, and inserted into CFS using the corresponding public key as the root block's identifier. Clients name a file system using the public key of the root block; they can check the integrity of the root block using that key, and the blocks lower in the tree with their content-hashes. A CFS file system is read-only as far as clients are concerned. However, a file system may be updated by its publisher. This involves updating the file system's root block in place, to make it point to the new data. CFS authenticates updates to the root block by checking that it is signed by the same key as the old version. Since it is signed by the same key, external references (i.e. the public key) can remain unchanged. Chord Part of each CFS node is a Chord component. Chord is a distributed lookup system to locate the servers responsible for a block. Chord implements a hash-like operation that maps from block identifiers to servers. Chord assigns each server an identifier drawn from the same 160-bit identifier space as block identifiers. During a Chord lookup operation a number of servers are visited in succession. The amount of space Chord uses in each server, and the number of messages used by a lookup operation are logarithmic in the total number of servers in the CFS network. Load Balancing CFS ensures that the burden of storing and serving data is divided among the servers in rough proportion to their capacity. It does so by a combination of caching and spreading each file's data over many servers. CFS splits each file system (and file) into blocks and distributes those blocks over many servers. This arrangement balances the load of serving popular files over many servers. CFS hides block lookup latency by pre- fetching blocks. During a Chord lookup operation for a block a number of servers are visited in succession. Lookups from different clients for the same block will tend to visit the same servers late in the lookup. CFS takes advantage of this by caching blocks along the lookup path. A Chord lookup for a block terminates when a cached copy is found before reaching the server responsible for the block. Some servers may have more storage or network capacity than others. The amount of data served by each server is controlled by the number of 'virtual servers' it consists of. The CFS protocol (as described in other parts of this document) operates at the virtual server level. A CFS server administrator configures the server with a number of virtual servers in rough proportion to the server's storage and network capacity. Replication CFS replicates blocks over a number of servers. CFS places replicas on servers likely to be at unrelated network locations to ensure independent failure. Quotas CFS limits the amount of data that can be published from any particular IP address. This also helps to counter malicious users that try to use up all the storage capacity in the CFS network Other CFS doesn't yet provide a search facility, but a scalable distributed search engine for CFS is under development. Installation ------------ CFS runs on top of SFS (self-certifying file system). Installing SFS requires root privileges, special accounts. We encountered many installation bugs and problems with portability. So far we have managed a Linux and FreeBSD install. Solaris is still posing problems. Deployment ---------- Reading from CFS is through regular file system operations. However, for this to work an SFS 'client daemon' must be running. Root privileges are required to start the daemon. The daemon allows 'automounting' of CFS file systems trees under a special local directory, e.g. '/sfsrewt'. Once mounted, the CFS file system can be accessed with regular file system operations. A file that is part of a file system published with public key can be accessed under /sfsrewt/chord:/pathname. For example: /sfsrewt/chord:/dirname1/dirname2/file. In the shell it is possible to 'cd' into the mounted file systems and perform various (read-only) operations there. However, some operations (such as 'find') do not appear to work very well.