In TFS, files are divided into variable-size chunks. Each chunk is identified by an immutable
and globally unique 64-bit chunk handle assigned by the master at chunk creation time.
Chunk servers store chunks on their local disks and read or write chunk data specified by a
chunk handle and byte range. For data reliability, each chunk is replicated on multiple
chunk servers. By default, we maintain three replicas in the system, though users can
designate different replication levels for different files.
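The handle-assignment and per-file replication scheme described above can be sketched as
follows; all names and data structures here are illustrative assumptions, not the actual
TFS implementation:

```python
import itertools

class Master:
    """Minimal sketch of chunk-handle assignment on the master.

    Handles are globally unique and immutable once assigned; each file may
    carry its own replication level, defaulting to three replicas.
    """
    DEFAULT_REPLICATION = 3  # three replicas by default, as described above

    def __init__(self):
        self._next_handle = itertools.count(1)   # monotonically increasing 64-bit handles
        self.chunk_locations = {}                # handle -> chunk servers holding a replica
        self.file_chunks = {}                    # path -> ordered list of chunk handles
        self.replication = {}                    # path -> user-designated replica count

    def create_chunk(self, path, candidate_servers):
        handle = next(self._next_handle)         # immutable once assigned
        r = self.replication.get(path, self.DEFAULT_REPLICATION)
        self.chunk_locations[handle] = candidate_servers[:r]
        self.file_chunks.setdefault(path, []).append(handle)
        return handle
```

The key point is that only the master ever mints handles, so uniqueness needs no
coordination among chunk servers.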
The master maintains all file system metadata, including the namespace, access control
information, the mapping from files to chunks, and the current locations of chunks. It also
controls system-wide activities such as garbage collection of orphaned chunks and chunk
migration between chunk servers. Each chunk server periodically communicates with the master
in HeartBeat messages to report its state and receive instructions.
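A HeartBeat exchange of the kind described can be sketched as below; the message fields and
the orphan-detection rule are assumptions for illustration, not the actual TFS wire format:

```python
def heartbeat(chunk_server_state):
    """Chunk-server side: report which chunks this server holds and its state."""
    return {
        "server_id": chunk_server_state["id"],
        "chunks": sorted(chunk_server_state["chunks"]),
        "free_space": chunk_server_state["free_space"],
    }

def master_reply(report, live_chunks):
    """Master side: a reported chunk no longer referenced by any file is
    orphaned, so the reply instructs the server to garbage-collect it."""
    orphans = [h for h in report["chunks"] if h not in live_chunks]
    return {"delete": orphans}
```

Piggybacking instructions on HeartBeat replies keeps the protocol to one periodic
round trip per chunk server.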
A TFS client module is linked into each application; it implements the file system API and
communicates with the master and chunk servers to read or write data on behalf of the
application. Clients interact with the master for metadata operations, but all data-bearing
communication goes directly to the chunk servers.
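The split between metadata and data traffic can be illustrated with a hypothetical client
read path; the `lookup` and `read` interfaces here are assumptions, not TFS's actual API:

```python
def client_read(master, chunk_servers, path, chunk_index, offset, length):
    """Sketch of a client read: ask the master once for the chunk handle and
    replica locations (metadata RPC), then fetch the bytes directly from a
    chunk server (data RPC). Data never flows through the master."""
    handle, locations = master.lookup(path, chunk_index)   # metadata from master
    replica = chunk_servers[locations[0]]                  # any replica will do
    return replica.read(handle, offset, length)            # data from chunk server
```

Because the client caches nothing heavier than handles and locations, the master stays
off the data path entirely.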
The system is designed to minimize the master's involvement in file access operations. We do
not provide the POSIX API. Besides the ordinary read and write operations, TFS, like GFS,
provides an atomic record append operation so that multiple clients can append
concurrently to a file without extra synchronization among themselves. In our implementation
we observe that record append is the key operation for system performance, and we design
a system interaction mechanism that differs from GFS and yields better record append
performance.
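The value of atomic record append is that the system, not the client, chooses the write
offset, so concurrent appenders need no locking among themselves. A minimal single-replica
sketch of these semantics (not the TFS implementation, which must also keep replicas on
multiple chunk servers consistent):

```python
import threading

class Chunk:
    """Sketch of atomic record append on one replica: the server picks the
    offset under a lock, so concurrent clients never interleave or overwrite
    each other's records."""
    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        with self._lock:
            offset = len(self._data)   # server-chosen offset, returned to the client
            self._data.extend(record)
            return offset
```

Each client learns where its record landed from the returned offset, rather than
agreeing on offsets with other clients in advance.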