wiki:AFS

(Modified from a document originally written by Greg Hudson)

Athena workstations use a distributed filesystem called AFS; the current implementation is OpenAFS. Running AFS allows workstations to access files under the /afs hierarchy. Of particular interest are the MIT parts of this hierarchy: /afs/athena.mit.edu, /afs/sipb.mit.edu, /afs/net.mit.edu, /afs/ops.mit.edu, and /afs/zone.mit.edu. (/afs/dev.mit.edu was decommissioned in February 2012.)

Unlike NFS, AFS includes two layers of indirection which shield a client from having to know which server a file resides on in order to access it. The first layer of indirection is "cells", such as athena.mit.edu. Each workstation has a directory of cells in /usr/vice/etc/CellServDB, which it uses to look up the database servers for a cell name. If a cell's database servers change, each client's CellServDB has to be updated, but the canonical paths to files in that cell do not change. Some AFS clients can also look up DNS SRV records for cells which are not listed in CellServDB.

A canonical CellServDB file is maintained at grand.central.org (an OpenAFS community resource). IS&T Server Operations maintains a local canonical CellServDB at /afs/athena.mit.edu/service/CellServDB. A cron job on zulu compares this file to our most recent copy and alerts debathena-root if the files differ. Debathena developers must then take action to incorporate the changes into a new version of debathena-afs-config and push it out to the APT repo.
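
The CellServDB format itself is simple: a line of the form ">cellname    #description" starts a cell, and each following line lists one database server as "IP-address    #hostname". As a rough illustration (not an official tool), a minimal Python parser might look like this:

    #!/usr/bin/env python3
    # Minimal sketch: map cell names to their database server addresses by
    # parsing /usr/vice/etc/CellServDB.  Lines beginning with ">" name a cell;
    # the lines that follow each list one database server ("IP  #hostname").
    def parse_cellservdb(path="/usr/vice/etc/CellServDB"):
        cells = {}
        current = None
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # e.g. ">athena.mit.edu    #MIT/Athena cell"
                    current = line[1:].split("#")[0].strip()
                    cells[current] = []
                elif current is not None:
                    address = line.split("#")[0].strip()
                    if address:
                        cells[current].append(address)
        return cells

    if __name__ == "__main__":
        for cell, servers in sorted(parse_cellservdb().items()):
            print(cell, servers)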

The second layer of indirection is the volume location database, or VLDB. Each AFS cell's contents are divided into named volumes of files which are stored together; volumes refer to other volumes using mountpoints within their directory structure. When a client wishes to access a file in a volume, it uses the VLDB servers to find out which file server the volume lives on. Volumes can move around from one file server to another and clients will track them without the user noticing anything other than a slight slowdown.
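
To see both layers from a client, the standard fs utility can report which volume a mount point refers to and which file server(s) currently hold the data. A hedged Python wrapper around fs lsmount and fs whereis (the default path below is only a placeholder, and the human-readable output varies between OpenAFS releases):

    #!/usr/bin/env python3
    # Sketch: ask the AFS client which volume and file server(s) back a path.
    # "fs lsmount <dir>" prints the volume a mount point refers to (it fails
    # on paths that are not mount points); "fs whereis <path>" prints the
    # file server(s) currently serving the data.  Output is human-oriented
    # and varies between OpenAFS releases.
    import subprocess, sys

    path = sys.argv[1] if len(sys.argv) > 1 else "/afs/athena.mit.edu/user"  # placeholder

    for subcommand in ("lsmount", "whereis"):
        result = subprocess.run(["fs", subcommand, path],
                                capture_output=True, text=True)
        print(result.stdout.strip() or result.stderr.strip())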

AFS has several advantages over traditional filesystems:

  • Volumes can be moved around between servers without causing an outage.
  • Volumes can be replicated so that they are accessible from several servers. (Only read-only copies of a volume can be replicated; read/write replication is a difficult problem.)
  • It is more secure than traditional NFS. (Secure variants of NFS are not widely implemented outside of Solaris.)
  • AFS clients cache data, reducing load on the servers and improving access speed in some cases.
  • Permissions can be managed in a more flexible (though not strictly more flexible) manner than in other filesystems; see the example after this list.
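
For example, directory ACLs are managed with the fs command rather than with chmod. A minimal sketch using the standard fs setacl and fs listacl subcommands (the directory path is a placeholder; granting system:anyuser the "rl" rights is the usual way to make a directory world-readable):

    #!/usr/bin/env python3
    # Sketch: make a directory world-readable in AFS and show its ACL.
    # "fs setacl" and "fs listacl" are standard OpenAFS commands; the
    # directory below is a placeholder, not a real Athena path.
    import subprocess

    directory = "/afs/athena.mit.edu/user/j/r/jruser/Public"  # placeholder

    # "rl" grants read and lookup; other rights include i (insert),
    # d (delete), w (write), k (lock), and a (administer).
    subprocess.run(["fs", "setacl", "-dir", directory,
                    "-acl", "system:anyuser", "rl"], check=True)
    print(subprocess.run(["fs", "listacl", directory],
                         capture_output=True, text=True).stdout)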

AFS has several unusual properties which sometimes cause software to behave poorly when run against it:

  • AFS uses a totally different permissions system from most other Unix filesystems; instead of assigning meanings to a file's status bits for the group owner and the world, AFS stores an access control list in each directory and applies that list to all files in the directory. As a result, programs that copy files and directories will usually not automatically copy the permissions along with them, and programs that use file status bits to determine in advance whether they have permission to perform an operation will often get the wrong answer; the first sketch after this list illustrates the usual workaround.
  • It is not possible to make a hard link between files in two different AFS directories even if they are in the same volume, so programs which try to do so will fail (see the last sketch after this list).
  • It is possible to lose permissions on an AFS file because of changing ACLs or expired or destroyed tokens. This is not possible for a local filesystem and some programs don't behave gracefully when it happens in AFS.
  • It is possible for close() to fail in AFS for a file which was open for writing, either because of reaching quota or because of lost permissions. This is also not possible for a local filesystem, and as a result, many programs don't deal gracefully with this situation; the second sketch after this list shows how to handle it.
  • AFS is a lot slower than local filesystem access, so software which performs acceptably on local disk may not perform acceptably when run out of AFS. Some software may even perform unacceptably simply because a user's home directory is in AFS, even though the software itself comes from local disk.
  • It is not possible to create sockets in AFS.
  • Some file locking operations fail in AFS. (Both of these limitations appear in the last sketch after this list.)
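
A minimal sketch of the workaround for the permissions problem described in the first bullet above: rather than trusting a file's status bits, attempt the operation and handle the resulting error. The path is supplied by the user; the error handling shown is illustrative, not exhaustive.

    #!/usr/bin/env python3
    # Sketch: in AFS, a file's group/other mode bits do not reflect the
    # directory ACL, so checking status bits in advance can mislead.
    # The robust pattern is to attempt the operation and handle EACCES.
    import errno, os, sys

    if len(sys.argv) != 2:
        sys.exit("usage: readable.py <path-in-AFS>")
    path = sys.argv[1]

    mode = os.stat(path).st_mode
    print("mode bits say group/other can read:", bool(mode & 0o044))

    try:
        with open(path) as f:
            f.read(1)
        print("open and read actually succeeded")
    except OSError as e:
        if e.errno == errno.EACCES:
            print("actually denied by the directory ACL")
        else:
            raise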
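
A minimal sketch of handling a close() failure, as described in the bullets above; EDQUOT (over quota) and EACCES (lost permissions) are the usual causes, but others are possible:

    #!/usr/bin/env python3
    # Sketch: in AFS, data is written back to the file server at close time,
    # so an over-quota or lost-permission error can surface from close().
    # Programs that ignore the result of close() can silently lose data.
    import errno, sys

    try:
        f = open("output.dat", "w")
        f.write("important data\n")
        f.close()   # may raise OSError (e.g. EDQUOT or EACCES) in AFS
    except OSError as e:
        if e.errno == errno.EDQUOT:
            sys.exit("write failed: volume is over quota")
        elif e.errno == errno.EACCES:
            sys.exit("write failed: permission lost (ACL change or expired tokens)")
        else:
            raise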
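
Finally, a sketch of the operations that fail outright (cross-directory hard links, Unix-domain sockets, and some locks). The exact errors are client-dependent, so the printed messages are only illustrative:

    #!/usr/bin/env python3
    # Sketch of operations that can fail outright in AFS.  Run it from a
    # scratch directory in AFS; it creates a few small files there.  The
    # exact errno values depend on the AFS client, so the printed messages
    # are illustrative only.
    import fcntl, os, socket

    os.makedirs("subdir", exist_ok=True)
    open("target", "w").close()

    # Hard links between different directories fail, even within one volume.
    try:
        os.link("target", os.path.join("subdir", "alias"))
    except OSError as e:
        print("hard link failed:", e)

    # Unix-domain sockets cannot be created in AFS.
    try:
        s = socket.socket(socket.AF_UNIX)
        s.bind("test.sock")
    except OSError as e:
        print("socket bind failed:", e)

    # Some locking operations fail (or are only emulated locally); whether
    # this whole-file lock succeeds depends on the client.
    with open("target", "r+") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            print("lock acquired (possibly emulated locally)")
        except OSError as e:
            print("lock failed:", e)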

All in all, the biggest issue with AFS is that most consumer-grade software (desktop environments like GNOME are by far the worst offenders) naively assumes that all user home directories are on local disk, fully accessible by root, in a filesystem with traditional UFS permissions, where locks, FIFOs, and sockets can be created.

AFS uses Kerberos 4 to authenticate internally. Since it is not reasonable for AFS kernel code to read Kerberos credential caches directly, AFS-specific credentials are stored in the kernel as "tokens". The kernel looks up tokens using a "process authentication group" or PAG, which is stored in the user's group list. If there is no PAG in the user's group list, the kernel falls back to looking up tokens by uid, which would mean that two separate logins would share the same tokens and that a user who does an "su" would no longer use the same tokens. Athena workstations do their best to ensure that each login gets a fresh PAG.
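
As an illustration, the standard pagsh, aklog, and tokens utilities can be combined to create a fresh PAG, obtain tokens inside it, and list them. A hedged sketch (it assumes Kerberos tickets are already in place and that pagsh accepts a -c command as in current OpenAFS):

    #!/usr/bin/env python3
    # Sketch: run a command inside a fresh PAG.  pagsh starts its command in
    # a new process authentication group, aklog converts existing Kerberos
    # credentials into AFS tokens, and tokens lists the tokens visible to
    # that PAG.  Assumes Kerberos tickets have already been obtained.
    import subprocess

    result = subprocess.run(["pagsh", "-c", "aklog && tokens && id -G"],
                            capture_output=True, text=True)
    # "id -G" in the child shows the PAG as extra entries in its group list;
    # the specific numeric range is an implementation detail of the client.
    print(result.stdout)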