Tuesday, 21 October 2008

Tuning NFS

This is a brief post outlining some general tuning guidelines for NFS. Because NFS itself is a relatively simple protocol, you will generally see the biggest change in performance by tuning your network environment and storage systems.
Below are a couple of ideas that will hopefully improve your NFS performance.

One of the ways to improve performance in an NFS environment is to limit the amount of data the file system must retrieve from the server. This limits the amount of network traffic generated, and, in turn, improves file system performance. Metadata to be retrieved on the client and updated on the server includes:

  • access time
  • modification time
  • change time
  • ownership
  • permissions
  • size

Under most local file systems this data is cached in RAM and written back to disk when the operating system finds the time to do so. The conditions under which NFS runs are far more constrained. An enormous amount of excess network traffic would be generated by writing back file system metadata when a client node changes it. On the other hand, by waiting to write back metadata, other client nodes do not see updates from each other, and applications that rely on this data can experience excess latencies and race conditions. Depending on which applications use NFS, the attribute cache can be tuned to provide optimum performance. The following mount options affect how the attribute cache retains attributes:

  • acregmin – The amount of time (in seconds) the attributes of a regular file must be retained in the attribute cache.
  • acregmax – The amount of time (in seconds) the attributes of a regular file may remain in cache before the next access to the file must refresh them from the server.
  • acdirmin – Same as acregmin, but applied to directory inodes.
  • acdirmax – Same as acregmax, but applied to directory inodes.

There are also two related settings: actimeo, which sets all four of the above values to the same number, and noac, which completely disables the attribute cache. Increasing the four timeouts keeps attributes in cache longer and so improves performance, at the cost of clients taking longer to notice changes made by other clients.
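
As a rough illustration of the options above, the sketch below builds an attribute cache option string and prints the resulting mount command instead of running it. The function name ac_opts, the server name, the paths and the timeout values are placeholders made up for this example, not recommendations.

```shell
#!/bin/sh
# ac_opts ACREGMIN ACREGMAX ACDIRMIN ACDIRMAX (all in seconds).
# If all four values are equal, use the actimeo shorthand instead.
ac_opts() {
    regmin=$1 regmax=$2 dirmin=$3 dirmax=$4
    if [ "$regmin" = "$regmax" ] && [ "$regmax" = "$dirmin" ] && [ "$dirmin" = "$dirmax" ]; then
        echo "actimeo=$regmin"
    else
        echo "acregmin=$regmin,acregmax=$regmax,acdirmin=$dirmin,acdirmax=$dirmax"
    fi
}

# Print (do not run) the mount commands; server:/data and /data are placeholders.
echo "mount -o $(ac_opts 10 120 60 120) server:/data /data"
echo "mount -o $(ac_opts 60 60 60 60) server:/data /data"
```

Longer timeouts suit data that rarely changes; data shared and modified by several clients usually wants shorter ones.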

Among the most important client optimization settings are the NFS data transfer buffer sizes, specified by the mount command options rsize and wsize. For example:

# mount server:/data /data -o rsize=8192,wsize=8192

These settings allow the NFS client to reduce the overhead of client-server communication by sending larger transactions to the NFS server. By default, most NFS clients set their read and write size to 8 KB, allowing a single read or write NFS transaction to transfer up to 8 KB of file data. Each transaction consists of an NFS read or write request and a set of packets: for a write, these are the packets carrying the data; for a read, they are the response packets. Increasing the read and write size means fewer read/write transactions are required, which means less network overhead and better performance.
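
To put a number on "fewer transactions", here is a small sketch that counts how many read calls a sequential transfer needs at a given rsize. The helper name rpc_count and the 100 MB file size are assumptions for illustration; real clients also cache and read ahead, so treat the result as an upper bound on an idealised transfer.

```shell
#!/bin/sh
# rpc_count FILE_BYTES XFER_BYTES: number of read (or write) calls
# needed to move the file, rounded up to a whole call.
rpc_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

file_bytes=104857600    # 100 MB, an arbitrary example size
for size in 8192 32768; do
    echo "rsize=$size: $(rpc_count "$file_bytes" "$size") READ calls"
done
```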

However, setting rsize and wsize to a figure above your MTU (usually 1500 bytes) will cause IP fragmentation when using NFS over UDP. IP fragmentation and reassembly require a significant amount of CPU resource at both ends of a network connection.
In addition, packet fragmentation exposes your network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason. Any increase in RPC retransmissions, along with the possibility of increased timeouts, is the single worst impediment to performance for NFS over UDP.
Packets may be dropped for many reasons. If your network is complex, fragment routes may differ, and the fragments may not all arrive at the server for reassembly. NFS server capacity may also be an issue, since the kernel has a limit on how many fragments it can buffer before it starts throwing away packets. With kernels that support the /proc filesystem, you can monitor the files /proc/sys/net/ipv4/ipfrag_high_thresh and /proc/sys/net/ipv4/ipfrag_low_thresh. Once the memory used by unprocessed fragments reaches the value specified by ipfrag_high_thresh (in bytes), the kernel will simply start throwing away fragments until the memory in use falls back to the value specified by ipfrag_low_thresh.
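
The fragmentation cost can be estimated with a little arithmetic. The sketch below assumes a 20-byte IPv4 header without options and an 8-byte UDP header, and ignores the RPC/NFS headers, so it slightly undercounts; the helper name frag_count is made up for this example. It also prints the two kernel thresholds if the files exist on the machine it runs on.

```shell
#!/bin/sh
# frag_count DATA_BYTES MTU: IP packets needed for one UDP datagram
# carrying DATA_BYTES of payload.
frag_count() {
    data=$1 mtu=$2
    # Payload per fragment: MTU minus IP header, rounded down to a
    # multiple of 8 as IP fragment offsets require.
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    # Add the 8-byte UDP header, then round up to whole fragments.
    echo $(( (data + 8 + per_frag - 1) / per_frag ))
}

for size in 1024 8192 32768; do
    echo "rsize=$size at MTU 1500: $(frag_count "$size" 1500) IP packet(s) per reply"
done

# Show the kernel's current reassembly thresholds, if present.
for f in /proc/sys/net/ipv4/ipfrag_high_thresh /proc/sys/net/ipv4/ipfrag_low_thresh; do
    if [ -r "$f" ]; then echo "$f = $(cat "$f")"; fi
done
```

Losing any one of those packets means the whole request has to be sent again, which is why larger sizes over UDP only pay off on a clean network.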

Two mount command options, timeo and retrans, control the behavior of UDP requests when the client encounters timeouts due to dropped packets, network congestion, and so forth. The -o timeo option sets the length of time, in tenths of a second, that the client will wait before it decides it will not get a reply from the server and must send the request again. The default value is 7 tenths of a second. The -o retrans option sets the number of timeouts allowed before the client gives up and displays the "Server not responding" message. The default value is 3 attempts. Once the client has displayed this message, it will continue to try to send the request, but will allow only one further timeout before displaying the error message again. When the client reestablishes contact, it will fall back to using the configured retrans value, and will display the "Server OK" message.
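
Since timeo is given in tenths of a second, it is easy to misread. The sketch below converts a value to seconds and prints an example mount command rather than running it. The helper name tenths_to_seconds, the values 14 and 5, and the server and path names are illustrative assumptions only.

```shell
#!/bin/sh
# tenths_to_seconds N: show a timeo value (tenths of a second) in seconds.
tenths_to_seconds() {
    echo "$(( $1 / 10 )).$(( $1 % 10 ))"
}

timeo=14     # example value: wait 1.4 s for a reply before resending
retrans=5    # example value: timeouts allowed before the error is shown

echo "timeo=$timeo means $(tenths_to_seconds "$timeo") s per attempt"
# Printed rather than executed; server:/data and /data are placeholders.
echo "mount -o udp,timeo=$timeo,retrans=$retrans server:/data /data"
# To judge the effect, compare the retrans counter reported by
# 'nfsstat -rc' on the client before and after the change.
```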

If you are already encountering excessive retransmissions (see the output of the nfsstat command), or want to increase the block transfer size without encountering timeouts and retransmissions, you may want to adjust these values. The specific adjustment will depend upon your environment, and in most cases, the current defaults are appropriate.

Most startup scripts, Linux and otherwise, start 8 instances of nfsd. In the early days of NFS, Sun decided on this number as a rule of thumb, and everyone else copied it. There are no good measures of how many instances are optimal, but a more heavily trafficked server may require more. You should use at the very least one daemon per processor, but four to eight per processor may be a better rule of thumb. If you want to see how heavily each nfsd thread is being used, you can look at the file /proc/net/rpc/nfsd. The last ten numbers on the th line in that file indicate the number of seconds that the thread usage was at that percentage of the maximum allowable. If you have a large number in the top three deciles, you may wish to increase the number of nfsd instances. This is done when starting nfsd, by giving the number of instances as the command line option; it is specified in the NFS startup script (/etc/rc.d/init.d/nfs on Red Hat) as RPCNFSDCOUNT. See the nfsd(8) man page for more information.
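
The thread usage check can be scripted. This is a minimal sketch that assumes the statistics are read from /proc/net/rpc/nfsd and that the line of interest starts with "th", followed by the thread count, one further counter and the ten usage figures; the function name nfsd_busy is made up for this example. Newer kernels may report zeros in these ten fields, in which case the result says nothing useful.

```shell
#!/bin/sh
# nfsd_busy [FILE]: print the nfsd thread count and the seconds spent
# in the top three usage deciles (the last three fields of the th line).
nfsd_busy() {
    awk '$1 == "th" {
        top = $(NF-2) + $(NF-1) + $NF
        printf "threads=%d top3_seconds=%.3f\n", $2, top
    }' "${1:-/proc/net/rpc/nfsd}"
}

# Only run against the live file on a machine that is an NFS server.
if [ -r /proc/net/rpc/nfsd ]; then
    nfsd_busy
fi
```

If top3_seconds keeps growing between two runs taken some minutes apart, that suggests the threads are often saturated and a higher RPCNFSDCOUNT is worth trying.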

In general, server performance and server disk access speed will have an important effect on NFS performance. However, the settings above will help you to start tuning the system to improve the performance of your NFS resources.

These are just a couple of suggestions for improving the general performance of your NFS systems. The list above is nowhere near exhaustive, but it points you in the right direction. Tuning the underlying technologies is generally the easiest way to gain performance with NFS, since NFS itself is not that complex a file sharing protocol.

Tuesday, 14 October 2008

Linux NFS General Setup Guide

The Network File System (NFS) was originally developed by Sun Microsystems in 1984, to allow a user on a client computer to access files over the network with as much ease as if accessing a local disk.


A unique aspect of NFS, which makes it appear seamless to the end user, is that connecting to an NFS share does not require a password, and files on the server appear in every respect to be the user's own files. Security is enforced by limiting access to trusted hosts, and by using the standard Linux file system permissions. The UIDs and GIDs of users are mapped from the server to the client. Therefore, if a user on a client has the same UID and GID as a user on the server, they have access to files in the NFS share owned by that UID and GID. NFS is easy to grasp, which leads to quick and easy configuration. It is worth mentioning that NFS does not use any encrypted communication, making it possible for data to be intercepted in transmission by a third party. Also, improper management of users on a client can give file access to the wrong user. It is therefore critical that the administrators of all allowed hosts are trusted.



In a general Linux scenario, in which one machine (the client) requires access to data stored on another machine (the NFS server) the general procedure for setting up the NFS environment is as follows:


  1. The server runs NFS daemon processes (running by default as nfsd) in order to make its data generically available to clients.
  2. The server administrator determines what to make available, exporting the names and parameters of directories (typically using the /etc/exports configuration file and the exportfs command).
  3. The server security-administration ensures that it can recognize and approve validated clients.
  4. The server network configuration ensures that appropriate clients can negotiate with it through any applicable firewall system.
  5. The client machine requests access to exported data, typically by issuing a mount command.
  6. If all goes well, users on the client machine can then view and interact with mounted filesystems on the server within the parameters permitted.


NFS shares are stored in the /etc/exports file, and are specified one per line. Each share can contain several hosts/options declarations. The general syntax is:


/share_path hosts_1(options) hosts_2(options) ... hosts_n(options)


For example:


/share *(ro,all_squash) 192.168.1.152(rw,root_squash)


(For more examples, see the exports(5) man page.)
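
To make the structure of such a line easier to see, the sketch below splits an exports entry into one "path host -> options" line per client. It is a simplified reading of the format that assumes whitespace-separated fields and ignores comments, quoted paths and line continuations; the function name export_clients is made up for this example.

```shell
#!/bin/sh
# export_clients 'LINE': list each host(options) pair of one exports line.
export_clients() {
    printf '%s\n' "$1" | awk '{
        for (i = 2; i <= NF; i++) {
            host = $i; opts = ""
            p = index($i, "(")
            if (p > 0) {
                host = substr($i, 1, p - 1)
                # Take the text between "(" and the closing ")".
                opts = substr($i, p + 1, length($i) - p - 1)
            }
            printf "%s %s -> %s\n", $1, host, opts
        }
    }'
}

export_clients '/share *(ro,all_squash) 192.168.1.152(rw,root_squash)'
```

For the example entry this shows that every host gets anonymous read-only access, while 192.168.1.152 alone may write.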


To mount an NFS share on a client, add the share to the /etc/fstab file. The syntax is:


host:/share_path /local_mount_point nfs options dump fsck


For example:


server.openminds.local:/data /share nfs defaults 0 0
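
An fstab entry corresponds to a one-off mount command, which is handy for testing a share before making it permanent. The sketch below prints that command for a given fstab line without running it; the function name fstab_to_mount is an assumption for this example, and the dump and fsck fields are simply ignored.

```shell
#!/bin/sh
# fstab_to_mount 'LINE': print the mount command equivalent to one
# fstab entry (device, mount point, type and options fields).
fstab_to_mount() {
    printf '%s\n' "$1" | awk 'NF >= 4 {
        printf "mount -t %s -o %s %s %s\n", $3, $4, $1, $2
    }'
}

fstab_to_mount 'server.openminds.local:/data /share nfs defaults 0 0'
```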


The hosts permitted to connect to a share can be specified in several ways (taken from the exports(5) man page):


  • Single host - This is the most common format. You may specify a host either by an abbreviated name resolvable in DNS or the hosts file, the fully qualified domain name, or an IP address.


  • Netgroups - NIS netgroups may be given as ‘@group’. Only the host part of each netgroups member is considered in checking for membership. Empty host parts or those containing a single dash (-) are ignored.


  • Wildcards - Machine names may contain the wildcard characters * and ?. This can be used to make the exports file more compact; for instance, *.openminds.local matches all hosts in the domain openminds.local. As these characters also match the dots in a domain name, the given pattern will also match all hosts within any sub-domain of openminds.local (e.g. host.subdom.openminds.local).


  • IP networks - You can also export directories to all hosts on an IP (sub-) network simultaneously. This is done by specifying an IP address and netmask as address/netmask where the netmask can be specified in dotted-decimal format, or as mask length (for example, either `/255.255.255.0' or `/24').
    N.B. Wildcard characters generally do not work on IP addresses, though they may work by accident when reverse DNS lookups fail.
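
The four formats can be told apart by their shape alone. The following sketch classifies a host specification using simple pattern tests; it is a rough approximation of the rules quoted above, not the logic the NFS server itself uses, and the function name host_kind and the netgroup name @trusted are made up for this example.

```shell
#!/bin/sh
# host_kind SPEC: guess which of the four formats a host entry uses.
host_kind() {
    case $1 in
        @*)      echo netgroup ;;   # NIS netgroup, e.g. @group
        */*)     echo network ;;    # address/netmask or address/length
        *[*?]*)  echo wildcard ;;   # contains * or ?
        *)       echo single ;;     # plain host name or IP address
    esac
}

for h in '@trusted' '192.168.1.0/255.255.255.0' '192.168.1.0/24' \
         '*.openminds.local' 'server.openminds.local'; do
    echo "$h: $(host_kind "$h")"
done
```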


There are many options for shares. Here are some of the most common:


  • root_squash - Maps the root user to the nobody user. This prevents a user logged in as root on a client from gaining root file access permissions on the server.
  • no_root_squash - Does not map the root user to the nobody user. The root user on a client has the same rights as the root user on the server.
  • all_squash - Maps all the UIDs and GIDs to the nobody user. This is useful if the share is to have anonymous access, much like an anonymous FTP server.
  • anonuid, anongid - If root_squash or all_squash is used, the squashed UIDs and GIDs are mapped to the specified UID and GID instead of the nobody user.
  • ro - Forces all files on the share to be read-only. This is the default behavior.
  • rw - Allows write access to the share.
  • sync - Ensures data is written to disk before another request is serviced.


The NFS server is controlled with the /sbin/rcnfsserver script, and the client is controlled with the /sbin/rcnfs script. The major options are:

  • start
  • stop
  • restart

For example, to start the server, execute:

# rcnfsserver start