Caching MBoxes (10 Feb 2003)
Designs for stuff. More so that I don't forget really.
For mail servers that handle mboxes (POP/IMAP) it's a real pain when you have to parse the whole mbox every time. Especially when the mboxes are large. Especially, especially when said mboxes are NFS mounted.
So the simple observation is that mboxes are append only and other mail clients will only change something in the middle if they are deleting a message. Thus:
- When you parse an mbox, cache the information you need (like the seek position and length of each message) and store it. Also store an MD5 of the first n bytes of the last message. (n should be big enough to cover the headers that mail servers insert containing unique IDs.)
- When you open an mbox, check to see if the cache file is more recent than the mbox.
  - If it is, just load it.
  - Otherwise, check to see if the MD5 sum still matches.
    - If it doesn't, then a message has been deleted. Hopefully people will generally only use one mail client, so messages won't be deleted from the mbox by other clients too often. So just reparse the mbox (and cache the result, of course).
    - If the MD5 still matches, then you only need to parse from the last known place in the mbox to get the new messages.
- When you delete messages 'yourself' (e.g. on a DELE or EXPUNGE command), you can update the cache to save reparsing next time.
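The steps above can be sketched roughly as follows. This is only an illustration, not the server's actual code: the JSON cache format, the names `parse_from`, `fingerprint`, `save_index` and `load_index`, and the value of `HASH_BYTES` (the "n" above) are all assumptions, and message boundaries are detected naively by looking for lines starting with "From ".

```python
import hashlib
import json
import os

HASH_BYTES = 512  # the "n" from the text; an assumed value


def parse_from(path, start=0):
    """Return [offset, length] for each message at or after `start`.

    Simplification: any line starting with "From " begins a message.
    """
    messages = []
    with open(path, "rb") as f:
        f.seek(start)
        pos = start
        msg_start = None
        for line in f:
            if line.startswith(b"From "):
                if msg_start is not None:
                    messages.append([msg_start, pos - msg_start])
                msg_start = pos
            pos += len(line)
        if msg_start is not None:
            messages.append([msg_start, pos - msg_start])
    return messages


def fingerprint(path, offset, length):
    """MD5 of the first HASH_BYTES bytes of the message at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.md5(f.read(min(length, HASH_BYTES))).hexdigest()


def save_index(mbox, cache, messages):
    """Store the message table plus the fingerprint of the last message."""
    md5 = fingerprint(mbox, *messages[-1]) if messages else None
    with open(cache, "w") as f:
        json.dump({"messages": messages, "md5": md5}, f)


def load_index(mbox, cache):
    """Return (messages, how); how is 'cached', 'appended' or 'reparsed'."""
    try:
        with open(cache) as f:
            state = json.load(f)
    except (OSError, ValueError):
        state = None  # no usable cache yet
    # Cache newer than the mbox: nothing has changed, just load it.
    if state and os.path.getmtime(cache) > os.path.getmtime(mbox):
        return state["messages"], "cached"
    messages, how = None, "reparsed"
    if state and state["messages"]:
        offset, length = state["messages"][-1]
        intact = (os.path.getsize(mbox) >= offset + length
                  and fingerprint(mbox, offset, length) == state["md5"])
        if intact:
            # Last known message is untouched: only parse what was appended.
            messages = state["messages"] + parse_from(mbox, offset + length)
            how = "appended"
    if messages is None:
        # Fingerprint mismatch (something was deleted) or no cache: full parse.
        messages = parse_from(mbox)
    save_index(mbox, cache, messages)
    return messages, how
```

Calling `load_index("inbox", "inbox.idx")` twice in a row should do a full parse the first time and hit the cache the second time, provided the cache file's mtime ends up later than the mbox's. Updating the table after your own DELE or EXPUNGE would be a matter of rewriting the offsets and calling `save_index` again.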
The above design is pretty much implemented and has a POP3 server wrapped around it. It still needs a fair amount of work tidying it up but I might stick it up here at some point. It was going to be an IMAP4 server, but having seen the IMAP protocol I don't think that's going to happen.
NFS is generally pretty delicate. And while other projects aim to fix it properly I'm going to leave it well alone.
So, the general design at the moment is to put a box (call it bastion) in front of the NFS server (call it falcon) that handles all the traffic for it. The clients use a tun/tap device to direct NFS traffic down an RC4 encrypted TCP tunnel to bastion. Bastion then decrypts it and sends the packets on to falcon, which is none the wiser.
- I use RC4 because bastion is going to be doing a lot of decryption and RC4 is fast. The network is reasonably secure from sniffing and RC4 is still a decent algorithm.
- Some security comes from the fact that you have to have a valid secret key for bastion before you can talk to the NFS server. Thus you cannot just plug a laptop into the wall; you at least have to get a key from a valid client.
- Hiding the key on the client is a real pain. A TCPA motherboard would help a lot, but we don't have any. Basically, keys are going to be compromised.
- Bastion can intercept NFS mount packets and pass only those it considers valid. This allows user-level authentication, but the details are still to be worked out. Possibly a wrapper around PAM and SSH would manage most of the details.
- At the moment it's wide open, so anything at least raises the bar.
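For reference, the RC4 stream cipher used on the tunnel is small enough to sketch in full. This is the standard algorithm, but how it would be wired into the tunnel is an assumption here: the class name `RC4` and its `process` method are made up for illustration, and nothing above specifies how keys are distributed or how packets are framed. The cipher state is kept across calls, on the assumption that each direction of a TCP connection is one continuous stream.

```python
class RC4:
    """RC4 stream cipher; encryption and decryption are the same operation."""

    def __init__(self, key: bytes):
        # Key-scheduling algorithm: permute 0..255 under control of the key.
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) % 256
            s[i], s[j] = s[j], s[i]
        self.s, self.i, self.j = s, 0, 0

    def process(self, data: bytes) -> bytes:
        # XOR the data with the next len(data) bytes of keystream.
        s, i, j = self.s, self.i, self.j
        out = bytearray()
        for byte in data:
            i = (i + 1) % 256
            j = (j + s[i]) % 256
            s[i], s[j] = s[j], s[i]
            out.append(byte ^ s[(s[i] + s[j]) % 256])
        self.i, self.j = i, j
        return bytes(out)


# Hypothetical usage: the client encrypts chunks as they are written to
# the tunnel, and bastion decrypts them with its own instance.
client = RC4(b"shared secret")
bastion = RC4(b"shared secret")
wire = client.process(b"first chunk") + client.process(b"second chunk")
plain = bastion.process(wire)
```

Because the keystream position carries over between calls, both ends have to process exactly the same byte sequence in the same order, which a TCP stream provides. Restarting the cipher with the same key for every packet would reuse the keystream and should be avoided.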