Why Application Level Filtering in Tor is Bad (10 Feb 2005)
Background for those who need it: Tor is an onion routing network for TCP streams that allows users to remain fairly anonymous while using HTTP, IRC, etc. TCP connections are bounced around the Tor nodes and emerge somewhere unrelated to their real source.
Of course, abuse happens over such networks. At the moment the most concerning case is the spamming of Usenet via Google Groups. Not all Tor nodes allow themselves to be the final (exit) node in the chain, since that node is where the connection appears to come from (at the IP level) as far as the target of the connection is concerned. Those that do typically allow only certain destination ports; 80 is a very common one. Thus people can use Tor to reach Google Groups and post spam to Usenet.
(It's suspected, for a number of reasons, that people are doing this to trigger complaints to the ISPs of the exit nodes, which makes it an attack on the Tor network as a whole. Some people truly believe that anonymity is evil.)
Tor exit nodes can refuse to connect to Google Groups, and this is reported in the Tor network-wide directory of nodes. Thus clients can check which nodes will support a connection to the destination they require and choose those as exit nodes. However, an ongoing game of blocking whichever websites are being used for abuse is probably unwinnable. Also, why shouldn't people be able to read Google Groups over Tor? It's only posting that is concerning.
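To make the directory-based approach concrete, here is a minimal Python sketch of how a client might filter exit nodes by destination. It assumes a simplified, hypothetical rule syntax loosely modelled on Tor's ordered accept/reject exit policies (the first matching rule wins); the node names and the 192.0.2.10 address standing in for Google Groups are made up for illustration.

```python
import ipaddress

# Hypothetical, simplified exit-policy syntax: "accept ADDR:PORTS" or
# "reject ADDR:PORTS", where ADDR is "*" or an IPv4 address/CIDR block and
# PORTS is "*", a single port, or a range "LO-HI". Rules are checked in
# order and the first one that matches decides.

def parse_rule(rule):
    action, target = rule.split()
    addr, ports = target.rsplit(":", 1)
    network = None if addr == "*" else ipaddress.ip_network(addr, strict=False)
    if ports == "*":
        lo, hi = 1, 65535
    elif "-" in ports:
        lo, hi = (int(p) for p in ports.split("-"))
    else:
        lo = hi = int(ports)
    return action, network, lo, hi

def allows(policy, dest_ip, dest_port):
    """True if the first rule matching dest_ip:dest_port is an accept."""
    ip = ipaddress.ip_address(dest_ip)
    for rule in policy:
        action, network, lo, hi = parse_rule(rule)
        if (network is None or ip in network) and lo <= dest_port <= hi:
            return action == "accept"
    return False  # assumption: a policy with no matching rule rejects

def usable_exits(directory, dest_ip, dest_port):
    """Names of the nodes whose advertised policy permits this destination."""
    return [name for name, policy in directory.items()
            if allows(policy, dest_ip, dest_port)]

# Made-up directory: exitB refuses 192.0.2.0/24, standing in for Google Groups.
EXAMPLE_DIRECTORY = {
    "exitA": ["accept *:80", "reject *:*"],
    "exitB": ["reject 192.0.2.0/24:*", "accept *:80", "reject *:*"],
    "middle": ["reject *:*"],
}

print(usable_exits(EXAMPLE_DIRECTORY, "192.0.2.10", 80))    # ['exitA']
print(usable_exits(EXAMPLE_DIRECTORY, "198.51.100.7", 80))  # ['exitA', 'exitB']
```

This works because the destination address and port are known before the circuit is built, so the client can pick a suitable exit up front.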
Thus some people (e.g. myself) have suggested that the exit nodes should be able to parse outgoing connections (HTTP being a very good example) and reject POST requests and the like. Here's why this is a bad idea.
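The application-level alternative would look something like the following sketch: the exit node peeks at the start of the client's stream, pulls the method out of the HTTP request line, and refuses the blocked ones. The function names and the exact BLOCKED_METHODS set are hypothetical; "and the like" is read here as any method that submits or changes data.

```python
# Hypothetical set of methods to refuse; "and the like" is read here as
# methods that can submit or change data.
BLOCKED_METHODS = {"POST", "PUT", "DELETE", "PATCH"}

def request_method(data):
    """Extract the method from an HTTP request line in `data` (bytes).
    Returns "" if the data is not an HTTP request line, or None if the
    request line has not fully arrived yet."""
    line, sep, _ = data.partition(b"\r\n")
    if not sep:
        return None
    parts = line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return ""
    return parts[0].decode("ascii", errors="replace")

def exit_filter(data):
    """Decide what an exit node does with the first bytes of a stream:
    True = forward, False = refuse, None = wait for more data."""
    method = request_method(data)
    if method is None:
        return None
    if method == "":
        return True  # assumption: non-HTTP traffic is passed through untouched
    return method not in BLOCKED_METHODS

print(exit_filter(b"GET /groups HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # True
print(exit_filter(b"POST /post HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # False
print(exit_filter(b"POST /po"))                                           # None
```

Even this is incomplete: with HTTP/1.1 persistent connections a GET can be followed by a POST on the same stream, so a real filter would have to parse every request, not just the first.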
Such a policy could be advertised in the directory, as IP-based policies currently are, but clients couldn't make use of it. The client (the first Tor node) cannot know, before creating the connection, whether the browser is going to need to POST, and the exit node is chosen at that point. Thus, as far as POST filtering is concerned, exit nodes are effectively chosen at random, and some of them will have POST blocked.
Tor users then experience posting that fails at random: sometimes it works, sometimes it doesn't. So the whole network is dragged down to the level of the most restrictive exit node, because anything beyond that will randomly fail.
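The random-failure argument is simple arithmetic. If a fraction f of exit nodes block POST and the exit is picked without knowing whether a POST is coming, each post fails with probability f, and n posts in a row all succeed with probability (1 - f)^n. The sketch below computes that and checks it with a small simulation. It assumes every exit is equally likely to be picked and that each post gets a fresh exit; both are simplifications of how real clients pick paths, and the example network is made up.

```python
import random

def post_success_probability(exits, posts):
    """Chance that `posts` POSTs in a row all succeed when each goes out
    through an exit chosen uniformly at random. `exits` maps node name to
    True if that node blocks POST."""
    if not exits:
        raise ValueError("no exit nodes")
    f = sum(1 for blocks in exits.values() if blocks) / len(exits)
    return (1 - f) ** posts

def simulate(exits, posts, trials=50_000, seed=1):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    names = list(exits)
    successes = 0
    for _ in range(trials):
        # The exit is picked before anyone knows a POST is coming.
        if not any(exits[rng.choice(names)] for _ in range(posts)):
            successes += 1
    return successes / trials

# Made-up network: 20 exits, 2 of which block POST (f = 0.1).
EXAMPLE_EXITS = {f"exit{i}": i < 2 for i in range(20)}

print(post_success_probability(EXAMPLE_EXITS, 1))             # 0.9
print(round(post_success_probability(EXAMPLE_EXITS, 10), 3))  # 0.349
print(simulate(EXAMPLE_EXITS, 10))
```

With only 2 of 20 exits blocking POST, a single post succeeds 90% of the time, but ten posts in a row all succeed only about 35% of the time. Only f = 0, i.e. no exit blocking POST, makes posting reliable.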