An old adage about the Internet is that it "interprets censorship as damage and routes around it." The more you try to restrict access to something, the more ways people find to get to it.
Governments like those in Iran or mainland China place restrictions on the Internet with software, and individuals work their way around those restrictions with more software. The end result is an arms race: a country blocks YouTube or Facebook; within days (or even hours), people inside and outside that country engineer ways to work around the block.
There's no one way to do this, but the methods all have a few things in common. Each requires some participation by people on the other side of the firewall, who allow requests for non-blocked content to be used to deliver blocked content. How they do this varies, but that one technique lies at the heart of just about all efforts to circumvent censorship.
I'm going to look at several of the major software technologies used to perform that kind of circumvention. Some of them require nothing more than installing a simple software package; some are more convoluted. Each of them comes with risks and shortcomings, which in turn also must be worked around.
Tor (The Onion Router)
Tor is nominally used for the sake of anonymity, but it also works as a circumvention tool, and its decentralized design makes it resilient to attacks. It started as a U.S. Naval Research Laboratory project but has since been developed by a 501(c)(3) nonprofit, and is open source software available for a variety of platforms. Human Rights Watch, Reporters Without Borders, and the United States International Broadcasting Bureau (Voice of America) all advocate using Tor as a way to avoid compromising one's anonymity. With a little care, it can also be used to route around information blocking.
The concept behind Tor is simple enough. A whole slew of servers around the world make up the Tor network. Connect to one as a proxy, and your Internet requests are routed at random through other servers in the network. Requests between Tor nodes are encrypted. By the time a request emerges from Tor's network and is sent on to the destination server, its origins have been heavily obfuscated. If you want, you can pick a specific entry and exit node, or even forcibly exclude specific exit nodes.
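To make the routing idea a little more concrete, here is a rough Python sketch that picks a random path through a made-up list of relays while skipping any exits the user has excluded. The relay names, the `build_circuit` function, and the three-hop path length are all assumptions for illustration; real Tor weighs relays by bandwidth and other criteria and is considerably more involved than this.

```python
import random

def build_circuit(relays, exclude_exits=(), hops=3, rng=None):
    """Pick `hops` distinct relays at random; the last one acts as the exit."""
    rng = rng or random.Random()
    excluded = set(exclude_exits)
    allowed_exits = [r for r in relays if r not in excluded]
    if not allowed_exits:
        raise ValueError("no usable exit relay")
    exit_relay = rng.choice(allowed_exits)
    # Entry and middle relays come from everything except the chosen exit.
    others = [r for r in relays if r != exit_relay]
    if len(others) < hops - 1:
        raise ValueError("not enough relays for the requested path length")
    return rng.sample(others, hops - 1) + [exit_relay]

# Hypothetical relay names, not real Tor nodes.
RELAYS = ["relay-a", "relay-b", "relay-c", "relay-d", "relay-e"]
circuit = build_circuit(RELAYS, exclude_exits=["relay-e"], rng=random.Random(1))
print(" -> ".join(circuit))
```

An excluded relay can still appear as an entry or middle hop in this sketch; only the exit position is filtered, which mirrors the "exclude specific exit nodes" option described above.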
The advantages ought to be clear. For one, there's no immediate way to tell where the connection is originating from, geographically: a request made in the United States could emerge from the Tor network somewhere in Poland. Another major feature of Tor is the hidden service protocol, which makes it possible to use the Tor cloud to anonymously publish a Web site or provide other network services, although only for people directly connected to the Tor network. Tor also works with just about any Internet application, since it works via the SOCKS proxy interface.
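Because SOCKS is a small, public protocol (RFC 1928), it's easy to see what an application actually hands to a proxy like Tor. The sketch below builds the two messages a SOCKS5 client sends first: a greeting and a CONNECT request. The function names are mine, and the sketch only constructs the bytes; it doesn't open a connection or read the proxy's replies.

```python
import struct

def socks5_greeting():
    """Version 5, one auth method offered, 0x00 = no authentication."""
    return bytes([0x05, 0x01, 0x00])

def socks5_connect_request(host, port):
    """Build a SOCKS5 CONNECT request that names the target by hostname."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # then a length byte, the name itself, and a big-endian port.
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    return header + name + struct.pack(">H", port)

request = socks5_connect_request("example.com", 80)
print(request.hex())
```

Passing the hostname rather than an already-resolved IP address lets the proxy do the DNS lookup, so the lookup itself doesn't leak onto the local network. A client would send these bytes to the local Tor SOCKS port, commonly 127.0.0.1:9050, though that address is an assumption about a default installation.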
Tor's makers provide a number of different software bundles to make Tor easy to work with. Among them is Torbutton, a Firefox add-on that makes it possible to connect to Tor through Firefox with a single mouse click. Vidalia provides a graphical control panel for Tor, giving easy access to most of its commonly changed options. Other people have also found interesting ways to repackage Tor's functionality. The makers of the IronKey encrypted USB drive package the Firefox browser with the drive, and include the ability to route Firefox through Tor via Torbutton.
Tor's not without drawbacks, although they can be compensated for in varying degrees. For one, Tor doesn't (and in many ways, can't) prevent traffic entering and exiting the network from being monitored. This impacts anonymity indirectly. It isn't as easy to trace back the traffic from a given exit node, but any personally identifiable information sent through Tor can be spied on at either the entrance or exit. It's a little like using a pay phone to make a tip to the police when that phone's been bugged by criminals themselves.
Tor also doesn't by itself encrypt any traffic between an exit node and the final endpoint (for instance, an email server), so the only way to protect the contents of the data all the way through is to use end-to-end encryption, e.g., SSL. Tor's creators have tried to counterbalance all this by providing a FAQ about how not to compromise your anonymity.
Note that services published via the hidden service protocol don't have these problems, since they're confined to the Tor network -- but that's also their biggest drawback, since only Tor users can reach them.
Because Tor's creators have tried to strike a balance between protecting anonymity and making concessions to the rest of the Internet, a number of things built into Tor can work against the people who use it. Port 25 is blocked by default, for instance, so it's not possible to use Tor as an anonymous spam relay. Many peer-to-peer ports are also blocked, since using P2P software on Tor is considered a breach of etiquette, and hogs bandwidth needed by all.
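Port blocking of this kind can be pictured as an ordered list of rules where the first match wins. The Python sketch below is a simplified stand-in, not Tor's own code: the tuple format, the `allows` function, and the two-rule policy are assumptions for illustration, loosely modeled on rule strings such as `reject *:25`.

```python
def allows(policy, port):
    """Return True if the first rule matching `port` is an accept rule."""
    for action, rule_port in policy:
        if rule_port == "*" or rule_port == port:
            return action == "accept"
    # No rule matched: refuse, a conservative choice for this sketch.
    return False

# Hypothetical policy in the spirit of the text: refuse SMTP, allow the rest.
POLICY = [("reject", 25), ("accept", "*")]

for port in (25, 80, 443):
    print(port, "allowed" if allows(POLICY, port) else "blocked")
```

Because evaluation stops at the first match, the order of the rules matters: putting the catch-all accept rule first would let port 25 through.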
Most importantly, Tor makes it possible for Web sites to set different access policies for Tor users vs. regular users by publishing lists of Tor nodes through a queryable service.
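From a Web site's point of view, applying a separate policy to Tor users comes down to checking each visitor's address against a list of known exit nodes. The sketch below assumes the site has already obtained such a list as plain text, one address per line; the function names are hypothetical, the addresses are reserved documentation addresses rather than real Tor nodes, and nothing here queries the actual Tor service.

```python
import ipaddress

def load_exit_list(lines):
    """Parse one IP address per line, skipping blanks and # comments."""
    exits = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            exits.add(ipaddress.ip_address(line))
    return exits

def policy_for(client_ip, exits):
    """Return which access policy a site might apply to this visitor."""
    return "restricted" if ipaddress.ip_address(client_ip) in exits else "normal"

# Made-up sample list using documentation-only address ranges.
EXITS = load_exit_list(["# sample exit list", "192.0.2.10", "198.51.100.7"])
print(policy_for("192.0.2.10", EXITS))
print(policy_for("203.0.113.5", EXITS))
```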
Circumventor
Developed by Bennett Haselton of the anti-Internet-censorship site Peacefire.org, Circumventor works a little bit like Tor in that each machine running the Circumventor software is a node in a network.
Circumventor is most commonly used to get around the Web-blocking system in a workplace or school. The user installs Circumventor on an unblocked PC -- e.g., their own PC at home -- and then uses their home PC as a proxy. Since most blocking software works by blocking known Web sites and not random IP addresses, setting up a Circumventor instance ought to be a bit more effective than attempting to use a list of proxies that might already be blocked.
Installing Circumventor requires that you set up three different components in succession -- a copy of Perl (the language used for Circumventor's core scripts), the OpenSA Web server, and the Circumventor scripts themselves. The default Circumventor distribution is designed to be installed on Windows machines, but the core scripts can be run under Linux; they just have to be set up by hand.
Another way to make use of Circumventor without actually installing it is to use the StupidCensorship.com site. From there, a user can sign up for a mailing list that provides updates on public Circumventor sites, which change constantly. Since it's entirely likely that those who are in charge of managing block lists are themselves subscribed to the mailing list, users have to stay vigilant and try different proxies as they are added. The same site also works as a proxy itself, provided it's not blocked, and runs both PHProxy and Glype.
One point of concern about Circumventor: the software at the core of the project, CGIProxy, has not been updated since December 2008.
Glype
The Glype proxy was created in the same spirit as Circumventor. It's installed on an unblocked computer, which the user then accesses to retrieve Web pages that are normally blocked. It's different from Circumventor in that it needs to be installed on a Web server running PHP, not just any old PC with Internet access. To that end, it's best for situations where a Web server is handy or the user knows how to set one up manually.
Setting up Glype itself is easy, though -- the admin unpacks the files into a folder on a Web server that supports PHP, and the rest is almost entirely self-configuring.
There are two basic ways to use Glype: as-is with minimal options, or with a configuration panel installed that lets you control a great many under-the-hood settings. The as-is version only lets you change a couple of basic options, such as whether or not to load cookies or embedded objects (e.g., Flash), or if the target URL or fetched pages should be encoded to avoid being intercepted by other filters. Most static Web pages -- e.g., Wikipedia, text-only news sites -- work fine without tinkering. If only one person is using Glype, this basic version should more than do the job.
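The URL-encoding option is worth a quick illustration, since it's the reason a keyword filter watching the request may not see the name of the blocked site. The Python sketch below uses base64 as a stand-in encoding; that choice, the `proxy.example` address, and the `u` parameter are assumptions for illustration, not a description of Glype's actual scheme.

```python
import base64
from urllib.parse import quote, unquote

# Hypothetical proxy address, used only for illustration.
PROXY_BASE = "http://proxy.example/browse.php"

def encode_target(url):
    """Hide the target URL inside an opaque-looking query parameter."""
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return f"{PROXY_BASE}?u={quote(token)}"

def decode_target(proxy_url):
    """Recover the original target URL on the proxy side."""
    token = unquote(proxy_url.split("?u=", 1)[1])
    return base64.urlsafe_b64decode(token).decode("utf-8")

encoded = encode_target("http://en.wikipedia.org/wiki/Proxy_server")
print(encoded)
print(decode_target(encoded))
```

Encoding like this is obfuscation rather than encryption: anyone who recognizes the pattern can decode it, so it only defeats filters that look for literal site names in the request.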
The expanded options, though, typically come into play when setting up a Glype instance that's being used by others. Add the control panel -- which involves nothing more than uploading a few more pages to the Glype site -- and an admin can set policies on a great many things. The Glype instance can log activity, cache retrieved Web pages locally, enforce load-limiting measures, add a footer to any retrieved document, block specific IP addresses, prevent direct hotlinking to proxied pages, or create unique URLs for each page visited, which increases privacy.
I mentioned before that static pages work fine, but anything beyond that can be a crapshoot. YouTube, for instance, loads pages but not videos -- although a separately-provided plug-in fixes this. (Video sites in general are unreliable when accessed through Glype.) Consequently, Glype seems to work best when dealing with "straight" Web pages. The last revision of the program was in January 2009, so it's not clear if issues like these are going to be fixed in the core code.
Public And For-Pay Web Proxies
The above software programs, and others like them, are also available through networks of public proxies. Proxy.org lists a great many Glype-powered proxies on its front page, with the option to choose one at random. The obvious problem with such a system is the complete lack of a pedigree for such proxies. You have no idea what you're connecting to or who's listening. Using SSL across such a connection is probably mandatory -- assuming the proxy you're using even supports such a thing. (Many don't.)
Some proxy providers sell access to more advanced tiers of service. Proxify and Socksify, a brother-and-sister pair of services based in New York, work along this model: they have a basic, free level of proxying service through their Web site, but they sell premium access as well. Premium access in this case includes built-in SSL support, higher bandwidth, and no restrictions on content types (the free service blocks video and audio streams). Socksify, sold separately, lets the user connect applications directly to the proxy network instead of going through a Web interface.
Over time, users have discovered a whole slew of other, indirect ways to circumvent Web-blocking systems. They're catch-as-catch-can, and are mostly used when nothing else is available.
One common method is to use the Coral Cache, or Coral Content Distribution Network (CCDN), a peer-to-peer Web mirroring system originally designed to relieve congestion on heavily trafficked Web servers. If the various CCDN servers are not blocked, a user can see a copy of a Web site in the Coral Cache by appending .nyud.net:8090 (or .nyud.net:8080) to the end of the domain name in the URL. Many Web-filtering programs already block the Coral Cache by default, however, which makes it of relatively limited use.
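The rewrite is mechanical enough to script. Here is a small Python sketch of it; the `coralize` name is mine, and the sketch assumes a plain http URL, dropping any port that was in the original address.

```python
from urllib.parse import urlsplit, urlunsplit

def coralize(url, port=8090):
    """Append .nyud.net:<port> to the host part of a URL."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("URL has no host")
    # Any port in the original URL is replaced by the Coral port.
    netloc = f"{parts.hostname}.nyud.net:{port}"
    return urlunsplit(("http", netloc, parts.path, parts.query, parts.fragment))

print(coralize("http://www.example.com/news/index.html"))
# -> http://www.example.com.nyud.net:8090/news/index.html
```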
Google can sometimes be used as a proxy-defeating system through a clever hack: the page-translation service. If you request a page via Google Translate, select English as the target language, and use an arbitrary original language -- for instance, Arabic, when the original page isn't in Arabic at all -- you can get some pages to load as-is. This doesn't work with all sites, though; for instance, with the New York Times, it triggers a redirect to a "Page not found" error. Also, the user has to know the target URL in the first place -- although that doesn't exclude the possibility of, for instance, retrieving a site's homepage and then drilling down from there to the needed page. And finally, this assumes that Google itself is accessible at all.
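For the curious, here is roughly what such a request looks like when built in Python. The endpoint and the `sl`, `tl`, and `u` parameter names reflect my understanding of the translation service's URL format and should be treated as assumptions; they may not match what the service accepts today.

```python
from urllib.parse import urlencode

def translate_proxy_url(target_url, source_lang="ar", target_lang="en"):
    """Build a page-translation URL that claims a deliberately wrong source language."""
    query = urlencode({"sl": source_lang, "tl": target_lang, "u": target_url})
    # Assumed endpoint; the target URL is percent-encoded into the query string.
    return "http://translate.google.com/translate?" + query

print(translate_proxy_url("http://example.com/page"))
# -> http://translate.google.com/translate?sl=ar&tl=en&u=http%3A%2F%2Fexample.com%2Fpage
```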
The Future: The Unending Arms Race
One important question that comes up in the wake of the use of proxies: how much trust can people place in any given proxy method? The very nature of proxies makes them tough to trust, and each incarnation I've looked at has different trust issues. It comes down to a tradeoff between decentralization and control.
A decentralized proxy network like Tor is harder to shut down, but it's also that much harder for the pedigree of any one part of the network to be verified. (At least Tor's creators are aware of this.)
A commercial service like Proxify in theory has more oversight over its own nodes, but it's an open question if they are that much more trustable by dint of being that much more centralized. Proxify's terms of service agreement is also very explicit that they provide the service as-is and entirely on their own terms -- something most anyone providing a proxy would want to spell out ahead of time to avoid legal entanglements.
The only truly trustable proxy would be one set up independently -- although with that, what you gain in personal control, you lose in resilience to outside attack, since one node is far easier to shut down than six hundred.
There seems little question that the struggle between censors and citizens will remain an arms race, with censorship worked around almost as quickly as it's put into place. The question remains: if such filters are so unreliable and so routinely dodged, why do governments or other groups bother to try blocking information at all?
The answer is simple: it's symbolic, not tactical. It's more about what forms of speech a given government or organization wants to show disfavor for, and not about actually preventing information from reaching people. In the long run, it's impossible to suppress any one piece of information completely -- but few people want to be seen as tacitly condoning things that aren't in their best interest, and so up go the firewalls.
Since it's unlikely those attitudes will change anytime soon -- especially in regimes like North Korea, where information control is the very lifeblood of the state -- the arms race will continue. And the growing sophistication of the services available out there only means there will be that many more ways to route around the damage.