Deconstructing the Great Firewall of China

Posted on March 8th, 2016
May 11th, 2016

In our last post on censorship, we surveyed a range of countries around the world that engaged in content filtering on the Internet. Among them was one system of censorship whose sophistication stood head and shoulders above the rest, and that was the Great Firewall of China. The many methods of enforcement that make up the Great Firewall come together to ensure that online content aligns with the party line. We’ll devote this post to further exploring the technicalities of many of these practices. If this analysis piques your interest in China’s vast censorship machine, check out our posts on the Great Cannon, China’s new man-in-the-middle attack tool, and the ongoing war between the Great Firewall and opposing circumvention tools.

Technical Components

In China, the Internet developed with choke points built into it. Virtually all Internet traffic between China and the rest of the world travels through a small number of fiber-optic cables that enter the country at one of ten backbone access points, seven of which were added only in January 2015. This limited number of international entry points, coupled with the fact that all Internet service providers in China are licensed and controlled by the Ministry of Industry and Information Technology, means that Chinese authorities can analyze and manipulate Internet traffic much more easily than, say, their counterparts in the United States. The Great Firewall is the blanket term for the collection of techniques, first built in 1999, used to filter traffic in China. However, the name is a bit of a misnomer, since not all filtering occurs strictly at China’s borders, nor does all traffic flow through a firewall, as we’ll soon see. In this section, we’ll dive into the major technical elements that make up the Great Firewall.

IP blocking

The method of blocking IP addresses is generally very low cost and easy to deploy. Equipped with a blacklist of undesirable IP addresses, routers drop all packets destined for blocked IPs, which could include the address of a sensitive site like the New York Times, or of a public DNS resolver like Google’s. In China, an IP blacklist is injected via BGP using null routing. Null routes for destinations on a provided IP blacklist are propagated into the network, forcing routers to drop all traffic bound for blocked IPs and effectively creating a black hole. Although null routing blocks only outbound traffic and permits inbound traffic, it’s usually enough to block a website because most Internet communication requires two-way interaction.
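As a rough illustration of the forwarding decision described above, here is a minimal Python sketch. The `NULL_ROUTES` list and the `forward` helper are hypothetical names, and the prefixes are reserved documentation ranges rather than real blacklist entries; real routers make this decision in their routing tables, not in application code.

```python
import ipaddress

# Hypothetical blacklist, using reserved documentation prefixes (RFC 5737).
NULL_ROUTES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

def forward(dst: str) -> str:
    """Return the forwarding decision for a packet bound for dst."""
    addr = ipaddress.ip_address(dst)
    # A null route matches on the destination only, so traffic *to* a
    # blocked address is discarded no matter who sent it.
    if any(addr in net for net in NULL_ROUTES):
        return "drop"
    return "forward"
```

Calling `forward("192.0.2.55")` yields `"drop"` because the address falls inside the null-routed /24, while `forward("198.51.100.8")` yields `"forward"` because only its neighbour is listed. The /24 entry also shows how a single route can take out every host that shares the prefix.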

IP blocking is a particularly lightweight censorship solution — the government can maintain a centralized blacklist without much involvement from the ISPs, and thus without much risk of leakage. Null routing also adds only a tiny load to ISPs’ filtering routers, and no special devices are needed. However, IP blocking has a few main problems: first, the blacklist of IP addresses has to be kept up to date, which could be difficult if the blocked content provider wants to make it hard for the ISP to block their sites, as changing or rotating IP addresses is fairly trivial. Second, China runs the risk of accidentally leaking these null routes to neighboring ISPs outside the country, as Pakistan did with null routes to YouTube, blocking YouTube for most of the world in 2008. Finally, the system can also very easily suffer from overblocking, since many innocuous websites may share an IP address or address block with a banned site.

In the past, censored websites like Falun Dafa’s have used this overblocking tendency to their advantage by deliberately introducing collateral damage. www.falundafa.org began to resolve to the same IP address as www.mit.edu, and as a result the Great Firewall blocked that address. MIT’s OpenCourseWare site was thus inaccessible for Chinese users, leading to such a public outcry that the block was revoked.

DNS tampering and hijacking

Because changing domain names is not nearly as trivial as changing IP addresses, DNS-related techniques are often used in conjunction with IP blocking. DNS tampering involves falsifying the response returned by the DNS server, either through intentional configuration or DNS poisoning. The server can lie about the associated IP address, any CNAMEs related to the domain, the authoritative servers for the domain and the existence of the domain itself. As a result, users are given false responses for censored sites like Twitter, and websites are blocked at the domain level. Used together, DNS tactics and IP blocking can effectively seal off censored sites and servers on both the domain and IP levels. As an example, DNS poisoning has been used not only to block sensitive content but also to promote home-grown businesses: for two months in 2002, google.cn redirected to Baidu, China’s search engine equivalent.

Apart from DNS tampering, routers can also disrupt unwanted communication by hijacking DNS requests containing banned keywords and injecting forged DNS replies. Researchers found that DNS-based censorship occurs in China’s border autonomous systems (ASes), usually two or three hops into the country, using a blocklist of around 15,000 keywords. Injected fake DNS A record responses will successfully block sites even when users rely on third-party DNS resolvers outside the country, since the Great Firewall still answers queries sent to those resolvers.
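The matching step can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the keywords, the forged address (a documentation range) and the `injected_answer` helper are all hypothetical, and simple substring matching stands in for whatever matching the real system performs.

```python
from typing import Optional

# Hypothetical keyword list and forged address; the real blocklist is
# reported to hold around 15,000 keywords.
KEYWORDS = ("blocked-example", "banned-example")
FORGED_A = "203.0.113.1"

def injected_answer(qname: str) -> Optional[str]:
    """Return a forged A record if the queried name contains a keyword."""
    name = qname.lower().rstrip(".")
    if any(keyword in name for keyword in KEYWORDS):
        return FORGED_A
    return None  # no injection; the genuine reply is the only answer
```

Note that the decision depends only on the name inside the query, not on which resolver the query is addressed to. That is why switching to a resolver outside the country does not help when the query still crosses a filtering router.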

The first reports of DNS injections date back to 2002, when spoofed responses took the form of the same poisoned record for all blocked domains. By 2007, the practice had evolved to use keyword filtering and at least eight different IP addresses in injected replies. DNS servers are also affected, since the servers themselves cache tampered DNS responses received from within the Chinese network.
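To see why caching extends the reach of a forged reply, consider this toy resolver cache in Python. The `DnsCache` class is a hypothetical sketch, with the clock passed in explicitly so the behaviour is easy to follow; it is not how any particular DNS server is implemented.

```python
class DnsCache:
    """Toy resolver cache keyed by name; time is passed in explicitly."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry time)

    def store(self, name, address, ttl, now):
        # The cache cannot tell a forged answer from a genuine one.
        self._entries[name] = (address, now + ttl)

    def lookup(self, name, now):
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            return None  # missing or expired: the resolver must ask again
        return entry[0]
```

Once a tampered answer has been stored, every client of that server receives it until the TTL runs out, even if no further injection takes place.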

Figure 1: Routers hijack DNS requests containing sensitive keywords by injecting forged DNS replies. DNS servers along the route will also cache the tampered DNS responses.

Once sites are blocked, there’s little recourse. While the Great Firewall censors actively uncover and censor new undesirable domain names, they don’t particularly attend to unblocking — researchers observed that more than two-thirds of censored domains had expired registrations.

Collateral DNS Damage

DNS-related techniques can certainly be powerful, but they can also have unintended consequences. In deploying DNS injections, the Great Firewall does not distinguish between traffic coming into and going out of the country. As a result, large-scale collateral damage occurs, affecting communication beyond the censored networks when outside DNS traffic traverses censored links. Collateral damage can occur whenever a path passes through a censored network, even if both the source and destination are in non-censored networks.

Paths from recursive resolvers to root name servers located in China seldom suffer from collateral damage, since the roots are heavily anycasted and DNS queries to the root rarely transit Chinese networks. In contrast, substantial collateral damage occurs when resolvers query top-level domain (TLD) name servers whose transit passes through China. For example, let’s say a US-based DNS resolver needs to resolve a query for www.epochtimes.de and thus needs to contact one of the DNS TLD authorities in Germany. If the path to the TLD authority passes through China, the Great Firewall will see this query and inject a false reply, which the US resolver will accept, cache and return to the user, preventing the user from reaching the correct web server. Without DNSSEC validation, the resolver will generally accept the faked answer because it usually arrives earlier than the legitimate one; as a result, access to the site is blocked.
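The race between the injected and the legitimate reply can be modelled with a short Python sketch. The `accepted_answer` function, the timings and the addresses are hypothetical, and DNSSEC validation is reduced to a boolean flag on each reply, which assumes the zone in question is signed.

```python
def accepted_answer(replies, validates_dnssec=False):
    """Pick the reply a resolver would accept.

    replies: list of (arrival_ms, address, is_authentic) tuples.
    """
    for _, address, is_authentic in sorted(replies):
        # A validating resolver discards replies that fail validation;
        # a non-validating one takes whatever arrives first.
        if validates_dnssec and not is_authentic:
            continue
        return address
    return None
```

With `replies = [(40, "198.51.100.10", True), (12, "203.0.113.1", False)]`, a non-validating resolver accepts the forged `203.0.113.1` because it arrives first, whereas `accepted_answer(replies, validates_dnssec=True)` skips it and returns the legitimate address.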

Research showed that Chinese DNS injection affected 15,225 open resolvers (6% of tested resolvers) outside China, from 79 countries. In addition, the TLD suffering from the most collateral damage was .de, due to the fact that a large amount of Germany-bound transit from the United States and Japan passed through China.

The use of anycast DNS authorities, where a single IP address may represent a widely deployed system of servers, further complicates the picture. Even though two resolvers in different networks are attempting to contact the same IP address, they may reach different physical servers along very different paths, some of which may pass through censored networks.

Another unintended consequence of DNS-related censorship practices is that huge loads of traffic can be accidentally redirected to innocent, unprepared sites. Take, for example, the purposeful misdirection of torproject.org, the domain of an organization that provides a number of tools to bypass censorship. Researchers found that a number of DNS server responses in China all redirected Tor Project traffic to a single alternative domain, which turned out to be the site of a pet grooming service in Florida. The site’s webmaster didn’t know the reason behind the large volume of Chinese traffic until researchers got in touch!

Another example of DNS poisoning gone wrong in China occurred on January 21, 2014, when a large number of domains were mistakenly resolved to a single IP address owned by Dynamic Internet Technology, a small U.S. company that provides services to bypass the Great Firewall. Many hypothesized that the Great Firewall might have intended to block the IP but instead accidentally used that IP to poison a number of domains. It was estimated that as much as two-thirds of Chinese traffic — 200 million users — was disrupted for more than an hour, and the effects reverberated for as long as 12 hours as the cached bogus records expired. Users in China couldn’t access a range of major local websites including Sina and Baidu. The misdirection had the effect of an enormous DDoS attack on Dynamic Internet Technology, which saw hundreds of thousands of visitors per second.

Judging by the work of researchers and recent events, it’s clear that DNS-based censorship has evolved into a powerful tool that can have significant repercussions even outside China’s borders. But we’re not done yet — at the heart of the Great Firewall is deep packet inspection.

Deep packet inspection and keyword filtering

Most content inspection schemes work by passing all traffic through a proxy that refuses to serve results for forbidden material. However, a proxy-based system that can cope with the traffic volumes of a major network, or an entire country, would be extremely expensive and difficult to scale.

The alternative content inspection method deployed by China uses components from an Intrusion Detection System (IDS). Filtering routers pass copies of passing traffic to out-of-band devices based on IDS technology. The packets continue on their path unhindered while the IDS technology inspects the copies to determine whether the content of the packets, including the requested URL, matches the Chinese government’s blacklist of keywords. Specifically, since late 2008, Chinese censors only inspect the first HTTP GET request arriving after a TCP handshake, ignoring HTTP responses and even GET requests without a preceding TCP handshake. This is likely for the sake of efficiency and speed. In addition, the Great Firewall can reassemble both IP fragments and TCP segments for HTTP connections. It’s able to do all of this by maintaining state — a powerful functionality that most other censorship systems have not yet achieved.
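The stateful behaviour described above, inspecting only the first GET that follows a completed handshake, can be sketched as a small state machine in Python. The `FlowInspector` class, its state names and the flag strings are hypothetical simplifications; fragment and segment reassembly are left out entirely.

```python
class FlowInspector:
    """Toy model: inspect only the first GET after a completed handshake."""

    def __init__(self, keywords):
        self.keywords = keywords  # byte strings to look for
        self.state = {}  # flow -> "syn" | "synack" | "established" | "done"

    def observe(self, flow, flags="", payload=b""):
        """Return True if this packet would trigger a block."""
        st = self.state.get(flow)
        if flags == "SYN":
            self.state[flow] = "syn"
        elif flags == "SYN/ACK" and st == "syn":
            self.state[flow] = "synack"
        elif flags == "ACK" and st == "synack" and not payload:
            self.state[flow] = "established"
        elif st == "established" and payload.startswith(b"GET "):
            self.state[flow] = "done"  # later requests are not inspected
            return any(k in payload for k in self.keywords)
        return False
```

A GET sent on a flow with no recorded handshake never reaches the `"established"` state, so it is ignored, and so is any request after the first one. Keeping this per-flow table is what "maintaining state" amounts to in the sketch.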

On-path systems (as opposed to in-path barriers) like China’s have the advantages of being more efficient and less disruptive if they fail. However, they also have the disadvantages of being less flexible and less stealthy than in-path systems, where all traffic flows through a firewall. This is because they can’t stop in-flight packets that have already been sent from reaching their destinations; they can only inject spoofed traffic to terminate connections.

Figure 2: Filtering routers pass copies of passing traffic to out-of-band IDS devices that inspect for blacklisted keywords. Sensitive content is blocked by injections of forged TCP resets.

If the IDS technology detects undesirable content and determines that a connection from a client to a web server is to be blocked, the router injects forged TCP resets (with the RST flag bit set) into the data streams so that the endpoints abandon the connection. After blocking the connection, the system maintains flow state about the source and destination IP addresses, port number and protocol of denied requests in order to block further communication between the same pair of machines, even for harmless requests that would not otherwise have been blocked. It continues the block using more injections of forged TCP resets, constructed with values based on the SYN/ACK packet observed even before the GET request. These blocking periods, or timeouts, can last for hours at a time and escalate if more attempts are made to access the censored content. The timeouts can also have the effect of blocking other users or websites located in the same address block.
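A flow-state table with escalating timeouts might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the `BlockTable` name, the 90-second starting value and the doubling rule are not measured properties of the Great Firewall.

```python
class BlockTable:
    """Toy flow-state table; durations are illustrative, not measured."""

    def __init__(self, base_timeout=90.0, factor=2.0):
        self.base_timeout = base_timeout  # assumed first block length (s)
        self.factor = factor              # assumed escalation multiplier
        self._blocks = {}  # key -> (expiry time, current timeout)

    def trigger(self, key, now):
        """Record a keyword hit; repeated hits lengthen the block."""
        _, timeout = self._blocks.get(key, (0.0, 0.0))
        timeout = timeout * self.factor if timeout else self.base_timeout
        self._blocks[key] = (now + timeout, timeout)

    def is_blocked(self, key, now):
        """True while the pair is still inside its blocking period."""
        expiry, _ = self._blocks.get(key, (0.0, 0.0))
        return now < expiry
```

The key would be a tuple such as `(src_ip, dst_ip, dst_port, protocol)`. While `is_blocked` returns true, every request between the pair would be answered with resets, including requests that contain no keyword at all.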

Because the Great Firewall doesn’t stop packets from traveling to their destinations, it’s very possible that one or more legitimate responses from the destination web server make their way back to the client before the TCP reset arrives. As a result, blocking takes the form of multiple spoofed TCP reset packets, each slightly different in an attempt to ensure that the client terminates the TCP connection in all possible cases. In the majority of connections, four spoofed packets are returned, each with a different sequence and acknowledgement number. The ACK numbers of three of these spoofed reset packets either correspond to the sequence number in the original client packet or are offset by the full size (1460 bytes) of one or two packets, providing for the case where the reset beats all legitimate packets to the client, and the cases where one or two legitimate full-size response packets have already reached the client. The fourth spoofed reset packet arrives without a corresponding ACK number, which would suppress the connection in cases where non-standard packet lengths are received on systems that will accept a reset without an ACK number.
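The arithmetic behind the first three resets is simple enough to write down. In this Python sketch, `reset_candidates` is a hypothetical helper and `base` stands for the number taken from the observed client packet; the fourth reset, which carries no ACK number, is not modelled.

```python
MSS = 1460           # full-size segment payload quoted in the text
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap at 32 bits

def reset_candidates(base: int) -> list:
    """Three guesses: 0, 1 or 2 full-size responses already delivered."""
    return [(base + n * MSS) % SEQ_SPACE for n in range(3)]
```

For example, `reset_candidates(1000)` gives `[1000, 2460, 3920]`. Sending all three covers the case where the reset arrives first as well as the cases where one or two full-size response packets got there ahead of it.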

Figure 3: Blocking takes the form of multiple spoofed TCP reset packets, each slightly different in an attempt to ensure that the client terminates the TCP connection in all possible cases.

So where in China’s network does the filtering happen? Research indicates that filtering occurs at the AS level rather than strictly at border routers. The majority of filtering devices are located in border and backbone ASes that peer with foreign networks, as they can most easily serve as traffic choke points. However, there are exceptions, and even the two largest ISPs in China differ in their approaches to censorship. Specifically, CNC Group (owned by China Unicom) places the majority of its filtering devices in the backbone as expected, but ChinaNet (owned by China Telecom) offloads much of the burden of filtering to its provincial networks, with the result that many of its filtering devices are located in internal ASes. ChinaNet is a much larger network operator with three times more peerings with foreign ASes than CNC Group. Placing all filtering devices in the backbone could create a bottleneck for an operator as large as ChinaNet, and this may partly explain ChinaNet’s distributed approach. While the Chinese government provides guidance on content and keywords to be censored, it’s ultimately the ISPs that make decisions around the actual implementation of online censorship.

We can see signs of ChinaNet’s censorship approach in a test running from our Cloud Agents in China to facebook.com. Traces from the Shanghai and Wuhan agents are stopped in the China Telecom network (AS 4812) and the China Telecom Backbone (AS 4134), respectively. Research has found that only 49 of 374 filtering interfaces in AS 4134 actually belong to the backbone; the rest belong to provincial branch companies, so it’s actually quite likely that these traces are being filtered in provincial networks. Further, the traces from the Guangzhou and Chengdu agents are stopped in provincial networks, confirming that ChinaNet does indeed conduct filtering there.

Figure 4: Traces from the Guangzhou and Chengdu agents are filtered in ChinaNet’s provincial networks.

For all its sophistication, the Great Firewall still has its shortcomings. The keyword filtering method can suffer from overblocking — for instance, because the names of party leaders (like Hu, Xi and Wen) are often sensitive keywords, Chinese terms like xue xi (study), hu luo bo (carrot), and wen du ji (thermometer) are also likely to be banned.
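The overblocking effect is easy to reproduce with a naive matcher. The Python sketch below works on the romanised examples from the paragraph above purely for illustration; real filtering would operate on Chinese characters, and the `SENSITIVE` tuple and `is_blocked` helper are hypothetical.

```python
SENSITIVE = ("hu", "xi", "wen")  # romanised surnames from the text

def is_blocked(phrase: str) -> bool:
    """Block a phrase if any syllable equals a sensitive keyword."""
    return any(syllable in SENSITIVE for syllable in phrase.split())
```

Under this rule `is_blocked("xue xi")`, `is_blocked("hu luo bo")` and `is_blocked("wen du ji")` all return `True`, even though none of the phrases has anything to do with a party leader.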

The Great Firewall has its holes too: Researchers observed that filtering is inconsistent, allowing up to one fourth of offending packets through during busy Internet traffic periods. As a result, some believe that keyword filtering functions more as a “panopticon” than a firewall. A metaphor borrowed from architecture, a panopticon is a type of building that allows a watchman to observe all occupants without the occupants knowing whether they are being watched. In other words, the Great Firewall’s keyword filtering mechanism doesn’t need to block every illicit word. The visible presence of censorship, even if easy to evade, is enough to promote self censorship, ultimately achieving an Internet aligned with the Chinese government’s goals.

Other Techniques

Apart from the techniques inextricably entwined with the infrastructure of the Chinese Internet, authorities also employ a number of other strategies to plug the remaining holes in the Great Firewall.

Filtering at the client machine

In 2005, Skype and TOM Online partnered to produce TOM-Skype, a custom version of Skype, at the request of Chinese authorities. TOM-Skype was generally the only version available within the country, since skype.com and all related domains were redirected to skype.tom.com. This version routinely collects and logs millions of records that include personal information and contact details for chat messages or voice calls placed to TOM-Skype users, including those from the regular Skype platform. TOM-Skype automatically scans incoming and outgoing chat messages for sensitive keywords on a blacklist. While it once blocked sensitive messages, now when a TOM-Skype user sends or receives a chat message that contains a blacklisted keyword, the conversation is allowed to continue and is uploaded and stored on TOM-Skype servers in China for surveillance purposes. Unfortunately, researchers discovered that these messages, along with millions of records containing personal information, were stored on insecure, publicly accessible web servers together with the encryption key required to decrypt the data. As a result, this tremendous vault of sensitive information was essentially made public.

Skype and Microsoft came under fire for their complicity in China’s surveillance and censorship practices, and in November 2013, Skype ended its joint venture with TOM, lifting all censorship restrictions on their China product so that all information began to be communicated directly to Microsoft via HTTPS.

While the deployment of this filtering technique may have been challenging, it had the advantage of not consuming any network resources or requiring any enhancement to network hardware. The expansion of keyword filtering to the client machine is also particularly interesting, as it establishes yet another site for the battle over information taking place between governments and political activists. While new technologies provide an innovative platform for netizens to communicate globally, they can also provide governments with the ability to monitor, track and even suppress political activity.

Manual enforcement

An estimated 50,000 employees make up the Chinese Internet police force that manually monitors online content, directly deleting undesirable content or ordering websites, content hosts and service providers to delete offending material. In addition, the government hires around 300,000 Internet commentators who make up the 50 Cent Party. Paid at the rate of 50 cents RMB per post, these commentators post content and comments that promote the Communist Party and disparage government critics and political opponents.

Self censorship

The Chinese government has also been successful in fostering a culture of self censorship on the Internet. Not only are ISPs expected to monitor and filter content on their networks according to state guidelines, but all Internet companies operating in China are also required by law to self censor their content. As a result, many large Internet companies also employ their own computer algorithms and human editors to identify and remove objectionable material. If companies can’t successfully censor their content, they face harsh penalties: warnings, fines, temporary shutdowns and possible revocation of their business licenses. Netizens themselves are also expected to toe the party line online, and similarly face serious consequences — you could lose your job, be held in detention or go to prison. There’s even a euphemism for the stern warning you could receive: being “invited to have a cup of tea” with government officials.

A Formidable Force

After our review of China’s extensive system of censorship, it’s obvious that it’s a powerful, evolving force to be reckoned with. As the multitude of censorship tools that make up China’s Great Firewall grow in sophistication and reach, netizens both inside and outside China will likely become increasingly concerned about the security and privacy of the Internet they traverse every day. A number of examples we explored in this post — including collateral damage from the Great Firewall and monitoring communications with TOM-Skype — have had significant repercussions even outside China’s borders, often in stealthy ways that only a tiny minority of users notice. As new technologies and information battlegrounds emerge, and as nations’ digital boundaries begin to blur, it will become increasingly important to understand the implementation and ramifications of Internet control and manipulation both inside and outside a country’s borders.
