
China’s Censors Could Shape the Future of AI-Generated Content

When the Chinese regime’s information controls intersect with artificial intelligence, they can distort the global information landscape.

Credit: Illustration by Catherine Putz

Within months of being launched, ChatGPT – an artificial intelligence (AI) powered chatbot created by U.S. company OpenAI – attracted tens of millions of users. A version of the technology has since been integrated into a limited preview version of Microsoft Bing. Technology writers are now speculating about the impact that AI-assisted search engines will have on competition between the U.S. tech giants Google and Microsoft. The speed with which ChatGPT has been adopted reflects a broader trend: While AI tools have grown in popularity in recent years, 2023 has been declared the year in which AI becomes a more visible part of daily life.

Any examination of the design, use, and effects of artificial intelligence must give ample consideration to trends in China. AI-driven tools are used widely inside the country for politicized content monitoring, censorship, and public surveillance. And as the world moves into a new phase of AI integration, the practices pioneered by technology firms at the behest of the Chinese Communist Party (CCP) could have ramifications for internet users, policymakers, and companies well beyond China’s borders.

The following dynamics related to AI and China deserve special attention in the year to come.

1. Censorship Within AI-generated Content in China

Algorithmic tools reflect the data they are trained on. Thus, censorship on political, social, and religious topics is almost certain to affect AI-generated content in China, and there is evidence that it already has.


If a machine-learning tool is mostly drawing information from within China’s so-called Great Firewall, then its outputs will reflect the omissions and biases of the country’s heavily censored and propaganda-infused information landscape. One 2021 study by researchers Margaret Roberts and Eddie Yang, for example, found differences in perspective between a natural-language-processing algorithm based on the global, uncensored Chinese-language Wikipedia and an alternative that was trained on entries from Baidu’s Baike online encyclopedia. The globally trained algorithm analyzed terms like “election” and “democracy” positively, or associated them with nouns like “stability.” By contrast, the algorithm trained on Baidu Baike evaluated “surveillance” and “CCP” positively and associated terms like “democracy” with negative words like “chaos.”
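For readers curious about how such a comparison can work in principle, the sketch below scores how closely a target word sits to "positive" versus "negative" words in two sets of word vectors. It is a minimal illustration, not the method used in the study: the function names, the two-dimensional vectors, and the labels `corpus_a` and `corpus_b` are all invented for the example and stand in for embeddings that would in practice be trained on large text collections.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(embeddings, target, positive, negative):
    """Mean similarity to the positive words minus mean similarity to the negative words."""
    t = embeddings[target]
    pos = sum(cosine(t, embeddings[w]) for w in positive) / len(positive)
    neg = sum(cosine(t, embeddings[w]) for w in negative) / len(negative)
    return pos - neg

# Toy vectors, made up for illustration only (hypothetical, not real data).
# In corpus_a "democracy" points roughly the same way as "stability";
# in corpus_b it points roughly the same way as "chaos".
corpus_a = {"democracy": [0.9, 0.1], "stability": [1.0, 0.0], "chaos": [0.0, 1.0]}
corpus_b = {"democracy": [0.1, 0.9], "stability": [1.0, 0.0], "chaos": [0.0, 1.0]}

score_a = association(corpus_a, "democracy", ["stability"], ["chaos"])
score_b = association(corpus_b, "democracy", ["stability"], ["chaos"])
print(f"corpus_a: {score_a:+.3f}  corpus_b: {score_b:+.3f}")
```

A positive score means the target word lies closer to the positive word list, and a negative score means it lies closer to the negative list. The same word can therefore score very differently depending on the text a model has learned from.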

Other AI systems may incorporate censorship due to human intervention imposed on top of machine-generated content. When Chinese tech giant Baidu launched its ERNIE-ViLG text-to-image generator in 2022, users like dissident artist Badiucao quickly noticed gaps and manipulation. A study published in September by the MIT Technology Review explained the contours of some of this censorship: no images of Tiananmen Square, no Chinese leaders, and no terms like “revolution” or “climb walls” – a metaphor for using anticensorship tools to access blocked websites.

Baidu reports that ERNIE-ViLG was trained on a global set of content, not just China-based information. This means that the censorship and omissions observed in the text-to-image generator must have been actively induced by the program’s developers as they tried to comply with government regulations and company policies. Indeed, while the government and CCP provide extensive rules and guidelines on censorship, Chinese tech and social media companies each have their own proprietary blacklists and approaches to censorship in practice. Variations among these companies’ AI tools may become more apparent over time.

2. Management of Chatbots in China 

As users around the world experiment with ChatGPT, users in China have had only limited access to the tool. It is not yet blocked by the Great Firewall, but signing in requires a phone number from a subset of countries that does not include China. A variety of workarounds and copycats – some legitimate, others more dubious – have emerged on the Chinese internet, and many require a fee. Around February 10, however, links to these workarounds reportedly stopped appearing in search results on Tencent’s WeChat platform and Alibaba’s Taobao marketplace.

Meanwhile, several local AI-based chatbot projects are underway and expected to be unveiled for public use this year. Baidu’s ERNIE-Bot, for instance, is reportedly due to launch next month. Given the company’s heavily censored search engine and the findings regarding its AI text-to-image generator, censorship and other manipulation are likely to be evident in the chatbot’s output as well. Another AI chatbot, ChatYuan, has been running as a mini-program within Tencent’s WeChat ecosystem, and its founder acknowledged to reporters that it would “filter certain keywords” with more layers of review than might be expected overseas. Some of the ChatGPT knockoffs noted above were also found to avoid topics that are considered politically sensitive in China.

Nevertheless, even a nominally censored chatbot could produce unpredictable results. Given that ERNIE-Bot is reportedly trained on global data, users should watch for any inadvertent slip-ups that run counter to the CCP’s preferences. Just as disinformation researchers have generated troubling results by asking ChatGPT for essays from the perspective of the CCP or well-known conspiracy theorists, users could attempt to turn the tables on Chinese chatbots. What responses might ERNIE-Bot offer if prompted to discuss democracy, China’s constitution, or Xi Jinping from the perspective of dissidents and rights lawyers like Liu Xiaobo or Gao Zhisheng, or Xi’s intra-CCP rivals like Bo Xilai? And if the response violates Chinese government censorship directives, what penalties might await the company and its users, who are required to register with their real names?

3. Influence of Chinese Censorship on Global AI-generated Content 

China is the country with the world’s largest contingent of internet users and its largest population of Chinese speakers, raising important questions about how its massive and heavily censored output might influence AI-generated content on a global level, particularly in the Chinese language. Will AI tools trained on the full constellation of available Chinese-language content implicitly display a bias that favors the CCP?

Microsoft’s Bing has emerged as the first global search engine to incorporate ChatGPT and conversational AI into its service. It also has some prior history of censorship from its China-based version creeping into global search functions. In December 2021, the Canadian research group Citizen Lab conducted tests on autosuggestions in Bing and found statistically significant censorship in Chinese-language searches for North American users, and even in some English results in the United States. The precise factors contributing to this phenomenon were not entirely clear, and Microsoft claimed to have addressed a misconfiguration, but Citizen Lab reported that as of May 2022, some anomalies persisted. The researchers concluded their report by warning that “the idea that Microsoft or any other company can operate an Internet platform which facilitates free speech for one demographic of users while intrusively applying political censorship to another demographic of its users may be fundamentally untenable.”


Although Microsoft’s situation is unique given that it continues to operate a censored version of Bing within China, Google and other global search engines may encounter different forms of spillover from Beijing’s censorship or deliberate manipulation by pro-CCP actors. Researchers last year raised concerns over Beijing’s ability to amplify Chinese state-produced content in Google News and YouTube search results for terms like “Xinjiang” or conspiracy theories related to the origins of COVID-19. It remains unclear whether the added complexity of an AI chatbot will make search functionality more or less vulnerable to manipulation.

4. Beijing’s Use of AI to Produce Global Disinformation 

The CCP and its agents are relative newcomers in the disinformation space compared with their Russian counterparts, but since 2018, multiple campaigns involving networks of fake accounts that spread falsehoods or artificially amplify Chinese state content have been documented. Although the impact of these efforts has been rather limited to date, researchers have found consistent evidence of experimentation, adaptation, and growing sophistication. Pro-Beijing actors can be expected to actively incorporate AI technology into their global disinformation operations in the future.

Disinformation researchers at the company NewsGuard recently explored what this might look like. They asked ChatGPT to generate responses from the perspective of the Chinese government or a CCP official on topics like the mass detention of Uyghurs in Xinjiang or conspiracy theories that COVID-19 originated in the United States. The results closely mimicked CCP propaganda while using an authoritative tone, but cited no sources. The researchers noted that an ordinary user asking for information on these topics would likely get a more balanced response, but the experiment demonstrated the ability of bad actors to use the technology as a “force multiplier to promote harmful false narratives around the world.”

The threat is not just hypothetical. A report published this month by the cybersecurity firm Graphika uncovered the actual employment of AI-generated avatars in a disinformation campaign linked to the Chinese regime. The firm stated that this was the first known instance of such use of the technology by a state actor. The campaign featured video clips from the fictitious outlet Wolf News, with male and female anchors presenting reports in line with CCP propaganda narratives on gun violence in the United States and China-U.S. relations. The videos were circulated by a network of fake accounts tied to China, known as Spamouflage, which Graphika has tracked for years and exposed as a persistent source of pro-CCP disinformation. The firm said its researchers initially thought the anchors were paid actors, but then traced them to a British website offering commercial AI-generated avatars, typically for use in advertisements.

The videos did not receive many views and included significant English-language errors. But as Graphika notes, combining the use of video avatars with a better script generated by natural-language systems like ChatGPT could yield more convincing and effective content.

A Critical Need for Transparency

A defining feature of China’s censorship system is its opacity. Much of what is known about the day-to-day functioning of the apparatus has come from leaks of censorship directives, testimony from former employees, anonymous comments to the media from current staff, and the sorts of outside research and investigations referenced above. In particular, while many international tech firms are deficient in transparency, their Chinese counterparts are generally even less open regarding the functionality and content-moderation systems of their products and services, including their generative AI applications. For example, no explanation of moderation policies has been published for Baidu’s ERNIE-ViLG text-to-image generator, unlike for the international alternatives DALL-E and Stable Diffusion.

Given the clear potential for abuse, any pressure applied to Chinese technology firms for greater transparency would benefit users. International competitors should integrate strong human rights principles into developing and implementing new AI-generated tools and set a high global standard for transparency and public accountability. Meanwhile, independent investigations and rigorous testing to detect and understand pro-CCP content manipulation will remain critical to informing users and creating better safeguards for free expression and access to diverse information.

It is perhaps a sign of the times that these constructive endeavors will also likely be assisted by AI technology.