China proposes blacklist of training data for generative AI models

China has published proposed security requirements for firms offering services powered by generative artificial intelligence, including a blacklist of sources that cannot be used to train AI models.

October 13, 2023 10:11 am | Updated 10:11 am IST

Generative AI learns how to take actions from past data, and creates new content like text or images based on that training. | Photo Credit: REUTERS

Generative AI, popularised by the success of OpenAI's ChatGPT chatbot, learns how to take actions from past data, and creates new content like text or images based on that training.

The requirements were published on Wednesday by the National Information Security Standardization Committee, which includes officials from the Cyberspace Administration of China (CAC), the Ministry of Industry and Information Technology, and the police.

The committee proposes conducting a security assessment of each body of content used to train public-facing generative AI models, with those containing "more than 5% of illegal and harmful information" to be blacklisted.

Such information includes "advocating terrorism" or violence, as well as "overthrowing the socialist system", "damaging the country's image", and "undermining national unity and social stability".

The draft rules also state that information censored on the Chinese internet should not be used to train models.

The publication comes just over a month after regulators allowed several Chinese tech firms, including search engine giant Baidu, to launch their generative AI-driven chatbots to the public.

The CAC has since April said it wanted firms to submit security assessments to authorities before launching generative AI-driven services to the public.

In July, the cyberspace regulator published measures governing such services that analysts said were far less onerous than measures outlined in an April draft.

The draft security requirements published on Wednesday require organisations training these AI models to seek the consent of individuals whose personal information, including biometric data, is used for training purposes.

They also lay out detailed guidelines on how to avoid intellectual property violations.

Countries around the world are grappling with how to set guardrails for the technology. China sees AI as an area in which it wants to rival the U.S., and has set its sights on becoming a world leader in the field by 2030.
