Chinese Talent Show Exposes Stupidity of AI — and Some Humans
A dispute over the results of a popular Chinese reality TV show has highlighted that most artificial intelligence models — and a surprisingly large number of people — struggle to answer even the most basic math questions.
In an episode of the talent show “Singer” that aired on July 13, Chinese pop star Sun Nan placed third after receiving 13.8% of the studio audience vote. American performer Chanté Moore finished just behind him with 13.11%.
Soon after, a surreal debate started to unfold on Chinese social media, as a group of netizens questioned the rankings on the basis of their mistaken belief that 13.11% is a higher vote share than 13.8%. In their view, Moore should have finished above Sun.
The dispute has continued to rage throughout this week, with related hashtags racking up hundreds of millions of views on microblogging platform Weibo as of Thursday.
Some users set up online polls in an attempt to settle the matter of which number is larger. In one such vote, 92% of 5,191 respondents said that 13.8 is higher than 13.11. The remaining 8% insisted that 13.11 is higher.
Others turned to AI for an authoritative answer. But this simply added to the confusion, as most generative AI models also struggle to answer simple math queries.
Domestic media outlet Yicai conducted an experiment by having 12 leading AI models answer a similar question, asking the bots which number was higher: 9.9 or 9.11. Eight of them said that it was 9.11.
The chatbots that answered incorrectly included OpenAI’s ChatGPT 4.0, ByteDance’s Doubao, and Moonshot AI’s Kimi. Only four bots gave the correct answer: Alibaba’s Tongyi Qianwen, Baidu’s Ernie, MiniMax, and Tencent’s Yuanbao.
The bots that gave the correct answer tended to do so based on similar reasoning, whereas the ones that got it wrong offered a wide range of justifications, the report said. When reporters challenged their initial responses, most of the models acknowledged their mistakes and corrected their answers.
When Sixth Tone asked ChatGPT whether 13.11 or 13.8 was the higher number, it gave inconsistent answers with differing justifications. It seemed to find the question easier when asked to compare 13.11 with 13.80 rather than 13.8.
“When comparing 13.80 to 13.11, 13.80 is greater,” the bot said to one prompt. But when the question was rephrased, it gave the nonsensical answer: “13.11 is larger than 13.8. This is because 13.11 has a higher value when comparing the digits after the decimal point.”
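The correct comparison is straightforward to verify programmatically; a minimal sketch in Python (the variable names are illustrative, not from any of the chatbots tested):

```python
from decimal import Decimal

# 13.8 means 13.80, so it is larger than 13.11, even though
# "11" looks bigger than "8" when the fractional parts are
# misread as whole numbers.
sun_vote = 13.8
moore_vote = 13.11

print(sun_vote > moore_vote)  # True

# Using exact decimal arithmetic gives the same result and avoids
# any floating-point representation concerns.
print(Decimal("13.8") > Decimal("13.11"))  # True
```

Decimal comparison works digit by digit after the decimal point: 13.8 versus 13.11 compares 8 against 1 in the tenths place, so 13.8 wins immediately, which is exactly the step the confused commenters skipped.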
For many in China, the debate has served as a reminder that generative AI models still have deep flaws. The technology has sparked enormous excitement in the country, with AI increasingly being integrated into everything from livestreaming to white-collar work. But AI bots cannot be trusted blindly, experts warned.
In the future, AI models cannot simply be trained by feeding them vast quantities of data, but will need to be built in a more systematic manner that allows them to handle complex reasoning, Lin Dahua, a leading scientist at the Shanghai Artificial Intelligence Laboratory, told Yicai.
This will be particularly important if AI begins to play a greater role in fields such as finance, Lin added. In the meantime, users will need to use common sense to determine whether to trust the answers provided by the technology.
(Header image: Sasiistock/Getty Creative/VCG, re-edit by Sixth Tone)