OpenAI's GPT-4.5 claims fewer hallucination errors

abc.net.au

OpenAI has introduced its latest chatbot, GPT-4.5, claiming it will "hallucinate less" than earlier versions. Hallucinations are errors in which the chatbot presents incorrect or misleading information, such as misidentifying people as criminals.

To support its claim, OpenAI developed a measurement system called SimpleQA: thousands of questions, each with a single correct answer, designed to challenge the chatbot's knowledge. Tested against SimpleQA, GPT-4.5 produced incorrect answers 37% of the time, a notable improvement over the previous model, GPT-4o, which hallucinated 62% of the time. (A simplified sketch of this kind of scoring appears below.)

However, critics such as Daswin de Silva from La Trobe University argue that the testing is flawed because it focuses on simple questions and does not reflect the chatbot's performance on more complex queries. OpenAI admits that its current testing method does not capture accuracy in longer responses, which are often the primary use for chatbots. Alternative evaluation methods exist, but they have weaknesses of their own: systems can end up being trained specifically to score well on those tests.

Experts suggest that incorporating human intervention could improve accuracy evaluations, with random checks by people serving as a form of quality control. Others believe a better measure of a chatbot's effectiveness is its actual usage, not just its test scores.

Despite the improvements, experts emphasize that no generative AI will be completely free of hallucinations. The current approach of simply scaling up data and computing power to improve performance may hit its limits, and as the technology evolves, new training methods may be necessary.

Some researchers assert that hallucinations can sometimes be beneficial: creative outputs, such as generated images or novel ideas, rely on the AI's ability to "imagine", or "hallucinate", beyond its existing data. Meanwhile, the issue of bias in AI models persists, as they tend to reflect the cultural values of their training data. Consequently, hallucinations in AI are expected to remain a challenge for the foreseeable future.
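The SimpleQA-style evaluation described above amounts to counting how often a model's short answer fails to match a single reference answer. The sketch below is a minimal illustration under that assumption; the `hallucination_rate` function, `dummy_model`, and the toy questions are hypothetical stand-ins, not OpenAI's actual benchmark code.

```python
# A minimal sketch, assuming a SimpleQA-style setup: each question has one
# short reference answer, and the score is the fraction the model gets wrong.
# All names here (hallucination_rate, dummy_model, toy_questions) are
# illustrative stand-ins, not OpenAI's benchmark code.

def normalise(text: str) -> str:
    """Lower-case and strip whitespace so trivially different answers still match."""
    return text.strip().lower()

def hallucination_rate(questions, model_answer) -> float:
    """Fraction of single-answer questions for which the model's answer is wrong."""
    wrong = sum(
        1
        for question, reference in questions
        if normalise(model_answer(question)) != normalise(reference)
    )
    return wrong / len(questions)

if __name__ == "__main__":
    toy_questions = [
        ("What is the capital of Australia?", "Canberra"),
        ("In which year did the first Moon landing take place?", "1969"),
    ]

    def dummy_model(question: str) -> str:
        # Hypothetical model: right about the capital, wrong about the date.
        return "Canberra" if "capital" in question else "1970"

    print(f"Hallucination rate: {hallucination_rate(toy_questions, dummy_model):.0%}")
```

A real benchmark needs far more forgiving answer matching (paraphrases, aliases, units), and long-form responses cannot be graded this way at all, which is the gap the critics quoted above are pointing to.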


With a significance score of 4, this news ranks in the top 9% of today's 17778 analyzed articles.


