LLMs struggle to read clocks, study finds

gizmodo.com

New research from the University of Edinburgh shows that large language models (LLMs) struggle with basic tasks such as telling the time. The study tested seven popular models on their ability to interpret images of clocks and calendars. The models read the time on analog clocks correctly less than 25% of the time. Their accuracy suffered across a range of clock designs, which suggests they have trouble detecting clock hands and interpreting the angles of those hands. Google's Gemini 2.0 performed best on the clock tasks, while OpenAI's GPT-o1 answered calendar questions correctly 80% of the time. Even so, both of these top performers still made significant errors.