LLMs fail at reading clocks effectively, study finds

gizmodo.com

New research from the University of Edinburgh shows that large language models (LLMs) struggle with basic tasks like telling time. The study tested seven popular multimodal models on their ability to interpret images of clocks and calendars. The models read the time on analog clocks correctly less than 25% of the time, and varied clock designs tripped them up, suggesting difficulties in detecting clock hands and interpreting their angles. Google's Gemini 2.0 performed best on the clock tasks, while OpenAI's o1 answered calendar questions correctly about 80% of the time. Even so, both models made significant errors.


With a significance score of 2.8, this news ranks in the top 25% of today's 18,366 analyzed articles.



More on this topic:

    [4.1]
    OpenAI launches Deep Research feature for ChatGPT (popsci.com)
    1d 17h
    [4.0]
    Stanford develops tool to evaluate health AI models (statnews.com)
    20h
    [3.9]
    Google researchers enhance AI response accuracy with context signal (searchenginejournal.com)
    9h
    [3.5]
    Over 60% of AI chatbot responses are incorrect (thedailystar.net)
    24h
    [3.3]
    Manus outperforms DeepSeek in several AI tasks (tomsguide.com)
    22h
    [3.1]
    AI image tools reinforce gender bias in professions (news.yahoo.com)
    1d 12h
    [3.0]
    Financial institutions strategize to optimize AI costs (forbes.com)
    18h