• JayGray91@lemmy.zip
    23 hours ago

    I think it’s odd in the sense that it’s supposed to be software, so it should already know what 36 plus 59 is in a picosecond instead of doing mental arithmetic like we do.

    At least that’s my takeaway

    • shawn1122@lemm.ee
      17 hours ago

      This is also what Chollet’s ARC-AGI test has revealed about current AI / LLMs. They tend to approach problems with this trial-and-error method and can be extremely inefficient (in their current form) at anything involving abstract / deductive reasoning.

      Most LLMs do terribly on the test. The most recent breakthrough came with reasoning models, but even those struggle.

      ARC-AGI is simple, but it demands a keen sense of perception and, in some sense, judgment. It consists of a series of incomplete grids that the test-taker must color in based on the rules they deduce from a few examples; one might, for instance, see a sequence of images and observe that a blue tile is always surrounded by orange tiles, then complete the next picture accordingly. It’s not so different from paint by numbers.
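
      To make the "blue tile is always surrounded by orange tiles" example concrete, here is a rough sketch of what deducing a rule from an example and then applying it could look like. The letter-coded grids, the `surround_blue` rule, and the helper names are all made up for illustration; this is not the actual ARC-AGI data format or anyone's solver.

```python
from typing import Callable, List, Tuple

Grid = List[List[str]]

# Illustrative color codes (an assumption, not the real ARC-AGI encoding).
BLUE, ORANGE, EMPTY = "B", "O", "."


def parse(text: str) -> Grid:
    """Turn 'row row row' shorthand into a grid of single-character cells."""
    return [list(row) for row in text.split()]


def surround_blue(grid: Grid) -> Grid:
    """Hypothetical rule: every empty neighbor of a blue tile becomes orange."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid stays untouched
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BLUE:
                continue
            # Visit the 3x3 neighborhood around the blue tile.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == EMPTY:
                        out[nr][nc] = ORANGE
    return out


def rule_fits(examples: List[Tuple[Grid, Grid]], rule: Callable[[Grid], Grid]) -> bool:
    """A candidate rule is kept only if it reproduces every worked example."""
    return all(rule(before) == after for before, after in examples)


# One worked example (before, after), then a new grid to complete.
examples = [(parse("... .B. ..."), parse("OOO OBO OOO"))]
test_input = parse(".... .B.. .... ....")

if rule_fits(examples, surround_blue):
    completed = surround_blue(test_input)
    print("\n".join("".join(row) for row in completed))
```

      The hard part of the test is coming up with the candidate rule in the first place; checking it against the examples, as `rule_fits` does, is the easy part.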

      The test has long seemed intractable to major AI companies. GPT-4, which OpenAI boasted in 2023 had “advanced reasoning capabilities,” didn’t do much better than the zero percent earned by its predecessor. A year later, GPT-4o, which the start-up marketed as displaying “text, reasoning, and coding intelligence,” achieved only 5 percent. Gemini 1.5 and Claude 3.7, flagship models from Google and Anthropic, achieved 5 and 14 percent, respectively.

      https://archive.is/7PL2a

      • Goretantath@lemm.ee
        5 hours ago

        It’s funny because I approach life with a trial-and-error method too. Not efficient, but I get the job done in the end. I always see others who don’t and give up, like all the people bad at computers who ask the company’s tech support to fix the problem instead of thinking about it for two secs, and wonder where life went wrong.