• maria [she/her]@lemmy.blahaj.zone

    The fact that it responded with only “no” implies a previous exchange in the conversation, in which it was prompted to either

    • respond with exactly “no”

    or

    • keep its responses short

    It seems like the first case applies here, since it actually gives a little postamble with the image-generation response.
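
    For illustration, here’s a minimal sketch of how such an instruction could have been set earlier in the conversation (the model choice and the prompt wording are assumptions, not something visible in the screenshot):

    ```python
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # An earlier instruction like this constrains every later answer,
    # which is one way to end up with a bare "no" as a response.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer yes/no questions with exactly 'no' unless told otherwise."},
            {"role": "user", "content": "Is this image AI-generated?"},
        ],
    )
    print(response.choices[0].message.content)
    ```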

    Apparently, ChatGPT doesn’t actually look at the generated image. Otherwise it would be able to tell that the user’s image is equivalent to the generated one (the tokens would be literally identical, so it would be like asking an LLM “are these two paragraphs the same text?”).

    Anyway: don’t use VLMs to check whether an image was AI-generated! There are models actually trained for that task. VLMs are not.
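
    As a minimal sketch of that approach (the model id below is an assumption used for illustration; swap in whichever purpose-trained detector you trust):

    ```python
    from transformers import pipeline

    # Load a classifier trained specifically for AI-image detection.
    # "umm-maybe/AI-image-detector" is an illustrative model id, not an
    # endorsement; any purpose-trained detector plugs in the same way.
    detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

    # Returns labels with confidence scores instead of a bare yes/no.
    for result in detector("suspect.png"):
        print(f"{result['label']}: {result['score']:.2%}")
    ```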

    • Hazzard@lemmy.zip

      Yeah, I see these kinds of misunderstandings all the time: people ask ChatGPT to do something with an image, it fails, apologizes, and then does the same thing again. The LLM doesn’t do anything with the image itself; it calls some other service to do it. It can’t apologize for the output or try harder to “make sure” that glass of wine is full to the brim. What it says and what actually gets done are entirely disconnected.

      Even for “recognizing” details in an image, some other service parses the image and writes a text description for the LLM. It’s not the same service as the one that does the generation, so no part of this pipeline ever has the chance to realize “hey, this is the same image.”
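
      A rough sketch of how that kind of tool-call pipeline tends to be wired up (every function here is a hypothetical stub, just to show where the disconnect lives):

      ```python
      # Hypothetical orchestration loop: the LLM only ever sees text.
      # Each service below is a stand-in stub for a separate backend.

      def caption_service(image: bytes) -> str:
          return "a glass of wine, about half full"  # stub vision model

      def llm_generate(context: str) -> str:
          return "Here you go! IMAGE_PROMPT: a wine glass filled to the brim"  # stub LLM

      def image_gen_service(prompt: str) -> str:
          return "https://example.com/generated.png"  # stub image generator

      def handle_turn(user_text: str, user_image: bytes | None) -> str:
          context = user_text
          if user_image is not None:
              # A separate vision service turns pixels into text; the LLM
              # only receives this description, never the image itself.
              context += "\n[image]: " + caption_service(user_image)

          reply = llm_generate(context)  # text in, text out

          if "IMAGE_PROMPT:" in reply:
              # Yet another service renders the image. The LLM never
              # inspects the result, so it cannot notice whether the output
              # is the same as the user's original image.
              head, prompt = reply.split("IMAGE_PROMPT:", 1)
              reply = head + image_gen_service(prompt.strip())
          return reply

      print(handle_turn("make this glass full to the brim", b"fake image bytes"))
      ```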

      • maria [she/her]@lemmy.blahaj.zone

        Yeah, true…

        Tool calls really do obfuscate what exactly is going on in a system that feels continuous.

        They make one issue look like the fault of the main system, even when that’s not really true.

        Me rambling about how bad today’s chatbot web search is and how it could be improved

        Honestly, I’m surprised how little current-day ChatGPT can actually do. Like: web search!

        Many people, including my brother, never really learned to Google, and now just ask ChatGPT for literally almost anything.

        While it’s always better to do the research yourself, when I do decide to have one of these bots run a web search, I feel like… the results should be nicer?

        For example, ChatGPT could reference direct quotes from the sources, which are then checked against the actual sources, and then there’s an if statement (see the sketch after this list):

        • If the text does appear in the source, show a little UI element called “From the Source” with the quote and the source link, so the user knows it’s legit.
        • If it doesn’t appear, ChatGPT messed up, and this “quote” must be removed, or the entire response regenerated. Maybe even show a little “Invalid Citation!” so users know what’s up.
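
        A minimal sketch of that verification step (the helper names and UI strings are placeholders for the idea, not any real product’s API):

        ```python
        import re

        def normalize(text: str) -> str:
            # Collapse whitespace and lowercase so trivial formatting
            # differences don't break exact-quote matching.
            return re.sub(r"\s+", " ", text).lower().strip()

        def verify_citations(quotes: list[tuple[str, str]],
                             fetched_sources: dict[str, str]) -> list[str]:
            """quotes is a list of (quoted_text, source_url); fetched_sources
            maps url -> full page text pulled during the web search."""
            ui_elements = []
            for quote, url in quotes:
                source_text = fetched_sources.get(url, "")
                if normalize(quote) in normalize(source_text):
                    # Quote verified verbatim against the fetched page.
                    ui_elements.append(f'From the Source: "{quote}" ({url})')
                else:
                    # The model hallucinated or mangled the quote; flag it
                    # (or drop it, or regenerate the whole response).
                    ui_elements.append(f'Invalid Citation! "{quote}" not found at {url}')
            return ui_elements

        # Toy usage with a stubbed fetched page:
        sources = {"https://example.com/article": "The study found a 12% increase in yield."}
        quotes = [("a 12% increase in yield", "https://example.com/article"),
                  ("a 50% increase in yield", "https://example.com/article")]
        for line in verify_citations(quotes, sources):
            print(line)
        ```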

        But no, all these companies just love presenting their LLMs as perfect oracles.

        But hey, what do I know? I’m just a silly little consumer.

        More rambling

        My gosh, I think way too much about this kind of scaffolding every day…

        Current-day LLMs are oversold and overhyped relative to what they can do.

        • They can’t replace a software dev, but they can make simple-to-medium-complexity HTML demos with bad UX.
        • They can’t generate novel ideas (let’s ignore AlphaEvolve for now…), but they’re great at language comprehension and categorization.

        LLMs are an important stepping stone toward what we would call “AI”, but current consumer-facing systems are spectacularly meh, and LLMs are being overused in places they shouldn’t be.