• Zetta@mander.xyz
    4 days ago

But it’s not the same; you don’t understand how LLM training works. The original piece of work is not retained at all: the training data is used to tune pre-existing numbers (the model’s weights), and those numbers change slightly as training goes on.

    At no point in time is anything resembling the training data ever present in the 1’s and 0’s of the model.
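To make the “tuning pre-existing numbers” point concrete, here’s a toy sketch with made-up numbers (a single weight and one squared-error gradient step, nothing like a real LLM): the training example nudges the weight slightly, but the example itself is never stored anywhere in the model.

```python
# Toy illustration: one gradient-descent step on a single pre-existing weight.
# The training pair (x, y) influences the weight but is not retained in it.
def sgd_step(weight, x, y, lr=0.01):
    pred = weight * x            # model's prediction with the current weight
    grad = 2 * (pred - y) * x    # gradient of squared error (pred - y)**2
    return weight - lr * grad    # nudge the weight slightly downhill

w = 0.5                          # a pre-existing number in the model
w = sgd_step(w, x=1.0, y=1.0)    # one training example tunes it a little
print(round(w, 3))               # prints 0.51 -- the weight moved slightly
```

After the update, all that remains is the number 0.51; the training pair itself is gone.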

    You are wrong, bring on the downvotes uninformed haters.

    FYI I also agree sampling music should be fine for artists

    • fluxion@lemmy.world
      4 days ago

Yes, weights for individual words/phrases/tokens which, given a particular prompt or set of keywords, might reproduce the original training data almost in its entirety. Hence why it is so obvious when these models have been trained on copyrighted material.

Similarly, I don’t digitally store music in my head verbatim; I store some fuzzy version that I can still reproduce fairly closely when prompted, and I’d still get sued if I’m charging money for performing or recording it, because the “weightings” in my neurons are just an implementation detail of how my brain works and not some active/purposeful attempt to transform the music in any appreciable way.

      • Zetta@mander.xyz
        4 days ago

given a particular prompt or set of keywords, might reproduce the original training data almost in its entirety.

What you describe here is called memorization, and it’s generally considered a flaw/bug, not a feature; it happens with low-quality training data or not enough data. As far as I understand, this isn’t a problem on frontier LLMs with the large datasets they’ve been trained on.
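The “not enough data” failure mode can be sketched with the same kind of toy, made-up numbers as above: hammering a single training example over and over drives the weight to reproduce the target exactly, a crude analogue of memorization from an impoverished dataset.

```python
# Toy illustration of memorization: repeated passes over ONE example
# over-fit the weight until the model reproduces the target verbatim.
def sgd_step(weight, x, y, lr=0.1):
    grad = 2 * (weight * x - y) * x   # gradient of squared error
    return weight - lr * grad

w = 0.0
for _ in range(200):                  # many epochs over a single data point
    w = sgd_step(w, x=1.0, y=1.0)
print(round(w, 6))                    # prints 1.0 -- the target is memorized
```

With a large, varied dataset no single example can dominate the weights like this, which is the intuition behind memorization being a data problem rather than the normal operating mode.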

Either way, just like a photocopier, an LLM can be used to infringe copyright if that’s what someone is trying to do with it; the tool itself does not infringe anything.

    • rumba@lemmy.zip
      3 days ago

      I agree with you, but I also would like to make a point.

      We’ve seen trained models produce exact text from sections in articles and draw anti-piracy watermarks over images.

      Just because it’s turning the content into associations doesn’t mean it can’t, in some circumstances, reproduce exactly what it was trained on. It’s not the intent, but it does happen.

      Midjourney drawing recognisable characters is far more problematic from the copyright and trademark side, but honestly, nothing is stopping you from doing that in Photoshop.

      Millions of unlicensed products are all over ebay, temu and etsy and we didn’t even need AI to make them.