I’m rather curious to see how the EU’s privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn’t have a paywall)

  • Primarily0617@kbin.social
    arrow-up
    15
    arrow-down
    1
    ·
    2 years ago

    it’s crazy that “it’s too hard :(” has become an acceptable justification for just ignoring the law within tech circles

    • BrianTheeBiscuiteer@lemmy.world
      English
      arrow-up
      1
      ·
      2 years ago

      I’m not an AI expert, and I wouldn’t say it is too hard, but I believe removing a specific piece of data from a model is like trying to remove excess salt from a stew. You can add things to make the stew less salty but you can’t really remove the salt.

      The alternative, which is a lot of effort but boo-hoo for big tech, is to throw out the model and start over without the data in question. These companies would do well to start with models built on public or royalty-free data and then add riskier data on top of that (so you only have to rebake starting from the “public” version).
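
A minimal sketch of that "rebake from the public version" idea, using a toy one-variable linear model in plain Python. Everything here is hypothetical and for illustration only: the data, the user IDs, and the `train` and `forget` helpers. It shows retraining from a saved checkpoint, not a real unlearning technique, and it says nothing about the cost of doing this at the scale of a large model.

```python
def train(weights, records, lr=0.01, epochs=200):
    """Fit y ~ w*x + b by plain gradient descent, starting from `weights`."""
    w, b = weights
    for _ in range(epochs):
        for x, y in records:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


# Hypothetical data: a "public" set and per-user "risky" records.
public_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
risky_data = {"alice": (3.0, 7.5), "bob": (4.0, 9.0)}

# Step 1: train on public data only and keep the result as a checkpoint.
base_checkpoint = train((0.0, 0.0), public_data)

# Step 2: continue training on the risky data, starting from the checkpoint.
full_model = train(base_checkpoint, list(risky_data.values()))


def forget(user_id):
    """Rebuild the model from the public checkpoint without one user's record."""
    remaining = [rec for uid, rec in risky_data.items() if uid != user_id]
    return train(base_checkpoint, remaining)


# A removal request only costs a re-run of step 2, not of step 1.
model_without_alice = forget("alice")
```

The rebuilt model never saw the removed record after the checkpoint, which is the property the comment is after; the trade-off is that every removal request still repeats the second training stage.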

      • Grandwolf319@sh.itjust.works
        English
        arrow-up
        1
        ·
        2 years ago

        Replace salt with poison or an allergenic substance and it fully holds. If a batch has been contaminated, then yes, you should try again.

        But now that the cat is out of the bag, other companies are less willing to let their content be scraped, given how valuable it can be.

        I think big tech knew this: that they could only build these models on unfiltered data before the AI craze.

      • GoosLife@lemmy.world
        English
        arrow-up
        1
        ·
        edit-2
        2 years ago

        If there’s something illegal in your dish, you throw it out. It’s not a question. I don’t care that you spent a lot of time and money on it. “I spent a lot of time preparing the circumstances leading to this crime” is not an excuse, neither is “if I have to face consequences for committing this crime, I might lose money”.

    • Alien Nathan Edward@lemm.ee
      English
      arrow-up
      1
      ·
      2 years ago

      I just saw an article that said that ISPs are trying to whine their way out of listing the fees they charge because it’s too hard. Which is wild because they certainly know what I owe them after I sign the contract, but somehow it’s just impossible for them to determine right up until the moment that I’m obligated to pay it.

    • FaceDeer@kbin.social
      arrow-up
      0
      arrow-down
      1
      ·
      2 years ago

      It’s more like the law is saying you must draw seven red lines, all of them strictly perpendicular, some with green ink and some with transparent ink.

      It’s not “virtually” impossible, it’s literally impossible. If the law requires that it be possible then it’s the law that must change. Otherwise it’s simply a more complicated way of banning AI entirely, which means that some other jurisdiction will become the world leader in such things.

      • Primarily0617@kbin.social
        arrow-up
        1
        ·
        2 years ago

        ok i guess you don’t get to use private data in your models too bad so sad

        why does the capitalistic urge to become “the world leader” in whatever technology-of-the-month is popular right now supersede a basic human right to privacy?

        • LittleLordLimerick@lemm.ee
          English
          arrow-up
          0
          arrow-down
          1
          ·
          2 years ago

          ok i guess you don’t get to use private data in your models too bad so sad

          You seem to have an assumption that all AI models are intended for the sole benefit of corporations. What about medical models that can predict disease more accurately and more quickly than human doctors? Something like that could be hugely beneficial for society as a whole. Do you think we should just not do it because someone doesn’t like that their data was used to train the model?

          • Primarily0617@kbin.social
            arrow-up
            1
            ·
            2 years ago

            You seem to have an assumption that all AI models are intended for the sole benefit of corporations.

            You seem to have the assumption that they’re not. And that “helping society” is anything more than a happy accident that results from “making big profits”.

            What about medical models

            A pretty big “what if” when every single model that’s been tried for the purpose you suggest so far has either predicted based off the age of a medical imaging scan, or off the doctor’s signature in the corner of one.

            Are you asking me whether it’s a good idea to give up the concept of “Privacy” in return for an image classifier that detects how much film grain there is in a given image?

            • LittleLordLimerick@lemm.ee
              English
              arrow-up
              0
              ·
              2 years ago

              You seem to have the assumption that they’re not. And that “helping society” is anything more than a happy accident that results from “making big profits”.

              It’s not an assumption. There’s academic researchers at universities working on developing these kinds of models as we speak.

              Are you asking me whether it’s a good idea to give up the concept of “Privacy” in return for an image classifier that detects how much film grain there is in a given image?

              I’m not wasting time responding to straw men.

              • Primarily0617@kbin.social
                arrow-up
                1
                ·
                edit-2
                2 years ago

                There’s academic researchers at universities working on developing these kinds of models as we speak.

                Where does the funding for these models come from? Why are they willing to fund those models? And in comparison, why does so little funding go towards research into how to make neural networks more privacy-compatible?

                I’m not wasting time responding to straw men.

                1. Please learn what a straw man argument is
                2. The technology you’re describing doesn’t exist, and likely won’t for a very long time, so all you’re doing is allowing data harvesting en masse in return for nothing. Your hypothetical would have more teeth if it were anywhere close to being anything but a hypothetical.
      • Bogasse@lemmy.ml
        English
        arrow-up
        0
        ·
        edit-2
        2 years ago

        How is “don’t rely on content you have no right to use” literally impossible?

        We teach children that there is a Google filter to show only the CC images (that they should use for their presentations).

        Also, it’s not like we are talking about small companies here. A new billion-making industry is being born, and it could totally afford contracts with big platforms that would allow it to use their content.

        • rebelsimile@sh.itjust.works
          English
          arrow-up
          0
          ·
          2 years ago

          And the rest of the data Google has been viewing, cataloging and selling back to everyone for years, because they’re legally allowed to do so… you don’t see the irony in that?

          • Bogasse@lemmy.ml
            English
            arrow-up
            0
            ·
            edit-2
            2 years ago

            Are they selling back scraped content? I thought it was only user behavior through the ad network?

            As for cataloging, at least it is opt-out through robots.txt 🤷

            EDIT: plus, “we are already doing bad” is never a good argument to continue doing bad. If Google were at fault, this could build the traction to slap their ass
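
For reference, a small sketch of what that robots.txt opt-out looks like and how a well-behaved crawler would check it, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot` and the URLs are made up for illustration, and the whole mechanism is voluntary: it only works if the crawler chooses to honor the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The opted-out crawler is refused; any other crawler falls under the "*" rules.
blocked = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/photos/1.jpg")
allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/photos/1.jpg")
```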

            • rebelsimile@sh.itjust.works
              English
              arrow-up
              0
              ·
              2 years ago

              Google crawls the internet, archives entire actual photos, large snippets (at least) from every website it sees, aggregates it into a different form and serves it back to people for profit. It’s the same business model, different results with the processing of the data.

              • bobettes_bob@kbin.social
                arrow-up
                0
                ·
                2 years ago

                Google doesn’t sell the data they collect… They sell ads and use their data to better target people with said ads. Third parties are paying google to target their ads to the right people.

                • rebelsimile@sh.itjust.works
                  English
                  arrow-up
                  0
                  ·
                  2 years ago

                  You go to Google because of the data they collected from the open internet. People’s photos, articles they’ve written, books, etc. They aggregate it, process it and serve it back to you alongside ads. They also collect data about you and sell that as well. But no one would go to Google if they hadn’t aggregated, processed and repackaged the internet’s data.

                  • bobettes_bob@kbin.social
                    arrow-up
                    0
                    ·
                    2 years ago

                    They also collect data about you and sell that as well.

                    No, they don’t. Why would they sell the data they use to target ads? If other corporations could just buy the data, they wouldn’t need to pay Google to target the ads; they’d just buy the data and do it themselves. Google isn’t a data broker. They keep the data for themselves, and it would be business suicide to sell all the data they collect.

        • LittleLordLimerick@lemm.ee
          English
          arrow-up
          0
          ·
          2 years ago

          How is “don’t rely on content you have no right to use” literally impossible?

          At the time they used the data, they had a right to use it. The participants later revoked their consent for their data to be used, after the model was already trained at an enormous cost.

          • Bogasse@lemmy.ml
            English
            arrow-up
            1
            ·
            2 years ago

            I have to admit my comment is not really relevant to the article itself (also, I read only the free part of it).

            It was more a reaction to the comment above, which felt more generic. My concern about LLMs is that I could never find an auditable list of websites that were crawled, which would be reasonable to ask for, I think.