TheThinkingMachine

Was Mustafa right about content on the internet?

Posted on July 2, 2024 by Webmaster

It is being widely reported across the Internet that

Mustafa Suleyman says “anyone can use information on internet to train their AI models for free” and “unless a publisher or news organisation explicitly request to ‘not to scrape or crawl’ their content other than indexing, AI companies can use it to train AI models” and “freely copying and using publicly available content has been the norm online since a long time”. He also said “I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been ‘freeware,’ if you like, that’s been the understanding”, going on to add that “unless a publisher or news organisation explicitly request to ‘not to scrape or crawl’ their content other than indexing, AI companies can use it to train AI models”.

Mustafa acknowledged that “There’s a separate category where a website, or a publisher, or a news organization had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content’”, and admitted “that’s a grey area, and I think it’s going to work its way through the courts.”
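The usual way a website says "do not scrape or crawl me other than for indexing" is a robots.txt file. As a minimal sketch of how a crawler might check such a request, the example below uses Python's standard `urllib.robotparser`. The robots.txt content, the crawler names `ExampleSearchBot` and `ExampleAITrainer`, and the helper `may_fetch` are all hypothetical, invented for illustration. Note that robots.txt is a voluntary convention: honouring it is a matter of crawler behaviour, and whether ignoring it has legal consequences is exactly the grey area described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a (made-up) search indexer is welcome,
# a (made-up) AI training crawler is refused, everyone else is only
# kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleAITrainer
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    for agent in ("ExampleSearchBot", "ExampleAITrainer", "SomeOtherBot"):
        print(agent, may_fetch(agent, page))
```

A well-behaved crawler would run a check like this before each request and skip any URL for which it returns False.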

So, was Mustafa right?

Nick Lockett, a barrister, solicitor and intellectual property specialist, and the first barrister to write about the law of the internet, in 1993, said:
To use the phrase freeware was unwise because this has a specific meaning. Freeware is distributed at no monetary cost to the end user, but there isn’t a standard set of rights, license, or EULA for freeware. Each publisher theoretically defines its own rules in relation to modification, redistribution, reverse engineering etc., and may prohibit some of these steps, but it is always free to read, unless it is behind a firewall. Freeware is usually used to promote the producer’s other items, to promote an upgraded paid-for version, or to increase the producer’s publicity. Freeware is, as the name suggests, “free”. In software terms, it is free to use. In terms of pictures and text, it is free to use and free for humans to train on.

The question that is currently arising is whether it is free to train on if you are an AI system. Since the start of the Internet, before even images were possible online, it has been accepted that if you put your work on the internet, then anyone can come along and freely read it. They can also incorporate the ideas into commercial works, as long as they don’t copy it and redistribute it. So, reading a work and summarising it, or expanding on it, or combining the ideas with other ideas is not a copyright breach as long as the end result is not substantively the individual freeware originally supplied, just slightly changed by the additions.

Picture artists have little complaint. They have published their works on the internet since the first image browsers arrived and have gained publicity for their works and reputations this way. Countless human art students have studied the great masters and, due to their internet publishing, the lesser known internet artists, and then developed their own styles. In many cases, they have been able to produce new “works in the style of X” without being accused of copyright infringement. What the artists are now complaining about is the fact that it is no longer art students mimicking their work, but that, with the aid of AI, everyone can now mimic their work. But they are simply the first people to be threatened: lawyers too are finding that the new AI tools can write great Cease and Desist letters and do the job of the paralegal.

If someone has put the artist’s work on the internet, then the artist either authorised it, expressly or impliedly, or should have taken adequate steps to remove those works for breach of copyright. If they didn’t, then they acquiesced to the works being present on the internet and now can’t complain that humans trained on them.

AI simply has a better memory and, as I understand it, AI models don’t keep copies of the artist’s work; they simply analyse the style of the artist, and they store what is, in effect, notes of the analysis of that style. That’s no different from a good art student, except that we are all now good art students. It is the extent and scale that the legal cases complain of, and therefore, with all due respect to the artists, that’s not copyright infringement in the UK.

It’s also fair use for students, and the Courts must rule the same in respect of AI. If the Courts rule differently then within 2 or 3 years, if Ray Kurzweil is right, the Courts will be proven idiots. Ray Kurzweil is suggesting that by 2029, the first brain implants will be available which will allow human brains to download instant and extensive knowledge and, theoretically, to download the capability of suddenly being an artist. As someone whose drawings of horses have a distinctly Lowry-esque resemblance, I can’t wait to be able to download the Cartoonist module into my brain implant and draw the world as my brain already sees it.

So in a broad sense, I agree with Mustafa that “any human can use information on internet to train for free…unless a publisher or news organisation explicitly requests not to scrape or crawl their content”, and artists should not put up their works unless they want humans to copy their style.

If humans can do it legally (which they can), then humans with brain implants can also legally do it, and that means that AI systems can also train on it legally, because they’re only doing exactly the same as a human with a brain implant in 2029 will be able to do.

So, yes, in my opinion, “companies can use internet data freely available on the internet to train their staff or their AI models” and also, yes, “freely learning using publicly available content has been the norm online since a long time”. It still doesn’t mean that ChatGPT can provide a copy of an artist’s work without paying the royalty, but if it is simply a new, novel work in the style of Andy Warhol, then sorry, but the bus left on that point decades ago, when Andy Warhol decided not to injunct schools allowing their artists to draw “in the style of Andy Warhol”.

And… it’s likely that AI will make Patent Law unviable too!


(Just for the US lawyers: “Andy Warhol’s opinion has not been sought for the conclusion of this article. Different opinions may exist!”)

Category: ETHICS, NEWS
