Last month, Mustafa Suleyman, the CEO of Microsoft's AI division, said during a panel interview that, in his opinion, training generative AI services on the information found on almost all websites counts as "fair use". He added, "Anyone can copy it, recreate with it, reproduce with it. That has been 'freeware' that’s been the understanding."
Those statements generated a lot of debate online from people who felt Suleyman's opinion showed that companies like Microsoft, OpenAI, Google, and others with AI systems did not care about the ownership of content on the sites used to train Copilot, ChatGPT, Gemini, and other services.
This week, Cloudflare, one of the biggest hosts for websites, announced it would make it easier for its customers to block AI bots from their content. In a blog post, it stated that all of its hosting customers, including its free users, can go into their site's dashboard on Cloudflare, click on the Security option, and then open the Bots section.
They should see a new section called AI Scrapers and Crawlers with a toggle. Clicking on that toggle will block these AI bots from taking content from the website. Cloudflare says it will update this feature in the future "as we see new fingerprints of offending bots we identify as widely scraping the web for model training." It added:
We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection. We will continue to keep watch and add more bot blocks to our AI Scrapers and Crawlers rule and evolve our machine learning models to help keep the Internet a place where content creators can thrive and keep full control over which models their content is used to train or run inference on.
The blog post also offered some information on the top AI bots in terms of requests to Cloudflare-hosted sites. The biggest one is Bytespider, which is operated by China-based ByteDance, the parent company of TikTok, for its Chinese AI services like Doubao. Other top AI scraper bots include Amazonbot, which is reportedly used to gather data for Amazon's Alexa service.
Bytespider is also the top AI bot in terms of the share of Cloudflare websites it accesses, at 40.40 percent. GPTBot, from OpenAI, is a close second with 35.46 percent.