
In a significant move affecting how online content is accessed and used for artificial intelligence training, Cloudflare, a major web infrastructure company, has changed a default setting: AI crawlers that collect data from websites are now blocked by default from accessing sites that use its services.
This shift places control directly in the hands of website owners and operators. Previously, allowing or disallowing specific crawlers required manual configuration or reliance on general web standards such as robots.txt. Now, the baseline is to deny access to identified AI scraping agents.
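For comparison, the older opt-out approach looked something like the following robots.txt sketch. The user-agent names shown are real AI crawlers (GPTBot is OpenAI's, CCBot is Common Crawl's), but the file itself is illustrative, and compliance with robots.txt is voluntary on the crawler's part, which is precisely why enforcement at the network edge is a meaningful change:

    # Block known AI training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # All other crawlers may access the site
    User-agent: *
    Disallow: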
However, this is not a permanent, unchangeable block. The platform gives site administrators the flexibility to override the default: they can allow specific AI crawlers, or grant access to all of them, if they want their content to be available for AI training.
This decision is likely to reshape the flow of data across the internet, particularly the vast amounts of information used to train large language models and other AI systems. It underscores the growing debate around data ownership, usage rights, and the ability of content creators and publishers to decide how their work fuels AI development. While the new default is restrictive, the final decision on scraping access rests with the website owner.
Source: https://www.helpnetsecurity.com/2025/07/01/cloudflare-blocks-ai-crawlers/