April 8, 2026
Google Just Told the Web What's Coming: Agents That Don't Follow robots.txt
Last week, Google quietly updated its crawlers documentation to introduce a new fetcher: Google-Agent. It's the user-agent behind Project Mariner and future agent products built on Google's infrastructure — AI agents that browse the web, take actions, and retrieve content on behalf of users.
If you publish content on the web, this is worth paying attention to. Not because Google-Agent is uniquely bad (it's actually more transparent than most of what we track) but because of what it signals about where the web is headed.
What Google-Agent Actually Is
Google-Agent is a "user-triggered fetcher." When a user asks a Google-hosted AI agent to do something on the web — find information, compare products, complete a task — Google-Agent is the thing that goes and does the browsing.
It identifies itself with a clear user-agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Google-Agent; +https://developers.google.com/crawling/docs/crawlers-fetchers/google-agent) Chrome/W.X.Y.Z Safari/537.36
It publishes its IP ranges in a dedicated user-triggered-agents.json file. It has a verifiable reverse DNS pattern (google-proxy-***-***-***-***.google.com). And Google is experimenting with a web-bot-auth protocol for cryptographic agent identity — essentially letting agents prove who they are.
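A site owner can use both signals to check whether a request claiming to be Google-Agent really is. A minimal sketch in Python: the JSON shape and prefixes below are illustrative placeholders modeled on Google's published range files, not the real Google-Agent ranges — fetch the published user-triggered-agents.json for those.

```python
import ipaddress
import socket

# Illustrative sample shaped like Google's published range files.
# These prefixes are documentation-reserved placeholders, NOT real
# Google-Agent ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

def ip_in_published_ranges(ip: str, ranges: dict) -> bool:
    """Check a client IP against a published prefix list."""
    addr = ipaddress.ip_address(ip)
    for entry in ranges.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix and addr in ipaddress.ip_network(prefix):
            return True
    return False

def reverse_dns_confirms(ip: str, suffix: str = ".google.com") -> bool:
    """Forward-confirmed reverse DNS: the PTR record must match the
    expected suffix AND resolve back to the original IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]             # reverse lookup
        if not host.endswith(suffix):
            return False
        return ip in socket.gethostbyname_ex(host)[2]  # forward confirm
    except OSError:
        return False
```

The forward-confirmation step matters: anyone can point a PTR record at a google.com hostname, but only Google can make that hostname resolve back to the source IP.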
This is more identification infrastructure than most AI companies have bothered to build.
The robots.txt Question
Here's where it gets complicated. Google's documentation states plainly that Google-Agent, like all user-triggered fetchers, "generally ignores robots.txt rules" because the requests originate from explicit user actions rather than autonomous crawling.
The logic is familiar: if a human user is directing the agent, the agent is acting more like a browser than a crawler. Browsers don't check robots.txt. Why should an agent acting on a user's behalf? Perplexity made a similar argument in Q2 2025 when it launched its own agentic functionality.
To AI companies, it's a reasonable argument. It's also one that should make publishers uncomfortable.
In our Q3 & Q4 2025 State of the Bots report, we found that approximately 30% of AI bot scrapes already violated explicit robots.txt restrictions. OpenAI's ChatGPT-User led the pack with a 42% non-compliance rate against disallowed content. The robots exclusion protocol, as we've documented extensively, "carries no enforcement mechanism and relies entirely on the goodwill of operators."
Google-Agent doesn't violate robots.txt. It categorically opts out of it. That's a meaningful distinction — Google is telling you upfront rather than quietly scraping disallowed paths — but the practical outcome for site owners is the same: your robots.txt file doesn't apply to agents.
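For contrast, here is what a compliant autonomous crawler does before every fetch — parse robots.txt and honor it — which is exactly the step a user-triggered fetcher skips. A minimal sketch using Python's standard-library parser, with hypothetical rules a publisher might serve:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a publisher might serve at /robots.txt.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A compliant crawler consults the rules before each fetch...
print(rules.can_fetch("GPTBot", "/articles/some-post"))   # True
print(rules.can_fetch("GPTBot", "/private/report.html"))  # False

# ...while a user-triggered fetcher goes straight to the URL,
# so the Disallow line above never enters the picture.
```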
And when Google establishes this norm, it makes it significantly easier for every other agent framework to follow suit.
Credit Where It's Due
As far as agents go, Google is doing something genuinely different from what we've been tracking across the rest of the AI ecosystem, and it deserves recognition.
They're telling you who they are. They're publishing IP ranges. They're building verification mechanisms. Compare this to the nearly 40 third-party scraping vendors we identified in our research — companies that help other AI companies rotate IPs, spoof user-agents, and route through residential proxies specifically to avoid detection.
The ads integrity angle matters too. Google made $68.6 billion in advertising revenue in 2025. They have a massive financial incentive to make sure AI agents interacting with the web don't poison their ad metrics. The careful documentation, the identifiability infrastructure — these aren't just courtesy to publishers. Google needs the audit trail to protect its own business.
Maintaining ads integrity in the face of agent visitors was a problem we first identified in the Q2 State of the Bots report. Google has perhaps decided that, in the long-term interest of protecting the ads ecosystem, disclosing the agent is worth more than hiding it. That alignment of incentives is genuinely useful for everyone.
The Perplexity Problem
To understand why Google's transparency is noteworthy, look at what happens without it.
In March, a federal judge blocked Perplexity's Comet AI shopping agent from accessing Amazon. The core allegation: Perplexity deliberately disguised Comet as a regular Chrome browser session to evade detection. Amazon said it warned Perplexity at least five times starting in November 2024. This is the opposite of what Google is doing. Perplexity made a calculated decision to hide its agent's identity rather than announce it. And the court sided with Amazon.
The Amazon-Perplexity case exposes a tension that every company building AI agents is going to face: do you identify your agent honestly, or do you disguise it to avoid getting blocked? If you identify yourself, sites can refuse you access. If you don't, you might get sued — or worse, you erode the trust that makes the open web work at all.
Google chose identification while Perplexity chose disguise. The market is going to have to pick a lane, and right now there's no industry standard forcing the choice. That vacuum is where things get messy.
What This Actually Means for Publishers
The bots we've been tracking for the past year — GPTBot, ClaudeBot, PerplexityBot — are readers. They visit pages, pull content, and use it to generate answers. That alone has been rough for publishers: in our data, AI referral click-through rates dropped from 0.8% to 0.27% over the course of 2025. The content gets consumed, but almost no traffic comes back.
Google-Agent is something different. It's not a reader; it navigates pages, fills out forms, clicks buttons, completes tasks. An agent that finishes a purchase on your site might actually be more valuable than a human visit. An agent that summarizes your content without ever sending the user your way is the same extractive dynamic we've been documenting, just dressed up differently.
While the "user-triggered" framing lets Google draw a clean line today, that line will get blurry fast. A user saying "find me the best laptop deal" and an agent independently visiting 50 review sites starts to look a lot like crawling with extra steps. And as agents get more autonomous — chaining tasks together with less and less human oversight — the distinction between "user-triggered" and "autonomous" becomes mostly semantic. The outcome, however, is the same: page views across the internet are going to climb, and an ever-larger share of them will be AI visitors.
Where TollBit Fits
Google identifying its agent is a good start. But identification doesn't answer what happens next: what content does that agent get? In what format? On whose infrastructure? This is what TollBit is for.
The core problem is that AI agents are showing up at sites built for humans. They're parsing HTML meant for browsers, hammering CDNs designed for human traffic patterns, and creating unpredictable load that degrades the experience for actual readers. Google-Agent at least tells you it's coming. Most don't.
Agent Site is where you start. You stand up a dedicated subdomain that serves AI agents content optimized for how they actually read — semantic markdown for simpler bots, interactive HTML preserving forms and buttons for advanced agents like Google-Agent that need to navigate and act. Agent traffic gets cleanly separated from your human-facing site. Your CDN bill stays predictable. Your real users don't feel the impact. And you decide which agents get access and on what terms — with the ability to layer on usage-based pricing through our Bot Paywall or programmatic licensing when you're ready to monetize.
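As an illustration only — this is a hypothetical sketch, not TollBit's actual implementation — the routing decision on such a subdomain could key off the user-agent tokens these products publish:

```python
# Hypothetical routing sketch for an agent-facing subdomain.
# The bot names are real published user-agent tokens; the routing
# policy itself is illustrative.
SIMPLE_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # read-only bots
ADVANCED_AGENTS = ("Google-Agent",)                     # navigate and act

def pick_format(user_agent: str) -> str:
    if any(token in user_agent for token in ADVANCED_AGENTS):
        return "interactive-html"  # preserve forms and buttons
    if any(token in user_agent for token in SIMPLE_BOTS):
        return "markdown"          # semantic text, no layout chrome
    return "deny-or-paywall"       # unidentified traffic: your call

print(pick_format("Mozilla/5.0 (compatible; Google-Agent; ...)"))
# interactive-html
print(pick_format("GPTBot/1.1"))
# markdown
```

The point of the sketch is the separation itself: identified agents get a format matched to their capabilities, and everything else hits whatever access policy you've chosen.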
Visibility into that traffic matters more than people realize. Most publishers we talk to don't know which AI bots are hitting their site, how often, or what pages they care about. TollBit Analytics gives you that picture across your entire portfolio — broken down by AI platform, by bot, by individual page — so you can see where AI demand concentrates and make informed decisions about access.
We publish quarterly research on AI bot behavior. Read the latest at tollbit.com/state-of-the-bots or visit tollbit.com to see how you can prepare for the agentic web.