April 2, 2026
Optimizing your site for agentic visitors: Markdown, context, tokens and more
The web was built for browsers. In a few years, the most frequent "visitor" to your site won't be a human behind a screen; it will be an AI agent.
For 30 years, we've optimized the web for visual rendering. We've packed pages with JavaScript, CSS, tracking pixels, images, and layout code so that humans get a rich, interactive experience.
But for AI systems, specifically those using Retrieval-Augmented Generation (RAG), that visual "noise" is an expensive tax.
At TollBit, we've formatted content in Markdown to aid in RAG since our first day. Here is why Markdown matters, how it transforms your content, why it's the key to saving your CDN budget, and the infrastructure TollBit adds beyond Markdown.
What is Markdown?
Markdown is a lightweight formatting syntax used to represent text in a clean, structured format. It keeps the meaningful structure of a document, such as headings, lists, links, and emphasis, while removing the presentation code used to render a webpage.
This makes Markdown both human readable and easy for machines to process.
A browser needs HTML to render a site, but delivering content to AI in HTML is inefficient. HTML pages contain scripts, styling, ads, and other elements designed for rendering in browsers, whereas Markdown delivers only the core content.
For AI systems that need to retrieve and process information quickly and efficiently, this difference matters.
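To make the difference concrete, here is a minimal sketch of an HTML-to-Markdown pass built on Python's standard `html.parser`. It is an illustration only, not TollBit's converter: the class name, the list of skipped tags, and the sample page are all assumptions, and inline formatting such as bold text or links is simply flattened to plain text.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Keep headings, paragraphs and list items; drop scripts, styles and nav."""

    SKIPPED = {"script", "style", "nav", "aside", "footer"}  # assumed "noise" tags
    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.lines = []       # finished Markdown lines
        self.current = None   # text pieces of the block being read, or None
        self.prefix = ""
        self.skip_depth = 0   # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.PREFIXES and self.skip_depth == 0:
            self.current = []
            self.prefix = self.PREFIXES[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.PREFIXES and self.current is not None:
            # Collapse whitespace and emit the block with its Markdown prefix.
            text = " ".join("".join(self.current).split())
            if text:
                self.lines.append(self.prefix + text)
            self.current = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


sample = """
<html><head><style>body { margin: 0 }</style>
<script>window.track = true;</script></head>
<body><nav><ul><li>Home</li><li>About</li></ul></nav>
<h1>Release notes</h1>
<p>Version 2 is <b>faster</b>.</p>
<ul><li>New parser</li><li>Smaller bundle</li></ul>
</body></html>
"""

markdown = html_to_markdown(sample)
print(markdown)
```

The script, stylesheet, and navigation menu never reach the output; only the heading, the paragraph, and the two list items survive.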
Why Markdown? The token economy
Large Language Models (LLMs) don't "see" a webpage the way we do. They read it as a sequence of tokens. Every unnecessary <div>, <span>, or <script> tag in your HTML adds tokens that the model has to pay for and, more importantly, spend "attention" on.
For developers building RAG systems, the context window acts as the model's working memory. Every token in the input consumes space in that window. When large portions of that space are filled with HTML boilerplate instead of meaningful content, the model has less room to focus on the information that actually matters. This increases compute costs, reduces retrieval efficiency, and can degrade the quality of the final output.
Markdown solves this by stripping away the presentation and preserving only the semantic structure.
- Human-Readable, Machine-Processable: Markdown is lightweight. It uses simple markers like `#` for headers and `*` for lists, which LLMs interpret with greater accuracy.
- Signal over Noise: By converting a 200KB HTML page into a 10KB Markdown file, we ensure the model focuses on your content, not your navigation menu.
- Crawl budget efficiency: Because Markdown is lighter to parse, each page consumes less crawl budget, allowing AI systems to crawl and process more pages from a site.
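A rough back-of-the-envelope sketch shows what this means for a context window. The 200KB and 10KB sizes come from the example above; the 4-bytes-per-token ratio and the 128,000-token window are assumptions chosen for illustration. Real counts depend on the model's tokenizer, and markup-heavy HTML may tokenize quite differently from prose.

```python
def estimated_tokens(size_kb: float, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate from payload size; the ratio is an assumption."""
    return int(size_kb * 1024 / bytes_per_token)


def pages_that_fit(context_window: int, size_kb: float) -> int:
    """How many pages of this size fit in the window, ignoring the prompt itself."""
    return context_window // estimated_tokens(size_kb)


CONTEXT_WINDOW = 128_000  # hypothetical window size, in tokens

html_tokens = estimated_tokens(200)     # the 200KB HTML page
markdown_tokens = estimated_tokens(10)  # the 10KB Markdown file

print("HTML:", html_tokens, "tokens,", pages_that_fit(CONTEXT_WINDOW, 200), "pages fit")
print("Markdown:", markdown_tokens, "tokens,", pages_that_fit(CONTEXT_WINDOW, 10), "pages fit")
```

Under these assumptions, the window holds 2 raw HTML pages or 50 Markdown pages, which is the practical meaning of "signal over noise" for a RAG pipeline.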
Real-world transformation: from HTML to machine ready
The best way to understand the impact is to look at real data.
Below are 10 real publisher articles from sites on the TollBit network. These are not hypothetical examples or test pages. Each of these articles has been accessed by AI companies paying for access through TollBit as part of real RAG workflows.
For each article, we compared the original HTML page with the Markdown version returned through TollBit.

| Publisher and URL type | Raw HTML size | TollBit Markdown size | Token reduction |
|---|---|---|---|
| Mashable Article | 304 KB | 51.9 KB | 81.5% |
| Daily Record Article | 524.6 KB | 15.4 KB | 97.4% |
| Express.co.uk Article | 475.1 KB | 42.6 KB | 94.9% |
| Forbes Article | 1201.2 KB | 41 KB | 97.1% |
| Goal.com Article | 378 KB | 13.2 KB | 96% |
| IGN Article | 193.2 KB | 13 KB | 94.7% |
| Mirror.co.uk Article | 448.9 KB | 12.8 KB | 97.5% |
| Newsweek Article | 842.3 KB | 22.3 KB | 97.5% |
| Readers Digest Article | 256.9 KB | 77.1 KB | 56.1% |
| ZD Net Article | 756.7 KB | 34.9 KB | 96.7% |
Across these examples, TollBit removes unnecessary HTML overhead before the content reaches an AI system and delivers only the structured article content. By serving this optimized version, TollBit-enabled publishers give AI companies a better alternative to web scraping.
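One note on reading the table: the size columns are in kilobytes, while the last column is a token reduction, and the two are related but not identical. The short sketch below computes the byte-size reduction for three of the rows so it can be compared with the listed token figure. How the token counts were measured is not stated here, so the comparison is illustrative only.

```python
rows = [
    # (publisher, raw HTML KB, Markdown KB, token reduction % as listed in the table)
    ("Mashable", 304.0, 51.9, 81.5),
    ("Forbes", 1201.2, 41.0, 97.1),
    ("Readers Digest", 256.9, 77.1, 56.1),
]


def size_reduction_pct(raw_kb: float, markdown_kb: float) -> float:
    """Percentage of bytes removed when going from raw HTML to Markdown."""
    return round((1 - markdown_kb / raw_kb) * 100, 1)


for publisher, raw_kb, markdown_kb, token_pct in rows:
    size_pct = size_reduction_pct(raw_kb, markdown_kb)
    print(f"{publisher}: {size_pct}% smaller by size, {token_pct}% fewer tokens (listed)")
```

The byte figures come out at 82.9%, 96.6%, and 70.0%, close to but not the same as the token figures, because tokenizers do not convert bytes to tokens at a fixed rate.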
Key Takeaways
HTML pages are overwhelmingly noisy for AI systems
Across these articles, most of what an AI system downloads from the open web is not the article itself. The bulk is scripts, layout elements, and other rendering code that adds little value for machine consumption.
Markdown dramatically reduces token load
Six of the ten examples show reductions above 95 percent. That means lower token usage, more available context window space, and lower inference costs.
Faster ingestion improves RAG performance
Instead of scraping and processing large HTML pages, which can take anywhere from a few seconds to over a minute, AI systems can fetch structured content in 0.25 seconds through TollBit.
Through TollBit's Markdown, content is faster to ingest, cheaper to process, and produces better RAG results.
Reclaiming your bandwidth
When a scraper hits your site, it forces your servers to serve high-res images, CSS libraries, and JavaScript bundles. You pay the CDN egress bill for a visitor who will never see an ad or subscribe to your newsletter.
By routing AI traffic through TollBit's Markdown stream, you offload this burden.
- Lower CDN Bills: Serving a 10KB Markdown file is significantly cheaper than serving a 200KB HTML page.
- Reduced Infrastructure Load: AI systems can fetch optimized content through TollBit without repeatedly hitting the origin site.
- Human Performance: By moving bot traffic to a dedicated pipe, your site remains fast and responsive for your actual human readers.
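To put rough numbers on the bandwidth argument, the sketch below compares monthly transfer for the 200KB and 10KB payloads mentioned above. The request volume and the price per GB are hypothetical placeholders, not TollBit or CDN figures; substitute your own. It also counts only the page payload, so it understates the HTML case when a scraper pulls images, CSS, and JavaScript as well.

```python
def monthly_egress_gb(requests: int, payload_kb: float) -> float:
    """Total monthly transfer in GB for a given request count and payload size."""
    return requests * payload_kb / (1024 * 1024)


BOT_REQUESTS_PER_MONTH = 10_000_000  # hypothetical traffic volume
PRICE_PER_GB_USD = 0.08              # hypothetical egress price

html_gb = monthly_egress_gb(BOT_REQUESTS_PER_MONTH, 200)     # 200KB HTML page
markdown_gb = monthly_egress_gb(BOT_REQUESTS_PER_MONTH, 10)  # 10KB Markdown file

for label, gb in (("HTML", html_gb), ("Markdown", markdown_gb)):
    print(f"{label}: {gb:,.1f} GB, about ${gb * PRICE_PER_GB_USD:,.2f} per month")

print(f"Transfer ratio: {html_gb / markdown_gb:.0f}x")
```

Whatever volume and price you plug in, the ratio between the two payloads stays at 20x, since it depends only on the 200KB and 10KB sizes.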
The reality check: where Markdown falls short
Markdown is the best tool for document retrieval, but it isn't a silver bullet. It's important to understand its limitations:
- Document Hierarchy: While Markdown handles headers well, deeply nested data structures (like complex, multi-axis tables) can sometimes lose nuance compared to a full JSON schema.
- Interactive Content: If your page relies on a calculator, a dynamic map, or a live data feed, Markdown will only capture the static "snapshot." It cannot preserve the interactive logic of a web application.
- Visual Presentation: Markdown is about meaning, not design. Fonts, colors, layouts, and other visual elements are intentionally removed.
These limitations are not flaws. They simply reinforce that Markdown is best used as a machine-optimized representation of the core content, not a replacement for the human web.
What TollBit adds to the Markdown baseline
Simply converting text to Markdown is a utility. TollBit is an infrastructure layer that adds the controls publishers need to actually manage how their content is used by AI systems.
- Content Filtering & Controls: Publishers can toggle off specific elements. Want the text but not the image captions? Want to strip out sponsored widgets or "Recommended Reading" sidebars? Our filters do this at the source.
- Legal Certainty: Every Markdown response we deliver includes licensing metadata, ensuring your content is attributed and used within the terms you've set.
- Preserved Interactivity: Agents can request a page as Markdown or as agent-optimized HTML, which lets them take actions on the site much faster than they could on the regular pages.
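As a mental model for the filtering controls, the sketch below treats an article as a list of labeled blocks and drops the kinds a publisher has switched off. Everything in it is hypothetical: the `Block` type, the kind labels, and the `apply_filters` function are illustrative names, not TollBit's API or configuration format.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # hypothetical labels: "paragraph", "image_caption", "sponsored", ...
    text: str


def apply_filters(blocks: list[Block], disabled: set[str]) -> list[Block]:
    """Drop every block whose kind the publisher has toggled off."""
    return [block for block in blocks if block.kind not in disabled]


article = [
    Block("paragraph", "The council approved the budget on Tuesday."),
    Block("image_caption", "Council members during the vote."),
    Block("sponsored", "Try our partner's new app."),
    Block("recommended", "Recommended Reading: five more stories."),
]

# Publisher settings: keep the text, drop captions and promotional widgets.
disabled_kinds = {"image_caption", "sponsored", "recommended"}

filtered = apply_filters(article, disabled_kinds)
print("\n\n".join(block.text for block in filtered))
```

Filtering at the source like this means the excluded elements never appear in the response an AI system receives.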
The bottom line
The web is being re-indexed for a new generation of users: AI agents.
Continuing to serve them the same HTML pages designed for human browsers is inefficient. As AI systems become a primary way information is retrieved, publishers need infrastructure that allows them to deliver content in a format machines prefer while maintaining control over how it is used.
At TollBit, we've spent the last two years building the infrastructure to bridge this gap. Markdown is one piece of it. The rest of the value comes from filtering controls, licensing, delivery layers, and future optimizations.
Get started with TollBit Agent Sites