TrellisBot is the automated web crawler that powers TrellisSearch. This page explains how it works, what it looks for, how to verify it, and how to control its access to your site.
01 / Overview
TrellisBot is an automated program that systematically browses the web to build and maintain the TrellisSearch index. It follows hyperlinks from page to page, fetches content, and extracts information that powers search results at trellissearch.com.
TrellisBot is designed to be a well-behaved, respectful crawler. It identifies itself clearly in every request, obeys robots.txt rules, respects crawl delays, and does not attempt to access password-protected or otherwise restricted content.
Open index: TrellisSearch is an independent search engine. Appearing in our index is separate from appearing in Google, Bing, or other search engines. Blocking TrellisBot only affects TrellisSearch.
02 / Identification
TrellisBot identifies itself in every HTTP request using the following user agent string:
TrellisBot/1.0 (+https://trellissearch.com/bot.html)
| Field | Value |
|---|---|
| Crawler name | TrellisBot |
| Version | 1.0 |
| robots.txt token | TrellisBot |
| Documentation URL | https://trellissearch.com/bot.html |
| Operator | Trellis Group LLC |
| Contact | support@trellissearch.com |
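As a rough illustration of how a site owner might spot requests that claim to be TrellisBot in their logs, the sketch below matches the documented user agent string. The function name and the loose version pattern are assumptions made for this example; a match only shows what the request claims, not that it is genuine.

```python
import re
from typing import Optional

# Based on the user agent string documented above.
# Assumption: the version number may change, so it is matched loosely.
TRELLISBOT_UA = re.compile(r"\bTrellisBot/(\d+(?:\.\d+)*)")

def claimed_trellisbot_version(user_agent: str) -> Optional[str]:
    """Return the version the request claims, or None if it does not claim to be TrellisBot."""
    match = TRELLISBOT_UA.search(user_agent)
    return match.group(1) if match else None
```

Because the header can be forged, a positive result should be treated as a hint to run the verification described in section 07.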
03 / Behavior
TrellisBot discovers new pages primarily by following hyperlinks found on pages it has already visited. It also processes XML sitemaps referenced in robots.txt files via the Sitemap: directive.
TrellisBot is designed to crawl politely and avoid placing excessive load on web servers. It limits concurrent connections per server, introduces delays between requests, and fully respects Crawl-delay directives. If your server is struggling with crawl traffic from TrellisBot, set a crawl delay in your robots.txt.
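To check which crawl delay your robots.txt actually declares for TrellisBot, something like the following sketch may help. It uses Python's standard `urllib.robotparser`, which is only an approximation of how any particular crawler reads the file; the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

def crawl_delay_for(robots_txt: str, agent: str = "TrellisBot"):
    """Return the Crawl-delay value that applies to `agent`, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)

rules = "User-agent: TrellisBot\nCrawl-delay: 10\n"
delay = crawl_delay_for(rules)
```

Python's parser only accepts whole-number delay values, so a fractional value would be reported as missing here even if a crawler accepts it.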
TrellisBot fetches up to 5 MB of content per page and does not download anything beyond that limit. Most pages never reach it. Very large pages, such as those that embed large blocks of data or generated content, may be only partially indexed.
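The effect of a size cap can be sketched as a capped read from any file-like object. This is not TrellisBot's implementation; it assumes the limit means 5 MiB (5 × 1024 × 1024 bytes), which the text above does not specify, and the helper name is made up for the example.

```python
import io

MAX_BYTES = 5 * 1024 * 1024  # assumption: the 5 MB limit is read as 5 MiB

def read_capped(stream, limit: int = MAX_BYTES):
    """Read at most `limit` bytes and report whether more content remained."""
    body = stream.read(limit)
    truncated = bool(stream.read(1))  # any leftover byte means the page exceeds the cap
    return body, truncated

# Small demonstration with an in-memory stream and a tiny limit.
sample = io.BytesIO(b"x" * 10)
body, truncated = read_capped(sample, limit=4)
```

Running a similar check against your own largest pages gives a quick indication of whether they might be cut off.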
Supported content types: text/html, application/pdf, and text/plain.
TrellisBot does not currently execute JavaScript. Pages that rely entirely on client-side rendering to display content may not be fully indexed. Server-side rendered or static HTML pages will be indexed most accurately.
How often TrellisBot revisits a page depends on how frequently that page changes. Pages that update often are checked more frequently; static pages are revisited less often. Revisit intervals adapt automatically based on observed change history, ranging from daily for very active pages to every 90 days for content that rarely changes.
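The general idea of an adaptive revisit interval can be illustrated with a toy model. The halving and doubling rule below is purely an assumption chosen for clarity and is not TrellisBot's actual scheduling logic; only the 1-day and 90-day bounds come from the text above.

```python
MIN_DAYS = 1    # "daily" lower bound from the text
MAX_DAYS = 90   # upper bound from the text

def next_interval(current_days: float, changed: bool) -> float:
    """Shorten the interval when the page changed, lengthen it when it did not."""
    # Assumption: halve on change, double on no change (illustrative rule only).
    proposed = current_days / 2 if changed else current_days * 2
    return max(MIN_DAYS, min(MAX_DAYS, proposed))
```

Under this toy rule, a page that keeps changing converges toward daily checks, while a page that never changes drifts up to the 90-day ceiling.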
TrellisBot extracts the visible text content of a page — headings, paragraphs, lists, and anchor text. It stores a portion of the page text for snippet generation and relevance scoring. Navigation menus, footers, and repeated boilerplate elements have less influence on indexing than the main body content of a page.
04 / Quality Signals
TrellisSearch uses a multi-signal ranking system that rewards genuine, human-readable content. The following characteristics positively influence how a page ranks:
Our philosophy: TrellisSearch intentionally favors smaller, honest pages over aggressively optimized ones. A well-written page on a personal site can outrank a keyword-stuffed page on a large domain.
05 / Spam & Quality Penalties
TrellisSearch automatically detects and penalizes pages that attempt to manipulate rankings or provide little genuine value to users. The following will result in ranking suppression or removal:
Penalty severity: Detected spam signals result in automatic ranking suppression. Severe cases — such as hidden text or egregious keyword stuffing — can reduce a page's ranking score by up to 99%, effectively removing it from results.
06 / Technical Requirements
To ensure your pages are indexed correctly, keep the following in mind:
TrellisBot must be able to reach your page without authentication, CAPTCHA, or JavaScript-only rendering. Pages behind login walls or paywalls, and pages that require user interaction to display content, will not be fully indexed.
| HTTP Code | What happens |
|---|---|
| 200 OK | Page is fetched and processed for indexing |
| 301 / 302 | TrellisBot follows redirects to the final destination |
| 404 Not Found | URL is marked as permanently gone and removed from the queue |
| 403 Forbidden | Page is skipped; repeated 403s may result in the domain being deprioritized |
| 429 Too Many Requests | TrellisBot backs off and retries later |
| 500 Server Error | A retry is attempted; the URL is skipped if errors persist |
Pages with very little text content — typically fewer than 50 words — are considered low quality and may be skipped or ranked very low. This includes pages that are primarily navigation menus, error pages, or auto-generated index pages with no original content.
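To get a rough sense of whether a page clears the 50-word threshold, you could count its visible words with a sketch like the one below. It uses Python's standard `html.parser` and simply ignores `<script>` and `<style>` content; how TrellisBot itself counts words is not documented here, so treat the result as an estimate. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def visible_word_count(html: str) -> int:
    """Approximate the number of visible words in an HTML document."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(" ".join(extractor.chunks).split())

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>Hello world</h1><p>Three more words</p>"
        "<script>var x = 1;</script></body></html>")
count = visible_word_count(page)
```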
XML sitemaps significantly improve discovery speed. TrellisBot processes up to 10 sitemaps per domain. Sitemap index files are supported. Include your sitemap in robots.txt:
Sitemap: https://example.com/sitemap.xml
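A quick way to confirm which sitemaps your robots.txt declares, and whether you exceed the 10-sitemap limit, might look like this. The sketch relies on `RobotFileParser.site_maps()` from Python's standard library (available from Python 3.8); the function name is an assumption for the example.

```python
from urllib.robotparser import RobotFileParser

MAX_SITEMAPS = 10  # per-domain limit stated above

def declared_sitemaps(robots_txt: str):
    """Return (sitemap_urls, over_limit) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = parser.site_maps() or []  # site_maps() returns None when no Sitemap lines exist
    return urls, len(urls) > MAX_SITEMAPS

urls, over_limit = declared_sitemaps(
    "Sitemap: https://example.com/sitemap.xml\nUser-agent: *\nDisallow:\n"
)
```

Which sitemaps are used when more than 10 are declared is not specified above, so consolidating them behind a sitemap index file is likely the safer choice.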
Use canonical tags to indicate the preferred version of a page when duplicate or similar content exists across multiple URLs. TrellisBot respects <link rel="canonical"> tags and uses the canonical URL as the indexed version.
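To verify that a page exposes the canonical URL you expect, a small check along these lines may be useful. It is a sketch built on Python's standard `html.parser`, with hypothetical class and function names, and it returns the first canonical link it finds.

```python
from html.parser import HTMLParser
from typing import Optional

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]

def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```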
TrellisBot does not execute JavaScript. If your site relies on JavaScript to render content, consider implementing server-side rendering (SSR) or providing static HTML fallbacks to ensure your content is indexable.
TrellisBot cannot read text inside images. If your page uses images to display important information — such as infographics, charts, screenshots of text, banners with text, or logos — that content is invisible to the crawler. Use alt attributes on your <img> tags to describe the image content in plain text:
<img src="infographic.png" alt="Chart showing 40% growth in renewable energy from 2020 to 2025">
Alt text serves two purposes: it gives TrellisBot context about what the image contains, and it improves accessibility for users with screen readers. Descriptive, accurate alt text is always better than generic placeholders like "image" or leaving the attribute empty on meaningful images.
Text in images: If critical page content — headings, product names, descriptions, contact information — only exists inside images with no alt text or surrounding HTML text, TrellisBot will not index that content. Pages that rely heavily on image-based text may rank poorly due to low detected word count.
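One way to find images that give the crawler nothing to read is to list every `<img>` with a missing or empty alt attribute. The sketch below does this with Python's standard `html.parser`; the names are hypothetical. An empty alt is legitimate for purely decorative images, so the output is a review list rather than a list of errors.

```python
from html.parser import HTMLParser

class AltAuditor(HTMLParser):
    """Collect the src of every <img> whose alt attribute is missing or blank."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            alt_text = (attributes.get("alt") or "").strip()
            if not alt_text:
                self.missing.append(attributes.get("src"))

def images_without_alt(html: str) -> list:
    """Return the src values of images that carry no descriptive alt text."""
    auditor = AltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing

flagged = images_without_alt(
    '<img src="a.png" alt="Chart"><img src="b.png"><img src="c.png" alt="">'
)
```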
07 / Verification
The User-Agent header in HTTP requests can be set to anything by anyone. To confirm that a request genuinely comes from TrellisBot rather than an impersonator, perform a reverse DNS lookup on the source IP address.
host <IP_ADDRESS>
Impersonators: If a request claims to be TrellisBot but the reverse DNS lookup does not confirm a TrellisSearch hostname, the request is not from TrellisBot. You may safely block it.
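The lookup can also be scripted. The sketch below performs a reverse lookup and then, as an extra step that is common practice but not described above, resolves the returned hostname forward again to confirm it maps back to the same IP. The hostname suffix that genuine TrellisBot addresses resolve to is not stated on this page, so `allowed_suffix` is a parameter you must supply yourself; all function names are hypothetical.

```python
import socket

def reverse_lookup(ip: str) -> str:
    """Return the primary hostname for an IP address (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(hostname: str) -> set:
    """Return the set of IP addresses a hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}

def is_verified_crawler(ip, allowed_suffix, reverse=reverse_lookup, forward=forward_lookup):
    """Check that `ip` reverse-resolves under `allowed_suffix` and resolves back to itself."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # Use a suffix with a leading dot so look-alike domains do not match.
        if not hostname.endswith(allowed_suffix.lower()):
            return False
        # Addresses are compared as plain strings (sufficient for IPv4).
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The resolver functions are injectable so the logic can be exercised without network access, for example with stub lambdas in a unit test.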
08 / Control
TrellisBot fully respects the robots exclusion protocol. You can manage its access to your site using robots.txt, HTML meta tags, or HTTP response headers.
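Block TrellisBot from your entire site:
User-agent: TrellisBot
Disallow: /

Block specific directories:
User-agent: TrellisBot
Disallow: /private/
Disallow: /members/

Set a crawl delay:
User-agent: TrellisBot
Crawl-delay: 10

Block all other crawlers but allow TrellisBot:
User-agent: *
Disallow: /
User-agent: TrellisBot
Disallow: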
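Before deploying robots.txt rules like these, you may want to test how they evaluate for a given URL. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; it approximates the robots exclusion protocol and is not TrellisBot's own parser. The helper name and the example URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ONLY_TRELLISBOT = """\
User-agent: *
Disallow: /

User-agent: TrellisBot
Disallow:
"""

def is_allowed(robots_txt: str, url: str, agent: str = "TrellisBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

trellis_ok = is_allowed(ALLOW_ONLY_TRELLISBOT, "https://example.com/page")
other_ok = is_allowed(ALLOW_ONLY_TRELLISBOT, "https://example.com/page", agent="OtherBot")
```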
To stop a specific page from appearing in TrellisSearch results, add a robots meta tag to the <head> of that page:
<meta name="robots" content="noindex">
Supported directives: noindex, nofollow, nosnippet, none.
The same directive can also be sent as an HTTP response header:
X-Robots-Tag: noindex
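The value of a robots meta tag or `X-Robots-Tag` header is a comma-separated list, which can be normalized as sketched below. Only the four supported directives listed above are kept. Treating `none` as shorthand for `noindex, nofollow` follows the widely used convention and is an assumption about TrellisBot's behavior; the function name is hypothetical.

```python
SUPPORTED = {"noindex", "nofollow", "nosnippet", "none"}

def parse_robots_directives(value: str) -> set:
    """Return the supported directives found in a robots meta or header value."""
    tokens = {token.strip().lower() for token in value.split(",")}
    directives = tokens & SUPPORTED
    # Assumption: "none" expands to "noindex, nofollow", as is conventional.
    if "none" in directives:
        directives = (directives - {"none"}) | {"noindex", "nofollow"}
    return directives
```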
Crawling vs. indexing: Blocking TrellisBot via robots.txt prevents it from fetching a page, but that URL may still appear in results if other sites link to it. To remove a URL from results entirely, use noindex — but TrellisBot must be able to fetch the page to read that directive.
09 / Discovery
You do not need to submit your site for TrellisBot to find it — it discovers pages naturally by following links. However, you can speed up discovery by submitting your URL directly or by providing a sitemap.
Reference your XML sitemap in robots.txt to help TrellisBot discover all your pages efficiently:
Sitemap: https://example.com/sitemap.xml
10 / Support
For questions about TrellisBot or to report a crawling issue, reach out at support@trellissearch.com.
If you believe TrellisBot is behaving incorrectly or causing server problems, please include your server logs and the specific URLs involved. We take such reports seriously and respond promptly.