Configuring robots.txt correctly for SEO (my guide)

robots.txt is a small file with a big impact. It determines which parts of your website search engines may and may not crawl. An error in this file can result in blocked content, missed indexing or even ranking loss. In this article I explain step by step how to configure a robots.txt file correctly for SEO.
1. What is robots.txt?
robots.txt is a plain text file that you place in the root of your domain (e.g. https://jouwdomein.nl/robots.txt). Search engines fetch this file before crawling to determine which paths they are allowed to crawl.
Important:
- It is no guarantee that something will not be indexed (you also need noindex for that; see the example after this list)
- It blocks crawling, not necessarily indexing
- Erroneous rules can cause unintended SEO damage
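To keep a page itself out of the index, you therefore need a noindex signal in addition to (or instead of) a robots.txt rule. A minimal sketch: the meta tag below goes in the <head> of the page; for non-HTML files such as PDFs, the equivalent X-Robots-Tag: noindex HTTP response header does the same.
txt
<meta name="robots" content="noindex">
Note that the page must remain crawlable for search engines to see the noindex; a robots.txt block and a noindex on the same URL work against each other.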
2. Structure of a robots.txt file
A standard file looks like this:
txt
User-agent: *
Disallow:
Sitemap: https://jouwdomein.nl/sitemap.xml
Explanation:
- User-agent: * = applies to all bots
- Disallow: without path = allow everything
- Disallow: /admin/ = block everything in the /admin/ folder
- Allow: /path/ = explicitly allow (useful for exceptions; see the example below)
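To illustrate the Allow exception, here is a minimal sketch that blocks a directory but keeps one file inside it crawlable (the paths are placeholders):
txt
User-agent: *
# Placeholder paths, adjust to your own structure
Disallow: /private/
Allow: /private/brochure.pdf
Google resolves conflicts between Allow and Disallow by the most specific (longest) matching rule, so the Allow line wins for that single file.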
3. What should you block and not block?
Do block:
- Admin/login pages (/wp-admin/, /cart/, /checkout/)
- Internal search results (/search/)
- Filter pages with unnecessary parameters (?color=, ?sort=)
- Test/dev directories (/beta/, /test/)
Do not block:
- CSS and JS files (Google needs these to render your pages)
- Important page types (SEO pages, blog, services)
- Images (unless you deliberately want to keep them out of image search results)
Google must be able to render the site the way users see it, so do not block styling or script files. The example below shows how to keep them explicitly crawlable.
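A sketch of how this can look in practice, using the filter parameters mentioned above. Note that the wildcard * and the end anchor $ are supported by Google and Bing, but not necessarily by every crawler:
txt
User-agent: *
# Block internal search and filter parameters
Disallow: /search/
Disallow: /*?color=
Disallow: /*?sort=
# Keep styling and scripts explicitly crawlable
Allow: /*.css$
Allow: /*.js$
The two Allow lines are only strictly necessary when a broader Disallow rule would otherwise catch those files, but they make the intent explicit.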
Getting started with SEO? Feel free to get in touch.

4. Examples of good configuration
For WordPress:
txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
Sitemap: https://jouwdomein.nl/sitemap_index.xml
For a webshop (e.g. WooCommerce):
txt
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /?orderby=
Disallow: /*add-to-cart=*
Sitemap: https://jouwdomein.nl/sitemap.xml
5. Test your robots.txt file
Mistakes creep in quickly. Always test:
- Google Search Console > robots.txt report (under Settings; this replaced the old robots.txt Tester)
- Screaming Frog > Configuration > Robots.txt
- Lighthouse (in Chrome DevTools) > SEO audit, which flags an invalid robots.txt
6. Common mistakes
| Error | Solution |
| --- | --- |
| Blocking everything with Disallow: / | Only do this in staging or other temporary situations |
| Blocking CSS/JS | Always leave these accessible for correct rendering |
| No Sitemap line included | Add the sitemap at the bottom of the file |
| Using Disallow: /*? without testing | Keep parameters that do have value accessible |
| Using robots.txt instead of noindex | Use noindex to control indexing; robots.txt only controls crawling |
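For the parameter pitfall specifically, a minimal sketch of a more controlled approach, assuming purely for illustration that ?page= is a parameter whose URLs you do want crawled:
txt
User-agent: *
# Block all URLs with query parameters ...
Disallow: /*?
# ... except pagination (placeholder for a parameter that does have value)
Allow: /*?page=
Because Google applies the most specific (longest) matching rule, the Allow line wins for those URLs. Always verify the outcome with one of the tools from step 5 before deploying.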
7. Robots.txt and staging/testing environments
Want to keep test or staging environments out of search engines?
Use:
txt
User-agent: *
Disallow: /
But: this only prevents crawling, not indexing. Combine with:
- HTTP authentication (basic auth; see the sketch after this list)
- noindex in <meta> tags
- Restricting access by IP address via .htaccess or a firewall
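Keep in mind that once crawling is blocked with Disallow: /, search engines can no longer see a noindex tag on the blocked pages, so HTTP authentication or IP restrictions are the more reliable shield. As a minimal sketch, basic authentication in Apache via .htaccess (the .htpasswd path is a placeholder):
txt
AuthType Basic
AuthName "Staging"
# Placeholder path: point this at your own .htpasswd file
AuthUserFile /path/to/.htpasswd
Require valid-user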
In conclusion
A correctly configured robots.txt prevents crawl waste and protects your site against unintended indexing problems. Work with clear, deliberate rules and test after every change. Small file, big effect.