Configuring robots.txt correctly for SEO (my guide)

robots.txt is a small file with a big impact. It determines which parts of your website search engines may and may not crawl. An error in this file can result in blocked content, missed indexing or even ranking loss. In this article I explain step by step how to configure a robots.txt file correctly for SEO.
1. What is robots.txt?
robots.txt is a plain text file that you place in the root of your domain (e.g. https://jouwdomein.nl/robots.txt). Search engines fetch this file before crawling to determine which paths they are allowed to crawl.
Important:
- It is no guarantee that something will not be indexed (you also need noindex for that; see the example after this list)
- It blocks crawling, not necessarily indexing
- Erroneous rules can cause unintended SEO damage
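To keep a page itself out of the index, you therefore need a noindex signal in addition to (or instead of) a robots.txt rule. A minimal sketch: the meta tag below goes in the <head> of the page; for non-HTML files such as PDFs, the equivalent X-Robots-Tag: noindex HTTP response header does the same.
txt
<meta name="robots" content="noindex">
Note that the page must remain crawlable for search engines to see the noindex; a robots.txt block and a noindex on the same URL work against each other.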
2. Structure of a robots.txt file
A standard file looks like this:
txt
User-agent: *
Disallow:
Sitemap: https://jouwdomein.nl/sitemap.xml
Explanation:
- User-agent: * = applies to all bots
- Disallow: without path = allow everything
- Disallow: /admin/ = block everything in the /admin/ folder
- Allow: /path/ = explicitly allow (useful for exceptions; see the example below)
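To illustrate the Allow exception, here is a minimal sketch that blocks a directory but keeps one file inside it crawlable (the paths are placeholders):
txt
User-agent: *
# Placeholder paths, adjust to your own structure
Disallow: /private/
Allow: /private/brochure.pdf
Google resolves conflicts between Allow and Disallow by the most specific (longest) matching rule, so the Allow line wins for that single file.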
3. What should you block and not block?
Do block:
- Admin/login pages (/wp-admin/, /cart/, /checkout/)
- Internal search results (/search/)
- Filter pages with unnecessary parameters (?color=, ?sort=)
- Test/dev directories (/beta/, /test/)
Do not block:
- CSS and JS files (Google needs these to render your pages)
- Important page types (SEO pages, blog, services)
- Images (unless you deliberately want to keep them out of image search results)
Google must be able to render the site the way users see it, so do not block styling or script files. The example below shows how to keep them explicitly crawlable.
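A sketch of how this can look in practice, using the filter parameters mentioned above. Note that the wildcard * and the end anchor $ are supported by Google and Bing, but not necessarily by every crawler:
txt
User-agent: *
# Block internal search and filter parameters
Disallow: /search/
Disallow: /*?color=
Disallow: /*?sort=
# Keep styling and scripts explicitly crawlable
Allow: /*.css$
Allow: /*.js$
The two Allow lines are only strictly necessary when a broader Disallow rule would otherwise catch those files, but they make the intent explicit.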
Getting started with SEO? Feel free to get in touch.

4. Examples of good configuration
For WordPress:
txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
Sitemap: https://jouwdomein.nl/sitemap_index.xml
For a webshop (e.g. WooCommerce):
txt
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /?orderby=
Disallow: /*add-to-cart=*
Sitemap: https://jouwdomein.nl/sitemap.xml
5. Test your robots.txt file
Mistakes creep in quickly. Always test:
- Google Search Console > robots.txt report (under Settings; this replaced the old robots.txt Tester)
- Screaming Frog > Configuration > Robots.txt
- Lighthouse (in Chrome DevTools) > SEO audit, which flags an invalid robots.txt
6. Common mistakes
| Error | Solution |
| --- | --- |
| Blocking everything with Disallow: / | Only do this in staging or other temporary situations |
| Blocking CSS/JS | Always leave these accessible for correct rendering |
| No Sitemap line included | Add the sitemap at the bottom of the file |
| Using Disallow: /*? without testing | Keep parameters that do have value accessible |
| Using robots.txt instead of noindex | Use noindex to control indexing; robots.txt only controls crawling |
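For the parameter pitfall specifically, a minimal sketch of a more controlled approach, assuming purely for illustration that ?page= is a parameter whose URLs you do want crawled:
txt
User-agent: *
# Block all URLs with query parameters ...
Disallow: /*?
# ... except pagination (placeholder for a parameter that does have value)
Allow: /*?page=
Because Google applies the most specific (longest) matching rule, the Allow line wins for those URLs. Always verify the outcome with one of the tools from step 5 before deploying.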
7. Robots.txt and staging/testing environments
Want to keep test or staging environments out of search engines?
Use:
txt
User-agent: *
Disallow: /
But: this only prevents crawling, not indexing. Combine with:
- HTTP authentication (basic auth; see the sketch after this list)
- noindex in <meta> tags
- Restricting access by IP address via .htaccess or a firewall
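Keep in mind that once crawling is blocked with Disallow: /, search engines can no longer see a noindex tag on the blocked pages, so HTTP authentication or IP restrictions are the more reliable shield. As a minimal sketch, basic authentication in Apache via .htaccess (the .htpasswd path is a placeholder):
txt
AuthType Basic
AuthName "Staging"
# Placeholder path: point this at your own .htpasswd file
AuthUserFile /path/to/.htpasswd
Require valid-user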
In conclusion
A correctly configured robots.txt prevents crawl waste and protects your site against unintended indexing problems. Work with clear, deliberate rules and test after every change. Small file, big effect.