This way a site with untreated query-string URLs suffers twice: on the one hand, URLs that don't belong in the index get indexed; on the other hand, the crawl budget is overspent on them and may be missing for the good URLs.
There are some passive techniques for dealing with query strings in URLs. Originally I planned to publish the existing techniques for dealing with query strings in URLs, together with my solution for the SEO problems they cause, in my ultimate htaccess SEO tutorial, but the topic grew in detail, so I decided to write a separate article about query strings in URLs and SEO.
Existing approaches to dealing with query strings in URLs
- While Google says it can deal with query strings in URLs, it recommends adjusting the crawler's settings in Webmaster Tools for each existing query string parameter.
- URLs with query strings can be disallowed in robots.txt with rules like

Disallow: /?*
Disallow: /*?
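Note that a Disallow rule only takes effect inside a User-agent group. A minimal sketch of a complete robots.txt group, assuming it should apply to all crawlers, would be:

User-agent: *
Disallow: /*?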
- If the head section of the HTML or PHP files served under query-string URLs can be edited, it is possible to add rules for indexing management and URL canonicalisation, like
<meta name="robots" content="noindex, nofollow">
<link href="Current URL, but without query string" rel="canonical">
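For example, for the URL http://example.com/page/?query-string used later in this article, the filled-in tags would look like this (a sketch assuming that URL structure):

<meta name="robots" content="noindex, nofollow">
<link href="http://example.com/page/" rel="canonical">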
Universal solution for SEO of URLs with query strings
1. URL architecture
The solution works on the server (Apache) side, in the .htaccess (or httpd.conf) file. Before we begin, let's look in detail at the URL architecture from the htaccess point of view. Any URL has three parts, which can be addressed separately by different htaccess variables:

http://example.com/page/?query-string

Here
- example.com is addressed by HTTP_HOST
- /page/ is addressed by REQUEST_URI
- query-string (without the leading ?) is addressed by QUERY_STRING
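As a small illustration (not part of the solution itself, and the patterns are only placeholders), each of these parts can be tested separately in .htaccess like this:

RewriteCond %{HTTP_HOST} ^example\.com$
RewriteCond %{REQUEST_URI} ^/page/$
RewriteCond %{QUERY_STRING} ^query-string$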
2. Solution approach
The approach in general is:
- to find all URLs with query strings,
- mark all of them with a custom environment variable,
- add to their HTTP response an X-Robots-Tag header with noindex,
- add to their HTTP response a Link rel="canonical" header, populated with the same URL as the current one, but without the query string.
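For the example URL http://example.com/page/?query-string introduced above, the response headers produced by this approach would look roughly like this (a sketch of the intended result, not output copied from a live server):

X-Robots-Tag: noindex, nofollow
Link: <http://example.com/page/>; rel="canonical"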
3. Assembling the rules set
First we make sure mod_rewrite is available and switched on, and set the rewrite base, which could be something other than the root:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

Then we catch URLs with any query string:
RewriteCond %{QUERY_STRING} .
# A rule catching query strings could also look like
# RewriteCond %{QUERY_STRING} ^[a-zA-Z0-9]*$
# but for the sake of simplicity we use the first one

Then we do the trick: we create a rewrite rule for all caught URLs with a query string, which doesn't rewrite the URL at all, but marks every caught request so that it gets our custom headers:
RewriteRule .* - [E=DO_SEO_HEADER:1]

(The substitution - leaves the URL unchanged; the rule only sets the environment variable DO_SEO_HEADER.)

At this point half the work is done: we have caught all URLs with a query string and marked them. What remains is to emit the headers we need. First we add the X-Robots-Tag, triggered by the environment variable we set before:
# Close the mod_rewrite check
</IfModule>

# Do the mod_headers check
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, nofollow" env=DO_SEO_HEADER

Then we add the canonical Link header with the "pure" URL, composed of HTTP_HOST and REQUEST_URI. This header is likewise triggered by the environment variable we set before:
Header set Link '%{HTTP_HOST}%{REQUEST_URI}e; rel="canonical"' env=DO_SEO_HEADER

# Close the mod_headers check
</IfModule>
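A caveat on the Link line above: in mod_headers the %{VARNAME}e specifier expands environment variables only, and on many setups HTTP_HOST and REQUEST_URI are not exposed to the Header directive as environment variables, so the value may come out empty or literal. A more portable variant (a sketch; SEO_CANONICAL is a name invented here, and the scheme is assumed to be http) is to capture the canonical URL in the RewriteRule itself and to wrap it in the angle brackets the Link header syntax expects:

# Inside the mod_rewrite block: capture the canonical URL while setting the trigger variable
RewriteCond %{QUERY_STRING} .
RewriteRule .* - [E=DO_SEO_HEADER:1,E=SEO_CANONICAL:http://%{HTTP_HOST}%{REQUEST_URI}]

# Inside the mod_headers block: emit the captured URL (use https:// above for TLS sites)
Header set Link '<%{SEO_CANONICAL}e>; rel="canonical"' env=DO_SEO_HEADER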
The SEO query string solution is ready for action! With this rules set, your site's ranking should no longer suffer from duplicate content caused by URLs with query strings. Another benefit of this rules set is exact indexing management: you will always have only the canonical URL version in the index. Use and enjoy!
TL;DR
Universal solution for SEO troubles caused by URLs with query strings: add the following rules set to your .htaccess file:
PS: full rules set:
# Ensure Apache's mod_rewrite works
<IfModule mod_rewrite.c>
# Set rewrite engine on
RewriteEngine On
# Set rewrite base
RewriteBase /
# Catch any query string
RewriteCond %{QUERY_STRING} .
# Don't rewrite matched URLs; only mark them so they get the new headers
RewriteRule .* - [E=DO_SEO_HEADER:1]
</IfModule>

# Ensure Apache's mod_headers works
<IfModule mod_headers.c>
# Specify X-Robots-Tag with noindex option for URLs with query strings
Header set X-Robots-Tag "noindex, nofollow" env=DO_SEO_HEADER
# Specify the canonical URL version
Header set Link '%{HTTP_HOST}%{REQUEST_URI}e; rel="canonical"' env=DO_SEO_HEADER
</IfModule>
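To check that the rules are actually in effect, request any page of your site once with and once without a query string appended and compare the response headers (for example in the browser's developer tools): the X-Robots-Tag and Link headers should appear only for the query-string version.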