Thursday, September 11, 2014

H for htaccess: part 5 of HASCH, the OnPage SEO framework

.htaccess (hypertext access) is a text file, usually placed in the root folder of a site and invisible because of the dot at the beginning of its name. .htaccess contains directives for the server, server software, robots, and browsers about the handling of files, folders, and paths / URLs.

Generally there are 2 areas where .htaccess can be used for SEO purposes:
  • Mod_alias and mod_rewrite directives (URL redirects and rewrites)
  • load time optimization
Site security is, in my opinion, only indirectly related to SEO, so I decided not to make it a topic of this article.

This fifth and final part of my HASCH OnPage SEO framework is about the SEO mission of .htaccess. My aim is to create a multipurpose, example-illustrated checklist of .htaccess usage for mod_rewrite, robots manipulation, and load time optimization as advanced SEO objectives. This ".htaccess for SEO" tutorial will be helpful (for me and for you) when performing site audits and building new, strictly SEO-minded sites. Read the tutorial →

Some notes and advice in advance

These notes, for newbies and advanced users alike, will make using the following tutorial easier.
  • Some hostings, especially shared ones, don't allow the use of personal .htaccess files, with the lame excuse of security - I don't recommend such hostings.
  • .htaccess comment lines begin with #. To let a commented-out rule work, delete the # from the beginning of the line.
  • Make use of the IfModule check only if it is necessary - it produces additional workload.
  • Each .htaccess rule creates additional processing overhead, so try to spare, generalize, or unify rules (see the short sketch after this list). No rule at all is always better for performance than a rule!
  • Some .htaccess rules require certain PHP modules to be installed on the server.
  • Depending on the server configuration, some .htaccess directives, especially those that control PHP settings, should or even must be placed in other files like php.ini, user.ini, and the like - ask your server administrator whether, and for which rules exactly, this is the case for you.
  • Using .htaccess requires at least an understanding of Regular Expressions and knowledge of Apache, PHP, and browser technologies. However, I will try to go deep enough into detail and also explain the SEO sense of each .htaccess directive I list.
  • Some .htaccess rules, especially those that optimize load time and server performance, need at least a virtual server and will not work on shared hosting, because it doesn't allow access to the server configuration.
  • Some servers have their own idle time before the changes you make in .htaccess become valid and working. This time is specific to each server, and you will get to know it for yours after a few .htaccess edits. So have no fear if, right after your first .htaccess edit, you get an evil-looking internal server error. Wait a bit, refresh the remote window of your FTP browser, and, if you've made no syntax error or misspelling, your new rule will work.
  • To get a valuable diagnosis of your site's load time condition before you start load time optimization through .htaccess, I strongly recommend running a site performance test with a tool like GTmetrix. This free online tool unites Yahoo's YSlow and Google's PageSpeed Insights (both are also available as Firefox extensions). The diagnosis gives an overview of suboptimal areas and directions for problem solving.
  • While you should never, ever experiment with the .htaccess of your production site, you should nevertheless have a test site where you can immediately see all of your experiment results. Very helpful for immediate result testing are Firefox extensions like Firebug (the Net tab) and Live HTTP Headers. Before running tests, read these .htaccess debug tips.
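To illustrate the "generalize or unify" advice: where several redirects follow the same pattern, a single RedirectMatch with a capture group can replace a whole stack of per-URL Redirect lines. A minimal sketch (all paths here are hypothetical placeholders):

#Instead of one Redirect per page...
#Redirect 301 /2013/post-1.html http://domain-name.tld/blog/post-1/
#Redirect 301 /2013/post-2.html http://domain-name.tld/blog/post-2/

#...one generalized rule does the same job
RedirectMatch 301 ^/2013/(.*)\.html$ http://domain-name.tld/blog/$1/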
I assume you are ready now, so let's dive in:

Preamble: the best way to use .htaccess is not to use .htaccess

This paradoxical statement is theoretically correct. I'll describe the sense of it, so you can decide whether you are affected by it and whether you need to apply this workaround. The issue concerns load time optimization at a very high level. The Apache web server is by default forced to check each web folder for the existence of a .htaccess file and, if one exists, to read and perform the rules from it. This means .htaccess can be the cause of many (avoidable) file queries. The behavior can be optimized: all .htaccess contents are merged and put into the main Apache configuration file httpd.conf. Besides transferring all .htaccess contents into httpd.conf, don't forget to add the rule disabling AllowOverride. This can be set per directory:
 <Directory /var/www/html/directory-with-disabled-htaccess>
  AllowOverride None
 </Directory>
There is a bash script which does this job for you: it finds all .htaccess files and merges their contents into a single file, which can be included in the server configuration file with one line of code. Read this tutorial, then download and use the script if you want to automate the procedure of finding and merging .htaccess files.

Note: if you decide to use this solution and transfer your .htaccess rules into httpd.conf, you'll need to use ^/... instead of ^... at the beginning of the RewriteRule lines - in other words, add a slash.
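A minimal sketch of the same (hypothetical) rule in both contexts:

#In .htaccess (per-directory context, no leading slash)
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

#The same rule moved into httpd.conf (server context, leading slash added)
RewriteRule ^/old-page\.html$ /new-page.html [R=301,L]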

Benchmarks put the optimization gain of this solution at 2% to 6%. It is clear that the gain grows with the number of nested folders - the more .htaccess file queries you can avoid, the quicker the site loads. A drawback of this method is the more complicated maintenance of the .htaccess rules. In reality, everyone decides about using this load time optimization method based on their own configuration and site load. At this point I'll end with the paradoxes and go on with the tutorial :)

Mod_alias and mod_rewrite for SEO

  • Mod_alias is the Apache module designed to handle simple URL manipulation tasks. If your task is a simple redirection of one URL to another, you should definitely use mod_alias, which is installed by default on any Apache server.
  • Mod_rewrite, an Apache module which allows sophisticated on-the-fly manipulation of URL requests, is used for more complicated tasks, such as manipulating the query string, utilizing Regular Expression rules. It works in the server, virtual host, or directory context.
Note: these directives must be enabled to make all the other rewriting and redirection rules work. I put them here so I don't need to add them to each example later:

#These rules must be enabled to make all other redirecting rules working
Options +FollowSymlinks
RewriteEngine on

Now let us look at the circumstances where an SEO should make use of URL redirects and rewrites.

Avoiding duplicated content with .htaccess redirects

The first SEO purpose which comes to my mind is avoiding duplicated content, to preserve the link juice and keep links pointing to the canonical version. Duplicated content can appear under the following circumstances:

Issue: Site available at both the www and non-www address

1. Fix: how to redirect the www domain version to non-www and vice versa
RewriteBase /

#Force non-www:
#RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
#RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

#Force www:
#RewriteCond %{HTTP_HOST} !^www\. [NC]
#RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
#RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]

#non-www to www, both http and https
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} !^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} !^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]

Issue: Site available at the domain name and also at addresses like domain-name.tld/index.html or /index.php

2. Fix: how to redirect index.html to the root with .htaccess
#index.html and index.php to root

RewriteRule ^(.*)index\.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]

Issue: Site available at URLs both with and without a trailing slash at the end.

A trailing slash at the end of an address means a directory; an address without a trailing slash means a file, but without the file's extension. It is advisable to let all URLs which don't end with a file extension end with a trailing slash.

3. Fix: how to add trailing slash to URLs with .htaccess
#If the request is not an existing file, add a trailing slash

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ http://%{HTTP_HOST}/$1/ [L,R=301]

Issue: Site available as both page.html and page.html?query_string

4. Fix: how to remove the query string from URLs with .htaccess
Use it with caution - no query string at all will work after applying this fix. Never test with 301 enabled! Read these tips first!
#Remove query string from all URLs

RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ /$1? [R,L]

#Remove the query string from a specific URL only
RewriteCond %{QUERY_STRING} .
RewriteRule ^login\.php /login.php? [L]

Issue: Site available at URLs containing both underscores and hyphens.

5. Fix: how to unify URL dividers to hyphens
#Unifying URL dividers to hyphens
 
RewriteRule !\.(html|php)$ - [S=6]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3 [E=underscores:Yes]
RewriteRule ^([^_]*)_(.*)$ $1-$2 [E=underscores:Yes]
 
RewriteCond %{ENV:underscores} ^Yes$
RewriteRule (.*) http://www.domain-name.tld/$1 [R=301,L]

301 redirects with .htaccess

The next case where an SEO needs redirections is when one URL has to be redirected to another, mainly because the first URL is no longer valid (site relaunch, site architecture change, landing page redirect, a temporary redirect while a site is under maintenance). Such redirections are mostly done with mod_alias. The redirect structure is always the same:
Redirect directive → redirect type → redirect source → redirect target, divided by spaces
There are 4 directives available (each is sketched right after this list):
  • Redirect (needs a status code like 301-303, or a redirect type: permanent or temp)
  • RedirectMatch (needs a RegEx pattern to match the redirect source / redirect target)
  • RedirectPermanent (equal to 301)
  • RedirectTemp (equal to 302).
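A minimal sketch of each directive; all paths and targets are hypothetical placeholders:

#Redirect with an explicit status code or redirect type
Redirect 301 /old-page.html http://domain-name.tld/new-page.html
Redirect permanent /old-about.html http://domain-name.tld/about.html

#RedirectMatch with a RegEx pattern
RedirectMatch 301 ^/docs/(.*)\.pdf$ http://domain-name.tld/downloads/$1.pdf

#Shorthand directives for 301 and 302
RedirectPermanent /old-contact.html http://domain-name.tld/contact.html
RedirectTemp /shop.html http://domain-name.tld/maintenance.html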
Note: changes to 301 redirects can take a long time to show up in your browser. When testing, restart your browser or use a different one. Use [R] instead of [R=301] for test purposes. Once you've made sure the rule works exactly as expected, change it to [R=301] for the production site.

Custom error pages with .htaccess

The notable SEO impact of custom error pages is clear - errors happen, and both robots and human visitors want to, and from the SEO point of view should, read something besides "this page is unavailable at the moment". There are 2 .htaccess directives which allow loading custom error pages: ErrorDocument and Redirect. That is why I decided to place this topic after the redirects.

6. Example: how to load custom error pages with .htaccess
#Load own error page with ErrorDocument
ErrorDocument 404 /errors/404

#Redirect with a non-3xx status and no target URL returns that error status for the path (combine with a matching ErrorDocument)
Redirect 500 /errors/500

Canonical link for digital assets

The following examples add a canonical link to digital page assets like images or text files (.pdf, .docx, etc.). The SEO sense of it is to relate digital assets to their best-matching content.

7. Example: how to add canonical link to digital assets
#Add canonical link to digital assets, file by file

<Files product1.pdf>
Header add Link '<http://domain-name.tld/products/product1.html>; rel="canonical"'
</Files>
 
<Files product2.jpg>
Header add Link '<http://domain-name.tld/products/product2.html>; rel="canonical"'
</Files>

#Add canonical link to digital assets, more general

RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.pdf$">
Header add Link '<http://domain-name.tld/products/%{FILENAME}e.html>; rel="canonical"'
</FilesMatch>

This is a good place to begin with our next topic, namely site load time optimization, because one of the mightiest methods of optimizing page load time is... to modify headers ;)

Site load time optimization with .htaccess

Modifying HTTP headers to speed up site loading

Utilizing mod_headers to increase loading speed

You can find out what your site needs by running speed tests, e.g. with GTmetrix, YSlow, or PageSpeed Insights. All of these tests inform you about needed changes, especially about which HTTP header modifications will improve the site's loading speed.

The following HTTP header modifications are available to increase the site loading speed:
  • Add Cache-Control Headers
  • Add Future Expires Headers
  • Remove Last-Modified Header
  • Turn Off ETags
In the following example I keep this order of modifications.

8. Example: how to modify HTTP headers with .htaccess
# 1000 days
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
#Combine the values in one "set", as a second "set" would overwrite the first
Header set Cache-Control "max-age=86400000, public"
#Expires must be a date in the future - adjust as needed
Header set Expires "Thu, 15 Apr 2027 20:00:00 GMT"
Header unset Last-Modified
</FilesMatch>
 
# 3 days
<FilesMatch "\.(xml|txt)$">
Header set Cache-Control "max-age=259200, public, must-revalidate"
</FilesMatch>
 
# Force no caching for dynamic files
<FilesMatch "\.(php|htm|html)$">
Header set Cache-Control "max-age=0, private, no-store, no-cache, must-revalidate"
Header set Pragma "no-cache"
</FilesMatch>

#Unset ETag
Header unset ETag
FileETag None

#Set ETag on
#FileETag MTime Size
If you need other time settings, take the values from this .htaccess time cheatsheet.
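For quick reference while setting max-age values, the usual conversions to seconds:

#Common max-age values in seconds
#1 hour = 3600, 1 day = 86400, 1 week = 604800, 30 days = 2592000, 1 year = 31536000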

Setting cache expiry dates with mod_expires and .htaccess

Cache expiry dates can be set hardcoded per media type or for an array of file extensions:

9. Example: how to set cache expire dates with mod_expires and .htaccess
#Expire dates per media type
ExpiresActive on
ExpiresDefault "access plus 1 month"
# CSS
ExpiresByType text/css "access plus 1 year"
# Data interchange
ExpiresByType application/json "access plus 0 seconds"
ExpiresByType application/ld+json "access plus 0 seconds"
ExpiresByType application/vnd.geo+json "access plus 0 seconds"
ExpiresByType application/xml "access plus 0 seconds"
ExpiresByType text/xml "access plus 0 seconds"
# Favicon (cannot be renamed!) and cursor images
ExpiresByType image/x-icon "access plus 1 week"
# HTML components (HTCs)
ExpiresByType text/x-component "access plus 1 month"
# HTML
ExpiresByType text/html "access plus 0 seconds"
# JavaScript
ExpiresByType application/javascript "access plus 1 year"
# Manifest files
ExpiresByType application/manifest+json "access plus 1 year"
ExpiresByType application/x-web-app-manifest+json "access plus 0 seconds"
ExpiresByType text/cache-manifest "access plus 0 seconds"
# Media
ExpiresByType audio/ogg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType video/mp4 "access plus 1 month"
ExpiresByType video/ogg "access plus 1 month"
ExpiresByType video/webm "access plus 1 month"
# Web feeds
ExpiresByType application/atom+xml "access plus 1 hour"
ExpiresByType application/rss+xml "access plus 1 hour"
# Web fonts
ExpiresByType application/font-woff "access plus 1 month"
ExpiresByType application/font-woff2 "access plus 1 month"
ExpiresByType application/vnd.ms-fontobject "access plus 1 month"
ExpiresByType application/x-font-ttf "access plus 1 month"
ExpiresByType font/opentype "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"

#Expire dates in array of file extensions
<IfModule mod_expires.c>
<FilesMatch "\.(jpg|jpeg|png|gif|js|css|swf|ico|mp3)$">
    ExpiresActive on
    ExpiresDefault "access plus 2 days"
</FilesMatch>
</IfModule>

The load time optimization sense behind these HTTP header modifications: if the Last-Modified and ETag headers are removed from static files like images, JavaScript, or stylesheets, neither caches nor browsers can validate the cached file version against the real one. If at the same time Expires and Cache-Control headers are included, you can specify which files get cached for a given time. This way you get rid of all validation requests.

Create robots directives through modifying HTTP headers in .htaccess

It is possible to create file-specific robots directives in .htaccess by making use of the X-Robots-Tag header. The X-Robots-Tag rules in .htaccess target a single file, a single file extension, or an array of file extensions, and use the same parameters as the meta robots tag:

10. Example: how to add robots directives using .htaccess and X-Robots-Tag
#Robots directives for a single file type
<FilesMatch ".doc$">
Header set X-Robots-Tag "index, noarchive, nosnippet"
</FilesMatch>

#Robots directives for an array of file types
<FilesMatch ".(doc|pdf)$">
Header set X-Robots-Tag "index, noarchive, nosnippet"
</FilesMatch>

#Robots directives for an array of file types (alternative <Files ~> syntax)
<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>

#Robots directives for a single file
<FilesMatch "robots.txt">
Header set X-Robots-Tag "noindex"
</FilesMatch>

#More than one rule at once (use "add", as a second "set" would overwrite the first header)
<FilesMatch "\.(doc|pdf|swf)$">
Header add X-Robots-Tag "unavailable_after: 4 Jul 2050 11:11:11 GMT"
Header add X-Robots-Tag "noarchive, nosnippet"
</FilesMatch>

# index with no archive for pdf, word and flash documents
<IfModule mod_headers.c>
<IfModule mod_setenvif.c>
SetEnvIf Request_URI "\.pdf$" x_tag=yes
SetEnvIf Request_URI "\.doc$" x_tag=yes
SetEnvIf Request_URI "\.swf$" x_tag=yes
Header set X-Robots-Tag "index, noarchive" env=x_tag
</IfModule>
</IfModule>

Help Google crawl mobile content and properly manage cache delivery with the Vary HTTP header

Issue: you serve different content on the same URL. This happens when a site delivers its content dynamically depending on the User-Agent and shows mobile visitors something different than desktop visitors. It is necessary to announce this:
  • to let Google properly crawl both the desktop and mobile versions
  • to manage the cache properly (to deliver the cached version to the correct user)
Watch Matt Cutts talking about the need to use the Vary HTTP header.

11. Fix: how to notify Google about different content on the same URL
#HTML content is different depending on User-Agent
Header append Vary User-Agent

Speed up your site and improve UX with link prefetching

Link prefetching is, in general, the background preloading of webpages which are likely to be visited next. The SEO benefit of link prefetching is clear: faster loading equals higher ranking equals better conversion.

How to manage link prefetching for SEO / load time optimization purposes
  • Find out from Google Analytics your most frequently visited entry pages and/or menu items
  • Check with Google Analytics which pages are visited after the entry pages (visit paths)
  • Add the following rules to your .htaccess:
12. Example: how to accelerate your site loading with prefetching
<IfModule mod_headers.c>
#If the visitor is at products.htm, product1.htm and product2.htm get prefetched
#(<Files> is used because <Location> is not allowed in .htaccess context)
<Files "products.htm">
Header append Link "</product1.htm>; rel=prefetch"
Header append Link "</product2.htm>; rel=prefetch"
</Files>

#If the visitor is at about-us.htm, contact.htm gets prefetched
<Files "about-us.htm">
Header append Link "</contact.htm>; rel=prefetch"
</Files>
#If the visitor is at the main page, the menu pages get prefetched, and so on
</IfModule>

As the next level of prefetching you can use prerendering. With prerendering, the visitor's browser not only downloads the page but all of its assets too (images etc.). The browser also starts to render the page in the background, using memory. The usage is the same as in the case of prefetching (see the sketch below), but I strongly recommend using prerendering
  • only for very few pages
  • only for pages you are pretty sure will be visited next (your GA stats indicate this unambiguously).
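A minimal sketch, analogous to the prefetch example above (the page names are hypothetical, and whether a browser honors the Link header for prerendering depends on the browser - currently mainly Chrome supports prerender):

#If the visitor is at cart.htm, the checkout page gets prerendered
<Files "cart.htm">
Header append Link "</checkout.htm>; rel=prerender"
</Files>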

Enabling HTTP persistent connections with .htaccess to perform multiple HTTP requests over a single connection

Issue: the browser establishes a new connection for each HTTP request

Fix:
  • file merging and compression, creating CSS sprites, and similar steps reduce the number of needed HTTP requests,
  • spreading your site's assets over at most 4 hostnames makes multiple parallel connections possible (in this case you must accept an additional DNS lookup and TCP 3-way handshake per hostname),
  • enabling persistent connections makes multiple HTTP requests over a single connection possible.
13. Example: How to enable persistent connections
  • Persistent connections must be enabled by the server administrator in the server configuration file:
    #This code enables persistent connections in httpd.conf
    KeepAlive On
    
    # MaxKeepAliveRequests: The maximum number of requests to allow during a persistent connection.
    MaxKeepAliveRequests 100
    
    # KeepAliveTimeout: Number of seconds to wait for the next request from the same client on the same connection.
    #Try different values, but don't set it too high
    KeepAliveTimeout 5
    #KeepAliveTimeout 100
    

  • After you are done enabling persistent connections in the server configuration, or you are sure that your server administrator has enabled them, add the following line to your .htaccess:
    #This code enables persistent connections in htaccess
    Header set Connection keep-alive
    
Enabling HTTP persistent connections (KeepAlive) reduces CPU workload and speeds up site loading.

File compression with .htaccess

The sense of this action is clear: the lower the volume of transferred data, the lower the load time. On-the-fly file compression is managed by 2 Apache modules: mod_gzip (an external module) and mod_deflate.

File compression utilizing mod_deflate

Mod_deflate is a core Apache module, installed by default. That is why I describe its usage first. Its usage is a bit easier than mod_gzip's. There are 2 kinds of usage: one declares the files to be compressed by media type (application/xhtml+xml and the like), the other declares them by file extension (css, js, etc.).

14. Example: how to compress files using mod_deflate and .htaccess
#Compressing per extension
<ifModule mod_deflate.c>
Header add X-Enabled mod_deflate
<FilesMatch "\.(js|css)$">
SetOutputFilter DEFLATE
</FilesMatch>
</ifModule>

#Compressing by media type
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css application/x-javascript

#or even so
<IfModule mod_filter.c>
AddOutputFilterByType DEFLATE "application/atom+xml" \
"application/javascript" \
"application/json" \
"application/ld+json" \
"application/manifest+json" \
"application/rss+xml" \
"application/vnd.geo+json" \
"application/vnd.ms-fontobject" \
"application/x-font-ttf" \
"application/x-web-app-manifest+json" \
"application/xhtml+xml" \
"application/xml" \
"font/opentype" \
"image/svg+xml" \
"image/x-icon" \
"text/cache-manifest" \
"text/css" \
"text/html" \
"text/plain" \
"text/vtt" \
"text/x-component" \
"text/xml"
</IfModule>

File compression utilizing mod_gzip

If you aren't the server administrator and your host hasn't installed mod_gzip, there is nothing you can do. If mod_gzip is installed but disabled, you can enable it for compressing PHP output with one code line in your .htaccess, php.ini, or user.ini, depending on your access:

15. Example: how to enable mod_gzip for PHP files
#Enabling mod_gzip for PHP files
php_value output_handler ob_gzhandler

Note: this rule is pretty CPU intensive!
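If you can only edit php.ini or user.ini rather than .htaccess, the equivalent setting (the same output handler, in ini syntax) would look like this:

;Enabling the gzip output handler in php.ini / user.ini
output_handler = ob_gzhandler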

If mod_gzip is installed and enabled, add the compression rules:

16. Example: how to compress files with mod_gzip and .htaccess
#Compressing with mod_gzip
<ifModule mod_gzip.c>
Header add X-Enabled mod_gzip
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
</ifModule>

Merging files with .htaccess to optimize load time

Most sites include more than one stylesheet and JavaScript file. Every site speed tool advises avoiding the additional HTTP requests by merging such digital assets, and gives an improved site speed rank if such files are merged.

17. Example: how to merge css and js files with .htaccess
To merge your JavaScript or stylesheet files into a single file:
  • create files, e.g. main.merged.js or main.merged.css.
  • add to these files the lists of your js and css files, looking like this:
    • File: main.merged.js. Content: <!--#include file="js/jquery.js" --> <!--#include file="js/jquery.timer.js" -->
    • File: main.merged.css. Content: <!--#include file="css/screen.css" --> <!--#include file="css/mobile.css" -->
  • add the following rules to your .htaccess:
<IfModule mod_include.c>

<FilesMatch "\.merged\.js$">
Options +Includes
AddOutputFilterByType INCLUDES application/javascript
SetOutputFilter INCLUDES
</FilesMatch>

<FilesMatch "\.merged\.css$">
Options +Includes
AddOutputFilterByType INCLUDES text/css
SetOutputFilter INCLUDES
</FilesMatch>

</IfModule>
Result: Apache will replace those include lines with the contents of the specified files.

How does a charset definition influence the site load time?

As soon as the browser knows the charset of the page, it can immediately start to parse the code, execute scripts, and finally render the page. The earlier we notify the browser about the page's charset, the earlier visitors get the fully rendered page. There are in general 2 places for the character set definition: the head area of the page's HTML code, and the .htaccess. I prefer the .htaccess as the place for the charset definition, because it is the earliest possible notification.

18. Example: how to define the character set of the site using .htaccess
# UTF-8 for anything (text/html or text/plain)
AddDefaultCharset utf-8

# UTF-8 for some file formats
<IfModule mod_mime.c>
 AddCharset utf-8 .atom .css .js .json .rss .xml
</IfModule>

Advanced server-side caching with mod_cache, mod_disk_cache and .htaccess

I prefer mod_disk_cache over mod_mem_cache, because mod_mem_cache doesn't share its cache between Apache processes. That behavior causes high memory usage and little speed gain. Both mod_cache and mod_disk_cache are stable extensions of the Apache web server, which are, however, not installed by default. So I publish the caching rules for these modules with an existence check:

19. Example: how to speed up websites with advanced caching (mod_cache, mod_disk_cache)
<IfModule mod_cache.c>
<IfModule mod_disk_cache.c>
CacheDefaultExpire 3600
CacheEnable disk /
CacheRoot "/path/to/cache/"
CacheDirLevels 2
CacheDirLength 1
CacheMaxFileSize 1000000
CacheMinFileSize 1
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheIgnoreQueryString Off
CacheIgnoreHeaders None
CacheLastModifiedFactor 0.1
CacheMaxExpire 86400
CacheStoreNoStore On
CacheStorePrivate On
</IfModule>
</IfModule>

Disable DNS lookups in .htaccess

While DNS lookups have been disabled by default since Apache 1.3, it is necessary to disable them in earlier versions.

Issue: with DNS lookups enabled, every request triggers a DNS lookup for the IP address of its origin. Result: reduced site performance.

20. Fix: how to improve site performance by disabling DNS lookups in .htaccess

HostnameLookups Off

SSL as site performance issue and how to avoid it with .htaccess

While lately more and more people talk about SSL as a ranking factor, I want to warn against site-wide SSL usage. Moreover, I strictly recommend using SSL only on pages containing or transferring security- or privacy-relevant data. In addition, I recommend not using images on SSL-encrypted pages. Why?
Issue: SSL-encrypted pages aren't stored by shared caches!

21. Fix: how to enable SSL encryption per directory and file:
#Force SSL for a certain directory
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} directory
RewriteRule ^(.*)$ https://domain-name.tld/directory/$1 [R,L]

#Force SSL only for account.php, system.php and admin.php (all other URLs are sent back to http)
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond $1 !^(account|system|admin)\.php [NC]
RewriteRule (.*) http://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Site load time acceleration with Google's mod_pagespeed

This module isn't part of a default server (Apache, nginx, etc.) installation. That means using it is only possible if you have server administrator rights or your server administrator has already installed it. Mod_pagespeed isn't very popular in the real world; usage statistics show:
  • entire Internet - 0.1%
  • top million sites - 0.5%
  • top 100k - 0.6%
  • top 10k - 0.7%
It's not much, but based on these statistics one could assume that the more load a site has, the more reason it has to use mod_pagespeed. This module has many pretty sophisticated settings. In case you really want or need to exhaust all optimization possibilities, here is a set of .htaccess rules for mod_pagespeed you can use to speed up your site:

22. Example: how to set up Google's mod_pagespeed with .htaccess to increase the site loading speed
<IfModule pagespeed_module>
AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
ModPagespeed on
ModPagespeedLowercaseHtmlNames on

ModPagespeedEnableFilters extend_cache
ModPagespeedEnableFilters insert_dns_prefetch

# Text / HTML
ModPagespeedEnableFilters collapse_whitespace
ModPagespeedEnableFilters convert_meta_tags
ModPagespeedEnableFilters remove_comments
ModPagespeedEnableFilters elide_attributes
ModPagespeedEnableFilters trim_urls
ModPagespeedEnableFilters pedantic

# JavaScript
ModPagespeedEnableFilters combine_javascript
ModPagespeedEnableFilters canonicalize_javascript_libraries
ModPagespeedEnableFilters rewrite_javascript
ModPagespeedEnableFilters defer_javascript
ModPagespeedEnableFilters inline_javascript

# CSS
ModPagespeedEnableFilters rewrite_css
ModPagespeedEnableFilters combine_css
ModPagespeedEnableFilters move_css_to_head
ModPagespeedEnableFilters inline_css
ModPagespeedEnableFilters inline_import_to_link
ModPagespeedEnableFilters move_css_above_scripts

# Images
ModPagespeedEnableFilters inline_preview_images
ModPagespeedEnableFilters insert_img_dimensions
ModPagespeedEnableFilters rewrite_images
ModPagespeedEnableFilters recompress_images
ModPagespeedEnableFilters convert_jpeg_to_progressive
ModPagespeedEnableFilters resize_mobile_images
ModPagespeedEnableFilters sprite_images
ModPagespeedEnableFilters lazyload_images

ModPagespeedEnableFilters local_storage_cache
</IfModule>

Conclusion / tl;dr

The .htaccess rules from this article can act as a cheat sheet when
  • performing audits of working sites which need optimization, or
  • building new sites with SEO and strong load time optimization in mind.
Disclaimer
Use this tool carefully - a little misspelling could throw your site out of the index!
PS: This is the last part of HASCH, the OnPage SEO framework. Don't hesitate to look into its previous parts:

HASCH the OnPage SEO framework
