Friday, July 4, 2014

A for Architecture: part 2 of the HASCH OnPage SEO framework

What does an SEO mean when talking about architecture? What kind of architecture matters for SEO?

SEO must clearly distinguish two kinds of website architecture:
  • site-wide architecture,
  • page-wide architecture.
Both site-wide and page-wide architecture have their own rules to obey. I will formulate these rules flexibly enough to be applied to any content or ecommerce project.

Site-wide architecture for SEO

Site-wide (or project-wide) architecture I would cover with the general umbrella term "information architecture" (IA). The main tasks of IA are:
  • structural information design,
  • proper content organization and labeling for better findability and usability.
But what is the concrete main goal of information architecture with regard to SEO? I think it is making the site as crawlable as possible for the search bot, both quickly and deeply. In other words, the content organization must
  • allow the search bot to access, extract and index any ranking-relevant information,
  • provide correct combinations of keyword, content and landing page.
It isn't rocket science, but there are some things to watch out for. I will list them one by one and mention the SEO meaning of each:

Hierarchy of pages, content, keywords and prominence: how to build an SEO silo

There are many good, meaningful articles about how to organize the page hierarchy of a website. The most used keyword for this is "siloing", which in general means the following site hierarchy: homepage - category page - article. I recommend keeping things simple. When preparing a draft of your site's hierarchy:

1. Keep the depth as flat as possible; the optimal site depth is 3 levels.
An unnecessarily nested site makes the search bot go the extra mile and makes it harder to pass PageRank on properly. The fewer levels your site is built with, the more PageRank all subpages inherit from the homepage.

[Image: Correct interlinking: only between parts and levels of the same (topical) segments]
2. Build the segments of your hierarchy (category 1 - article 1, category n - article n) from content on the same topic and from related keywords, optimized from broad to narrow/topically specific.
A topically categorized site structure provides user-friendly navigation, becomes easy to maintain, and, very importantly, keeps topical coherence and therefore establishes clusters of topical relevancy for Google.

3. Interlink the parts and levels of every segment bidirectionally in the vertical direction, and horizontally between items on the same level, but never interlink different segments and their parts with each other.
Content interlinked around related keywords gains topical relevance inside the subsegments and, hence, for the last-level items, which are the (long tail) landing pages.
Never interlink segments with different topical orientation with each other!
[Image: Incorrect interlinking: links between parts and levels of topically different segments]
4. Keep the keyword hierarchy: optimize a category page for a keyword containing at most 2 words, and an article page for a keyword with at least 2 words.

5. Map keywords to content so that each page is optimized for ONLY 1 keyword (no matter how many words it contains).
Trying to optimize a page for more than one single keyword ends up with a page insufficiently optimized for all of its keywords!


6. Only one single menu on a site shall contain dofollow links. Any other navigational elements between content items, like HTML sitemaps or tag clouds, must be nofollow.
If, besides the primary menu, any additional navigation element on the site had dofollow links, it would create additional links to the same content, which amounts to duplicated content. Don't build the site needlessly complicated: one navigation is fully enough, and you avoid a bunch of trouble.

7. Furnish the links of your primary menu with bookmark and title attributes.
This makes them more accessible and qualifies these links to be picked up as sitelinks.
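One way to read this rule in HTML terms is shown below; the `rel="bookmark"` value, link targets and title texts are my assumptions, not prescriptions from the framework (HTML defines rel="bookmark" for permalinks, so the title attribute is the essential part here):

```html
<!-- Primary menu links carrying a descriptive title and a bookmark relation -->
<nav>
  <ul>
    <li><a href="/category-1" title="Category 1: broad keyword" rel="bookmark">Category 1</a></li>
    <li><a href="/category-2" title="Category 2: broad keyword" rel="bookmark">Category 2</a></li>
  </ul>
</nav>
```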

8. Sitemap segmentation: create XML sitemaps for every segment of the site and for images.
Although Google is able to render JavaScript, DON'T USE a JavaScript-based menu!
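Segmented sitemaps can be tied together with a sitemap index file, as defined by the sitemaps.org protocol (the file names below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one sitemap per topical segment -->
  <sitemap><loc>http://domain.tld/sitemap-segment-1.xml</loc></sitemap>
  <sitemap><loc>http://domain.tld/sitemap-segment-2.xml</loc></sitemap>
  <!-- a dedicated sitemap for content images -->
  <sitemap><loc>http://domain.tld/sitemap-images.xml</loc></sitemap>
</sitemapindex>
```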

URLs for SEO

It is needless to say that URLs must be readable, describe the site structure, be hierarchic and contain keywords. The most usable URL design is: http://domain.tld/category-1/article-1. Regarding the URL of an article page, however, the category part of the URL is unnecessary:

9. Build the URL of the article page like http://domain.tld/article-1.
The SEO benefits are visible: this URL is shorter, easier to notice, and doesn't contain keywords that have nothing to do with the article directly.

How to fix duplicated content

Duplicated content is an issue that affects everybody who uses a CMS or blog engine like Joomla, WordPress etc. The issue lies in the nature of this software. URL canonicalization and 301 redirects are the techniques to fix duplicated content in a way that won't harm your site's SEO.

10. Use a 301 redirect if your duplicated content issue is caused by
  • moving sites,
  • expired content,
  • different URLs for the homepage, like /, /index.php, /index.html, /home,
  • ecommerce product pages of products which will never be available again.
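On an Apache server, for example, such redirects can be set up in .htaccess with the mod_alias Redirect directive; the paths below are illustrative:

```apache
# Collapse homepage variants onto the root URL
Redirect 301 /index.php http://domain.tld/
Redirect 301 /index.html http://domain.tld/
Redirect 301 /home http://domain.tld/

# Point a retired product page to its successor or category
Redirect 301 /discontinued-product http://domain.tld/category-1
```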
11. Use a canonical tag if you run into the duplicated content issue due to
  • multiple static or dynamic URLs pointing to a page with the same content (article links originating from the blog calendar, menu links, print versions, search result pages with or without search parameters in the URL, category pages with the same items but sorted differently, with or without sorting parameters in the URL, the same pages with session parameters in the URL),
  • pages with very similar content (product pages differing only in color),
  • syndicated content (RSS feeds).
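The canonical tag goes into the head of every duplicate variant and points at the one URL you want indexed (the URLs here are illustrative):

```html
<!-- e.g. on /article-1?sessionid=123 or on the print version of the article -->
<head>
  <link rel="canonical" href="http://domain.tld/article-1">
</head>
```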

Exclude from crawling

There are three methods to block content from crawling: a robots.txt directive, the robots meta tag, and nofollow on links.
Webmaster Tools even provides some additional possibilities to exclude duplicated content: the URL removal tool and the URL blocking tool.

[Image: Pages blocked from indexing with a robots.txt directive, where the directive is ignored by the search bot because these pages got incoming links and earned PageRank]
The best way to block a URL from indexing is to use blocking values of the robots meta tag in the header of every page which must be excluded from indexing, so you block URLs one at a time. If a set of URLs or a directory is blocked from indexing with a directive in robots.txt, this directive can be ignored if a blocked URL gets incoming links and, as a result, link juice/PageRank.
Use the robots meta tag for the most reliable exclusion from crawling and indexing.
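The blocking values in question are noindex and nofollow, placed in the head of each page to be excluded:

```html
<head>
  <!-- keep this page out of the index and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```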
12. Exclude from crawling and indexing the site's directories containing scripts, design templates, and auxiliary images which aren't content images listed in the image sitemap. Use robots.txt or, better, create an empty index.html inside the blocked directory and add a robots meta tag with blocking values.
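A robots.txt variant of this rule could look as follows; the directory names are placeholders for your actual script, template and UI image paths:

```text
User-agent: *
Disallow: /scripts/
Disallow: /templates/
Disallow: /img/ui/
```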

13. Exclude, with nofollow links, pages which don't include your ranking keywords, and block these pages from crawling with the robots meta tag. Such pages could be the ToS, privacy policy, disclaimer, search result pages, archives, contact, register and login pages and the like.
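Both halves of the rule in markup (the page name is illustrative): nofollow on the link, plus the robots meta tag on the target page itself:

```html
<!-- on the linking pages -->
<a href="/privacy-policy" rel="nofollow">Privacy policy</a>

<!-- in the <head> of /privacy-policy itself -->
<meta name="robots" content="noindex, nofollow">
```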

14. Exclude from crawling, with nofollow, all auxiliary links like calendar dates, print versions, tag clouds and the like.
In the ideal, but unachievable, case only links containing keywords as anchor text are not nofollow. In real life this isn't possible, but think and act in this direction.

Internal site search and SEO

About 10% of site visitors use the site search. Search is nevertheless one of the important navigation engines on a site. But as important as search is for navigation and user experience, the search results and search filters must all the more be excluded from crawling and indexing. The cause: search results and search filters produce an unpredictable amount of duplicated content. Depending on what your site management software or CMS allows, target the following goals regarding your site search:

15. Exclude from crawling and indexing all search result pages.

16. Exclude from crawling and indexing all links to search result pages.
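In practice, rules 15 and 16 combine the techniques from above; the /search URL pattern is an assumption about your setup:

```html
<!-- robots.txt equivalent: Disallow: /search -->

<!-- in the <head> of the search result template -->
<meta name="robots" content="noindex, nofollow">

<!-- links to stored or suggested searches stay nofollow -->
<a href="/search?q=example" rel="nofollow">example</a>
```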

Improving user experience with site search

17. On the search page, place links to the most searched and most recently searched queries.

18. Adjust your search filters so that they perform a faceted search with no more than 100 items in the deepest facet.

If your site development follows these rules, you may rest assured that you've done your best for your site's architecture, and now it is time to focus on the next issue.

Page-wide architecture

[Image: HTML5-influenced page structure]
What does this mean? The order of elements inside the opening and closing body tags. Before HTML5 there were only block and inline elements and tables as building blocks. HTML5 brought semantic structure into the page body: we got meaningful HTML elements like header and footer, nav and aside, section and article. We can mix them in any order we want. The human eye and the search bot read the page in a similar but slightly different way.
The search bot reads straight from top to bottom.
This fact is the starting point for thinking about page-wide architecture. It isn't an objective of this text to talk about user experience, human reading habits, or how to design beautiful and readable websites. Instead I'm talking about the search bot: how it reads, and how the page content must be ordered to ensure that Google understands fully and exactly what the content means in detail. Most discussions about page content order date from 2006-2011; at that time this issue was called "source ordered content", and there have been zero fresh discussions on the topic since. So what's the point?

The idea is simple: because the search bot reads from top to bottom, the content must be ordered according to its importance too; the more important a thing is, the closer it should be placed to the opening body tag.
It is clear that such a layout can't harm the site in any way. So the only open question was whether, and which, practical effect this approach brings. The usual element order in general is: top menu, header (logo + slogan in H1), then the content, sidebar(s) and the footer. Many suboptimally laid out CMS and WordPress templates even push sidebars before the main content. So the core of the whole discussion is whether it is worth the effort to order the content according to its importance and then rearrange it with CSS to achieve the familiar, usual appearance. As I mentioned, the discussion faded away years ago without a clear conclusion, and everybody acts at their own convenience.

However, Google confirms the idea that the main things should come first: describing the causes of structured data markup not showing up in Google's testing tool, Google says:
In general, the goal of rich snippets is to display the most relevant content on the page to users. If you see an "insufficient data to generate preview" message, it is generally for one of two reasons:
1. The marked-up content doesn't appear to be the main subject of the page. This can happen if the marked-up content is very low down on the page...
And shortly after, I got another solid reassurance of the validity of the consideration that the search bot ranks content in its order of appearance.
The search bot ranks the content importance in descending order.
The most important things stay at the top and at the beginning:
  • the title directly after the opening body tag,
  • keywords at the beginning of the title,
  • keywords at the beginning of any heading,
  • keywords at the beginning of the first sentence after any heading,
  • begin and end your content above the fold.
The task of placing keywords exactly like this is tricky, because your content could quickly become unreadable and overoptimized. Just let your copywriter keep these rules in mind.
So a page has to be constructed corresponding to this insight. Another leading thought is that optimization takes place less for the site in general and rather for every page, one by one. As the general building blocks we take the HTML5 blocks, as in the image above:

19. Directly after the opening body tag comes the article block, with the title as H1 and the main keyword-rich content.

20. Below the end of the article's text, place the header, where I recommend putting, besides the site slogan and logo, an authorship link and some contact data using the address tag, containing something like the street address, ZIP code, city and a link to the Google map or the geodata of the site.
Publishing structured authorship and contact data on each content page, near the content, gives the best possible local SEO effect: you establish an apparent binding of your content with your location data.
If you place any kind of structured markup in boilerplate elements like header, nav or footer, and this markup is the single occurrence of structured data on the page (a very rare case, however), it can happen that the Google Structured Data Testing Tool doesn't show it, with the message "insufficient data to generate preview". That's because Google considers the structured data not to be the main subject of the page. In our case this is true: we place contact data here only to show its relation to this page.

21. With the intent to build on the page a cluster of topically related content with related keywords, then place a section or aside with some (my choice is 3) linked titles and text snippets of the most read related articles (not merely the most read or merely related, but both: the most read OF the related). The only important thing regarding the element order is to ensure that the search bot, on its linear top-to-bottom pass, reads all keyword-relevant content parts contiguously.

22. Now we are done with the page's content part: place the nav with the main menu, where one of the first menu links should preferably go to the contact page with structured, detailed contact information.
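Putting rules 19-22 together, the body could be sketched like this; all names, paths and texts are placeholders, and the visual order is later restored with CSS:

```html
<body>
  <!-- 19. main content first, straight after the opening body tag -->
  <article>
    <h1>Main keyword: article title</h1>
    <p>Keyword-rich main content comes first in the source order.</p>
  </article>

  <!-- 20. header with logo, slogan, authorship link and contact data -->
  <header>
    Site logo and slogan
    <a href="/about" rel="author">Author</a>
    <address>Example Street 1, 12345 Example City</address>
  </header>

  <!-- 21. the 3 most read related articles -->
  <aside>
    <a href="/related-article-1">Related article 1</a>
    <a href="/related-article-2">Related article 2</a>
    <a href="/related-article-3">Related article 3</a>
  </aside>

  <!-- 22. navigation comes last in the source order -->
  <nav>
    <ul>
      <li><a href="/contact">Contact</a></li>
      <li><a href="/category-1">Category 1</a></li>
    </ul>
  </nav>
</body>
```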

Special case: page-wide architecture of the page 404

What helps a visitor on a page which, by its nature, has no content? It is fully enough to place the nav with the menu, a header with site information like logo, slogan and contact data, and then a section with some snippets of the most read content items.
Ensure your pages return the correct response codes: usual content 200, the 404 page an actual 404.
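On Apache, for example, a custom 404 page is wired up with the ErrorDocument directive; as long as the handler is a local path, the 404 status code is preserved:

```apache
# Serve /404.html for missing URLs while keeping the 404 status code
ErrorDocument 404 /404.html
```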


I'm aware that the structure I described above looks a bit unusual, but after a small amount of CSS the page looks the way the human eye is used to seeing it: top navigation, header with logo and slogan, title and the article text, snippets of related articles, and a footer with contact data. Furthermore, I'm absolutely sure that not following the element order I described will bring no negative effect, maybe just a bit less of the positive one. I have experimented extensively with other element positioning, and what I can say definitely is: how the page blocks are optimized matters more than how they are ordered on the page.

By complying with these instructions, you can be sure your site architecture is properly optimized for search engines.

The first article from "HASCH the OnPage SEO framework" is here:
H for Header: part 1 of the HASCH OnPage SEO framework