By Paul Boag

http://boagworld.com/site-content/dealing-with-legacy-content/

An automated solution

An automated solution is good for two reasons. First, it doesn’t require anybody to check every page manually. Second, it doesn’t require one person to tell another that their content is going to be taken down. The whole thing just happens. People are much more likely to agree to an automated policy for content control than to being singled out as somebody who hasn’t maintained their content properly.

So how would this automated approach work in practice?

Automated review points

Essentially, a review of a particular webpage would occur when certain criteria are met. This review could happen automatically or manually, depending on your preference. In either case, however, your content management system (CMS) needs to be able to identify pages that have reached a certain age (or a certain time since they were last reviewed). In most cases this is something that already exists in a CMS or could easily be added.
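As a rough sketch of a time-based review point, assume each page record carries the date it was last reviewed. The `Page` fields, the `pages_due_for_review` function and the 365-day default below are my own illustrative assumptions, not features of any particular CMS.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical page record; a real CMS would supply its own fields.
@dataclass
class Page:
    url: str
    owner_email: str
    last_reviewed: date


def pages_due_for_review(pages, today, max_age_days=365):
    """Return the pages whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [p for p in pages if p.last_reviewed < cutoff]


pages = [
    Page("/about", "alice@example.com", date(2023, 1, 10)),
    Page("/pricing", "bob@example.com", date(2024, 5, 1)),
]
due = pages_due_for_review(pages, today=date(2024, 6, 1))
print([p.url for p in due])  # ['/about']
```

A check like this could run as a nightly scheduled job, which keeps the review entirely hands-off.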

An alternative to time-based review points would be traffic-based ones. These are designed to remove content that users rarely visit, rather than content that is out of date. The review point would be triggered if the traffic to a page falls below a certain threshold over a given period. This would indicate that the page is of little interest and is simply making it more difficult for the majority of people to find what they are after.
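A traffic-based trigger could be sketched along the following lines, assuming you can export visit counts per URL from your analytics package for the period you care about. The function name, the threshold of 50 visits and the sample figures are all made up for illustration.

```python
def low_traffic_pages(visits_by_url, min_visits=50):
    """Return the URLs whose visit count for the period is below min_visits."""
    return sorted(
        url for url, visits in visits_by_url.items() if visits < min_visits
    )


# Hypothetical visit counts for the last 90 days.
visits_last_90_days = {"/about": 1200, "/old-event-2019": 3, "/faq": 49}
print(low_traffic_pages(visits_last_90_days))  # ['/faq', '/old-event-2019']
```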

This is a lesson Microsoft had to learn with its support pages. They had support pages for every conceivable issue. However, instead of helping users, most of this content just cluttered up the site and made it harder for users to find what they really wanted. In the end they removed the less-frequented pages and their customer satisfaction shot up.

How often you review pages, or how low you set the traffic trigger, is entirely up to you. It will depend on how often your site or organisation changes and how much you want to ask of your content providers.

When a page is identified for review, an email is sent to the owner of that page (either manually or automatically) asking them to check it. Ideally this should simply involve the content provider logging into the CMS and editing the page in question. A simple checkbox saying that the page is up-to-date is all that is required. If that is not possible, a reply by email saying that the page is up-to-date would be just as good.
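If you send the email automatically, building the message might look something like this. It is only a sketch using Python's standard `email` library; the sender address, the wording and the 14-day response window are assumptions of mine, and actually sending the message (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_review_request(owner_email, page_url, days_to_respond=14):
    """Build (but do not send) a review request for one page."""
    msg = EmailMessage()
    msg["From"] = "webteam@example.com"  # assumed sender address
    msg["To"] = owner_email
    msg["Subject"] = f"Please review your page: {page_url}"
    msg.set_content(
        f"The page {page_url} is due for review.\n\n"
        "Please log into the CMS, check the content and tick the\n"
        "'this page is up-to-date' box, or simply reply to this email.\n\n"
        f"If we do not hear from you within {days_to_respond} days,\n"
        "the page will be marked as potentially out of date.\n"
    )
    return msg


msg = build_review_request("alice@example.com", "/about")
print(msg["Subject"])  # Please review your page: /about
```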

If the content provider fails to confirm that the page is up-to-date within a set time period, this triggers a cleanup event (see below). Notice the default here. At the moment, the defaults on the majority of websites are set so that if the content provider does nothing, the content remains online. This approach turns that on its head: no action leads to content being marked for cleanup.
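The reversed default can be expressed in a few lines. In this sketch, silence from the content provider past the deadline is what triggers cleanup; the function name, its parameters and the 14-day grace period are illustrative assumptions.

```python
from datetime import date, timedelta


def needs_cleanup(requested_on, confirmed_on, today, grace_days=14):
    """Return True when a review was requested and never confirmed in time.

    confirmed_on is None until the owner marks the page as up-to-date.
    """
    if confirmed_on is not None:
        return False  # the owner has taken action, so the page stays as it is
    return today > requested_on + timedelta(days=grace_days)


# No response 19 days after the request: cleanup is triggered.
print(needs_cleanup(date(2024, 6, 1), None, today=date(2024, 6, 20)))  # True
```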

Sample email

What happens when a cleanup is triggered?

How you choose to handle the cleanup of webpages is up to you. However, here is my recommended process:

Mark the page as being old content

The first step would be to mark the content as old and potentially out of date. This can be done by automatically inserting a banner at the head of the main content telling the user that this content is potentially out of date. Below is an example of how this might look.

Example notification banner

You might also wish to send an email to the content owner of that page saying that it has been marked as out of date.
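Inserting the banner automatically could be as simple as prepending a block of markup to the page's main content when it is rendered. The sketch below assumes the main content is available as an HTML string; the function name, the `outdated-banner` class and the banner wording are placeholders of my own.

```python
from html import escape


def add_outdated_banner(main_content_html, last_reviewed):
    """Prepend a warning banner to the main content of an old page."""
    banner = (
        '<div class="outdated-banner">'  # assumed class name for styling
        f"This page was last reviewed on {escape(last_reviewed)} "
        "and may be out of date."
        "</div>\n"
    )
    return banner + main_content_html


html = add_outdated_banner("<p>Opening hours: 9 to 5.</p>", "10 January 2023")
print(html)
```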

Remove the page from the site’s navigation

If the content provider still hasn’t checked the page after a set period, you might then choose to trigger a further event that removes the page from the navigational structure of the site. This will reduce the clutter that users need to navigate through to find the page they want. However, those who really want to access these pages can still find them via search.

Remove the page from the search results

Of course, there is also the option to prevent these pages being returned in search results. It can be hard to find the right page when searching a large site simply because of the amount of content being returned. If a piece of content is out of date, it makes sense not to return it in the search results.

This effectively orphans the page but keeps it online. You may wonder what the point of this is. Surely you would be better off deleting the page entirely?
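The three stages described so far (banner, removal from navigation, removal from search) amount to a simple escalation over time. Here is one way to sketch it, with the caveat that the 30-day and 60-day thresholds and the `PageState` flags are assumptions for illustration; your CMS will have its own way of hiding a page from menus and search.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    show_outdated_banner: bool = False
    in_navigation: bool = True
    in_search: bool = True


# Assumed escalation schedule, in days since the cleanup was triggered.
HIDE_FROM_NAV_AFTER = 30
HIDE_FROM_SEARCH_AFTER = 60


def cleanup_state(days_since_cleanup_triggered):
    """Return how a page should be treated at this point in the cleanup."""
    days = days_since_cleanup_triggered
    return PageState(
        show_outdated_banner=True,  # the banner appears as soon as cleanup starts
        in_navigation=days < HIDE_FROM_NAV_AFTER,
        in_search=days < HIDE_FROM_SEARCH_AFTER,
    )


print(cleanup_state(45))
# PageState(show_outdated_banner=True, in_navigation=False, in_search=True)
```

Note that no stage here deletes anything: the page stays online throughout, and a content provider who confirms the page at any point would simply reset it to its normal state.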

Delete the page altogether

There are mixed opinions about deleting content entirely. On the surface it seems like the most logical thing to do. If content is horribly out of date or is rarely visited, what is the point of it being online?

As I see it, there is no harm in keeping it online if it is clearly labelled as out of date and it no longer prevents users from finding the content they really want. However, removing it can be damaging.

For a start, there may be third-party links to that page, not to mention hard-coded links within your own website. The last thing you want to present a user with is a ‘page not found’ error.

The only time I would recommend removing a page entirely is when the user can be automatically redirected to an alternative page that serves their needs better.
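In practice that means keeping a record of where each removed page now lives and answering old links with a permanent redirect rather than an error. The following is a minimal sketch, assuming a hand-maintained mapping; the `REDIRECTS` table, the paths and the `resolve` function are hypothetical, and most web servers and CMSs offer their own redirect configuration that would do the same job.

```python
# Hypothetical mapping from removed pages to their better replacements.
REDIRECTS = {
    "/pricing-2019": "/pricing",
    "/support/old-faq": "/support",
}


def resolve(path, live_pages):
    """Return an (HTTP status, target path) pair for a requested path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the replacement
    if path in live_pages:
        return 200, path
    return 404, None  # the outcome the redirect table is meant to avoid


live = {"/pricing", "/support", "/about"}
print(resolve("/pricing-2019", live))  # (301, '/pricing')
```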

Conclusion

I am not suggesting that this approach is perfect. There is nothing stopping a content provider from ticking the ‘this page is up-to-date’ box without properly reviewing the content. However, it does put the onus on the content provider to take action. This should automatically remove huge amounts of content from the site without you having to battle with each content provider individually.