Content Analysis for re-use

The basic premise of “single source” can be summed up in one word.


Sounds simple enough, but there is a wealth of analysis and work required before that somewhat elegant aim can be met.

Analysing your content for potential re-use opportunities is, by and large, an onerous task. Whether you do it on paper, printing out reams of documentation and annotating by hand, or electronically, compiling spreadsheets using colour coding or obscure (“they made sense to me at the time”) codes, it takes time to do it properly and there are no shortcuts. Sorry to break it to you so bluntly.

However, it does mean that you are forced to spend some time re-reading your content, content which you might not have visited for some time or, in some cases, may not have written yourself. You’ll likely find inconsistencies in the content itself, styling errors and quite probably a completely different writing style. Whilst it may seem obvious, I urge you to resist the temptation, should it arise, to start editing as you go along.

My basic understanding of single source, and the re-use of information, is that there are times when you’ll need to rewrite content so it can be more easily used in multiple locations. A change of tense perhaps, a rephrasing or reconstruction of a sentence may be all that is required, and hell, if you have the document open in front of you, why not just go ahead and make that change? Suffice to say that editing content that you are analysing has only one potential outcome. Chaos. Regardless of how well organised and how well planned your analysis is, if you start making changes to your content on the fly, you will soon find yourself with a blurred view of the very thing you are trying to analyse.

Yeah, I know. It sounds obvious, and it is when viewed from a distance.

However, what I really wanted to discuss, for I’m by no means certain on this, is the point at which content becomes too granular. If I want to re-use a paragraph then, obviously, breaking up content to the paragraph level makes sense, but that immediately seems like overkill in many cases. So I’ve been steering away from that kind of structural thinking, away from paragraphs and sentences and towards semantically discrete blocks. So a short product description, containing a heading and a paragraph, is one block and a long product description, containing a heading and several paragraphs, is another. I’m pretty sure this is the correct approach but it does mean that, once you’ve made that decision, you are stuck with fairly large chunks of information.
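To make the block idea a little more concrete, here is a rough sketch in Python. It is only an illustration, not a description of any real single source tool: the `ContentBlock` class, the `render` and `assemble` helpers, the block IDs, the metadata keys and the sample product text are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentBlock:
    """One semantically discrete block: a heading plus its paragraphs."""
    block_id: str                      # hypothetical identifier scheme
    heading: str
    paragraphs: tuple[str, ...]
    metadata: dict[str, str] = field(default_factory=dict)


def render(block: ContentBlock) -> str:
    """Turn a block into plain text, with blank lines between its parts."""
    return "\n\n".join((block.heading, *block.paragraphs))


def assemble(block_ids: list[str], library: dict[str, ContentBlock]) -> str:
    """Build a deliverable by pulling whole blocks from the library, in order."""
    return "\n\n".join(render(library[block_id]) for block_id in block_ids)


# Sample content and metadata keys are made up for illustration.
short_desc = ContentBlock(
    block_id="widget-short",
    heading="Widget",
    paragraphs=("A compact widget for everyday use.",),
    metadata={"type": "product-description", "length": "short"},
)
long_desc = ContentBlock(
    block_id="widget-long",
    heading="Widget",
    paragraphs=(
        "A compact widget for everyday use.",
        "It ships with a stand and a carry case.",
    ),
    metadata={"type": "product-description", "length": "long"},
)
library = {block.block_id: block for block in (short_desc, long_desc)}

# The same short block is re-used, unedited, in two deliverables.
deliverables = {
    "brochure": ["widget-short"],
    "website": ["widget-short"],
    "user-guide": ["widget-long"],
}
for name, block_ids in deliverables.items():
    print(f"--- {name} ---")
    print(assemble(block_ids, library))
```

Notice that the long block repeats the opening sentence of the short one. That duplication is the price of working with whole blocks rather than individual sentences, which is exactly the "fairly large chunks" trade-off.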

I’m hoping that this is a good balance though, for if we are to break our content into smaller granules, the overhead of maintaining and manipulating them surely increases. Remember, in a single source system we are concerned with more than content. We also have to contend with the metadata associated with that content, and the more pieces of information we have to maintain, the greater the risk that the metadata becomes so complex as to be useless.

I think. Maybe... I’m really not that sure.

Have you conducted any content analysis? If so, how did you approach the granularity issue? I get the sense that, for a lot of people, the level of granularity only becomes clear once the content analysis is complete, that it basically decides itself.

As we slowly progress towards a single source solution, I’m intrigued as to what to expect next. Any thoughts or comments are much appreciated. After all, all the articles, conferences and books in the world can’t replace real life experience.

This post was, in part, inspired by pondering whether semantic analysis might be a way to tackle this but, for now, I wonder if it is perhaps a step too far for most.


  1. Hi Gordon

    Thanks for the link-love!

    Semantic analysis is a great way to understand content, but you’re right, it’s not for everyone.

    Given a sample of content, it can tell you the structure of that content and the meaning (i.e. categories) associated with it. This is good for creating better indexing of content, and Google does it this way so that search results are great rather than just ok.

    I guess then the question is, do you need to understand content at such a detailed level? If it’s just a content audit you’re doing, then the answer is probably ‘no’. If the goal is to turn content into discrete elements so that machine readability improves (like I needed to do with the medical restrictions text), then the answer is probably ‘yes’.

