On 25 August 2014, Sorin Pintilie (@sorpeen, http://www.sorpin.com/) published an article on The Pastry Box Project describing a mechanism that would allow content to be transcluded into a web page by applying an href="…" attribute to a <p> tag. This article is a response to it.
Transclusion is the inclusion of a small element of content from one source into other material, by reference. The transcluded content is presented as an integral part of the final material – at the point of reference – while remaining dependent on its primary source. It is included at presentation time. The principle of transclusion was part of the original description of hypertext, as published by Ted Nelson in 1965.
There are two variants of transclusion. The first, as envisaged by Nelson, is the easier: content reuse within a single publishing environment. Sorin’s article, and this one, deal with the second: including a snippet of someone else’s content in your own publication.
Sorin’s approach to third-party transclusion follows a model that is common today: hack the problem at the browser level. He suggests attaching an href="…" to a <p> tag using a new syntax: <p href="http://samplelink.com/^^firstkeyword…lastkeyword">…</p>.
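To make the proposed syntax concrete, here is a minimal Python sketch of how a script might split such an href into the source URL and the two keywords. The function and type names are hypothetical, and the rule that "^^" marks the start of the selector and an ellipsis separates the keywords is my reading of the example, not a published specification. It accepts both the ellipsis character and three full stops as the separator.

```python
from typing import NamedTuple

# Both the typographic ellipsis and three full stops are accepted as the
# separator, since a hand-written href could use either (an assumption).
SEPARATORS = ("\u2026", "...")


class KeywordRange(NamedTuple):
    source_url: str
    first_keyword: str
    last_keyword: str


def parse_keyword_href(href: str) -> KeywordRange:
    """Split an href of the assumed form <url>^^<first>…<last> into its parts."""
    source_url, marker, selector = href.partition("^^")
    if not marker:
        raise ValueError("href has no ^^ selector")
    for separator in SEPARATORS:
        first, found, last = selector.partition(separator)
        if found and first and last:
            return KeywordRange(source_url, first, last)
    raise ValueError("selector needs two keywords separated by an ellipsis")
```

For the example above, this yields the URL http://samplelink.com/ with firstkeyword and lastkeyword as the two keywords.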
While I can see that this approach is designed to deliver a quick solution (indeed, Sorin’s article links to a working model), I see four fundamental issues with it:
- Source obsolescence
- Implementation at browser level
- The syntax
- Applying it to the <p> tag
Source obsolescence
The most obvious issue is this approach’s dependence on the source content staying intact. If the transcluded source is rewritten, with the keywords removed, reordered, or duplicated, the link’s integrity is jeopardised.
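A minimal sketch of resolving such a keyword range shows how brittle it is. It assumes (hypothetically) that the snippet runs from the first occurrence of the first keyword to the next occurrence of the last keyword; resolve_keyword_range is an illustrative name, not part of Sorin’s code.

```python
from typing import Optional


def resolve_keyword_range(text: str, first: str, last: str) -> Optional[str]:
    """Return the text from the first keyword through the last keyword.

    Returns None when either keyword is missing, or when the last keyword
    no longer appears after the first one.
    """
    start = text.find(first)
    if start == -1:
        return None
    end = text.find(last, start + len(first))
    if end == -1:
        return None
    return text[start:end + len(last)]
```

Against "Transclusion is the inclusion of content by reference. It is resolved at presentation time." the keywords "Transclusion" and "reference." recover the first sentence. Once the source is rewritten as "Content is included by reference. Transclusion makes this possible.", the same keywords return None: the link is simply broken.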
Sorin’s model partially overcomes this, almost by accident. Because browsers’ same-origin restrictions stop a page from reading content served by another site, his working sample cannot fetch the snippet directly: it requests and parses the snippet remotely, on the same server that serves the supporting scripts, and that server also caches the snippet.
Caching helps, but it is a fragile arrangement. The cache would need to persist indefinitely, even when the environment hosting it is replaced. And the link to the underlying source could break at any time, including before the snippet has ever been cached. At that point there is no value in transclusion: we might as well just copy the source material.
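The arrangement can be sketched as a proxy with a fallback cache. The names are hypothetical, and fetch stands in for the remote request, returning None when the source is unreachable or the snippet can no longer be found.

```python
from typing import Callable, Dict, Optional

Fetcher = Callable[[str], Optional[str]]


class SnippetCache:
    """Minimal sketch of the proxy-and-cache arrangement (hypothetical)."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        # In-memory store: it disappears if this environment is replaced.
        self._store: Dict[str, str] = {}

    def get(self, href: str) -> Optional[str]:
        fresh = self._fetch(href)
        if fresh is not None:
            self._store[href] = fresh  # refresh the cached copy
            return fresh
        # The source has failed: fall back to whatever was cached, if anything.
        return self._store.get(href)
```

Once a snippet has been cached, the proxy can keep serving it after the source disappears, but only for as long as the cache itself survives. A snippet whose source broke before it was ever cached is simply gone.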
Implementation at browser level
The second issue is more general. It concerns what seems to be the de facto approach to many upgrades in web functionality: hack it into browsers using jQuery until browser developers decide they ought to support it directly.
This approach has worked for other functionality; there have been quite a few successful polyfills.
But transclusion is too fundamental to be hacked in this way. There are governance implications that must be considered. The source needs to be aware that it is being transcluded by a third party.
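The syntax
The syntax issue has multiple parts. The first is that it is not, and probably never would be, a valid URI syntax. Neither the ^^ marker nor the ellipsis character is permitted literally in a URI (both would have to be percent-encoded), and the URI and HTTP specifications leave no room for unilateral changes or extensions to URI syntax and semantics (nor does the web community tolerate attempts to make them).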
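The character-level problem can be checked mechanically. This sketch encodes the RFC 3986 character sets (unreserved, gen-delims and sub-delims, plus "%" for percent-encoding) and reports anything else found in a URI. illegal_uri_chars is a hypothetical helper, and it checks characters only, not the full URI grammar.

```python
import string
from typing import List
from urllib.parse import quote

# Characters RFC 3986 permits literally in a URI.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = frozenset(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")  # "%" begins a percent-encoding


def illegal_uri_chars(uri: str) -> List[str]:
    """Return, sorted, the distinct characters RFC 3986 does not allow literally."""
    return sorted({ch for ch in uri if ch not in ALLOWED})


if __name__ == "__main__":
    href = "http://samplelink.com/^^firstkeyword\u2026lastkeyword"
    print(illegal_uri_chars(href))          # ['^', '…']
    print(quote("^^first\u2026last"))      # %5E%5Efirst%E2%80%A6last
```

Percent-encoding the offending characters produces a valid URI, but then the "^^" marker is just ordinary path data that the source server is free to interpret however it likes.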
Even if unilateral extension were not an issue, there is the inherent uncertainty of identifying the snippet correctly from a text match alone, even if the keywords may span multiple words.
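Duplication is the harder case. Enumerating every span that starts with the first keyword and ends with the last shows how quickly candidates multiply. candidate_spans is a hypothetical helper for illustration, not part of Sorin’s implementation.

```python
from typing import List, Tuple


def candidate_spans(text: str, first: str, last: str) -> List[Tuple[int, int]]:
    """Every (start, end) slice of text that starts with first and ends with last."""
    spans: List[Tuple[int, int]] = []
    start = text.find(first)
    while start != -1:
        # Each later occurrence of the last keyword closes another candidate.
        end = text.find(last, start + len(first))
        while end != -1:
            spans.append((start, end + len(last)))
            end = text.find(last, end + 1)
        start = text.find(first, start + 1)
    return spans
```

In "The web grew. The web changed. The web grew again.", the keywords "The web" and "grew" delimit four different candidate spans, and nothing in the href says which one the author meant.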
Also, the ... is clearly meant to be an ellipsis (…), but in Sorin’s sample it is just a series of three full stops. I presume both forms would need to be supported, in case someone actually wrote it correctly.
Finally, there is the question of why one would reinvent the wheel. The XPointer specification (a system for addressing components of XML-based internet media) already provides features for identifying anything you might want to transclude, including arbitrary ranges of text.
Applying it to the <p> tag
Lastly, I take issue with the use of the <p> tag as the container for the transcluded content. What happens when we want to transclude not just a sentence, but multiple paragraphs? Or two half paragraphs? Transclusion into a <p> tag is a hack that only considers an edge case.
We already have one element fully capable of being a transclusion container: the iframe. And even if we figured it wasn’t capable enough for various reasons (size control, for starters), there are much better options available.
An approach designed for the reusable content age
It would be totally unfair of me to stop here. I have just ripped holes in Sorin’s proposed approach, so I am duty-bound to explain how to do it better.
Implementation at content level, and source persistence
The first problem we need to solve is source obsolescence. How do we ensure content can be transcluded without needing to rely on a caching work-around? How do we ensure that when the source moves, the link remains valid? How do we inform the source that it is being transcluded by a third party, so the owners of that content can apply some governance to it?
This goes hand in hand with the question of where best to implement the model. It requires new implementations of content management, in which content is treated as content, not as strings of text within web pages. The core implementation must be at the level of the server that holds the content to be transcluded; that server must be aware that its material is being reused.
As Eliot Kimber (@drmacro) commented in reviewing this article, transclusion is not something that can be patched into the web; it has to be built in at the lowest level.
The model I see working requires new functionality in both content management systems and browsers. When authors want to transclude content, they would highlight it in a page and perform a copy-like action. This action would not copy the selected text; instead, it would query the server and obtain a specific URI for that snippet. The server would then know that the snippet was being transcluded, and the returned URI would be a clean reference to the content we want, from the master source.
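The server side of that copy-like action might look something like the following sketch. Everything here is hypothetical: the SnippetRegistry class, the /snippets/ path, and recording each requester so the source knows who is transcluding it. Repeated requests for the same selection reuse the same URI.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Snippet:
    content: str
    page: str  # the page the snippet currently appears on
    transcluders: Set[str] = field(default_factory=set)  # sites that requested it


class SnippetRegistry:
    """Hypothetical server-side registry that mints one stable URI per snippet."""

    def __init__(self, base_uri: str) -> None:
        self.base_uri = base_uri.rstrip("/")
        self.snippets: Dict[str, Snippet] = {}
        self._next_id = itertools.count(1)

    def uri_for(self, snippet_id: str) -> str:
        return f"{self.base_uri}/snippets/{snippet_id}"

    def request_uri(self, page: str, content: str, requester: str) -> str:
        """Handle the copy-like action: register the selection and return its URI."""
        for snippet_id, snippet in self.snippets.items():
            if snippet.page == page and snippet.content == content:
                snippet.transcluders.add(requester)  # the source now knows who reuses it
                return self.uri_for(snippet_id)
        snippet_id = f"s{next(self._next_id)}"
        self.snippets[snippet_id] = Snippet(content, page, {requester})
        return self.uri_for(snippet_id)
```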
When the owners of the transcluded content later edit it, they have several choices. They can move it to a different page, because the snippet URI links to the content, not to the page it appears on. They can edit it, in which case the reference resolves to the new content. Or they can rewrite their content so completely that the originally transcluded content becomes redundant, which has two sub-scenarios: retain the snippet so it can still be referenced as archived material, or mark it as obsolete.
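Those choices map naturally onto a small state model. In this sketch (all names hypothetical), moving or editing a snippet changes what its URI resolves to but never the URI itself, while retiring it records whether it is kept as an archive or marked obsolete.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CURRENT = "current"
    ARCHIVED = "archived"  # kept so existing references still resolve
    OBSOLETE = "obsolete"  # flagged as no longer valid


@dataclass
class Snippet:
    content: str
    page: str
    status: Status = Status.CURRENT

    def move(self, new_page: str) -> None:
        # The snippet URI identifies the snippet, not the page, so it is unaffected.
        self.page = new_page

    def edit(self, new_content: str) -> None:
        # Anyone resolving the snippet URI from now on receives the new content.
        self.content = new_content

    def retire(self, keep_as_archive: bool) -> None:
        # The surrounding article was rewritten and no longer needs this snippet.
        self.status = Status.ARCHIVED if keep_as_archive else Status.OBSOLETE
```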
Of course, there is the equally important question of how to inform everyone who is transcluding the content that it has changed. I don’t have a particularly good answer to that, except to say that broadcast feed technologies such as RSS and Atom already exist, so reusing one of them is likely the most viable option.
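As one possible shape for such a feed, this sketch builds a minimal Atom (RFC 4287) document with one entry per changed snippet, using only Python’s standard library. The change labels ("edited", "obsolete"), the feed title and the entry-id scheme are assumptions made for illustration. Transcluding sites would poll the feed like any other.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Optional, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace

Change = Tuple[str, str, datetime]  # (snippet URI, change kind, when)


def _add(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        node.text = text
    return node


def change_feed(feed_id: str, publisher: str, changes: List[Change]) -> str:
    """Build a minimal Atom feed announcing changes to transcluded snippets."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _add(feed, "id", feed_id)
    _add(feed, "title", "Snippet changes")
    latest = max((when for _, _, when in changes), default=datetime.now(timezone.utc))
    _add(feed, "updated", latest.isoformat())
    _add(_add(feed, "author"), "name", publisher)
    for uri, kind, when in changes:
        entry = _add(feed, "entry")
        _add(entry, "id", f"{uri}#{when.isoformat()}")  # one id per change event
        _add(entry, "title", f"{kind}: {uri}")
        _add(entry, "updated", when.isoformat())
        _add(entry, "link", href=uri)
    return ET.tostring(feed, encoding="unicode")
```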
As to the question of what happens when the content source organisation replaces their content management system, it is a safe bet that any organisation using a platform that supports third-party transclusion, with content referenced in this way, would only migrate to another platform that provided the same. As long as the site itself exists, so will the snippets.
The syntax, and the container
With the source issue resolved, we need to turn to how we would reference content to be transcluded. We have already established that a transcluded snippet will have its own source-server-provided URI, so we do not need any special syntax in the request.
However, we do need to standardise how transcluded content will be returned. There need to be rules: agreements that prevent the requesting page from being spammed, while also discouraging shell sites that simply rip off others’ content. (Several of these were considerations in Project Xanadu, Nelson’s original hypertext project, and are still relevant.)
- The source should be allowed to limit the length of a transcluded snippet (considering copyright and fair-use guidelines: if you attempt to transclude a massive article, you might only get a paragraph and a half, and the reader would need to go to the source for more). Whether truncation will occur should be indicated when the URI is obtained.
- If the transcluded content begins or ends mid-paragraph, it will include leading and trailing ellipses (…) as appropriate.
- The transcluded content should be wrapped in appropriate tags, effectively delineated as paragraphs (thereby allowing multi-paragraph transclusion).
- The transcluded snippet should include three additional elements after the imported content: an optional element identifying the author, a link to the primary home of the transcluded snippet, and an optional marker indicating that the transcluded version is no longer current. The second element would need an option identifying the snippet as orphaned, with no source article.
- Transcluded content could support the embedding of advertising (allowing the reuse of ad-supported content), but there should always be an ad-free option (which might just mean it has a lower length threshold).
- The transcluded content should not contain any CSS or script references (an issue with some of the most commonly transcluded content today). Indeed, it would be fair to strip them out entirely if they were served. This would help ensure that the delivered snippet uses a semantically clean structure.
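Taken together, these rules describe a small payload. The sketch below is one hypothetical way a source server could assemble it: it discards script and style content, rewraps the text as <p> elements, truncates at a length limit, adds leading and trailing ellipses, and carries the author, source link and out-of-date marker alongside the body, with a source of None standing for an orphaned snippet. Each input fragment is treated as one paragraph, inline markup is flattened to plain text as a simplification, advertising is left out, and build_snippet and its parameters are illustrative names only.

```python
import html
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional

ELLIPSIS = "\u2026"


class _TextOnly(HTMLParser):
    """Collect text content, discarding everything inside <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skipping += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skipping:
            self._skipping -= 1

    def handle_data(self, data):
        if not self._skipping:
            self.parts.append(data)


def clean_text(fragment: str) -> str:
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())  # collapse whitespace


@dataclass
class TranscludedSnippet:
    body_html: str
    source: Optional[str]  # None marks an orphaned snippet with no source article
    author: Optional[str] = None
    out_of_date: bool = False
    truncated: bool = False


def build_snippet(fragments: List[str], source: Optional[str], *,
                  author: Optional[str] = None, starts_mid: bool = False,
                  ends_mid: bool = False, max_chars: Optional[int] = None,
                  out_of_date: bool = False) -> TranscludedSnippet:
    paragraphs = [p for p in (clean_text(f) for f in fragments) if p]
    truncated = False
    if max_chars is not None:
        kept: List[str] = []
        used = 0
        for p in paragraphs:
            if used + len(p) > max_chars:
                if max_chars - used > 0:
                    kept.append(p[:max_chars - used].rstrip())  # partial paragraph
                truncated = True
                break
            kept.append(p)
            used += len(p)
        paragraphs = kept
    if paragraphs and starts_mid:
        paragraphs[0] = ELLIPSIS + paragraphs[0]
    if paragraphs and (ends_mid or truncated):
        paragraphs[-1] += ELLIPSIS
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return TranscludedSnippet(body, source, author, out_of_date, truncated)
```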
Because the transcluded content can be complex, and may contain multiple elements, there are only four reasonable elements to pull it into.
One is the iframe. The problem here is height: as we cannot know the height of the transcluded snippet in advance, we end up with either whitespace or scrollbars. Extending the iframe definition to support a fit-to-content directive may be an option.
The next obvious element to use is the <div>. It is a simple container, which allows complex content.
The third option, and I believe the best, is the <figure> element. It has a simple benefit over the <div>, in that it supports the <figcaption> sub-element, which would be a perfect container for the author, source link and out-of-date markers.
The last option, which would require the structure of the returned content to be more tightly controlled, would be to simply use the <a> element, with a role of transclusion (<a href="…" role="transclusion">).
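Assuming the returned content follows the rules listed earlier, rendering it into the <figure> option is straightforward: the body goes in as-is, and the author, source link and out-of-date marker go into the <figcaption>. The function name, class names and caption wording are hypothetical, and body_html is assumed to have been sanitised already.

```python
import html
from typing import List, Optional


def render_figure(body_html: str, source: Optional[str], author: Optional[str] = None,
                  out_of_date: bool = False) -> str:
    """Wrap an already-sanitised snippet body in <figure> with a <figcaption>."""
    caption: List[str] = []
    if author:
        caption.append(f'<span class="author">{html.escape(author)}</span>')
    if source:
        caption.append(f'<a href="{html.escape(source)}">Source</a>')
    else:  # orphaned snippet: no source article to link to
        caption.append('<span class="orphaned">Source article no longer available</span>')
    if out_of_date:
        caption.append('<span class="out-of-date">No longer current</span>')
    return (f'<figure class="transclusion">{body_html}'
            f'<figcaption>{" ".join(caption)}</figcaption></figure>')
```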
The DIY transclusion approach
While we wait for the web to catch up with Nelson’s ideas (which are now nearly half a century old), several organisations have already put mechanisms in place to allow their content to be transcluded by third parties. Commenting systems are one obvious example. But perhaps the most visible today is Twitter, which provides a simple syntax and script for embedding tweets in any content.
Twitter’s approach demonstrates some basic positive thinking: they define the snippet size you can transclude (the tweet) and provide certain surrounding structure (the source, favourite and retweet counts, and favourite and retweet functions). The syntax is predefined – all you need is the tweet’s identifier.
The downside is that in order to include tweets, you need to load Twitter’s scripts. This is primarily because of browsers’ same-origin restrictions and the embedded retweet functionality. Twitter’s model is not reusable as is: everyone wanting to let others transclude their content needs their own model, their own syntax, and their own scripts.
So long as there are only a few sources that anyone cares to transclude, this can work. But it is not a long-term approach to third-party transclusion.