Akhil Unnikrishnan

On tags and semantic organisation

This started out as a typo-ridden word-salad that I managed to conjure up during the writing session of IndieWeb Club #8 in August. It took me quite a while to go from early draft to this, but here we are.


Tags seem like overkill when you first encounter them - mostly because you don't think you'd ever need them. Once you get into the rhythm of using them though, you start to realise they're a lifesaver. Humans intuitively categorise the world around them, and tags are handy because they are a novel technique of categorising content - books, articles, media, URLs.

The discoverability that tags afford is reliable in a way search engines aren't, with their ever-changing ranking algorithms. Furthermore, the tagged resource is well-defined - most systems that allow tags also offer a way to list entries in a tag, usually via a URL.

While beneficial in these regards, tagging is a double-edged sword - as I've written about before. You risk losing control over things unless you are super-disciplined about how you tag stuff. Karl Voit has a long essay[1] on how tags should be used. The basics are:

  1. Use as few tags as possible.
  2. Limit yourself to a self-defined set of tags.
  3. Tags within your set must not overlap.
  4. By convention, tags are in plural.
  5. Tags are lower-case.
  6. Tags are single words.
  7. Keep tags on a general level.
  8. Omit tags that are obvious.
  9. Use one tag language.
  10. Explain your tags.
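A few of these rules are mechanical enough to check automatically. Here is a minimal sketch, covering only rules 2, 5, and 6. The vocabulary in `ALLOWED_TAGS` and the `check_tag` helper are made up for illustration and are not from Karl's essay; the other rules (overlap, generality, obviousness) need human judgement.

```python
import re

# Hypothetical controlled vocabulary (rule 2). These tags are illustrative only.
ALLOWED_TAGS = {"books", "articles", "bookmarks", "notes"}

# A "single word" here means letters and digits only: no spaces, hyphens, or underscores.
SINGLE_WORD = re.compile(r"^[^\W_]+$")


def check_tag(tag, allowed=ALLOWED_TAGS):
    """Return a list of rule violations for one tag (empty list means it passes)."""
    problems = []
    if tag != tag.lower():
        problems.append("not lower-case")           # rule 5
    if not SINGLE_WORD.match(tag):
        problems.append("not a single word")        # rule 6
    if tag.lower() not in allowed:
        problems.append("not in the defined set")   # rule 2
    return problems


if __name__ == "__main__":
    for candidate in ["books", "Books", "to read"]:
        print(candidate, check_tag(candidate))
```

Running something like this before hitting publish would flag "Books" for its capital letter and "to read" for being two words outside the vocabulary, while "books" passes cleanly.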

For what it's worth, this kind of self-discipline and rigid structure isn't feasible for everyone. If the system works for you, good for you. But the attention to detail and the level of granularity required to keep the system on track would end up being too much work for most people.

Not everything in the world fits neatly into discrete categories, and tags run into the same limitations. This is why Karl's third rule is the easiest to run afoul of. As the collection of tags grows, you end up in a sea of tags, each one more specific than the one before.

At a certain point, you run into the limits of abstraction. Tagging then is no longer about organising content. The content is being abstracted to the point of losing its meaning. Each layer of abstraction pulls you a step away from the original idea. The map is not the territory[2]. Tags cannot convey the full context of the content you are trying to categorise. But by tagging every nuance, you'll end up recreating the entire thing.

The tension between specificity and discoverability leads to yet another difficult situation. Since the possibility of any hyper-specific tag being linked to a lot of content is incredibly low, such niche tags end up being too sparse. Generic tags are better if discoverability is important to you, but then they will end up crowded with just how much content is linked to them.

Writing off tagging as a concept is not optimal either, since discoverability and recall suffer. It also hinders the very rudimentary act of grouping together semantically related content. Hearsay at the IndieWeb Club meetup suggests the existence of systems that convert all of your blog posts into embeddings - which are then used to find semantically similar entries - to populate the little "Related posts" block at the bottom. Why do that when you can use tags to get a reverse-chronological list of all the content that you've deemed to be related?
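To show how little machinery the tag-based version needs, here is a rough sketch. The `Post` type, the `posts_for_tag` function, and the sample posts are all hypothetical; a real blog engine would pull this from its own post metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    title: str
    published: date
    tags: frozenset


def posts_for_tag(posts, tag):
    """Return every post carrying `tag`, newest first."""
    matching = [p for p in posts if tag in p.tags]
    return sorted(matching, key=lambda p: p.published, reverse=True)


# Made-up sample data, purely for illustration.
SAMPLE_POSTS = [
    Post("Older post", date(2023, 1, 10), frozenset({"tags", "notes"})),
    Post("Unrelated post", date(2023, 5, 2), frozenset({"books"})),
    Post("Newer post", date(2023, 8, 20), frozenset({"tags"})),
]

if __name__ == "__main__":
    for post in posts_for_tag(SAMPLE_POSTS, "tags"):
        print(post.published.isoformat(), post.title)
```

No model, no vector store: a filter and a sort. The trade-off is that the list is only as good as the tags you assigned in the first place.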

When personal knowledge is spread over several disparate systems, each with its own paradigm for storing and representing information, a highly rigid tagging system (specifically rules 2, 9, and 10) might end up being useful. However, the huge investment of time that's required to centralise tags, create a personal tagging language, and then seed it across those systems makes it less appealing.

Case in point, all of my notes live in Obsidian, but my to-read pile is on Instapaper, and my annotated bookmarks are in Raindrop. It's a burden for me to keep these systems connected and in sync with each other. There is no common language that I can use across these three apps, since tags are not applied consistently. Creating a central tag system takes time and effort, and I do not want to invest it in a system that I'm not sure I will stick with - looking at you, BuJo.

Even for me though, the lure of a standardised tagging method that can be consistently applied across several apps/platforms is hard to resist. This is especially true since I don't have any tagging system in place! How's that for a twist? The only tags I use within Obsidian are the tags I assign to my blog posts, and those don't follow any set rules - it's all a spur-of-the-moment decision I make before I hit publish. The realisation here is this: I should probably be more mindful of how I tag things. The number of tags and how generic or specific they are do matter, but I should be able to get some good mileage out of having a reasonable, reproducible system that I can use every time to determine which tags are assigned. As always, the moderate path ends up being the solution to a problem that not a lot of people would realistically face, or even be worried about!

I'll probably spend September building up a proper tagging vocabulary and consistent rules. Whether it works or not remains to be seen. Tagging is good, if done in moderation, but the power of tags makes it impossibly difficult to resist. It is in such cases, where you end up hoarding tags and getting hyper-specific, that the anxiety kicks in, discoverability drops to zero, and all the effort goes to waste. Does it sound like I'm spiralling yet?

  1. How to Use Tags

  2. The Map Is Not the Territory | Farnam Street

#Content Curation #Personal Knowledge Management