Using Analytics in Documentation

Lucy Mitchell
20 min read · Mar 2, 2023


Make the numbers thing to do the word thing more gooder.

The Feedback Loop: Happy, Healthy, Shiny Docs

As a technical content strategist, my job covers two main areas. I work as a Technical Writer (write the docs), and also as a Content Strategist (think about the wider content and UX landscape that the docs sit in). And because I come from a product background, I’m just a teensy bit obsessed with whatever we build or write meeting user needs, which means I need to continually ask users about their experience using the docs, and use that feedback.

Because I can’t always get access to people irl, and devs just wanna (have fun?) not be pestered by TWs all day, I use analytics as a rough proxy for, and to augment, user feedback about documentation sets. I think this should be standard in how docs are built and managed. Docs are ecosystems that need to be flexible and have the capacity to change (who wants to build mausoleums of knowledge?) and our users deserve better than “someone wrote something one time, so job done”. I love documentation and I care about doing a “good job”, whatever that means (spoilers: it can mean a lot of things).

People — your users — can tell you lots of things. And once you know more about how they use something, you can make better informed decisions about how to improve it. But when you can’t get people, numbers can still tell you lots of things. The key is to be organised, and also not to get carried away.

Why Did I Write This?

I actually wrote this a couple of years ago but can’t access it any more, so what you’re reading now is a fresh v2. My dad did always tell me that the best essay submissions were the ones you had to re-write from scratch for a solid argument. I’ve been thinking about (and using) analytics in documentation for the past 3+ years, and seen it increasingly discussed, so heck — here’s my 2 cents as of today.

I also found I was having recurrent conversations with new clients about the power of data in documentation and why it’s super empowering and important to treat your docs as a product in their own right. A developer at heart, I wanted to put this info in one single util/post and just point people at it when the conversation next came up. Perhaps someone else will find it a useful precis to share with their boss/team/whomever (well, probably not short enough to be a precis, but you get the idea) which would be nice.

Lastly, there’s the matter of why I’m writing an opinionated blogpost very much in my tone and voice, rather than a nice quick-skim, vanilla-lexiconned superficial tutorial to glean maximum SEO points and be A Textbook Resource Of Note. As of right now (Spring 2023), many pockets of the content world are quietly freaking out about the potential for ChatGPT and its Wikipedia-flavoured copy to, among other things, “replace” content writers of any type. Apart from the obvious rebuttal (“Good! Humans shouldn’t WANT to sit behind a computer their entire lives! Let AI do the toil so we can focus on other things and work LESS!”) I’m not particularly worried about AI replacing someone as weird, salty, and bizarrely-skilled as me. I love reading opinionated blog posts! I will try, however, to adhere to a kind of existential style guide and not write using niche language or use cultural references that are accidentally exclusionary, citing refs wherever possible. If you spot something, tell me.

So, reader discretion is advised. This is a post about how I use analytics in documentation. It’s not written to be a How To, or a You Should, though I’m always very open to having my mind changed or horizons expanded. I believe this area of the field is both nascent and also fluid — approaches that work in certain orgs or businesses might not be suitable for others. This is part of the fun!

Takeaways

What I hope you take away from this blog post:

  1. An understanding of how to leverage quantitative data,
  2. In conjunction with myriad other professional tools in your skillset,
  3. And create/deduce insights about your documentation
  4. That you can use to drive improvements in your documentation
  5. And measurably “prove”* your docs are “good”

(*heavy inverted commas because this will not be a scientific proof but then again what even is that).

My hope is that, by looking at how I’ve been doing it, you’ll be able to do one or both of the following:

  1. “OK cool, I understand how Lucy got from A to B to C, I think I can apply that reasoning in my own work”
  2. “OK cool, I see the reasoning, but I disagree with how Lucy does X. I’m going to use analytics, but do Y instead.”

Either or both of these would be delightful outcomes and I would sincerely love to hear what you get up to. The more people I have to get nerdy about this stuff with, the better.

Analytics != Metrics

Just a quick one: these words aren’t synonyms. Metrics are the data that you collect. Analytics are the insights you get from the numbers. Example:

Metric: Lucy has 15 bikes.

Analytics: Lucy has more than the average number of bikes per person in the UK, and definitely too many to ride at once.

Further business intel, based on the analytics, which is based on the metrics: Lucy probably likes bikes. Lucy is probably an easy person to sell bikes to. Or potentially buy a bike from. More investigation needed. (Which is where your qualitative data (“Lucy, why do you have so many bikes?”) then comes in, and fills a lot of contextual gaps).

Show Me a Dashboard Already

The next section is organised into chunks: “business questions” about documentation usage. This springs from two things:

  1. A personal loathing of pointlessly busy analytics dashboards. When I started prying up the lid of docs sites and trying to learn more about usage and building up analytics insights a few years ago, it drove me a bit bananas. I’d see dashboards from other disciplines or areas that had lots of busy, exciting-looking things on them, but didn’t actually feel useful (aka a really bad signal:noise ratio). Loads of little panels and charts and numbers, but nothing about people. Even worse, the dashboards were difficult to actually understand unless you knew exactly what was going on, and they felt inaccessible.
  2. Being able to share this with people who aren’t TWs. If you’re a big ol’ documentation nerd like me, you won’t necessarily need it organised in this way — the joy of being able to look at the underside of the usage rock and see all the cool bugs of data is enough. But we’ve all had to have that conversation with people who don’t quite see the value of Technical Writing because they’re underinformed, and honestly I thought this might save you a bit of time. Metrics are what Will Larson describes as “an extremely effective way to lead change with little to no organizational authority”. Just get those silly sausages to read this post and use the saved time on something fun instead. My gift to you, xoxo, docs grrl, etc.

I appreciate that I’m splashing about in the kiddy pool of metrics and my many intelligent data scientist/BI pals are gently facepalming/generally humouring me while they’re in the big pool. However, when I’m building a basic docs dashboard that’s designed to be informative and shared internally, I’d ideally like anyone to be able to view it, regardless of role in an org or technical background, and be able to understand what’s going on.

Alright. Here are the chunks I tend to split dashboards into:

Who is using our docs?

I always start with this one. I see it as an important exercise in equitable documentation.

People who have a professional focus on output (rather than outcomes) tend to fixate on “page hits” and “unique visits”. While these are certainly worthwhile metrics to give a ballpark understanding of whether your docs are being used, I really urge you to take the time to understand the “who” before you start thinking about the “how”, because it then helps you really grok the “why” — the key to knowing your audience. Once again, this ties in with product management and building up an audience/user matrix before you start haphazardly building shit.

If you understand people’s motivations, common frustrations, the “absolute must have for docs” or “content type they really rely on”, you are no longer guessing when making improvements. Ideally this is built up through first-hand interaction, stakeholder interviews, and being part of the community and conversation, but it’s possible to glean small details from analytics too. At the end of the day, the focus of this chunk is “Who uses our docs? What can we find out about them to better support them?”.

For this, I usually build sections of the dashboard that answer specific sub-questions:

  1. What timezone are they in? (If we roll out big changes, better to not do it at peak usage time)
  2. What region are they in? (You can start to look into engagement rates on the docs per country, which can help you decide where to focus content and community efforts in future)
  3. Repeat vs unique visits (repeat visits is an interesting one — a high volume here makes me think the product UI isn’t super intuitive so people are relying on the docs for the handhold)
  4. Operating system, machine/device, other hardware details (I want to be able to point to this and say to our developers “46% of our users are on Windows machines, so yes, I do need you to provide the steps for that procedure on more than just Linux”)
  5. X:Y:Z split on desktop:mobile:tablet (Ibid — I want to be able to show actual evidence if I have to say “only 1.8% of our docs users are on mobile and we are under a lot of strain, so right now, we aren’t going to test the docs on mobile devices”)

All of this helps build up an informed, up-to-date set of user personae.
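
If your docs site runs Google Analytics 4, most of these sub-questions map onto a single Data API report. Here’s a minimal sketch of the device/OS split (questions 4 and 5) using the official Node client — the property ID is a placeholder, and you can swap the dimensions for country, hour of day, or new vs returning users to answer the others:

```ts
// Minimal sketch: device/OS breakdown via the GA4 Data API Node client.
// npm install @google-analytics/data
import { BetaAnalyticsDataClient } from '@google-analytics/data';

const client = new BetaAnalyticsDataClient();

async function deviceBreakdown(): Promise<void> {
  const [response] = await client.runReport({
    property: 'properties/123456789', // placeholder GA4 property ID
    dateRanges: [{ startDate: '30daysAgo', endDate: 'today' }],
    dimensions: [{ name: 'deviceCategory' }, { name: 'operatingSystem' }],
    metrics: [{ name: 'activeUsers' }],
  });

  // Each row pairs a device/OS combination with its active-user count.
  for (const row of response.rows ?? []) {
    const dims = (row.dimensionValues ?? []).map((d) => d.value).join(' / ');
    console.log(`${dims}: ${row.metricValues?.[0]?.value} active users`);
  }
}

deviceBreakdown().catch(console.error);
```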

A tangent on language choice, grammar, and syntax: As both a linguist and an anthropologist, it’s always been of note to me that English is the lingua franca of the software programming world. It does follow in a sense that English would also be the predominant language used for software documentation, but that hegemony doesn’t need to be the end of the story.

Steven Roger Fischer’s truly superlative book A History of Language points out that the global expansion of English has resulted “in the creation of International Standard English, the world’s primary language of bilingual speakers. In numbers of first-language speakers, English is second only to Mandarin Chinese. The international growth of English has been unparalleled in world history. With the advent of International Standard English, a veritable world language has nearly been achieved for the first time.” However, Statista predicts that there will be 28.7 million developers worldwide by 2024 and that “much of this growth is expected to occur in China, where the growth rate is between six percent to eight percent heading up to 2023.”

I could write an entire blog post on this topic alone (let’s be honest, I probably will) but for now I’ll just say that I do not believe it is good practice to have your docs written in just English (US or UK) if you know from your data that many of your users are non-native speakers. Perhaps it’s just the behumbled contemporary white Western native English-speaking Technical Writer aiming for a more equitable post-colonial future, but there is absolutely no reason not to put energy into leveraging i18n (internationalisation), aka making your docs translatable. This is easily possible using jazzy plugins/libraries or the i18n tooling built into SSGs like Docusaurus or Hugo, plus community effort. For instance, if you work in an industry where product adoption is high in e.g. LATAM, it might be a very positive move to ensure your docs are also available in Spanish and Portuguese (stretch goal: Dutch for Suriname and French for French Guiana), meaning native English proficiency isn’t a prerequisite for getting going with your company’s software. This is the epitome of “understand who our docs users are, and learn how we can support them better”. This is part of your job in providing “great docs” and docs experiences.
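
To make the “easily possible” claim concrete: here’s a sketch of what switching on i18n looks like in a Docusaurus site, assuming a TypeScript config and English/Spanish/Portuguese locales (Hugo has an equivalent languages block in its site config):

```ts
// docusaurus.config.ts — a sketch of Docusaurus's built-in i18n support.
// Site details are placeholders.
import type { Config } from '@docusaurus/types';

const config: Config = {
  title: 'Example Docs',
  url: 'https://docs.example.com',
  baseUrl: '/',
  i18n: {
    defaultLocale: 'en',
    locales: ['en', 'es', 'pt'], // English, Spanish, Portuguese
  },
};

export default config;
```

From there, running `npx docusaurus write-translations` scaffolds the translation files your community can fill in.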

How are the docs used?

Now we’ve looked at the who, we can look at the how.

This includes:

  1. Time on site (total)
  2. Monthly users
  3. Most visited pages
  4. Least visited pages
  5. Pages with longest and shortest visit duration
  6. Average daily usage (either in mins or visits) and time of day (relative to timezone)
  7. User flow path

From these questions, I can start to piece together little threads, thin narratives, that I can weave into a usage tapestry: a story I can tell to other people in the business, and use to make informed decisions where appropriate. For instance:

  • “These are the top 5 pages with the highest visits, so these are the ones we’ve put in the “Quick Links” or “Popular Resources” section on the docs landing page.”
  • “These are the top 5 pages with the highest visits, so these are the ones we’ve made it a priority to ensure each month are definitely up to date.”
  • “These are the 5 pages with the fewest visits (sometimes none at all, for months and months) which we used as a prompt to check the inbound links, whether the feature was still live, and whether the detail is still accurate.”
  • “The user journey to get to the Storage API page seems straightforward. People land on the docs homepage, and go straight to the Storage API page using the “Quick links” section. Once there, they spend about 10 minutes on the site, then exit. In comparison, the Calls API page seems a lot more difficult to find. People tend to visit an average of 8 pages before landing on it, as it’s not in the “Quick links” section.”
  • “Comparing month-on-month, the visits for the “Tutorial to do X” page have grown a ton — let’s use that as a prompt to see if there’s scope for additional, similar tutorials, or whether that one can be revamped/improved.”
  • “X page has a dramatically longer visit duration than other, similar pages. Let’s use that as a prompt to review whether the information is confusing, difficult to parse, or inaccurate.”
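
None of this needs fancy tooling, by the way. Once you can export page-level metrics from your analytics platform, the ranking and flagging is a few lines of code — here’s a sketch, assuming you’ve pulled rows shaped like {path, visits, avgSeconds} from whatever backend you use:

```ts
// Sketch: rank docs pages by visits and flag unusual visit durations.
// Assumes rows exported from your analytics backend in this shape.
interface PageStats {
  path: string;
  visits: number;
  avgSeconds: number; // average visit duration on the page
}

function usageReport(pages: PageStats[]): void {
  const byVisits = [...pages].sort((a, b) => b.visits - a.visits);

  console.log('Top 5 pages (candidates for "Quick Links", keep these fresh):');
  byVisits.slice(0, 5).forEach((p) => console.log(`  ${p.path}: ${p.visits} visits`));

  console.log('Bottom 5 pages (check inbound links, feature status, accuracy):');
  byVisits.slice(-5).forEach((p) => console.log(`  ${p.path}: ${p.visits} visits`));

  // Pages people linger on far longer than average are a prompt to review
  // for confusing, hard-to-parse, or inaccurate content.
  const mean = pages.reduce((sum, p) => sum + p.avgSeconds, 0) / pages.length;
  pages
    .filter((p) => p.avgSeconds > mean * 2)
    .forEach((p) => console.log(`Unusually long visits: ${p.path} (${p.avgSeconds}s)`));
}
```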

Are people able to find what they need?

This is arguably one of the central documentation “success metrics” on a philosophical level (more on success metrics in “Understanding Docs as Product”). Occasionally I do use the following as a very basic proxy:

  • # clicks to access popular pages — if this is low, I’m happy. It shouldn’t take 7 clicks to get to something when people want to access it in 1.
  • Scroll completion on popular pages. This helps me understand: Does the content even get read? People don’t read docs like they read books, cover to cover — usage is idiosyncratic, users often dart around, they’ll search, they’ll use Find On Page for keywords. They often combine “reading your content” with “understanding the structure of your docs” simultaneously, which can lead to a visually frenetic usage pattern. If your pages are split into modular chunks, scroll completion can be useful. Heck, might even tell you if you have a rubbish page that infuriates users in the first few sentences and they rage quit the tab. You gotta know.
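
If your analytics tool doesn’t give you scroll depth out of the box (GA4’s enhanced measurement, for instance, only fires a scroll event at 90%), a small client-side script can report the milestones you care about. A sketch — the /analytics/scroll collection endpoint is hypothetical, so point it at whatever you actually use:

```ts
// Sketch: report scroll-completion milestones on long docs pages.
// The /analytics/scroll endpoint is hypothetical — point it at your collector.
const thresholds = [0.25, 0.5, 0.75, 1.0];
const reached = new Set<number>();

function reportScrollDepth(): void {
  const doc = document.documentElement;
  const depth = (doc.scrollTop + doc.clientHeight) / doc.scrollHeight;

  for (const t of thresholds) {
    if (depth >= t && !reached.has(t)) {
      reached.add(t); // fire each milestone once per page view
      // sendBeacon survives tab closes, so rage-quits still get counted.
      navigator.sendBeacon(
        '/analytics/scroll',
        JSON.stringify({ page: location.pathname, depth: t })
      );
    }
  }
}

window.addEventListener('scroll', reportScrollDepth, { passive: true });
```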

Documentation quality

This one is dealer’s choice, subjective per user, and easier to understand (imo) qualitatively, because a lot of things that people associate with knowledge quality can’t be measured easily with numbers (accuracy, completeness, usability, clarity, consistency — again, read more in depth in Docs as Product). I like to look at:

  • How reliable is the hosting of our docs? (I’m looking for recorded 404 events or other 4xx or 5xx errors)

And some things you can get from GitHub Insights:

  • What is the update frequency? (Broadly speaking, I find updates “small and often”, DevOps style, to be a proxy for better documentation than docs sets that are consistently neglected — unless, of course, your product doesn’t have a lot of updates)
  • How many contributors are there? (I won’t pretend this is a simple one — having multiple contributors can go either way, depending on a lot of other factors)
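
If you want those GitHub Insights numbers on the same dashboard as everything else, they’re available over the REST API too. A sketch, with your-org/your-docs as a placeholder (both calls cap at 100 results per page, which is fine as a health signal; paginate if you need exact counts):

```ts
// Sketch: docs repo update frequency and contributor count via the GitHub REST API.
const repo = 'your-org/your-docs'; // placeholder repo

async function docsRepoHealth(): Promise<void> {
  // Commits in the last 90 days as a rough "small and often" signal.
  const since = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString();
  const commits = await fetch(
    `https://api.github.com/repos/${repo}/commits?since=${since}&per_page=100`
  ).then((r) => r.json());

  const contributors = await fetch(
    `https://api.github.com/repos/${repo}/contributors?per_page=100`
  ).then((r) => r.json());

  console.log(`${commits.length} commits in the last 90 days`);
  console.log(`${contributors.length} contributors`);
}

docsRepoHealth().catch(console.error);
```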

When thinking about “quality”, please don’t just take “Grammarly-style line editing” as your sole gold standard. Try not to be salty with contributors who don’t automatically adhere to your style guide. For one thing, there are linters for that (special mention to Woke) — another thing you can automatically see metrics on! For another, while inconsistencies in grammar and syntax can impact a reader’s trust in your docs, remember language is all in flux (and literally all made up). Fischer again with the mic drop: “Samuel Johnson, who attempted in the 18th century to write the first ‘complete’ dictionary of English, declared his goal was to ‘redefine our language to grammatical purity and to clear it from colloquial barbarisms’. Johnson was of course doomed from the start, since there is no such thing as a ‘pure’ language. For English in particular, of the 10,000 most frequent words, only 31.8% are inherited Germanic, with the remainder consisting of 45% French, 16.7% Latin and several minor contributing languages.” Personally, I would consider “ease of contribution process”, “assistance and support from the TW team if required”, and “automation of key CI/CD processes” to be a better indicator of that multifaceted gemstone, “quality”.

Why Bother, Though?

OK, you’ve stuck with me so far. It takes a bit of effort to set up. Why should you consider it?

It makes it easier to make smart decisions at a business level

Looking agnostically at analytics for a moment, I like the way Commsor look at the benefits:

  1. See how your existing efforts are performing
  2. Identify where there’s room for improvement
  3. Evaluate whether the current impact aligns with business goals

Kumar Dhanagopal, Technical Writer and Staff Solutions Writer at Google, also gave a great talk at API The Docs recently on analytics. Though analytics on their own can’t provide all the needed information, that knowledge “can help to improve the quality of decisions that a company makes when they plan to write and maintain documentation.” So while analytics don’t tell us how users feel about or experience the docs, they can at least show us how users interact with the docs, so we can make better-informed decisions and prioritise future docs additions.

There’s also a potentially wider implication, which I’m going to shamelessly lift from Accelerate, and apply to this conversation:

“By helping the industry identify and understand the capabilities that actually drive performance improvements in a statistically meaningful way — more than just anecdote, and beyond the experiences of one or a few teams — we can help the industry improve.”

Wouldn’t that be nice?

(This is also a great place to signal boost the communities of both Write The Docs and API The Docs, whose regular sessions I highly recommend.)

It matters to your users

Jeff Lawson said that validating design decisions with users “can be the difference between a successful and unsuccessful product launch.”

As aforementioned, if you’ve read my post on understanding documentation as product, you’ll know I feel strongly about writing documentation that meets user needs. So how do you know what your users need? Users aren’t homogenous: they vary between companies, communities, products, product awareness or utilisation, technical (or other) experience, motivation. You need to have some kind of data-driven, not-just-your-opinion insight into who they are, what’s important to them, and how your docs can best serve them. If you want to have good docs, you need to know (not guess) that you’re building the right knowledge centre for them.

It makes your docs better

Using analytics is a fairly passive way to try and get a more representative sample. As Lawson points out, while many companies have NPS surveys to tell us the cumulative effect of our actions on customers, not many deploy NPS for docs, and besides — people who submit NPS feedback often have strong feelings (of either variety) which might not be representative of the “typical” customer.

It’s also a way of trying to keep docs “good” using any data you can get your hands on. This can be helpful given that docs are a good way to bridge the gap between the “development tenure half-life” of 3.1 years and the “code half-life” of 13 years (data from the Sixty North blog as quoted in Living Documentation).

It matters to you (if you’re like me)

Using analytics is a sometime stand-in for talking to your users. In a way, it’s an automation that reduces toil — I love the Docs for Developers definition here: toil isn’t just “work you don’t like to do”; toil has a specific definition in the world of software engineering: “Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows” (originally from Site Reliability Engineering: How Google Runs Production Systems). This automation can be the key to unlocking genuinely new and exciting insights, which hugely motivates me professionally. Ask Your Developer makes a great point about how different this opportunity is to how humans are generally brought up, ’scuse another longish quote:

“A big difference between an open, learning environment at work and the one we experienced in elementary school is that in school, the teacher knows the answers but shows students how to do the work to arrive at the answer on their own. In business, especially when you’re working on the cutting edge of technology, you’re not looking for an answer that someone else already knows. The business and its employees must find answers to questions that have not been asked before. But an open, learning environment provides the way to find those elusive answers.”

I love being able to learn new things from what the data tells me.

I am happy to yolo decisions — I believe in the “lol what happens if I press this button” school of learning — but I also really like doing a bit of investigation and making decisions I believe in. I like “Doing A Good Job” whatever that means at the time or in context. Writing good docs, having quality docs, maintaining healthy docs — these are all part of a broader holistic thing for me. Yeah, I like consistent syntax and grammar. It’s neat. But I also like knowing the user experience is positive for users, and I like being open to changing it up. This isn’t all that surprising, as I studied a lot (a loooot) of Chomsky at undergraduate level and (another Fischer ref) Chomsky believes “that linguistics, psychology and philosophy are no longer to be held as separate disciplines but comprise a unitary system of human thinking that should be understood as a larger whole.” Analytics is one part of the beautiful, interesting, interdisciplinary constellation of something I’m delighted to get paid actual money for as a profession.

Caveats: A Cautionary Tale in 4 Parts

Everything in moderation, including analytics

I remember first coming across analytics usage somewhere that seemed (at the time) exotic: healthcare. When I was doing my MSc in stroke rehabilitation and becoming increasingly interested in the present and future of digital public health, I read Robert Wachter’s book The Digital Doctor. He politely and carefully hammered it home that:

“It will take great discipline and all the professionalism we can muster to remember, in a healthcare world now bathed in digital data, that we are taking care of human beings. The iPatient can be useful as a way of representing a set of facts and problems, and big data can help us analyse them and better appreciate our choices. But ultimately, only the real patient counts, and only the real patient is worthy of our full attention.”

Replace “patient” with “user”, and once again (this must be getting boring by now) you’ve got the central tenet of product management. We need to ensure we’re looking at (or at least thinking about) the holistic person behind that user interface, rather than making reductive inferences based on scraps of data, and weaving together a narrative where we can. Stories are, as Brené Brown says, just data with a soul. I really like thinking about the borderline-Cartesian “data with a soul” idea because it reminds me of the power of narrative in knowledge and data. According to Trenton Moss, a Stanford University research study “showed that statistics alone have a retention rate of 5–10%, but when coupled with anecdotes, the retention rate rises to 65–70%. That’s an increase of up to fourteen times,” and research from cognitive psychologist Jerome Bruner says we’re twenty-two times more likely to remember a fact if it’s told in the form of a story. I’m really keen not to reduce people down to numbers, partly because it’s so much more interesting and memorable to try and find the bigger picture.

Be wary, though, dear reader — too much focus on the complexities of the human psyche when interacting with docs analytics can have consequences just as dire. See Christian and Griffiths’ “Algorithms to Live By”:

“When we need to make sense of, say, national health care reform — a vast apparatus too complex to be readily understood — our political leaders typically offer us two things: cherry-picked personal anecdotes and aggregative summary statistics. The anecdotes, of course, are rich and vivid, but they’re unrepresentative. Almost any piece of legislation, no matter how enlightened or misguided, will leave someone better off and someone worse off, so carefully selected stories don’t offer any perspective on broader patterns. Aggregative statistics, on the other hand, are the reverse: comprehensive but thin. […] A statistic can only tell us part of the story, obscuring any underlying heterogeneity. And often we don’t even know which statistic we need.”

Don’t go nuts with the dashboards, ok

More dashboards (or more panels in your single docs dashboard) does not necessarily mean more meaningful insight. As Michael Bhaskar, maestro of curation, tells us: “this is where the value of curation starts to become clear. In a world of too much data, having the right data is valuable. In a world where we don’t have any time, choosing the right thing to do is valuable. […] The Long Boom means there is more of everything, whether data, debt or doughnuts. It doesn’t mean life is easier or better. In an overloaded world, the locus of value is shifting.”

Don’t capture what you don’t need, or won’t use. Not only is this a waste of your time and dashboard space, I reckon it probably infringes on things like GDPR. In Ethical Business Practice and Regulation, they cover how the Information Commissioner lambasted DeepMind — “there’s no doubt the huge potential that creative use of data could have on patient care and clinical improvements, but the price of innovation does not need to be the erosion of fundamental privacy rights.”

Before you begin: Proper Planning Prevents P***-Poor Performance

If you’re going to implement your own dashboards and process, remember to plan what you’re going to measure. I’ll lean on Docs for Developers again here, as they do a better job of covering a higher-level strategy than I’ve covered here.

To create an effective analytics strategy, clearly define the following:

  1. Your organization’s goals and how they’re measured
  2. Your reader’s goals and how they’re measured
  3. Your documentation goals and how they’re measured

Don’t take it too seriously. You could well be wrong

If there’s one thing I took from Edward Snowden’s book Permanent Record, it’s the scope and power of what analytics can tell you: “in sum, metadata can tell your surveillant virtually everything they’d ever want or need to know about you, except what’s actually going on inside your head.”

I might be completely wrong about all of this (or, more dangerously, partly wrong, and no one can agree on what… but such is the joy of an opinion piece). There’s also a lot to be said for GIGO: Garbage In, Garbage Out, even with a jazzy dashboard. Be careful to have clean data wherever you’re making inferences from it. This is, coincidentally, why the Royal Society (one of the earliest scientific academies, founded in 1660 in London) took as its motto “Nullius in verba”, which means something like “Take no one’s word for it”, and why Hawking said that the greatest enemy of knowledge is not ignorance, but the illusion of knowledge.

Be careful not to fall foul of the “Ladder of Inference”:

The Ladder of Inference is an excellent tool for helping people distinguish what they think from the larger reality around them. It shows how people select certain data out of an almost infinite pool of available data, make assumptions and draw conclusions based on the data they select, make recommendations and take action based on these conclusions, and then look for new data to reinforce their original assumptions.

Conclusions

I can, however, temper these remarks to end on a positive by using a classic Deutschism: don’t downplay or stigmatise your thoughts and hunches too much:

“The real source of our [human] theories is conjecture, and the real source of our knowledge is conjecture alternating with criticism. We create theories by rearranging, combining, altering and adding to existing ideas with the intention of improving upon them. The role of experimentation and observation is to choose between existing theories, not to be the source of new ones.”

I’ll leave you with something really close to my heart:

Star Trek Voyager S06E09

Thanks for reading. I really appreciate it.


Lucy Mitchell

Technical Writer. Former NHS OT and software developer in health tech. I like bikes and plants. www.ioreka.dev