How HCI Might Engage with the Easy Access to Statistical Likelihoods of Things


“Everything that needs to be said has already been said. But since no one was listening, everything must be said again.” —André Gide

Together We Are Likely

Context and Creativity

Broken Intuition and Broken Metaphors

The Role of Human-Computer Interaction Research

tl;dr

unintuitive statistical likelihoods of language and vision are now readily available via API; HCI has a vital role in understanding and scaffolding human interaction where our intuitions fail
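To make “readily available via API” concrete: here is a minimal sketch of asking a model for the statistical likelihoods behind its next token, assuming the OpenAI Python SDK (the model name and prompt are illustrative placeholders, not recommendations):

```python
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for the top candidate next tokens and their log-probabilities.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any logprobs-capable model works
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
)

# Plain probabilities over language, one token at a time.
for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    print(f"{candidate.token!r}: {math.exp(candidate.logprob):.3f}")
```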

Together We Are Likely

chatGPT didn’t write this, because if it had I wouldn’t have had to write it. I would have missed out on the iteration, reflection, and wrestling with ideas that writing this manually required me to do over the past six months. I began writing this in November 2022, several decades ago in LLM years.

Lest I set your expectations too high, let me assure you that even all these months of human wrestling still produce prose that is likely. We share too much context, and too many equally capable people are wrestling with the same questions. Oh to be Darwin or Mendel, happening alone together upon a similar context that would lead them down paths both statistically likely given their priors and also revolutionary.

My hope is thus only that this post is slightly less likely than one written by chatGPT trained on data that is two years old. Soon enough, it will be gobbled up into the training data and chatGPT will assimilate whatever iteration on cleverness this post contains, steadily marching along with humanity into our statistically likely future.

Speaking of several years ago… in 2016, I tweeted:
[Tweet snapshot from @jeffbigham: “despite all my rage, I am still just a statistically probable sequence of observable online behaviors”]

I thought many of my tweets (toots, now) were clever, but from the macroview of Twitter I was tapping out statistically likely characters, just like everyone else.

Ten years ago, Target used the statistical likelihoods computed over its massive customer data to correctly predict that a teenager was pregnant before she had told anyone, using a “pregnancy prediction score” cobbled together from statistical trends on data like how much unscented lotion a person was buying. It was shocking at the time, in large part because it was surprising that it was possible, and perhaps separately surprising that Target didn’t stop to ask what bad things could happen if they predicted stuff like this.
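Target never published its model, but scores like this are usually mundane under the hood: a weighted combination of a handful of features. A hypothetical sketch, with feature names and weights invented for illustration:

```python
# Hypothetical "pregnancy prediction score": a weighted sum over a few
# describable purchase features. Target's actual model was never published;
# these features and weights are invented for illustration.
WEIGHTS = {
    "unscented_lotion": 0.9,
    "cotton_balls_large_bag": 0.5,
    "calcium_magnesium_zinc": 0.7,
}

def pregnancy_score(purchases: dict) -> float:
    """One prediction from a handful of inputs a human can inspect and debate."""
    return sum(w * purchases.get(item, 0) for item, w in WEIGHTS.items())

print(pregnancy_score({"unscented_lotion": 2, "calcium_magnesium_zinc": 1}))  # 2.5
```

The point is how legible this is: a few describable inputs, one number out. Keep that legibility in mind for the contrast below.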

We are all statistically probable, even Target’s data science team. If they were implicitly predicting pregnancies to target ads without thinking too much about what could go wrong, then so too were lots and lots of others, predicting things in whatever domains they happened to work in. If a single prediction was surprising and its fallout unintuitive to many, imagine what happens when you train on an unfathomably large corpus of natural language text, which is so core to what we think of as human intelligence, and don’t just make one prediction for a few queries, but rather predict thousands of tokens based on hundreds of words of input.

Human intuition completely breaks down.

I think that is why experts and novices alike can’t quite wrap their heads around what is going on. Each statistically likely token boggles our minds, and there are hundreds and thousands and millions of them, produced reliably in sequence in response to whatever language we give as input. Is it so surprising that even the experts can’t agree whether this is AGI or exponentially-scaled babble, the end of humanity or merely a huge disruption to information work? And, maybe not even that!

Predictably, those arguing for something in the middle don’t get much attention.

If you cross all of this with a healthy dose of humans also being enormously complex and nobody agreeing on what intelligence is, I think you get close to understanding the current state of things. But, this intersection of poorly understood technology and deeply intertwined human interactions is where I think HCI has an enormous opportunity, if we are able to take it on and lead.


Context and Creativity

[Image: the book “Steal Like an Artist” by Austin Kleon]

If all these models are doing is producing statistically probable language, why do they sometimes seem creative? I think it’s a combination of humans driving and curating AI outputs, but also another example of our intuition about creative production breaking down with large data. My intuition is no better on this than anyone else’s, so I’ll give the caveat that maybe there really is some emergent phenomenon happening.

Austin Kleon (who happens to be my cousin, what are the chances?!) has a great series of books starting with “Steal Like an Artist”, which has the premise (my version) that oftentimes folks are so obsessed with being completely new that they fail to realize that all human creativity is a collective iteration. We all “steal” from each other; nothing is “new” even when it is new; embrace it.

At the beginning of the pandemic, I made the video below, which got a few thousand retweets. As it spread, at least one person complained that I had copied the idea from someone else.

with @zoom_us 's new video background feature, i can stay engaged and nodding thoughtfully, even when i'm not even in the room -- this is the kind of innovation we need! pic.twitter.com/J4prwOEjxp

— Jeff Bigham (@jeffbigham) March 22, 2020

I don’t mean to argue that people are stochastic parrots, or that we shouldn’t fairly compensate the artists that large models learn from (and arguably steal from). But, even as we exercise our agency, we are probable. Distinguishing parrot from probable is difficult and maybe ill-defined, and the difficulty in doing so may also be at the crux of why our intuition breaks down when confronted with large language models.

A couple of years ago, I was surprised to learn that almost everyone I knew was watching reruns of The Office when I was also watching reruns of The Office. But, if I am doing something, it is very likely that others are doing that same thing because of our increasingly shared context (likely, a confluence of Netflix getting the rights to the show and heavily promoting it, and The Office memes simultaneously spiking on social media).

People who are surprised that someone else is also watching reruns of The Office on Netflix, despite it being the statistically likely thing given the hidden context we share, will definitely be surprised by an LLM that writes a convincing holiday poem in the style of an Office script.

Stochastic parrots in isolation aren’t so intelligent, but when the parrots are integrated into humanity’s distributed cognition they might be a surprising multiplier. Or not.

Broken Intuition and Broken Metaphors

An LLM can create a holiday poem in the style of The Office for essentially the same reasons Target was able to predict that teenager’s pregnancy, and for the same reason we were all rewatching The Office (there are lots of holiday poems and writings about The Office in the LLM’s training corpus). Each token output is statistically likely given the prompt and the tokens preceding it. Or, maybe that doesn’t entirely capture it; experts don’t exactly agree. Yet, this is much harder to grok because it’s not just one prediction dependent on a small number of describable input parameters, it’s hundreds of predictions back to back dependent on inscrutable inputs represented in natural language.
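Here is a minimal sketch of that cascade, using Hugging Face transformers with GPT-2 (chosen only because it is small and public; production systems differ enormously in scale and in the human shaping discussed below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A holiday poem in the style of The Office:"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(50):  # hundreds or thousands of steps in real use
        logits = model(ids).logits[0, -1]      # a score for every possible next token
        probs = torch.softmax(logits, dim=-1)  # ...turned into a probability distribution
        next_id = torch.multinomial(probs, 1)  # sample one statistically likely token
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)  # condition on all of it

print(tokenizer.decode(ids[0]))
```

Swap torch.multinomial for an argmax and you get greedy decoding; either way, the loop is just repeated prediction.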

These prediction cascades, where many predictions and statistical likelihoods are brought together token-by-token many times in sequence, don’t even seem like predictions. The size of the data completely escapes even the experts’ intuition, and we are not accustomed to seeing machines produce plausible language of this length, fluency, and complexity. In the months since I began writing this, the idea that machines can do this seems much less novel, and perhaps we are starting toward disillusionment as the discrepancy between what we imagined when we first saw it and the reality we’ve seen since sinks in.

The best models are also massively fine-tuned and shaped by humans, at scales and with methods that are purposefully kept opaque. Human input is the secret that led Google to originally win search and maps, even as competitors searched for better algorithms. If OpenAI talks openly about instruction tuning and RLHF, I’d be intrigued to learn about all the human work they’re not talking about. How far can a statistically scaffolded massive human labeling enterprise get in mimicking intelligence? More than that might be happening, but I suspect it can get (and has gotten) pretty far.

Data at the scale LLMs are trained on is so mysterious, we don’t have metaphors for describing how that data is transformed or not into the model’s outputs. Is it copying its training data or learning from it? Kind of both, and also kind of neither. It seems as though at least some of the amazing power of image diffusion models may come from finding and very slightly modifying images in the training data, but copying alone doesn’t capture the fullness of what is happening. The models are so complex it’s hard to convincingly argue against the idea that there are some emergent properties beyond what we might think of as just next token prediction (it’s also hard to convincingly argue against the idea that it’s just next token prediction at a scale we haven’t seen before).

Our intuition and metaphors have fallen short, which makes all of this both magnificent and scary. Lacking effective metaphors, we reach for broken ones (e.g., LLMs are sentient, LLMs are AI hype, LLMs are evil, LLMs have emergent intelligence). We’re so eager to intellectually frame these things that we too readily disregard the obvious problems with each of these framings.

What’s the right metaphor for massive statistics over language, constantly massaged by humans, interpreted by humans, iterated on by humans, and curated by humans into something that is legitimately, impressively different from what we’ve seen before, even while limited in predictable and unpredictable ways? Predictably, I don’t know either.

The Role of Human-Computer Interaction Research

I look at this from the perspective of what human-computer interaction can contribute. How can we help humans thrive in a world in which there is easy access to unintuitive statistically likely things?

I bucket the opportunities into four: Benefit, Understand, Protect, and Thrive.

Humans have never before had such easy access to statistical likelihoods of things, and interestingly the least predictable part of all of this is how it might upend the many things that humans do and care about (or not). My personal bet is that this all will be both more transformative than we think, and also less. In ten years, I expect we’ll be using computer interfaces that would be familiar to someone in 2023, yet many of the interactions that don’t quite work fluidly today will work much better – LLMs and the like will be the lubricant that removes friction, and we’ll get to spend less time fighting with our computers and more time on creative human production.

I believe most of the directions above hold even if we somehow qualitatively move beyond statistical likelihoods and closer to something like AGI … we interact with intelligent humans all the time, and yet nobody would describe that as simple, let alone solved. We’ll still need innovations in HCI to allow humans to benefit, understand, protect and thrive. In a world in which we’re interacting regularly with “AGI”, we’ll need even more innovation in HCI.


Yet, futures in which humans and human concerns are at the center are not guaranteed – we will need to make them happen! HCI as a field is core to not only understanding but shaping how this develops, because we are the only field positioned at the intersection of what matters – our work understands people and technology, designs futures that reimagine how people and technology will work together, critically reshapes narratives around benefit and harm, and ultimately builds interactions that enable people to thrive.

chatGPT felt almost like the end when it first came out – “we’ve done it!” But we haven’t done it at all. We’re still very much at the beginning of what interaction will look like in the era of easy access to statistical likelihoods. HCI needs to lead the way.

