Mozilla Hubs, Spatial Computing

On Asynchronous Communication & the Value of Chat

The concept of ‘time’ as a resource has come up in several of my conversations recently, and one discussion in particular led me to think more deeply about time’s role in communication. Last year, I gave a talk at SkillsMatter exploring social technologies and online communication. You can break down technological communication services along a couple of different axes, but today I’m going to write a bit more about one of them: (a)synchronicity.

In particular, I’ll explore the role that time plays in how we communicate, and the ways that technology, consciously or unconsciously, changes the nature of those conversations. To take that a step further, I’ll also start to address why it matters for mixed reality experiences.

Let’s start by defining our spectrum. It may be easy to think about synchronicity as a binary – something is either “happening at the same time” or “not”. In practice, the amount of time that we’re waiting for an asynchronous event to fire plays a huge role in defining the context of how we receive a message, and, therefore, the message itself. 

Say I get a text message from my partner, telling me that they’re about to get in line at the grocery store. Because it’s a text, I may or may not see it right away. We live about five minutes from the grocery store, so my partner could easily get home before I see the message, especially if my focus is elsewhere (for example, if I’m in the shower).

Time is an important part of that interaction. If I need something from the store and I see that message immediately after it’s sent, I can ask my partner to pick it up (if I really need it, I might even call them, opting for a synchronous form of communication). But if my partner returns home before I see the message, then that message is essentially useless information to me.

There is value in time, including in the delay between sending information and another person receiving it. It’s interesting to note, then, that technology seems to be aggressively nudging us toward constant synchronicity: always on a platform, ready to receive and respond in the blink of an eye.

To be clear, synchronicity is not bad! But a balance is necessary. Our brains did not evolve for incessant communication threads that leave us no time to sit and think our own thoughts. Yet if it were left to companies like Facebook, that is exactly how we would live: push notifications enabled for every message, every ‘like’, every comment, and every time a friend shared something.

So, back to this spectrum of time. Some forms of communication these days expect synchronicity. Perhaps your work day feels increasingly busy, but with less substance, because you spend so much time responding on Slack, whose design encourages synchronicity. Or maybe your company has a culture where email is preferred, which tends to be a bit slower than platforms like Slack, Microsoft Teams, and Discord. It’s weird when you ping-pong emails back and forth as if it were an instant messaging (IM) platform, right?

In the past, the synchronous vs. asynchronous divide was easier to see. You could signal your availability for synchronous chatting with an IM client, or you could choose to give yourself and your intended audience the gift of time to respond by using email. You were generally choosing to be present, whether on the phone or at the computer. It wasn’t the constant presence that we have today with our cell phones, which can alert us to everything under the sun that we’ve signed up to be notified about.

Coming back to balance: I think we’ve gone too far toward synchronicity. When we are communicating synchronously with the various parties in our lives every moment of the day, we lose time, especially when these behaviors are subconscious and lack intentionality.

Right now, mixed reality is in an interesting spot. It is a tool for connecting us in a more present, synchronous way, but the hardware is isolating and still just complicated enough that you know you are investing time in the experience when you put on the HMD. That’s the problem with virtual world technologies: for the time being, at least, there isn’t a way to find a happy medium between purely synchronous interaction and longer stretches of time between interactions. The cognitive load of switching between being in a physical environment and trying to be (at least partially) present in a virtual one is overwhelming, even when that virtual “environment” is a stack of camera feeds. Most mixed reality platforms don’t address this middle ground, so hybrid platforms such as Discord have emerged to take on the bulk of the “in-between” communication for virtual worlds.

I could write for a while on this topic, but I want to bring it back to a very particular type of communication: the chat stream. Specifically, I want to shed some light on the often underutilized functionality of text chat in online platforms. I’ll start with a short story.

Once upon a time, I worked at a virtual reality startup. This company’s platform allowed you to communicate as an avatar via voice chat and explore magical worlds created by users. However, the platform did not offer text chat as a native function of the application. There was no way to chat between different virtual worlds; the chat apps that did exist were user-created and subject to the platform’s ever-changing APIs; and, most fundamentally, the people in charge just didn’t believe text chat was a necessary feature for a virtual world, so it wasn’t built.

Yes, text chat inside of VR spaces is hard. It’s probably not going to be a preferred input method for quite some time, but writing it off entirely throws away a rich, complementary form of communication. Text chat is also a key accessibility feature, and for many people it is by far the more comfortable way to engage with a group virtually.

Recently, I’ve been on a lot of video calls. I’m guessing many of you have, too. While some of this likely comes down to the intentionality of the calls and the hosts’ experience moderating virtual meetings, I’ve observed that calls where the chat is actively monitored and treated as an equitable form of participation are more nuanced, more diverse in their participation, and more open. Twitch, an online game streaming platform, takes this practice to an extreme, essentially forcing an asymmetric, slightly asynchronous flow of communication between the streamer and their audience. And it works.

Coming off a Zoom call where chat wasn’t treated equitably, I started thinking about why Hubs, the platform I work on at Mozilla, feels more equitable in chat by default, even when moderation isn’t an explicit goal. I’ve come up with a few hypotheses, which I’m interested in exploring and understanding better:

  • Chat messages in Hubs are transient and visible by default. This is different from other video platforms, where chat is hidden by default and the history remains visible throughout the call. Because messages in Hubs are visible to everyone in the room, in the same place where you are viewing the rest of the content, perhaps they feel more “included” in what you’re experiencing at a given moment.
  • Spatial conversations seem to naturally facilitate more equitable conversations with less structure (or perhaps that just reflects how, and with whom, I use them). That said, there seems to be an interesting pattern where spatial communication offers more flexibility in who is speaking, so participants are more primed to expect that someone else will “speak”. In Hubs, this shows up in the way the person who is currently speaking will often read a chat message out loud, which naturally gives the sender’s idea a hearing with the group and invites them to step in at the next natural break.
  • The mental space freed up by sharing an environment with other conversationalists seems to open up a part of the brain that is better able to manage multiple streams and types of conversation. On video calls, there’s a massive amount of subconscious processing happening as we try to decipher the facial and physical cues of everyone on video, as well as the audio being spoken. In avatar-based spatial chat applications, the lack of detail and facial expression can be an equalizer, reducing the cognitive processing required and making room for other forms of shared knowledge or explicit content.

I don’t think that all of these things are unique to spatial communication that includes chat; rather, they seem to be a curious byproduct of mixing and matching technological solutions for remote conversations. As today’s climate demonstrates, it is only going to become more important that we as an industry think about collaborative and communicative technologies less in terms of their underlying features and more in terms of the nuances of different types of communication and their purposes. Within that, I would hope to see more focus on encouraging innovation in the asynchronous and asymmetric areas of communication platforms, because that space and time gives us the freedom to truly think and respond, rather than react.