Abstract
Disability communities have enacted McLuhan’s media theories for decades, revealing communication as inherently multimodal, mediated, and distributed. Through assistive technologies that routinely become universal infrastructure, the essay shows how disability both confirms and unsettles McLuhan’s framework, and why the future of media is already being prototyped at the margins.
Keywords
Human-Centered AI, Embodied Technology, Multimodal AI, Ethical AI, Accessibility

In July 2024, Representative Jennifer Wexton stood at the lectern of the U.S. House of Representatives, holding a tablet. When she tapped the screen, her voice filled the Chamber. It was a voice assembled from years of archived recordings, algorithmically stitched together and synthesized.
It was the first AI-generated voice ever to address the House floor. In an institution that has spent centuries refining the art of oratory, perfecting how an argument is shaped, projected, and carried to the back bench, Wexton’s moment redefined what authoritative speech could be.
Her voice had no single point of origin. It existed everywhere at once: in old C-SPAN footage, in the tablet’s processor, in the sound waves moving through air that had once carried Webster and Clay and Obama. Fully present, yet nowhere fixed.
Marshall McLuhan’s concept of acoustic space was coming to life on the House floor: a demonstration of a space without fixed boundaries and of the power of form over content. Wexton’s experience crystallized this paradox: a mediated environment that complicated McLuhan’s theories. The ‘hot’ and ‘cool’ media binary seemed tempered by accessibility technologies, while the notion of amputation-through-extension faltered when considering users without prior ability.
“We look at the present through a rear-view mirror… We march backwards into the future.” — Marshall McLuhan, The Medium is the Massage (1967)
From the outside, the Wexton moment might look like a milestone for multimodal AI, one of Big Tech’s new product categories and a trend of the year. But for the thousands of people who rely daily on Augmentative and Alternative Communication (AAC) devices, it was nothing new. AAC users have spent decades assembling voices from visual symbols, predictive text, and synthesized speech, orchestrating their presence across distributed components: eye gaze on a screen, fingers on a touchpad, voice emerging from a speaker. The House had finally caught up to a future disability communities were already inhabiting.
McLuhan’s rear-view mirror metaphor captures how we are always slightly behind ourselves: early films looked like recorded theater, early websites resembled digital magazines. But he failed to understand that, sometimes, the people we think are behind us are actually far ahead. The gap between calling Wexton’s moment “futuristic” and recognizing it as a decades-old practice reveals who gets to define what the future may be, and for whom.
“Acoustic space is organic and integral, perceived through the simultaneous interplay of all the senses.” — Marshall McLuhan and Bruce Powers, The Global Village (1989)
McLuhan described acoustic space as a place without fixed boundaries, formed by the acoustics themselves rather than the systems containing them. This contrasted with the visual space of print culture, which was linear, perspective-driven, organized around sequence and point of view. For decades, his description remained an elegant abstraction in media theory. Scholars invoked it to describe television, radio and, later, the internet. For disability communities, however, it has been a lived practice.
Picture a seventh-grade classroom in the early 2010s. A teacher asks a question about To Kill a Mockingbird. A non-verbal student reaches for her AAC device, a tablet loaded with symbols and predictive text. She begins assembling her response, tapping through layers of meaning: a pictograph, a category, a phrase, a modifier. The device then speaks her composition aloud.
How and where does this student’s expression take form? In her cognitive processing. In the symbol database designed by others. In the algorithm predicting her next word based on patterns learned from thousands of users. In the synthetic voice rendering phonemes. In the speaker emitting sound waves. In the ears and minds of her classmates, who are learning to recognize this output as speech even though it doesn’t sound like their own.
Her message has no single origin. It emerges through interaction between mind and machine, symbol and sound, selection and synthesis. It transforms McLuhan’s theory of distributed consciousness into lived, embodied, daily practice.
The classroom reorganizes itself around this cadence as students slow the pace, taking turns to respond to the teacher, and silence becomes compositional space. Other students learn to track multiple channels at once: watching the screen, listening for audio cues, reading body language, staying present through pauses. The teacher redistributes questions to accommodate different processing speeds. Everyone in that room is learning to navigate an acoustic environment, built by accessibility.
Traditional media literacy curricula teach students to ask: Who created this message? What techniques are being used? Who benefits from this representation? These questions assume a message with identifiable origins, a creator with intentions, a text that exists independently of its reception. But in the multimodal classroom those questions become more complex. Media literacy in this environment means learning to track meaning across distributed systems, to recognize that all communication, not just AAC, emerges through technological mediation we’ve naturalized into invisibility. This is what disability accommodations teach us about media literacy: that communication has always been multimodal, always mediated, always collaborative.
“The medium is the message.” — Marshall McLuhan, Understanding Media (1964)
In the late 1960s, wheelchair users in Berkeley took jackhammers to sidewalks when the city refused to install curb cuts. They broke apart the infrastructure that excluded them and rebuilt it differently. The intervention was intended to support a small population of wheelchair users. Fifty years later, everyone uses curb cuts: parents with strollers, delivery workers with hand trucks, travelers with rolling luggage, anyone nursing a twisted ankle. The accommodation became a better design applied almost universally.
This pattern repeats across media environments. The form of these innovations—their multimodal, distributed, flexible structure—matters more than their ostensible content as accommodations for persons with particular needs. McLuhan’s most famous provocation insists we pay attention to how the properties of media restructure human experience, not only the content transmitted. Assistive technology makes this visible. For example, on-screen captions were introduced as an accessibility feature for deaf and hard-of-hearing viewers. Today, a majority of Americans under thirty watch video with captions enabled even when audio is available. Non-native speakers find them clarifying. People in noisy environments find them practical. People with ADHD find them stabilizing. Many simply prefer the experience of coordinated text and sound.
Similarly, voice-to-text began as assistive technology for people with motor impairments. Now it powers virtual assistants in hundreds of millions of homes. Audiobooks were developed primarily for blind readers. They’re now a mainstream format reshaping how literature is consumed.
Such accommodations have become infrastructure. Supposed workarounds become sophisticated design. McLuhan understood that what seems like an inferior version of “real” media often turns out to be the more advanced form. Disability communities have proved him right, repeatedly. But he has not always been right.
McLuhan’s distinction between hot and cool media—between high-definition forms requiring little participation and low-definition forms demanding active interpretation—has shaped media theory for decades. Hot media are packed with data. Cool media leave gaps that audiences must fill. The hot-versus-cool distinction maps roughly onto his broader framework: visual/hot versus acoustic/cool. In reality, disability dismantles the binary.
By McLuhan’s logic, the medium should become hotter if, for example, one adds captions to a lecture: more data, more definition, less ambiguity. Yet students consistently report the opposite. Captions don’t simplify reception—they complicate it, productively. Viewers coordinate reading and listening simultaneously; they catch misheard words, reinforce comprehension through dual encoding, and preview complex terminology. Thus the medium gets cooler. Participation increases.
The same happens with voice-to-text. Speech-to-text should be hotter than typing—direct translation from high-definition speech to high-definition writing. In practice, it’s cognitively intense. Users monitor how spoken language aligns with written conventions, edit in real time, and manage a hybrid output that belongs fully to neither speech nor text. The medium cools even as definition increases.
The binary logic only holds if you assume standardized bodies—people who see, hear, and process language in normative ways. Once one acknowledges how people actually engage with media—through screen readers, haptics, bone conduction, symbol-based interfaces—the categories collapse. Media temperature isn’t inherent. It shifts with user, context, and embodied need.
Accessibility features don’t cool down hot media. They reveal that media were never as hot as McLuhan imagined. They expose the hidden interpretive labor that normative users perform invisibly, mistaking it for passive reception.
“When you extend any sense, you also amputate it.” — Marshall McLuhan, in conversation with Barrington Nevitt (1968)
McLuhan’s most provocative claim was that every technological extension would force an amputation. Print extended memory but would diminish oral recall. Calculators extended computation but would erode mental mathematics. Cars extended mobility but would let walking atrophy. Each extension reorganized the sensorium, privileging certain capacities while diminishing others. The McLuhan framework makes these seem like inevitable consequences of technological change.
Yet, disability complicates this. A wheelchair extends mobility. It amputates walking only if you assume walking is possible. For wheelchair users, the chair doesn’t amputate anything—it reveals that stairs had been amputating mobility all along while masquerading as neutral infrastructure.
Screen readers don’t amputate visual reading. They expose that visual reading is always contingent on specific embodied capacities that sighted users naturalize as universal text access. AAC devices don’t amputate “natural” speech. They reveal that speech has always been mediated—by neurology, musculature, breath control, timing, and social choreography that speaking people treat as unmediated expression. Disability theory reveals what McLuhan missed: these are not symmetrical trade-offs. They are political choices. The question is not whether technologies extend and amputate, but whose capacities get extended, whose get diminished, and who decides which trade-offs are acceptable. Disability perspectives reveal these to be design decisions that could have been—and could still be—different.
“Ours is a brand-new world of all-at-onceness. ‘Time’ has ceased, ‘space’ has vanished. We now live in a global village…a simultaneous happening.” — Marshall McLuhan and Quentin Fiore, The Medium is the Massage (1967)
At the Dawn Avatar Robot Café in Tokyo, customers are served coffee by four-foot-tall robots gliding between tables. The café posts a disclaimer at the entrance: We are not operated by AI. Real human workers are serving you through these robots. Each robot is piloted remotely by a human worker—many immunocompromised, chronically ill, or physically disabled—operating from hospital beds or living rooms across Japan. No fantasy of seamless embodiment here. Everyone knows the worker is elsewhere, and the interaction feels human anyway.
Timing, gesture, tone, and presence are all mediated, but none are diminished.
McLuhan’s “all-at-onceness” and the collapse of time and space become literal. Workers are simultaneously present and absent. They cannot physically touch customers, yet their gestures create genuine social interaction. Presence has multiple simultaneous locations. Social connection survives—even depends on—mediation.
Disability scholars recognize this as what Alison Kafer calls “crip futurity”: not the erasure of disability through cure or normalization, but a reorganization of social space that accommodates what Margaret Price terms bodyminds, selves inseparable from their cognitive, physical, and sensory experiences, in their multiplicity.
McLuhan scholars see it as his framework made architectural: center and margin dissolved, perception distributed across channels, the ratio among our senses fundamentally rebalanced.
The technology emerged from disability necessity, not media theory. Chronically ill workers needed employment access, not designers trying to instantiate McLuhan’s concepts. Theory arrived later to explain what had already been built.
“Environments are not passive wrappings, but are, rather, active processes which are invisible.” — Marshall McLuhan, War and Peace in the Global Village (1968)
Which brings me back to Jennifer Wexton and that tablet on the House floor.
Her synthesized voice possessed all the properties McLuhan attributed to acoustic environments: no fixed origin, no singular moment of capture, fully present despite existing nowhere in particular. Listeners had to participate actively by coordinating what they saw (Wexton tapping a screen) with what they heard (a voice both hers and not hers, familiar but uncanny). The medium was cool. It demanded interpretation. More importantly, it was an extension without amputation. Wexton’s capacity for deliberative speech was extended without diminishing her agency, identity, or democratic authority.
The significance lies in the institution itself. Congress, perhaps the most orally traditional, visually hierarchical institution in American democracy, reorganized its acoustic space to include a voice it had previously excluded because the institution could no longer sustain the fiction that this mode of communication didn’t count.
The moment changed what Congress was willing to recognize as speech. Marginal practices become mainstream when institutions can no longer deny their legitimacy.
McLuhan wrote that we cannot see media environments from inside them—only in retrospect, or from the margins. Disability offers that marginal perspective through friction. When media environments fail your body, you see their architecture. When communication channels exclude you, you learn their construction. When infrastructure breaks, you build alternatives, and in doing so you reveal that what others took as natural was always a set of choices.
McLuhan taught us to pay attention to media environments as environments: to notice how technologies reorganize perception before we consciously register the change. “The medium is the message” remains his most powerful provocation precisely because it is so difficult to internalize. We want to focus on content. He kept insisting we look at structure.
The study of disability teaches us to pay attention to margins, to notice how exclusion produces innovation, how constraints generate creativity, how bodies dismissed as “broken” navigate media environments with sophisticated fluency. Together, these frameworks suggest a methodology: if you want to understand where media environments are heading, look at what disabled people are building now. Disability is already inhabiting futures that the mainstream hasn’t recognized.
When Jennifer Wexton’s synthesized voice filled the House Chamber, observers called it the future. The acoustic space everyone else was discovering had been built decades earlier. It was constructed by communities navigating exclusion and building multimodal fluency the rest of us are only beginning to recognize as essential. We are still learning to listen.