Presence in Virtual Reality

Adam Hersko-RonaTas ’18

New Frontier exhibit at the 2016 Sundance Film Festival. Participant is using an HTC Vive with wands to experience room-scale VR.

At a talk about “Virtual Realities and the Public Sphere: The Future of Cultural Architecture” hosted by the DeVos Institute of Arts Management, Thomas Forrest Kelly – a professor of music at Harvard University – explained to me that we’re always backing into the medium. We often try to validate a new medium by tethering it to some existing form whose roots are already buried in our social consciousness. That way, it might become easier to accept. Television shows began by using curtains to signal the beginning and end of a program, harkening back to the long-established realm of theatre. Movies about fairytales began with a narrator introducing the story, sometimes even depicting the physical book being opened to its first chapter. Animation leeched off the credibility of symphony music. We borrow what works, hoping it will carry us into an uncharted frontier long enough to learn the new language.

So how can this understanding of our visual system – and the film techniques meant to capitalize on it – inform our approach to a new medium like VR? First, we will need to reconsider the use of some tried-and-true cinematic tools and evaluate how they would work with VR. The most obvious hurdles: non-transferable editing conventions, different camera constraints, and a lack of direct control over the viewer’s gaze.

Classical film and virtual reality can be differentiated by a key element: presence, a subjective phenomenon of existing in a space induced by immersive VR technology. “Telepresence” is the perception of a world through some medium, while presence is simply our natural, unmediated perception of the world.

The virtual world is a space within a space. And space itself exists only in relationships established by a triangular organization of dimensions: height, width, and depth. To feel “present” in a virtual world, we anchor ourselves by selecting and attending to the elements we can tie to a common reference point. After all, our eyes and ears are being fed nothing more than pixels and bits by the head-mounted display.

A team of researchers (Diemer, Alpers, Peperkorn, Shiban, & Mühlberger, 2015) recognizes three dimensions of presence: spatial presence, involvement, and realness. Involvement hinges on the level of interactivity and is influenced by speed (the rate at which user input is assimilated into the experience), range (the number of possible actions), and mapping (the ability of the system to project changes made by the user into the environment). Realness depends on the technology, which, as it develops, will expand with regard to sensory breadth and depth. The results of this study show that the stronger the feeling (e.g. fear and anxiety are “stronger” than joy), the greater the correlation between emotion and reported sense of presence. Additionally, high emotionality in the patient sample (having an anxiety disorder versus no mood disorder) facilitated the impact of emotion on sense of presence.

Virtual reality as a whole consists of two components: system factors and content. System factors influence the cognitive presence judgment externally and affect immersion and interactivity. Content influences emotion through arousal and impacts presence judgment through perception and multisensory information. Therefore, the researchers argue that immersion and presence are different: the former is a reflection of the technology while the latter is a subjective experience (Diemer et al., 2015).

But even if telepresence is internally, subjectively created, viewers still seek an objective narrative, theme, character, etc. while watching VR. So what sort of stories should we be telling in VR to capitalize on its strengths?

Film theorist André Bazin was skeptical of rapid cuts between short takes and of shallow depth of field (often used to guide audience attention in film). These techniques, he believed, stood in ideological and ethical opposition to the integrity of cinema by manipulating the viewer and corrupting the flow of continuous time and space. He advocated for a more “democratic ambiguity” that welcomes audience interpretation of the image. This way, the viewer can decide where to direct his or her own attention without the harness imposed by cinematic devices. This argument was in stark contrast to the attitude of the French New Wave of the 1950s and 1960s, which sought to make the medium itself as visible as possible with disjointed cuts and few attempts at unity; their films became imbued with technique meant to be self-revealing.

VR storytelling should follow Bazinian principles if creators want to maintain a strong illusion of presence. To establish telepresence, the audience needs ample time to look around their environment and position themselves in it without completely corrupting the magic. This means that quick cuts do not really work well. Unlike film, where fast cuts can be seamless and the audience’s position relative to the action is not crucial for understanding the events, the VR viewer is an inherent part of the VR narrative that cannot be as easily forgotten or overridden. Classical editing conventions are abandoned.

Quick cuts in film work because viewers do not need a fully constructed representation of time and space to passively follow the basic story on screen. In VR, a robust spatiotemporal representation is the name of the game. Transitions ought to be carefully crafted in order to assist viewers in gaining a sense of presence at their own pace. And now, because the viewer is directly deposited in the scene they are watching, we must ask: Is the viewer a character? Do we acknowledge them in the scene? Or are they just a fly on the wall?

Doug Liman, a prominent Hollywood producer, commented on storytelling when interviewed about his VR miniseries Invisible:

“…when you just take a traditional scripted scene out of any TV script or movie script and shoot it in VR, it’s going to be less compelling than what was shot in 2D. You’ll feel like you’re watching a video of a play. VR should be more emotionally involving, but that doesn’t happen automatically by just taking a VR camera and sticking it onto what would be a traditionally blocked scene for 2D. To keep the audience engaged and involved, we also needed the story to jump around more than we thought it would have to. Your enjoyment of the world actually increased by dramatically changing environments… and not staying in any one place too long. Because at a certain point you feel like you’ve explored your world, and I think like a great meal, you want to leave people hungry.”

Because the viewer has a far wider visual array to search in VR, you must be confident that the viewer will be looking where you want them to so you can match the line of action across the cut. Otherwise, it’s imperative to give the viewer sufficient warning before a cut. While watching a few episodes of Liman’s series, I occasionally felt lost and considered many of the camera positions superfluous (some shots lasted less than two seconds). How much time is too little to hold on a shot? Calculating the average length necessary for audience members to feel comfortable in a new environment is an area worthy of further study.

Doug Liman, director of Invisible.

The faster or more often you cut, the more you make it apparent to the viewer that he or she is in an artificially constructed space as a passive observer. He or she might become detached from the action and feel less immersed in the fictional reality. This also calls into question the effectiveness of passive VR stories (which is all you can truly have when filming with 360° cameras; added interactivity is difficult in something pre-shot). Wevr’s VR miniseries Gone adds an element of interactivity by allowing the viewer to teleport to various locations in space without upsetting the timing of the story. This allows the audience to get a closer look at things and gain new vantages of their own accord. Live-action footage with this sort of treatment and any VR games that allow for interactivity further fulfill the dimensions contributing to telepresence (involvement and realness).

How do we treat the camera as a cinematic object? With the VR camera, the viewer becomes a singular point in spherically (or sometimes cubically) projected space. There are four basic kinds of movement in film: movement of objects in the frame, movement between the images (the cuts), optical movement of the camera lens (the zoom), and camera movement. Movement of objects in the frame still applies to VR. Cuts become trickier, as explained earlier. Zoom is practically impossible. Unlike a flat plane, the visual sphere that surrounds the VR viewer cannot be zoomed in upon uniformly. Basic geometry ensures that the image projected on the sphere has nowhere to expand to. Areas can be highlighted as if by a magnifying glass deposited over your field of view, but this visual trick can appear contrived. The last kind of movement, camera movement, is absolutely possible.
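The geometric point about zoom can be made concrete with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: it considers the horizontal dimension alone, and the function names and the 60° and 30° example values are hypothetical, not taken from any VR toolkit. In a flat frame, a magnified image simply spills past the edges of the frame. A sphere has no edges, so a uniformly magnified panorama would need more than 360° of display.

```python
FULL_CIRCLE_DEG = 360.0  # horizontal extent of the sphere surrounding a VR viewer


def display_angle_needed(scene_angle_deg: float, zoom: float) -> float:
    """Angular width needed to show a slice of the scene at a given magnification."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return scene_angle_deg * zoom


def overflow_on_sphere(zoom: float) -> float:
    """Degrees of magnified panorama that have nowhere to go on a 360° sphere."""
    needed = display_angle_needed(FULL_CIRCLE_DEG, zoom)
    return max(0.0, needed - FULL_CIRCLE_DEG)


def hidden_by_magnifier(patch_angle_deg: float, zoom: float) -> float:
    """Degrees of surrounding scene covered when only one patch is magnified."""
    return display_angle_needed(patch_angle_deg, zoom) - patch_angle_deg


if __name__ == "__main__":
    # Flat frame (assumed 60° view): a 2x zoom wants 120°, and the extra
    # 60° simply falls outside the frame edges.
    print("flat frame needs", display_angle_needed(60.0, 2.0), "degrees")

    # Full sphere: a uniform 2x zoom wants 720°, but only 360° exist.
    print("sphere overflow:", overflow_on_sphere(2.0), "degrees")

    # Magnifying-glass effect: a 30° patch at 2x hides 30° of its surroundings.
    print("magnifier hides:", hidden_by_magnifier(30.0, 2.0), "degrees")
```

The last function corresponds to the magnifying-glass trick: the enlarged patch gains its size only by covering part of the scene around it.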

Beyond stereopsis, there are few ways to create depth other than physical motion of the objects and of the viewer through the space. Moving the camera can be difficult for many technical reasons (e.g. where do you hide the machine or person operating the camera?), but these can be resolved with remote-controlled equipment, careful planning, and lots of post-production image-doctoring.

In film, the filmmaker’s primary tool is control of the viewer’s head movements and locomotion. The camera operator and the editor serve this same purpose through composition. Orientation is pivotal for comprehension. The whole logic behind cinematic framing is that the act of excluding or including certain elements in the optic array channels gaze and attention in a premeditated way. And, as Bazin would likely rejoice, this exact tool is surrendered in VR.

So do VR filmmakers raise a white flag and cross their fingers hoping viewers will know where to look when? Of course not. The key pillars of a story told in any medium generally revolve around an animate object (a character), event structure (a plotline), and effective/affective style. All of these can still exist within VR as long as they work in tandem with the illusory element that bolsters subjective feelings of telepresence. Spatially localized sound can snap viewers’ heads in a desired direction. Furthermore, creators can rest assured that viewers will tend to direct their gaze to predictable places anyway. If we remember the human propensity to stare at faces, we can safely assume that viewer attention will largely gravitate towards wherever the characters are positioned in the environment.

Equirectangular, full panoramic still from the short “Parched” (2017).

But I also see an opportunity to redefine the role of the director. Bates, in his paper “Virtual Reality, Art, and Entertainment” (1992), suggests that the director-viewer relationship should be a two-way street and that it is the director’s duty to ensure the elements of the world are constantly curated to satisfy the viewer’s threshold of “free-will.” Attempting to account for an infinitely branching array of user interactions or orientations adds complexity. But how much can the director influence the experience without drawing attention to his or her puppetry? Preliminary results indicate that this influence can become invasive fairly quickly; however, modifying the actions of agents in the world can make the interference less noticeable (Bates, 1992). Viewers engaged in a VR experience were quick to accept the behavior of characters even when that behavior was deemed irrational from an outsider’s perspective (Bates, 1992). This finding further supports the notion that social cognition is closely tied to our perception of stories.

So VR is “here.” Naysayers will bemoan this new medium’s immersive quality as something that might enrapture the coming generation. But just as with television (or even the advent of print), we see that the world adapted and progressed as a result of the medium’s inception. Of course, this does not mean we should overlook the obvious ethical implications of virtual reality (e.g., how do we mediate trauma when the sensations within the experience are so salient?) and its addictive qualities, but the medium’s positive potential currently seems to outweigh its disadvantages.

One reasonable fear is that a new medium like VR threatens traditional video (or “flatties,” as my coworker affectionately dubbed 2D video) as the most moving storytelling platform. Highly immersive cinema makes its impact simply through arousal, a very basic dimension of emotion. When it comes to eliciting an arousal response, VR is king. Instead of becoming the chief platform for storytelling, however, VR should be welcomed as a supplementary medium, an alternative means for unfolding narrative along the dimensions of space and time.

This sort of oversaturation of new media allows for traditional media platforms to hone their strengths. Film can focus on what it does best: painting a frame, directing attention, capturing subtlety in an actor’s face. VR diffuses the growing demand for fully immersive cinematic experiences, but obviously takes some physical effort. People often choose to see movies for the passivity.

Moreover, it is important to consider which stories are and are not worth sharing in VR. In what is already a compelling film, the additional x-number of degrees surrounding the viewer are likely extraneous. I take my own analyses of cinematic technique with a grain of salt. Of course, blanketing a project with time-tested strategies used to manipulate attention does not guarantee effectiveness or that the story told with them will be any good. So it is crucial to acknowledge that storytelling, because it is innately human, can work well without serious intervention or overthinking its presentational elements. With that said, a cognitive perspective does help us understand the basis for a lot of creative choices storytellers do make (either naturally or deliberately) when crafting a visual experience for their audience.

The discrepancy between how film and VR are consumed will continue to exist in large part because in VR, once you’re in, you’re in. The television viewer could keep his or her eyes on the stove as he or she watched the evening news. In virtual reality, the viewer must fully commit to the experience. By serving input directly into the sensory organs we rely on most to discern space (the eyes and ears), we have [virtually] removed ourselves from the world around us. Books, similarly, demand more cognitive effort from their audience than television does. Human brains require higher-level semantic processing to keep attention locked onto the page of a book. VR is a little less delicate; viewers are physically captive to their sensory modalities.

So this allocation of attention could be seen as being still more passive than a book but more active than television: the information is being channeled directly to the viewer. What it lacks in freedom for imagination, however, VR makes up for in cogency. The agency and spatial cues (both visual and auditory) that VR harnesses are pretty darn convincing.

So until everyone has their own easily accessible, affordable, high-quality head-mounted display, the opportunities to share these virtual stories in their intended form will remain largely isolated within the community generating the excitement around this medium in the first place. With YouTube and Facebook pushing for content to share in their respective 360° players, the internet has seen a large upsurge in 360° videos, most of which are viewed on computer screens. Google’s Cardboard, an inexpensive, foldable, cardboard headset that utilizes the user’s smartphone, has opened this medium to the masses. Vive and Oculus offer room-scale VR head-mounted displays, but require hefty computers and several external components for tracking. Cardboard remains the carousel to Vive and Oculus’ rollercoaster.

As always, organizations based around digital storytelling are looking to remain one step ahead of the consumer trend. There are dozens of platforms competing to become the “Netflix” of virtual reality content, like Wevr’s aptly named Transport. Most mid-tier and large film festivals are clamoring to include new media exhibitions as a part of their programming. Finding its stride in January 2017 at the Sundance Film Festival, the New Frontier exhibition highlighted many forms of interactive media. The showstopper, without a doubt, was the collection of Oculus and Vive experiences, with Cardboards being issued to attendees like candy.

These narrative concerns for what telepresence does for the viewer are moot, however, if the headsets are not widely available. Creators have little control over how their content is viewed, scrambling to account for the infinite variability of screens around the world. Unfortunately, unlike film, VR is not easily screened for an audience. It is a uniquely personal experience whose dissemination calls for arcade-like constructions in place of theaters. For VR, there is no reliable standard. Screens are ubiquitous, from personal devices to stores to buses. Pop-up VR exhibitions are still a rarity. But the focus on the technology distracts from the crucial element that makes the technology itself worth using. As Bates aptly phrases it, the focus on head-mounted displays is the equivalent of “studying celluloid instead of cinema, paper instead of novels, cathode ray tubes instead of television.”

Proponents of this technology love hearing it labeled the “empathy machine.” But I believe further research is needed to evaluate how starkly a point-of-view scene in VR influences brain activity, given that viewers watching any experience will suspend their disbelief but rarely lose sight completely of the distance between themselves and the medium. Empirical studies into the neurological effect of this new medium will hopefully guide how distributors implement VR responsibly; every medium is appropriated by the commercial realm once it has hopped on the mainstream treadmill. Admittedly, VR does offer an unprecedented look into the literal point-of-view of other people and their experiences. Historically, storytellers could only convey these experiences in unfairly specific terms when words and curated imagery (moving and static) were their primary tools. But now we can take a step back, offering audiences less guidance and more agency while consuming these stories. Without clever framing or rich adjectives to bias him or her, the viewer becomes the arbiter.

A healthy dose of skepticism is always in order. Each new iteration of visual technology brings with it the assumption that we have somehow mastered one phase and are moving on to the next, a phase that is closer to some nebulous idea of what we believe immersive narratives should ultimately achieve. Virtual reality simply shifts this paradigm by offering a dimensional extension to our narrative canvases. Are we moving closer to simulating our own reality in order to most convincingly share any story? Maybe. But each new medium offers the opportunity to tell stories formerly untold by playing to its strengths, highlighting a specific facet of storytelling. Consumers constantly subvert and alter, both consciously and unconsciously, the standard of every medium they inherit. Chances are this cycle will continue.