Scholar's Hub

Award-Winning Papers: HCI

These papers have received best paper or distinguished paper awards from renowned computer science conferences in the field of Human-Computer Interaction. The collection is sourced from each conference.

If you notice any errors, please contact us.

ASSETS

A Collaborative Approach to Support Medication Management in Older Adults with Mild Cognitive Impairment Using Conversational Assistants (CAs)

  • N. Mathur, Kunal Dhodapkar, Tamara Zubatiy, Jiachen Li, Brian D. Jones, Elizabeth D. Mynatt

  • Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility

  • October 22, 2022

Improving medication management for older adults with Mild Cognitive Impairment (MCI) requires designing systems that support functional independence and provide compensatory strategies as their abilities change. Traditional medication management interventions emphasize forming new habits alongside the traditional path of learning to use new technologies. In this study, we navigate designing for older adults with gradual cognitive decline by creating a conversational “check-in” system for routine medication management. We present the design of MATCHA - Medication Action To Check-In for Health Application, informed by exploratory focus groups and design sessions conducted with older adults with MCI and their caregivers, alongside our evaluation based on a two-phased deployment period of 20 weeks. Our results indicate that a conversational “check-in” medication management assistant increased system acceptance while also potentially decreasing the likelihood of accidental over-medication, a common concern for older adults dealing with MCI.

TLDR

The results indicate that a conversational “check-in” medication management assistant increased system acceptance while also potentially decreasing the likelihood of accidental over-medication, a common concern for older adults dealing with MCI.

CHI

The Nuanced Nature of Trust and Privacy Control Adoption in the Context of Google

  • Ehsan Ul Haque, Mohammad Maifi Hasan Khan, Md Abdullah Al Fahim

  • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  • April 19, 2023

This paper investigates how trust towards service providers and the adoption of privacy controls belonging to two specific purposes (control over “sharing” vs. “usage” of data) vary based on users’ technical literacy. Towards that, we chose Google as the context and conducted an online survey across 209 Google users. Our results suggest that integrity and benevolence perceptions toward Google are significantly lower among technical participants than non-technical participants. While trust perceptions differ between non-technical adopters and non-adopters of privacy controls, no such difference is found among the technical counterparts. Notably, among the non-technical participants, the direction of trust affecting privacy control adoption is observed to be reversed based on the purpose of the controls. Using qualitative analysis, we extract trust-enhancing and dampening factors contributing to users’ trusting beliefs towards Google’s protection of user privacy. The implications of our findings for the design and promotion of privacy controls are discussed in the paper.

TLDR

The results suggest that integrity and benevolence perceptions toward Google are significantly lower among technical participants than non-technical participants, and the direction of trust affecting privacy control adoption is observed to be reversed based on the purpose of the controls.

Understanding the Benefits and Challenges of Deploying Conversational AI Leveraging Large Language Models for Public Health Intervention

  • Eunkyung Jo, Daniel A. Epstein, Hyunhoon Jung, Young-Ho Kim

  • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  • April 19, 2023

Recent large language models (LLMs) have advanced the quality of open-ended conversations with chatbots. Although LLM-driven chatbots have the potential to support public health interventions by monitoring populations at scale through empathetic interactions, their use in real-world settings is underexplored. We thus examine the case of CareCall, an open-domain chatbot that aims to support socially isolated individuals via check-up phone calls and monitoring by teleoperators. Through focus group observations and interviews with 34 people from three stakeholder groups, including the users, the teleoperators, and the developers, we found CareCall offered a holistic understanding of each individual while offloading the public health workload and helped mitigate loneliness and emotional burdens. However, our findings highlight that traits of LLM-driven chatbots led to challenges in supporting public and personal health needs. We discuss considerations of designing and deploying LLM-driven chatbots for public health intervention, including tensions among stakeholders around system expectations.

TLDR

CareCall, an open-domain chatbot that aims to support socially isolated individuals via check-up phone calls and monitoring by teleoperators, is examined, finding CareCall offered a holistic understanding of each individual while offloading the public health workload and helped mitigate loneliness and emotional burdens.

CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context

  • Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. Weld

  • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  • February 14, 2023

When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user’s publishing, reading, and saving activities to provide personalized visual augmentations and context around citations. First, CiteSee connects the current paper to familiar contexts by surfacing known citations a user had cited or opened. Second, CiteSee helps users prioritize their exploration by highlighting relevant but unknown citations based on saving and reading history. We conducted a lab study that suggests CiteSee is significantly more effective for paper discovery than three baselines. A field deployment study shows CiteSee helps participants keep track of their explorations and leads to better situational awareness and increased paper discovery via inline citation when conducting real-world literature reviews.

TLDR

CiteSee is a paper reading tool that leverages a user’s publishing, reading, and saving activities to provide personalized visual augmentations and context around citations to help users prioritize their exploration.
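
To make the idea concrete, here is a minimal sketch of how a reader's history could be turned into a per-citation priority score. The feature names and weights are illustrative assumptions, not CiteSee's actual ranking model.

```python
from dataclasses import dataclass

@dataclass
class CitationContext:
    paper_id: str
    cited_by_user: bool   # the reader has cited this paper before
    opened_by_user: bool  # the reader has opened/read it before
    saved_by_user: bool   # the reader has saved it to a library

# Hypothetical weights -- CiteSee's real ranking model is not specified here.
WEIGHTS = {"cited": 3.0, "opened": 2.0, "saved": 2.5}

def citation_priority(c: CitationContext) -> float:
    """Score an inline citation: familiar papers get contextual badges,
    relevant-but-unknown papers are candidates for discovery highlights."""
    score = 0.0
    if c.cited_by_user:
        score += WEIGHTS["cited"]
    if c.opened_by_user:
        score += WEIGHTS["opened"]
    if c.saved_by_user:
        score += WEIGHTS["saved"]
    return score

citations = [
    CitationContext("paper-a", True, True, False),
    CitationContext("paper-b", False, False, True),
    CitationContext("paper-c", False, False, False),
]
for c in sorted(citations, key=citation_priority, reverse=True):
    print(c.paper_id, citation_priority(c))
```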

LipLearner: Customizable Silent Speech Interactions on Mobile Devices

  • Zixiong Su, Shitao Fang, J. Rekimoto

  • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  • February 12, 2023

Silent speech interface is a promising technology that enables private communications in natural language. However, previous approaches only support a small and inflexible vocabulary, which leads to limited expressiveness. We leverage contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort. Our model exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset. For 25-command classification, an F1-score of 0.8947 is achievable only using one shot, and its performance can be further boosted by adaptively learning from more data. This generalizability allowed us to develop a mobile silent speech interface empowered with on-device fine-tuning and visual keyword spotting. A user study demonstrated that with LipLearner, users could define their own commands with high reliability guaranteed by an online incremental learning scheme. Subjective feedback indicated that our system provides essential functionalities for customizable silent speech interactions with high usability and learnability.

TLDR

LipLearner leverages contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort, and exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset.
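
The few-shot customization step can be illustrated with a nearest-prototype classifier over embeddings from a contrastive lipreading encoder. The encoder itself is assumed given, and the toy embeddings below are random stand-ins.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-9):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_prototypes(embeddings, labels):
    """One prototype per command = mean of its few normalized example embeddings."""
    labels = np.array(labels)
    return {label: l2_normalize(l2_normalize(embeddings[labels == label]).mean(axis=0))
            for label in set(labels.tolist())}

def classify(embedding, prototypes):
    """Assign the command whose prototype has the highest cosine similarity."""
    e = l2_normalize(embedding)
    return max(prototypes, key=lambda label: float(e @ prototypes[label]))

# Toy example: random vectors stand in for outputs of the contrastive lip encoder.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 128))
prototypes = build_prototypes(emb, ["play", "play", "stop", "stop"])
print(classify(emb[0] + 0.01 * rng.normal(size=128), prototypes))
```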

CHI PLAY

Why Should Red and Green Never Be Seen? Exploring Color Blindness Simulations as Tools to Create Chromatically Accessible Games

  • Mateus Pinheiro, Windson Viana, Ticianne de Gois Ribeiro Darin

  • Proceedings of the ACM on Human-Computer Interaction

  • September 29, 2023

Video games have become an important aspect of modern culture, especially with the widespread use of mobile devices. Thus, it is important that video games are accessible to all people, but colorblind players are still affected by the use of colors in game interfaces. Some challenges of developing chromatically accessible games are the limited availability of colorblind test subjects and the importance of identifying and considering accessibility threats even in the early stages of development. In this context, digital simulations emerge as possible tools to increase accessibility and awareness. In this paper, we conducted three empirical studies that seek to verify the relationship between the identification of color accessibility problems by people with typical color vision using simulations and people with color blindness, in the context of mobile games. Results indicate concrete uses in which color blindness simulations give advantages to developers with typical vision in identifying chromatic accessibility issues in their games. Additionally, we discuss different possibilities for incorporating simulation tools, accessibility guidelines, and colorblind user participation into a realistic game design life cycle. We also discuss how the incorporation of simulation tools could be beneficial in fostering the discussion of accessibility in game design studios.

TLDR

Three empirical studies are conducted that seek to verify the relationship between the identification of color accessibility problems by people with typical color vision using simulations and people with color blindness, in the context of mobile games.
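
For readers unfamiliar with such simulations, the sketch below applies an approximate protanopia matrix to an RGB image. The coefficients (commonly attributed to Machado et al., 2009) and the direct use of sRGB values are assumptions for illustration only, not the tooling evaluated in the paper.

```python
import numpy as np

# Approximate RGB->RGB protanopia simulation matrix (after Machado et al., 2009).
# The exact coefficients should be taken from a validated source for real use.
PROTANOPIA = np.array([
    [ 0.152286, 1.052583, -0.204868],
    [ 0.114503, 0.786281,  0.099216],
    [-0.003882, -0.048116,  1.051998],
])

def simulate_protanopia(rgb_image: np.ndarray) -> np.ndarray:
    """rgb_image: float array in [0, 1], shape (H, W, 3). Applied directly in
    sRGB for simplicity; a careful pipeline would linearize first."""
    return np.clip(rgb_image @ PROTANOPIA.T, 0.0, 1.0)

# Example: a saturated red loses most of its brightness, which is why
# red-vs-green cues in a game HUD can fail for protan players.
pixels = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
print(simulate_protanopia(pixels))
```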

Communication Sequences Indicate Team Cohesion: A Mixed-Methods Study of Ad Hoc League of Legends Teams

  • Evelyn T S Tan, Katja Rogers, L. Nacke, Anders Drachen, Alex Wade

  • Proceedings of the ACM on Human-Computer Interaction

  • October 25, 2022

Team cohesion is a widely known predictor of performance and collaborative satisfaction. However, how it develops and can be assessed, especially in fast-paced ad hoc dynamic teams, remains unclear. An unobtrusive and objective behavioural measure of cohesion would help identify determinants of cohesion in these teams. We investigated team communication as a potential measure in a mixed-methods study with 48 teams (n=135) in the digital game League of Legends. We first established that cohesion is similarly associated with performance and satisfaction in League of Legends teams as in non-game teams and confirmed a positive relationship between communication word frequency and cohesion. Further, we conducted an in-depth exploratory qualitative analysis of the communication sequences in a high-cohesion and a low-cohesion team. High cohesion is associated with sequences of apology->encouragement, suggestion->agree/acknowledge, answer->answer, and answer->question, while low cohesion is associated with sequences of opinion/analysis->opinion/analysis, disagree->disagree, command->disagree, and frustration->frustration. Our findings also show that cohesion is important to team satisfaction independently of the match outcomes. We highlight that communication sequences are more useful than frequencies to determine team cohesion via player interactions.

TLDR

A mixed-methods study of 48 ad hoc League of Legends teams confirms a positive relationship between communication and cohesion and finds that communication sequences, such as apology->encouragement versus frustration->frustration, are more useful than frequencies to determine team cohesion via player interactions.
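
A minimal sketch of the kind of sequence analysis described above: counting adjacent pairs of coded chat messages. The category labels follow the abstract's examples, while the paper's actual coding scheme and qualitative analysis are far richer.

```python
from collections import Counter

def sequence_counts(messages):
    """Count adjacent pairs (bigrams) of coded chat messages from a team's
    chronologically ordered log, e.g. 'apology' followed by 'encouragement'."""
    return Counter(zip(messages, messages[1:]))

high_cohesion_log = ["apology", "encouragement", "suggestion", "agree/acknowledge"]
low_cohesion_log = ["command", "disagree", "frustration", "frustration"]
print(sequence_counts(high_cohesion_log).most_common())
print(sequence_counts(low_cohesion_log).most_common())
```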

CSCW

The Effects of AI-based Credibility Indicators on the Detection and Spread of Misinformation under Social Influence

  • Zhuoran Lu, Patrick Li, Weilong Wang, Ming Yin

  • Proceedings of the ACM on Human-Computer Interaction

  • November 7, 2022

Misinformation on social media has become a serious concern. Marking news stories with credibility indicators, possibly generated by an AI model, is one way to help people combat misinformation. In this paper, we report the results of two randomized experiments that aim to understand the effects of AI-based credibility indicators on people's perceptions of and engagement with the news, when people are under social influence such that their judgement of the news is influenced by other people. We find that the presence of AI-based credibility indicators nudges people into aligning their belief in the veracity of news with the AI model's prediction regardless of its correctness, thereby changing people's accuracy in detecting misinformation. However, AI-based credibility indicators show limited impacts on influencing people's engagement with either real news or fake news when social influence exists. Finally, it is shown that when social influence is present, the effects of AI-based credibility indicators on the detection and spread of misinformation are larger as compared to when social influence is absent, when these indicators are provided to people before they form their own judgements about the news. We conclude by providing implications for better utilizing AI to fight misinformation.

TLDR

It is shown that the effects of AI-based credibility indicators on the detection and spread of misinformation are larger as compared to when social influence is absent, when these indicators are provided to people before they form their own judgements about the news.

IMX
IUI

Deep Learning Uncertainty in Machine Teaching

  • Téo Sanchez, Baptiste Caramiaux, Pierre Thiel, W. Mackay

  • 27th International Conference on Intelligent User Interfaces

  • March 22, 2022

Machine Learning models can output confident but incorrect predictions. To address this problem, ML researchers use various techniques to reliably estimate ML uncertainty, usually performed on controlled benchmarks once the model has been trained. We explore how the two types of uncertainty—aleatoric and epistemic—can help non-expert users understand the strengths and weaknesses of a classifier in an interactive setting. We are interested in users’ perception of the difference between aleatoric and epistemic uncertainty and their use to teach and understand the classifier. We conducted an experiment where non-experts trained a classifier to recognize card images and were tested on their ability to predict classifier outcomes. Participants who used either larger or more varied training sets significantly improved their understanding of uncertainty, both epistemic and aleatoric. However, participants who relied on the uncertainty measure to guide their choice of training data did not significantly improve classifier training, nor were they better able to guess the classifier outcome. We identified three specific situations where participants successfully identified the difference between aleatoric and epistemic uncertainty: placing a card in the exact same position as a training card; placing different cards next to each other; and placing a non-card, such as their hand, next to or on top of a card. We discuss our methodology for estimating uncertainty for Interactive Machine Learning systems and question the need for two-level uncertainty in Machine Teaching.

TLDR

It is explored how the two types of uncertainty—aleatoric and epistemic—can help non-expert users understand the strengths and weaknesses of a classifier in an interactive setting and the need for two-level uncertainty in Machine Teaching is questioned.
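
The aleatoric/epistemic split can be illustrated with the standard entropy-based decomposition over stochastic forward passes (e.g., Monte Carlo dropout). This is a common formulation, not necessarily the exact estimator used in the study.

```python
import numpy as np

def uncertainty_decomposition(mc_probs: np.ndarray):
    """mc_probs: (T, C) softmax outputs from T stochastic forward passes
    (e.g. Monte Carlo dropout) for one input. Returns (aleatoric, epistemic)
    using the usual entropy-based decomposition."""
    eps = 1e-12
    mean_p = mc_probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))                           # predictive entropy
    aleatoric = -np.mean(np.sum(mc_probs * np.log(mc_probs + eps), axis=1))  # expected entropy
    return aleatoric, total - aleatoric                                      # epistemic = mutual information

# Ambiguous input: every pass hedges between two classes -> aleatoric dominates.
print(uncertainty_decomposition(np.array([[0.5, 0.5, 0.0]] * 10)))
# Unfamiliar input: passes disagree confidently -> epistemic dominates.
print(uncertainty_decomposition(np.array([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]] * 5)))
```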

Hand Gesture Recognition for an Off-the-Shelf Radar by Electromagnetic Modeling and Inversion

  • Arthur Sluÿters, S. Lambot, J. Vanderdonckt

  • 27th International Conference on Intelligent User Interfaces

  • March 22, 2022

Microwave radar sensors in human-computer interactions have several advantages compared to wearable and image-based sensors, such as privacy preservation, high reliability regardless of the ambient and lighting conditions, and larger field of view. However, the raw signals produced by such radars are high-dimensional and relatively complex to interpret. Advanced data processing, including machine learning techniques, is therefore necessary for gesture recognition. While these approaches can reach high gesture recognition accuracy, using artificial neural networks requires a significant number of gesture templates for training, and calibration is radar-specific. To address these challenges, we present a novel data processing pipeline for hand gesture recognition that combines advanced full-wave electromagnetic modelling and inversion with machine learning. In particular, the physical model accounts for the radar source, radar antennas, radar-target interactions and target itself, i.e., the hand in our case. To make this processing feasible, the hand is emulated by an equivalent infinite planar reflector, for which analytical Green’s functions exist. The apparent dielectric permittivity, which depends on the hand size, electric properties, and orientation, determines the wave reflection amplitude based on the distance from the hand to the radar. Through full-wave inversion of the radar data, the physical distance as well as this apparent permittivity are retrieved, thereby reducing by several orders of magnitude the dimension of the radar dataset, while keeping the essential information. Finally, the estimated distance and apparent permittivity as a function of gesture time are used to train the machine learning algorithm for gesture recognition. This physically-based dimension reduction enables the use of simple gesture recognition algorithms, such as template-matching recognizers, that can be trained in real time and provide competitive accuracy with only a few samples. We evaluate significant stages of our pipeline on a dataset of 16 gesture classes, with 5 templates per class, recorded with the Walabot, a lightweight, off-the-shelf array radar. We also compare these results with an ultra wideband radar made of a single horn antenna and lightweight vector network analyzer, and a Leap Motion Controller.

TLDR

A novel data processing pipeline for hand gesture recognition that combines advanced full-wave electromagnetic modelling and inversion with machine learning is presented that enables the use of simple gesture recognition algorithms, such as template-matching recognizers, that can be trained in real time and provide competitive accuracy with only a few samples.
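
A minimal sketch of the final stage of such a pipeline: a template-matching recognizer over the low-dimensional (distance, apparent permittivity) time series. Resampling plus nearest-template Euclidean distance is an illustrative stand-in, not necessarily the authors' matcher.

```python
import numpy as np

def resample(seq: np.ndarray, n: int = 64) -> np.ndarray:
    """Resample a (T, 2) series of (distance, apparent permittivity) to n points."""
    t_old = np.linspace(0.0, 1.0, len(seq))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, seq[:, d]) for d in range(seq.shape[1])], axis=1)

def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)

def recognize(gesture, templates):
    """templates: dict label -> list of a few (T, 2) example series per class."""
    g = zscore(resample(np.asarray(gesture, dtype=float)))
    best_label, best_dist = None, np.inf
    for label, examples in templates.items():
        for example in examples:
            d = np.linalg.norm(g - zscore(resample(np.asarray(example, dtype=float))))
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist

swipe = np.column_stack([np.linspace(0.6, 0.2, 50), np.full(50, 4.0)])
push = np.column_stack([np.linspace(0.2, 0.6, 50), np.full(50, 4.0)])
print(recognize(swipe, {"swipe": [swipe], "push": [push]}))
```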

SIGGRAPH

Image features influence reaction time

  • Budmonde Duinkharjav, Praneeth Chakravarthula, Rachel M. Brown, Anjul Patney, Qi Sun

  • ACM Transactions on Graphics (TOG)

  • May 5, 2022

We aim to ask and answer an essential question: "how quickly do we react after observing a displayed visual target?" To this end, we present psychophysical studies that characterize the remarkable disconnect between human saccadic behaviors and spatial visual acuity. Building on the results of our studies, we develop a perceptual model to predict temporal gaze behavior, particularly saccadic latency, as a function of the statistics of a displayed image. Specifically, we implement a neurologically-inspired probabilistic model that mimics the accumulation of confidence that leads to a perceptual decision. We validate our model with a series of objective measurements and user studies using an eye-tracked VR display. The results demonstrate that our model prediction is in statistical alignment with real-world human behavior. Further, we establish that many sub-threshold image modifications commonly introduced in graphics pipelines may significantly alter human reaction timing, even if the differences are visually undetectable. Finally, we show that our model can serve as a metric to predict and alter reaction latency of users in interactive computer graphics applications, and may thus improve gaze-contingent rendering, design of virtual experiences, and player performance in e-sports. We illustrate this with two examples: estimating competition fairness in a video game with two different team colors, and tuning display viewing distance to minimize player reaction time.

TLDR

This work develops a perceptual model to predict temporal gaze behavior, particularly saccadic latency, as a function of the statistics of a displayed image, and implements a neurologically-inspired probabilistic model that mimics the accumulation of confidence that leads to a perceptual decision.
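
The accumulation-to-threshold idea can be sketched with a generic drift-diffusion simulator. The parameters and the mapping from image statistics (e.g., contrast) to drift rate are illustrative assumptions, not the paper's fitted perceptual model.

```python
import numpy as np

def simulate_saccadic_latency(drift, threshold=1.0, noise=0.3, dt=0.001,
                              non_decision=0.05, rng=None):
    """Generic accumulate-to-threshold (drift-diffusion) simulator: evidence
    grows at rate `drift` plus Gaussian noise until it crosses `threshold`;
    returns latency in seconds including a fixed non-decision time."""
    rng = rng or np.random.default_rng()
    evidence, t = 0.0, 0.0
    while evidence < threshold and t < 2.0:       # 2 s cap for safety
        evidence += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return non_decision + t

rng = np.random.default_rng(0)
# Hypothetically, an easier-to-see target (e.g. higher contrast) maps to a
# larger drift rate and therefore shorter latencies on average.
for label, drift in [("high contrast", 8.0), ("low contrast", 3.0)]:
    latencies = [simulate_saccadic_latency(drift, rng=rng) for _ in range(200)]
    print(label, round(float(np.mean(latencies)), 3), "s")
```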

CLIPasso: Semantically-Aware Object Sketching

  • Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit H. Bermano, D. Cohen-Or, A. Zamir, Ariel Shamir

  • ACM Transactions on Graphics (TOG)

  • February 11, 2022

Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present CLIPasso, an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distill semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.

TLDR

CLIPasso is presented, an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications, and utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distill semantic concepts from sketches and images alike.
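
A compact sketch of the recipe using the open-source clip package: stroke control points are optimized against a CLIP feature loss. The Gaussian-splat "rasterizer" is a crude stand-in for the differentiable vector renderer (diffvg) used by the authors, and "photo.jpg", the stroke count, and the learning rate are placeholders.

```python
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
for p in model.parameters():
    p.requires_grad_(False)

CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def sample_bezier(P, n=24):
    """Sample n points on a cubic Bezier curve with control points P, shape (4, 2), in [0, 1]."""
    t = torch.linspace(0, 1, n).unsqueeze(1)
    return ((1 - t) ** 3 * P[0] + 3 * (1 - t) ** 2 * t * P[1]
            + 3 * (1 - t) * t ** 2 * P[2] + t ** 3 * P[3])

def rasterize(strokes, size=224, sigma=2.0):
    """Crude differentiable 'rasterizer': splat Gaussian blobs along each curve
    (a stand-in for a proper differentiable vector renderer such as diffvg)."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()
    canvas = torch.zeros(size, size)
    for P in strokes:
        pts = sample_bezier(P) * size
        d2 = ((grid[None] - pts[:, None, None, :]) ** 2).sum(-1)
        canvas = canvas + torch.exp(-d2 / (2 * sigma ** 2)).sum(0)
    return (1.0 - canvas.clamp(0, 1)).expand(3, -1, -1).unsqueeze(0)  # dark strokes, white background

target = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # placeholder image path
with torch.no_grad():
    target_feat = model.encode_image(target).float()

strokes = [torch.nn.Parameter(torch.rand(4, 2)) for _ in range(16)]  # fewer strokes = more abstract
opt = torch.optim.Adam(strokes, lr=0.01)
for step in range(500):
    sketch = (rasterize(strokes) - CLIP_MEAN) / CLIP_STD
    loss = 1 - torch.cosine_similarity(model.encode_image(sketch).float(), target_feat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```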

Spelunking the Deep: Guaranteed Queries on General Neural Implicit Surfaces via Range Analysis

  • Nicholas Sharp, A. Jacobson

  • ACM Transactions on Graphics (TOG)

  • February 5, 2022

Neural implicit representations, which encode a surface as the level set of a neural network applied to spatial coordinates, have proven to be remarkably effective for optimizing, compressing, and generating 3D geometry. Although these representations are easy to fit, it is not clear how to best evaluate geometric queries on the shape, such as intersecting against a ray or finding a closest point. The predominant approach is to encourage the network to have a signed distance property. However, this property typically holds only approximately, leading to robustness issues, and holds only at the conclusion of training, inhibiting the use of queries in loss functions. Instead, this work presents a new approach to perform queries directly on general neural implicit functions for a wide range of existing architectures. Our key tool is the application of range analysis to neural networks, using automatic arithmetic rules to bound the output of a network over a region; we conduct a study of range analysis on neural networks, and identify variants of affine arithmetic which are highly effective. We use the resulting bounds to develop geometric queries including ray casting, intersection testing, constructing spatial hierarchies, fast mesh extraction, closest-point evaluation, evaluating bulk properties, and more. Our queries can be efficiently evaluated on GPUs, and offer concrete accuracy guarantees even on randomly-initialized networks, enabling their use in training objectives and beyond. We also show a preliminary application to inverse rendering.

TLDR

This work presents a new approach to perform queries directly on general neural implicit functions for a wide range of existing architectures, using automatic arithmetic rules to bound the output of a network over a region and conducting a study of range analysis on neural networks.
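
The core tool, bounding a network's output over a spatial region with automatic arithmetic rules, can be sketched with plain interval arithmetic on a small MLP. The paper finds affine-arithmetic variants more effective, but the idea is the same.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b with interval arithmetic."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def interval_relu(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def bound_mlp(lo, hi, layers):
    """layers: list of (W, b). Returns guaranteed output bounds over the input box."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_linear(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = interval_relu(lo, hi)
    return lo, hi

# If the bound over a spatial cell excludes 0, the cell provably contains no part
# of the implicit surface and can be skipped by a ray caster or mesher.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 3)), rng.normal(size=16)),
          (rng.normal(size=(1, 16)), rng.normal(size=1))]
print(bound_mlp(np.array([0.1, 0.1, 0.1]), np.array([0.2, 0.2, 0.2]), layers))
```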

Instant neural graphics primitives with a multiresolution hash encoding

  • T. Müller, Alex Evans, Christoph Schied, A. Keller

  • ACM Transactions on Graphics (TOG)

  • January 16, 2022

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920×1080.

TLDR

A versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations is introduced, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920×1080.
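
A toy, CPU-only sketch of the multiresolution hash encoding (2D, numpy): per-level feature tables are indexed by hashing grid-cell corners and bilinearly interpolated, then concatenated. The table sizes, level count, and growth factor are illustrative defaults, not the paper's configuration.

```python
import numpy as np

PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # per-dimension hashing constants (2D case)

class HashEncoding2D:
    """Simplified 2D multiresolution hash encoding. In the real system the
    feature tables are trainable (optimized jointly with a small MLP), the
    coarsest levels use dense grids, and everything runs in fused CUDA kernels."""
    def __init__(self, levels=8, table_size=2**14, feat_dim=2,
                 base_res=16, growth=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.res = [int(base_res * growth ** level) for level in range(levels)]
        self.tables = [rng.normal(scale=1e-2, size=(table_size, feat_dim))
                       for _ in range(levels)]
        self.table_size = table_size

    def _hash(self, corner):
        h = corner.astype(np.uint64) * PRIMES
        return int(np.bitwise_xor.reduce(h) % np.uint64(self.table_size))

    def encode(self, xy):
        """xy in [0, 1]^2 -> concatenation of bilinearly interpolated features."""
        feats = []
        for res, table in zip(self.res, self.tables):
            p = np.asarray(xy) * res
            i0 = np.floor(p).astype(np.int64)
            w = p - i0
            f = np.zeros(table.shape[1])
            for dx in (0, 1):
                for dy in (0, 1):
                    weight = (w[0] if dx else 1 - w[0]) * (w[1] if dy else 1 - w[1])
                    f += weight * table[self._hash(i0 + np.array([dx, dy]))]
            feats.append(f)
        return np.concatenate(feats)

print(HashEncoding2D().encode([0.3, 0.7]).shape)  # (levels * feat_dim,) = (16,)
```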

DeepPhase: periodic autoencoders for learning motion phase manifolds

  • S. Starke, I. Mason, T. Komura, Zhaoming Xie, Hung Yu Ling, M. V. D. Panne, Yiwei Zhao, F. Zinno

  • ACM Transactions on Graphics (TOG)

  • December 31, 2021

Learning the spatial-temporal structure of body movements is a fundamental problem for character motion synthesis. In this work, we propose a novel neural network architecture called the Periodic Autoencoder that can learn periodic features from large unstructured motion datasets in an unsupervised manner. The character movements are decomposed into multiple latent channels that capture the non-linear periodicity of different body segments while progressing forward in time. Our method extracts a multi-dimensional phase space from full-body motion data, which effectively clusters animations and produces a manifold in which computed feature distances provide a better similarity measure than in the original motion space to achieve better temporal and spatial alignment. We demonstrate that the learned periodic embedding can significantly help to improve neural motion synthesis in a number of tasks, including diverse locomotion skills, style-based movements, dance motion synthesis from music, synthesis of dribbling motions in football, and motion query for matching poses within large animation databases.

TLDR

It is demonstrated that the learned periodic embedding can significantly help to improve neural motion synthesis in a number of tasks, including diverse locomotion skills, style-based movements, dance motion synthesis from music, synthesis of dribbling motions in football, and motion query for matching poses within large animation databases.
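
A rough numpy analogue of the periodic parameterization: extracting amplitude, frequency, offset, and phase of the dominant periodic component of one latent channel via an FFT. The actual model computes these quantities differentiably inside the network, so treat this only as intuition for what each latent channel encodes.

```python
import numpy as np

def channel_phase_params(x, fps=60):
    """Estimate amplitude, frequency, offset, and phase of the dominant periodic
    component of one latent channel over a time window (1D array)."""
    n = len(x)
    offset = x.mean()
    spectrum = np.fft.rfft(x - offset)
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    k = np.argmax(np.abs(spectrum[1:])) + 1          # skip the DC bin
    amplitude = 2.0 * np.abs(spectrum[k]) / n
    phase = np.angle(spectrum[k])                    # phase of that frequency bin
    return amplitude, freqs[k], offset, phase

t = np.arange(120) / 60.0                                  # 2 s window at 60 fps
channel = 0.8 * np.sin(2 * np.pi * 1.5 * t + 0.4) + 0.1    # 1.5 Hz gait-like signal
print(channel_phase_params(channel))
```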

TEI
UbiComp/ISWC

Detecting Receptivity for mHealth Interventions in the Natural Environment

  • Varun Mishra, F. Künzler, Jan-Niklas Kramer, E. Fleisch, T. Kowatsch, D. Kotz

  • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

  • November 16, 2020

Just-In-Time Adaptive Intervention (JITAI) is an emerging technique with great potential to support health behavior by providing the right type and amount of support at the right time. A crucial aspect of JITAIs is properly timing the delivery of interventions, to ensure that a user is receptive and ready to process and use the support provided. Some prior works have explored the association of context and some user-specific traits on receptivity, and have built post-study machine-learning models to detect receptivity. For effective intervention delivery, however, a JITAI system needs to make in-the-moment decisions about a user's receptivity. To this end, we conducted a study in which we deployed machine-learning models to detect receptivity in the natural environment, i.e., in free-living conditions. We leveraged prior work regarding receptivity to JITAIs and deployed a chatbot-based digital coach - Ally - that provided physical-activity interventions and motivated participants to achieve their step goals. We extended the original Ally app to include two types of machine-learning model that used contextual information about a person to predict when a person is receptive: a static model that was built before the study started and remained constant for all participants and an adaptive model that continuously learned the receptivity of individual participants and updated itself as the study progressed. For comparison, we included a control model that sent intervention messages at random times. The app randomly selected a delivery model for each intervention message. We observed that the machine-learning models led up to a 40% improvement in receptivity as compared to the control model. Further, we evaluated the temporal dynamics of the different models and observed that receptivity to messages from the adaptive model increased over the course of the study.

TLDR

A study in which machine-learning models were deployed to detect receptivity in the natural environment, i.e., in free-living conditions, using a chatbot-based digital coach that provided physical-activity interventions and motivated participants to achieve their step goals.
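
A minimal sketch of the static-versus-adaptive distinction using scikit-learn's partial_fit for online updates. The context features, labels, and models here are synthetic placeholders rather than the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def draw_context(n):
    # Hypothetical context features: hour_of_day/24, is_weekend, phone_unlocked, recent_steps/1e4
    return np.column_stack([rng.uniform(size=n), rng.integers(0, 2, n),
                            rng.integers(0, 2, n), rng.uniform(size=n)])

# "Static" model: trained once before deployment, then frozen.
X_pre, y_pre = draw_context(500), rng.integers(0, 2, 500)
static_model = SGDClassifier(random_state=0).fit(X_pre, y_pre)

# "Adaptive" model: starts from the same data, then keeps learning from each
# participant's observed responses during the study via partial_fit.
adaptive_model = SGDClassifier(random_state=0)
adaptive_model.partial_fit(X_pre, y_pre, classes=np.array([0, 1]))

for day in range(14):
    X_today = draw_context(10)                   # contexts at candidate delivery times
    send_now = adaptive_model.predict(X_today)   # deliver only when predicted receptive
    y_today = rng.integers(0, 2, 10)             # observed receptivity (synthetic here)
    adaptive_model.partial_fit(X_today, y_today)

X_eval, y_eval = draw_context(200), rng.integers(0, 2, 200)
print("static:", static_model.score(X_eval, y_eval), "adaptive:", adaptive_model.score(X_eval, y_eval))
```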

UIST

GenAssist: Making Image Generation Accessible

  • Mina Huh, Yi-Hao Peng, Amy Pavel

  • ACM Symposium on User Interface Software and Technology

  • July 14, 2023

Blind and low vision (BLV) creators use images to communicate with sighted audiences. However, creating or retrieving images is challenging for BLV creators as it is difficult to use authoring tools or assess image search results. Thus, creators limit the types of images they create or recruit sighted collaborators. While text-to-image generation models let creators generate high-fidelity images based on a text description (i.e. prompt), it is difficult to assess the content and quality of generated images. We present GenAssist, a system to make text-to-image generation accessible. Using our interface, creators can verify whether generated image candidates followed the prompt, access additional details in the image not specified in the prompt, and skim a summary of similarities and differences between image candidates. To power the interface, GenAssist uses a large language model to generate visual questions, vision-language models to extract answers, and a large language model to summarize the results. Our study with 12 BLV creators demonstrated that GenAssist enables and simplifies the process of image selection and generation, making visual authoring more accessible to all.

TLDR

The study with 12 BLV creators demonstrated that GenAssist enables and simplifies the process of image selection and generation, making visual authoring more accessible to all.
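
A sketch of the described pipeline as plain orchestration code. The three model calls are hypothetical stubs standing in for the LLM and vision-language services, not GenAssist's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DescriptionPipeline:
    # Placeholder callables for whichever LLM / vision-language services are wired in.
    generate_questions: Callable[[str], List[str]]   # LLM: prompt -> extra visual questions
    answer_question: Callable[[bytes, str], str]     # VQA model: (image, question) -> answer
    summarize: Callable[[str, List[str]], str]       # LLM: condense answers into a comparison-ready summary

    def describe_candidates(self, prompt: str, images: List[bytes]) -> List[str]:
        questions = ["Does the image follow the prompt?",
                     "What details are present that the prompt did not specify?"]
        questions += self.generate_questions(prompt)
        summaries = []
        for image in images:
            answers = [f"{q} {self.answer_question(image, q)}" for q in questions]
            summaries.append(self.summarize(prompt, answers))
        return summaries
```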

Generative Agents: Interactive Simulacra of Human Behavior

  • J. Park, Joseph C. O'Brien, Carrie J. Cai, M. Morris, Percy Liang, Michael S. Bernstein

  • ACM Symposium on User Interface Software and Technology

  • April 7, 2023

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture—observation, planning, and reflection—each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.

TLDR

This work describes an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior.
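
The retrieval step can be sketched as a weighted combination of recency, importance, and relevance over a memory stream. The exact weights and decay rate here are illustrative assumptions.

```python
import math
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-9)

def retrieval_score(memory, query_embedding, now,
                    w_recency=1.0, w_importance=1.0, w_relevance=1.0, decay=0.995):
    """Weighted sum of recency (exponential decay since last access), importance
    (a stored 1-10 rating), and relevance (similarity to the query)."""
    hours = (now - memory["last_accessed"]) / 3600.0
    recency = decay ** hours
    importance = memory["importance"] / 10.0
    relevance = cosine(memory["embedding"], query_embedding)
    return w_recency * recency + w_importance * importance + w_relevance * relevance

def retrieve(memories, query_embedding, k=3):
    now = time.time()
    return sorted(memories, key=lambda m: retrieval_score(m, query_embedding, now),
                  reverse=True)[:k]

memories = [
    {"text": "Planned a Valentine's Day party", "importance": 8,
     "last_accessed": time.time() - 5 * 3600, "embedding": [0.9, 0.1]},
    {"text": "Ate breakfast", "importance": 2,
     "last_accessed": time.time() - 30 * 3600, "embedding": [0.2, 0.8]},
]
print([m["text"] for m in retrieve(memories, [1.0, 0.0], k=1)])
```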

Grid-Coding: An Accessible, Efficient, and Structured Coding Paradigm for Blind and Low-Vision Programmers

  • Md Ehtesham-Ul-Haque, Syed Mostofa Monsur, Syed Masum Billah

  • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology

  • October 28, 2022

Sighted programmers often rely on visual cues (e.g., syntax coloring, keyword highlighting, code formatting) to perform common coding activities in text-based languages (e.g., Python). Unfortunately, blind and low-vision (BLV) programmers hardly benefit from these visual cues because they interact with computers via assistive technologies (e.g., screen readers), which fail to communicate visual semantics meaningfully. Prior work on making text-based programming languages and environments accessible mostly focused on code navigation and, to some extent, code debugging, but not much toward code editing, which is an essential coding activity. We present Grid-Coding to fill this gap. Grid-Coding renders source code in a structured 2D grid, where each row, column, and cell have consistent, meaningful semantics. Its design is grounded on prior work and refined by 28 BLV programmers through online participatory sessions for 2 months. We implemented the Grid-Coding prototype as a spreadsheet-like web application for Python and evaluated it with a study with 12 BLV programmers. This study revealed that, compared to a text editor (i.e., the go-to editor for BLV programmers), our prototype enabled BLV programmers to navigate source code quickly, find the context of a statement easily, detect syntax errors in existing code effectively, and write new code with fewer syntax errors. The study also revealed how BLV programmers adopted Grid-Coding and demonstrated novel interaction patterns conducive to increased programming productivity.

TLDR

This study revealed that the Grid-Coding prototype enabled BLV programmers to navigate source code quickly, find the context of a statement easily, detect syntax errors in existing code effectively, and write new code with fewer syntax errors.
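
As a rough illustration of rendering code as a structured grid, the sketch below maps each Python statement to a (depth, kind, code) row using the ast module. Grid-Coding's actual row and column semantics are richer and were refined with BLV programmers.

```python
import ast

def to_grid(source: str):
    """Map every statement to a (depth, kind, code) row, yielding a consistent
    2D structure a screen reader could navigate row-by-row and cell-by-cell."""
    rows = []
    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                rows.append((depth, type(child).__name__,
                             ast.unparse(child).splitlines()[0]))
                visit(child, depth + 1)
    visit(ast.parse(source), 0)
    return rows

for row in to_grid("def f(x):\n    if x > 0:\n        return x\n    return -x\n"):
    print(row)
```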

CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding

  • Xingyu Liu, Ruolin Wang, Dingzeyu Li, Xiang 'Anthony' Chen, Amy Pavel

  • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology

  • August 23, 2022

Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author will have to watch the video through and manually check for inaccessible information frame-by-frame, for both visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate, review, script AD/CC in-place, and preview the described and captioned video immediately. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.

TLDR

This paper presents CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos by using cross-modal grounding analysis and automatically measures accessibility of visual and audio segments in a video by checking for modality asymmetries.
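
A minimal sketch of flagging candidate segments by cross-modal similarity over precomputed per-segment embeddings. The threshold is an assumption, and the real system reasons about the visual and auditory directions separately rather than with one symmetric score.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def flag_candidate_segments(visual_embs, audio_embs, threshold=0.25):
    """visual_embs / audio_embs: per-segment embeddings of the frames and of the
    speech transcript projected into a shared space (e.g. by a vision-language
    model). Low cross-modal similarity suggests one modality carries content the
    other does not convey, so the segment likely needs AD or CC review."""
    flagged = []
    for i, (v, a) in enumerate(zip(visual_embs, audio_embs)):
        similarity = cosine(v, a)
        if similarity < threshold:
            flagged.append((i, similarity))
    return flagged
```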

Going Incognito in the Metaverse: Achieving Theoretically Optimal Privacy-Usability Tradeoffs in VR

  • V. Nair, Gonzalo Munilla-Garrido, D. Song

  • ACM Symposium on User Interface Software and Technology

  • August 11, 2022

Virtual reality (VR) telepresence applications and the so-called “metaverse” promise to be the next major medium of human-computer interaction. However, with recent studies demonstrating the ease at which VR users can be profiled and deanonymized, metaverse platforms carry many of the privacy risks of the conventional internet (and more) while at present offering few of the defensive utilities that users are accustomed to having access to. To remedy this, we present the first known method of implementing an “incognito mode” for VR. Our technique leverages local ε-differential privacy to quantifiably obscure sensitive user data attributes, with a focus on intelligently adding noise when and where it is needed most to maximize privacy while minimizing usability impact. Our system is capable of flexibly adapting to the unique needs of each VR application to further optimize this trade-off. We implement our solution as a universal Unity (C#) plugin that we then evaluate using several popular VR applications. Upon faithfully replicating the most well-known VR privacy attack studies, we show a significant degradation of attacker capabilities when using our solution.

TLDR

This work presents the first known method of implementing an “incognito mode” for VR, which leverages local ε-differential privacy to quantifiably obscure sensitive user data attributes and shows a significant degradation of attacker capabilities when using this solution.
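
The noising step can be illustrated with a local Laplace mechanism on a bounded attribute. The scale, bounds, and attribute here are illustrative, not the paper's tuned per-attribute mechanisms.

```python
import numpy as np

def local_dp_release(value, lower, upper, epsilon, rng=None):
    """Release a bounded telemetry attribute under local ε-differential privacy
    with the Laplace mechanism: noise scale = sensitivity / ε, where the
    sensitivity is the attribute's range; clamping afterwards is post-processing
    and does not weaken the guarantee."""
    rng = rng or np.random.default_rng()
    clipped = float(np.clip(value, lower, upper))
    scale = (upper - lower) / epsilon
    return float(np.clip(clipped + rng.laplace(0.0, scale), lower, upper))

rng = np.random.default_rng(0)
# Smaller ε -> stronger privacy, more distortion of the reported height (meters).
for eps in (10.0, 1.0, 0.1):
    print(eps, round(local_dp_release(1.75, 1.4, 2.1, eps, rng), 3))
```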

VRST

Intuitive User Interfaces for Real-Time Magnification in Augmented Reality

  • Ryan Schubert, G. Bruder, Gregory F. Welch

  • Virtual Reality Software and Technology

  • October 9, 2023

Various reasons exist why we may want to magnify portions of our visually perceived surroundings, e.g., because they are too far away or too small to see with the naked eye. Different technologies are used to facilitate magnification, from telescopes to microscopes using monocular or binocular designs. In particular, modern digital cameras capable of optical and/or digital zoom are very flexible as their high-resolution imagery can be presented to users in real-time with displays and interfaces allowing control over the magnification. In this paper, we present a novel design space of intuitive augmented reality (AR) magnifications where an AR head-mounted display is used for the presentation of real-time magnified camera imagery. We present a user study evaluating and comparing different visual presentation methods and AR interaction techniques. Our results show different advantages for unimanual, bimanual, and situated AR magnification window interfaces, near versus far vergence distances for the image presentation, and five different user interfaces for specifying the scaling factor of the imagery.

TLDR

A novel design space of intuitive augmented reality (AR) magnifications where an AR head-mounted display is used for the presentation of real-time magnified camera imagery and a user study evaluating and comparing different visual presentation methods and AR interaction techniques is presented.

Walk This Beam: Impact of Different Balance Assistance Strategies and Height Exposure on Performance and Physiological Arousal in VR

  • Dennis Dietz, Carl Oechsner, Changkun Ou, Francesco Chiossi, F. Sarto, Sven Mayer, A. Butz

  • Proceedings of the 28th ACM Symposium on Virtual Reality Software and Technology

  • November 29, 2022

Dynamic balance is an essential skill for the human upright gait; therefore, regular balance training can improve postural control and reduce the risk of injury. Even slight variations in walking conditions like height or ground conditions can significantly impact walking performance. Virtual reality is used as a helpful tool to simulate such challenging situations. However, there is no agreement on design strategies for balance training in virtual reality under stressful environmental conditions such as height exposure. We investigate how two different training strategies, imitation learning and gamified learning, can improve dynamic balance control performance across different stress conditions. Moreover, we evaluate the stress response as indexed by peripheral physiological measures of stress, perceived workload, and user experience. Both approaches were tested against a baseline of no instructions and against each other. Thereby, we show that a learning-by-imitation approach immediately helps dynamic balance control, decreases stress, improves attention focus, and diminishes perceived workload. A gamified approach can lead to users being overwhelmed by the additional task. Finally, we discuss how our approaches could be adapted for balance training and applied to injury rehabilitation and prevention.

TLDR

A learning-by-imitation approach immediately helps dynamic balance control, decreases stress, improves attention focus, and diminishes perceived workload, whereas a gamified approach can lead to users being overwhelmed by the additional task.
