Scholar's Hub

Award-Winning Papers: AI & Theory

These papers have received best paper awards or distinguished paper awards from renowned computer science conferences in the Artificial Intelligence and Theory fields.

This collection is sourced from each conference. If you notice any errors, please contact us.

AI

ACL

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

  • Raphael Tang, Akshat Pandey, Zhiying Jiang, Gefei Yang, K. Kumar, Jimmy Lin, Ferhan Ture

  • Annual Meeting of the Association for Computational Linguistics

  • October 10, 2022

Diffusion models are a milestone in text-to-image generation, but they remain poorly understood, lacking interpretability analyses. In this paper, we perform a text-image attribution analysis on Stable Diffusion, a recently open-sourced model. To produce attribution maps, we upscale and aggregate cross-attention maps in the denoising module, naming our method DAAM. We validate it by testing its segmentation ability on nouns, as well as its generalized attribution quality on all parts of speech, rated by humans. On two generated datasets, we attain a competitive 58.8-64.8 mIoU on noun segmentation and fair to good mean opinion scores (3.4-4.2) on generalized attribution. Then, we apply DAAM to study the role of syntax in the pixel space across head–dependent heat map interaction patterns for ten common dependency relations. We show that, for some relations, the head map consistently subsumes the dependent, while the opposite is true for others. Finally, we study several semantic phenomena, focusing on feature entanglement; we find that the presence of cohyponyms worsens generation quality by 9%, and descriptive adjectives attend too broadly. We are the first to interpret large diffusion models from a visuolinguistic perspective, which enables future research. Our code is at https://github.com/castorini/daam.

TLDR

This work is the first to interpret large diffusion models from a visuolinguistic perspective, enabling future research, and shows that, for some relations, the head map consistently subsumes the dependent, while the opposite is true for others.
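
For intuition, here is a minimal sketch of the aggregation step the abstract describes: upscaling cross-attention maps from the denoiser to a common resolution and averaging them into one per-token heat map. Shapes, layer counts, and the bicubic upscaling are illustrative assumptions, not the authors' implementation (see their repository for that).

```python
# Hedged sketch of DAAM-style attribution, not the authors' code.
import torch
import torch.nn.functional as F

def aggregate_heat_maps(attn_maps, token_idx, out_size=64):
    """attn_maps: list of tensors, each (heads, H_l, W_l, num_tokens),
    collected from the denoiser's cross-attention layers over timesteps."""
    acc = torch.zeros(out_size, out_size)
    for a in attn_maps:
        m = a[..., token_idx].mean(dim=0, keepdim=True)   # average heads -> (1, H_l, W_l)
        m = F.interpolate(m.unsqueeze(0), size=(out_size, out_size),
                          mode="bicubic", align_corners=False)  # upscale to a common grid
        acc += m.squeeze()
    return acc / len(attn_maps)                           # aggregate over layers

# Toy usage: two layers at different resolutions, 8 heads, 77 text tokens.
maps = [torch.rand(8, 16, 16, 77), torch.rand(8, 32, 32, 77)]
print(aggregate_heat_maps(maps, token_idx=5).shape)       # torch.Size([64, 64])
```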

Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest

  • Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi

  • Annual Meeting of the Association for Computational Linguistics

  • September 13, 2022

Large neural networks can now generate jokes, but do they really “understand” humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of “understanding” a cartoon; key elements are the complex, often surprising relationships between images and captions and the frequent inclusion of indirect and playful allusions to human experience and culture. We investigate both multimodal and language-only models: the former are challenged with the cartoon images directly, while the latter are given multifaceted descriptions of the visual scene to simulate human-level visual understanding. We find that both types of models struggle at all three tasks. For example, our best multimodal models fall 30 accuracy points behind human performance on the matching task, and, even when provided ground-truth visual scene descriptors, human-authored explanations are preferred head-to-head over the best machine-authored ones (few-shot GPT-4) in more than 2/3 of cases. We release models, code, leaderboard, and corpus, which includes newly-gathered annotations describing the image’s locations/entities, what’s unusual in the scene, and an explanation of the joke.

TLDR

This work challenges AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny.

CIKM

D-HYPR: Harnessing Neighborhood Modeling and Asymmetry Preservation for Digraph Representation Learning

  • Honglu Zhou, Advith Chegu, Samuel S. Sohn, Zuohui Fu, Gerard de Melo, M. Kapadia

  • Proceedings of the 31st ACM International Conference on Information & Knowledge Management

  • December 22, 2021

Digraph Representation Learning (DRL) aims to learn representations for directed homogeneous graphs (digraphs). Prior work in DRL is largely constrained (e.g., limited to directed acyclic graphs), or has poor generalizability across tasks (e.g., evaluated solely on one task). Most Graph Neural Networks (GNNs) exhibit poor performance on digraphs due to the neglect of modeling neighborhoods and preserving asymmetry. In this paper, we address these notable challenges by leveraging hyperbolic collaborative learning from multi-ordered and partitioned neighborhoods, and regularizers inspired by socio-psychological factors. Our resulting formalism, Digraph Hyperbolic Networks (D-HYPR) -- albeit conceptually simple -- generalizes to digraphs where cycles and non-transitive relations are common, and is applicable to multiple downstream tasks including node classification, link presence prediction, and link property prediction. In order to assess the effectiveness of D-HYPR, extensive evaluations were performed across 8 real-world digraph datasets involving 21 prior techniques. D-HYPR statistically significantly outperforms the current state of the art. We release our code at https://github.com/hongluzhou/dhypr

TLDR

The resulting formalism, Digraph Hyperbolic Networks (D-HYPR) -- albeit conceptually simple -- generalizes to digraphs where cycles and non-transitive relations are common, and is applicable to multiple downstream tasks including node classification, link presence prediction, and link property prediction.

CVPR

Planning-oriented Autonomous Driving

  • Yi Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wen Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

  • December 20, 2022

Modern autonomous driving systems are characterized by modular tasks in sequential order, i.e., perception, prediction, and planning. In order to perform a wide diversity of tasks and achieve advanced-level intelligence, contemporary approaches either deploy standalone models for individual tasks, or design a multi-task paradigm with separate heads. However, they might suffer from accumulative errors or deficient task coordination. Instead, we argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car. Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning. We introduce Unified Autonomous Driving (UniAD), an up-to-date comprehensive framework that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage advantages of each module, and provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with unified query interfaces to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of using such a philosophy is proven by substantially outperforming previous state-of-the-art methods in all aspects. Code and models are public.

TLDR

This work introduces Unified Autonomous Driving (UniAD), an up-to-date comprehensive framework that incorporates full-stack driving tasks in one network and is exquisitely devised to leverage the advantages of each module and provide complementary feature abstractions for agent interaction from a global perspective.

EMNLP

Faster Minimum Bayes Risk Decoding with Confidence-based Pruning

  • Julius Cheng, Andreas Vlachos

  • Conference on Empirical Methods in Natural Language Processing

  • November 25, 2023

Minimum Bayes risk (MBR) decoding outputs the hypothesis with the highest expected utility over the model distribution for some utility function. It has been shown to improve accuracy over beam search in conditional language generation problems and especially neural machine translation, in both human and automatic evaluations. However, the standard sampling-based algorithm for MBR is substantially more computationally expensive than beam search, requiring a large number of samples as well as a quadratic number of calls to the utility function, limiting its applicability. We describe an algorithm for MBR which gradually grows the number of samples used to estimate the utility while pruning hypotheses that are unlikely to have the highest utility according to confidence estimates obtained with bootstrap sampling. Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR while being statistically indistinguishable in terms of accuracy. We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.

TLDR

This work describes an algorithm for MBR which gradually grows the number of samples used to estimate the utility while pruning hypotheses that are unlikely to have the highest utility according to confidence estimates obtained with bootstrap sampling.
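
A hedged sketch of the algorithmic idea: grow the pseudo-reference sample on a schedule and use bootstrap win rates to prune hypotheses unlikely to be the utility maximizer. The token-overlap utility and the win-rate threshold are toy stand-ins for chrF++/COMET and the paper's exact confidence estimates.

```python
import random

def utility(hyp, ref):  # toy stand-in for chrF++/COMET
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

def pruned_mbr(hyps, refs, schedule=(8, 16, 32), n_boot=200, alpha=0.1, seed=0):
    rng = random.Random(seed)
    alive = list(range(len(hyps)))
    for n in schedule:
        # utility of each surviving hypothesis against the first n references
        U = {i: [utility(hyps[i], r) for r in refs[:n]] for i in alive}
        wins = {i: 0 for i in alive}
        for _ in range(n_boot):                 # bootstrap over the reference sample
            idx = [rng.randrange(n) for _ in range(n)]
            best = max(alive, key=lambda i: sum(U[i][j] for j in idx))
            wins[best] += 1
        # prune hypotheses that rarely win a bootstrap replicate
        alive = [i for i in alive if wins[i] / n_boot > alpha] or alive[:1]
        if len(alive) == 1:
            break
    return max(alive, key=lambda i: sum(U[i]))

hyps = ["the cat sat", "a cat sat down", "the dog ran"]
refs = ["the cat sat down"] * 32
print(hyps[pruned_mbr(hyps, refs)])
```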

HRI

Lively: Enabling Multimodal, Lifelike, and Extensible Real-time Robot Motion

  • Andrew Schoen, Dakota Sullivan, Ze-dong Zhang, D. Rakita, Bilge Mutlu

  • Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction

  • March 13, 2023

Robots designed to interact with people in collaborative or social scenarios must move in ways that are consistent with the robot's task and communication goals. However, combining these goals in a naïve manner can result in mutually exclusive solutions, or infeasible or problematic states and actions. In this paper, we present Lively, a framework which supports configurable, real-time, task-based and communicative or socially-expressive motion for collaborative and social robotics across multiple levels of programmatic accessibility. Lively supports a wide range of control methods (i.e. position, orientation, and joint-space goals), and balances them with complex procedural behaviors for natural, lifelike motion that are effective in collaborative and social contexts. We discuss the design of three levels of programmatic accessibility of Lively, including a graphical user interface for visual design called LivelyStudio, the core library Lively for full access to its capabilities for developers, and an extensible architecture for greater customizability and capability.

TLDR

This paper discusses the design of three levels of programmatic accessibility of Lively, including a graphical user interface for visual design called LivelyStudio, the core library Lively for full access to its capabilities for developers, and an extensible architecture for greater customizability and capability.

Interactive Policy Shaping for Human-Robot Collaboration with Transparent Matrix Overlays

  • Jake Brawer, Debasmita Ghose, Kate Candon, Meiying Qin, A. Roncone, Marynel Vázquez, B. Scassellati

  • Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction

  • March 13, 2023

One important aspect of effective human-robot collaborations is the ability for robots to adapt quickly to the needs of humans. While techniques like deep reinforcement learning have demonstrated success as sophisticated tools for learning robot policies, the fluency of human-robot collaborations is often limited by these policies' inability to integrate changes to a user's preferences for the task. To address these shortcomings, we propose a novel approach that can modify learned policies at execution time via symbolic if-this-then-that rules corresponding to a modular and superimposable set of low-level constraints on the robot's policy. These rules, which we call Transparent Matrix Overlays, function not only as succinct and explainable descriptions of the robot's current strategy but also as an interface by which a human collaborator can easily alter a robot's policy via verbal commands. We demonstrate the efficacy of this approach on a series of proof-of-concept cooking tasks performed in simulation and on a physical robot.

TLDR

A novel approach that can modify learned policies at execution time via symbolic if-this-then-that rules corresponding to a modular and superimposable set of low-level constraints on the robot's policy is proposed.
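
A purely hypothetical sketch of the overlay idea: if-this-then-that rules superimposed as low-level adjustments on a learned policy's action scores at execution time. All names and the additive scheme here are illustrative, not the paper's system.

```python
def apply_overlays(action_scores, state, overlays):
    """action_scores: dict action -> score from the learned policy.
    overlays: list of (condition, action, adjustment) rules."""
    adjusted = dict(action_scores)
    for condition, action, adjustment in overlays:
        if condition(state) and action in adjusted:
            adjusted[action] += adjustment   # superimposable: rules stack additively
    return max(adjusted, key=adjusted.get)

# "If the pot is hot, then avoid grabbing it bare-handed."
overlays = [(lambda s: s["pot_hot"], "grab_pot", -10.0)]
scores = {"grab_pot": 2.0, "fetch_mitt": 1.5}
print(apply_overlays(scores, {"pot_hot": True}, overlays))  # fetch_mitt
```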

Exploring Machine-like Behaviors for Socially Acceptable Robot Navigation in Elevators

  • Danilo Gallo, Shreepriya Gonzalez Jimenez, Antonietta Grasso, Cécile Boulard, T. Colombino

  • 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)

  • March 7, 2022

In this paper, we present our ongoing research on socially acceptable robot navigation for an indoor elevator sharing scenario. Informed by naturalistic observations of human elevator use, we discuss the social nuances involved in a seemingly simple activity like taking an elevator and the challenges and limitations of modeling robot behaviors based on a full human-like approach. We propose the principle of machine-like for the design of robot behavior policies that effectively accomplish tasks without being disruptive to the routines of people sharing the elevator with the robots. We explored this approach in a bodystorming session and conducted a preliminary evaluation of the resulting considerations through an online user study. Participants differentiated robots from humans for issues of proxemics and priority, and machine-like behaviors were preferred over human-like behaviors. We present our findings and discuss the advantages and limitations identified for both approaches for designing socially acceptable navigation behaviors.

TLDR

The principle of machine-like is proposed for the design of robot behavior policies that effectively accomplish tasks without being disruptive to the routines of people sharing the elevator with the robots.

MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning

  • Mariah L. Schrum, Erin Hedlund-Botti, Nina Moorman, M. Gombolay

  • 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)

  • March 7, 2022

Learning from demonstration (LfD) techniques seek to enable users without computer programming experience to teach robots novel tasks. There are generally two types of LfD: human- and robot-centric. While human-centric learning is intuitive, it suffers from performance degradation due to covariate shift. Robot-centric approaches, such as Dataset Aggregation (DAgger), address covariate shift but can struggle to learn from suboptimal human teachers. To create a more human-aware version of robot-centric LfD, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD). MIND MELD meta-learns a mapping from suboptimal and heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD. The key to our approach is learning an informative personalized embedding using mutual information maximization via variational inference. The embedding then informs a mapping from human provided labels to optimal labels. We evaluate our framework in a human-subjects experiment, demonstrating that our approach improves corrective labels provided by human demonstrators. Our framework outperforms baselines in terms of ability to reach the goal $(p < .001)$, average distance from the goal $(p = .006)$, and various subjective ratings $(p = .008)$.

TLDR

MIND MELD meta-learns a mapping from suboptimal and heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD, and is evaluated in a human-subjects experiment, demonstrating that the approach improves corrective labels provided by human demonstrators.

REGROUP: A Robot-Centric Group Detection and Tracking System

  • Angelique Taylor, L. Riek

  • 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)

  • March 7, 2022

To facilitate HRI's transition from dyadic to group interaction, new methods are needed for robots to sense and understand team behavior. We introduce the Robot-Centric Group Detection and Tracking System (REGROUP), a new method that enables robots to detect and track groups of people from an ego-centric perspective using a crowd-aware, tracking-by-detection approach. Our system employs a novel technique that leverages person re-identification deep learning features to address the group data association problem. REGROUP is robust to real-world vision challenges such as occlusion, camera egomotion, shadow, and varying lighting illuminations. Also, it runs in real-time on real-world data. We show that REGROUP outperformed three group detection methods by up to 40% in terms of precision and up to 18% in terms of recall. Also, we show that REGROUP's group tracking method outperformed three state-of-the-art methods by up to 66% in terms of tracking accuracy and 20% in terms of tracking precision. We plan to publicly release our system to support HRI teaming research and development. We hope this work will enable the development of robots that can more effectively locate and perceive their teammates, particularly in uncertain, unstructured environments.

TLDR

The Robot-Centric Group Detection and Tracking System (REGROUP), a new method that enables robots to detect and track groups of people from an ego-centric perspective using a crowd-aware, tracking-by-detection approach, employs a novel technique that leverages person re-identification deep learning features to address the group data association problem.

ICLR

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

  • X. Y. Han, V. Papyan, D. Donoho

  • ArXiv

  • June 3, 2021

The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC: at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC and which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.

TLDR

The recently discovered Neural Collapse phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy loss towards zero, and a new theoretical construct is introduced: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics.
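
As a concrete handle on the phenomenon, here is a small sketch (not the paper's notebook) that measures one NC symptom, within-class variability collapse, as the ratio of within-class to between-class scatter of last-layer features; the synthetic features below are constructed to be nearly collapsed.

```python
import numpy as np

def within_between_ratio(features, labels):
    mu = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - mu) ** 2).sum()
    return within / between   # tends to 0 as features collapse to class means

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
class_means = rng.normal(size=(3, 16))
feats = class_means[labels] + 0.01 * rng.normal(size=(300, 16))
print(within_between_ratio(feats, labels))   # small value, near collapse
```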

ICML

The Importance of Non-Markovianity in Maximum State Entropy Exploration

  • Mirco Mutti, Ric De Santi, Marcello Restelli

  • International Conference on Machine Learning

  • February 7, 2022

In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing. Hazan et al. (2019) noted that the class of Markovian stochastic policies is sufficient for the maximum state entropy objective, and exploiting non-Markovianity is generally considered pointless in this setting. In this paper, we argue that non-Markovianity is instead paramount for maximum state entropy exploration in a finite-sample regime. Especially, we recast the objective to target the expected entropy of the induced state visitations in a single trial. Then, we show that the class of non-Markovian deterministic policies is sufficient for the introduced objective, while Markovian policies suffer non-zero regret in general. However, we prove that the problem of finding an optimal non-Markovian policy is NP-hard. Despite this negative result, we discuss avenues to address the problem in a tractable way and how non-Markovian exploration could benefit the sample efficiency of online reinforcement learning in future works.

TLDR

This paper recasts the objective to target the expected entropy of the induced state visitations in a single trial, and shows that the class of non-Markovian deterministic policies is sufficient for the introduced objective, while Markovian policies suffer non-zero regret in general.

Understanding Dataset Difficulty with V-Usable Information

  • Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta

  • International Conference on Machine Learning

  • December 31, 2021

Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty w.r.t. a model V as the lack of V-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for V. We further introduce pointwise V-information (PVI) for measuring the difficulty of individual instances w.r.t. a given distribution. While standard evaluation metrics typically only compare different models for the same dataset, V-usable information and PVI also permit the converse: for a given model V, we can compare different datasets, as well as different instances/slices of the same dataset. Furthermore, our framework allows for the interpretability of different input attributes via transformations of the input, which we use to discover annotation artefacts in widely-used NLP benchmarks.

TLDR

This work frames dataset difficulty w.r.t. a model V as the lack of V-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for V.
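
A minimal sketch of the pointwise quantity, assuming the paper's definition PVI(x -> y) = -log2 g[null](y) + log2 g'[x](y), where g' is fit with inputs and g on null inputs; the probabilities below are placeholders for real model outputs.

```python
import math

def pvi(p_y_given_null, p_y_given_x):
    # PVI(x -> y) under the stated definition, in bits
    return -math.log2(p_y_given_null) + math.log2(p_y_given_x)

# Seeing the input makes the label much likelier: high PVI (easy instance).
print(pvi(p_y_given_null=0.5, p_y_given_x=0.95))  # ~0.93 bits
# The input carries no usable information: PVI of 0 (hard instance).
print(pvi(p_y_given_null=0.5, p_y_given_x=0.50))  # 0.0 bits
```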

Stable Conformal Prediction Sets

  • Eugène Ndiaye

  • International Conference on Machine Learning

  • December 19, 2021

When one observes a sequence of variables $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP) is a methodology that allows one to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. CP sets have guaranteed coverage for any finite population size $n$. While appealing, the computation of such a set turns out to be infeasible in general, e.g. when the unknown variable $y_{n+1}$ is continuous. The bottleneck is that it is based on a procedure that readjusts a prediction model on data where we replace the unknown target by all its possible values in order to select the most probable one. This requires computing an infinite number of models, which often makes it intractable. In this paper, we combine CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit. We demonstrate that our proposed confidence set does not lose any coverage guarantees while avoiding the need for data splitting as currently done in the literature. We provide some numerical experiments to illustrate the tightness of our estimation when the sample size is sufficiently large, on both synthetic and real datasets.

TLDR

This paper combines CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit, and demonstrates that the proposed confidence set does not lose any coverage guarantees while avoiding the need for data splitting as currently done in the literature.
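
For background, a minimal split conformal baseline for regression; the paper's contribution is precisely to avoid this data splitting (via algorithmic stability) while keeping the coverage guarantee.

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Standard split conformal prediction for regression."""
    residuals = np.abs(y_cal - model(X_cal))
    n = len(residuals)
    # finite-sample corrected quantile of the calibration residuals
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model(np.array([x_new]))[0]
    return pred - q, pred + q   # coverage >= 1 - alpha under exchangeability

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=200)
y = 2 * X + rng.normal(scale=0.1, size=200)
model = lambda x: 2 * x         # stand-in for a regressor fit on separate data
print(split_conformal_interval(model, X[:100], y[:100], x_new=0.3))
```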

IJCAI

Levin Tree Search with Context Models

  • Laurent Orseau, Marcus Hutter, Levi H. S. Lelis

  • International Joint Conference on Artificial Intelligence

  • May 26, 2023

Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories --- guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.

TLDR

This work shows that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM) and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories --- guarantees that cannot be provided for neural networks.
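
For background, a sketch of plain LTS as best-first search ordered by d(n)/pi(n), depth over the policy probability of the path; the toy domain and uniform policy are illustrative, and LTS+CM's contribution is to parameterize the policy with context models instead of a neural network.

```python
import heapq

def lts(start, goal, successors, policy, max_expansions=10_000):
    queue = [(0.0, 0, start, 1.0)]   # (cost d/pi, depth, state, path probability)
    expansions = 0
    while queue and expansions < max_expansions:
        _, depth, state, prob = heapq.heappop(queue)
        expansions += 1
        if state == goal:
            return state, expansions
        for action, child in successors(state):
            p = prob * policy(state, action)
            heapq.heappush(queue, ((depth + 1) / p, depth + 1, child, p))
    return None, expansions

# Toy domain: walk from 0 to 5 on the integer line, uniform policy over +-1.
succ = lambda s: [("+1", s + 1), ("-1", s - 1)]
pol = lambda s, a: 0.5
print(lts(0, 5, succ, pol))   # LTS bounds expansions in terms of d(goal)/pi(goal)
```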

Plurality Veto: A Simple Voting Rule Achieving Optimal Metric Distortion

  • Fatih Erdem Kizilkaya, D. Kempe

  • International Joint Conference on Artificial Intelligence

  • June 14, 2022

The metric distortion framework posits that n voters and m candidates are jointly embedded in a metric space such that voters rank candidates that are closer to them higher. A voting rule's purpose is to pick a candidate with minimum total distance to the voters, given only the rankings, but not the actual distances. As a result, in the worst case, each deterministic rule picks a candidate whose total distance is at least three times larger than that of an optimal one, i.e., has distortion at least 3. A recent breakthrough result showed that achieving this bound of 3 is possible; however, the proof is non-constructive, and the voting rule itself is a complicated exhaustive search. Our main result is an extremely simple voting rule, called Plurality Veto, which achieves the same optimal distortion of 3. Each candidate starts with a score equal to his number of first-place votes. These scores are then gradually decreased via an n-round veto process in which a candidate drops out when his score reaches zero. One after the other, voters decrement the score of their bottom choice among the standing candidates, and the last standing candidate wins. We give a one-paragraph proof that this voting rule achieves distortion 3. This rule is also immensely practical, and it only makes two queries to each voter, so it has low communication overhead. We also show that a straightforward extension can be used to give a constructive proof of the more general Ranking-Matching Lemma of Gkatzelis et al. We also generalize Plurality Veto into a class of randomized voting rules in the following way: Plurality veto is run only for k < n rounds; then, a candidate is chosen with probability proportional to his residual score. This general rule interpolates between Random Dictatorship (for k=0) and Plurality Veto (for k=n-1), and k controls the variance of the output. We show that for all k, this rule has expected distortion at most 3.

TLDR

An extremely simple voting rule, called Plurality Veto, achieves the optimal distortion of 3, and its randomized generalization is shown to have expected distortion at most 3 for all k.
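
The rule is simple enough to state directly in code. This sketch follows the description above, with the voter order and the handling of zero-score candidates as arbitrary illustrative choices.

```python
def plurality_veto(rankings):
    """rankings: one full ranked list of candidates per voter, best first."""
    scores = {c: 0 for c in rankings[0]}
    for r in rankings:
        scores[r[0]] += 1                       # plurality scores
    standing = {c for c, s in scores.items() if s > 0}
    for r in rankings:                          # n veto rounds, one per voter
        if len(standing) == 1:
            break
        bottom = next(c for c in reversed(r) if c in standing)
        scores[bottom] -= 1                     # veto the bottom standing choice
        if scores[bottom] == 0:
            standing.discard(bottom)
    return next(iter(standing))                 # last standing candidate wins

votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"], ["c", "b", "a"]]
print(plurality_veto(votes))  # "a"
```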

KDD

All in One: Multi-Task Prompting for Graph Neural Networks

  • Xiangguo Sun, Hongtao Cheng, Jia Li, Bo Liu, J. Guan

  • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  • July 4, 2023

Recently, "pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a "negative transfer'' to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.

TLDR

This paper proposes a novel multi-task prompting method for graph models that unifies the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern, and introduces meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that the prompting framework can be more reliable and general for different tasks.

Improving Training Stability for Multitask Ranking Models in Recommender Systems

  • Jiaxi Tang, Yoel Drori, Daryl Chang, M. Sathiamoorthy, J. Gilmer, Li Wei, Xinyang Yi, Lichan Hong, Ed H. Chi

  • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  • February 17, 2023

Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training for such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss divergence, which can make the model unusable, waste significant resources and block model developments. In this paper, we share our findings and best practices we learned for improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of training dynamics near the point of training instability, we hypothesize why existing solutions would fail, and propose a new algorithm to mitigate the limitations of existing solutions. Our experiments on YouTube production dataset show the proposed algorithm can significantly improve training stability while not compromising convergence, comparing with several commonly used baseline methods.

TLDR

The findings and best practices learned for improving the training stability of a real-world multitask ranking model for YouTube recommendations are shared and a new algorithm is proposed to mitigate the limitations of existing solutions.

Learning Causal Effects on Hypergraphs

  • Jing Ma, Mengting Wan, Longqi Yang, Jundong Li, Brent J. Hecht, J. Teevan

  • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  • July 7, 2022

Hypergraphs provide an effective abstraction for modeling multi-way group interactions among nodes, where each hyperedge can connect any number of nodes. Different from most existing studies which leverage statistical dependencies, we study hypergraphs from the perspective of causality. Specifically, in this paper, we focus on the problem of individual treatment effect (ITE) estimation on hypergraphs, aiming to estimate how much an intervention (e.g., wearing face covering) would causally affect an outcome (e.g., COVID-19 infection) of each individual node. Existing works on ITE estimation either assume that the outcome on one individual should not be influenced by the treatment assignments on other individuals (i.e., no interference), or assume the interference only exists between pairs of connected individuals in an ordinary graph. We argue that these assumptions can be unrealistic on real-world hypergraphs, where higher-order interference can affect the ultimate ITE estimations due to the presence of group interactions. In this work, we investigate high-order interference modeling, and propose a new causality learning framework powered by hypergraph neural networks. Extensive experiments on real-world hypergraphs verify the superiority of our framework over existing baselines.

TLDR

This work investigates high-order interference modeling, and proposes a new causality learning framework powered by hypergraph neural networks, which is verified over existing baselines on real-world hypergraphs.

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

  • Zhen Wang, Weirui Kuang, Yuexiang Xie, Liuyi Yao, Yaliang Li, Bolin Ding, Jingren Zhou

  • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  • April 12, 2022

The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort required for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously gains many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.

TLDR

This paper presents the implemented package FederatedScope-GNN (FS-G), which provides a unified view for modularizing and expressing FGL algorithms, and employs FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits.

NEURIPS

Training Compute-Optimal Large Language Models

  • Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, K. Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, L. Sifre

  • ArXiv

  • March 29, 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.

TLDR

This work trains a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more data, and reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
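
A back-of-envelope sketch of the scaling rule, using the common approximations of C ≈ 6ND training FLOPs and roughly 20 training tokens per parameter; both constants are assumptions stated here for illustration, not the paper's fitted values.

```python
def compute_optimal(C, tokens_per_param=20.0):
    # With D = tokens_per_param * N and C = 6 * N * D, solve for N.
    N = (C / (6.0 * tokens_per_param)) ** 0.5   # parameters
    D = tokens_per_param * N                    # training tokens
    return N, D

# Chinchilla's approximate budget: 6 * 70e9 params * 1.4e12 tokens.
C = 6 * 70e9 * 1.4e12
N, D = compute_optimal(C)
print(f"params ~ {N:.1e}, tokens ~ {D:.1e}")   # ~7.0e10 params, ~1.4e12 tokens
```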

SIGIR

The Information Retrieval Experiment Platform

  • Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

  • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

  • May 30, 2023

We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures. However, none of this is a must for reproducibility and scalability, as TIRA can run any dockerized software locally or remotely in a cloud-native execution environment. Version control and caching ensure efficient (re)execution. TIRA allows for blind evaluation when an experiment runs on a remote server or cloud not under the control of the experimenter. The test data and ground truth are then hidden from public access, and the retrieval software has to process them in a sandbox that prevents data leaks. We currently host an instance of TIREx with 15 corpora (1.9 billion documents) on which 32 shared retrieval tasks are based. Using Docker images of 50 standard retrieval approaches, we automatically evaluated all approaches on all tasks (50 ⋅ 32 = 1,600 runs) in less than a week on a midsize cluster (1,620 cores and 24 GPUs). This instance of TIREx is open for submissions and will be integrated with the IR Anthology, as well as released open source.
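
A tiny illustration of the evaluation glue the platform standardizes on, using the public ir_measures API on in-memory qrels and a run; the data is illustrative, and in TIREx the run would come from a dockerized retrieval pipeline.

```python
import ir_measures
from ir_measures import nDCG, P

qrels = {"q1": {"d1": 1, "d2": 0, "d3": 1}}        # query -> doc -> relevance
run = {"q1": {"d1": 0.9, "d3": 0.8, "d2": 0.1}}    # query -> doc -> score

print(ir_measures.calc_aggregate([nDCG@3, P@2], qrels, run))
```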

A Non-Factoid Question-Answering Taxonomy

  • Valeriia Bolotova, Vladislav Blinov, Falk Scholer, W. Bruce Croft, M. Sanderson

  • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

  • July 6, 2022

Non-factoid question answering (NFQA) is a challenging and under-researched task that requires constructing long-form answers, such as explanations or opinions, to open-ended non-factoid questions (NFQs). There is still little understanding of the categories of NFQs that people tend to ask, what form of answers they expect to see in return, and what the key research challenges of each category are. This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers. The taxonomy was constructed with a transparent methodology and extensively evaluated via crowdsourcing. The most challenging categories were identified through an editorial user study. We also release a dataset of categorised NFQs and a question category classifier. Finally, we conduct a quantitative analysis of the distribution of question categories using major NFQA datasets, showing that the NFQ categories that are the most challenging for current NFQA systems are poorly represented in these datasets. This imbalance may lead to insufficient system performance for challenging categories. The new taxonomy, along with the category classifier, will aid research in the area, helping to create more balanced benchmarks and to focus models on addressing specific categories.

TLDR

This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers, constructed with a transparent methodology and extensively evaluated via crowdsourcing.

WWW

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

  • C. Hays, Zachary Schutzman, Manish Raghavan, Erin Walk, Philipp Zimmer

  • Proceedings of the ACM Web Conference 2023

  • January 17, 2023

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset’s collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.

TLDR

It is shown that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets.
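
A self-contained illustration of the point on synthetic data (not the paper's benchmarks): when labels track a simple collection artifact, a depth-2 tree on two features looks near-perfect.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
followers = rng.lognormal(mean=6, sigma=2, size=n)
tweets_per_day = rng.exponential(scale=5, size=n)
# Labels driven by a simple threshold artifact, mimicking collection bias.
is_bot = (tweets_per_day > 12).astype(int)
X = np.column_stack([followers, tweets_per_day])

X_tr, X_te, y_tr, y_te = train_test_split(X, is_bot, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.3f}")   # near 1.0 by construction
```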

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

  • Francesco Fabbri, Yanhao Wang, F. Bonchi, C. Castillo, M. Mathioudakis

  • Proceedings of the ACM Web Conference 2022

  • February 1, 2022

Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a “radicalization pathway”. In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the “segregation” score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated to larger chances to get users trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to “rewire”, so to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the absorbing random walk theory. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal.

TLDR

This paper models the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions, and proposes an efficient yet effective greedy algorithm based on the absorbing random walk theory.
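
A small sketch of the segregation score via absorbing random walks: with non-radicalized nodes made absorbing, the expected steps to absorption from each radicalized node solve (I - Q)t = 1, where Q restricts the transition matrix to the radicalized nodes. The toy graph below is illustrative.

```python
import numpy as np

# 4 content nodes; 0 and 1 radicalized, 2 and 3 not. Row-stochastic transitions.
P = np.array([
    [0.0, 0.8, 0.2, 0.0],   # node 0 mostly recommends node 1
    [0.7, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
radical = [0, 1]
Q = P[np.ix_(radical, radical)]
t = np.linalg.solve(np.eye(len(radical)) - Q, np.ones(len(radical)))
print(dict(zip(radical, t.round(2))))   # expected steps to non-radicalized content
```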

Theory

SODA

Dynamic Algorithms for Maximum Matching Size

  • Soheil Behnezhad

  • ACM-SIAM Symposium on Discrete Algorithms

  • July 15, 2022

We study fully dynamic algorithms for maximum matching. This is a well-studied problem, known to admit several update-time/approximation trade-offs. For instance, it is known how to maintain a 1/2-approximate matching in $\log^{O(1)} n$ update time or a $2/3$-approximate matching in $O(\sqrt{n})$ update time, where $n$ is the number of vertices. It has been a long-standing open problem to determine whether either of these bounds can be improved. In this paper, we show that when the goal is to maintain just the size of the matching (and not its edge-set), then these bounds can indeed be improved. First, we give an algorithm that takes $\log^{O(1)} n$ update-time and maintains a $.501$-approximation ($.585$-approximation if the graph is bipartite). Second, we give an algorithm that maintains a $(2/3 + \Omega(1))$-approximation in $O(\sqrt{n})$ time for bipartite graphs. Our results build on new connections to sublinear time algorithms. In particular, a key tool for both is an algorithm of the author for estimating the size of maximal matchings in $\widetilde{O}(n)$ time [Behnezhad; FOCS 2021]. Our second result also builds on the edge-degree constrained subgraph (EDCS) of Bernstein and Stein [ICALP'15, SODA'16]. In particular, while it has been known that EDCS may not include a better than 2/3-approximation, we give a new characterization of such tight instances which allows us to break it. We believe this characterization might be of independent interest.

TLDR

This paper gives an algorithm that maintains a $.501$-approximation of the maximum matching size in $\log^{O(1)} n$ update time, an algorithm that maintains a $(2/3 + \Omega(1))$-approximation in $O(\sqrt{n})$ update time for bipartite graphs, and a new characterization of tight EDCS instances which allows breaking the $2/3$ barrier.

New Diameter-Reducing Shortcuts and Directed Hopsets: Breaking the $\sqrt{n}$ Barrier

  • Shimon Kogan, Merav Parter

  • ACM-SIAM Symposium on Discrete Algorithms

  • November 25, 2021

For an n-vertex digraph G = (V, E), a shortcut set is a (small) subset of edges H taken from the transitive closure of G that, when added to G, guarantees that the diameter of G ∪ H is small. Shortcut sets, introduced by Thorup in 1993, have a wide range of applications in algorithm design, especially in the context of parallel, distributed and dynamic computation on directed graphs. A folklore result in this context shows that every n-vertex digraph admits a shortcut set of linear size (i.e., of O(n) edges) that reduces the diameter to $O(\sqrt{n})$. Despite extensive research over the years, the question of whether one can reduce the diameter to $o(\sqrt{n})$ with $\widetilde{O}(n)$ shortcut edges has been left open. We provide the first improved diameter-sparsity tradeoff for this problem, breaking the $\sqrt{n}$ diameter barrier. Specifically, we show an $O(n^{\omega})$-time randomized algorithm for computing a linear shortcut set that reduces the diameter of the digraph to $\widetilde{O}(n^{1/3})$. This narrows the gap w.r.t. the current diameter lower bound of $\Omega(n^{1/6})$ by [Huang and Pettie, SWAT'18]. Moreover, we show that a diameter of $O(n^{1/2})$ can in fact be achieved with a sublinear number of $O(n^{3/4})$ shortcut edges. Formally, letting $S(n, D)$ be the bound on the size of the shortcut set required in order to reduce the diameter of any n-vertex digraph to at most $D$, our algorithms yield $S(n, D) = \widetilde{O}(n^2/D^3)$ for $D \le n^{1/3}$, and $S(n, D) = \widetilde{O}((n/D)^{3/2})$ for $D > n^{1/3}$. We also extend our algorithms to provide improved $(\beta, \epsilon)$ hopsets for n-vertex weighted directed graphs.

TLDR

It is shown that a diameter of $O(n^{1/2})$ can in fact be achieved with a sublinear number of $O(n^{3/4})$ shortcut edges, and the first improved diameter-sparsity tradeoff is provided, breaking the $\sqrt{n}$ diameter barrier.

STOC

Doubly Efficient Private Information Retrieval and Fully Homomorphic RAM Computation from Ring LWE

  • Wei-Kai Lin, Ethan Mook, Daniel Wichs

  • Proceedings of the 55th Annual ACM Symposium on Theory of Computing

  • June 2, 2023

A (single server) private information retrieval (PIR) allows a client to read data from a public database held on a remote server, without revealing to the server which locations she is reading. In a doubly efficient PIR (DEPIR), the database is first preprocessed, but the server can subsequently answer any client’s query in time that is sub-linear in the database size. Prior work gave a plausible candidate for a public-key variant of DEPIR, where a trusted party is needed to securely preprocess the database and generate a corresponding public key for the clients; security relied on a new non-standard code-based assumption and a heuristic use of ideal obfuscation. In this work we construct the stronger unkeyed notion of DEPIR, where the preprocessing is a deterministic procedure that the server can execute on its own. Moreover, we prove security under just the standard ring learning-with-errors (RingLWE) assumption. For a database of size N and any constant $\epsilon > 0$, the preprocessing run-time and size is $O(N^{1+\epsilon})$, while the run-time and communication-complexity of each PIR query is polylog(N). We also show how to update the preprocessed database in time $O(N^{\epsilon})$. Our approach is to first construct a standard PIR where the server’s computation consists of evaluating a multivariate polynomial; we then convert it to a DEPIR by preprocessing the polynomial to allow for fast evaluation, using the techniques of Kedlaya and Umans (STOC ’08). Building on top of our DEPIR, we construct general fully homomorphic encryption for random-access machines (RAM-FHE), which allows a server to homomorphically evaluate an arbitrary RAM program P over a client’s encrypted input x and the server’s preprocessed plaintext input y to derive an encryption of the output P(x,y) in time that scales with the RAM run-time of the computation rather than its circuit size. Prior work only gave a heuristic candidate construction of a restricted notion of RAM-FHE. In this work, we construct RAM-FHE under the RingLWE assumption with circular security. For a RAM program P with worst-case run-time T, the homomorphic evaluation runs in time $T^{1+\epsilon} \cdot (|x| + |y|)$.

TLDR

This work constructs the stronger unkeyed notion of DEPIR, where the preprocessing is a deterministic procedure that the server can execute on its own, and proves security under just the standard ring learning-with-errors (RingLWE) assumption.

The Randomized 𝑘-Server Conjecture Is False!

  • Sébastien Bubeck, Christian Coester, Y. Rabani

  • Proceedings of the 55th Annual ACM Symposium on Theory of Computing

  • November 10, 2022

We prove a few new lower bounds on the randomized competitive ratio for the k-server problem and other related problems, resolving some long-standing conjectures. In particular, for metrical task systems (MTS) we asymptotically settle the competitive ratio and obtain the first improvement to an existential lower bound since the introduction of the model 35 years ago (in 1987). More concretely, we show: (1) There exist $(k+1)$-point metric spaces in which the randomized competitive ratio for the k-server problem is $\Omega(\log^2 k)$. This refutes the folklore conjecture (which is known to hold in some families of metrics) that in all metric spaces with at least $k+1$ points, the competitive ratio is $\Theta(\log k)$. (2) Consequently, there exist n-point metric spaces in which the randomized competitive ratio for MTS is $\Omega(\log^2 n)$. This matches the upper bound that holds for all metrics. The previously best existential lower bound was $\Omega(\log n)$ (which was known to be tight for some families of metrics). (3) For all $k < n \in \mathbb{N}$, for all n-point metric spaces the randomized k-server competitive ratio is at least $\Omega(\log k)$, and consequently the randomized MTS competitive ratio is at least $\Omega(\log n)$. These universal lower bounds are asymptotically tight. The previous bounds were $\Omega(\log k/\log\log k)$ and $\Omega(\log n/\log\log n)$, respectively. (4) The randomized competitive ratio for the w-set metrical service systems problem, and its equivalent width-w layered graph traversal problem, is $\Omega(w^2)$. This slightly improves the previous lower bound and matches the recently discovered upper bound. (5) Our results imply improved lower bounds for other problems like k-taxi, distributed paging, and metric allocation. These lower bounds share a common thread, and other than the third bound, also a common construction.

TLDR

For metrical task systems (MTS), the competitive ratio is asymptotically settled, and the first improvement to an existential lower bound since the introduction of the model 35 years ago is obtained.

Locally testable codes with constant rate, distance, and locality

  • Irit Dinur, Shai Evra, R. Livne, A. Lubotzky, S. Mozes

  • Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing

  • November 8, 2021

A locally testable code (LTC) is an error correcting code that has a property-tester. The tester reads q bits that are randomly chosen, and rejects words with probability proportional to their distance from the code. The parameter q is called the locality of the tester. LTCs were initially studied as important components of probabilistically checkable proofs (PCP), and since then the topic has evolved on its own. High rate LTCs could be useful in practice: before attempting to decode a received word, one can save time by first quickly testing if it is close to the code. An outstanding open question has been whether there exist “$c^3$-LTCs”, namely LTCs with constant rate, constant distance, and constant locality. In this work we construct such codes based on a new two-dimensional complex which we call a left-right Cayley complex. This is essentially a graph which, in addition to vertices and edges, also has squares. Our codes can be viewed as a two-dimensional version of (the one-dimensional) expander codes, where the codewords are functions on the squares rather than on the edges.

TLDR

This work constructs LTCs with constant rate, constant distance, and constant locality based on a new two-dimensional complex which they call a left-right Cayley complex, which is essentially a graph which, in addition to vertices and edges, also has squares.

