Scholar's Hub

Top Saved Papers

Welcome to a curated collection of the papers most often saved to their libraries by Semantic Scholar's global community of researchers. Explore this list of papers that scholars find essential and valuable in their academic work.

If you have any feedback or suggestions, please contact us.

Last updated: December 21st, 2023


Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data

  • Linton et al.

  • medRxiv

  • January 28, 2020

The geographic spread of 2019 novel coronavirus (COVID-19) infections from the epicenter of Wuhan, China, has provided an opportunity to study the natural history of the recently emerged virus. Using publicly available event-date data from the ongoing epidemic, the present study investigated the incubation period and other time intervals that govern the epidemiological dynamics of COVID-19 infections. Our results show that the incubation period falls within the range of 2-14 days with 95% confidence and has a mean of around 5 days when approximated using the best-fit lognormal distribution. The mean time from illness onset to hospital admission (for treatment and/or isolation) was estimated at 3-4 days without truncation and at 5-9 days when right truncated. Based on the 95th percentile estimate of the incubation period, we recommend that the length of quarantine should be at least 14 days. The median time delay of 13 days from illness onset to death (17 days with right truncation) should be considered when estimating the COVID-19 case fatality risk.

TLDR

The incubation period falls within the range of 2-14 days with 95% confidence and has a mean of around 5 days when approximated using the best-fit lognormal distribution and it is recommended that the length of quarantine should be at least 14 days.
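
The statistical approach in the Linton et al. abstract can be illustrated with a short sketch: fit a lognormal distribution to observed incubation periods and read off the mean and the 95th percentile that informs the quarantine recommendation. This is a minimal illustration only, not the authors' code; the incubation data below are invented, and the right-truncation correction used in the paper is omitted.

```python
# Minimal sketch: lognormal fit to incubation-period data (illustrative only).
import numpy as np
from scipy import stats

# Hypothetical incubation periods in days (not the study's data).
incubation_days = np.array([3.2, 4.5, 5.1, 6.0, 2.8, 7.4, 5.9, 4.1, 8.3, 5.5])

# Fit a lognormal with the location fixed at 0, as is standard for waiting times.
shape, loc, scale = stats.lognorm.fit(incubation_days, floc=0)
fitted = stats.lognorm(shape, loc=loc, scale=scale)

print(f"mean incubation ~ {fitted.mean():.1f} days")
print(f"95th percentile ~ {fitted.ppf(0.95):.1f} days")  # informs quarantine length
```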

Deep Residual Learning for Image Recognition

  • He et al.

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • December 10, 2015

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers - 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

TLDR

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
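
For readers new to the idea, here is a minimal sketch of a residual block in the spirit of the paper: the layers learn a residual function F(x) that is added back to the block's input through an identity shortcut. The channel count and layer sizes below are illustrative, not the paper's exact ResNet configuration.

```python
# Minimal residual block sketch (illustrative, not the paper's exact design).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: output = F(x) + x

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```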

Language Models are Few-Shot Learners

  • Brown et al.

  • NeurIPS

  • May 28, 2020

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

TLDR

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
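
The few-shot setting described above boils down to prompt construction: the task and a handful of demonstrations are specified purely as text, and the model completes the prompt with no gradient updates. A minimal illustrative sketch follows; the task, examples, and the `complete` call are all hypothetical stand-ins, not an actual API.

```python
# Hypothetical few-shot prompt for an English-to-French word translation task.
demonstrations = [
    "cheese => fromage",
    "house => maison",
    "cat => chat",
]

def build_few_shot_prompt(query_word: str) -> str:
    header = "Translate English to French:\n"
    examples = "\n".join(demonstrations)
    return f"{header}{examples}\n{query_word} =>"

prompt = build_few_shot_prompt("dog")
print(prompt)
# A language model would then be asked to continue the prompt, e.g.
# completion = complete(prompt)  # `complete` is a placeholder, not a real API
```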

How do glucocorticoids influence stress responses? Integrating permissive, suppressive, stimulatory, and preparative actions.

  • Sapolsky et al.

  • Endocrine reviews

  • February 1, 2000

The secretion of glucocorticoids (GCs) is a classic endocrine response to stress. Despite that, it remains controversial as to what purpose GCs serve at such times. One view, stretching back to the time of Hans Selye, posits that GCs help mediate the ongoing or pending stress response, either via basal levels of GCs permitting other facets of the stress response to emerge efficaciously, and/or by stress levels of GCs actively stimulating the stress response. In contrast, a revisionist viewpoint posits that GCs suppress the stress response, preventing it from being pathologically overactivated. In this review, we consider recent findings regarding GC action and, based on them, generate criteria for determining whether a particular GC action permits, stimulates, or suppresses an ongoing stress-response or, as an additional category, is preparative for a subsequent stressor. We apply these GC actions to the realms of cardiovascular function, fluid volume and hemorrhage, immunity and inflammation, metabolism, neurobiology, and reproductive physiology. We find that GC actions fall into markedly different categories, depending on the physiological endpoint in question, with evidence for mediating effects in some cases, and suppressive or preparative in others. We then attempt to assimilate these heterogeneous GC actions into a physiological whole.

TLDR

This review considers recent findings regarding GC action and generates criteria for determining whether a particular GC action permits, stimulates, or suppresses an ongoing stress-response or, as an additional category, is preparative for a subsequent stressor.

Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2

  • Xu et al.

  • European Journal of Nuclear Medicine and Molecular Imaging

  • February 28, 2020

Background The pneumonia caused by the 2019 novel coronavirus (SARS-CoV-2, also called 2019-nCoV) recently broke out in Wuhan, China, and was named COVID-19. With the spread of the disease, similar cases have also been confirmed in other regions of China. We aimed to report the imaging and clinical characteristics of these patients infected with SARS-CoV-2 in Guangzhou, China. Methods All patients with laboratory-identified SARS-CoV-2 infection by real-time polymerase chain reaction (PCR) were collected between January 23, 2020, and February 4, 2020, in a designated hospital (Guangzhou Eighth People’s Hospital). This analysis included 90 patients (39 men and 51 women; median age, 50 years; age range, 18–86 years). All the included SARS-CoV-2-infected patients underwent non-contrast enhanced chest computed tomography (CT). We analyzed the clinical characteristics of the patients, as well as the distribution characteristics, pattern, morphology, and accompanying manifestations of lung lesions. In addition, after 1–6 days (mean 3.5 days), follow-up chest CT images were evaluated to assess radiological evolution. Findings The majority of infected patients had a history of exposure in Wuhan or to infected patients and mostly presented with fever and cough. More than half of the patients presented bilateral, multifocal lung lesions, with peripheral distribution, and 53 (59%) patients had more than two lobes involved. Of all included patients, COVID-19 pneumonia presented with ground glass opacities in 65 (72%), consolidation in 12 (13%), crazy paving pattern in 11 (12%), interlobular thickening in 33 (37%), adjacent pleura thickening in 50 (56%), and linear opacities combined in 55 (61%). Pleural effusion, pericardial effusion, and lymphadenopathy were uncommon findings. In addition, baseline chest CT did not show any abnormalities in 21 patients (23%), but 3 patients presented bilateral ground glass opacities on the second CT after 3–4 days. Conclusion SARS-CoV-2 infection can be confirmed based on the patient’s history, clinical manifestations, imaging characteristics, and laboratory tests. Chest CT examination plays an important role in the initial diagnosis of the novel coronavirus pneumonia. Multiple patchy ground glass opacities with peripheral distribution across multiple lobes of both lungs are typical chest CT imaging features of COVID-19 pneumonia.

TLDR

SARS-CoV-2 infection can be confirmed based on the patient’s history, clinical manifestations, imaging characteristics, and laboratory tests, including chest CT imaging features of the COVID-19 pneumonia.

Training language models to follow instructions with human feedback

  • Ouyang et al.

  • NeurIPS

  • March 4, 2022

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.

TLDR

The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, with InstructGPT models showing improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets.
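
One concrete piece of the pipeline described above is the ranking step: a reward model is trained so that outputs labelers preferred score higher than rejected ones. Below is a minimal sketch of such a pairwise preference loss; it is an assumed, commonly used formulation for illustration, not the authors' released code.

```python
# Pairwise preference loss sketch for a reward model (illustrative only).
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # reward_* are scalar reward-model scores for a batch of labeled comparisons.
    # The loss is low when the preferred output outscores the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative scores for a batch of four comparisons.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
rejected = torch.tensor([0.5, 0.1, 1.0, 0.7])
print(preference_loss(chosen, rejected))
```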

Learning Transferable Visual Models From Natural Language Supervision

  • Radford et al.

  • ICML

  • February 26, 2021

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.

TLDR

It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
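
The pre-training task described above, predicting which caption goes with which image, can be sketched as a symmetric contrastive loss over a batch of image and text embeddings. The encoders themselves are omitted here, and the embedding dimension and temperature are illustrative placeholders rather than the released CLIP configuration.

```python
# Symmetric image-text contrastive loss sketch (illustrative only).
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(image_emb.size(0))         # correct pairing lies on the diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)))
```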

Adam: A Method for Stochastic Optimization

  • Kingma et al.

  • ICLR

  • December 22, 2014

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

TLDR

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
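
The update rule summarized above is compact enough to sketch directly: exponentially decayed estimates of the gradient's first and second moments, bias-corrected, drive the parameter step. A minimal NumPy sketch follows, using the paper's suggested default hyperparameters; the toy objective is invented for illustration.

```python
# Minimal Adam update sketch with the paper's default hyperparameters.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative use: minimize f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    grad = 2 * theta                          # gradient of f(x) = x^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)                                  # ends close to the minimum at 0
```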

Language Models are Unsupervised Multitask Learners

  • Radford et al.

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset, matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

TLDR

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

A Simple Framework for Contrastive Learning of Visual Representations

  • Chen et al.

  • ICML

  • February 13, 2020

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.

TLDR

It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
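
The core of the framework described above is a contrastive (NT-Xent style) objective: two augmented views of the same image should embed close together relative to every other image in the batch. Here is a minimal sketch of that loss, assuming the augmentations, encoder, and projection head have already produced the two sets of embeddings; dimensions and temperature are illustrative.

```python
# NT-Xent style contrastive loss sketch over two views of a batch (illustrative only).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N images.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)   # (2N, D)
    sim = z @ z.t() / temperature                          # (2N, 2N) similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))                  # exclude self-similarity
    # The positive for index i is its other view: i + n (or i - n).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

print(nt_xent_loss(torch.randn(16, 128), torch.randn(16, 128)))
```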

LoRA: Low-Rank Adaptation of Large Language Models

  • Hu et al.

  • ICLR

  • June 17, 2021

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.

TLDR

Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
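
The mechanism described above can be sketched in a few lines: the pre-trained weight is frozen and a trainable low-rank update B·A is added to it, so only the small A and B matrices are learned for the downstream task. The dimensions, rank, and scaling below are illustrative rather than the released implementation.

```python
# Minimal LoRA-style linear layer sketch (illustrative, not the released package).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pre-trained weight W (random here purely for illustration).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # trainable
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # trainable, init 0
        self.scaling = alpha / r

    def forward(self, x):
        # h = x W^T + scaling * x (B A)^T; only A and B receive gradients.
        return x @ self.weight.t() + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling

layer = LoRALinear(768, 768)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```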

Internet of Things

  • Zhang et al.

  • Communications in Computer and Information Science

  • September 1, 2012

We are witnessing the dawn of a new era of Internet of Things (IoT; also known as Internet of Objects). Generally speaking, IoT refers to the networked interconnection of everyday objects, which are often equipped with ubiquitous intelligence. IoT will increase the ubiquity of the Internet by integrating every object for interaction via embedded systems, which leads to a highly distributed network of devices communicating with human beings as well as other devices. Thanks to rapid advances in underlying technologies, IoT is opening tremendous opportunities for a large number of novel applications that promise to improve the quality of our lives. In recent years, IoT has gained much attention from researchers and practitioners from around the world. This special issue is focused on the latest results in the area of IoT. In response to our call for papers, we have received a very large number of submissions, out of which eight papers are finally accepted as a result of a thorough review process by international experts in respective areas. The selection provides a fresh snapshot of the state-of-the-art research in the field. Radio frequency identification is an indispensable technology for IoT. In the paper ‘Code division multiple access/pulse position modulation ultra-wideband radio frequency identification for Internet of Things: concept and analysis’, Zhang et al. propose to utilize low-pulse-rate code division multiple-access/pulse position modulation ultra-wideband in the tag-to-reader link to provide multiple tag access capability and build a high-throughput radio frequency identification system for IoT. To optimize the network throughput, they design an effective Medium Access Control protocol as well as a dynamic frame size adjustment algorithm. Channel assignment can considerably affect network throughput. The paper ‘A dynamic channel assignment strategy based on cross-layer design for wireless mesh networks’ deals with this issue, with focus on allocating channels according to the status of adjacent links, that is, dynamic channel assignment. The authors propose a routing-information-aware channel assignment algorithm based on a cross-layer design. The proposed method can dynamically allocate channels for wireless nodes when they need communications and release channels after data transmission. In this way, limited channel resources can be used efficiently by more wireless nodes. As a consequence, the communication throughput can be improved. Collisions and interferences among nodes pose a challenge for data aggregation in many applications. The paper ‘An energy efficient medium access control protocol for target tracking based on dynamic convey tree collaboration in wireless sensor networks’ addresses this issue. The authors refine slot allocation to nodes in a dynamic convey tree and design an energy-efficient Medium Access Control protocol called dynamic-time division multiple access. The dynamic-time division multiple-access protocol avoids collisions and interferences and allocates contiguous active slots to nodes as far as possible during data aggregation from leaf nodes to a root node. As a result, energy consumption in switching from sleep to active state can be reduced.

TLDR

This special issue is focused on the latest results in the area of IoT, with a focus on code division multiple access/pulse position modulation ultra-wideband radio frequency identification for Internet of Things and an effective Medium Access Control protocol.

Neural Machine Translation by Jointly Learning to Align and Translate

  • Bahdanau et al.

  • ICLR

  • September 1, 2014

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

TLDR

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
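
The (soft-)search described above amounts to computing, at every decoder step, a weight over the source positions and averaging the encoder states into a context vector. Below is a minimal sketch of an additive, Bahdanau-style attention module; the exact alignment network and recurrent decoder in the paper differ in their details.

```python
# Additive attention sketch: soft alignment over encoder states (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.W_dec = nn.Linear(hidden, hidden, bias=False)
        self.W_enc = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, decoder_state, encoder_states):
        # decoder_state: (B, H); encoder_states: (B, T, H)
        scores = self.v(torch.tanh(self.W_dec(decoder_state).unsqueeze(1) + self.W_enc(encoder_states)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)            # (B, T) soft alignment
        context = torch.bmm(weights.unsqueeze(1), encoder_states)  # (B, 1, H) weighted sum
        return context.squeeze(1), weights

attn = AdditiveAttention(hidden=32)
ctx, w = attn(torch.randn(2, 32), torch.randn(2, 7, 32))
print(ctx.shape, w.shape)  # torch.Size([2, 32]) torch.Size([2, 7])
```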

Improving Language Understanding by Generative Pre-Training

  • Radford et al.

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).

TLDR

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
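
The task-aware input transformations mentioned in the abstract convert structured tasks into plain text sequences so the same pre-trained model can be fine-tuned with minimal architectural changes. A minimal sketch of that idea for a multiple-choice task follows; the delimiter tokens and helper function are illustrative placeholders, not the paper's exact tokenization.

```python
# Sketch of task-aware input transformation for multiple choice (illustrative only).
START, DELIM, EXTRACT = "<s>", "$", "<e>"  # placeholder special tokens

def multiple_choice_inputs(context: str, question: str, answers: list[str]) -> list[str]:
    # One text sequence per candidate answer; the model scores each sequence.
    return [f"{START} {context} {question} {DELIM} {a} {EXTRACT}" for a in answers]

for seq in multiple_choice_inputs("The cat sat on the mat.", "Where did the cat sit?",
                                  ["on the mat", "on the roof"]):
    print(seq)
```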

Latest News & Updates

Case Study: Iterative Design for Skimming Support

How might we help researchers quickly assess the relevance of scientific literature? Take a closer look at Skimming, Semantic Reader’s latest AI feature, and the collaborative design process behind it.

Behind the Scenes of Semantic Scholar’s New Author Influence Design

We released a new version of Author Influence interface to help scholars better discover other scholars in their fields. Here's how we identified user insights and made those design choices.

Artificial-intelligence search engines wrangle academic literature

Nature had a chat with Dan Weld, Chief Scientist at Semantic Scholar, to discuss how search engines are helping scientists explore and innovate by making it easier to draw connections from a massive collection of scientific literature.
