Screening the stones of Venice: Mapping social perceptions of cultural significance through graph-based semi-supervised classification
Mapping the cultural significance of heritage properties in urban environments from the perspective of the public has become an increasingly relevant process, as highlighted by the 2011 UNESCO Recommendation on the Historic Urban Landscape (HUL). With the ubiquitous use of social media and rapid developments in machine and deep learning, it has become feasible to collect and process massive amounts of information produced by online communities about their perceptions of heritage as social constructs. Moreover, such information is usually inter-connected and embedded within specific socioeconomic and spatiotemporal contexts. This paper presents a methodological workflow that uses semi-supervised learning with graph neural networks (GNNs) to classify, summarize, and map cultural significance categories based on user-generated content on social media. Several GNN models were trained as an ensemble to incorporate the multi-modal (visual and textual) features and the contextual (temporal, spatial, and social) connections of social media data in an attributed multi-graph structure. The classification results of the different models were aligned and evaluated using prediction confidence and agreement. Furthermore, message-diffusion methods on graphs were proposed to aggregate the post labels onto their adjacent spatial nodes, which helps to map the cultural significance categories in their geographical contexts. The workflow is tested on data gathered from Venice as a case study, demonstrating the generation of social perception maps for this UNESCO World Heritage property. This research framework could also be applied in other cities worldwide, contributing to more socially inclusive heritage management processes. Furthermore, the proposed methodology holds the potential to diffuse any human-generated, location-based information onto spatial networks and temporal timelines, which could be beneficial for measuring the safety, vitality, and/or popularity of urban spaces.
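To make the label-diffusion step concrete, here is a minimal sketch (the bipartite post-to-place adjacency, the category sizes, and the degree-normalized averaging rule are illustrative assumptions, not the paper's exact formulation): each spatial node receives the average category distribution of the posts linked to it.

```python
import numpy as np

# Hypothetical example: diffuse per-post category labels onto adjacent
# spatial nodes by averaging, a simple stand-in for a graph
# message-diffusion step.

n_posts, n_spatial, n_categories = 5, 3, 4

# Soft category predictions for each social-media post (rows sum to 1).
rng = np.random.default_rng(0)
post_labels = rng.dirichlet(np.ones(n_categories), size=n_posts)

# Bipartite adjacency: adjacency[i, j] = 1 if post i is linked
# (e.g., geotagged) to spatial node j.
adjacency = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
])

# Aggregate: each spatial node's category profile is the degree-normalized
# sum of the labels of its adjacent posts.
degree = adjacency.sum(axis=0, keepdims=True)           # posts per spatial node
spatial_profile = (adjacency.T @ post_labels) / degree.T

print(spatial_profile)  # one category distribution per spatial node
```

Mapping the dominant category per node (the argmax of each row) onto the city's spatial network would then yield the kind of social perception map the abstract describes.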
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Pre-trained language models have attracted increasing attention in the
biomedical domain, inspired by their great success in the general natural
language domain. Of the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the former has been extensively studied in the biomedical domain, as exemplified by BioBERT and PubMedBERT. While they have achieved great success on a
variety of discriminative downstream biomedical tasks, the lack of generation
ability constrains their application scope. In this paper, we propose BioGPT, a
domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical NLP tasks and
demonstrate that our model outperforms previous models on most tasks.
In particular, we achieve F1 scores of 44.98%, 38.42%, and 40.76% on the BC5CDR, KD-DTI, and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, setting a new record. Our case study on text generation further demonstrates the advantage of BioGPT in generating fluent descriptions for biomedical terms. Code is available at https://github.com/microsoft/BioGPT. (Published in Briefings in Bioinformatics.)
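As a usage illustration (the prompt and decoding settings below are assumptions, not taken from the paper), the released checkpoint can be loaded through the Hugging Face Transformers classes for BioGPT:

```python
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

# Load the pre-trained BioGPT checkpoint released by the authors.
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")
model.eval()

# Generate a description for a biomedical term (illustrative prompt).
inputs = tokenizer("Metformin is", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=64,
        num_beams=5,
        early_stopping=True,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```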
Analyzing and Mitigating Interference in Neural Architecture Search
Weight sharing is a popular approach to reduce the cost of neural
architecture search (NAS) by reusing the weights of shared operators from
previously trained child models. However, the rank correlation between the
estimated accuracy and ground truth accuracy of those child models is low due
to the interference among different child models caused by weight sharing. In
this paper, we investigate the interference issue by sampling different child
models and calculating the gradient similarity of shared operators, and
observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of operators in which the two models differ; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) MAGIC-T: rather than randomly sampling child models for optimization, we use a gradual modification scheme that changes only one operator between adjacent optimization steps, minimizing the interference on the shared operators; 2) MAGIC-A: forcing the inputs and outputs of the shared operator across all child models to be similar to reduce the interference.
Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of the supernet, and that combining both methods achieves better results. Our discovered architecture outperforms RoBERTa by 1.1 and 0.6 points and ELECTRA by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark, respectively. Extensive results on BERT compression, reading comprehension, and ImageNet tasks demonstrate the effectiveness and generality of our proposed methods. (ICML 2022, Spotlight.)
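To make the interference measurement concrete, here is a minimal sketch (the toy supernet, the two child heads, and the random batch are illustrative assumptions, not the paper's setup): two child models that share one operator are back-propagated on the same batch, and the gradients received by the shared weights are compared.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy weight-sharing setup: a shared linear operator used by two child
# models that differ in the head applied after it.
shared = nn.Linear(16, 16)
child_heads = [
    nn.Linear(16, 4),
    nn.Sequential(nn.ReLU(), nn.Linear(16, 4)),
]

x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

grads = []
for head in child_heads:
    shared.zero_grad()                      # isolate this child's gradient
    loss = F.cross_entropy(head(shared(x)), y)
    loss.backward()
    grads.append(shared.weight.grad.flatten().clone())

# Cosine similarity of the gradients the shared operator receives from the
# two child models; low similarity indicates interference.
cos = F.cosine_similarity(grads[0], grads[1], dim=0)
print(f"gradient cosine similarity: {cos.item():.3f}")
```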
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Error correction techniques have been used to refine the output sentences
from automatic speech recognition (ASR) models and achieve a lower word error
rate (WER) than the original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which incurs high latency and cannot be deployed in online ASR services. A
straightforward solution to reduce latency, inspired by non-autoregressive
(NAR) neural machine translation, is to use an NAR sequence generation model
for ASR error correction, which, however, comes at the cost of significantly
increased ASR error rate. In this paper, observing distinctive error patterns
and correction operations (i.e., insertion, deletion, and substitution) in ASR,
we propose FastCorrect, a novel NAR error correction model based on edit
alignment. In training, FastCorrect aligns each source token from an ASR output
sentence to the target tokens from the corresponding ground-truth sentence
based on the edit distance between the source and target sentences, and
extracts the number of target tokens corresponding to each source token during
editing/correction, which is then used to train a length predictor and to
adjust the source tokens to match the length of the target sentence for
parallel generation. In inference, the token number predicted by the length
predictor is used to adjust the source tokens for target sequence generation.
Experiments on the public AISHELL-1 dataset and an internal industrial-scale
ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1)
it speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER
reduction) compared with the autoregressive correction model; and 2) it
outperforms the popular NAR models adopted in neural machine translation and text editing by a large margin. (NeurIPS 2021. Code: https://github.com/microsoft/NeuralSpeech.)
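The edit-alignment idea can be illustrated with a simplified sketch. FastCorrect's actual alignment follows the full edit-distance path with additional heuristics; the version below uses `difflib` opcodes as a stand-in to show how per-source-token target counts, the quantity the length predictor is trained to output, can be derived (the example tokens are invented):

```python
from difflib import SequenceMatcher

def target_counts(source, target):
    """For each source token, estimate how many target tokens it aligns to.

    Simplified stand-in for FastCorrect's edit-distance alignment.
    """
    counts = [0] * len(source)
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=source, b=target).get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                counts[i] = 1                  # token kept as-is
        elif tag == "replace":
            # Spread the target span over the source span.
            n_src, n_tgt = i2 - i1, j2 - j1
            for k, i in enumerate(range(i1, i2)):
                counts[i] = n_tgt // n_src + (1 if k < n_tgt % n_src else 0)
        elif tag == "delete":
            for i in range(i1, i2):
                counts[i] = 0                  # token should be removed
        elif tag == "insert":
            # Fold inserted target tokens into the preceding source token.
            counts[max(i1 - 1, 0)] += j2 - j1
    return counts

# ASR hypothesis vs. ground truth: "b" is a substitution, "d" an insertion.
src = ["a", "b", "c"]
tgt = ["a", "x", "c", "d"]
print(target_counts(src, tgt))  # [1, 1, 2]; counts sum to len(tgt)
```

At inference, duplicating or dropping each source token according to the predicted count yields a sequence of exactly the target length, which is what enables parallel (non-autoregressive) generation.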
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Generalist foundation models such as GPT-4 have displayed surprising
capabilities in a wide variety of domains and tasks. Yet, there is a prevalent
assumption that they cannot match specialist capabilities of fine-tuned models.
For example, most explorations to date on medical competency benchmarks have
leveraged domain-specific training, as exemplified by efforts on BioGPT and
Med-PaLM. We build on a prior study of GPT-4's capabilities on medical
challenge benchmarks in the absence of special training. Rather than using
simple prompting to highlight the model's out-of-the-box capabilities, we
perform a systematic exploration of prompt engineering. We find that prompting
innovation can unlock deeper specialist capabilities and show that GPT-4 easily
tops prior leading results for medical benchmarks. The prompting methods we
explore are general purpose, and make no specific use of domain expertise,
removing the need for expert-curated content. Our experimental design carefully
controls for overfitting during the prompt engineering process. We introduce
Medprompt, based on a composition of several prompting strategies. With
Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark
datasets in the MultiMedQA suite. The method outperforms leading specialist
models such as Med-PaLM 2 by a significant margin with an order of magnitude
fewer calls to the model. Steering GPT-4 with Medprompt achieves a 27% reduction in error rate on the MedQA dataset compared with the best prior results achieved with specialist models, and surpasses a score of 90% for the first time. Beyond medical problems, we show the power of Medprompt to generalize to
other domains and provide evidence for the broad applicability of the approach
via studies of the strategy on exams in electrical engineering, machine
learning, philosophy, accounting, law, nursing, and clinical psychology.
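The abstract does not enumerate Medprompt's component strategies; the paper combines dynamic few-shot example selection, chain-of-thought prompting, and choice-shuffling ensembling. As a rough illustration of the last component only, here is a sketch in which the hypothetical `ask_model` callable stands in for a chain-of-thought GPT-4 call:

```python
import random
from collections import Counter

def choice_shuffle_ensemble(question, choices, ask_model, n_runs=5, seed=0):
    """Majority vote over runs in which the answer choices are shuffled.

    `ask_model(question, choices)` returns the index of the chosen option.
    Shuffling the options across runs reduces position bias in the answers.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        order = list(range(len(choices)))
        rng.shuffle(order)
        shuffled = [choices[i] for i in order]
        picked = ask_model(question, shuffled)
        votes[order[picked]] += 1            # map back to the original index
    return votes.most_common(1)[0][0]

# Toy stand-in model that always recognizes the correct option text.
demo = lambda q, opts: opts.index("metformin")
print(choice_shuffle_ensemble(
    "First-line drug for type 2 diabetes?",
    ["insulin", "metformin", "aspirin", "statin"],
    demo,
))  # -> 1
```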