PadChest: A large chest x-ray image dataset with multi-label annotated reports
We present a large-scale, high-resolution, labeled chest x-ray dataset for the
automated exploration of medical images along with their associated reports.
This dataset includes more than 160,000 images obtained from 67,000 patients;
the images were interpreted and reported by radiologists at Hospital San Juan
(Spain) from 2009 to 2017, covering six different position views and
additional information on image acquisition and patient demographics. The reports
were labeled with 174 different radiographic findings, 19 differential
diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and
mapped onto standard Unified Medical Language System (UMLS) terminology. Of
these reports, 27% were manually annotated by trained physicians and the
remaining set was labeled using a supervised method based on a recurrent neural
network with attention mechanisms. The generated labels were then validated on
an independent test set, achieving a 0.93 Micro-F1 score. To the best of our
knowledge, this is one of the largest public chest x-ray databases suitable for
training supervised models concerning radiographs, and the first to contain
radiographic reports in Spanish. The PadChest dataset can be downloaded from
http://bimcv.cipf.es/bimcv-projects/padchest/
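As an illustration of the supervised labeling approach described above, here is a minimal sketch of an attention-based recurrent labeler for multi-label report classification. It is not the authors' implementation; the architecture, dimensions, and names (ReportLabeler, emb_dim, hid_dim) are illustrative assumptions.

```python
# Hypothetical sketch of an attention-based RNN report labeler: encode a
# tokenized report with a bidirectional GRU, pool hidden states with learned
# attention, and emit an independent probability per radiographic label.
import torch
import torch.nn as nn

class ReportLabeler(nn.Module):
    def __init__(self, vocab_size, n_labels, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hid_dim, 1)        # scalar score per step
        self.out = nn.Linear(2 * hid_dim, n_labels)  # one logit per finding

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        h, _ = self.gru(self.embed(tokens))      # (batch, seq_len, 2*hid)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)       # attention-pooled summary
        return torch.sigmoid(self.out(context))  # multi-label probabilities
```

Multi-label training would use a binary cross-entropy loss over the label vector; validation against held-out manual annotations can then be reported as Micro-F1.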
Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data
Audio Word2Vec offers vector representations of fixed dimensionality for
variable-length audio segments using a Sequence-to-sequence Autoencoder (SA).
These vector representations have been shown to describe the sequential
phonetic structures of the audio segments to a good degree, with real-world
applications such as query-by-example Spoken Term Detection (STD). This paper
examines the language transfer capability of Audio Word2Vec. We train an SA on
one language (the source language) and use it to extract vector
representations of audio segments from another language (the target language).
We found that the SA can still capture phonetic structure from the audio
segments of the target language if the source and target languages are
similar. In query-by-example STD, we obtain the vector representations from
the SA learned from a large amount of source language data, and found that
they surpass the representations from a naive encoder and from an SA directly
learned from a small amount of target language data. The result shows that it
is possible to learn an Audio Word2Vec model from high-resource languages and
use it on low-resource languages. This further expands the usability of Audio
Word2Vec.
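For readers unfamiliar with the SA architecture, the following is a minimal sketch of a sequence-to-sequence autoencoder over acoustic frames. The layer sizes, feature choice (MFCCs), and names (AudioSeq2SeqAE, embed) are assumptions for illustration, not the paper's exact configuration.

```python
# A minimal sketch of a sequence-to-sequence autoencoder: a GRU encoder
# compresses a variable-length MFCC sequence into one fixed vector, and a
# GRU decoder reconstructs the frame sequence from that vector.
import torch
import torch.nn as nn

class AudioSeq2SeqAE(nn.Module):
    def __init__(self, n_mfcc=39, emb_dim=128):
        super().__init__()
        self.encoder = nn.GRU(n_mfcc, emb_dim, batch_first=True)
        self.decoder = nn.GRU(n_mfcc, emb_dim, batch_first=True)
        self.project = nn.Linear(emb_dim, n_mfcc)

    def embed(self, frames):            # frames: (batch, time, n_mfcc)
        _, h = self.encoder(frames)
        return h[-1]                    # fixed-dim segment embedding

    def forward(self, frames):
        z = self.embed(frames)
        # Teacher-forced reconstruction conditioned on the embedding.
        out, _ = self.decoder(frames, z.unsqueeze(0))
        return self.project(out)
```

Language transfer then amounts to training this model with a reconstruction loss on source-language segments, calling .embed() on target-language segments, and matching queries to documents by cosine similarity for query-by-example STD.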
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
Previous work has established that a person's demographics and speech style
affect how well speech processing models perform for them. But where does this
bias come from? In this work, we present the Speech Embedding Association Test
(SpEAT), a method for detecting bias in one type of model used for many speech
tasks: pre-trained models. The SpEAT is inspired by word embedding association
tests in natural language processing, which quantify intrinsic bias in a
model's representations of different concepts, such as race or valence
(something's pleasantness or unpleasantness) and capture the extent to which a
model trained on large-scale socio-cultural data has learned human-like biases.
Using the SpEAT, we test for six types of bias in 16 English speech models
(including 4 models also trained on multilingual data), which come from the
wav2vec 2.0, HuBERT, WavLM, and Whisper model families. We find that 14 or more
models reveal positive valence (pleasantness) associations with abled people
over disabled people, with European-Americans over African-Americans, with
females over males, with U.S. accented speakers over non-U.S. accented
speakers, and with younger people over older people. Beyond establishing that
pre-trained speech models contain these biases, we also show that they can have
real world effects. We compare biases found in pre-trained models to biases in
downstream models adapted to the task of Speech Emotion Recognition (SER) and
find that in 66 of the 96 tests performed (69%), the group that is more
associated with positive valence as indicated by the SpEAT also tends to be
predicted as speaking with higher valence by the downstream model. Our work
provides evidence that, like text- and image-based models, pre-trained
speech-based models frequently learn human-like biases. Our work also shows
that bias found in pre-trained models can propagate to the downstream task of
SER.
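The SpEAT statistic follows the word embedding association tests that inspired it; a sketch of the standard WEAT-style effect size, applied to arrays of speech embeddings, might look like the following (the paper's exact procedure, e.g. how per-utterance embeddings are pooled, is not reproduced here).

```python
# A sketch of a WEAT-style association test adapted to speech embeddings;
# it only illustrates the published WEAT effect-size statistic.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def association(w, A, B):
    # Mean similarity of one embedding w to attribute sets A and B.
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

def effect_size(X, Y, A, B):
    # X, Y: target-group embeddings; A, B: valence attribute embeddings.
    x_assoc = np.array([association(x, A, B) for x in X])
    y_assoc = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([x_assoc, y_assoc])
    return (x_assoc.mean() - y_assoc.mean()) / pooled.std(ddof=1)
```

A positive effect size indicates that group X is more associated with attribute set A (e.g. positive valence) than group Y is, mirroring the human-like biases the abstract describes.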
Text Analytics: the convergence of Big Data and Artificial Intelligence
The analysis of the text content in emails, blogs, tweets, forums and other
forms of textual communication constitutes what we call text analytics. Text
analytics is applicable to most industries: it can help analyze millions of
emails, analyze customers' comments and questions in forums, and perform
sentiment analysis by measuring positive or negative perceptions of a
company, brand, or product. Text analytics has also been called text mining,
and is a subcategory of the Natural Language Processing (NLP) field, one of
the founding branches of Artificial Intelligence dating back to the 1950s,
when an interest in understanding text originally developed. Currently, text
analytics is often considered the next step in Big Data analysis. It has a
number of subdivisions: Information Extraction, Named Entity Recognition,
Semantic Web annotated domain representation, and many more. Several
techniques are currently in use, and some, such as Machine Learning for
semi-supervised enhancement of systems, have gained a lot of attention, but
they also present limitations that mean they are not always the only or the
best choice. We conclude with current and near-future applications of text
analytics.
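As a minimal illustration of the sentiment-analysis use case mentioned above, a toy lexicon-based scorer is sketched below; the lexicon and scoring rule are assumptions for illustration, not a system described in this abstract.

```python
# A toy lexicon-based sentiment scorer: count positive and negative cue
# words and return a normalized score in [-1, 1].
POSITIVE = {"great", "love", "excellent", "good"}   # assumed lexicon
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment_score(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)  # 0 when neutral or no cues

print(sentiment_score("great product but terrible support"))  # 0.0
```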
Clinical narrative analytics challenges
Precision medicine, or evidence-based medicine, is based on the extraction of
knowledge from medical records to provide individuals with the appropriate
treatment at the appropriate moment according to the patient's features.
Despite the efforts to use clinical narratives for clinical decision support,
many challenges still have to be faced today, such as multilinguality,
diversity of terms and formats across services, acronyms, and negation, to
name but a few. The same problems arise when one wants to analyze narratives
in the literature, whose analysis would provide physicians and researchers
with useful highlights. In this talk we will analyze challenges, solutions
and open problems, and will examine several frameworks and tools that are
able to perform NLP over free text to extract medical entities by means of a
Named Entity Recognition process. We will also analyze a framework we have
developed to extract and validate medical terms. In particular, we present
two use cases: (i) medical entity extraction from a set of infectious disease
description texts provided by MedlinePlus, and (ii) identification of stroke
scales in clinical narratives written in Spanish.
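As a toy illustration of medical entity extraction with negation handling (two of the challenges named above), consider the following dictionary-based sketch; the lexicon, cue list, and context window are assumptions, and real pipelines rely on far richer resources such as UMLS-backed NER.

```python
# Dictionary-based medical entity recognition with simple negation handling:
# match known terms and mark them negated if a cue word appears just before.
import re

MEDICAL_TERMS = {"pneumonia", "stroke", "fever", "aphasia"}  # assumed lexicon
NEGATION_CUES = {"no", "without", "denies"}

def extract_entities(text):
    tokens = re.findall(r"[a-záéíóúñ]+", text.lower())
    entities = []
    for i, tok in enumerate(tokens):
        if tok in MEDICAL_TERMS:
            # Look back up to three tokens for a negation cue.
            negated = any(t in NEGATION_CUES for t in tokens[max(0, i - 3):i])
            entities.append((tok, "negated" if negated else "affirmed"))
    return entities

print(extract_entities("Patient denies fever; signs of stroke present."))
# [('fever', 'negated'), ('stroke', 'affirmed')]
```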
A Neurocomputational Model of Grounded Language Comprehension and Production at the Sentence Level
While symbolic and statistical approaches to natural language processing have become undeniably impressive in recent years, such systems still display a tendency to make errors that are inscrutable to human onlookers. This disconnect with human processing may stem from the vast differences in the substrates that underlie natural language processing in artificial systems versus biological systems.
To create a more relatable system, this dissertation turns to the more biologically inspired substrate of neural networks, describing the design and implementation of a model that learns to comprehend and produce language at the sentence level. The model's task is to ground simulated speech streams, representing a simple subset of English, in terms of a virtual environment. The model learns to understand and answer full-sentence questions about the environment by mimicking the speech stream of another speaker, much as a human language learner would. It is the only known neural model to date that can learn to map natural language questions to full-sentence natural language answers, where both question and answer are represented sublexically as phoneme sequences.
The model addresses important points for which most other models, neural and otherwise, fail to account. First, the model learns to ground its linguistic knowledge using human-like sensory representations, gaining language understanding at a deeper level than that of syntactic structure. Second, analysis provides evidence that the model learns combinatorial internal representations, thus gaining the compositionality of symbolic approaches to cognition, which is vital for computationally efficient encoding and decoding of meaning. The model does this while retaining the fully distributed representations characteristic of neural networks, providing the resistance to damage and graceful degradation that are generally lacking in symbolic and statistical approaches. Finally, the model learns via direct imitation of another speaker, allowing it to emulate human processing with greater fidelity, thus increasing the relatability of its behavior.
Along the way, this dissertation develops a novel training algorithm that, for the first time, requires only local computations to train arbitrary second-order recurrent neural networks. This algorithm is evaluated on its overall efficacy, biological feasibility, and ability to reproduce peculiarities of human learning such as age-correlated effects in second language acquisition.
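For concreteness, a second-order recurrent network computes its next hidden state from multiplicative hidden-input pairs through a third-order weight tensor; a minimal sketch (shapes and initialization are illustrative assumptions, not the dissertation's model) follows.

```python
# One step of a second-order recurrent cell:
# h_new[i] = tanh(sum_jk W[i, j, k] * h[j] * x[k] + b[i])
import numpy as np

def second_order_step(h, x, W, b):
    # h: (H,) hidden state, x: (I,) input, W: (H, H, I), b: (H,)
    return np.tanh(np.einsum("ijk,j,k->i", W, h, x) + b)

rng = np.random.default_rng(0)
H, I = 8, 4
W = rng.normal(scale=0.1, size=(H, H, I))
b = np.zeros(H)
h = np.zeros(H)
for x in rng.normal(size=(5, I)):   # run a short input sequence
    h = second_order_step(h, x, W, b)
```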