4,903 research outputs found
Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models
Background: In this paper we present the approaches and methods employed in
order to deal with a large scale multi-label semantic indexing task of
biomedical papers. This work was mainly implemented within the context of the
BioASQ challenge of 2014. Methods: The main contribution of this work is a
multi-label ensemble method that incorporates a McNemar statistical
significance test in order to validate the combination of the constituent
machine learning algorithms. Some secondary contributions include a study on
the temporal aspects of the BioASQ corpus (observations apply also to the
BioASQ's super-set, the PubMed articles collection) and the proper adaptation
of the algorithms used to deal with this challenging classification task.
Results: The ensemble method we developed is compared to other approaches in
experimental scenarios with subsets of the BioASQ corpus giving positive
results. During the BioASQ 2014 challenge we obtained the first place during
the first batch and the third in the two following batches. Our success in the
BioASQ challenge proved that a fully automated machine-learning approach, which
does not implement any heuristics and rule-based approaches, can be highly
competitive and outperform other approaches in similar challenging contexts
ICLabel: An automated electroencephalographic independent component classifier, dataset, and website
The electroencephalogram (EEG) provides a non-invasive, minimally
restrictive, and relatively low cost measure of mesoscale brain dynamics with
high temporal resolution. Although signals recorded in parallel by multiple,
near-adjacent EEG scalp electrode channels are highly-correlated and combine
signals from many different sources, biological and non-biological, independent
component analysis (ICA) has been shown to isolate the various source generator
processes underlying those recordings. Independent components (IC) found by ICA
decomposition can be manually inspected, selected, and interpreted, but doing
so requires both time and practice as ICs have no particular order or intrinsic
interpretations and therefore require further study of their properties.
Alternatively, sufficiently-accurate automated IC classifiers can be used to
classify ICs into broad source categories, speeding the analysis of EEG studies
with many subjects and enabling the use of ICA decomposition in near-real-time
applications. While many such classifiers have been proposed recently, this
work presents the ICLabel project comprised of (1) an IC dataset containing
spatiotemporal measures for over 200,000 ICs from more than 6,000 EEG
recordings, (2) a website for collecting crowdsourced IC labels and educating
EEG researchers and practitioners about IC interpretation, and (3) the
automated ICLabel classifier. The classifier improves upon existing methods in
two ways: by improving the accuracy of the computed label estimates and by
enhancing its computational efficiency. The ICLabel classifier outperforms or
performs comparably to the previous best publicly available method for all
measured IC categories while computing those labels ten times faster than that
classifier as shown in a rigorous comparison against all other publicly
available EEG IC classifiers.Comment: Intended for NeuroImage. Updated from version one with minor
editorial and figure change
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
With the rise of social media, millions of people are routinely expressing
their moods, feelings, and daily struggles with mental health issues on social
media platforms like Twitter. Unlike traditional observational cohort studies
conducted through questionnaires and self-reported surveys, we explore the
reliable detection of clinical depression from tweets obtained unobtrusively.
Based on the analysis of tweets crawled from users with self-reported
depressive symptoms in their Twitter profiles, we demonstrate the potential for
detecting clinical depression symptoms which emulate the PHQ-9 questionnaire
clinicians use today. Our study uses a semi-supervised statistical model to
evaluate how the duration of these symptoms and their expression on Twitter (in
terms of word usage patterns and topical preferences) align with the medical
findings reported via the PHQ-9. Our proactive and automatic screening tool is
able to identify clinical depressive symptoms with an accuracy of 68% and
precision of 72%.Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM),
2017 IEEE/ACM International Conferenc
Joint & Progressive Learning from High-Dimensional Data for Multi-Label Classification
Despite the fact that nonlinear subspace learning techniques (e.g. manifold
learning) have successfully applied to data representation, there is still room
for improvement in explainability (explicit mapping), generalization
(out-of-samples), and cost-effectiveness (linearization). To this end, a novel
linearized subspace learning technique is developed in a joint and progressive
way, called \textbf{j}oint and \textbf{p}rogressive \textbf{l}earning
str\textbf{a}teg\textbf{y} (J-Play), with its application to multi-label
classification. The J-Play learns high-level and semantically meaningful
feature representation from high-dimensional data by 1) jointly performing
multiple subspace learning and classification to find a latent subspace where
samples are expected to be better classified; 2) progressively learning
multi-coupled projections to linearly approach the optimal mapping bridging the
original space with the most discriminative subspace; 3) locally embedding
manifold structure in each learnable latent subspace. Extensive experiments are
performed to demonstrate the superiority and effectiveness of the proposed
method in comparison with previous state-of-the-art methods.Comment: accepted in ECCV 201
- …