20,071 research outputs found
Integrating image caption information into biomedical document classification in support of biocuration.
Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all and only those documents related to their interest. As such, automated biomedical document classification attracts much interest. Such classification is critical in the curation of biological databases, because biocurators must scan through a vast number of articles to identify pertinent information within documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation.
We present a document classification scheme aiming to identify papers containing information relevant to a specific topic, among a large collection of articles, for supporting the biocuration classification task. Our framework is based on a meta-classification scheme we have introduced before; here we incorporate into it features gathered from figure captions, in addition to those obtained from titles and abstracts. We trained and tested our classifier over a large imbalanced dataset, originally curated by the Gene Expression Database (GXD). GXD collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012-2016, in which each document is represented by the text of its title, abstract and figure captions. Our classifier attains precision 0.698, recall 0.784, f-measure 0.738 and Matthews correlation coefficient 0.711, demonstrating that the proposed framework effectively addresses the high imbalance in the GXD classification task. Moreover, our classifier\u27s performance is significantly improved by utilizing information from image captions compared to using titles and abstracts alone; this observation clearly demonstrates that image captions provide substantial information for supporting biomedical document classification and curation.
Database URL
Signal2Image Modules in Deep Neural Networks for EEG Classification
Deep learning has revolutionized computer vision utilizing the increased
availability of big data and the power of parallel computational units such as
graphical processing units. The vast majority of deep learning research is
conducted using images as training data, however the biomedical domain is rich
in physiological signals that are used for diagnosis and prediction problems.
It is still an open research question how to best utilize signals to train deep
neural networks.
In this paper we define the term Signal2Image (S2Is) as trainable or
non-trainable prefix modules that convert signals, such as
Electroencephalography (EEG), to image-like representations making them
suitable for training image-based deep neural networks defined as `base
models'. We compare the accuracy and time performance of four S2Is (`signal as
image', spectrogram, one and two layer Convolutional Neural Networks (CNNs))
combined with a set of `base models' (LeNet, AlexNet, VGGnet, ResNet, DenseNet)
along with the depth-wise and 1D variations of the latter. We also provide
empirical evidence that the one layer CNN S2I performs better in eleven out of
fifteen tested models than non-trainable S2Is for classifying EEG signals and
we present visual comparisons of the outputs of the S2Is.Comment: 4 pages, 2 figures, 1 table, EMBC 201
Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) has been continuously evolving in
several areas like pattern recognition and information retrieval methods. It
factorizes a matrix into a product of 2 low-rank non-negative matrices that
will define parts-based, and linear representation of nonnegative data.
Recently, Graph regularized NMF (GrNMF) is proposed to find a compact
representation,which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure. In GNMF, an affinity graph is constructed
from the original data space to encode the geometrical information. In this
paper, we propose a novel idea which engages a Multiple Kernel Learning
approach into refining the graph structure that reflects the factorization of
the matrix and the new data space. The GrNMF is improved by utilizing the graph
refined by the kernel learning, and then a novel kernel learning method is
introduced under the GrNMF framework. Our approach shows encouraging results of
the proposed algorithm in comparison to the state-of-the-art clustering
algorithms like NMF, GrNMF, SVD etc.Comment: This paper has been withdrawn by the author due to the terrible
writin
Three-Dimensional GPU-Accelerated Active Contours for Automated Localization of Cells in Large Images
Cell segmentation in microscopy is a challenging problem, since cells are
often asymmetric and densely packed. This becomes particularly challenging for
extremely large images, since manual intervention and processing time can make
segmentation intractable. In this paper, we present an efficient and highly
parallel formulation for symmetric three-dimensional (3D) contour evolution
that extends previous work on fast two-dimensional active contours. We provide
a formulation for optimization on 3D images, as well as a strategy for
accelerating computation on consumer graphics hardware. The proposed software
takes advantage of Monte-Carlo sampling schemes in order to speed up
convergence and reduce thread divergence. Experimental results show that this
method provides superior performance for large 2D and 3D cell segmentation
tasks when compared to existing methods on large 3D brain images
Utilizing image and caption information for biomedical document classification.
MOTIVATION: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature-a labor intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual contents, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results.
RESULTS: We present a new document classification scheme, using both image and caption information, in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, namely Figure-word, based on class labels of subfigures. We use word embeddings for representing captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results showcase the value of Figure-words, captions and titles-and-abstracts in providing complementary information for document classification; these three sources of information when combined, lead to an overall improved classification performance.
AVAILABILITY AND IMPLEMENTATION: Source code and the list of PMIDs of the publications in our datasets are available upon request
Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks
Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial
for making treatment decisions, but can be challenging even for experienced
radiologists. The diagnostic procedure is based on the detection and
recognition of the different ILD pathologies in thoracic CT scans, yet their
manifestation often appears similar. In this study, we propose the use of a
deep purely convolutional neural network for the semantic segmentation of ILD
patterns, as the basic component of a computer aided diagnosis (CAD) system for
ILDs. The proposed CNN, which consists of convolutional layers with dilated
filters, takes as input a lung CT image of arbitrary size and outputs the
corresponding label map. We trained and tested the network on a dataset of 172
sparsely annotated CT scans, within a cross-validation scheme. The training was
performed in an end-to-end and semi-supervised fashion, utilizing both labeled
and non-labeled image regions. The experimental results show significant
performance improvement with respect to the state of the art
- …