25,754 research outputs found
Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models
Background: In this paper we present the approaches and methods employed in
order to deal with a large scale multi-label semantic indexing task of
biomedical papers. This work was mainly implemented within the context of the
BioASQ challenge of 2014. Methods: The main contribution of this work is a
multi-label ensemble method that incorporates a McNemar statistical
significance test in order to validate the combination of the constituent
machine learning algorithms. Some secondary contributions include a study on
the temporal aspects of the BioASQ corpus (observations apply also to the
BioASQ's super-set, the PubMed articles collection) and the proper adaptation
of the algorithms used to deal with this challenging classification task.
Results: The ensemble method we developed is compared to other approaches in
experimental scenarios with subsets of the BioASQ corpus giving positive
results. During the BioASQ 2014 challenge we obtained the first place during
the first batch and the third in the two following batches. Our success in the
BioASQ challenge proved that a fully automated machine-learning approach, which
does not implement any heuristics and rule-based approaches, can be highly
competitive and outperform other approaches in similar challenging contexts
Recommended from our members
Improving Patch-Based Convolutional Neural Networks for MRI Brain Tumor Segmentation by Leveraging Location Information.
The manual brain tumor annotation process is time consuming and resource consuming, therefore, an automated and accurate brain tumor segmentation tool is greatly in demand. In this paper, we introduce a novel method to integrate location information with the state-of-the-art patch-based neural networks for brain tumor segmentation. This is motivated by the observation that lesions are not uniformly distributed across different brain parcellation regions and that a locality-sensitive segmentation is likely to obtain better segmentation accuracy. Toward this, we use an existing brain parcellation atlas in the Montreal Neurological Institute (MNI) space and map this atlas to the individual subject data. This mapped atlas in the subject data space is integrated with structural Magnetic Resonance (MR) imaging data, and patch-based neural networks, including 3D U-Net and DeepMedic, are trained to classify the different brain lesions. Multiple state-of-the-art neural networks are trained and integrated with XGBoost fusion in the proposed two-level ensemble method. The first level reduces the uncertainty of the same type of models with different seed initializations, and the second level leverages the advantages of different types of neural network models. The proposed location information fusion method improves the segmentation performance of state-of-the-art networks including 3D U-Net and DeepMedic. Our proposed ensemble also achieves better segmentation performance compared to the state-of-the-art networks in BraTS 2017 and rivals state-of-the-art networks in BraTS 2018. Detailed results are provided on the public multimodal brain tumor segmentation (BraTS) benchmarks
Multilabel Consensus Classification
In the era of big data, a large amount of noisy and incomplete data can be
collected from multiple sources for prediction tasks. Combining multiple models
or data sources helps to counteract the effects of low data quality and the
bias of any single model or data source, and thus can improve the robustness
and the performance of predictive models. Out of privacy, storage and bandwidth
considerations, in certain circumstances one has to combine the predictions
from multiple models or data sources to obtain the final predictions without
accessing the raw data. Consensus-based prediction combination algorithms are
effective for such situations. However, current research on prediction
combination focuses on the single label setting, where an instance can have one
and only one label. Nonetheless, data nowadays are usually multilabeled, such
that more than one label have to be predicted at the same time. Direct
applications of existing prediction combination methods to multilabel settings
can lead to degenerated performance. In this paper, we address the challenges
of combining predictions from multiple multilabel classifiers and propose two
novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and
MLCM-a (MLCM for microAUC). These algorithms can capture label correlations
that are common in multilabel classifications, and optimize corresponding
performance metrics. Experimental results on popular multilabel classification
tasks verify the theoretical analysis and effectiveness of the proposed
methods
Recommended from our members
Integrative machine learning approach for multi-class SCOP protein fold classification
Classification and prediction of protein structure has been a central research theme in structural bioinformatics. Due to the imbalanced distribution of proteins over multi SCOP classification, most discriminative machine learning suffers the well-known ‘False Positives ’ problem when learning over these types of problems. We have devised eKISS, an ensemble machine learning specifically designed to increase the coverage of positive examples when learning under multiclass imbalanced data sets. We have applied eKISS to classify 25 SCOP folds and show that our learning system improved over classical learning methods
A multilabel fuzzy relevance clustering system for malware attack attribution in the edge layer of cyber-physical networks
The rapid increase in the number of malicious programs has made malware forensics a daunting task and caused users’ systems to become in danger. Timely identification of malware characteristics including its origin and the malware sample family would significantly limit the potential damage of malware. This is a more profound risk in Cyber-Physical Systems (CPSs), where a malware attack may cause significant physical damage to the infrastructure. Due to limited on-device available memory and processing power in CPS devices, most of the efforts for protecting CPS networks are focused on the edge layer, where the majority of security mechanisms are deployed.
Since the majority of advanced and sophisticated malware programs are combining features from different families, these malicious programs are not similar enough to any existing malware family and easily evade binary classifier detection. Therefore, in this article, we propose a novel multilabel fuzzy clustering system for malware attack attribution. Our system is deployed on the edge layer to provide insight into applicable malware threats to the CPS network. We leverage static analysis by utilizing Opcode frequencies as the feature space to classify malware families.
We observed that a multilabel classifier does not classify a part of samples. We named this problem the instance coverage problem. To overcome this problem, we developed an ensemble-based multilabel fuzzy classification method to suggest the relevance of a malware instance to the stricken families. This classifier identified samples of VirusShare, RansomwareTracker, and BIG2015 with an accuracy of 94.66%, 94.26%, and 97.56%, respectively
- …