74,965 research outputs found

    Comparison of Fuzzy Integral-Fuzzy Measure based Ensemble Algorithms with the State-of-the-art Ensemble Algorithms

    Get PDF
    The Fuzzy Integral (FI) is a non-linear aggregation operator which enables the fusion of information from multiple sources in respect to a Fuzzy Measure (FM) which captures the worth of both the individual sources and all their possible combinations. Based on the expected potential of non-linear aggregation offered by the FI, its application to decision-level fusion in ensemble classifiers, i.e. to fuse multiple classifiers outputs towards one superior decision level output, has recently been explored. A key example of such a FI-FM ensemble classification method is the Decision-level Fuzzy Integral Multiple Kernel Learning (DeFIMKL) algorithm, which aggregates the outputs of kernel based classifiers through the use of the Choquet FI with respect to a FM learned through a regularised quadratic programming approach. While the approach has been validated against a number of classifiers based on multiple kernel learning, it has thus far not been compared to the state-of-the-art in ensemble classification. Thus, this paper puts forward a detailed comparison of FI-FM based ensemble methods, specifically the DeFIMKL algorithm, with state-of-the art ensemble methods including Adaboost, Bagging, Random Forest and Majority Voting over 20 public datasets from the UCI machine learning repository. The results on the selected datasets suggest that the FI based ensemble classifier performs both well and efficiently, indicating that it is a viable alternative when selecting ensemble classifiers and indicating that the non-linear fusion of decision level outputs offered by the FI provides expected potential and warrants further study

    Learning with Graphs using Kernels from Propagated Information

    Get PDF
    Traditional machine learning approaches are designed to learn from independent vector-valued data points. The assumption that instances are independent, however, is not always true. On the contrary, there are numerous domains where data points are cross-linked, for example social networks, where persons are linked by friendship relations. These relations among data points make traditional machine learning diffcult and often insuffcient. Furthermore, data points themselves can have complex structure, for example molecules or proteins constructed from various bindings of different atoms. Networked and structured data are naturally represented by graphs, and for learning we aimto exploit their structure to improve upon non-graph-based methods. However, graphs encountered in real-world applications often come with rich additional information. This naturally implies many challenges for representation and learning: node information is likely to be incomplete leading to partially labeled graphs, information can be aggregated from multiple sources and can therefore be uncertain, or additional information on nodes and edges can be derived from complex sensor measurements, thus being naturally continuous. Although learning with graphs is an active research area, learning with structured data, substantially modeling structural similarities of graphs, mostly assumes fully labeled graphs of reasonable sizes with discrete and certain node and edge information, and learning with networked data, naturally dealing with missing information and huge graphs, mostly assumes homophily and forgets about structural similarity. To close these gaps, we present a novel paradigm for learning with graphs, that exploits the intermediate results of iterative information propagation schemes on graphs. Originally developed for within-network relational and semi-supervised learning, these propagation schemes have two desirable properties: they capture structural information and they can naturally adapt to the aforementioned issues of real-world graph data. Additionally, information propagation can be efficiently realized by random walks leading to fast, flexible, and scalable feature and kernel computations. Further, by considering intermediate random walk distributions, we can model structural similarity for learning with structured and networked data. We develop several approaches based on this paradigm. In particular, we introduce propagation kernels for learning on the graph level and coinciding walk kernels and Markov logic sets for learning on the node level. Finally, we present two application domains where kernels from propagated information successfully tackle real-world problems

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Combining multiple resolutions into hierarchical representations for kernel-based image classification

    Get PDF
    Geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from the context information thanks to the coarser levels, and subregions spatial arrangement information thanks to the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures that can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can highly improve the classification accuracy w.r.t. conventional approaches working on a single scale.Comment: International Conference on Geographic Object-Based Image Analysis (GEOBIA 2016), University of Twente in Enschede, The Netherland

    Use of Machine Learning for Partial Discharge Discrimination

    No full text
    Partial discharge (PD) measurements are an important tool for assessing the condition of power equipment. Different sources of PD have different effects on the insulation performance of power apparatus. Therefore, discrimination between PD sources is of great interest to both system utilities and equipment manufacturers. This paper investigates the use of a wide bandwidth PD on-line measurement system to facilitate automatic PD source identification. Three artificial PD models were used to simulate typical PD sources which may exist within power systems. Wavelet analysis was applied to pre-process the obtained measurement data. This data was then processed using correlation analysis to cluster the discharges into different groups. A machine learning technique, namely the support vector machine (SVM) was then used to identify between the different PD sources. The SVM is trained to differentiate between the inherent features of each discharge source signal. Laboratory experiments indicate that this approach is applicable for use with field measurement data

    Robust Ensemble Classification Methodology for I123-Ioflupane SPECT Images and Multiple Heterogeneous Biomarkers in the Diagnosis of Parkinson’s Disease

    Get PDF
    In last years, several approaches to develop an effective Computer-Aided-Diagnosis (CAD) system for Parkinson’s Disease (PD) have been proposed. Most of these methods have focused almost exclusively on brain images through the use of Machine-Learning algorithms suitable to characterize structural or functional patterns. Those patterns provide enough information about the status and/or the progression at intermediate and advanced stages of Parkinson’s Disease. Nevertheless this information could be insufficient at early stages of the pathology. The Parkinson’s ProgressionMarkers Initiative (PPMI) database includes neurological images along with multiple biomedical tests. This information opens up the possibility of comparing different biomarker classification results. As data come from heterogeneous sources, it is expected that we could include some of these biomarkers in order to obtain new information about the pathology. Based on that idea, this work presents an Ensemble Classification model with Performance Weighting. This proposal has been tested comparing Healthy Control subjects (HC) vs. patients with PD (considering both PD and SWEDD labeled subjects as the same class). This model combines several Support-Vector-Machine (SVM) with linear kernel classifiers for different biomedical group of tests—including CerebroSpinal Fluid (CSF), RNA, and Serum tests—and pre-processed neuroimages features (Voxels-As-Features and a list of definedMorphological Features) fromPPMI database subjects. The proposed methodology makes use of all data sources and selects the most discriminant features (mainly from neuroimages). Using this performance-weighted ensemble classification model, classification results up to 96% were obtained.This work was supported by the MINECO/FEDER under the TEC2015-64718-R project and the Ministry of Economy, Innovation, Science and Employment of the Junta de Andalucía under the Excellence Project P11-TIC-7103

    A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

    Full text link
    Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall
    • …
    corecore