855 research outputs found

    Feature and Decision Level Fusion Using Multiple Kernel Learning and Fuzzy Integrals

    The work collected in this dissertation addresses the problem of data fusion. In other words, this is the problem of making decisions (also known as the problem of classification in the machine learning and statistics communities) when data from multiple sources are available, or when decisions/confidence levels from a panel of decision-makers are accessible. This problem has become increasingly important in recent years, especially with the ever-increasing popularity of autonomous systems outfitted with suites of sensors and the dawn of the "age of big data." While data fusion is a very broad topic, the work in this dissertation considers two very specific techniques: feature-level fusion and decision-level fusion. In general, the fusion methods proposed throughout this dissertation rely on kernel methods and fuzzy integrals. Both are very powerful tools; however, they also come with challenges, some of which are summarized below. I address these challenges in this dissertation. Kernel methods for classification are a well-studied area in which data are implicitly mapped from a lower-dimensional space to a higher-dimensional space to improve classification accuracy. However, for most kernel methods, one must still choose a kernel to use for the problem. Since there is, in general, no way of knowing which kernel is the best, multiple kernel learning (MKL) is a technique used to learn the aggregation of a set of valid kernels into a single (ideally) superior kernel. The aggregation can be done using weighted sums of the pre-computed kernels, but determining the summation weights is not a trivial task. Furthermore, MKL does not work well with large datasets because of limited storage space and prediction speed. These challenges are tackled by the introduction of many new algorithms in the following chapters. I also address MKL's storage and speed drawbacks, allowing MKL-based techniques to be applied to big data efficiently.
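
The weighted-sum aggregation of pre-computed kernels described above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the helper names (`linear_kernel`, `rbf_kernel`, `gram`, `combine_kernels`), the toy data, and the fixed weights are all assumptions made for illustration, and an actual MKL method would learn the weights from training data.

```python
import math

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram(kernel, data):
    """Pre-compute the Gram matrix of one kernel over the data."""
    return [[kernel(x, y) for y in data] for x in data]

def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices.

    Nonnegative weights keep the sum a valid (positive semidefinite) kernel.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]

# Toy data and hypothetical weights (an MKL algorithm would learn these).
data = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
grams = [gram(linear_kernel, data), gram(rbf_kernel, data)]
weights = [0.3, 0.7]
K = combine_kernels(grams, weights)
```

Note that every base kernel needs its own n-by-n Gram matrix, which is one way to see the storage cost on large datasets mentioned above.
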
Some algorithms in this work are based on the Choquet fuzzy integral, a powerful nonlinear aggregation operator parameterized by the fuzzy measure (FM). These decision-level fusion algorithms learn a fuzzy measure by minimizing a sum of squared error (SSE) criterion based on a set of training data. The flexibility of the Choquet integral comes with a cost, however: given a set of N decision makers, the size of the FM the algorithm must learn is 2^N. This means that the training data must be diverse enough to include 2^N independent observations, though this is rarely encountered in practice. I address this in the following chapters via many different regularization functions, a popular technique in machine learning and statistics used to prevent overfitting and increase model generalization. Finally, it is worth noting that the aggregation behavior of the Choquet integral is not intuitive. I tackle this by proposing a quantitative visualization strategy allowing the FM and Choquet integral behavior to be shown simultaneously.
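
To make the role of the fuzzy measure concrete, here is a minimal sketch of the standard discrete Choquet integral. The function names and the three example measures are illustrative assumptions, not taken from the dissertation; the FM is stored as a dictionary keyed by subsets of source indices, so it holds one value per nonempty subset (2^N entries once the empty set, with measure 0, is counted).

```python
from itertools import combinations

def all_subsets(n):
    """All nonempty subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)]

def choquet_integral(h, g):
    """Discrete Choquet integral of inputs h with respect to fuzzy measure g.

    Inputs are visited in decreasing order; each is weighted by the gain in
    measure obtained when its source joins the set of sources seen so far.
    """
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    total, prev, seen = 0.0, 0.0, frozenset()
    for i in order:
        seen = seen | {i}
        total += h[i] * (g[seen] - prev)
        prev = g[seen]
    return total

n = 3
h = [0.9, 0.5, 0.2]  # hypothetical decision-maker confidences
full = frozenset(range(n))

# Three illustrative measures that recover familiar operators.
g_max = {a: 1.0 for a in all_subsets(n)}
g_min = {a: (1.0 if a == full else 0.0) for a in all_subsets(n)}
g_mean = {a: len(a) / n for a in all_subsets(n)}

fused_max = choquet_integral(h, g_max)
fused_min = choquet_integral(h, g_min)
fused_mean = choquet_integral(h, g_mean)
```

With these measures the integral reduces to the maximum, the minimum, and the arithmetic mean of the inputs, which illustrates how a single learned FM can select among many aggregation behaviors.
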

    Comparison of Fuzzy Integral-Fuzzy Measure based Ensemble Algorithms with the State-of-the-art Ensemble Algorithms

    The Fuzzy Integral (FI) is a non-linear aggregation operator which enables the fusion of information from multiple sources with respect to a Fuzzy Measure (FM) which captures the worth of both the individual sources and all their possible combinations. Based on the expected potential of non-linear aggregation offered by the FI, its application to decision-level fusion in ensemble classifiers, i.e. to fuse multiple classifiers' outputs towards one superior decision-level output, has recently been explored. A key example of such an FI-FM ensemble classification method is the Decision-level Fuzzy Integral Multiple Kernel Learning (DeFIMKL) algorithm, which aggregates the outputs of kernel-based classifiers through the use of the Choquet FI with respect to an FM learned through a regularised quadratic programming approach. While the approach has been validated against a number of classifiers based on multiple kernel learning, it has thus far not been compared to the state-of-the-art in ensemble classification. Thus, this paper puts forward a detailed comparison of FI-FM based ensemble methods, specifically the DeFIMKL algorithm, with state-of-the-art ensemble methods including Adaboost, Bagging, Random Forest and Majority Voting over 20 public datasets from the UCI machine learning repository. The results on the selected datasets suggest that the FI based ensemble classifier performs both well and efficiently, indicating that it is a viable alternative when selecting ensemble classifiers and that the non-linear fusion of decision-level outputs offered by the FI delivers the expected potential and warrants further study.

    EXPLAINABLE FEATURE- AND DECISION-LEVEL FUSION

    Information fusion is the process of aggregating knowledge from multiple data sources to produce more consistent, accurate, and useful information than any one individual source can provide. In general, there are three primary sources of data/information: humans, algorithms, and sensors. Typically, objective data (e.g., measurements) arise from sensors. Using these data sources, applications such as computer vision and remote sensing have long been applying fusion at different levels (signal, feature, decision, etc.). Furthermore, daily advances in engineering technologies like smart cars, which operate in complex and dynamic environments using multiple sensors, are raising both the demand for and complexity of fusion. There is a great need to discover new theories to combine and analyze heterogeneous data arising from one or more sources. The work collected in this dissertation addresses the problem of feature- and decision-level fusion. Specifically, this work focuses on fuzzy Choquet integral (ChI)-based data fusion methods. Most mathematical approaches for data fusion have focused on combining inputs relative to the assumption of independence between them. However, often there are rich interactions (e.g., correlations) between inputs that should be exploited. The ChI is a powerful aggregation tool that is capable of modeling these interactions. Consider the fusion of m sources, where there are 2^m unique subsets (interactions); the ChI is capable of learning the worth of each of these possible source subsets. However, the complexity of fuzzy integral-based methods grows quickly, as the number of trainable parameters for the fusion of m sources scales as 2^m. Hence, we require a large amount of training data to avoid the problem of over-fitting. This work addresses the over-fitting problem of ChI-based data fusion with novel regularization strategies.
These regularization strategies alleviate the issue of over-fitting while training with limited data and also enable the user to consciously push the learned methods to take a predefined, or perhaps known, structure. Also, the existing methods for training the ChI for decision- and feature-level data fusion involve quadratic programming (QP). The QP-based approach for learning ChI-based data fusion solutions has a high space complexity. This has limited the practical application of ChI-based data fusion methods to six or fewer input sources. To address the space complexity issue, this work introduces an online training algorithm for learning the ChI. The online method is an iterative gradient descent approach that processes one observation at a time, enabling the applicability of ChI-based data fusion on higher dimensional data sets. In many real-world data fusion applications, it is imperative to have an explanation or interpretation. This may include providing information on what was learned, what the worth of individual sources is, why a decision was reached, what evidence process(es) were used, and what confidence the system has in its decision. However, most existing machine learning solutions for data fusion are black boxes, e.g., deep learning. In this work, we designed methods and metrics that help with answering these questions of interpretation, and we also developed visualization methods that help users better understand the machine learning solution and its behavior for different instances of data.

    Efficient Data Driven Multi Source Fusion

    Data/information fusion is an integral component of many existing and emerging applications; e.g., remote sensing, smart cars, Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than what any one individual input can provide, often the challenge is to determine the underlying mathematics for aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature and decision level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has 2^N variables and N(2^(N−1)) constraints for N inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (which is linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables. As such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) of the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints; therefore there is no need to explicitly enforce the constraints as is required by traditional GAs. In addition, this algorithm provides an efficient representation of the search space with the minimal set of vertices.
Furthermore, I study different strategies for extending the fuzzy integral for missing data and I propose a GOAL programming framework to aggregate inputs from heterogeneous sources for the ChI learning. Last, my work in remote sensing involves visual clustering based band group selection and Lp-norm multiple kernel learning based feature level fusion in hyperspectral image processing to enhance pixel level classification
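
The growth in FM variables and constraints can be checked by brute-force enumeration. The sketch below is illustrative only (the function name `fm_counts` is hypothetical) and assumes one monotonicity constraint g(A) <= g(A ∪ {i}) for every subset A and every source i not in A, including pairs that start from the empty set; under that assumption the counts are 2^N variables and N·2^(N−1) constraints.

```python
from itertools import combinations

def fm_counts(n):
    """Count FM variables and monotonicity constraints for n inputs.

    Variables: one per subset of the n sources (empty set included).
    Constraints: one per pair (A, A ∪ {i}) with i not in A (an assumption
    about how constraints are counted, stated in the lead-in).
    """
    elements = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(elements, r)]
    constraints = sum(1 for a in subsets for i in elements if i not in a)
    return len(subsets), constraints

# Tabulate the growth for a few input counts.
for n in (2, 4, 6, 8, 10):
    variables, constraints = fm_counts(n)
    print(f"N={n:2d}  variables={variables:5d}  constraints={constraints:6d}")
```

Each additional input doubles the number of variables, which is why the abstract stresses optimizing only the variables that the training data actually support.
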

    Fuzzy Integral Driven Ensemble Classification using A Priori Fuzzy Measures

    Aggregation operators are mathematical functions that enable the fusion of information from multiple sources. Fuzzy Integrals (FIs) are widely used aggregation operators, which combine information with respect to a Fuzzy Measure (FM) which captures the worth of both the individual sources and all their possible combinations. However, FIs suffer from the potential drawback of not fusing information according to the intuitively interpretable FM, leading to non-intuitive results. The latter is particularly relevant when an FM has been defined using external information (e.g. experts). In order to address this and provide an alternative to the FI, the Recursive Average (RAV) aggregation operator was recently proposed, which enables intuitive data fusion with respect to a given FM. With an alternative fusion operator in place, in this paper, we define the concept of ‘A Priori’ FMs which are generated based on external information (e.g. classification accuracy) and thus provide an alternative to the traditional approaches of learning or manually specifying FMs. We proceed to develop one specific instance of such an a priori FM to support the decision-level fusion step in ensemble classification. We evaluate the resulting approach by contrasting the performance of the ensemble classifiers for different FMs, including the recently introduced Uriz and the Sugeno lambda-measure, as well as by employing both the Choquet FI and the RAV as possible fusion operators. Results are presented for 20 datasets from machine learning repositories and contextualised to the wider literature by comparing them to state-of-the-art ensemble classifiers such as Adaboost, Bagging, Random Forest and Majority Voting.
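
As background on one of the measures named above, the Sugeno lambda-measure builds a full FM from per-source densities g_i by solving prod(1 + λ·g_i) = 1 + λ for λ > −1 and then setting g(A) = (prod over i in A of (1 + λ·g_i) − 1) / λ. The sketch below is a simple bisection solver written for illustration, not the paper's implementation; the function names and the example densities (which could stand in for, say, classifier accuracies) are assumptions.

```python
def _residual(lam, densities):
    """prod(1 + lam * g_i) - (1 + lam); zero at the sought lambda."""
    prod = 1.0
    for d in densities:
        prod *= 1.0 + lam * d
    return prod - (1.0 + lam)

def solve_lambda(densities):
    """Find the nonzero root lambda > -1 by bisection (illustrative only)."""
    if len(densities) < 2 or any(not 0.0 < d < 1.0 for d in densities):
        raise ValueError("need at least two densities strictly between 0 and 1")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities already sum to one: the measure is additive
    if total < 1.0:
        lo, hi = 1e-6, 1.0  # root is positive; grow hi until the sign flips
        while _residual(hi, densities) < 0:
            hi *= 2.0
    else:
        lo, hi = -1.0 + 1e-9, -1e-6  # root lies in (-1, 0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _residual(lo, densities) * _residual(mid, densities) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lambda_measure(subset, densities, lam):
    """Measure of a subset of source indices under the lambda-measure."""
    if lam == 0.0:
        return sum(densities[i] for i in subset)
    prod = 1.0
    for i in subset:
        prod *= 1.0 + lam * densities[i]
    return (prod - 1.0) / lam

densities = [0.2, 0.3, 0.1]  # hypothetical per-classifier worths
lam = solve_lambda(densities)
g_pair = lambda_measure({0, 1}, densities, lam)
g_all = lambda_measure({0, 1, 2}, densities, lam)
```

Because the densities here sum to less than one, λ is positive and the pair {0, 1} is worth more than the sum of its members; the full set always receives measure 1.
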

    Insights and Characterization of l1-norm Based Sparsity Learning of a Lexicographically Encoded Capacity Vector for the Choquet Integral

    This thesis aims to simultaneously minimize function error and model complexity for data fusion via the Choquet integral (CI). The CI is a generator function, i.e., it is parametric and yields a wealth of aggregation operators based on the specifics of the underlying fuzzy measure. It is often the case that we desire to learn a fusion from data and the goal is to have the smallest possible sum of squared error between the trained model and a set of labels. However, we also desire to learn as “simple” a solution as possible. Herein, L1-norm regularization of a lexicographically encoded capacity vector relative to the CI is explored. The impact of regularization is explored in terms of what capacities and aggregation operators it induces under different common and extreme scenarios. Synthetic experiments are provided in order to illustrate the propositions and concepts put forth.
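
One plausible way to write the trade-off described above is as a regularized least-squares problem. The notation is an assumption made for illustration and may differ from the thesis: u is the capacity (fuzzy measure) vector in lexicographic order, C_u is the Choquet integral parameterized by u, (x_j, y_j) for j = 1, ..., M are training pairs, and λ ≥ 0 is the regularization weight.

```latex
\min_{\mathbf{u}} \;
  \sum_{j=1}^{M} \bigl( C_{\mathbf{u}}(\mathbf{x}_j) - y_j \bigr)^2
  \;+\; \lambda \, \lVert \mathbf{u} \rVert_1
\quad \text{subject to the boundary and monotonicity conditions on } \mathbf{u},
```

```latex
\mathbf{u} = \bigl[\, g(\{x_1\}),\; g(\{x_2\}),\; \dots,\; g(\{x_1, x_2\}),\; \dots,\; g(X) \,\bigr]^{\top}.
```

Setting λ = 0 recovers the plain sum-of-squared-error fit, while larger λ pushes more capacity entries toward zero.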

    Introducing Fuzzy Layers for Deep Learning

    Many state-of-the-art technologies developed in recent years have been influenced by machine learning to some extent. Most popular at the time of this writing are artificial intelligence methodologies that fall under the umbrella of deep learning. Deep learning has been shown across many applications to be extremely powerful and capable of handling problems that possess great complexity and difficulty. In this work, we introduce a new layer to deep learning: the fuzzy layer. Traditionally, the network architecture of neural networks is composed of an input layer, some combination of hidden layers, and an output layer. We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies, such as the Choquet and Sugeno fuzzy integrals. To date, fuzzy approaches taken to deep learning have been through the application of various fusion strategies at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, etc. While these strategies have been shown to improve accuracy for image classification tasks, none have explored the use of fuzzified intermediate, or hidden, layers. Herein, we present a new deep learning strategy that incorporates fuzzy strategies into the deep learning architecture focused on the application of semantic segmentation using per-pixel classification. Experiments are conducted on a benchmark data set as well as a data set collected via an unmanned aerial system at a U.S. Army test site for the task of automatic road segmentation, and preliminary results are promising. Comment: 6 pages, 4 figures, published in 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)