6 research outputs found

    Constraints preserving genetic algorithm for learning fuzzy measures with an application to ontology matching

    Abstract. Both the fuzzy measure and the fuzzy integral have been widely studied for multi-source information fusion. A number of researchers have proposed optimization techniques to learn a fuzzy measure from training data. In part, this task is difficult because the fuzzy measure can have a large number of free parameters (2^N − 2 for N sources) and it must satisfy many (monotonicity) constraints. In this paper, a new genetic algorithm approach to constraint-preserving optimization of the fuzzy measure is presented for the task of learning and fusing different ontology matching results. Preliminary results are presented to show the stability of the learning algorithm and its effectiveness compared to existing approaches.
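    The monotonicity constraints mentioned above are simple to state over the subset lattice. Below is a minimal Python sketch, not the paper's genetic algorithm, that checks whether a candidate measure (stored as a dict over frozensets, a hypothetical encoding) satisfies the boundary and monotonicity conditions; a constraint-preserving optimizer must keep every candidate inside this feasible set.

```python
import itertools

def is_monotone_measure(g, n, tol=1e-9):
    """Check the boundary and monotonicity constraints of a fuzzy measure.

    g maps each subset of {0, ..., n-1} (as a frozenset) to [0, 1], with
    g(empty set) = 0 and g(full set) = 1.  Monotonicity requires
    g(A) <= g(B) whenever A is a subset of B; checking every subset
    against its immediate supersets is sufficient.
    """
    if abs(g[frozenset()]) > tol or abs(g[frozenset(range(n))] - 1.0) > tol:
        return False
    for r in range(n):
        for A in itertools.combinations(range(n), r):
            A = frozenset(A)
            for x in set(range(n)) - A:
                if g[A] > g[A | {x}] + tol:
                    return False
    return True

# A valid two-source measure: 2^N - 2 = 2 free values (hypothetical numbers).
g = {frozenset(): 0.0, frozenset({0}): 0.3,
     frozenset({1}): 0.6, frozenset({0, 1}): 1.0}
print(is_monotone_measure(g, 2))  # True
```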

    The arithmetic recursive average as an instance of the recursive weighted power mean

    The aggregation of multiple information sources has a long history and ranges from sensor fusion to the aggregation of individual algorithm outputs and human knowledge. A popular approach to achieve such aggregation is the fuzzy integral (FI), which is defined with respect to a fuzzy measure (FM), i.e., a normal, monotone capacity. In practice, the discrete FI aggregates information contributed by a discrete number of sources through a weighted aggregation (post-sorting), where the weights are captured by a FM that models the typically subjective ‘worth’ of subsets of the overall set of sources. While the combination of FI and FM has been very successful, challenges remain both with regard to the behavior of the resulting aggregation operators—which, for example, do not produce symmetrically mirrored outputs for symmetrically mirrored inputs—and in a manifest difference between the intuitive interpretation of a stand-alone FM and its actual role and impact when used as part of information fusion with a FI. This paper elucidates these challenges and introduces a novel family of recursive average (RAV) operators as an alternative to the FI in aggregation with respect to a FM, focusing specifically on the arithmetic recursive average. The RAV is designed to address the above challenges, while also facilitating fine-grained analysis of the resulting aggregation of different combinations of sources. We provide the mathematical foundations of the RAV and include initial experiments and comparisons to the FI for both numeric and interval-valued data.
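    For concreteness, the following sketch shows the sort-then-weight step for one common FI, the discrete Choquet integral, using a hypothetical two-source measure; the RAV operators introduced in the paper are not reproduced here.

```python
import numpy as np

def choquet_integral(h, g):
    """Discrete Choquet integral of the inputs h (one value per source)
    with respect to a fuzzy measure g on subsets of source indices.

    Sources are visited from the largest input to the smallest; the
    result is sum_i h(x_(i)) * ( g(A_i) - g(A_(i-1)) ), where A_i holds
    the i sources with the largest inputs and A_0 is the empty set.
    """
    order = np.argsort(h)[::-1]
    total, prev, A = 0.0, 0.0, frozenset()
    for idx in order:
        A = A | {int(idx)}
        total += h[int(idx)] * (g[A] - prev)
        prev = g[A]
    return total

# Hypothetical two-source measure and inputs.
g = {frozenset(): 0.0, frozenset({0}): 0.3,
     frozenset({1}): 0.6, frozenset({0, 1}): 1.0}
print(choquet_integral(np.array([0.8, 0.4]), g))  # 0.8*0.3 + 0.4*(1.0 - 0.3) = 0.52
```

    Note that the weight an input receives depends on where it falls in the sort, which is one source of the behavioral and interpretability issues the paper discusses.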

    Insights and Characterization of l1-norm Based Sparsity Learning of a Lexicographically Encoded Capacity Vector for the Choquet Integral

    This thesis aims to simultaneously minimize function error and model complexity for data fusion via the Choquet integral (CI). The CI is a generator function, i.e., it is parametric and yields a wealth of aggregation operators based on the specifics of the underlying fuzzy measure. It is often the case that we desire to learn a fusion from data, with the goal of obtaining the smallest possible sum of squared error between the trained model and a set of labels. However, we also desire to learn solutions that are as 'simple' as possible. Herein, l1-norm regularization of a lexicographically encoded capacity vector relative to the CI is explored. The impact of regularization is explored in terms of what capacities and aggregation operators it induces under different common and extreme scenarios. Synthetic experiments are provided in order to illustrate the propositions and concepts put forth.
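    A minimal sketch of the kind of objective described here, assuming the Choquet integral outputs have been pre-computed as a linear map of the capacity vector (the matrix D below is hypothetical; boundary and monotonicity constraints are handled elsewhere):

```python
import numpy as np

def l1_regularized_sse(u, D, labels, lam):
    """Sketch of the regularized learning objective.

    For a fixed sort order of each training sample, the Choquet integral
    is linear in the measure values, so the model outputs are written
    here as D @ u, where u is the (lexicographically encoded) capacity
    vector and D is a data-dependent matrix assumed to be pre-computed.
    """
    residual = D @ u - labels
    return residual @ residual + lam * np.sum(np.abs(u))
```

    Increasing lam trades training error for smaller-magnitude capacity values; the thesis studies which capacities, and hence which aggregation operators, this pressure induces.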

    Extension of the fuzzy integral for general fuzzy set-valued information

    The fuzzy integral (FI) is an extremely flexible aggregation operator. It is used in numerous applications, such as image processing, multicriteria decision making, skeletal age-at-death estimation, and multisource (e.g., feature, algorithm, sensor, and confidence) fusion. To date, a few works have appeared on the topic of generalizing Sugeno's original real-valued integrand and fuzzy measure (FM) for the case of higher order uncertain information (both integrand and measure). For the most part, these extensions are motivated by, and are consistent with, Zadeh's extension principle (EP). Namely, existing extensions focus on fuzzy number (FN), i.e., convex and normal fuzzy set- (FS) valued integrands. Herein, we put forth a new definition, called the generalized FI (gFI), and an efficient algorithm for its calculation for FS-valued integrands. In addition, we compare the gFI, numerically and theoretically, with our non-EP-based FI extension called the nondirect FI (NDFI). Examples are investigated in the areas of skeletal age-at-death estimation in forensic anthropology and multisource fusion. These applications help demonstrate the need for and benefit of the proposed work. In particular, we show there is not one supreme technique; instead, multiple extensions are of benefit in different contexts and applications.
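    As a rough illustration of the extension-principle route for the simplest case of higher order integrands, intervals: because the discrete (Sugeno) fuzzy integral is non-decreasing in each input, the interval-valued result can be obtained by integrating the lower and upper endpoints separately. This is only a sketch of that interval special case, not the gFI or NDFI themselves; the measure values below are hypothetical.

```python
def sugeno_integral(h, g):
    """Discrete Sugeno integral: visit sources from the largest input to
    the smallest and take max_i min( h(x_(i)), g(A_i) ), where A_i holds
    the i sources with the largest inputs."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    A, best = frozenset(), 0.0
    for idx in order:
        A = A | {idx}
        best = max(best, min(h[idx], g[A]))
    return best

def interval_sugeno(h_lo, h_hi, g):
    """Interval-valued integrand: since the integral is non-decreasing in
    each input, the extension-principle result is the interval formed by
    integrating the lower and upper endpoints separately."""
    return sugeno_integral(h_lo, g), sugeno_integral(h_hi, g)

g = {frozenset(): 0.0, frozenset({0}): 0.3,
     frozenset({1}): 0.6, frozenset({0, 1}): 1.0}
print(interval_sugeno([0.2, 0.5], [0.6, 0.9], g))  # (0.5, 0.6)
```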

    EXPLAINABLE FEATURE- AND DECISION-LEVEL FUSION

    Information fusion is the process of aggregating knowledge from multiple data sources to produce more consistent, accurate, and useful information than any one individual source can provide. In general, there are three primary sources of data/information: humans, algorithms, and sensors. Typically, objective data---e.g., measurements---arise from sensors. Using these data sources, applications such as computer vision and remote sensing have long been applying fusion at different levels (signal, feature, decision, etc.). Furthermore, daily advancements in engineering technologies like smart cars, which operate in complex and dynamic environments using multiple sensors, are raising both the demand for and the complexity of fusion. There is a great need to discover new theories to combine and analyze heterogeneous data arising from one or more sources. The work collected in this dissertation addresses the problem of feature- and decision-level fusion. Specifically, this work focuses on fuzzy Choquet integral (ChI)-based data fusion methods. Most mathematical approaches for data fusion have focused on combining inputs under the assumption of independence between them. However, there are often rich interactions (e.g., correlations) between inputs that should be exploited. The ChI is a powerful aggregation tool that is capable of modeling these interactions. Consider the fusion of m sources, where there are 2^m unique subsets (interactions); the ChI is capable of learning the worth of each of these possible source subsets. However, the complexity of fuzzy integral-based methods grows quickly, as the number of trainable parameters for the fusion of m sources scales as 2^m. Hence, we require a large amount of training data to avoid the problem of over-fitting. This work addresses the over-fitting problem of ChI-based data fusion with novel regularization strategies. These regularization strategies alleviate the issue of over-fitting while training with limited data and also enable the user to consciously push the learned methods toward a predefined, or perhaps known, structure. Also, the existing methods for training the ChI for decision- and feature-level data fusion involve quadratic programming (QP). The QP-based approach for learning ChI-based data fusion solutions has a high space complexity, which has limited the practical application of ChI-based data fusion methods to six or fewer input sources. To address the space complexity issue, this work introduces an online training algorithm for learning the ChI. The online method is an iterative gradient descent approach that processes one observation at a time, enabling the applicability of ChI-based data fusion to higher dimensional data sets. In many real-world data fusion applications, it is imperative to have an explanation or interpretation. This may include providing information on what was learned, what the worth of individual sources is, why a decision was reached, what evidence process(es) were used, and what confidence the system has in its decision. However, most existing machine learning solutions for data fusion are black boxes, e.g., deep learning. In this work, we designed methods and metrics that help answer these questions of interpretation, and we also developed visualization methods that help users better understand the machine learning solution and its behavior for different instances of data.
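    A rough sketch of what an online, one-observation-at-a-time update for ChI learning can look like, assuming a squared-error loss and a dictionary-encoded measure (the learning rate, encoding, and the omitted monotonicity projection are illustrative assumptions, not the dissertation's exact algorithm):

```python
import numpy as np

def online_chi_update(g, h, y, lr=0.05):
    """One stochastic gradient step for Choquet-integral fusion (sketch).

    For the sort order induced by the observation h, the integral is
    linear in the measure values of the nested chain A_1 c ... c A_m,
    so a squared-error gradient only touches those m entries of g.
    A monotonicity repair/projection step, omitted here, is still
    required after each update to keep g a valid capacity.
    """
    order = np.argsort(h)[::-1]
    sorted_h = np.append(h[order], 0.0)          # h_(1) >= ... >= h_(m), then 0
    diffs = sorted_h[:-1] - sorted_h[1:]         # h_(i) - h_(i+1)
    A, chain, pred = frozenset(), [], 0.0
    for i, idx in enumerate(order):
        A = A | {int(idx)}
        chain.append(A)
        pred += diffs[i] * g[A]                  # forward pass
    err = pred - y
    for i, A in enumerate(chain):
        if len(A) < len(h):                      # g(full set) stays fixed at 1
            g[A] = float(np.clip(g[A] - lr * 2.0 * err * diffs[i], 0.0, 1.0))
    return pred

# One pass over a single (inputs, label) observation with a hypothetical measure.
g = {frozenset(): 0.0, frozenset({0}): 0.3,
     frozenset({1}): 0.6, frozenset({0, 1}): 1.0}
online_chi_update(g, np.array([0.8, 0.4]), 0.7)
```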

    Feature and Decision Level Fusion Using Multiple Kernel Learning and Fuzzy Integrals

    The work collected in this dissertation addresses the problem of data fusion. In other words, this is the problem of making decisions (also known as the problem of classification in the machine learning and statistics communities) when data from multiple sources are available, or when decisions/confidence levels from a panel of decision-makers are accessible. This problem has become increasingly important in recent years, especially with the ever-increasing popularity of autonomous systems outfitted with suites of sensors and the dawn of the 'age of big data.' While data fusion is a very broad topic, the work in this dissertation considers two very specific techniques: feature-level fusion and decision-level fusion. In general, the fusion methods proposed throughout this dissertation rely on kernel methods and fuzzy integrals. Both are very powerful tools; however, they also come with challenges, some of which are summarized below. I address these challenges in this dissertation. Kernel methods for classification are a well-studied area in which data are implicitly mapped from a lower-dimensional space to a higher-dimensional space to improve classification accuracy. However, for most kernel methods, one must still choose a kernel to use for the problem. Since there is, in general, no way of knowing which kernel is the best, multiple kernel learning (MKL) is a technique used to learn the aggregation of a set of valid kernels into a single (ideally) superior kernel. The aggregation can be done using weighted sums of the pre-computed kernels, but determining the summation weights is not a trivial task. Furthermore, MKL does not work well with large datasets because of limited storage space and prediction speed. These challenges are tackled by the introduction of many new algorithms in the following chapters. I also address MKL's storage and speed drawbacks, allowing MKL-based techniques to be applied to big data efficiently. Some algorithms in this work are based on the Choquet fuzzy integral, a powerful nonlinear aggregation operator parameterized by the fuzzy measure (FM). These decision-level fusion algorithms learn a fuzzy measure by minimizing a sum of squared error (SSE) criterion based on a set of training data. The flexibility of the Choquet integral comes with a cost, however: given a set of N decision makers, the size of the FM the algorithm must learn is 2^N. This means that the training data must be diverse enough to include 2^N independent observations, though this is rarely encountered in practice. I address this in the following chapters via many different regularization functions, a popular technique in machine learning and statistics used to prevent overfitting and increase model generalization. Finally, it is worth noting that the aggregation behavior of the Choquet integral is not intuitive. I tackle this by proposing a quantitative visualization strategy allowing the FM and Choquet integral behavior to be shown simultaneously.
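    The weighted-sum aggregation step that MKL optimizes can be sketched as follows; the kernels, weights, and bandwidths below are hypothetical, and the actual weight-learning and scaling algorithms are the subject of the dissertation.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination of pre-computed kernel (Gram) matrices.

    MKL learns the weights; here they are simply supplied.  A convex
    combination of valid (positive semi-definite) kernels is itself a
    valid kernel, so the result can be handed to any kernel classifier.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize (assumes nonnegative weights)
    return sum(wi * K for wi, K in zip(w, kernels))

# Hypothetical example: two RBF kernels at different bandwidths on the same data.
X = np.random.rand(5, 3)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
K1, K2 = np.exp(-d2 / 0.5), np.exp(-d2 / 2.0)
K = combine_kernels([K1, K2], [0.7, 0.3])
```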