12 research outputs found

    Supervised learning using a symmetric bilinear form for record linkage

    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form, and the supervised learning method is formalized as an optimization problem. The goal of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (correct links). We evaluate and compare our proposal with non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). We also compare it with previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator outperforms, or at least achieves results similar to, the other parameterized operators. We also study the conditions the optimization problem must satisfy for the described aggregation functions to be metric functions.
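
As an illustration of the distance-based linkage the abstract describes, here is a minimal Python sketch. It assumes the bilinear form is used as d(a, b) = (a - b)^T Σ (a - b) with a symmetric matrix Σ; the abstract does not give the exact formula, so this is an assumption. The function names and the toy data are hypothetical, and the sketch only scores a fixed Σ; it does not implement the paper's optimization of Σ.

```python
def bilinear_distance(a, b, sigma):
    """Assumed form: d(a, b) = (a - b)^T Sigma (a - b), Sigma symmetric."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * sigma[i][j] * diff[j]
               for i in range(n) for j in range(n))


def link_records(original, protected, sigma):
    """For each protected record, return the index of the nearest original record."""
    links = []
    for p in protected:
        distances = [bilinear_distance(o, p, sigma) for o in original]
        links.append(distances.index(min(distances)))
    return links


def reidentification_rate(links):
    """Fraction of protected records linked to the original record with the same index."""
    return sum(1 for i, j in enumerate(links) if i == j) / len(links)


# Toy data (made up for illustration): protected[i] is a noisy copy of original[i].
original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
protected = [[1.1, 10.5], [2.2, 19.0], [2.9, 31.0]]

# With the identity matrix the form reduces to the squared Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
links = link_records(original, protected, identity)
rate = reidentification_rate(links)
```

A supervised method in the spirit of the abstract would search for the Σ that maximizes `reidentification_rate` on data where the true links are known; that search is left out here.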

    Online Equivalence Learning Through A Quasi-Newton Method

    Recently, the community has shown a growing interest in building online learning models. In this paper, we are interested in the framework of fuzzy equivalences obtained by residual implications. Models are generally based on the relevance degree between pairs of objects of the learning set, and the update is obtained by using standard stochastic (online) gradient descent. This paper proposes another method for learning fuzzy equivalences, using Quasi-Newton optimization. The two methods are extensively compared on real data sets for the task of nearest sample(s) classification.

    Authentication and Authorization for Mobile IoT Devices Using Biofeatures: Recent Advances and Future Trends

    Biofeatures are fast becoming a key tool for authenticating IoT devices. The purpose of this investigation is to summarise the factors that hinder the large-scale development and deployment of biometric models, covering human physiological features (e.g., face, eyes, fingerprints-palm, or electrocardiogram) and behavioral features (e.g., signature, voice, gait, or keystroke). The different machine learning and data mining methods used by authentication and authorization schemes for mobile IoT devices are provided. Threat models and countermeasures used by biometrics-based authentication schemes for mobile IoT devices are also presented. More specifically, we analyze the state of the art of existing biometrics-based authentication schemes for IoT devices. Based on the current taxonomy, we conclude our paper with different types of challenges for future research efforts in biometrics-based authentication schemes for IoT devices.

    Information fusion in content based image retrieval: A comprehensive overview

    An ever-increasing part of communication between people involves the use of pictures, due to the cheap availability of powerful smartphone cameras and of storage space. The rising popularity of social networking applications such as Facebook, Twitter, and Instagram, and of instant messaging applications such as WhatsApp and WeChat, is clear evidence of this phenomenon, because these applications let users share, in real time, a pictorial representation of the context each individual is living in. The media rapidly exploited this phenomenon, using the same channel either to publish their reports or to gather additional information on an event through the community of users. While the real-time use of images is managed through metadata associated with the image (e.g., the timestamp, the geolocation, tags), their retrieval from an archive might be far from trivial, as an image bears a rich semantic content that goes beyond the description provided by its metadata. After more than 20 years of research on Content-Based Image Retrieval (CBIR), the enormous growth in the number and variety of images available in digital format is challenging the research community. Any approach aiming to face such challenges must rely on different image representations that need to be conveniently fused in order to adapt to the subjectivity of image semantics. This paper offers a journey through the main information fusion ingredients that a recipe for the design of a CBIR system should include to meet the demanding needs of users.

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Although remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.

    Définition et évaluation de modèles d'agrégation pour l'estimation de la pertinence multidimensionnelle en recherche d'information

    The main research topic of this document revolves around the information retrieval (IR) field. Traditional IR models rank documents by computing single scores separately with respect to one single objective criterion. Recently, an increasing number of IR studies has triggered a resurgence of interest in redefining the algorithmic estimation of relevance, which implies a shift from topical to multidimensional relevance assessment. In our work, we specifically address the multidimensional relevance assessment and evaluation problems. To tackle this challenge, state-of-the-art approaches are often based on linear combination mechanisms. However, these methods rely on the unrealistic hypothesis of additivity and independence of the relevance dimensions, which makes them unsuitable in many real situations where criteria are correlated. Other techniques from the machine learning area have also been proposed. These learn a model from example inputs and generalize it to combine the different criteria. Nonetheless, these methods tend to offer only limited insight into how to consider the importance of, and the interaction between, the criteria. In addition to the sensitivity of the parameters used within these algorithms, it is quite difficult to understand why one criterion is preferred over another. To address this problem, we proposed a model based on a multi-criteria aggregation operator that is able to overcome the problem of additivity. Our model is based on a fuzzy measure that offers semantic interpretations of the correlations and interactions between the criteria. We have adapted this model to multidimensional relevance estimation in two scenarios: (i) a tweet search task and (ii) two personalized IR settings. The second line of research focuses on the integration of the temporal factor in the aggregation process, in order to consider the changes of document collections over time. To do so, we have proposed a time-aware IR model for combining the temporal relevance criterion with the topical relevance one. Then, we performed a time series analysis to identify the temporal nature of queries, and we proposed an evaluation framework within a time-aware IR setting.
    The general problem addressed by our work falls within the scientific field of information retrieval (IR). Classical IR models are generally based on a definition of relevance that is tied essentially to the topical match between the subject of the query and the subject of the document. The concept of relevance has been revisited at different levels, integrating various factors related to the user and their environment in an IR situation. In this work, we specifically address the problem of modeling multidimensional relevance through the definition of new criteria aggregation models and their evaluation in IR tasks. To address this problem, state-of-the-art work relies mainly on simple linear combinations. However, these methods rest on the unrealistic hypothesis of additivity or independence of the dimensions, which makes the model unsuitable in many real search situations in which the criteria are correlated or interact with one another. Other techniques from the machine learning field have also been proposed, making it possible to learn a model from examples and to generalize it for ranking and aggregating the criteria. However, these methods tend to offer limited insight into how to consider the importance of, and the interaction between, the criteria. In addition to the sensitivity of the parameters used in these algorithms, it is very difficult to understand why one criterion is preferred over another. To address this first research direction, we proposed a multi-criteria relevance combination model based on an aggregation operator that overcomes the additivity problem of classical combination functions. Our model is based on a measure that gives a clearer idea of the correlations and interactions between the criteria. We adapted this model to two multi-criteria relevance combination scenarios: (i) a multi-criteria information retrieval setting in a tweet search context and (ii) two personalized information retrieval settings. The second research direction concerns the integration of the temporal factor into the aggregation process in order to account for changes occurring in document collections over time. To this end, we proposed a time-aware aggregation model for combining the temporal factor with the topical relevance factor. With this objective, we carried out a temporal analysis to elicit the temporal aspect of queries, and we proposed an evaluation of this model in a time-aware search task.

    Sports inequalities using Gini coefficient and other inequality indices

    Variations in competitiveness among sports teams directly affect the economy of professional leagues. We measured sports inequality in terms of wins, using different indices initially designed for measuring income inequality. Among these indices, the Gini coefficient is found to be a powerful tool for analyzing the inequalities associated with sports.
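
To make the measurement concrete, the following Python sketch computes a Gini coefficient over team win counts. It uses the standard sorted-data formula G = 2 Σ i·x_i / (n Σ x_i) − (n + 1)/n; the abstract does not say which estimator or normalization the authors used, so treat this as one common choice. The win counts are invented for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-data formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i starting at 1 for the smallest value.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical season: number of wins per team in a four-team league.
wins = [10, 8, 6, 4]
league_gini = gini(wins)

# Reference points: equal wins give 0; one team winning everything
# gives (n - 1) / n under this unnormalized formula.
balanced = gini([5, 5, 5, 5])
dominated = gini([0, 0, 0, 10])
```

A higher value indicates that wins are concentrated in fewer teams, which is the sense in which the index captures competitive imbalance.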