814 research outputs found

    Supervised learning using a symmetric bilinear form for record linkage

    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form and the supervised learning method is formalized as an optimization problem. The target of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (or correct links). We evaluate and compare our proposal with other non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). Additionally, we compare it with other previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. From these comparisons we show that the proposed aggregation operator outperforms, or at least achieves results similar to, the other parameterized operators. We also study which conditions on the optimization problem are necessary for the described aggregation functions to be metric functions.
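    The abstract above describes distance-based record linkage in which the distance is a symmetric bilinear form. A minimal sketch of that idea follows, assuming records are numeric vectors, that the protected record at index i truly corresponds to the original record at index i, and that the matrix `m` is supplied by hand rather than learned. The function names and the toy data are hypothetical and are not taken from the paper, and the supervised optimization step itself is not shown.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_distance(a: Vector, b: Vector, m: Matrix) -> float:
    """Return (a - b)^T M (a - b) for a symmetric matrix M."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * m[i][j] * diff[j] for i in range(n) for j in range(n))


def count_correct_links(original: Sequence[Vector],
                        protected: Sequence[Vector],
                        m: Matrix) -> int:
    """Link each protected record to its nearest original record.

    Assumption: protected[i] was derived from original[i], so a link is
    correct (a re-identification) when the nearest index equals i.
    """
    correct = 0
    for i, p in enumerate(protected):
        nearest = min(range(len(original)),
                      key=lambda k: bilinear_distance(original[k], p, m))
        if nearest == i:
            correct += 1
    return correct


# Toy data (hypothetical): the second attribute is perturbed more heavily.
ORIGINAL = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
PROTECTED = [[0.1, 0.2], [0.9, 1.8], [0.2, 2.9]]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]   # gives the squared Euclidean distance
WEIGHTED = [[1.0, 0.0], [0.0, 0.1]]   # down-weights the noisier attribute

links_euclidean = count_correct_links(ORIGINAL, PROTECTED, IDENTITY)
links_weighted = count_correct_links(ORIGINAL, PROTECTED, WEIGHTED)
```

    On this toy data the identity matrix links two of the three records correctly, while the diagonal matrix that down-weights the second attribute links all three. A learning method of the kind the abstract describes would search for the matrix that maximizes this count on training data.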


    Reliable Location-Based Services from Radio Navigation Systems

    Loran is a radio-based navigation system originally designed for naval applications. We show that Loran-C’s high power and high repeatable accuracy make it well suited to security applications. First, we show how to derive a precise location tag (with a sensitivity of about 20 meters) that is difficult to project to an exact location. A device can use our location tag to block or allow certain actions without knowing its precise location. To ensure that our tag is reproducible, we make use of fuzzy extractors, a mechanism originally designed for biometric authentication. We build a fuzzy extractor specifically designed for radio-type errors and give experimental evidence of its effectiveness. Second, we show that our location tag is difficult to predict from a distance. For example, an observer cannot predict the location tag inside a guarded data center from a few hundred meters away. As an application, consider a location-aware disk drive that will only work inside the data center. An attacker who steals the device and is capable of spoofing Loran-C signals still cannot make the device work, since he does not know what location tag to spoof. We provide experimental data supporting our unpredictability claim.

    Handling metadata in the scope of coreference detection in data collections


    Methods for Analysing Endothelial Cell Shape and Behaviour in Relation to the Focal Nature of Atherosclerosis

    The aim of this thesis is to develop automated methods for the analysis of the spatial patterns and the functional behaviour of endothelial cells, viewed under microscopy, with applications to the understanding of atherosclerosis. Initially, a radial search approach to segmentation was attempted in order to trace the cell and nuclei boundaries using a maximum likelihood algorithm; it was found inadequate to detect the weak cell boundaries present in the available data. A parametric cell shape model was then introduced to fit an equivalent ellipse to the cell boundary by matching phase-invariant orientation fields of the image and a candidate cell shape. This approach succeeded on good quality images, but failed on images with weak cell boundaries. Finally, a method based on support vector machines, relying on a rich set of visual features and a small but high quality training dataset, was found to work well on large numbers of cells even in the presence of strong intensity variations and imaging noise. Using the segmentation results, several standard shear-stress dependent parameters of cell morphology were studied, and evidence for similar behaviour in some cell shape parameters was obtained in in-vivo cells and their nuclei. Nuclear and cell orientations around immature and mature aortas were broadly similar, suggesting that the pattern of flow direction near the wall stayed approximately constant with age. The relation was less strong for the cell and nuclear length-to-width ratios. Two novel shape analysis approaches were attempted to find other properties of cell shape which could be used to annotate or characterise patterns, since a wide variability in cell and nuclear shapes was observed which did not appear to fit the standard parameterisations. Although no firm conclusions can yet be drawn, the work lays the foundation for future studies of cell morphology.
To draw inferences about patterns in the functional response of cells to flow, which may play a role in the progression of disease, single-cell analysis was performed using calcium-sensitive fluorescence probes. Calcium transient rates were found to change with flow, but more importantly, local patterns of synchronisation in multi-cellular groups were discernible and appear to change with flow. The patterns suggest a new functional mechanism in flow-mediation of cell-cell calcium signalling.

    Robust techniques and applications in fuzzy clustering

    This dissertation addresses issues central to fuzzy classification. The issue of sensitivity to noise and outliers of clustering techniques based on least squares minimization, such as Fuzzy c-Means (FCM) and its variants, is addressed. In this work, two novel and robust clustering schemes are presented and analyzed in detail. They approach the problem of robustness from different perspectives. The first scheme scales down the FCM memberships of data points based on the distance of the points from the cluster centers. Scaling done on outliers reduces their membership in true clusters. This scheme, known as Mega-clustering, defines a conceptual mega-cluster which is a collective cluster of all data points but views outliers and good points differently (as opposed to the concept of Dave's Noise cluster). The scheme is presented and validated with experiments, and similarities with Noise Clustering (NC) are also presented. The other scheme is based on the feasible solution algorithm that implements the Least Trimmed Squares (LTS) estimator. The LTS estimator is known to be resistant to noise and has a high breakdown point. The feasible solution approach also guarantees convergence of the solution set to a global optimum. Experiments show the practicability of the proposed schemes in terms of computational requirements and in the attractiveness of their simplistic frameworks. The issue of validation of clustering results has often received less attention than clustering itself. Fuzzy and non-fuzzy cluster validation schemes are reviewed and a novel methodology for cluster validity using a test for the random position hypothesis is developed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by the partitioning algorithm. The Hopkins statistic is used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case.
The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain and its applicability to validating clusters is shown here. A unique feature selection procedure for use with large molecular conformational datasets with high dimensionality is also developed. The intelligent feature extraction scheme not only helps in reducing the dimensionality of the feature space but also helps in eliminating contentious issues such as the ones associated with labeling of symmetric atoms in the molecule. The feature vector is converted to a proximity matrix and is used as an input to the relational fuzzy clustering (FRC) algorithm with very promising results. Results are also validated using several cluster validity measures from the literature. Another application of fuzzy clustering considered here is image segmentation. Image analysis on extremely noisy images is carried out as a precursor to the development of an automated real-time condition state monitoring system for underground pipelines. A two-stage FCM with intelligent feature selection is implemented as the segmentation procedure and results on a test image are presented. A conceptual framework for automated condition state assessment is also developed.
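    The outlier sensitivity of Fuzzy c-Means mentioned in the abstract above can be seen directly in the standard FCM membership formula, in which a point's memberships across all clusters always sum to 1. The sketch below computes those standard memberships only. It does not implement the dissertation's Mega-clustering or LTS-based schemes, and the function name, the fuzzifier default `m=2.0`, and the toy points are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def fcm_memberships(point: Point, centers: Sequence[Point],
                    m: float = 2.0) -> List[float]:
    """Standard FCM memberships of one point for fixed cluster centers.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i the Euclidean
    distance from the point to center i and m > 1 the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Toy data (hypothetical): two centers, one good point and one far outlier.
CENTERS = [(0.0, 0.0), (10.0, 0.0)]
near_first = fcm_memberships((1.0, 0.0), CENTERS)   # close to the first center
outlier = fcm_memberships((5.0, 100.0), CENTERS)    # far from both centers
```

    Here the outlier is equidistant from both centers, so it receives a membership of 0.5 in each cluster even though it is far from both. Because the memberships are forced to sum to 1, such points still pull on the cluster centers, which is the behaviour that robust variants aim to reduce.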

    Disease diagnosis in smart healthcare: Innovation, technologies and applications

    To promote sustainable development, the smart city implies a global vision that merges artificial intelligence, big data, decision making, information and communication technology (ICT), and the internet-of-things (IoT). Population ageing is an issue to which researchers, companies and governments should devote effort by developing innovative smart healthcare technologies and applications. In this paper, the topic of disease diagnosis in smart healthcare is reviewed. Typical emerging optimization algorithms and machine learning algorithms are summarized. Evolutionary optimization, stochastic optimization and combinatorial optimization are covered. Because healthcare has many applications, four applications in the field of disease diagnosis (all of which were among the top 10 causes of global death in 2015), namely cardiovascular diseases, diabetes mellitus, Alzheimer’s disease and other forms of dementia, and tuberculosis, are considered. In addition, challenges in the deployment of disease diagnosis in healthcare are discussed.