77 research outputs found

    Review and classification of variability analysis techniques with clinical applications

    Analysis of patterns of variation of time-series, termed variability analysis, represents a rapidly evolving discipline with increasing applications in different fields of science. In medicine and in particular critical care, efforts have focussed on evaluating the clinical utility of variability. However, the growth and complexity of techniques applicable to this field have made interpretation and understanding of variability more challenging. Our objective is to provide an updated review of variability analysis techniques suitable for clinical applications. We review more than 70 variability techniques, providing for each technique a brief description of the underlying theory and assumptions, together with a summary of clinical applications. We propose a revised classification for the domains of variability techniques, which include statistical, geometric, energetic, informational, and invariant. We discuss the process of calculation, often necessitating a mathematical transform of the time-series. Our aims are to summarize a broad literature and to promote a shared vocabulary that would improve the exchange of ideas and the analysis of results between different studies. We conclude with challenges for the evolving science of variability analysis.

    Generating tabular datasets under differential privacy

    Machine Learning (ML) is accelerating progress across fields and industries, but relies on accessible and high-quality training data. Some of the most important datasets are found in biomedical and financial domains in the form of spreadsheets and relational databases. But this tabular data is often sensitive in nature. Synthetic data generation offers the potential to unlock sensitive data, but generative models tend to memorise and regurgitate training data, which undermines the privacy goal. To remedy this, researchers have incorporated the mathematical framework of Differential Privacy (DP) into the training process of deep neural networks. But this creates a trade-off between the quality and privacy of the resulting data. Generative Adversarial Networks (GANs) are the dominant paradigm for synthesising tabular data under DP, but suffer from unstable adversarial training and mode collapse, which are exacerbated by the privacy constraints and challenging tabular data modality. This work optimises the quality-privacy trade-off of generative models, producing higher quality tabular datasets with the same privacy guarantees. We implement novel end-to-end models that leverage attention mechanisms to learn reversible tabular representations. We also introduce TableDiffusion, the first differentially-private diffusion model for tabular data synthesis. Our experiments show that TableDiffusion produces higher-fidelity synthetic datasets, avoids the mode collapse problem, and achieves state-of-the-art performance on privatised tabular data synthesis. By implementing TableDiffusion to predict the added noise, we enabled it to bypass the challenges of reconstructing mixed-type tabular data. Overall, the diffusion paradigm proves vastly more data and privacy efficient than the adversarial paradigm, due to augmented re-use of each data batch and a smoother iterative training process.
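
    The abstract above mentions incorporating Differential Privacy into neural network training. One widely used way to do this is DP-SGD: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, and add Gaussian noise before averaging. The sketch below illustrates only that generic step, under the assumption that per-example gradients are available as plain lists; it is not TableDiffusion's implementation, and the name `dp_noisy_mean_gradient` and its parameters are hypothetical.

```python
import math
import random


def dp_noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Illustrative DP-SGD aggregation step (hypothetical helper).

    Each per-example gradient is scaled down so its L2 norm is at most
    `clip_norm`; the clipped gradients are summed; Gaussian noise with
    standard deviation noise_multiplier * clip_norm is added to every
    coordinate of the sum; the result is divided by the batch size.
    """
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Gradients already inside the clipping ball are left unchanged.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]
```

    A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of noisier updates, which is the quality-privacy trade-off the abstract refers to. Computing the resulting privacy budget requires a separate accounting step that is not shown here.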

    Graph Analysis and Applications in Clustering and Content-based Image Retrieval

    About 300 years ago, when studying Seven Bridges of Königsberg problem - a famous problem concerning paths on graphs - the great mathematician Leonhard Euler said, “This question is very banal, but seems to me worthy of attention”. Since then, graph theory and graph analysis have not only become one of the most important branches of mathematics, but have also found an enormous range of important applications in many other areas. A graph is a mathematical model that abstracts entities and the relationships between them as nodes and edges. Many types of interactions between the entities can be modeled by graphs, for example, social interactions between people, the communications between the entities in computer networks and relations between biological species. Although not appearing to be a graph, many other types of data can be converted into graphs by certain operations, for example, the k-nearest neighborhood graph built from pixels in an image. Cluster structure is a common phenomenon in many real-world graphs, for example, social networks. Finding the clusters in a large graph is important to understand the underlying relationships between the nodes. Graph clustering is a technique that partitions nodes into clusters such that connections among nodes in a cluster are dense and connections between nodes in different clusters are sparse. Various approaches have been proposed to solve graph clustering problems. A common approach is to optimize a predefined clustering metric using different optimization methods. However, most of these optimization problems are NP-hard due to the discrete set-up of the hard-clustering. These optimization problems can be relaxed, and a sub-optimal solution can be found. A different approach is to apply data clustering algorithms in solving graph clustering problems. With this approach, one must first find appropriate features for each node that represent the local structure of the graph. Limited Random Walk algorithm uses the random walk procedure to explore the graph and extracts efficient features for the nodes. It incorporates the embarrassing parallel paradigm, thus, it can process large graph data efficiently using modern high-performance computing facilities. This thesis gives the details of this algorithm and analyzes the stability issues of the algorithm. Based on the study of the cluster structures in a graph, we define the authenticity score of an edge as the difference between the actual and the expected number of edges that connect the two groups of the neighboring nodes of the two end nodes. Authenticity score can be used in many important applications, such as graph clustering, outlier detection, and graph data preprocessing. In particular, a data clustering algorithm that uses the authenticity scores on mutual k-nearest neighborhood graph achieves more reliable and superior performance comparing to other popular algorithms. This thesis also theoretically proves that this algorithm can asymptotically find the complete recovery of the ground truth of the graphs that were generated by a stochastic r-block model. Content-based image retrieval (CBIR) is an important application in computer vision, media information retrieval, and data mining. Given a query image, a CBIR system ranks the images in a large image database by their “similarities” to the query image. However, because of the ambiguities of the definition of the “similarity”, it is very difficult for a CBIR system to select the optimal feature set and ranking algorithm to satisfy the purpose of the query. Graph technologies have been used to improve the performance of CBIR systems in various ways. In this thesis, a novel method is proposed to construct a visual-semantic graph, a graph where nodes represent semantic concepts and edges represent visual associations between concepts. The constructed visual-semantic graph not only helps the user to locate the target images quickly but also helps answer the questions related to the query image. Experiments show that the efforts of locating the target image are reduced by 25% with the help of visual-semantic graphs. Graph analysis will continue to play an important role in future data analysis. In particular, the visual-semantic graph that captures important and interesting visual associations between the concepts is worthy of further attention.
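
    The abstract above refers to converting non-graph data into a mutual k-nearest neighborhood graph. As a rough illustration of the usual definition (two points are joined only if each is among the other's k nearest neighbours), here is a minimal brute-force sketch. It assumes Euclidean distance and small inputs; the function name `mutual_knn_edges` is hypothetical, and the thesis may build its graph differently.

```python
import math


def mutual_knn_edges(points, k):
    """Return the undirected edges (i, j) with i < j of a mutual k-NN graph.

    An edge is kept only when j is among the k nearest neighbours of i
    AND i is among the k nearest neighbours of j. Distances are Euclidean,
    and ties are broken by the lower point index.
    """
    n = len(points)
    neighbours = []
    for i in range(n):
        # Brute-force O(n log n) neighbour search per point; fine for a sketch.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        neighbours.append(set(others[:k]))
    return {
        (i, j)
        for i in range(n)
        for j in neighbours[i]
        if i < j and i in neighbours[j]
    }
```

    Requiring mutual membership tends to drop edges to isolated points: a point whose nearest neighbour lies in a tight cluster is usually not among that neighbour's own nearest neighbours, so it ends up unconnected.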

    ON THE CATEGORIZATION OF HIGH ACTIVITY OBJECTS USING DIFFERENTIAL ATTRIBUTE PROFILES

    Change detection represents a broad field of research that is in demand for different applications (e.g. disaster management and land use / land cover monitoring). Since the detection itself only delivers information about the location and date of the change event, it is limited compared with approaches dealing with the category, type, or class of the change objects. In contrast to classification, categorization denotes a feature-based clustering of entities (here: change objects) without using any class catalogue information. Therefore, suitable features have to be extracted that lead to a clear distinction of the resulting clusters. In previous work, a change analysis workflow has been accomplished, which comprises the detection, the categorization, and the classification of so-called high activity change objects extracted from a TerraSAR-X time series dataset. With focus on the features used in this study, the morphological differential attribute profiles (DAPs) turned out to be very promising. It was shown that the DAPs were essential for the construction of the principal components. In this paper, this circumstance is considered. Moreover, a change categorization based only on different and complementary DAP features is performed. An assessment concerning the best suitable features is given.