801 research outputs found

    GRAIMATTER Green Paper:Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)

    Get PDF
    TREs are widely, and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g., health, police, tax and education) as they enable secure and transparent research whilst protecting data confidentiality.There is an increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spotting human errors, streamlining processes, task automation and decision support. These complex AI models require more information to describe and reproduce, increasing the possibility that sensitive personal data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all TREs to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training.GRAIMATTER has developed a draft set of usable recommendations for TREs to guard against the additional risks when disclosing trained AI models from TREs. The development of these recommendations has been funded by the GRAIMATTER UKRI DARE UK sprint research project. This version of our recommendations was published at the end of the project in September 2022. During the course of the project, we have identified many areas for future investigations to expand and test these recommendations in practice. Therefore, we expect that this document will evolve over time. The GRAIMATTER DARE UK sprint project has also developed a minimal viable product (MVP) as a suite of attack simulations that can be applied by TREs and can be accessed here (https://github.com/AI-SDC/AI-SDC).If you would like to provide feedback or would like to learn more, please contact Smarti Reel ([email protected]) and Emily Jefferson ([email protected]).The summary of our recommendations for a general public audience can be found at DOI: 10.5281/zenodo.708951

    Location Analytics for Location-Based Social Networks

    Get PDF

    Random Subspace Learning on Outlier Detection and Classification with Minimum Covariance Determinant Estimator

    Get PDF
    The questions brought by high dimensional data is interesting and challenging. Our study is targeting on the particular type of data in this situation that namely “large p, small n”. Since the dimensionality is massively larger than the number of observations in the data, any measurement of covariance and its inverse will be miserably affected. The definition of high dimension in statistics has been changed throughout decades. Modern datasets with over thousands of dimensions are demanding the ability to gain deeper understanding but hindered by the curse of dimensionality. We decide to review and explore further to negotiate with the curse and extend previous studies to pave a new way for estimating robustness then apply it to outlier detection and classification. We explored the random subspace learning and expand other classification and outlier detection algorithms to adapt its framework. Our proposed methods can handle both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection are concerned

    Data Mining

    Get PDF
    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining

    PROFILING - CONCEPTS AND APPLICATIONS

    Get PDF
    Profiling is an approach to put a label or a set of labels on a subject, considering the characteristics of this subject. The New Oxford American Dictionary defines profiling as: “recording and analysis of a person’s psychological and behavioral characteristics, so as to assess or predict his/her capabilities in a certain sphere or to assist in identifying a particular subgroup of people”. This research extends this definition towards things demonstrating that many methods used for profiling of people may be applied for a different type of subjects, namely things. The goal of this research concerns proposing methods for discovery of profiles of users and things with application of Data Science methods. The profiles are utilized in vertical and 2 horizontal scenarios and concern such domains as smart grid and telecommunication (vertical scenarios), and support provided both for the needs of authorization and personalization (horizontal usage).:The thesis consists of eight chapters including an introduction and a summary. First chapter describes motivation for work that was carried out for the last 8 years together with discussion on its importance both for research and business practice. The motivation for this work is much broader and emerges also from business importance of profiling and personalization. The introduction summarizes major research directions, provides research questions, goals and supplementary objectives addressed in the thesis. Research methodology is also described, showing impact of methodological aspects on the work undertaken. Chapter 2 provides introduction to the notion of profiling. The definition of profiling is introduced. Here, also a relation of a user profile to an identity is discussed. The papers included in this chapter show not only how broadly a profile may be understood, but also how a profile may be constructed considering different data sources. Profiling methods are introduced in Chapter 3. This chapter refers to the notion of a profile developed using the BFI-44 personality test and outcomes of a survey related to color preferences of people with a specific personality. Moreover, insights into profiling of relations between people are provided, with a focus on quality of a relation emerging from contacts between two entities. Chapters from 4 to 7 present different scenarios that benefit from application of profiling methods. Chapter 4 starts with introducing the notion of a public utility company that in the thesis is discussed using examples from smart grid and telecommunication. Then, in chapter 4 follows a description of research results regarding profiling for the smart grid, focusing on a profile of a prosumer and forecasting demand and production of the electric energy in the smart grid what can be influenced e.g. by weather or profiles of appliances. Chapter 5 presents application of profiling techniques in the field of telecommunication. Besides presenting profiling methods based on telecommunication data, in particular on Call Detail Records, also scenarios and issues related to privacy and trust are addressed. Chapter 6 and Chapter 7 target at horizontal applications of profiling that may be of benefit for multiple domains. Chapter 6 concerns profiling for authentication using un-typical data sources such as Call Detail Records or data from a mobile phone describing the user behavior. Besides proposing methods, also limitations are discussed. In addition, as a side research effect a methodology for evaluation of authentication methods is proposed. Chapter 7 concerns personalization and consists of two diverse parts. Firstly, behavioral profiles to change interface and behavior of the system are proposed and applied. The performance of solutions personalizing content either locally or on the server is studied. Then, profiles of customers of shopping centers are created based on paths identified using Call Detail Records. The analysis demonstrates that the data that is collected for one purpose, may significantly influence other business scenarios. Chapter 8 summarizes the research results achieved by the author of this document. It presents contribution over state of the art as well as some insights into the future work planned

    Algorithms to Explore the Structure and Evolution of Biological Networks

    Get PDF
    High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks. We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single solution approach would miss. To facilitate the study of ancient networks, we introduce a framework called ``network archaeology'') for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared. Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool towards this end

    Influence Maximization in Social Networks: A Survey

    Full text link
    Online social networks have become an important platform for people to communicate, share knowledge and disseminate information. Given the widespread usage of social media, individuals' ideas, preferences and behavior are often influenced by their peers or friends in the social networks that they participate in. Since the last decade, influence maximization (IM) problem has been extensively adopted to model the diffusion of innovations and ideas. The purpose of IM is to select a set of k seed nodes who can influence the most individuals in the network. In this survey, we present a systematical study over the researches and future directions with respect to IM problem. We review the information diffusion models and analyze a variety of algorithms for the classic IM algorithms. We propose a taxonomy for potential readers to understand the key techniques and challenges. We also organize the milestone works in time order such that the readers of this survey can experience the research roadmap in this field. Moreover, we also categorize other application-oriented IM studies and correspondingly study each of them. What's more, we list a series of open questions as the future directions for IM-related researches, where a potential reader of this survey can easily observe what should be done next in this field

    Analysis of High Frequency Smart Meter Energy Consumption Data

    Get PDF
    • …
    corecore