
    Quantitatively measuring privacy in interactive query settings within RDBMS framework

    Little attention has been paid to measuring privacy risk in Database Management Systems, despite their prevalence as a modality of data access. This paper proposes PriDe, a quantitative privacy metric that provides a measure (privacy score) of privacy risk when queries are executed in relational database management systems. PriDe measures the degree to which attribute values retrieved by a principal (user) engaged in an interactive query session represent a reduction of privacy with respect to the attribute values previously retrieved by that principal. It can be deployed in interactive query settings, where the user sends SQL queries to the database and receives results at run time, and it gives privacy-conscious organizations a way to monitor, in terms of privacy, how the application data made available to third parties is used. Without loss of generality, the proposed approach is also applicable to BigSQL-style technologies. Additionally, the paper proposes a privacy equivalence relation that facilitates the computation of the privacy score.
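The abstract does not give PriDe's formula, so the following is only a toy illustration of the underlying idea: scoring a query result against what the same principal has already retrieved. It assumes each result row carries an `id` key and treats a (record id, attribute) pair as one retrieved cell. The class name `RetrievalTracker` and the "fraction of previously unseen cells" score are hypothetical stand-ins, not the PriDe privacy score or its privacy equivalence relation.

```python
from collections import defaultdict


class RetrievalTracker:
    """Toy per-principal tracker (NOT the PriDe metric): scores a query result
    by how much of it the principal has not retrieved before."""

    def __init__(self):
        # principal -> set of (record_id, attribute) cells already retrieved
        self.seen = defaultdict(set)

    def score(self, principal, rows, attributes):
        """Return the fraction of retrieved cells that are new to `principal`
        (hypothetical score in [0, 1]), then remember them as seen."""
        cells = {(row["id"], attr) for row in rows for attr in attributes}
        if not cells:
            return 0.0
        new_cells = cells - self.seen[principal]
        self.seen[principal] |= cells
        return len(new_cells) / len(cells)


tracker = RetrievalTracker()
rows = [{"id": 1, "salary": 50}, {"id": 2, "salary": 60}]
print(tracker.score("alice", rows, ["salary"]))      # → 1.0
print(tracker.score("alice", rows[:1], ["salary"]))  # → 0.0
```

A score of 1.0 means every cell is new to the principal, and 0.0 means the query revealed nothing the principal had not already seen; a real metric would also weigh which attributes and values are sensitive.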

    On the detection of privacy and security anomalies

    Data analytics over the personal data being generated has the potential to yield meaningful insights into trends and predictions, for instance disease outbreak prediction, and it enables data-driven decision making in contemporary organisations. Predominantly, the collected personal data is managed, stored, and accessed using a Database Management System (DBMS) by insiders such as employees of an organisation. One data security and privacy concern is the insider threat, where legitimate users of the system abuse the access privileges they hold. Insider threats come in two flavours: insider threats to data security (security attacks) and insider threats to data privacy (privacy attacks). An insider threat to data security means that an insider steals or leaks sensitive personal information. An insider threat to data privacy arises when the insider maliciously accesses information in a way that violates an individual’s privacy, for instance by browsing through customers’ bank account balances or by attempting to narrow down and re-identify the individual with the highest salary. Much past work has addressed the detection of security attacks by insiders using behavioural-based anomaly detection approaches. This dissertation examines to what extent these kinds of techniques can be used to detect privacy attacks by insiders. It proposes approaches for modelling insider querying behaviour that consider sequence- and frequency-based correlations in order to identify anomalous correlations between SQL queries in the querying behaviour of a malicious insider. A behavioural-based anomaly detection approach using n-grams is proposed that models querying behaviour from sequences of SQL queries. The results demonstrate the effectiveness of detecting malicious insiders’ accesses to the DBMS as anomalies based on query correlations.
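A minimal sketch of n-gram-based anomaly detection over query sequences, assuming SQL queries have already been normalized into template labels (the labels below, such as `view_balance`, are hypothetical). Training counts the n-grams seen in sessions assumed to be normal; a new session is scored by the fraction of its n-grams that never occurred in training. The dissertation's actual models and thresholds are not reproduced here.

```python
from collections import Counter


def ngrams(seq, n):
    """Contiguous n-grams of a query sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def train(sessions, n=3):
    """Count n-grams over sessions assumed to reflect normal behaviour."""
    counts = Counter()
    for session in sessions:
        counts.update(ngrams(session, n))
    return counts


def anomaly_score(session, counts, n=3):
    """Fraction of the session's n-grams never seen in training (0.0 = normal)."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if counts[g] == 0)
    return unseen / len(grams)


# Hypothetical query-template labels standing in for normalized SQL queries.
normal_sessions = [
    ["login", "view_account", "update_address", "logout"],
    ["login", "view_account", "logout"],
]
model = train(normal_sessions, n=2)
print(anomaly_score(["login", "view_account", "logout"], model, n=2))        # → 0.0
print(anomaly_score(["login", "view_balance", "view_balance"], model, n=2))  # → 1.0
```

Scoring by unseen n-grams is the simplest variant; frequency-weighted scores or a threshold tuned on held-out normal sessions are natural refinements.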
This dissertation also models normative behaviour from a DBMS perspective and proposes a record/DBMS-oriented approach that considers frequency-based correlations to detect potentially malicious insiders’ accesses as anomalies. Additionally, it investigates modelling malicious insider SQL querying behaviour as rare behaviour, considering sequence- and frequency-based correlations through (frequent and rare) item-set mining. The dissertation proposes the notion of ‘Privacy-Anomaly Detection’ and considers whether behavioural-based anomaly detection approaches can have a privacy-semantic interpretation, and whether the detected anomalies can be related to conventional (formal) privacy definitions such as k-anonymity and the discrimination rate privacy metric. It considers privacy attacks (violations of a formal privacy definition) based on sequences of SQL queries (query correlations) and shows that interactive querying settings are vulnerable to such attacks. It then investigates whether these privacy attacks can manifest themselves as anomalies, specifically as privacy-anomalies. One result is that privacy attacks can be detected as privacy-anomalies by applying n-gram-based behavioural anomaly detection over the logs of interactive querying mechanisms.
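The vulnerability of interactive querying to query correlation can be illustrated with a classic differencing attack, sketched here on a hypothetical four-row table (not data from the dissertation). Each aggregate query covers at least two records, yet the difference of the two answers isolates a single record, i.e. a group that is only 1-anonymous over the assumed quasi-identifiers `dept` and `age_band`. The helper `k_anonymity` simply returns the smallest equivalence-class size.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifier columns."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(classes.values()) if classes else 0


# Hypothetical toy table; dept and age_band act as quasi-identifiers.
employees = [
    {"dept": "R&D", "age_band": "30-39", "salary": 70},
    {"dept": "R&D", "age_band": "30-39", "salary": 72},
    {"dept": "R&D", "age_band": "40-49", "salary": 95},
    {"dept": "Sales", "age_band": "30-39", "salary": 60},
]


def query_set(predicate):
    """Rows a query with this WHERE-style predicate would aggregate over."""
    return [r for r in employees if predicate(r)]


def in_rd(r):
    return r["dept"] == "R&D"


def in_rd_30s(r):
    return r["dept"] == "R&D" and r["age_band"] == "30-39"


q1_rows, q2_rows = query_set(in_rd), query_set(in_rd_30s)
# Each query aggregates over at least two records, so a "minimum of 2 rows"
# rule would let both through.
assert len(q1_rows) >= 2 and len(q2_rows) >= 2
# Correlating the two SUM(salary) answers reveals one person's salary...
leaked = sum(r["salary"] for r in q1_rows) - sum(r["salary"] for r in q2_rows)
# ...because the implied set difference is only 1-anonymous.
diff = [r for r in q1_rows if r not in q2_rows]
print(leaked, k_anonymity(diff, ["dept", "age_band"]))  # → 95 1
```

Neither query is suspicious on its own; it is the pair that violates the formal definition, which is why the detection approach described above looks at sequences of queries rather than single queries.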

    Thinking interactively with visualization

    Interaction is becoming an integral part of using visualization for analysis. When interaction is tightly and appropriately coupled with visualization, it can transform the visualization from a display of static imagery into a tool that assists comprehensive analysis of data at all scales. Designing systems in which the two components complement each other requires a deeper understanding of the role of interaction, its effects, and how visualization relates to interaction. This thesis approaches interaction in visualization from three perspectives. First, it considers the cost of maintaining interactivity when manipulating visualizations of large datasets: large datasets often require simplification for the visualization to remain interactive, and this thesis examines how simplification affects the resulting visualization. Second, it presents example interactive visual analytical systems to demonstrate how interactivity can be applied in visualization; specifically, four fully developed systems for four distinct problem domains are discussed to identify the common role of interactivity that makes these systems successful. Lastly, it presents evidence that interaction is important for analytical tasks performed with visualizations: interaction logs of financial analysts using a visualization were collected, coded, and examined to determine the number of analysis strategies contained within them. The findings support the benefits of high interactivity in analytical tasks performed with a visualization. The example visualizations used to support these three perspectives are diverse in their goals and features; however, they all share similar design guidelines and visualization principles.
Based on their characteristics, this thesis groups these visualizations into urban visualization, visual analytical systems, and interaction capturing, and discusses each group separately in terms of lessons learned and future directions.

    Differential Privacy - A Balancing Act

    Data privacy is an ever-important aspect of data analysis. Historically, a plethora of privacy techniques have been introduced to protect data, but few have stood the test of time. From investigating the overlap between big data research and security and privacy research, I have found that differential privacy presents itself as a promising defender of data privacy. Differential privacy is a rigorous, mathematical notion of privacy. Nevertheless, privacy comes at a cost: to achieve differential privacy, we need to introduce some form of inaccuracy (i.e., error) into our analyses. Hence, practitioners need to engage in a balancing act between accuracy and privacy when adopting differential privacy, and understanding this accuracy/privacy trade-off is vital to using differential privacy in real data analyses. In this thesis, I aim to bridge the gap between differential privacy in theory and differential privacy in practice. Most notably, I aim to convey a better understanding of the accuracy/privacy trade-off by 1) implementing tools to tune accuracy/privacy in a real use case, 2) presenting a methodology for empirically predicting error, and 3) systematizing and analyzing known accuracy improvement techniques for differentially private algorithms. Additionally, I put differential privacy into context by investigating how it can be applied in the automotive domain. Using the automotive domain as an example, I introduce the main challenges that constitute the balancing act and provide advice for moving forward.
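The accuracy/privacy trade-off can be made concrete with the standard Laplace mechanism for a counting query (sensitivity 1), which adds noise with scale 1/ε. This is a textbook mechanism, not the thesis's tooling; the function names and the true count of 100 are assumptions for illustration. Laplace noise is sampled as the difference of two i.i.d. exponential variates, which is Laplace-distributed.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d.
    exponential variates with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: noise scale = sensitivity / epsilon (1/epsilon for counts)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, true_count=100, trials=20000, seed=0):
    """Empirical mean absolute error of the noisy count at a given epsilon."""
    rng = random.Random(seed)
    total = sum(abs(private_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials


for eps in (0.1, 0.5, 1.0):
    print(eps, round(mean_abs_error(eps), 2))
```

For Laplace noise with scale b the expected absolute error is exactly b, so the printed errors should come out close to 1/ε (roughly 10, 2, and 1): a smaller ε gives stronger privacy and a larger error.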

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    This book explores the evolution of big data by interrogating the notion that big data is a disruptive innovation that appears to challenge existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data, such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences.

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responses in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and the choice of classifier to apply in a real-time clinical setting. This extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, a recommended detection methodology to employ in a practical setting, and an extension of the notion of a mixture of experts to a larger class of algorithms.
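A minimal sketch of how an interval- and amplitude-based feature vector might be assembled for one beat, assuming R-peak positions have already been detected. The abstract does not give the exact features, window length, or number of samples; the pre- and post-RR intervals, their ratio, the R amplitude, and four samples from a ±100 ms window are assumptions (a premature ventricular beat typically shows a short pre-RR interval followed by a longer post-RR pause). For reference, MIT-BIH records are sampled at 360 Hz; resampling the signal before this step is one way to study different sampling rates.

```python
def beat_features(signal, r_peaks, i, fs, n_samples=4, half_window_s=0.1):
    """Feature vector for beat i (hypothetical layout):
    [pre-RR (s), post-RR (s), pre/post RR ratio, R amplitude, *window samples]."""
    if not 0 < i < len(r_peaks) - 1:
        raise ValueError("beat i needs a previous and a next R peak")
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    r = r_peaks[i]
    pre_rr = (r - r_peaks[i - 1]) / fs
    post_rr = (r_peaks[i + 1] - r) / fs
    # Evenly spaced samples from a window of +/- half_window_s around the R peak,
    # clipped to the signal boundaries.
    w = round(half_window_s * fs)
    start, stop = max(0, r - w), min(len(signal) - 1, r + w)
    step = (stop - start) / (n_samples - 1)
    samples = [signal[start + round(k * step)] for k in range(n_samples)]
    return [pre_rr, post_rr, pre_rr / post_rr, signal[r]] + samples


# Synthetic example: 100 Hz signal, beat 2 arrives early (short pre-RR, long post-RR).
fs = 100
signal = [0.0] * 1000
signal[260] = 1.5
r_peaks = [100, 200, 260, 400]
print(beat_features(signal, r_peaks, 2, fs))
```

The resulting vectors could then be fed to any of the classifiers being compared.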

    Geospatial Information Research: State of the Art, Case Studies and Future Perspectives

    Geospatial information science (GI science) is concerned with the development and application of geodetic and information science methods for modeling, acquiring, sharing, managing, exploring, analyzing, synthesizing, visualizing, and evaluating data on spatio-temporal phenomena related to the Earth. As an interdisciplinary scientific discipline, it focuses on developing and adapting information technologies to understand processes on the Earth and human-place interactions, to detect and predict trends and patterns in the observed data, and to support decision making. The authors are members of the Geoinformatics division of DGK, the Committee on Geodesy of the Bavarian Academy of Sciences and Humanities, which represents geodetic research and university teaching in Germany. They have prepared this paper as a means to point out future research questions and directions in geospatial information science. For each facet of geospatial information science, the state of the art is presented and supported mostly by the authors' own case studies. The paper thus illustrates the contributions the German GI community makes and the research perspectives that arise in geospatial information science. It further demonstrates that GI science, with its expertise in data acquisition and interpretation, information modeling and management, integration, decision support, visualization, and dissemination, can help solve many of the grand challenges facing society today and in the future.

    An evaluation of the challenges of Multilingualism in Data Warehouse development

    In this paper we discuss Business Intelligence and define what is meant by support for multilingualism in a Business Intelligence reporting context. We identify support for multilingualism as a challenging issue with implications for data warehouse design and reporting performance. Data warehouses are a core component of most Business Intelligence systems, and the star schema is the approach most widely used to develop data warehouses and dimensional data marts. We discuss how multilingualism can be supported in the star schema and identify serious limitations of current approaches, including data redundancy, data manipulation, performance, and maintenance issues. We propose a new approach to enable the optimal application of multilingualism in Business Intelligence. The proposed approach was found to produce satisfactory results in a proof-of-concept environment. Future work will include testing the approach in an enterprise environment.
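The abstract does not describe the proposed approach, so the following only sketches one common way to support multilingual labels in a star schema: language-specific descriptions move into a separate table keyed by (product_key, lang), so fact rows are never duplicated per language, and reports fall back to a default language when a translation is missing. The table and column names are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE dim_product_text (           -- one row per product and language
    product_key INTEGER REFERENCES dim_product(product_key),
    lang TEXT,
    name TEXT,
    PRIMARY KEY (product_key, lang)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
"""


def build(conn):
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "A-100"), (2, "B-200")])
    conn.executemany("INSERT INTO dim_product_text VALUES (?, ?, ?)",
                     [(1, "en", "Chair"), (1, "de", "Stuhl"),
                      (2, "en", "Table")])  # no German name for product 2 yet
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 20.0)])


def sales_by_product(conn, lang, fallback="en"):
    """Revenue per product labelled in `lang`, falling back to `fallback`."""
    return conn.execute("""
        SELECT COALESCE(t.name, f.name) AS label, SUM(s.amount)
        FROM fact_sales AS s
        LEFT JOIN dim_product_text AS t
               ON t.product_key = s.product_key AND t.lang = ?
        LEFT JOIN dim_product_text AS f
               ON f.product_key = s.product_key AND f.lang = ?
        GROUP BY s.product_key, t.name, f.name
        ORDER BY s.product_key
    """, (lang, fallback)).fetchall()


conn = sqlite3.connect(":memory:")
build(conn)
print(sales_by_product(conn, "de"))  # → [('Stuhl', 15.0), ('Table', 20.0)]
```

Because each join matches at most one translation row per product, the joins do not multiply fact rows, and adding a language only adds rows to the translation table.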

    Civil Good - A Platform For Sustainable and Inclusive Online Discussion

    Civil Good is a website concept proposed by Alan Mandel with the goal of enabling safe, anonymous, productive, and civil discourse without the disruptive behavior and language common to much of the Internet. Civil Good aims to improve the critical thinking and discussion skills of its users while combating the effects of political polarization and misinformation in society. This paper analyzes Mandel's proposed concept, providing additional research to support or refute the various proposed features, along with recommendations to simplify user interactions. It also examines topics mentioned only briefly or not at all by Mandel, such as data protection methods, the psychology of Web browsing, marketing, operational costs, legal issues, monetization options, and mobile presence.