707 research outputs found

    A Semantic Framework for the Analysis of Privacy Policies

    Get PDF

    JURI SAYS: An Automatic Judgement Prediction System for the European Court of Human Rights

    Get PDF
    In this paper we present JURI SAYS, a web platform that automatically predicts decisions of the European Court of Human Rights from communicated cases, which the court publishes early in the proceedings, often many years before the final decision is made. Our system therefore predicts future judgements of the court. The platform is available at jurisays.com and shows its predictions alongside the actual decisions of the court. It is updated automatically every month with predictions for the new cases. Additionally, the system highlights the sentences and paragraphs that are most important for the prediction (i.e. violation vs. no violation of human rights).

    Owl ontology quality assessment and optimization in the cybersecurity domain

    Get PDF
    The purpose of this dissertation is to assess the quality of OWL ontologies in the cybersecurity domain. A content analysis across ontologies indicated that differences were more pronounced among OWL ontologies in the cybersecurity field. Results showed an increase in relevance from expressivity to variability. Additionally, no differences were found in the strategies used in most of the incidents. The background of an ontology needs to be emphasized in order to understand its quality. Ontologies are a means of representing an area of knowledge through their semantic structure; they facilitate the search for information and the integration of data from different origins, since they provide a common base that guarantees the coherence of the data, which can be categorized and described in a normative way. Unifying information with the world that surrounds us makes it possible to create synergies between entities and relationships. However, cybersecurity is a real-world domain in which knowledge is uncertain, so it is necessary to analyze the challenges of choosing an appropriate representation of unstructured information. Vulnerabilities are identified, but incident response is not an automatic mechanism for understanding and processing unstructured text found on the web.

    Knowledge Components and Methods for Policy Propagation in Data Flows

    Get PDF
    Data-oriented systems and applications are at the centre of current developments of the World Wide Web (WWW). On the Web of Data (WoD), information sources can be accessed and processed for many purposes. Users need to be aware of any licences or terms of use associated with the data sources they want to use. Conversely, publishers need support in assigning the appropriate policies alongside the data they distribute. In this work, we tackle the problem of policy propagation in data flows - an expression that refers to the way data is consumed, manipulated and produced within processes. We pose the question of what kind of components are required, and how they can be acquired, managed, and deployed, to support users in deciding what policies propagate to the output of a data-intensive system from those associated with its input. We observe three scenarios: applications of the Semantic Web, workflow reuse in Open Science, and the exploitation of urban data in City Data Hubs. Starting from the analysis of Semantic Web applications, we propose a data-centric approach to semantically describe processes as data flows: the Datanode ontology, which comprises a hierarchy of the possible relations between data objects. By means of Policy Propagation Rules, it is possible to link data flow steps and policies derivable from semantic descriptions of data licences. We show how these components can be designed, how they can be effectively managed, and how to reason efficiently with them. In a second phase, the developed components are verified using a Smart City Data Hub as a case study, for which we developed an end-to-end solution for policy propagation. Finally, we evaluate our approach and report on a user study aimed at assessing both the quality and the value of the proposed solution.
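The propagation idea described above can be sketched in a few lines. Note that the relation names, policies, and rules below are invented for illustration; they are not the actual Datanode ontology or the thesis's rule set:

```python
# Hypothetical sketch of policy propagation over a data flow.
# Each flow step relates an input data object to an output data object
# via a (made-up) relation in the spirit of the Datanode ontology.
FLOW = [
    ("raw_data", "processedInto", "clean_data"),
    ("clean_data", "combinedWith", "report"),
]

# Policy propagation rules: which relations carry a given policy downstream.
PROPAGATING_RELATIONS = {
    "attribution-required": {"processedInto", "combinedWith"},
    "no-derivatives": {"combinedWith"},
}

def propagate(policies_by_node, flow, rules):
    """Push input policies along the flow wherever a rule allows it.

    Assumes `flow` is listed in topological order, so one pass suffices.
    """
    result = {node: set(ps) for node, ps in policies_by_node.items()}
    for src, relation, dst in flow:
        for policy in result.get(src, set()):
            if relation in rules.get(policy, set()):
                result.setdefault(dst, set()).add(policy)
    return result

# The licence on the raw input propagates through both steps to the report.
out = propagate({"raw_data": {"attribution-required"}}, FLOW, PROPAGATING_RELATIONS)
```

In a real system the rules would be derived from semantic descriptions of data licences rather than hard-coded, but the reasoning step has this shape: match each flow relation against the policies attached to its input.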

    Data-Driven, Personalized Usable Privacy

    Get PDF
    We live in the "inverse-privacy" world, where service providers derive insights from users' data that the users themselves do not even know about. This has been fueled by advances in machine learning, which allow providers to go beyond superficial analysis of users' transactions to deep inspection of users' content. Users have been facing several problems in coping with this widening information discrepancy. Although the interfaces of apps and websites are generally equipped with privacy indicators (e.g., permissions, policies, ...), these have not been enough to counter it. We identify three gaps that have hindered the effectiveness and usability of privacy indicators:
    - Scale adaptation: The scale at which service providers collect data has been growing on multiple fronts, while users have limited time, effort, and technological resources to cope with it.
    - Risk communication: Although providers use privacy indicators to announce what and (less often) why they need particular pieces of information, they rarely relay what can potentially be inferred from this data. Without this knowledge, users are less equipped to make informed decisions when they sign in to a site or install an application.
    - Language complexity: The information practices of service providers are buried in complex, long privacy policies. Users generally lack the time, and sometimes the skills, to decipher such policies, even when they are interested in particular parts of them.
    In this thesis, we approach usable privacy from a data perspective. Instead of static privacy interfaces that are obscure, recurring, or unreadable, we develop techniques that bridge the understanding gap between users and service providers. Towards that, we make the following contributions:
    - Crowdsourced, data-driven privacy decision-making: To combat the growing scale of data exposure, we consider the context of files uploaded to cloud services. We propose C3P, a framework for automatically assessing the sensitivity of files, thus enabling real-time, fine-grained policy enforcement on top of unstructured data.
    - Data-driven app privacy indicators: We introduce PrivySeal, a new paradigm of dynamic, personalized app privacy indicators that bridge the risk-understanding gap between users and providers. Through PrivySeal's online platform, we also study the emerging problem of interdependent privacy in the context of cloud apps and provide a usable privacy indicator to mitigate it.
    - Automated question answering about privacy practices: We introduce PriBot, the first automated question-answering system for privacy policies, which lets users pose questions about the privacy practices of any company in their own words. Through a user study, we show its effectiveness at achieving high accuracy and relevance for users, thus narrowing the complexity gap in navigating privacy policies.
    A core aim of this thesis is paving the road for a future where privacy indicators are not bound to a specific medium or pre-scripted wording. We design and develop techniques that enable privacy to be communicated effectively in an interface that is approachable to the user. To that end, we go beyond textual interfaces to enable dynamic, visual, and hands-free privacy interfaces fit for the variety of emerging technologies.

    Doctor of Philosophy

    Get PDF
    Biomedical data are a rich source of information and knowledge. Not only are they useful for direct patient care, but they may also offer answers to important population-based questions. Creating an environment where advanced analytics can be performed against biomedical data is nontrivial, however. Biomedical data are currently scattered across multiple systems with heterogeneous data, and integrating these data is a bigger task than humans can realistically do by hand; therefore, automatic biomedical data integration is highly desirable but has never been fully achieved. This dissertation introduces new algorithms devised to support automatic and semiautomatic integration of heterogeneous biomedical data. The algorithms combine data mining and biomedical informatics techniques to create "concept bags" that are used to compute similarity between data elements in the same way that "word bags" are compared in data mining. Concept bags are composed of controlled medical vocabulary concept codes extracted from text using named-entity recognition software. To test the new algorithms, three biomedical text similarity use cases were examined: automatically aligning data elements between heterogeneous data sets, determining degrees of similarity between medical terms using a published benchmark, and determining similarity between ICU discharge summaries. The method is highly configurable, and five different versions were tested. The concept bag method performed particularly well at aligning data elements, outperforming the compared algorithms by more than 5%. Another configuration that included hierarchical semantics performed particularly well at matching medical terms, meeting or exceeding 30 of 31 other published results on the same benchmark. Results for the third scenario, computing ICU discharge summary similarity, were less successful: correlations between the methods were low, including between terminologists.
The concept bag algorithms performed consistently and comparatively well and appear to be viable options for multiple scenarios. New applications of the method and ideas for improving the algorithm are under discussion for future work, including several performance enhancements, configuration-based enhancements, and concept vector weighting using the TF-IDF formulas.
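The core comparison step can be sketched as follows. The concept codes and IDF weights below are made up for illustration; in the dissertation, codes would come from named-entity recognition against a controlled medical vocabulary, and the exact similarity measures may differ:

```python
# Illustrative sketch of "concept bag" similarity: data elements become
# bags of concept codes and are compared like word bags in data mining.
from collections import Counter
import math

def jaccard(bag_a, bag_b):
    """Set-overlap similarity between two bags of concept codes."""
    a, b = set(bag_a), set(bag_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_tfidf(bag_a, bag_b, idf):
    """Cosine similarity between concept bags with TF-IDF weighting."""
    ta, tb = Counter(bag_a), Counter(bag_b)
    wa = {c: n * idf.get(c, 0.0) for c, n in ta.items()}
    wb = {c: n * idf.get(c, 0.0) for c, n in tb.items()}
    dot = sum(wa[c] * wb.get(c, 0.0) for c in wa)
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two data elements as bags of (hypothetical) concept codes.
elem1 = ["C0020538", "C0011849"]
elem2 = ["C0020538", "C0027051"]
idf = {"C0020538": 1.0, "C0011849": 2.0, "C0027051": 2.0}
sim_set = jaccard(elem1, elem2)               # 1 shared code out of 3
sim_weighted = cosine_tfidf(elem1, elem2, idf)
```

The hierarchical-semantics configuration mentioned above would additionally expand each bag with ancestor concepts before comparison, so that related but non-identical codes contribute to the overlap.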

    The Application of Semantic Web Technologies to Content Analysis in Sociology

    Get PDF
    In sociology, texts are understood as social phenomena that can serve as a means of analyzing social reality. Over the years, a broad range of techniques has developed in sociological text analysis, including quantitative and qualitative methods as well as fully manual and computer-assisted approaches. The growth of the World Wide Web and social media, together with technical advances such as machine handwriting and speech recognition, has led to an enormous increase in the amount of text available for analysis. In recent years, this has prompted sociologists to rely increasingly on computer-assisted approaches to text analysis, such as statistical natural language processing (NLP) techniques. Yet although versatile methods and technologies have been developed for sociological text analysis, uniform standards for analyzing and publishing textual data are lacking. This problem also undermines the transparency of analysis processes and the reusability of research data. The Semantic Web, and with it Linked Data, offers a set of standards for representing and organizing information and knowledge. These standards are used by numerous applications, including methods for publishing data and named entity linking, a particular form of NLP. This thesis discusses the extent to which these standards and tools from the Semantic Web and Linked Data community can support computer-assisted text analysis in sociology. The necessary technologies are briefly introduced and then applied to an example data set consisting of constitutional texts of the Netherlands from 1883 to 2016. It is demonstrated how RDF data can be generated from the documents, published, and accessed.
Queries are first constructed over the local data alone, and it is then demonstrated how this local knowledge can be enriched with information from external knowledge bases. The presented approaches are discussed in detail, and points of contact are identified where sociologists could engage with the Semantic Web community to extend the analyses and query capabilities presented here in the future.
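The local-query-then-enrich workflow can be sketched without any RDF tooling. The triples, prefixes, and external entries below are invented for illustration; a real pipeline would publish RDF and use SPARQL, possibly with federated queries against a knowledge base such as DBpedia:

```python
# Minimal sketch: triples as (subject, predicate, object) tuples, a tiny
# pattern matcher standing in for SPARQL, and a mock external knowledge base.
LOCAL = [
    ("const:Article3", "dct:language", "nl"),
    ("const:Article3", "dct:issued", "1983"),
    ("const:Article3", "schema:about", "dbpedia:Suffrage"),
]

# Stand-in for an external knowledge base keyed by linked entity.
EXTERNAL = {"dbpedia:Suffrage": {"rdfs:label": "Suffrage"}}

def query(triples, s=None, p=None, o=None):
    """Return triples matching an (s, p, o) pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Local query: everything stated about Article 3 in our own data.
local_facts = query(LOCAL, s="const:Article3")

# Enrichment: follow entity links out into the external knowledge base.
enriched = [(o, k, v)
            for _, _, o in query(LOCAL, p="schema:about")
            for k, v in EXTERNAL.get(o, {}).items()]
```

The point of the thesis's approach is that once documents are expressed as triples, the same query mechanism serves both the local corpus and external Linked Data sources.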

    Abstracts: HASTAC 2017: The Possible Worlds of Digital Humanities

    Get PDF
    The document contains abstracts for HASTAC 2017.

    Interdisciplining Digital Humanities: Boundary Work in an Emerging Field

    Get PDF
    The first book to test the claim that the emerging field of Digital Humanities is interdisciplinary; it also examines the boundary work of establishing and sustaining a new field of study.