537 research outputs found

    Active document enrichment using adaptive information extraction from text

    Get PDF
    The traditional process of document annotation for knowledge identification and extraction in the Semantic Web (SW) is complex and time consuming, as it requires manual annotation by domain experts. There is currently a strong interest in Text Mining technologies (and in particular in Human Language-based Technologies), for reducing the burden of text annotation for Knowledge Management (KM). In this poster we present Melita, an annotation interface that uses Adaptive Information Extraction from texts for reducing the burden of text annotation.peer-reviewe

    Timely and nonintrusive active document annotation via adaptive information extraction

    Get PDF
    The current work has been carried on in the framework of the AKT project (Advanced Knowledge Technologies, http://www.aktors.org), an Interdisciplinary Research Collaboration (IRC) sponsored by the UK Engineering and Physical Sciences Research Council (grant GR/N15764/01). AKT involves the Universities of Aberdeen, Edinburgh, Sheffield, Southampton and the Open University (www.aktors.org). AKT is a multimillion pound six year research project that started in 2000.The process of document annotation for the Semantic Web is complex and time consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some of the most recent systems for actively supporting users in the process and reducing the burden of annotation. The integration of IE systems in annotation tools is quite a new development and in our opinion there is still the necessity of thinking the impact of the IE system in the process of annotation. In this paper we discuss two main requirements for active annotation: timeliness and tuning of intrusiveness. Then we present and discuss a model of interaction that addresses the two issues and Melita, an annotation framework that implements such methodology.peer-reviewe

    An Expressive Model for the Web Infrastructure: Definition and Application to the BrowserID SSO System

    Full text link
    The web constitutes a complex infrastructure and as demonstrated by numerous attacks, rigorous analysis of standards and web applications is indispensable. Inspired by successful prior work, in particular the work by Akhawe et al. as well as Bansal et al., in this work we propose a formal model for the web infrastructure. While unlike prior works, which aim at automatic analysis, our model so far is not directly amenable to automation, it is much more comprehensive and accurate with respect to the standards and specifications. As such, it can serve as a solid basis for the analysis of a broad range of standards and applications. As a case study and another important contribution of our work, we use our model to carry out the first rigorous analysis of the BrowserID system (a.k.a. Mozilla Persona), a recently developed complex real-world single sign-on system that employs technologies such as AJAX, cross-document messaging, and HTML5 web storage. Our analysis revealed a number of very critical flaws that could not have been captured in prior models. We propose fixes for the flaws, formally state relevant security properties, and prove that the fixed system in a setting with a so-called secondary identity provider satisfies these security properties in our model. The fixes for the most critical flaws have already been adopted by Mozilla and our findings have been rewarded by the Mozilla Security Bug Bounty Program.Comment: An abridged version appears in S&P 201

    Larger Residuals, Less Work: Active Document Scheduling for Latent Dirichlet Allocation

    Full text link

    Analyzing the BrowserID SSO System with Primary Identity Providers Using an Expressive Model of the Web

    Full text link
    BrowserID is a complex, real-world Single Sign-On (SSO) System for web applications recently developed by Mozilla. It employs new HTML5 features (such as web messaging and web storage) and cryptographic assertions to provide decentralized login, with the intent to respect users' privacy. It can operate in a primary and a secondary identity provider mode. While in the primary mode BrowserID runs with arbitrary identity providers (IdPs), in the secondary mode there is one IdP only, namely Mozilla's default IdP. We recently proposed an expressive general model for the web infrastructure and, based on this web model, analyzed the security of the secondary IdP mode of BrowserID. The analysis revealed several severe vulnerabilities. In this paper, we complement our prior work by analyzing the even more complex primary IdP mode of BrowserID. We do not only study authentication properties as before, but also privacy properties. During our analysis we discovered new and practical attacks that do not apply to the secondary mode: an identity injection attack, which violates a central authentication property of SSO systems, and attacks that break an important privacy promise of BrowserID and which do not seem to be fixable without a major redesign of the system. Some of our attacks on privacy make use of a browser side channel that has not gained a lot of attention so far. For the authentication bug, we propose a fix and formally prove in a slight extension of our general web model that the fixed system satisfies all the requirements we consider. This constitutes the most complex formal analysis of a web application based on an expressive model of the web infrastructure so far. As another contribution, we identify and prove important security properties of generic web features in the extended web model to facilitate future analysis efforts of web standards and web applications.Comment: arXiv admin note: substantial text overlap with arXiv:1403.186

    The Web SSO Standard OpenID Connect: In-Depth Formal Security Analysis and Security Guidelines

    Full text link
    Web-based single sign-on (SSO) services such as Google Sign-In and Log In with Paypal are based on the OpenID Connect protocol. This protocol enables so-called relying parties to delegate user authentication to so-called identity providers. OpenID Connect is one of the newest and most widely deployed single sign-on protocols on the web. Despite its importance, it has not received much attention from security researchers so far, and in particular, has not undergone any rigorous security analysis. In this paper, we carry out the first in-depth security analysis of OpenID Connect. To this end, we use a comprehensive generic model of the web to develop a detailed formal model of OpenID Connect. Based on this model, we then precisely formalize and prove central security properties for OpenID Connect, including authentication, authorization, and session integrity properties. In our modeling of OpenID Connect, we employ security measures in order to avoid attacks on OpenID Connect that have been discovered previously and new attack variants that we document for the first time in this paper. Based on these security measures, we propose security guidelines for implementors of OpenID Connect. Our formal analysis demonstrates that these guidelines are in fact effective and sufficient.Comment: An abridged version appears in CSF 2017. Parts of this work extend the web model presented in arXiv:1411.7210, arXiv:1403.1866, arXiv:1508.01719, and arXiv:1601.0122

    Mapping and Displaying Structural Transformations between XML and PDF

    Get PDF
    Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving. Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'. This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window. Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document

    Study of Tools Interoperability

    Get PDF
    Interoperability of tools usually refers to a combination of methods and techniques that address the problem of making a collection of tools to work together. In this study we survey different notions that are used in this context: interoperability, interaction and integration. We point out relation between these notions, and how it maps to the interoperability problem. We narrow the problem area to the tools development in academia. Tools developed in such environment have a small basis for development, documentation and maintenance. We scrutinise some of the problems and potential solutions related with tools interoperability in such environment. Moreover, we look at two tools developed in the Formal Methods and Tools group1, and analyse the use of different integration techniques
    • ā€¦
    corecore