2,901 research outputs found

    ATLAS: A flexible and extensible architecture for linguistic annotation

    Full text link
    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on ``Annotation Graphs,'' a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic ``signals,'' including both naturally occuring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure

    A Web-Based Tool for Analysing Normative Documents in English

    Full text link
    Our goal is to use formal methods to analyse normative documents written in English, such as privacy policies and service-level agreements. This requires the combination of a number of different elements, including information extraction from natural language, formal languages for model representation, and an interface for property specification and verification. We have worked on a collection of components for this task: a natural language extraction tool, a suitable formalism for representing such documents, an interface for building models in this formalism, and methods for answering queries asked of a given model. In this work, each of these concerns is brought together in a web-based tool, providing a single interface for analysing normative texts in English. Through the use of a running example, we describe each component and demonstrate the workflow established by our tool

    Proposed AIS Binary Message Format Using XML for Providing Hydrographic-related Information

    Get PDF
    UNH is working with the USCG and NOAA to use XML (Extensible Markup Language) to define binary messages for maritime-based AIS (Automatic Identification System). A draft specification format is under development that will enable hydrographic and maritime safety agencies to encode AIS message contents by providing a bit-level description in XML (informally known the AIS Binary Message Decoder Ring ). An AIS binary message definition in XML specifies the order, length, and type of fields following a subset of that used by the ITU-R.M.1371-1. The specification is independent of programming language (e.g., can be implemented in C, C++, C#, Java, Python, etc.) to allow vendors to integrate the system into their individual design requirements. The draft specification also contains a reference implementation of an AIS XML to Python compiler that has been released as open-source under the GNU General Public License (GPL) version 2. A XML schema and an additional program will provide validation of the XML message definitions. A XSLT style sheet produces reference documentation in ‘html’ format. Although the XML message definition file specifies the order, size, and type of the bit stream, it does not specify semantics or how binary messages should be displayed on a shipboard ECDIS, or presented on other shipboard/shore-side display devices

    Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

    Full text link
    Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines areas of common need among empirical linguists and computational linguists. After reviewing examples of data and tools used or under development for each of several areas, it proposes a common framework for future tool development, data annotation and resource sharing based upon annotation graphs and servers.Comment: 8 pages, 6 figure

    Reverse Proxy Framework using Sanitization Technique for Intrusion Prevention in Database

    Full text link
    With the increasing importance of the internet in our day to day life, data security in web application has become very crucial. Ever increasing on line and real time transaction services have led to manifold rise in the problems associated with the database security. Attacker uses illegal and unauthorized approaches to hijack the confidential information like username, password and other vital details. Hence the real time transaction requires security against web based attacks. SQL injection and cross site scripting attack are the most common application layer attack. The SQL injection attacker pass SQL statement through a web applications input fields, URL or hidden parameters and get access to the database or update it. The attacker take a benefit from user provided data in such a way that the users input is handled as a SQL code. Using this vulnerability an attacker can execute SQL commands directly on the database. SQL injection attacks are most serious threats which take users input and integrate it into SQL query. Reverse Proxy is a technique which is used to sanitize the users inputs that may transform into a database attack. In this technique a data redirector program redirects the users input to the proxy server before it is sent to the application server. At the proxy server, data cleaning algorithm is triggered using a sanitizing application. In this framework we include detection and sanitization of the tainted information being sent to the database and innovate a new prototype.Comment: 9 pages, 6 figures, 3 tables; CIIT 2013 International Conference, Mumba

    Document Image Analysis for World War II Personal Records

    No full text
    Complete collections of invaluable documents of unique historical and political significance are decaying and at the same time they are virtually inaccessible, necessitating the invention of robust and efficient methods for their conversion into a searchable electronic form. This paper presents the issues encountered and problems addressed in the MEMORIAL project, whose goal is the establishment of a digital document workbench enabling the creation of distributed virtual archives based on documents existing in libraries, archives, museums, memorials, and public record offices. Successful approaches are described in the context of the chosen data class: a variety of typewritten documents containing personal information relating to the presence of individuals in World War II Nazi concentration camps

    1st INCF Workshop on Sustainability of Neuroscience Databases

    Get PDF
    The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been approached. The recommendations for the INCF involve rating, ranking, and supporting database sustainability

    Multimodal Grammar Implementation

    Get PDF
    This paper reports on an implementation of a multimodal grammar of speech and co-speech gesture within the LKB/PET grammar engineering environment. The implementation extends the English Resource Grammar (ERG, Flickinger (2000)) with HPSG types and rules that capture the form of the linguistic signal, the form of the gestural signal and their relative timing to constrain the meaning of the multimodal action. The grammar yields a single parse tree that integrates the spoken and gestural modality thereby drawing on standard semantic composition techniques to derive the multimodal meaning representation. Using the current machinery, the main challenge for the grammar engineer is the nonlinear input: the modalities can overlap temporally. We capture this by identical speech and gesture token edges. Further, the semantic contribution of gestures is encoded by lexical rules transforming a speech phrase into a multimodal entity of conjoined spoken and gestural semantics.
    • 

    corecore