2,901 research outputs found

ATLAS: A flexible and extensible architecture for linguistic annotation

Author: Bird Steven
Day David
Garofolo John
Henderson John
Laprun Christophe
Liberman Mark
Publication date: 01/01/2000

We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals," including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.) and the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
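
The interval-indexed graph model described in this abstract can be illustrated with a minimal sketch. It assumes that nodes carry optional time offsets into a signal and that labeled arcs between nodes hold the annotations. The names `Node`, `Arc`, `AnnotationGraph` and `overlapping` are hypothetical and are not taken from the ATLAS API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A point in the signal; offset is None when the node is unanchored."""
    id: str
    offset: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """A typed, labeled annotation spanning two nodes."""
    start: Node
    end: Node
    type: str
    label: str


@dataclass
class AnnotationGraph:
    arcs: List[Arc] = field(default_factory=list)

    def add(self, start: Node, end: Node, type_: str, label: str) -> Arc:
        arc = Arc(start, end, type_, label)
        self.arcs.append(arc)
        return arc

    def overlapping(self, lo: float, hi: float,
                    type_: Optional[str] = None) -> List[Arc]:
        """Return anchored arcs whose interval intersects [lo, hi)."""
        hits = []
        for arc in self.arcs:
            if arc.start.offset is None or arc.end.offset is None:
                continue  # unanchored arcs cannot be matched by time
            if type_ is not None and arc.type != type_:
                continue
            if arc.start.offset < hi and arc.end.offset > lo:
                hits.append(arc)
        return hits


# Illustrative data: two word arcs and one speaker arc over the same signal.
n0, n1, n2 = Node("n0", 0.0), Node("n1", 0.42), Node("n2", 0.91)
graph = AnnotationGraph()
graph.add(n0, n1, "word", "hello")
graph.add(n1, n2, "word", "world")
graph.add(n0, n2, "speaker", "A")

words = [arc.label for arc in graph.overlapping(0.3, 0.5, type_="word")]
late_words = [arc.label for arc in graph.overlapping(0.5, 0.6, type_="word")]
```

Because several annotation layers (here, words and speaker turns) share the same nodes, a single interval query can return every layer that covers a stretch of the signal. A linear scan is used here for brevity; the abstract's remark about efficient database storage suggests an index over offsets would replace it in practice.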

arXiv.org e-Print Archive

CiteSeerX

A Web-Based Tool for Analysing Normative Documents in English

Author: Azzopardi Shaun
Mercatali Pietro
Prisacariu Cristian
Ranta Aarne
Wyner Adam
Publication date: 13/07/2017

Our goal is to use formal methods to analyse normative documents written in English, such as privacy policies and service-level agreements. This requires the combination of a number of different elements, including information extraction from natural language, formal languages for model representation, and an interface for property specification and verification. We have worked on a collection of components for this task: a natural language extraction tool, a suitable formalism for representing such documents, an interface for building models in this formalism, and methods for answering queries asked of a given model. In this work, these components are brought together in a web-based tool, providing a single interface for analysing normative texts in English. Through the use of a running example, we describe each component and demonstrate the workflow established by our tool.

arXiv.org e-Print Archive

Crossref

Chalmers Research

Proposed AIS Binary Message Format Using XML for Providing Hydrographic-related Information

Author: Alexander Lee
Kurt Schwehr
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/05/2007

UNH is working with the USCG and NOAA to use XML (Extensible Markup Language) to define binary messages for maritime-based AIS (Automatic Identification System). A draft specification format is under development that will enable hydrographic and maritime safety agencies to encode AIS message contents by providing a bit-level description in XML (informally known as the AIS Binary Message Decoder Ring). An AIS binary message definition in XML specifies the order, length, and type of fields, following a subset of those used by ITU-R M.1371-1. The specification is independent of programming language (e.g., it can be implemented in C, C++, C#, Java, Python, etc.) to allow vendors to integrate the system into their individual design requirements. The draft specification also contains a reference implementation of an AIS XML to Python compiler that has been released as open source under the GNU General Public License (GPL) version 2. An XML schema and an additional program will provide validation of the XML message definitions. An XSLT style sheet produces reference documentation in HTML format. Although the XML message definition file specifies the order, size, and type of the bit stream, it does not specify semantics or how binary messages should be displayed on a shipboard ECDIS or presented on other shipboard/shore-side display devices.
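
The idea of a bit-level message description in XML can be sketched as follows. This is an illustration only: the element and attribute names (`message`, `field`, `numberofbits`, `type`), the example message, and the `decode` function are assumptions made for this sketch, not the draft specification's schema. It also interprets the definition at run time, whereas the reference implementation mentioned in the abstract compiles the XML to Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical message definition: field order, bit length, and type.
MESSAGE_XML = """
<message name="waterlevel">
  <field name="station_id" numberofbits="10" type="uint"/>
  <field name="level_cm" numberofbits="12" type="int"/>
  <field name="valid" numberofbits="1" type="bool"/>
</message>
"""


def decode(bits: str, definition_xml: str) -> dict:
    """Decode a string of '0'/'1' characters using an XML field list."""
    root = ET.fromstring(definition_xml)
    values = {}
    pos = 0
    for spec in root.findall("field"):
        width = int(spec.get("numberofbits"))
        chunk = bits[pos:pos + width]
        if len(chunk) != width:
            raise ValueError("bit stream shorter than the message definition")
        raw = int(chunk, 2)
        kind = spec.get("type")
        if kind == "int" and chunk[0] == "1":
            raw -= 1 << width  # interpret as two's complement
        elif kind == "bool":
            raw = bool(raw)
        values[spec.get("name")] = raw
        pos += width
    return values


# Build a sample bit stream: station 37, level -15 cm, valid flag set.
bits = format(37, "010b") + format(-15 & 0xFFF, "012b") + "1"
decoded = decode(bits, MESSAGE_XML)
```

As in the abstract, the definition fixes only the order, size, and type of the fields; what `level_cm` means, or how it should be displayed, is outside its scope.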

UNH Scholars' Repository

Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

Author: Bird Steven
Cieri Christopher
Publication date: 01/01/2001

Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines areas of common need among empirical linguists and computational linguists. After reviewing examples of data and tools used or under development for each of several areas, it proposes a common framework for future tool development, data annotation and resource sharing based upon annotation graphs and servers. Comment: 8 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

Reverse Proxy Framework using Sanitization Technique for Intrusion Prevention in Database

Author: Chougule Archana
Mukhopadhyay Debajyoti
Randhe Vrushali
Publication date: 01/01/2013

With the increasing importance of the internet in our day-to-day life, data security in web applications has become crucial. Ever-increasing online and real-time transaction services have led to a manifold rise in the problems associated with database security. Attackers use illegal and unauthorized approaches to hijack confidential information such as usernames, passwords and other vital details. Hence real-time transactions require security against web-based attacks. SQL injection and cross-site scripting are the most common application-layer attacks. A SQL injection attacker passes SQL statements through a web application's input fields, URL or hidden parameters to gain access to the database or update it. The attacker exploits user-provided data in such a way that the user's input is handled as SQL code. Using this vulnerability, an attacker can execute SQL commands directly on the database. SQL injection attacks are among the most serious threats because they take the user's input and integrate it into a SQL query. Reverse Proxy is a technique used to sanitize user inputs that may transform into a database attack. In this technique, a data redirector program redirects the user's input to the proxy server before it is sent to the application server. At the proxy server, a data cleaning algorithm is triggered using a sanitizing application. In this framework we include detection and sanitization of the tainted information being sent to the database and introduce a new prototype. Comment: 9 pages, 6 figures, 3 tables; CIIT 2013 International Conference, Mumbai
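
The detect-then-sanitize step that the abstract places at the proxy server might look roughly like the sketch below. It is not the authors' data cleaning algorithm: the pattern lists are illustrative and far from complete, and the function names `is_tainted`, `sanitize` and `proxy_filter` are hypothetical.

```python
import html
import re

# Illustrative (incomplete) signatures of SQL injection attempts.
SQL_PATTERNS = [
    re.compile(r"(?i)\b(union\s+select|drop\s+table|insert\s+into|delete\s+from)\b"),
    re.compile(r"(?i)\bor\s+\d+\s*=\s*\d+"),   # tautologies such as OR 1=1
    re.compile(r"--|;|/\*|\*/"),               # comment markers, separators
    re.compile(r"'"),                          # unescaped string delimiter
]
# Illustrative signatures of cross-site scripting attempts.
XSS_PATTERN = re.compile(r"(?i)<\s*script|on\w+\s*=|javascript:")


def is_tainted(value: str) -> bool:
    """Detection: does the input match any known attack signature?"""
    return (any(p.search(value) for p in SQL_PATTERNS)
            or bool(XSS_PATTERN.search(value)))


def sanitize(value: str) -> str:
    """Sanitization: neutralize SQL metacharacters and HTML markup."""
    cleaned = re.sub(r"--|;|/\*|\*/", "", value)  # drop comments/separators
    cleaned = cleaned.replace("'", "''")          # escape single quotes
    return html.escape(cleaned, quote=False)      # escape &, <, >


def proxy_filter(params: dict):
    """Clean every request parameter before it reaches the application."""
    clean, report = {}, {}
    for name, value in params.items():
        report[name] = is_tainted(value)
        clean[name] = sanitize(value) if report[name] else value
    return clean, report


clean, report = proxy_filter({"user": "alice", "password": "' OR 1=1 --"})
```

In this sketch the proxy forwards `clean` to the application server and can log `report` for auditing. Filtering of this kind complements parameterized queries in the application itself and does not replace them.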

arXiv.org e-Print Archive

Crossref

Document Image Analysis for World War II Personal Records

Author: Antonacopoulos Apostolos
Karatzas Dimosthenis
Publication date: 01/01/2004

Complete collections of invaluable documents of unique historical and political significance are decaying and, at the same time, are virtually inaccessible, necessitating the invention of robust and efficient methods for their conversion into a searchable electronic form. This paper presents the issues encountered and problems addressed in the MEMORIAL project, whose goal is the establishment of a digital document workbench enabling the creation of distributed virtual archives based on documents existing in libraries, archives, museums, memorials, and public record offices. Successful approaches are described in the context of the chosen data class: a variety of typewritten documents containing personal information relating to the presence of individuals in World War II Nazi concentration camps.

CiteSeerX

Southampton (e-Prints Soton)

1st INCF Workshop on Sustainability of Neuroscience Databases

Author: Jaap van Pelt
Jack Van Horn
Publication date: 17/06/2008

The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been addressed. The recommendations for the INCF involve rating, ranking, and supporting database sustainability.

Crossref

Nature Precedings

Multimodal Grammar Implementation

Author: Alahverdzhieva Katya
Flickinger Dan
Lascarides Alex
Publication date: 01/01/2012

This paper reports on an implementation of a multimodal grammar of speech and co-speech gesture within the LKB/PET grammar engineering environment. The implementation extends the English Resource Grammar (ERG, Flickinger (2000)) with HPSG types and rules that capture the form of the linguistic signal, the form of the gestural signal and their relative timing to constrain the meaning of the multimodal action. The grammar yields a single parse tree that integrates the spoken and gestural modality thereby drawing on standard semantic composition techniques to derive the multimodal meaning representation. Using the current machinery, the main challenge for the grammar engineer is the nonlinear input: the modalities can overlap temporally. We capture this by identical speech and gesture token edges. Further, the semantic contribution of gestures is encoded by lexical rules transforming a speech phrase into a multimodal entity of conjoined spoken and gestural semantics.

CiteSeerX

Edinburgh Research Explorer