2,901 research outputs found
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on ``Annotation Graphs,'' a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic ``signals,'' including both naturally occuring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure
A Web-Based Tool for Analysing Normative Documents in English
Our goal is to use formal methods to analyse normative documents written in
English, such as privacy policies and service-level agreements. This requires
the combination of a number of different elements, including information
extraction from natural language, formal languages for model representation,
and an interface for property specification and verification. We have worked on
a collection of components for this task: a natural language extraction tool, a
suitable formalism for representing such documents, an interface for building
models in this formalism, and methods for answering queries asked of a given
model. In this work, each of these concerns is brought together in a web-based
tool, providing a single interface for analysing normative texts in English.
Through the use of a running example, we describe each component and
demonstrate the workflow established by our tool
Proposed AIS Binary Message Format Using XML for Providing Hydrographic-related Information
UNH is working with the USCG and NOAA to use XML (Extensible Markup Language) to define binary messages for maritime-based AIS (Automatic Identification System). A draft specification format is under development that will enable hydrographic and maritime safety agencies to encode AIS message contents by providing a bit-level description in XML (informally known the AIS Binary Message Decoder Ring ). An AIS binary message definition in XML specifies the order, length, and type of fields following a subset of that used by the ITU-R.M.1371-1. The specification is independent of programming language (e.g., can be implemented in C, C++, C#, Java, Python, etc.) to allow vendors to integrate the system into their individual design requirements. The draft specification also contains a reference implementation of an AIS XML to Python compiler that has been released as open-source under the GNU General Public License (GPL) version 2. A XML schema and an additional program will provide validation of the XML message definitions. A XSLT style sheet produces reference documentation in âhtmlâ format. Although the XML message definition file specifies the order, size, and type of the bit stream, it does not specify semantics or how binary messages should be displayed on a shipboard ECDIS, or presented on other shipboard/shore-side display devices
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.Comment: 8 pages, 6 figure
Reverse Proxy Framework using Sanitization Technique for Intrusion Prevention in Database
With the increasing importance of the internet in our day to day life, data
security in web application has become very crucial. Ever increasing on line
and real time transaction services have led to manifold rise in the problems
associated with the database security. Attacker uses illegal and unauthorized
approaches to hijack the confidential information like username, password and
other vital details. Hence the real time transaction requires security against
web based attacks. SQL injection and cross site scripting attack are the most
common application layer attack. The SQL injection attacker pass SQL statement
through a web applications input fields, URL or hidden parameters and get
access to the database or update it. The attacker take a benefit from user
provided data in such a way that the users input is handled as a SQL code.
Using this vulnerability an attacker can execute SQL commands directly on the
database. SQL injection attacks are most serious threats which take users input
and integrate it into SQL query. Reverse Proxy is a technique which is used to
sanitize the users inputs that may transform into a database attack. In this
technique a data redirector program redirects the users input to the proxy
server before it is sent to the application server. At the proxy server, data
cleaning algorithm is triggered using a sanitizing application. In this
framework we include detection and sanitization of the tainted information
being sent to the database and innovate a new prototype.Comment: 9 pages, 6 figures, 3 tables; CIIT 2013 International Conference,
Mumba
Document Image Analysis for World War II Personal Records
Complete collections of invaluable documents of unique historical and political significance are decaying and at the same time they are virtually inaccessible, necessitating the invention of robust and efficient methods for their conversion into a searchable electronic form. This paper presents the issues encountered and problems addressed in the MEMORIAL project, whose goal is the establishment of a digital document workbench enabling the creation of distributed virtual archives based on documents existing in libraries, archives, museums, memorials, and public record offices. Successful approaches are described in the context of the chosen data class: a variety of typewritten documents containing personal information relating to the presence of individuals in World War II Nazi concentration camps
1st INCF Workshop on Sustainability of Neuroscience Databases
The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been approached. The recommendations for the INCF involve rating, ranking, and supporting database sustainability
Multimodal Grammar Implementation
This paper reports on an implementation of a multimodal grammar of speech and co-speech gesture within the LKB/PET grammar engineering environment. The implementation extends the English Resource Grammar (ERG, Flickinger (2000)) with HPSG types and rules that capture the form of the linguistic signal, the form of the gestural signal and their relative timing to constrain the meaning of the multimodal action. The grammar yields a single parse tree that integrates the spoken and gestural modality thereby drawing on standard semantic composition techniques to derive the multimodal meaning representation. Using the current machinery, the main challenge for the grammar engineer is the nonlinear input: the modalities can overlap temporally. We capture this by identical speech and gesture token edges. Further, the semantic contribution of gestures is encoded by lexical rules transforming a speech phrase into a multimodal entity of conjoined spoken and gestural semantics.
- âŠ