Search CORE

8,273 research outputs found

A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

Author: Lampesberger Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Learning Language from a Large (Unannotated) Corpus

Author: Goertzel Ben
Vepstas Linas
Publication venue
Publication date: 14/01/2014
Field of study

A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa

arXiv.org e-Print Archive

CiteSeerX

A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework

Author: Vlas Radu
Publication venue: ScholarWorks @ Georgia State University
Publication date: 07/08/2012
Field of study

Open source projects do have requirements; they are, however, mostly informal, text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects, error-prone. Automated analysis of natural language requirements, even partial, will be of great benefit. Towards that end, I describe the design and validation of an automated natural language requirements classifier for open source software development projects. I compare two strategies for recognizing requirements in open forums of software features. The results suggest that classifying text at the forum post aggregation and sentence aggregation levels may be effective. Initial results suggest that it can reduce the effort required to analyze requirements of open source software development projects. Software development organizations and communities currently employ a large number of software development techniques and methodologies. This implied complexity is also enhanced by a wide range of software project types and development environments. The resulting lack of consistency in the software development domain leads to one important challenge that researchers encounter while exploring this area: specificity. This results in an increased difficulty of maintaining a consistent unit of measure or analysis approach while exploring a wide variety of software development projects and environments. The problem of specificity is more prominently exhibited in an area of software development characterized by a dynamic evolution, a unique development environment, and a relatively young history of research when compared to traditional software development: the open-source domain. While performing research on open source and the associated communities of developers, one can notice the same challenge of specificity being present in requirements engineering research as in the case of closed-source software development. Whether research is aimed at performing longitudinal or cross-sectional analyses, or attempts to link requirements to other aspects of software development projects and their management, specificity calls for a flexible analysis tool capable of adapting to the needs and specifics of the explored context. This dissertation covers the design, implementation, and evaluation of a model, a method, and a software tool comprising a flexible software development analysis framework. These design artifacts use a rule-based natural language processing approach and are built to meet the specifics of a requirements-based analysis of software development projects in the open-source domain. This research follows the principles of design science research as defined by Hevner et. al. and includes stages of problem awareness, suggestion, development, evaluation, and results and conclusion (Hevner et al. 2004; Vaishnavi and Kuechler 2007). The long-term goal of the research stream stemming from this dissertation is to propose a flexible, customizable, requirements-based natural language processing software analysis framework which can be adapted to meet the research needs of multiple different types of domains or different categories of analyses

ScholarWorks @ Georgia State University

Modeling Perception-Action Loops: Comparing Sequential Models with Frame-Based Classifiers

Author: Bailly Gérard
Mihoub Alaeddine
Wolf Christian
Publication venue: HAL CCSD
Publication date: 28/10/2014
Field of study

International audienceModeling multimodal perception-action loops in face-to-face interactions is a crucial step in the process of building sensory-motor behaviors for social robots or users-aware Embodied Conversational Agents (ECA). In this paper, we compare trainable behavioral models based on sequential models (HMMs) and classifiers (SVMs and Decision Trees) inherently inappropriate to model sequential aspects. These models aim at giving pertinent perception/action skills for robots in order to generate optimal actions given the perceived actions of others and joint goals. We applied these models to parallel speech and gaze data collected from interacting dyads. The challenge was to predict the gaze of one subject given the gaze of the interlocutor and the voice activity of both. We show that Incremental Discrete HMM (IDHMM) generally outperforms classifiers and that injecting input context in the modeling process significantly improves the performances of all algorithms

CiteSeerX

Hal - Université Grenoble Alpes

Hal-Diderot

Proceedings of the LREC workshop on partial parsing : between chunk parsing and deep parsing

Author: Kübler Sandra
Piskorski Jakub
Przepiorkowski Adam
Publication venue
Publication date: 03/11/2008
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main

Preface

Author: Abraham Ajith
Jain Lakhmi
van der Zwaag B.J.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2004
Field of study

University of Twente Research Information