    Handling everyday tasks such as search, classification and integration is becoming increasingly difficult, and sometimes even impossible, due to the ever-increasing streams of available data. To overcome this information overload we need more accurate information processing tools capable of handling large amounts of data. In particular, handling metadata can give us leverage over the data and enable its structured processing. However, while some of this metadata is in a computer-readable format, some of it is manually created in ambiguous natural language. Accessing the semantics of that natural language can therefore increase the quality of information processing. We propose a natural language metadata understanding architecture that enables applications such as semantic matching, classification and search over natural language metadata by translating it into a formal language; this approach outperforms the state of the art by 15%.
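
    The central step described here is translating a natural-language metadata label into a formal-language expression that matchers and classifiers can operate on. As a minimal, hypothetical sketch of that idea (the toy lexicon, stop-word list and output notation below are invented for illustration and are not the proposed architecture), a label can be reduced to a conjunction of disambiguated concepts:

# Toy sketch: reduce a natural-language metadata label to a conjunction of
# disambiguated concepts. The lexicon and the notation are illustrative
# assumptions, not the architecture the abstract describes.
LEXICON = {
    "images": "image#1",        # picture produced by an imaging device
    "mountains": "mountain#1",  # natural elevation of the earth's surface
}
STOPWORDS = {"of", "the", "a", "an", "and"}

def translate_label(label: str) -> str:
    """Translate a metadata label into a formal conjunction of concepts."""
    tokens = [t.lower() for t in label.split() if t.lower() not in STOPWORDS]
    concepts = [LEXICON.get(t, t + "#?") for t in tokens]  # '#?' marks an unresolved sense
    return " AND ".join(concepts)

print(translate_label("Images of mountains"))  # -> image#1 AND mountain#1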

    Why use or?

    'Or' constructions introduce a set of alternatives into the discourse. But alternativity does not exhaust speakers' intended messages. Speakers use the profiled 'or' alternatives as a starting point for expressing a variety of readings. Ever since Grice (1989, Studies in the way of words, Cambridge, MA: Harvard University Press) and Horn (1972, On the semantic properties of the logical operators in English, Los Angeles, CA: University of California Los Angeles dissertation), the standard approach has assumed that 'or' has an inclusive lexical meaning and a predominantly exclusive use, thus focusing on two readings. While another, "free choice", reading has been added to the repertoire, accounting for the exclusive reading remains a goal all 'or' theorists must meet. We here propose that both "inclusive" and "exclusive" interpretations, as currently defined, do not capture speakers' intended readings, which we equate with the relevance-theoretic explicature. Adopting a usage-based approach to language, we examined all the 'or' occurrences in the Santa Barbara Corpus of spoken American English (1053 tokens), and found that speakers use 'or' utterances for a far richer variety of readings than has been recognized. In line with Cognitive Linguistics, we propose that speakers' communicated intentions are better analyzed in terms of subjective construals, rather than the objective conditions obtaining when the 'or' proposition is true. We argue that in two of these readings speakers are not necessarily committed to even one of the alternatives being the case. In the most frequent reading, the overt disjuncts only serve as pointers to a higher-level concept, and it is that concept that the speaker intends to refer to.

    Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

    Integrated Master's thesis in Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Ontologies on the semantic web

    As an informational technology, the World Wide Web has enjoyed spectacular success. In just ten years it has transformed the way information is produced, stored, and shared in arenas as diverse as shopping, family photo albums, and high-level academic research. The “Semantic Web” was touted by its developers as equally revolutionary but has not yet achieved anything like the Web’s exponential uptake. This 17,000-word survey article explores why this might be so, from a perspective that bridges both philosophy and IT.

    Identifying nocuous ambiguity in natural language requirements

    This dissertation is an investigation into how ambiguity should be classified for authors and readers of text, and how this process can be automated. Usually, authors and readers disambiguate ambiguity, either consciously or unconsciously. However, disambiguation is not always appropriate. For instance, a linguistic construction may be read differently by different people, with no consensus about which reading is the intended one. This is particularly dangerous if they do not realise that other readings are possible. Misunderstandings may then occur. This is particularly serious in the field of requirements engineering. If requirements are misunderstood, systems may be built incorrectly, and this can prove very costly. Our research uses natural language processing techniques to address ambiguity in requirements. We develop a model of ambiguity, and a method of applying it, which represent a novel approach to the problem described here. Our model is based on the notion that human perception is the only valid criterion for judging ambiguity. If people perceive very differently how an ambiguity should be read, it will cause misunderstandings. Assigning a preferred reading to it is therefore unwise. In text, such ambiguities should be located and rewritten in a less ambiguous form; others need not be reformulated. We classify the former as nocuous and the latter as innocuous. We allow the dividing line between these two classifications to be adjustable. We term this the ambiguity threshold, and it represents a level of intolerance to ambiguity. A nocuous ambiguity can be an unacknowledged or an acknowledged ambiguity for a given set of readers. In the former case, they assign disparate readings to the ambiguity, but each is unaware that the others read it differently. In the latter case, they recognise that the ambiguity has more than one reading, but this fact may be unacknowledged by new readers. We present an automated approach to determine whether ambiguities in text are nocuous or innocuous. We use heuristics to distinguish ambiguities for which there is a strong consensus about how they should be read. These are innocuous ambiguities. The remaining nocuous ambiguities can then be rewritten at a later stage. We find consensus opinions about ambiguities by surveying human perceptions of them. Our heuristics try to predict these perceptions automatically. They utilise various types of linguistic information: generic corpus data, morphology and lexical subcategorisations are the most successful. We use coordination ambiguity as the test case for this research. This occurs where the scope of words such as 'and' and 'or' is unclear. Our research contributes to both the requirements engineering and the natural language processing literatures. Ambiguity is known to be a serious problem in requirements engineering, but has rarely been dealt with effectively and thoroughly. Our approach is an appropriate solution, and our flexible ambiguity threshold is a particularly useful concept. For instance, high ambiguity intolerance can be implemented when writing requirements for safety-critical systems. Coordination ambiguities are widespread and known to cause misunderstandings, but have received comparatively little attention. Our heuristics show that linguistic data can be used successfully to predict preferred readings of very diverse coordinations. Used in combination, these heuristics demonstrate that nocuous ambiguity can be distinguished from innocuous ambiguity under certain conditions. Employing appropriate ambiguity thresholds, we achieve accuracy that represents a 28% improvement on the baselines.
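
    To make the adjustable threshold concrete: a minimal sketch, assuming reader judgments are available as a list of chosen readings (the function name, data format and threshold values below are illustrative, not the thesis's actual tooling), could label an ambiguity innocuous only when one reading reaches the required level of consensus.

# Sketch of the "ambiguity threshold" idea: an ambiguity counts as innocuous
# only if readers' judgments show strong enough consensus on one reading.
# Data format, names and thresholds are illustrative assumptions.
from collections import Counter

def classify_ambiguity(judgments, threshold=0.8):
    """Return 'innocuous' if one reading reaches the consensus threshold,
    otherwise 'nocuous'. `judgments` lists the reading each reader chose."""
    counts = Counter(judgments)
    top_share = counts.most_common(1)[0][1] / len(judgments)
    return "innocuous" if top_share >= threshold else "nocuous"

# A coordination ambiguity such as "security and safety requirements":
# does "security" modify "requirements" directly, or only the coordination?
readers = ["wide-scope", "wide-scope", "wide-scope", "wide-scope", "narrow-scope"]
print(classify_ambiguity(readers, threshold=0.8))  # -> innocuous (80% consensus)
print(classify_ambiguity(readers, threshold=0.9))  # -> nocuous under a stricter,
                                                   #    safety-critical threshold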

    The Morphosyntactic Parser: Developing and testing a sentence processor that uses underspecified morphosyntactic features

    This dissertation presents a fundamentally new approach to describing not only the architecture of the language system but also the processes behind its capability to predict, analyze and integrate linguistic input into its representation in a parsimonious way. Using morphosyntax as its test case, underspecified case, that is, the use of decomposed, binary case, number and gender features to account for syncretism, offers insights into both. Carrying this idea over to language processing raises the question of whether the language system, limited in its storage capacity, makes use of similar means of representational parsimony during the processing of linguistic input. This thesis proposes a processing system that is tightly related to the aforementioned architectural assumption of morphosyntactically underspecified lexical entries as a parsimonious way of representation. In that sense, prediction is viewed as the language system's drive to avoid feature deviance from one incrementally available linguistic element to another, subsequently incoming one. The parser's goal is thus to maintain minimal feature deviance, or at best feature identity, to keep processing load as low as possible. This approach allows for position-dependent hypotheses with regard to the expected processing load. To test the processor's claims, the electrophysiological data of a series of event-related brain potential (ERP) experiments are presented. The results suggest that as the input's feature deviance increases, the amplitude of an ERP component sensitive to prediction error increases. By contrast, elements that maintain feature identity, and that neither lack features nor introduce additional ones into the analysis, do not increase processing difficulty. These results indicate that the language processing system uses the available features of morphosyntactically underspecified mental entries to build up larger constituents. The experiments showed that this build-up process is determined by the language system's drive to avoid feature deviance.
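
    As a rough illustration of the feature-deviance idea (the feature inventory, values and counting rule below are invented for this sketch and are not the dissertation's formalism), deviance between an element already available to the parser and an incoming one can be counted as the number of features that are missing, newly introduced, or valued differently:

# Toy sketch of "feature deviance" between two morphosyntactic feature
# bundles. Feature names, values and the counting rule are illustrative
# assumptions, not the dissertation's actual model.

def feature_deviance(current, incoming):
    """Count features that are missing, newly introduced, or valued
    differently between the current and the incoming element."""
    deviance = 0
    for feat in set(current) | set(incoming):
        if feat not in current or feat not in incoming:
            deviance += 1                 # feature lacking or newly introduced
        elif current[feat] != incoming[feat]:
            deviance += 1                 # conflicting feature value
    return deviance

# A fully specified determiner vs. a syncretic (underspecified) noun form:
determiner = {"oblique": True, "plural": False, "feminine": False}
noun = {"oblique": True, "plural": False}   # underspecified for gender
print(feature_deviance(determiner, noun))   # -> 1 (only the gender feature deviates)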

    The Progress of Ambiguity: Uncertain Imagery in Digital Culture

    Within a culture of persistent efficiency, ambiguous imagery represents a critical alternative. This thesis bridges studies in technology history, network and political theory, and art history. It attempts to account for contemporary artistic practices that critically address some of the objectionable tendencies within digital culture. These practices, this thesis proposes, may be best characterized by their radical use of ambiguity and uncertainty – qualities at clear odds with the rational, efficient nature of digital technologies. This thesis traces a lineage of this nature in computer and Internet history, twentieth-century cybernetics, and larger philosophic histories. Rooted in symbolic logic, digital technologies carry a heritage of disambiguation: a dominance of overdetermined, reason-based principles writ furtively in algorithms and protocols. They thus espouse ideologies via systematized calculation and centralized command, despite the commonly perceived transparency, fluidity and egalitarianism of the Net. Working within-but-against these surreptitious structures are radical practices that critique, undermine, leverage, and offer alternatives to ideologies of disambiguation. In opposition to a contracted, answers-fixated dominant culture, artists are advantageously positioned to point back to the realm of questions – in all of its arable uncertainty, inquisitiveness and ambiguity. This thesis is structured around case studies of artwork made by Constant Dullaart, Rosa Menkman, Jon Rafman, Internet Surfing Clubs, Ryan Trecartin, and Oliver Laric. Their practices contest the disambiguous nature of digital technologies to open up critical fissures in the semantic structure of digital culture.

    Closed-Loop Learning of Visual Control Policies

    In this paper we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of a few highly informative local descriptors that are incrementally selected in a sequence of attempts to remove perceptual aliasing. We also address the problem of fighting overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when the power of local descriptors is insufficient for completely disambiguating the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical Car on the Hill control problem.
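
    The core idea, a classifier that turns each image into a discrete state defined by the presence or absence of a few selected local descriptors, with reinforcement learning running on top, can be sketched roughly as follows. The descriptor test is stubbed out, a plain tabular Q-learning update stands in for whichever RL algorithm the paper actually uses, and all names are assumptions for illustration:

# Rough sketch: discrete RL states defined by the presence/absence pattern of
# a few selected local descriptors, with a tabular Q-learning update on top.
# Descriptor matching is stubbed; names and values are illustrative only.
import random
from collections import defaultdict

SELECTED_DESCRIPTORS = ["desc_0", "desc_1", "desc_2"]   # incrementally selected in the paper
ACTIONS = ["forward", "left", "right"]

def descriptor_present(image, descriptor):
    """Stub: a real system would test a local visual descriptor here."""
    return random.random() < 0.5

def image_to_state(image):
    """Partition the visual space by the presence/absence of the selected
    descriptors; the resulting tuple is the discrete RL state."""
    return tuple(descriptor_present(image, d) for d in SELECTED_DESCRIPTORS)

Q = defaultdict(float)          # Q[(state, action)] -> estimated value
ALPHA, GAMMA = 0.1, 0.95

def q_update(state, action, reward, next_state):
    """Standard tabular Q-learning update on the descriptor-defined states."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One toy interaction step:
s = image_to_state("frame_0")
a = random.choice(ACTIONS)
q_update(s, a, reward=0.0, next_state=image_to_state("frame_1"))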

    CORLEONE - Core Linguistic Entity Online Extraction

    This report presents CORLEONE (Core Linguistic Entity Online Extraction) - a pool of loosely coupled, general-purpose, lightweight linguistic processing resources which can be used independently to identify core linguistic entities and their features in free texts. Currently, CORLEONE consists of five processing resources: (a) a basic tokenizer, (b) a tokenizer which performs fine-grained token classification, (c) a component for performing morphological analysis, (d) a memory-efficient, database-like dictionary look-up component, and (e) a sentence splitter. Linguistic resources for several languages are provided. Additionally, CORLEONE includes a comprehensive library of string distance metrics relevant to the task of name variant matching. CORLEONE has been developed in the Java programming language and heavily deploys state-of-the-art finite-state techniques. Notably, CORLEONE components are used as basic linguistic processing resources in ExPRESS, a pattern matching engine based on regular expressions over feature structures, and in the real-time news event extraction system, both developed by the Web Mining and Intelligence Group of the Support to External Security Unit of IPSC. This report constitutes an end-user guide for CORLEONE and provides scientifically interesting details of how it was implemented.
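
    As a minimal illustration of the name variant matching task that the string distance library targets (this is a generic textbook metric sketched in Python, not CORLEONE's Java API or its actual metric implementations), an edit distance can be normalised into a similarity score:

# Generic illustration of name variant matching: Levenshtein edit distance
# normalised to a 0..1 similarity. Not CORLEONE's API or metric library.

def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalise the edit distance to a 0..1 similarity score."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

print(similarity("Osama bin Laden", "Usama Bin Ladin"))  # -> 0.8, a likely name variant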