Search CORE

27 research outputs found

Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population

Author: Di Buono Maria Pia
Publication venue: Universita degli studi di Salerno
Publication date: 02/03/2016

2013 - 2014

The main aim of this research is to propose a structuralist approach to knowledge processing by means of ontology learning and population, starting from unstructured and structured texts. The suggested method includes distributional semantic approaches and NL formalization theories, in order to develop a framework that relies upon deep linguistic analysis... [edited by author]

XIII n.s.

EleA@UniSA - Università degli Studi di Salerno

Computational linguistics in the Netherlands 1996 : papers from the 7th CLIN meeting, November 15, 1996, Eindhoven

Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/1997

Pure OAI Repository

Darstellung und stochastische Auflösung von Ambiguität in constraint-basiertem Parsing

Author: Eisele Andreas
Publication date: 05/02/2013

This thesis investigates two complementary approaches to coping with ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be stored compactly in one shared data structure. It then suggests approaches for scoring the different interpretations using stochastic models. This leads to the problem of estimating the probabilities of rare events that were not observed in the training data, for which novel methods are proposed.

Inquiries into words, constraints and contexts : Festschrift in the honour of Kimmo Koskenniemi on his 60th birthday

Author: Arppe Antti
Carlson Lauri
Linden Krister
Piitulainen Jussi Olavi
Suominen Mickael
Vainio Martti
Westerlund Hanna
Yli-Jyrä Anssi Mikael
Publication venue: CSLI publications
Publication date: 01/01/2005

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Active Learning - An Explicit Treatment of Unreliable Parameters

Author: Becker Markus
Publication date: 01/01/2008

Institute for Communicating and Collaborative Systems

Active learning reduces annotation costs for supervised learning by concentrating labelling efforts on the most informative data. Most active learning methods assume that the model structure is fixed in advance and focus upon improving parameters within that structure. However, this is not appropriate for natural language processing, where the model structure and associated parameters are determined using labelled data. Applying traditional active learning methods to natural language processing can fail to produce the expected reductions in annotation cost. We show that one of the reasons for this problem is that active learning can only select examples which are already covered by the model. In this thesis, we better tailor active learning to the needs of natural language processing as follows. We formulate the Unreliable Parameter Principle: active learning should explicitly and additionally address unreliably trained model parameters in order to optimally reduce classification error. In order to do so, we should target both missing events and infrequent events. We demonstrate the effectiveness of such an approach for a range of natural language processing tasks: prepositional phrase attachment, sequence labelling, and syntactic parsing. For prepositional phrase attachment, the explicit selection of unknown prepositions significantly improves coverage and classification performance for all examined active learning methods. For sequence labelling, we introduce a novel active learning method which explicitly targets unreliable parameters by selecting sentences with many unknown words and a large number of unobserved transition probabilities. For parsing, targeting unparseable sentences significantly improves coverage and f-measure in active learning.

CiteSeerX

Edinburgh Research Archive

Combining bayesian and support vector machines learning to automatically complete syntactical information for HPSG-like formalisms

Author: George Kokkinakis
Katia Kermanidis
Manolis Maragoudakis
Nikos Fakotakis

Learning Bayesian Belief Networks (BBN) from corpora and incorporating the extracted inferring knowledge with a Support Vector Machines (SVM) classifier has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We have made use of minimal linguistic resources, such as basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural language human-computer interaction systems, can be achieved using large corpora, without any general-purpose syntactic parser at all. Moreover, by taking advantage of the plethora of unlabeled data found in text corpora in addition to some available labeled examples, we avoid the expensive task of annotating the whole set of training data, and the performance of the subcategorization frame learner is increased. We argue that a classifier generated from BBN and SVM is well suited for learning to identify verb subcategorization frames. Empirical results support this claim. Performance has been methodically evaluated using two different corpora, one balanced and one domain-specific, in order to determine the unbiased behavior of the trained models. Limited training data prove sufficient to yield satisfactory results. We have been able to achieve precision exceeding 90% on the identification of subcategorization frames which were not known beforehand. The obtained valid frames have been used to fill out the subcategorization field of verb entries in an HPSG-like lexicon using the LKB grammar development environment.

CiteSeerX

Lexical Selection for Machine Translation

Author: Sabtan Yasser
Publication date: 01/08/2011

The University of Manchester - Institutional Repository

Deciphering clinical text: concept recognition in primary care text notes

Author: Savkov Aleksandar Dimitrov
Publication date: 03/05/2017

Electronic patient records, containing data about the health and care of a patient, are a valuable source of information for longitudinal clinical studies. The General Practice Research Database (GPRD) has collected patient records from UK primary care practices since the late 1980s. These records contain both structured data (in the form of codes and numeric values) and free text notes. While the structured data have been used extensively in clinical studies, there are significant practical obstacles in extracting information from the free text notes. The main obstacles are data access restrictions, due to the presence of sensitive information, and the specific language of medical practitioners, which renders standard language processing tools ineffective. The aim of this research is to investigate approaches for computer analysis of free text notes. The research involved designing a primary care text corpus (the Harvey Corpus) annotated with syntactic chunks and clinically-relevant semantic entities, developing a statistical chunking model, and devising a novel method for applying machine learning for entity recognition based on chunk annotation. The tools produced would facilitate reliable information extraction from primary care patient records, needed for the development of clinically-related research. The three medical concept types targeted in this thesis could contribute to epidemiological studies by enhancing the detection of co-morbidities, and better analysing the descriptions of patient experiences and treatments. The main contributions of the research reported in this thesis are: guidelines for chunk and concept annotation of clinical text, an approach to maximising agreement between human annotators, the Harvey Corpus, a method for using a standard part-of-speech tagging model in clinical text chunking, and a novel approach to recognising clinically relevant medical concepts

Sussex Research Online

Aspects of emergent cyclicity in language and computation

Author: Krivochen Diego G.

This thesis has four parts, which correspond to the presentation and development of a theoretical framework for the study of cognitive capacities qua physical phenomena, and a case study of locality conditions over natural languages. Part I deals with computational considerations, setting the tone of the rest of the thesis, and introducing and defining critical concepts like ‘grammar’, ‘automaton’, and the relations between them. Fundamental questions concerning the place of formal language theory in linguistic inquiry, as well as the expressibility of linguistic and computational concepts in common terms, are raised in this part. Part II further explores the issues addressed in Part I with particular emphasis on how grammars are implemented by means of automata, and the properties of the formal languages that these automata generate. We will argue against the equation between effective computation and function-based computation, and introduce examples of computable procedures which are nevertheless impossible to capture using traditional function-based theories. The connection with cognition will be made in the light of dynamical frustrations: the irreconcilable tension between mutually incompatible tendencies that hold for a given dynamical system. We will provide arguments in favour of analyzing natural language as emerging from a tension between different systems (essentially, semantics and morpho-phonology) which impose orthogonal requirements over admissible outputs. The concept of level of organization or scale comes to the foreground here, and apparent contradictions and incommensurabilities between concepts and theories are revisited in a new light: that of dynamical nonlinear systems which are fundamentally frustrated.
We will also characterize the computational system that emerges from such an architecture: the goal is to get a syntactic component which assigns the simplest possible structural description to sub-strings, in terms of its computational complexity. A system which can oscillate back and forth in the hierarchy of formal languages in assigning structural representations to local domains will be referred to as a computationally mixed system. Part III is where the really fun stuff starts. Field theory is introduced, and its applicability to neurocognitive phenomena is made explicit, with all due scale considerations. Physical and mathematical concepts are permanently interacting as we analyze phrase structure in terms of pseudo-fractals (in Mandelbrot’s sense) and define syntax as a (possibly unary) set of topological operations over completely Hausdorff (CH) ultrametric spaces. These operations, which make field perturbations interfere, transform that initial completely Hausdorff ultrametric space into a metric, Hausdorff space with a weaker separation axiom. Syntax, in this proposal, is not ‘generative’ in any traditional sense (except the ‘fully explicit theory’ one); rather, it partitions (technically, ‘parametrizes’) a topological space. Syntactic dependencies are defined as interferences between perturbations over a field, which reduce the total entropy of the system per cycles, at the cost of introducing further dimensions where attractors corresponding to interpretations for a phrase marker can be found. Part IV is a sample of what we can gain by further pursuing the physics of language approach, both in terms of empirical adequacy and theoretical elegance, not to mention the unlimited possibilities of interdisciplinary collaboration.
In this section we focus on island phenomena as defined by Ross (1967), critically revisiting the most relevant literature on this topic, and establishing a typology of constructions that are strong islands, which cannot be violated. These constructions are particularly interesting because they limit the phase space of what is expressible via natural language, and thus reveal crucial aspects of its underlying dynamics. We will argue that a dynamically frustrated system which is characterized by displaying mixed computational dependencies can provide straightforward characterizations of cyclicity in terms of changes in dependencies in local domains.

Central Archive at the University of Reading