
    Memory-Based Lexical Acquisition and Processing

    Current approaches to computational lexicology in language technology are knowledge-based (competence-oriented) and try to abstract away from specific formalisms, domains, and applications. This results in severe complexity, acquisition, and reusability bottlenecks. As an alternative, we propose a performance-oriented approach to Natural Language Processing based on automatic memory-based learning of linguistic (lexical) tasks. The consequences of this approach for computational lexicology are discussed, and its application to a number of lexical acquisition and disambiguation tasks in phonology, morphology, and syntax is described.

    MBT: A Memory-Based Part of Speech Tagger-Generator

    We introduce a memory-based approach to part-of-speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part-of-speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text in the same way, considerably diminishing the development time for the construction of a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is computed dynamically.
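    The core mechanism described above — extrapolating a tag from the most similar stored case — can be sketched in a few lines. This is a minimal illustration of memory-based (nearest-neighbour) tagging with a simple overlap metric; the toy corpus, feature layout, and function names are assumptions for illustration, not the MBT implementation.

```python
def make_cases(tagged_sentence, context=1):
    """Turn a tagged sentence into (feature-tuple, tag) cases.

    Features: the focus word plus `context` words to the left and right.
    """
    words = [w for w, _ in tagged_sentence]
    padded = ["<s>"] * context + words + ["</s>"] * context
    cases = []
    for i, (_, tag) in enumerate(tagged_sentence):
        features = tuple(padded[i:i + 2 * context + 1])
        cases.append((features, tag))
    return cases

def overlap(a, b):
    """Count matching feature positions (a crude similarity metric)."""
    return sum(x == y for x, y in zip(a, b))

def tag_word(memory, features):
    """Extrapolate the tag from the most similar case held in memory."""
    return max(memory, key=lambda case: overlap(case[0], features))[1]

# Toy training corpus of (word, tag) pairs.
memory = make_cases([("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")])
memory += make_cases([("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")])

print(tag_word(memory, ("the", "cat", "runs")))  # → NOUN
```

    In the full system, IGTree compresses this flat case memory into a decision tree ordered by feature informativeness, which is what yields the attractive time and space complexity the abstract mentions.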

    Nodalida 2005 - proceedings of the 15th NODALIDA conference


    Interpretation of Natural-language Robot Instructions: Probabilistic Knowledge Representation, Learning, and Reasoning

    A robot that can simply be told in natural language what to do - this has been one of the ultimate long-standing goals in both Artificial Intelligence and Robotics research. In near-future applications, robotic assistants and companions will have to understand and perform commands such as "set the table for dinner", "make pancakes for breakfast", or "cut the pizza into 8 pieces". Although such instructions are only vaguely formulated, complex sequences of sophisticated and accurate manipulation activities need to be carried out in order to accomplish the respective tasks. The acquisition of knowledge about how to perform these activities from huge collections of natural-language instructions from the Internet has garnered a lot of attention within the last decade. However, natural language is typically massively underspecified, incomplete, ambiguous, and vague, and thus requires powerful means of interpretation. This work presents PRAC - Probabilistic Action Cores - an interpreter for natural-language instructions that is able to resolve vagueness and ambiguity in natural language and infer missing pieces of information required to render an instruction executable by a robot. To this end, PRAC formulates the problem of instruction interpretation as a reasoning problem in first-order probabilistic knowledge bases. In particular, the system uses Markov logic networks as a carrier formalism for encoding uncertain knowledge. A novel framework for reasoning about unmodeled symbolic concepts is introduced, which incorporates ontological knowledge from taxonomies and exploits semantically similar relational structures in a domain of discourse. The resulting reasoning framework thus enables more compact representations of knowledge and exhibits strong generalization performance when learned from very sparse data.
    Furthermore, a novel approach for completing directives is presented, which applies semantic analogical reasoning to transfer knowledge collected from thousands of natural-language instruction sheets to new situations. In addition, a cohesive processing pipeline is described that transforms vague and incomplete task formulations into sequences of formally specified robot plans. The system is connected to a plan executive that is able to execute the computed plans in a simulator. Experiments conducted in a publicly accessible, browser-based web interface demonstrate that PRAC is capable of closing the loop from natural-language instructions to their execution by a robot.
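    The Markov-logic idea underlying PRAC can be illustrated at toy scale: each possible world (a truth assignment to ground atoms) receives probability proportional to the exponentiated sum of weights of the ground formulas it satisfies. The atoms, formulas, and weights below are invented for illustration and are not PRAC's actual knowledge base.

```python
import itertools
import math

atoms = ["Cut(pizza)", "HasTool(knife)"]

# Weighted ground formulas as (weight, test-over-world) pairs.
formulas = [
    (1.5, lambda w: (not w["Cut(pizza)"]) or w["HasTool(knife)"]),  # Cut => HasTool
    (0.5, lambda w: w["Cut(pizza)"]),                               # weak prior on Cut
]

def score(world):
    """exp(sum of weights of formulas satisfied in this world)."""
    return math.exp(sum(wt for wt, f in formulas if f(world)))

# Enumerate all truth assignments and normalize.
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(score(w) for w in worlds)

# Marginal probability that cutting happens together with a tool.
p = sum(score(w) for w in worlds if w["Cut(pizza)"] and w["HasTool(knife)"]) / Z
print(round(p, 3))
```

    Real MLN engines avoid this exhaustive enumeration with approximate inference; the brute-force normalization here only conveys the semantics.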

    Natural Language Interpreter and Arithmetic Word Problem Solver

    The field of Natural Language Processing (NLP) is nowadays among the most studied and rapidly developing areas of Artificial Intelligence. Among its countless applications, it is particularly notable that the classic test of machine intelligence, the Turing Test, detects human-like intelligence precisely through language-based chat aimed at demonstrating sufficient mental capacities. In this sense, the computational analysis of language comprehension and production can be deemed of prominent importance. The ultimate objective of this work is to combine the results of language parsing with a notable strength of computers: the manipulation of numbers. Two principal tasks of the project can therefore be outlined. The parser for the natural language selected for this project, Catalan, is designed to find a syntactic representation of a given sentence, and the arithmetic word problem solver links the established interpretation to the resolution of an arithmetic word problem given in natural language. Finally, the work concludes with a discussion focused on the analysis of the results, opportune enhancements for future work, and possible ways to address the issues and deficiencies encountered.
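    The parse-then-solve pipeline described above can be caricatured in a few lines: extract the numbers from the problem statement and pick an arithmetic operation from a lexical cue. The original project parses Catalan syntactically; this sketch uses English keyword rules, and all cue words and the `solve` interface are illustrative assumptions.

```python
import re

def solve(problem):
    """Extract the numbers and an operation cue; return the answer."""
    numbers = [int(n) for n in re.findall(r"\d+", problem)]
    text = problem.lower()
    # Subtraction cues are checked first so "gives away" wins over "gives".
    if any(cue in text for cue in ("gives away", "loses", "eats", "left")):
        return numbers[0] - sum(numbers[1:])
    if any(cue in text for cue in ("gives", "gets", "buys", "more",
                                   "altogether", "in total")):
        return sum(numbers)
    raise ValueError("no operation cue recognised")

print(solve("Anna has 3 apples and buys 4 more. How many altogether?"))  # → 7
print(solve("Pere has 9 pears and eats 2. How many are left?"))          # → 7
```

    A real solver replaces the keyword heuristics with the syntactic representation produced by the parser, which is exactly the link between the two tasks the abstract describes.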

    A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics

    Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three different levels: perception, syntax, and semantics. In particular, concepts in HINT, covering both digits and operators, must be learned in a weakly supervised fashion: only the final results of handwritten expressions are provided as supervision. Learning agents need to discover how concepts are perceived from raw signals such as images (i.e., perception), how multiple concepts are structurally combined to form a valid expression (i.e., syntax), and how concepts are realized to afford various reasoning tasks (i.e., semantics). With a focus on systematic generalization, we carefully design a five-fold test set to evaluate both the interpolation and the extrapolation of learned concepts. To tackle this challenging problem, we propose a neural-symbolic system that integrates neural networks with grammar parsing and program synthesis, learned by a novel deduction-abduction strategy. In experiments, the proposed neural-symbolic system demonstrates strong generalization capability and significantly outperforms end-to-end neural methods like RNN and Transformer. The results also indicate the significance of recursive priors for extrapolation on syntax and semantics.
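    The syntax and semantics levels described above can be made concrete with a tiny recursive-descent parser: symbols are combined into a valid expression (syntax) and simultaneously evaluated (semantics). Perception — reading symbols off handwritten images — is skipped, with a token list standing in for its output; the grammar here is an illustrative assumption, not the HINT system.

```python
def evaluate(tokens):
    """Parse and evaluate a flat token list with '+', '-', '*' precedence."""

    def expr(pos):  # expr := term (('+' | '-') term)*
        value, pos = term(pos)
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            rhs, pos = term(pos + 1)
            value = value + rhs if op == "+" else value - rhs
        return value, pos

    def term(pos):  # term := factor ('*' factor)*
        value, pos = factor(pos)
        while pos < len(tokens) and tokens[pos] == "*":
            rhs, pos = factor(pos + 1)
            value *= rhs
        return value, pos

    def factor(pos):  # factor := digit
        return int(tokens[pos]), pos + 1

    value, _ = expr(0)
    return value

print(evaluate(list("2+3*4")))  # → 14
```

    The recursive structure of `expr`/`term` is a hand-coded instance of the recursive prior the abstract argues is needed for extrapolation to longer expressions.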

    Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

    This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently, the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
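    The VSM step that LRA builds on can be sketched directly: each word pair is characterized by a vector of frequencies of connecting patterns, and relational similarity is the cosine of two such vectors. The pattern counts below are made up for illustration; LRA additionally derives the patterns automatically and smooths the frequency matrix with SVD.

```python
import math

# Hypothetical joining patterns, with "X" and "Y" as the pair slots.
patterns = ["X cuts Y", "X works with Y", "X of Y"]

# Toy corpus frequencies of each pattern for two word pairs.
mason_stone    = [12, 30, 5]
carpenter_wood = [10, 28, 4]

def cosine(u, v):
    """Cosine similarity between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(round(cosine(mason_stone, carpenter_wood), 3))
```

    A high cosine here is what the method reads as "the relations are analogous"; answering a multiple-choice analogy question amounts to picking the candidate pair whose vector is closest to the stem pair's.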

    A two level representation for spatial relations. - Part I

    A model to represent spatial relations is presented. It is used for the definition of common-sense knowledge of rational agents in a multi-agent scenario. The main idea is that it is structured in two levels: relations may be represented in terms of predicate logic at one level or as expressions over Cartesian coordinates at the other. Hence reasoning is possible with common rules of deduction as well as via exact calculations of the positions. Here we give an overview of the whole structure and then examine the definition of a set of spatial relations at the "Logical Level". Finally, special features such as the handling of context and the problem of multiple views are discussed.
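    The two-level idea can be sketched as one relation with two realizations: a symbolic fact at the logical level, usable by ordinary deduction, and a test over Cartesian coordinates at the other. The object names, coordinates, and the `left_of` definition are illustrative assumptions, not the paper's actual relation set.

```python
# Logical level: stored ground facts, available to rule-based deduction.
facts = {("left_of", "cup", "plate")}

def holds_logically(rel, a, b):
    """Check a relation symbolically, without touching any geometry."""
    return (rel, a, b) in facts

# Coordinate level: object positions, relations computed exactly.
positions = {"cup": (1.0, 2.0), "plate": (4.0, 2.0)}

def left_of(a, b):
    """Check the same relation by comparing x-coordinates."""
    return positions[a][0] < positions[b][0]

# Both levels answer the same query consistently.
print(holds_logically("left_of", "cup", "plate"), left_of("cup", "plate"))
```

    Keeping the two levels consistent is the interesting part: the coordinate level can ground new facts for the logical level, while the logical level supports inference (e.g. transitivity of left_of) without recomputing geometry.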