Memory-Based Lexical Acquisition and Processing
Current approaches to computational lexicology in language technology are
knowledge-based (competence-oriented) and try to abstract away from specific
formalisms, domains, and applications. This results in severe complexity,
acquisition and reusability bottlenecks. As an alternative, we propose a
particular performance-oriented approach to Natural Language Processing based
on automatic memory-based learning of linguistic (lexical) tasks. The
consequences of the approach for computational lexicology are discussed, and
the application of the approach to a number of lexical acquisition and
disambiguation tasks in phonology, morphology, and syntax is described.
Comment: 18 pages
MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger which is able to tag new text the same way, diminishing development time
for the construction of a tagger considerably. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using IGTree, a tree-based formalism for
indexing and searching huge case bases. The use of IGTree has the additional
advantage that the optimal context size for disambiguation is computed dynamically.
Comment: 14 pages, 2 Postscript figures
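The core of memory-based tagging can be illustrated in a few lines: store tagged cases as feature tuples, then tag a new word by majority vote over the most similar stored cases. The case base, words, and feature set below are invented for illustration and are far simpler than MBT's actual representations:

```python
from collections import Counter

# Toy case base: (previous tag, focus word, next word) -> observed tag.
# Words, tags, and features are illustrative, not the ones MBT uses.
CASES = [
    (("DET", "dog", "barks"), "NOUN"),
    (("DET", "cat", "sleeps"), "NOUN"),
    (("NOUN", "barks", "."), "VERB"),
    (("NOUN", "sleeps", "."), "VERB"),
    (("NONE", "the", "dog"), "DET"),
    (("NONE", "a", "cat"), "DET"),
]

def tag(features):
    """Extrapolate the tag from the most similar stored cases
    (similarity = number of matching feature positions)."""
    best_sim = max(sum(f == g for f, g in zip(features, case))
                   for case, _ in CASES)
    nearest = [t for case, t in CASES
               if sum(f == g for f, g in zip(features, case)) == best_sim]
    # Majority vote among the nearest neighbours.
    return Counter(nearest).most_common(1)[0][0]

print(tag(("DET", "horse", "neighs")))  # unknown word, context gives NOUN
```

Note how the unknown word "horse" is tagged correctly from its context alone, which is the property the abstract claims for unknown words without morphological analysis.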
The computer comprehension of systematic metaphor
Digitisation of this thesis was sponsored by Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin
Interpretation of Natural-language Robot Instructions: Probabilistic Knowledge Representation, Learning, and Reasoning
A robot that can be simply told in natural language what to do -- this has been one of the ultimate long-standing goals in both Artificial Intelligence and Robotics research. In near-future applications, robotic assistants and companions will have to understand and perform commands such as "set the table for dinner", "make pancakes for breakfast", or "cut the pizza into 8 pieces". Although such instructions are only vaguely formulated, complex sequences of sophisticated and accurate manipulation activities need to be carried out in order to accomplish the respective tasks. The acquisition of knowledge about how to perform these activities from huge collections of natural-language instructions from the Internet has garnered a lot of attention within the last decade. However, natural language is typically massively unspecific, incomplete, ambiguous, and vague, and thus requires powerful means for interpretation. This work presents PRAC -- Probabilistic Action Cores -- an interpreter for natural-language instructions which is able to resolve vagueness and ambiguity in natural language and infer missing pieces of information that are required to render an instruction executable by a robot. To this end, PRAC formulates the problem of instruction interpretation as a reasoning problem in first-order probabilistic knowledge bases. In particular, the system uses Markov logic networks as a carrier formalism for encoding uncertain knowledge. A novel framework for reasoning about unmodeled symbolic concepts is introduced, which incorporates ontological knowledge from taxonomies and exploits semantically similar relational structures in a domain of discourse. The resulting reasoning framework thus enables more compact representations of knowledge and exhibits strong generalization performance when learned from very sparse data.
Furthermore, a novel approach for completing directives is presented, which applies semantic analogical reasoning to transfer knowledge collected from thousands of natural-language instruction sheets to new situations. In addition, a cohesive processing pipeline is described that transforms vague and incomplete task formulations into sequences of formally specified robot plans. The system is connected to a plan executive that is able to execute the computed plans in a simulator. Experiments conducted in a publicly accessible, browser-based web interface show that PRAC is capable of closing the loop from natural-language instructions to their execution by a robot.
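The idea of filling a missing role by exploiting taxonomic similarity can be sketched without any probabilistic machinery. The toy taxonomy, action cores, and function names below are all invented; PRAC itself performs this kind of inference with Markov logic networks over WordNet-style ontologies:

```python
# Toy taxonomy-guided role inference (all names here are invented).
TAXONOMY = {  # child -> parent
    "pizza": "food", "bread": "food", "cake": "food",
    "knife": "cutting_tool", "saw": "cutting_tool",
}

# Known, fully specified action cores: (action, object) -> instrument.
KNOWN = {("cut", "bread"): "knife", ("cut", "plank"): "saw"}

def hypernyms(concept):
    """Walk up the taxonomy, returning the concept and its ancestors."""
    chain = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        chain.append(concept)
    return chain

def infer_instrument(action, obj):
    """Fill a missing instrument role by finding a known case whose
    object shares a hypernym with the new object."""
    for (a, o), instr in KNOWN.items():
        if a == action and set(hypernyms(o)) & set(hypernyms(obj)[1:]):
            return instr
    return None

print(infer_instrument("cut", "pizza"))  # pizza ~ bread via "food" -> knife
```

Here "cut the pizza" never occurs in the knowledge base, yet the instrument is inferred because pizza and bread share the hypernym "food", which is the kind of generalization from sparse data the abstract describes.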
Natural Language Interpreter and Arithmetic Word Problem Solver
Natural Language Processing (NLP) is nowadays among the most studied and fastest-developing fields of Artificial Intelligence. Among its countless applications, it is worth noting that the standard test of machine intelligence, the Turing Test, detects human-like intelligence precisely through language-based chat intended to demonstrate sufficient mental capacity. In this sense, the computational analysis of language comprehension and production can be deemed of prominent importance. The ultimate objective of this work is to combine the results of language parsing with a notable strength of computers: the manipulation of numbers. Two principal tasks of the project can therefore be outlined. A parser for the natural language selected for this project, Catalan, finds a syntactic representation of a given sentence, and an arithmetic word problem solver links the resulting interpretation to the solution of an arithmetic word problem stated in natural language. The work concludes with a discussion of the results, possible enhancements for future work, and ways to address the issues and deficiencies encountered.
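The coupling of sentence interpretation with numeric computation can be illustrated with a deliberately tiny solver for one English sentence pattern. The verb lists and patterns below are invented for illustration; the project itself builds a full syntactic parse of Catalan, which this sketch skips entirely:

```python
import re

# Minimal word-problem solver: extract the numbers, then pick an
# operation from the verb (patterns and verbs are illustrative only).
def solve(problem):
    nums = [int(n) for n in re.findall(r"\d+", problem)]
    if re.search(r"\b(gives away|loses|eats)\b", problem):
        return nums[0] - sum(nums[1:])
    if re.search(r"\b(gets|buys|finds)\b", problem):
        return nums[0] + sum(nums[1:])
    return None

print(solve("Anna has 7 apples and gives away 3. How many are left?"))  # 4
print(solve("Joan has 2 marbles and finds 5 more. How many now?"))      # 7
```

A real solver must derive the operation from the sentence's syntactic structure rather than from keywords, which is exactly why the project needs a parser in front of the arithmetic component.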
A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics
Inspired by humans' remarkable ability to master arithmetic and generalize to
unseen problems, we present a new dataset, HINT, to study machines' capability
of learning generalizable concepts at three different levels: perception,
syntax, and semantics. In particular, concepts in HINT, including both digits
and operators, must be learned in a weakly supervised fashion: only the
final results of handwritten expressions are provided as supervision. Learning
agents need to infer how concepts are perceived from raw signals such as
images (i.e., perception), how multiple concepts are structurally combined to
form a valid expression (i.e., syntax), and how concepts are realized to afford
various reasoning tasks (i.e., semantics). With a focus on systematic
generalization, we carefully design a five-fold test set to evaluate both the
interpolation and the extrapolation of learned concepts. To tackle this
challenging problem, we propose a neural-symbolic system by integrating neural
networks with grammar parsing and program synthesis, learned by a novel
deduction--abduction strategy. In experiments, the proposed neural-symbolic
system demonstrates strong generalization capability and significantly
outperforms end-to-end neural methods like RNN and Transformer. The results
also indicate the significance of recursive priors for extrapolation on syntax
and semantics.
Comment: Preliminary work
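The symbolic half of such a pipeline can be shown in miniature: once perception has mapped images to tokens, syntax and semantics amount to parsing and evaluating the expression. A recursive grammar is what lets the system extrapolate to expressions longer than any seen in training; the grammar below is a standard sketch, not the one used in the paper:

```python
# Recursive-descent parse-and-evaluate over a recognized token sequence.
def evaluate(tokens):
    def expr(i):                      # expr := term (('+'|'-') term)*
        val, i = term(i)
        while i < len(tokens) and tokens[i] in "+-":
            op, (rhs, i) = tokens[i], term(i + 1)
            val = val + rhs if op == "+" else val - rhs
        return val, i

    def term(i):                      # term := factor ('*' factor)*
        val, i = factor(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = factor(i + 1)
            val *= rhs
        return val, i

    def factor(i):                    # factor := digit | '(' expr ')'
        if tokens[i] == "(":
            val, i = expr(i + 1)
            return val, i + 1         # skip ')'
        return int(tokens[i]), i + 1

    return expr(0)[0]

print(evaluate(list("2+3*4")))    # 14
print(evaluate(list("(2+3)*4")))  # 20
```

Because `expr`, `term`, and `factor` call each other recursively, expressions of arbitrary nesting depth are handled by the same small set of rules, which is the recursive prior the results point to.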
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
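The VSM idea that LRA builds on can be sketched directly: each word pair becomes a vector of pattern frequencies, and relational similarity is the cosine between vectors. The patterns and counts below are invented for illustration; LRA additionally derives the patterns automatically and smooths the matrix with SVD:

```python
import math

# Invented pattern-frequency vectors: how often "X <pattern> Y" might
# occur in a corpus for each word pair (one count per pattern below).
PATTERNS = ["X cuts Y", "X works with Y", "X made of Y"]
VECTORS = {
    ("mason", "stone"):    [12, 30, 2],
    ("carpenter", "wood"): [15, 28, 1],
    ("house", "brick"):    [0, 2, 25],
}

def cosine(u, v):
    """Cosine similarity between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

analogous = cosine(VECTORS[("mason", "stone")], VECTORS[("carpenter", "wood")])
unrelated = cosine(VECTORS[("mason", "stone")], VECTORS[("house", "brick")])
print(analogous > unrelated)  # mason:stone is closer to carpenter:wood
```

Under these toy counts, mason/stone and carpenter/wood share a similar pattern profile (dominated by "works with"), while house/brick is dominated by "made of", so the analogous pair scores higher.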
A two-level representation for spatial relations - Part I
A model for representing spatial relations is presented. It is used to define common-sense knowledge of rational agents in a multi-agent scenario. The main idea is that the representation is structured in two levels: relations may be expressed in terms of predicate logic at one level, or as expressions over Cartesian coordinates at the other. Reasoning is therefore possible with common rules of deduction as well as via exact calculation of positions. Here we give an overview of the whole structure and then investigate the definition of a set of spatial relations at the "Logical Level". Finally, special features such as the handling of context and the problem of multiple views are discussed.
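The two-level idea can be made concrete with a single relation. All names below are invented: the same relation "left_of" exists once as a logical fact usable in rule-based deduction, and once as a test over Cartesian coordinates at the geometric level:

```python
# Logical level: stored facts over symbolic object names.
FACTS = {("cup", "plate"): "left_of"}

# Coordinate level: exact (x, y) positions of the same objects.
POSITIONS = {"cup": (1.0, 2.0), "plate": (4.0, 2.0)}

def left_of_logical(a, b):
    """Deduce the relation from symbolic facts alone."""
    return FACTS.get((a, b)) == "left_of"

def left_of_geometric(a, b):
    """Verify the relation by exact calculation on coordinates."""
    return POSITIONS[a][0] < POSITIONS[b][0]

# A consistent two-level representation requires both levels to agree.
print(left_of_logical("cup", "plate"), left_of_geometric("cup", "plate"))
```

The benefit is exactly the one the abstract names: an agent can chain symbolic deductions cheaply, yet fall back on the coordinate level whenever an exact geometric answer is needed.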