Syntax-based machine translation using dependency grammars and discriminative machine learning
Machine translation has undergone huge improvements since the groundbreaking
introduction of statistical methods in the early 2000s, going from very
domain-specific systems that still performed relatively poorly despite the
painstaking crafting of thousands of ad-hoc rules, to general-purpose
systems automatically trained on large collections of bilingual texts, which
manage to deliver understandable translations that convey the general
meaning of the original input.
These approaches, however, still perform well below the level of human
translators, typically failing to convey detailed meaning and register, and
producing translations that, while readable, are often ungrammatical and
unidiomatic.
This quality gap, which is considerably larger than in most other
natural language processing tasks, has been the focus of research in
recent years, with the development of increasingly sophisticated models that
attempt to exploit the syntactic structure of human languages, leveraging
the technology of statistical parsers as well as advanced machine learning
methods such as margin-based structured prediction algorithms and neural
networks.
The translation software itself has become more complex in order to accommodate
the sophistication of these advanced models: the main translation
engine (the decoder) is now often combined with a pre-processor that
reorders the words of the source sentence into target-language word order, or
with a post-processor that ranks and selects a translation according
to a fine model from a list of candidate translations generated by a coarse
model.
In this thesis we investigate the statistical machine translation problem
from various angles, focusing on translation from non-analytic languages
whose syntax is best described by fluid non-projective dependency grammars
rather than the relatively strict phrase-structure grammars or projective
dependency grammars most commonly used in the literature.
We propose a framework for modeling word reordering phenomena
between language pairs as transitions on non-projective source dependency
parse graphs. We quantitatively characterize reordering phenomena for the
German-to-English language pair as captured by this framework, specifically
investigating the incidence and effects of the non-projectivity of source
syntax and the non-locality of word movement w.r.t. the graph structure.
We evaluate several variants of hand-coded pre-ordering rules in order to
assess the impact of these phenomena on translation quality.
We propose a class of dependency-based source pre-ordering approaches
that reorder sentences based on flexible models trained with SVMs and
several recurrent neural network architectures.
We also propose a class of translation reranking models, both syntax-free
and source dependency-based, which make use of a type of neural network
known as graph echo state networks, which is highly flexible and requires
very few training resources, overcoming one of the main limitations
of neural network models for natural language processing tasks.
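The core pre-ordering idea can be illustrated with a minimal sketch: each head word and the linearized subtrees of its dependents are sorted by a scoring function before the sentence is emitted in the new order. The `Node` class, the oracle scorer, and the toy German example are illustrative assumptions; in the thesis's setting the scorer would be a trained SVM or recurrent model, not a fixed lookup.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

def preorder(node, score):
    """Linearize a dependency subtree: the head word and the linearized
    subtrees of its children are sorted by a scoring function (a stand-in
    here for a trained SVM/RNN reordering model)."""
    units = [[node.word]] + [preorder(child, score) for child in node.children]
    units.sort(key=score)
    return [word for unit in units for word in unit]

# Toy German source tree for "ich habe ihn gesehen" (root: "habe").
tree = Node("habe", [Node("ich"), Node("gesehen", [Node("ihn")])])

# Oracle scorer: rank each unit by the English-order position of its
# first word, targeting "I have seen him".
target_pos = {"ich": 0, "habe": 1, "gesehen": 2, "ihn": 3}
reordered = preorder(tree, lambda unit: target_pos[unit[0]])
print(" ".join(reordered))  # ich habe gesehen ihn
```

Note that the reordering is applied recursively and locally at each head, which is what lets such models handle long-range movement without searching over all permutations of the sentence.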
Probabilistic Commonsense Knowledge
Commonsense knowledge is critical to achieving artificial general intelligence. This shared background knowledge is implicit in all human communication, facilitating efficient information exchange and understanding. But commonsense research is hampered by the immense quantity of such knowledge, which makes an explicit categorization impossible. Furthermore, a plumber could repair a sink in a kitchen or a bathroom, indicating that common sense yields probable assumptions rather than definitive answers. To align with these fundamental properties of commonsense, we want to not only model but also evaluate such knowledge in a human-like way, using abstractions and probabilistic principles. Traditional combinatorial probabilistic models, e.g., probabilistic graphical model approaches, are limited in modeling large-scale probability distributions containing thousands or even millions of commonsensical events. On the other hand, although embedding-based representation learning has the advantage of generalizing to large combinations of events, such models struggle to produce consistent probabilities under different styles of queries. Combining benefits from both sides, we introduce probabilistic box embeddings, which represent joint probability distributions on a learned latent space of geometric embeddings. With box embeddings, it becomes possible to handle queries with intersections, unions, and negations in a way similar to Venn diagram reasoning, which has remained difficult even for large language models. Meanwhile, existing evaluations do not reflect the probabilistic nature of commonsense knowledge. The popular multiple-choice evaluation style often misleads us into the belief that commonsense is solved. To fill this gap, we propose a method of eliciting commonsense-related question-answer distributions from human annotators, as well as a novel method of generative evaluation. We utilize these approaches in two new commonsense datasets.
Finally, we draw a connection between state-of-the-art NLP models, namely large language models, and their ability to perform commonsense reasoning tasks. According to previous studies, large language models make inconsistent predictions when given different input texts for plausible commonsense situations. We intend to evaluate their performance using more rigorous probabilistic measurements.
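The geometric intuition behind box embeddings can be sketched in a few lines: each concept is an axis-aligned box inside the unit hypercube, box volume is read as probability, and intersection volume gives joint (and hence conditional) probability. The two example concepts and their coordinates below are invented for illustration; a real system learns the box parameters from data.

```python
from math import prod

def volume(lo, hi):
    """Volume of an axis-aligned box given by corner vectors; an empty
    (inverted) box has volume 0."""
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))

def intersect(box_a, box_b):
    """Intersection of two boxes is again a box: element-wise max of the
    lower corners and min of the upper corners."""
    lo = [max(x, y) for x, y in zip(box_a[0], box_b[0])]
    hi = [min(x, y) for x, y in zip(box_a[1], box_b[1])]
    return lo, hi

# Hypothetical concepts as boxes (lo, hi) in a 2-D latent unit square.
plumber = ([0.1, 0.1], [0.6, 0.9])
kitchen = ([0.3, 0.0], [0.8, 0.5])

p_plumber = volume(*plumber)                    # P(plumber)       ~ 0.40
p_joint = volume(*intersect(plumber, kitchen))  # P(plumber, kitchen)
p_cond = p_joint / volume(*kitchen)             # P(plumber | kitchen)
```

Because intersections, and by inclusion-exclusion unions and negations, stay within the same box calculus, the model answers differently phrased queries from one shared representation, which is the consistency property the abstract highlights.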
Fractals in the Nervous System: Conceptual Implications for Theoretical Neuroscience
This essay is presented with two principal objectives in mind: first, to
document the prevalence of fractals at all levels of the nervous system, giving
credence to the notion of their functional relevance; and second, to draw
attention to the as yet still unresolved issues of the detailed relationships
among power law scaling, self-similarity, and self-organized criticality. As
regards criticality, I will document that it has become a pivotal reference
point in neurodynamics. Furthermore, I will emphasize the not yet fully
appreciated significance of allometric control processes. For dynamic fractals,
I will assemble reasons for attributing to them the capacity to adapt task
execution to contextual changes across a range of scales. The final section
consists of general reflections on the implications of the reviewed data, and
identifies what appear to be issues of fundamental importance for future
research in the rapidly evolving topic of this review.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table.
Quantum physics meets biology
Quantum physics and biology have long been regarded as unrelated disciplines,
describing nature at the inanimate microlevel on the one hand and living
species on the other hand. Over the last decades the life sciences have
succeeded in providing ever more and refined explanations of macroscopic
phenomena that were based on an improved understanding of molecular structures
and mechanisms. Simultaneously, quantum physics, originally rooted in a world
view of quantum coherences, entanglement and other non-classical effects, has
been heading towards systems of increasing complexity. The present perspective
article shall serve as a pedestrian guide to the growing interconnections
between the two fields. We recapitulate the generic and sometimes unintuitive
characteristics of quantum physics and point to a number of applications in the
life sciences. We discuss our criteria for a future quantum biology, its
current status, recent experimental progress and also the restrictions that
nature imposes on bold extrapolations of quantum theory to macroscopic
phenomena.
Comment: 26 pages, 4 figures, Perspective article for the HFSP Journal.
Cluster damage robustness analysis and space independent community detection in complex networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
This thesis investigates the evolution of two very different complex systems using network theory. This multi-disciplinary technique is widely used to model and analyse vastly diverse systems of multiple interacting components, and it is therefore applied in this thesis to study the complexity of such systems. This complexity is rooted in the components' interactions, such that the whole system is more than the sum of its individual parts. The first novelty of this research is the proposal of a new type of structural perturbation, cluster damage, for measuring another dimension of network robustness. The second novelty is the first application of a community detection method that uncovers space-independent communities in spatial networks to airport and linguistic networks.
A critical property of complex systems, robustness, is explored within a partial model of the Internet, by demonstrating a novel perturbation strategy based on the iterative removal of clusters. The main contribution of this theoretical case study is the methodology for cluster damage, which has not been investigated in the literature on the robustness of complex networks. The model, part of the Internet at the Autonomous System level, serves only as a domain where the novel methodology is demonstrated; it is chosen because the Internet is known to be robust owing to its distributed (non-centralised) nature, even though it is often subjected to large perturbations and failures. The first applied case study is in the field of air transportation. Specifically, it explores the topology and passenger flows of the United States Airport Network (USAN) over two decades. The network model consists of a time-series of six network snapshots for the years 1990, 2000 and 2010, which capture bi-monthly passenger flows among US airports. Since the network is embedded in space, the volume of these flows is naturally affected by spatial proximity, and therefore a model (recently proposed in the literature) accounting for this phenomenon is used to identify the communities of airports that have particularly high flows among them, given their spatial separation. The second applied case study, in the field of language acquisition, investigates the word co-occurrence networks of children as they develop their linguistic abilities at an early age. As in the previous case study, the network model comprises six children observed at three discrete developmental stages. These networks are not embedded in physical space, but they are mapped to an artificial semantic space that defines the semantic distance between pairs of words. This novel approach allows for an additional dimension of network information that results in a more complete dataset.
Then, community detection identifies groups of words that have particularly high co-occurrence frequency, given their semantic distance. This research highlights the fact that some general techniques from network theory, such as network modelling and analysis, can be successfully applied to the study of diverse systems, while others, such as community detection, need to be tailored to the specific system. However, methods originally developed for one domain may be applied somewhere completely new, as illustrated by the application of spatial community detection to a non-spatial network. This underlines the importance of inter-disciplinary research.
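The cluster-damage perturbation can be sketched in stdlib Python. The cluster proxy used here, a highest-degree node together with its immediate neighbourhood, and the toy graph are assumptions for illustration, not the thesis's exact cluster definition; the measured quantity, the giant-component fraction after each removal, is the standard robustness indicator.

```python
from collections import deque

def giant_component_size(adj):
    """Size of the largest connected component of an undirected graph
    given as an adjacency dict {node: set(neighbours)}."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def cluster_damage(adj, steps):
    """Iteratively delete a cluster (here: a max-degree node plus its
    neighbourhood) and record the giant-component fraction after each
    removal, relative to the original network size."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n0, history = len(adj), []
    for _ in range(steps):
        if not adj:
            break
        hub = max(adj, key=lambda u: len(adj[u]))
        cluster = {hub} | adj[hub]
        for u in cluster:
            adj.pop(u, None)
        for vs in adj.values():
            vs -= cluster
        history.append(giant_component_size(adj) / n0 if adj else 0.0)
    return history

# Toy graph: a 4-node hub cluster plus a separate triangle.
adj = {"h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"},
       "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
print(cluster_damage(adj, 1))  # one cluster removal leaves the triangle
```

Removing whole clusters rather than single nodes probes a different failure mode: the curve of giant-component fractions drops in larger steps, which is precisely the extra dimension of robustness the thesis proposes to measure.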