232 research outputs found
Generative Dependency Language Modeling Using Recurrent Neural Networks
This thesis proposes an approach to incorporating syntactic information into the task of generative language modeling. We modify the logic of a transition-based dependency parser to generate new words into the buffer, using the top items of the stack as input. We hypothesize that this approach provides benefits in modeling long-distance dependencies. We implement our system along with a baseline language model and observe that our approach yields an improvement in perplexity, and that this improvement is more pronounced for sentences containing long-distance dependencies. Additionally, a qualitative analysis of the generated sentences demonstrates that our model produces more cohesive sentences than the baseline.
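A toy sketch of the buffer-generation idea described above: a transition system whose GEN action appends a newly generated word to the buffer, conditioned on the top of the stack. The `predict_word` lookup table is a hypothetical stand-in for the thesis's recurrent generator, and arc actions are omitted; this is an illustration, not the actual implementation.

```python
# Toy sketch: a transition loop extended with a GEN action that creates new
# words in the buffer from the stack top. `predict_word` stands in for an RNN.

def predict_word(stack):
    # Hypothetical stub replacing the neural generator conditioned on stack tops.
    vocab = {"<ROOT>": "the", "the": "cat", "cat": "sleeps", "sleeps": "<EOS>"}
    return vocab.get(stack[-1], "<EOS>")

def generate_sentence(max_len=10):
    stack, buffer, words = ["<ROOT>"], [], []
    while len(words) < max_len:
        if not buffer:                      # buffer empty: GEN a new word
            w = predict_word(stack)
            if w == "<EOS>":
                break
            buffer.append(w)
            words.append(w)
        else:                               # SHIFT (arc actions omitted here)
            stack.append(buffer.pop(0))
    return words

print(generate_sentence())  # -> ['the', 'cat', 'sleeps']
```

Because generation conditions on the stack rather than only on the preceding words, the head of a long-distance dependency can remain directly visible when its dependent is generated.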
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties.
The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings.
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide range of target languages, in the setting where no annotated training data is available in the target language.
Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing
The development of distributed training strategies for statistical prediction functions
is important for applications of machine learning, generally, and the development
of distributed structured prediction training strategies is important for natural
language processing (NLP), in particular. With ever-growing data sets, this is
so first because it is easier to increase computational capacity by adding more
processor nodes than it is to increase the power of individual processor nodes,
and second because data sets are often collected and stored in different locations.
Iterative parameter mixing (IPM) is a distributed training strategy in which each
node in a network of processors optimizes a regularized average loss objective on its
own subset of the total available training data, making stochastic (per-example) updates
to its own estimate of the optimal weight vector, and communicating with the
other nodes by periodically averaging estimates of the optimal vector across the network.
This algorithm has been contrasted with a close relative, called here the single-mixture
optimization algorithm, in which each node stochastically optimizes an average
loss objective on its own subset of the training data, operating in isolation until
convergence, at which point the average of the independently created estimates is returned.
Recent empirical results have suggested that this IPM strategy produces better
models than the single-mixture algorithm, and the results of this thesis add to this
picture.
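As a toy illustration of the two strategies contrasted above, the sketch below runs both on a small least-squares problem: IPM mixes (averages) the nodes' weight vectors after every pass, while single-mixture lets each node train in isolation and averages only once at the end. The data, learning rate, and round counts are illustrative assumptions, not the thesis experiments.

```python
import numpy as np

# Toy sketch of iterative parameter mixing (IPM) vs. single-mixture training.

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
w_true = rng.normal(size=5)
y = X @ w_true                                 # consistent linear system
shards = np.array_split(np.arange(12), 4)      # 4 "processor nodes"

def local_pass(w, idx, lr=0.05):
    w = w.copy()
    for i in idx:                              # stochastic per-example updates
        w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

# IPM: each node does one pass on its own shard, estimates are averaged,
# and the mixed vector is redistributed for the next round.
w_ipm = np.zeros(5)
for _ in range(200):
    w_ipm = np.mean([local_pass(w_ipm, idx) for idx in shards], axis=0)

# Single-mixture: each node trains to convergence in isolation on its shard;
# the independent estimates are averaged once, at the end.
w_sm = np.mean([local_pass(np.zeros(5), np.tile(idx, 200)) for idx in shards],
               axis=0)
```

Here each shard alone underdetermines the weight vector, so the once-averaged single-mixture estimate need not minimize the overall objective, whereas the periodic mixing in IPM keeps all nodes optimizing a shared iterate.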
The contributions of this thesis are as follows.
The first contribution is to produce and analyze an algorithm for decentralized
stochastic optimization of regularized average loss objective functions. This algorithm,
which we call the distributed regularized dual averaging algorithm, improves over
prior work on distributed dual averaging by providing a simpler algorithm (used in the
rest of the thesis), better convergence bounds for the case of regularized average loss
functions, and certain technical results that are used in the sequel.
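To make the dual averaging building block concrete, the following single-node sketch follows the well-known l1-regularized dual averaging scheme (RDA): it maintains a running average of subgradients and solves a closed-form proximal step at each iteration. The problem instance, step constants, and sparsity pattern are illustrative assumptions, not the algorithm or bounds developed in the thesis.

```python
import numpy as np

# Sketch of l1-regularized dual averaging on a toy sparse regression problem.
# The distributed variant would additionally average the dual (gradient)
# estimates across nodes; that communication step is omitted here.

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
w_star = np.zeros(10)
w_star[:3] = [2.0, -1.5, 1.0]                 # sparse ground truth (made up)
y = X @ w_star + 0.01 * rng.normal(size=500)

lam, gamma = 0.05, 50.0                       # l1 weight, prox scaling
g_bar = np.zeros(10)                          # running average of subgradients
w = np.zeros(10)
for t in range(1, 501):
    i = rng.integers(500)
    g = (X[i] @ w - y[i]) * X[i]              # per-example squared-loss gradient
    g_bar += (g - g_bar) / t
    # closed-form argmin of <g_bar, w> + lam*||w||_1 + (gamma/(2*sqrt(t)))*||w||^2
    w = -(np.sqrt(t) / gamma) * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
```

The key property exploited in this family of methods is that the iterate depends only on the averaged subgradients, which is what makes periodic averaging across nodes a natural communication primitive.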
The central contribution of this thesis is to give an optimization-theoretic justification
for the IPM algorithm. While past work has focused primarily on its empirical
test-time performance, we give a novel perspective on this algorithm by showing that,
in the context of the distributed dual averaging algorithm, IPM constitutes a convergent
optimization algorithm for arbitrary convex functions, while the single-mixture
optimization algorithm does not. Experiments indeed confirm that the superior test-time
performance of models trained using IPM, compared to single-mixture, correlates with
better optimization of the objective value on the training set, a fact not previously reported.
Furthermore, our analysis of general non-smooth functions justifies the use of
distributed large-margin (support vector machine [SVM]) training of structured predictors,
which we show yields better test performance than the IPM perceptron algorithm,
the only version of the IPM to have previously been given a theoretical justification.
Our results confirm that IPM training can reach the same level of test performance
as a sequentially trained model and can reach better accuracies when one has a fixed
budget of training time.
Finally, we use the reduction in training time that distributed training allows to experiment
with adding higher-order dependency features to a state-of-the-art phrase-structure
parsing model. We demonstrate that adding these features improves out-of-domain
parsing results of even the strongest phrase-structure parsing models, yielding
a new state-of-the-art for the popular train-test pairs considered. In addition, we show
that a feature-bagging strategy, in which component models are trained separately and
later combined, is sometimes necessary to avoid feature under-training and get the best
performance out of large feature sets.
Learning with Joint Inference and Latent Linguistic Structure in Graphical Models
Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of "telephone", combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application.
In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches.
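The inference machinery mentioned above can be illustrated on the smallest possible case: on a two-variable chain factor graph, a single sum-product (belief propagation) message already yields the exact marginal. The potentials below are made-up numbers, not the dissertation's models, and real joint models add many variables and constraint factors.

```python
import numpy as np

# Tiny sum-product illustration: exact marginal of x2 on a two-variable chain,
# checked against brute-force enumeration of the joint. Potentials are made up.

phi1 = np.array([1.0, 2.0])                 # unary potential over x1 in {0,1}
phi2 = np.array([3.0, 1.0])                 # unary potential over x2
psi = np.array([[2.0, 1.0],                 # pairwise potential psi[x1, x2]
                [1.0, 4.0]])

# message from x1 (through psi) to x2: sum over x1 of phi1[x1] * psi[x1, x2]
m_12 = phi1 @ psi
belief_x2 = phi2 * m_12
belief_x2 /= belief_x2.sum()                # normalized marginal P(x2)

# brute-force check over the full joint distribution
joint = phi1[:, None] * psi * phi2[None, :]
marg = joint.sum(axis=0)
marg /= marg.sum()
print(np.allclose(belief_x2, marg))         # True
```

On tree-structured graphs this message passing is exact; on the loopy graphs that arise when models are joined, it becomes the approximate (loopy) variant.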
Additionally we propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task
Evaluating NLP toxicity tools: Towards the ethical limits
In recent years we have seen a major evolution in the fields of neural networks and natural language processing (NLP). Solutions such as voice assistants, writing assistance, and chatbots are ever more present in our daily work. In addition, these techniques are used for more sophisticated analyses such as sentiment classification and hate-speech detection. At the same time, the presence of gender and racial biases in these solutions has created problems and opened a debate around their limitations and potential. The goal of this work is to evaluate the sentiment-analysis tools available at the time of writing. To achieve this, we selected a set of tools and compared their usability on a specific dataset focused on bias detection. In addition, we developed a tool to evaluate these models in a real-world application by integrating them into a content management system (CMS). The developed tool, built on a popular CMS distribution (Drupal), aims to help moderate content in the CMS. Finally, we present a discussion of ethics and fairness in sentiment analysis using NLP.
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more
accessible to the end-users through natural language technologies. The
morphological richness, compounding, free word order, and low-resource
nature of Sanskrit pose significant challenges for developing deep learning
solutions. We identify four fundamental tasks, which are crucial for developing
a robust NLP technology for Sanskrit: word segmentation, dependency parsing,
compound type identification, and poetry analysis. The first task, Sanskrit
Word Segmentation (SWS), is a fundamental text processing task for any other
downstream applications. However, it is challenging due to the sandhi
phenomenon that modifies characters at word boundaries. Similarly, the existing
dependency parsing approaches struggle with morphologically rich and
low-resource languages like Sanskrit. Compound type identification is also
challenging for Sanskrit due to the context-sensitive semantic relation between
components. All these challenges result in sub-optimal performance in NLP
applications like question answering and machine translation. Finally, Sanskrit
poetry has not been extensively studied in computational linguistics.
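The sandhi obstacle noted above can be shown with one real rule: in Sanskrit, a final and an initial short/long "a" fuse into long "ā" (written "aa" in ASCII transliteration here), as in "na" + "asti" → "nāsti". A segmenter must therefore consider several underlying splits for one surface string; the toy enumerator below illustrates this for that single rule and is not the thesis system, which scores such candidates with neural models.

```python
# Toy sandhi-splitting illustration for one vowel rule: a/aa + a/aa -> aa.
# Real Sanskrit segmentation involves many such rules, applied in context.

SANDHI_SPLITS = {"aa": [("a", "a"), ("a", "aa"), ("aa", "a"), ("aa", "aa")]}

def candidate_splits(surface, pos):
    # Enumerate underlying word-boundary strings that a fused "aa" at `pos`
    # could correspond to; a real system scores these candidates with a model.
    if surface[pos:pos + 2] != "aa":
        return []
    return [surface[:pos] + left + " " + right + surface[pos + 2:]
            for left, right in SANDHI_SPLITS["aa"]]

print(candidate_splits("naasti", 1))
# 'na asti' (i.e. na + asti -> naasti) is among the four candidates
```

Even this single rule is ambiguous in isolation, which is why segmentation needs context-sensitive scoring rather than dictionary lookup alone.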
While addressing these challenges, this thesis makes various contributions:
(1) The thesis proposes linguistically-informed neural architectures for these
tasks. (2) We showcase the interpretability and multilingual extension of the
proposed systems. (3) Our proposed systems report state-of-the-art performance.
(4) Finally, we present a neural toolkit named SanskritShala, a web-based
application that provides real-time analysis of input for various NLP tasks.
Overall, this thesis contributes to making Sanskrit manuscripts more accessible
by developing robust NLP technology and releasing various resources, datasets,
and a web-based toolkit. (Ph.D. dissertation)