7 research outputs found
Fixed Size Ordinally-Forgetting Encoding and its Applications
In this thesis, we propose a new method, Fixed-size Ordinally-Forgetting Encoding (FOFE), which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE models word order in a sequence through a simple ordinally-forgetting mechanism based on the positions of the words. We address two fundamental problems in natural language processing: Language Modeling (LM) and Named Entity Recognition (NER).
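The ordinally-forgetting mechanism admits a very small sketch: each word adds its one-hot vector, and all earlier contributions are scaled down by a constant forgetting factor. A minimal NumPy illustration (the value of `alpha` and the toy three-word vocabulary are chosen arbitrarily here):

```python
import numpy as np

def fofe_encode(word_ids, vocab_size, alpha=0.7):
    """FOFE: z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot
    vector of the t-th word and alpha in (0, 1) is the forgetting factor."""
    z = np.zeros(vocab_size)
    for w in word_ids:
        z = alpha * z      # geometrically forget earlier words
        z[w] += 1.0        # add the one-hot vector of the current word
    return z

# Word order is preserved: "A B C" and "C B A" over a 3-word
# vocabulary yield distinct fixed-size encodings.
enc_abc = fofe_encode([0, 1, 2], vocab_size=3, alpha=0.5)  # [0.25, 0.5, 1.0]
enc_cba = fofe_encode([2, 1, 0], vocab_size=3, alpha=0.5)  # [1.0, 0.5, 0.25]
```

However long the input sequence, the result has exactly `vocab_size` dimensions, which is the fixed-size property the abstract refers to.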
We have applied FOFE to Feedforward Neural Network Language Models (FFNN-LMs). Experimental results show that, without using any recurrent feedback, FOFE-FFNN-LMs significantly outperform not only standard fixed-input FFNN-LMs but also several popular Recurrent Neural Network Language Models (RNN-LMs).
Instead of treating NER as a sequence labeling problem, we propose a new local detection approach that relies on FOFE to fully encode each sentence fragment and its left/right contexts into a fixed-size representation. This local detection approach has many advantages over traditional sequence labeling methods, and it has yielded strong performance in all tasks we have examined.
Modes of Truth
The aim of this volume is to open up new perspectives and to raise new research questions about a unified approach to truth, modalities, and propositional attitudes. The volume’s essays are grouped thematically around different research questions. The first theme concerns the tension between the theoretical role of the truth predicate in semantics and its expressive function in language. The second theme concerns the interaction of truth with modal and doxastic notions. The third theme covers higher-order solutions to the semantic and modal paradoxes, providing an alternative to the first-order solutions embraced in the first two themes. This book will be of interest to researchers working in epistemology, logic, philosophy of logic, philosophy of language, philosophy of mathematics, and semantics.
Distributional initialization of neural networks
In Natural Language Processing (NLP), text is, together with speech, one of the main sources of information. Computational systems that process raw text must first transform the text input into a machine-readable format. The final performance of NLP systems depends on the quality of these input representations, which is why the main objective of representation learning is to preserve and highlight important features of the input tokens (characters, words, phrases, etc.).
Traditionally, the input representations for Neural Networks (NNs) are one-hot vectors: each word is represented by a vector of zeros with a single 1 at the position corresponding to the word's index in the vocabulary. Such a representation only distinguishes words from one another; it carries no usable information about the relations between them. Word representations learned by NNs, called word embeddings, are arranged in a matrix in which each row corresponds to a particular word in the vocabulary and is retrieved by multiplying the corresponding one-hot vector with the embedding matrix.
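The lookup described above can be made concrete: multiplying a one-hot vector by the embedding matrix selects exactly one row, which is why frameworks implement it as a cheap table lookup rather than an actual matrix product. A small NumPy sketch with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 4
E = rng.normal(size=(vocab_size, embed_dim))  # embedding matrix, one row per word

word_index = 2
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# The matrix product with a one-hot vector picks out exactly one row of E.
via_matmul = one_hot @ E
via_lookup = E[word_index]
assert np.allclose(via_matmul, via_lookup)
```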
These word embeddings are initialized randomly and, during training, adjust their values to capture contextual semantic information with respect to the training objective. A frequent word is seen often during training, so its representation is updated frequently; for the same reason, the embeddings of rare words receive far fewer updates. This makes it difficult for NNs to learn good embeddings for words that occur only a few times in a corpus.
In this work, we propose a method to improve the quality of word embeddings for rare words. The main idea is to initialize an NN that learns embeddings with sparse distributional vectors precomputed for rare words from a given corpus.
We introduce and investigate several methods for building such distributional representations: different ways to combine one-hot representations of frequent words with distributional representations of rare words, different similarity functions between distributional vectors, and different normalization approaches applied to the representations to control the amplitude of the input signals.
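As a rough illustration of the initialization idea only: frequent words keep their one-hot rows, while rare words get sparse, normalized co-occurrence counts over the frequent vocabulary. The window size, combination scheme, and L1 normalization below are one simple variant for the sketch, not necessarily the thesis's exact configuration:

```python
import numpy as np
from collections import Counter

def distributional_init(corpus, min_count=3):
    """Build an initialization matrix: one-hot rows for frequent words,
    L1-normalized co-occurrence rows (window +/-1) for rare words."""
    counts = Counter(corpus)
    vocab = sorted(counts)
    idx = {w: i for i, w in enumerate(vocab)}
    frequent = {w for w in vocab if counts[w] >= min_count}
    init = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(corpus):
        if w in frequent:
            init[idx[w], idx[w]] = 1.0  # one-hot row for a frequent word
        else:
            # sparse co-occurrence with frequent neighbours of a rare word
            for j in (i - 1, i + 1):
                if 0 <= j < len(corpus) and corpus[j] in frequent:
                    init[idx[w], idx[corpus[j]]] += 1.0
    # normalize rows so input amplitudes stay comparable across words
    sums = init.sum(axis=1, keepdims=True)
    return vocab, np.divide(init, sums, out=np.zeros_like(init), where=sums > 0)
```

Swapping the window size, the similarity function used downstream, or the normalization scheme reproduces the kinds of variants the thesis compares.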
We evaluate our proposed models on two tasks. On a word similarity judgment task, the word embeddings are used to compute similarity scores between the two words in given pairs; these scores are then compared with human ratings. Using the same NN architecture, word embeddings trained with distributional initialization perform significantly better than word embeddings trained with traditional one-hot initialization.
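The evaluation protocol can be sketched as follows. The embeddings, word pairs, and human ratings below are made up for illustration, and Spearman rank correlation (the usual choice for this task) stands in for the comparison with human judgments:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def spearman(x, y):
    # rank correlation = Pearson correlation of the ranks (no tie handling)
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return (rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry))

# Illustrative 2-d embeddings and hypothetical human ratings:
emb = {"cat": np.array([1.0, 0.2]), "dog": np.array([0.9, 0.3]),
       "car": np.array([0.1, 1.0])}
pairs = [("cat", "dog"), ("cat", "car"), ("dog", "car")]
human = np.array([9.0, 2.0, 3.0])
model = np.array([cosine(emb[a], emb[b]) for a, b in pairs])
rho = spearman(model, human)  # 1.0: model ranks the pairs like the humans
```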
On a language modeling task, where models compete in predicting the probability of a given sequence of words, models with distributional initialization show minor improvements over models with one-hot initialization.
We also study the very popular word2vec tool (Mikolov et al., 2013a), which is used to obtain word embeddings without supervision. The main question we ask is how much the quality of the learned word embeddings depends on the initial random seed. The results obtained suggest that training with word2vec is stable and reliable.
Psychotic experiences beyond psychotic disorders: from measurement to computational mechanisms
Psychotic experiences (PEs) occur in the general population, beyond psychotic disorders.
PEs are a risk factor for mental ill health in young people but can occur benignly in selected
samples of adults. Environmental factors predispose to PEs, but their underlying mechanisms
are not well understood. Progress in understanding PEs may be limited by diverse
conceptualisations, imprecise measurement and a lack of explanatory frameworks that can
bridge the gaps between aetiological factors, their effects on the brain and their behavioural
manifestations. In this thesis, I undertook a comprehensive investigation of the measurement,
health implications, aetiology and computational mechanisms of PEs in adolescents
and young adults using data from two large cohort samples, supplemented with smaller-scale
behavioural studies.
I first investigated the measurement of PEs, assessing and optimising the measurement
of PEs in young people using two self-report instruments. I then used latent variable modelling
to show that a self-report instrument and an interview instrument measured the same underlying
psychotic phenomena. Both instruments were able to measure severe PEs, while the self-report
questionnaire also measured milder psychotic phenomena.
I then investigated the health implications of PEs. Using cluster analysis in both cohorts,
I found replicable patterns of PEs at similar levels of intensity and persistence but with and
without depressive symptoms and with varying risk of mental disorder. Paranoid ideation
was more strongly associated with depressive symptoms than were non-paranoid unusual perceptions and
beliefs. Childhood adversity was associated with both PE-prone groups, but later social support
from family and friends was far higher in those with PEs and low depressive symptoms
than those with PEs and high depressive symptoms.
Subsequently, I investigated the role of the social environment in the development of PEs
and psychopathology using longitudinal structural equation modelling. I found that asocial
dispositions preceded increases in PEs over one year, an effect mediated by detriment
to social support. Conversely, PEs did not precede or increase asociality. I then showed
that dimensions of PEs and depressive symptoms were promoted by childhood adversity but
differentially affected by later social support, with paranoid ideation being more influenced
by support than non-paranoid unusual perceptions/beliefs.
Finally, I investigated specific mechanisms of PEs in two behavioural studies. In the seventh
study, I used computational modelling of reward learning to link PEs to reduced ability
to modulate learning by confidence, replicating computational effects of a pharmacological
model of psychosis. I also used a novel visual task to show that the manifestation of PEs
as anomalous perceptions versus anomalous beliefs might be explained by over-reliance on
different types of prior knowledge in perceptual inference.
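As an illustrative sketch only (the thesis's actual model is not specified in this abstract): a simple Rescorla-Wagner learner whose learning rate is gated by confidence shows what "modulating learning by confidence" means, and how the absence of that modulation leaves update sizes insensitive to confidence. All parameter names here are hypothetical:

```python
def rw_update(value, reward, base_lr, confidence, modulation):
    """One Rescorla-Wagner update with a confidence-gated learning rate.
    `modulation` in [0, 1] sets how strongly confidence shrinks the step;
    modulation = 0 corresponds to a learner who cannot modulate learning
    by confidence and keeps taking full-size steps."""
    lr = base_lr * (1.0 - modulation * confidence)
    return value + lr * (reward - value)

# A confident, well-modulated learner barely moves a well-learned value;
# an unmodulated learner updates it just as much as an uncertain one would.
v_mod = rw_update(value=0.8, reward=1.0, base_lr=0.5, confidence=0.9, modulation=1.0)
v_flat = rw_update(value=0.8, reward=1.0, base_lr=0.5, confidence=0.9, modulation=0.0)
```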
These results suggest that different conceptual approaches to PEs might be synthesised
despite issues with their measurement. PEs in young people, while not entirely benign, are
heterogeneously associated with psychopathology. Importantly, they characterise a minority
of young people who are at very high transdiagnostic risk of mental illness but also occur
without distress in young people, often in the context of a supportive social environment.
Health outcomes in young people with PEs are predicted and potentially modified by social
functioning and social relationships. PEs might arise from atypicalities in how the influences
of information sources on perception and belief-updating are modulated according to their
reliabilities.

Neuroscience in Psychiatry Network / Friends of Peterhouse Studentship
James Baird Fun