2,247 research outputs found
Sparse distributed representations as word embeddings for language understanding
Word embeddings are vector representations of words that capture semantic and syntactic
similarities between them. Similar words tend to have closer vector representations in an
N-dimensional space, as measured, for instance, by the Euclidean distance between the points
associated with the words in a continuous vector space. This property makes word
embeddings valuable in several Natural Language Processing tasks, from word analogy and
similarity evaluation to the more complex text categorization, summarization or translation tasks.
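As a minimal sketch of the distance-based notion of similarity described above, the following Python snippet finds the nearest neighbour of a word by Euclidean distance. The three-dimensional vectors and the helper names (`euclidean_distance`, `nearest`, `toy_embeddings`) are invented for illustration and do not come from the work itself.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 3-dimensional embeddings, made up for this example only.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.35],
    "car": [0.1, 0.9, 0.7],
}


def nearest(word, embeddings):
    """Return the other word whose vector lies closest to `word`."""
    return min(
        (w for w in embeddings if w != word),
        key=lambda w: euclidean_distance(embeddings[word], embeddings[w]),
    )


print(nearest("cat", toy_embeddings))
```

With these toy values, "cat" lies much closer to "dog" than to "car", which is the behaviour the abstract attributes to semantically similar words.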
Typically, state-of-the-art word embeddings are dense vector representations of low
dimensionality, ranging from tens to hundreds of floating-point dimensions, usually obtained
through unsupervised learning on considerable amounts of text data by optimizing the
objective function of a neural network.
This work presents a methodology to derive word embeddings as binary sparse vectors, that is, word
vector representations with high dimensionality, sparse representation and binary features (i.e.,
composed only of ones and zeros). The proposed methodology tries to overcome some
disadvantages associated with state-of-the-art approaches, namely the size of the corpus needed for
training the model, while presenting comparable evaluations in several Natural Language
Processing tasks.
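To make the idea of a high-dimensional sparse binary vector concrete, here is a small Python sketch that stores each vector as the set of indices holding a one and compares vectors by overlap. The abstract does not say which similarity measure the work uses, so the overlap count and Jaccard index here are assumptions, and the dimensionality `DIM` and the example index sets are hypothetical.

```python
def to_active_set(bits):
    """Store a binary vector compactly as the set of indices that hold a 1."""
    return frozenset(i for i, b in enumerate(bits) if b)


def overlap(a, b):
    """Number of positions where both vectors have a 1."""
    return len(a & b)


def jaccard(a, b):
    """Shared active bits divided by all active bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical vectors: 5 active bits out of DIM = 2048 positions each.
DIM = 2048
king = frozenset({3, 17, 256, 901, 1500})
queen = frozenset({3, 17, 256, 77, 1999})
apple = frozenset({5, 640, 1024, 1500, 2000})

print(len(king) / DIM)        # fraction of active bits (sparsity)
print(overlap(king, queen))   # shared active bits
print(jaccard(king, apple))
```

Storing only the active indices keeps memory proportional to the number of ones and not to the full dimensionality, which is one common reason to prefer this layout for sparse binary data.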
Results show that high-dimensional sparse binary vector representations, obtained from a
very limited amount of training data, achieve comparable performance in intrinsic similarity and
categorization tasks, whereas in analogy tasks good results are obtained only for noun
categories. Our embeddings outperformed eight state-of-the-art word embeddings in word
similarity tasks, and two word embeddings in categorization tasks.
The term word embeddings refers to vector representations of words that capture the
semantic and syntactic similarities between them. Similar words tend to be
represented by nearby vectors in an N-dimensional space, considering, for example, the
Euclidean distance between the points associated with these vector representations in a
continuous vector space. This property makes word embeddings important in several Natural
Language Processing tasks, from word analogy and similarity evaluations
to the more complex tasks of text categorization, summarization and machine translation.
Typically, word embeddings consist of dense vectors of low
dimensionality. They are obtained through unsupervised learning on considerable
amounts of data, by optimizing the objective function of a neural network.
This work proposes a methodology for obtaining word embeddings made up of sparse binary
vectors, that is, word vector representations that are simultaneously binary (i.e.,
composed only of zeros and ones), sparse and high-dimensional. The proposed
methodology tries to overcome some disadvantages associated with state-of-the-art methodologies,
namely the large volume of data needed to train the models, while
presenting comparable results in several Natural Language Processing
tasks.
The results of this work show that these representations, obtained from a
limited amount of training data, achieve considerable performance in word
similarity and categorization tasks. On the other hand, in word analogy tasks,
considerable results are obtained only for the grammatical category of nouns. The word
embeddings obtained with the proposed methodology, compared with the state of the art,
outperformed eight word embeddings in similarity tasks and two word
embeddings in word categorization tasks.
Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates
The study of cerebral anatomy in developing neonates is of great importance for
the understanding of brain development during the early period of life. This
dissertation therefore focuses on three challenges in the modelling of cerebral
anatomy in neonates during brain development. The methods that have been
developed all use Magnetic Resonance Images (MRI) as source data.
To facilitate study of vascular development in the neonatal period, a set of image
analysis algorithms has been developed to automatically extract and model cerebral
vessel trees. The whole process consists of cerebral vessel tracking from
automatically placed seed points, vessel tree generation, and vasculature
registration and matching. These algorithms have been tested on clinical
Time-of-Flight (TOF) MR angiographic datasets.
To facilitate study of the neonatal cortex, a complete cerebral cortex segmentation
and reconstruction pipeline has been developed. Segmentation of the neonatal
cortex is not effectively done by existing algorithms designed for the adult brain
because the contrast between grey and white matter is reversed. This causes pixels
containing tissue mixtures to be incorrectly labelled by conventional methods. The
neonatal cortical segmentation method that has been developed is based on a novel
expectation-maximization (EM) method with explicit correction for mislabelled
partial volume voxels. Based on the resulting cortical segmentation, an implicit
surface evolution technique is adopted for the reconstruction of the cortex in
neonates. The performance of the method is investigated by performing a detailed
landmark study.
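For readers unfamiliar with expectation-maximization, the sketch below fits a plain two-class Gaussian mixture to one-dimensional intensity values. It is only the textbook EM baseline, under the assumption of one Gaussian per tissue class. It does not include the explicit partial volume correction that the dissertation describes as novel, and the function names and toy intensities are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_class(intensities, means, variances, weights, n_iter=50):
    """Plain EM for a two-component 1-D Gaussian mixture (n_iter >= 1).

    Returns the fitted means, variances and weights, plus hard labels
    taken from the responsibilities of the final E-step.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    n = len(intensities)
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each intensity.
        resp = []
        for x in intensities:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty class unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, intensities)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, intensities)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density well defined
            weights[k] = nk / n
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


# Made-up intensities forming two well-separated groups.
toy = [10, 11, 12, 9, 10.5, 50, 52, 49, 51, 50.5]
fit_means, fit_vars, fit_weights, fit_labels = em_two_class(
    toy, means=[0.0, 60.0], variances=[100.0, 100.0], weights=[0.5, 0.5]
)
print(fit_means, fit_labels)
```

On real neonatal MR data the class intensity distributions overlap, and voxels containing tissue mixtures are exactly where this plain formulation mislabels, which is the problem the corrected method in the abstract targets.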
To facilitate study of cortical development, a cortical surface registration algorithm
for aligning cortical surfaces has been developed. The method first inflates the extracted
cortical surfaces and then performs a non-rigid surface registration using free-form
deformations (FFDs) to remove residual misalignment. Validation experiments using
data labelled by an expert observer demonstrate that the method can capture local
changes and follow the growth of specific sulci.
Structural characterization of intrinsically disordered proteins by NMR spectroscopy.
Recent advances in NMR methodology and techniques allow the structural investigation of biomolecules of increasing size with atomic resolution. NMR spectroscopy is especially well suited to the study of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs), which are generally highly flexible and do not have a well-defined secondary or tertiary structure under functional conditions. In the last decade, the important role of IDPs in many essential cellular processes has become more evident, as many protagonists in signal transduction, transcription regulation and cell-cycle regulation have been found to lack a stable tertiary structure. The growing demand for structural data on IDPs required the development and adaptation of methods such as 13C-direct detected experiments, paramagnetic relaxation enhancements (PREs) and residual dipolar couplings (RDCs) for the study of 'unstructured' molecules in vitro and in-cell. The information obtained by NMR can be processed with novel computational tools to generate conformational ensembles that visualize the conformations IDPs sample under functional conditions. Here, we address NMR experiments and strategies that enable the generation of detailed structural models of IDPs.