73 research outputs found

The role of knowledge in determining identity of long-tail entities

Authors: Hovy Eduard, Ilievski Filip, Schlobach Stefan, Vossen Piek, Xie Qizhe
Publication venue: 'Elsevier BV'
Publication date: 01/03/2020

NIL entities do not have an accessible representation, which means that their identity cannot be established through traditional disambiguation. Consequently, they have received little attention in entity linking systems and tasks so far. Given the non-redundancy of knowledge on NIL entities, the lack of frequency priors, their potentially extreme ambiguity, and their sheer number, they form an extreme class of long-tail entities and pose a great challenge for state-of-the-art systems. In this paper, we investigate the role of knowledge when establishing the identity of NIL entities mentioned in text. What kind of knowledge can be applied to establish the identity of NILs? Can we potentially link to them at a later point? How can implicit knowledge be captured and knowledge gaps in communication be filled? We formulate and test hypotheses to provide insights into these questions. Due to the unavailability of instance-level knowledge, we propose to enrich the locally extracted information with profiling models that rely on background knowledge in Wikidata. We describe and implement two profiling machines based on state-of-the-art neural models. We evaluate their intrinsic behavior and their impact on the task of determining the identity of NIL entities.

VU Research Portal

Discovering structure without labels

Author: Damrich Sebastian
Publication date: 01/01/2023

The scarcity of labels combined with an abundance of data makes unsupervised learning more attractive than ever. Without annotations, inductive biases must guide the identification of the most salient structure in the data. This thesis contributes to two aspects of unsupervised learning: clustering and dimensionality reduction. The thesis falls into two parts. In the first part, we introduce Mod Shift, a clustering method for point data that uses a distance-based notion of attraction and repulsion to determine the number of clusters and the assignment of points to clusters. It iteratively moves points towards crisp clusters like Mean Shift but also has close ties to the Multicut problem via its loss function. As a result, it connects signed graph partitioning to clustering in Euclidean space. The second part treats dimensionality reduction and, in particular, the prominent neighbor embedding methods UMAP and t-SNE. We analyze the details of UMAP's implementation and find its actual loss function. It differs drastically from the one usually stated. This discrepancy allows us to explain some typical artifacts in UMAP plots, such as the dataset size-dependent tendency to produce overly crisp substructures. Contrary to existing belief, we find that UMAP's high-dimensional similarities are not critical to its success. Based on UMAP's actual loss, we describe its precise connection to the other state-of-the-art visualization method, t-SNE. The key insight is a new, exact relation between the contrastive loss functions negative sampling, employed by UMAP, and noise-contrastive estimation, which has been used to approximate t-SNE. As a result, we explain that UMAP embeddings appear more compact than t-SNE plots due to increased attraction between neighbors. Varying the attraction strength further, we obtain a spectrum of neighbor embedding methods, encompassing both UMAP- and t-SNE-like versions as special cases. 
Moving from more attraction to more repulsion shifts the focus of the embedding from the continuous, global structure of the data to its more discrete, local structure. Finally, we emphasize the link between contrastive neighbor embeddings and self-supervised contrastive learning. We show that different flavors of contrastive losses can work for both of them with few noise samples.
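The abstract describes Mod Shift as iteratively moving points towards crisp clusters "like Mean Shift". As a rough illustration of that baseline only, here is a minimal sketch of plain Mean Shift with a flat kernel on 1-D data. It is not Mod Shift: it has no repulsion term and no link to the Multicut loss. The function names, the bandwidth, and the toy data are assumptions made for this example.

```python
def mean_shift_1d(points, bandwidth, iterations=20):
    """Plain Mean Shift with a flat kernel on 1-D data (illustrative, not Mod Shift)."""
    modes = list(points)
    for _ in range(iterations):
        updated = []
        for m in modes:
            # Average the original points that lie within `bandwidth` of the current estimate.
            window = [p for p in points if abs(p - m) <= bandwidth]
            updated.append(sum(window) / len(window))
        modes = updated
    return modes


def group_modes(modes, tol=1e-6):
    """Give each point a cluster label by merging modes that coincide up to `tol`."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= tol:
                labels.append(idx)
                break
        else:
            # No existing center is close enough, so this mode starts a new cluster.
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical toy data: two well-separated groups on the real line.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
centers, labels = group_modes(mean_shift_1d(points, bandwidth=1.0))
print(centers, labels)
```

In this sketch the number of clusters is not fixed in advance; it falls out of how many distinct modes the points converge to, which is the property the abstract highlights for distance-based methods of this family.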

Heidelberger Dokumentenserver

LIPIcs, Volume 277, GIScience 2023, Complete Volume

Authors: Beecham Roger, Long Jed A., Smith Dianna, Wise Sarah, Zhao Qunshan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 12th International Conference on Geographic Information Science (GIScience 2023)
Publication date: 01/01/2023

LIPIcs, Volume 277, GIScience 2023, Complete Volume

Dagstuhl Research Online Publication Server

Deep neural networks and data augmentation for semantic labelling in a dialogue corpus

Author: Sandstrom Daparte David Alejandro
Publication date: 16/06/2020

This project studies and applies Deep Neural Network and Data Augmentation techniques for semantic labelling in a dialogue corpus, all within the field of Sentiment Analysis. The main objective is to address a topic classification problem using architectures based on both Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The project notably provides a comparison of the performance of each model. The hyperparameter optimization tools needed to obtain satisfactory results were also developed as part of the project. All of this serves to classify the data in the dataset of the European EMPATHIC project. More information about the EMPATHIC project is available at www.empathic-project.eu. The project report is written in English.
Sentiment analysis, also known as opinion mining, refers to the use of Natural Language Processing (NLP), among other techniques, to extract and analyze subjective information from text, such as emotions or the topic of a text. These techniques are normally applied to reviews or data from social media but, in this project, we apply them to the analysis of coaching dialogues involving senior adults. These dialogues have been collected as part of the EMPATHIC project. EMPATHIC is a European project whose goal is to implement a virtual agent designed to help elderly people live a healthy and independent life as they age [1][2]. Within this implementation, a Natural-language Understanding (NLU) component classifies the utterance (spoken words) of the user into semantic components. This is a machine learning classification problem in which there are multiple classes and a model has to be taught to classify the text into these classes. Currently, the NLU model implementation is based on seq2seq models (a variant of Recurrent Neural Networks (RNN)). However, convolutional neural networks have also been proposed for text classification in different contexts [3][4][5]. The main objective of this project is to address a topic classification problem using Convolutional Neural Network (CNN) based architectures in order to classify the data from the EMPATHIC project's dataset. In addition, we propose and test a number of RNN-based architectures in order to compare the performance of each model.

Archivo Digital para la Docencia y la Investigación

Text generation for small data regimes

Author: Quteineh Husam
Publication date: 26/08/2022

In Natural Language Processing (NLP), applications trained on downstream tasks for text classification usually require enormous amounts of data to perform well. Neural Network (NN) models are among the applications that can always be trained to produce better results. Yet, a huge factor in improving results is the ability to scale over large datasets. Given that Deep NNs are known to be data hungry, having more training samples can always be beneficial. For a classification model to perform well, it could require thousands or even millions of textual training examples. Transfer learning enables us to leverage knowledge gained from general data collections to perform well on target tasks. In NLP, training language models on large data collections has been shown to achieve great results when tuned to different task-specific datasets Wang et al. (2019, 2018a). However, even with transfer learning, adequate training data remains a condition for training machine learning models. Nonetheless, we show that small textual datasets can be augmented to a degree that is enough to achieve improved classification performance. In this thesis, we make multiple contributions to data augmentation. Firstly, we transform the data generation task into an optimization problem which maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and incorporating entropy as one of the optimization criteria. Secondly, we propose a language generation approach for targeted data generation with the participation of the training classifier. With a user in the loop, we find that manual annotation of a small proportion of the generated data is enough to boost classification performance. Thirdly, under a self-learning scheme, we replace the user by an automated approach in which the classifier is trained on its own pseudo-labels. 
Finally, we extend the data generation approach to the knowledge distillation domain by generating samples that a teacher model can label confidently but its student cannot.
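The abstract names entropy as one of the optimization criteria for generated samples but does not say how it is computed or applied. As a minimal sketch, assuming that the criterion is the Shannon entropy of the classifier's predicted class distribution and that more uncertain samples are treated as more useful, candidate texts could be ranked as follows. The function names and the toy probabilities are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    # Zero-probability classes contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_by_uncertainty(candidates):
    """Sort (text, probs) pairs from most to least uncertain prediction."""
    return sorted(candidates, key=lambda item: prediction_entropy(item[1]), reverse=True)


# Hypothetical generated samples with the classifier's predicted probabilities.
candidates = [
    ("sample a", [0.98, 0.02]),
    ("sample b", [0.55, 0.45]),
    ("sample c", [0.80, 0.20]),
]
ranked = [text for text, _ in rank_by_uncertainty(candidates)]
print(ranked)
```

A near-uniform prediction has the highest entropy, so "sample b" comes first here. Whether high or low entropy is preferred in the thesis's search is not stated in the abstract; flipping `reverse` would give the opposite ordering.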

University of Essex Research Repository

Towards Measuring Coherence in Poem Generation

Author: Mohseni Kiasari Peyman
Publication venue: 'University of Waterloo'
Publication date: 09/01/2022

Large language models (LLMs) based on the transformer architecture and trained on massive corpora have gained prominence as text-generative models in the past few years. Even though large language models are very adept at memorizing and generating long sequences of text, their ability to generate truly novel and creative texts, including lines of poetry, is limited. On the other hand, past research has shown that variational autoencoders (VAEs) can generate original poetic lines adhering to the stylistic characteristics of the training corpus. The originality and stylistic adherence of lines generated by VAEs can be partially attributed to two facts: first, VAEs can be successfully trained on small, highly curated corpora in a given style, and second, VAEs with a recurrent neural network architecture have a relatively low memorization capacity compared to transformer networks, which leads to the generation of more creative texts. VAEs, however, are limited to producing short sentence-level texts because they have fewer trainable parameters than LLMs. As a result, VAEs can only generate independent poetic lines, rather than complete and coherent poems. In this thesis, we propose a new model of coherence scoring that allows the system to rank independent lines generated by a VAE and construct a coherent poem. The scoring model is based on BERT, fine-tuned as a coherence evaluator. We propose a novel training schedule for fine-tuning BERT, during which we show the system different types of lines as negative examples: lines sampled from the same vs. different poems. The results of the human evaluation show that participants perceive poems constructed by this method to be more coherent than randomly sampled lines.

University of Waterloo's Institutional Repository

Third International Symposium on Space Mission Operations and Ground Data Systems, part 2

Author: Rash James L.

Under the theme of 'Opportunities in Ground Data Systems for High Efficiency Operations of Space Missions,' the SpaceOps '94 symposium included presentations of more than 150 technical papers spanning five topic areas: Mission Management, Operations, Data Management, System Development, and Systems Engineering. The symposium papers focus on improvements in the efficiency, effectiveness, and quality of data acquisition, ground systems, and mission operations. New technology, methods, and human systems are discussed. Accomplishments are also reported in the application of information systems to improve data retrieval, reporting, and archiving; the management of human factors; the use of telescience and teleoperations; and the design and implementation of logistics support for mission operations. This volume covers expert systems, systems development tools and approaches, and systems engineering issues.

NASA Technical Reports Server

Efficient algorithms and data structures for compressive sensing

Author: Semper Sebastian
Publication date: 01/01/2022

Along with the ever-increasing number of sensors, which are also generating rapidly growing amounts of data, the traditional paradigm of sampling adhering to the Nyquist criterion is facing an equally increasing number of obstacles. The rather recent theory of Compressive Sensing (CS) promises to alleviate some of these drawbacks by proposing to generalize the sampling and reconstruction schemes such that the acquired samples can contain more complex information about the signal than Nyquist samples. The proposed measurement process is more complex and the reconstruction algorithms necessarily need to be nonlinear. Additionally, the hardware design process needs to be revisited in order to account for this new acquisition scheme. Hence, one can identify a trade-off between the information that is contained in individual samples of a signal and the effort spent on developing and operating the sensing system. This thesis addresses the steps necessary to shift this trade-off in favor of CS. We do so by providing new results that make CS easier to deploy in practice while also maintaining the performance indicated by theoretical results. The sparsity order of a signal plays a central role in any CS system. Hence, we present a method to estimate this crucial quantity prior to recovery from a single snapshot.
As we show, the proposed Sparsity Order Estimation method improves the reconstruction error compared to an unguided reconstruction. During the development of the theory we notice that the matrix-free view of the involved linear mappings offers many possibilities to render the reconstruction and modeling stages much more efficient. Hence, we present an open source software architecture to construct these matrix-free representations and showcase its ease of use and performance when used for sparse recovery to detect defects from ultrasound data as well as to estimate scatterers in a radio channel using ultra-wideband impulse responses. For the former of these two applications, we present a complete reconstruction pipeline for ultrasound data that is compressed by means of sub-sampling in the frequency domain. Here, we present the algorithms for the forward model and the reconstruction stage, and we give asymptotic bounds for the number of measurements and the expected reconstruction error. We show that our proposed system allows significant compression levels without substantially deteriorating the imaging quality. For the second application, we develop a sampling scheme to acquire the channel Impulse Response (IR) based on a Random Demodulator that captures enough information in the recorded samples to reliably estimate the IR when exploiting sparsity. Compared to the state of the art, this in turn improves the robustness to the effects of time-variant radar channels while also outperforming state-of-the-art methods based on Nyquist sampling in terms of reconstruction error. In order to circumvent the inherent model mismatch of early grid-based compressive sensing theory, we make use of the Atomic Norm Minimization framework and show how it can be used for the estimation of the signal covariance with R-dimensional parameters from multiple compressive snapshots.
To this end, we derive a variant of the ADMM that can estimate this covariance in a very general setting and we show how to use this for direction finding with realistic antenna geometries. In this context we also present a method based on a stochastic gradient descent iteration scheme to find compression schemes that are well suited for parameter estimation, since the resulting sub-sampling has a uniform effect on the whole parameter space. Finally, we show numerically that the combination of these two approaches yields a well-performing grid-free CS pipeline.
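The abstract refers to a matrix-free view of linear mappings without showing what that looks like. As a minimal sketch of the general idea, and not of the thesis's software architecture, a linear operator can be stored as a pair of functions (forward and adjoint) instead of an explicit matrix. The class name, the first-difference operator, and the test vectors below are assumptions chosen for illustration.

```python
class MatrixFreeOperator:
    """A linear map stored as two callables instead of an explicit matrix."""

    def __init__(self, forward, adjoint):
        self.forward = forward  # computes A x
        self.adjoint = adjoint  # computes A^T y


def diff_forward(x):
    # (D x)[i] = x[i + 1] - x[i]; maps a length-n vector to length n - 1.
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def diff_adjoint(y):
    # (D^T y)[j] = y[j - 1] - y[j], with out-of-range entries treated as zero.
    padded = [0] + list(y) + [0]
    return [padded[j] - padded[j + 1] for j in range(len(y) + 1)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


D = MatrixFreeOperator(diff_forward, diff_adjoint)

# Dot-product test: <D x, y> must equal <x, D^T y> if the adjoint is correct.
x = [1, 4, 9, 16]
y = [1, 2, 3]
lhs = dot(D.forward(x), y)
rhs = dot(x, D.adjoint(y))
print(D.forward(x), D.adjoint(y), lhs, rhs)
```

The operator here costs O(n) time and memory to apply, whereas the equivalent dense matrix would need O(n^2) storage. Iterative recovery algorithms typically need only these two products, which is why a matrix-free representation can make reconstruction cheaper.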

Digitale Bibliothek Thüringen

FullExpression - Emotion Recognition Software

Author: Rocha Ricardo Gomes da
Publication date: 01/01/2019

During human evolution, emotion expression became an important social tool that contributed to the growing complexity of societies. Human-computer interaction is commonly present in our daily life, and the industry is searching for solutions that can analyze human emotions in an attempt to provide better experiences. The purpose of this study was to understand whether software built using the transfer-learning technique on a deep learning model was capable of classifying human emotions through facial expression analysis. A Convolutional Neural Network model was trained and used in a web application, which is available online. Several tools were created to facilitate the software development process, including the training and validation processes, and these are also available online. The data was collected by combining several facial expression emotion databases, such as KDEF_AKDEF, TFEID, Face_Place and jaffe. Software evaluation revealed an accuracy in identifying the correct emotions close to 80%. In addition, a comparison between the software and preliminary data on human performance in recognizing facially expressed emotions suggested that the software performed better. This work can be useful in many different domains such as marketing (to understand the effect of marketing campaigns on people's emotional states), health (to help in the diagnosis of mental illnesses) and industry 4.0 (to create a better collaborative environment between humans and machines).

Repositório Científico do Instituto Politécnico do Porto