Do logarithmic proximity measures outperform plain ones in graph clustering?
We consider a number of graph kernels and proximity measures including
commute time kernel, regularized Laplacian kernel, heat kernel, exponential
diffusion kernel (also called "communicability"), etc., and the corresponding
distances as applied to clustering nodes in random graphs and several
well-known datasets. The model of generating random graphs involves edge
probabilities for the pairs of nodes that belong to the same class or different
predefined classes of nodes. It turns out that in most cases, logarithmic
measures (i.e., measures obtained by taking the logarithm of the proximities)
distinguish the underlying classes better than the "plain" measures. A
comparison in terms of reject curves of inter-class and intra-class
distances confirms this conclusion. A similar conclusion can be made for
several well-known datasets. A possible origin of this effect is that most
kernels have a multiplicative nature, while the nature of distances used in
cluster algorithms is an additive one (cf. the triangle inequality). The
logarithmic transformation is a tool to transform the first nature to the
second one. Moreover, some distances corresponding to the logarithmic measures
possess a meaningful cutpoint additivity property. In our experiments, the
leader is usually the logarithmic Communicability measure. However, we indicate
some more complicated cases in which other measures, typically Communicability
and plain Walk, can be the winners.
Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the
Proceedings of the 6th International Conference on Network Analysis, May 26-28,
2016, Nizhny Novgorod, Russia.
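To make the transformation concrete, here is a minimal sketch (not the authors' code) comparing a plain kernel with its logarithmic counterpart on a toy two-class graph; the communicability kernel and the standard kernel-to-distance transform are used as illustrative choices.

```python
# Minimal sketch: plain vs. logarithmic communicability for node clustering.
# Communicability is the matrix exponential of the adjacency matrix; the
# logarithmic measure applies an elementwise log before converting the
# kernel to a (squared) distance.
import numpy as np
from scipy.linalg import expm

def kernel_to_distance(K):
    """Standard kernel-to-distance transform: d(i,j) = K_ii + K_jj - 2*K_ij."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2 * K

# Toy graph: two 3-node cliques joined by a single bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

K = expm(A)                            # communicability kernel (all entries > 0)
D_plain = kernel_to_distance(K)        # distance from the plain kernel
D_log = kernel_to_distance(np.log(K))  # distance from the logarithmic measure

# Inter-class vs. intra-class separation (a higher ratio means the classes
# are easier to tell apart, in the spirit of the paper's reject curves).
intra = [(0, 1), (3, 4)]
inter = [(0, 5), (1, 4)]
for name, D in [("plain", D_plain), ("log", D_log)]:
    ratio = np.mean([D[i, j] for i, j in inter]) / np.mean([D[i, j] for i, j in intra])
    print(f"{name}: inter/intra distance ratio = {ratio:.2f}")
```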
Embedding-based Scientific Literature Discovery in a Text Editor Application
Each claim in a research paper requires all relevant prior knowledge to be
discovered, assimilated, and appropriately cited. However, despite the
availability of powerful search engines and sophisticated text editing
software, discovering relevant papers and integrating the knowledge into a
manuscript remain complex tasks associated with high cognitive load. Defining
comprehensive search queries requires strong motivation from authors,
irrespective of their familiarity with the research field. Moreover, switching
between independent applications for literature discovery, bibliography
management, reading papers, and writing text burdens authors further and
interrupts their creative process. Here, we present a web application that
combines text editing and literature discovery in an interactive user
interface. The application is equipped with a search engine that couples
Boolean keyword filtering with nearest neighbor search over text embeddings,
providing a discovery experience tuned to an author's manuscript and their
interests. Our application aims to take a step towards more enjoyable and
effortless academic writing.
The demo of the application (https://SciEditorDemo2020.herokuapp.com/) and a
short video tutorial (https://youtu.be/pkdVU60IcRc) are available online.
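A minimal sketch of the two-stage retrieval idea described above, assuming precomputed document embeddings; the function name and design details are illustrative, not the application's actual API.

```python
# Hybrid search sketch: a Boolean keyword filter narrows the candidate set,
# then cosine nearest-neighbor search over text embeddings ranks the
# survivors against an embedding of the author's manuscript or query.
import numpy as np

def hybrid_search(query_keywords, query_embedding, docs, doc_embeddings, top_k=5):
    """docs: list of strings; doc_embeddings: (n_docs, dim) numpy array."""
    # Boolean stage: keep only documents containing all query keywords.
    idx = [i for i, d in enumerate(docs)
           if all(kw.lower() in d.lower() for kw in query_keywords)]
    if not idx:
        return []
    # Embedding stage: rank the filtered documents by cosine similarity.
    E = doc_embeddings[idx]
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = E @ q / np.linalg.norm(E, axis=1)
    order = np.argsort(-sims)[:top_k]
    return [(docs[idx[i]], float(sims[i])) for i in order]
```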
Representation learning for very short texts using weighted word embedding aggregation
Short text messages such as tweets are very noisy and sparse in their use of
vocabulary. Traditional textual representations, such as tf-idf, have
difficulty grasping the semantic meaning of such texts, which is important in
applications such as event detection, opinion mining, news recommendation, etc.
We constructed a method based on semantic word embeddings and frequency
information to arrive at low-dimensional representations for short texts
designed to capture semantic similarity. For this purpose we designed a
weight-based model and a learning procedure based on a novel median-based loss
function. This paper discusses the details of our model and the optimization
methods, together with the experimental results on both Wikipedia and Twitter
data. We find that our method outperforms the baseline approaches in the
experiments, and that it generalizes well on different word embeddings without
retraining. Our method is therefore capable of retaining most of the semantic
information in the text, and is applicable out-of-the-box.
Comment: 8 pages, 3 figures, 2 tables. Appears in Pattern Recognition Letters.
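A minimal sketch of frequency-weighted embedding aggregation; the paper learns its weights via a median-based loss, so the plain idf weighting below is only an illustrative stand-in.

```python
# Short-text representation sketch: a weighted mean of word embeddings,
# with rarer words (higher idf) contributing more to the final vector.
import numpy as np

def embed_short_text(tokens, word_vectors, doc_freq, n_docs):
    """word_vectors: dict word -> vector; doc_freq: dict word -> doc count."""
    vecs, weights = [], []
    for t in tokens:
        if t in word_vectors:
            vecs.append(word_vectors[t])
            # idf weight; the paper instead learns weights end-to-end.
            weights.append(np.log(n_docs / (1 + doc_freq.get(t, 0))))
    if not vecs:
        return None
    w = np.asarray(weights)
    return (np.asarray(vecs) * w[:, None]).sum(axis=0) / w.sum()
```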
Learning semantic sentence representations from visually grounded language without lexical knowledge
Current approaches to learning semantic representations of sentences often
use prior word-level knowledge. This study aims to leverage visual
information in order to capture sentence level semantics without the need for
word embeddings. We use a multimodal sentence encoder trained on a corpus of
images with matching text captions to produce visually grounded sentence
embeddings. Deep Neural Networks are trained to map the two modalities to a
common embedding space such that for an image the corresponding caption can be
retrieved and vice versa. We show that our model achieves results comparable to
the current state-of-the-art on two popular image-caption retrieval benchmark
data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the
resulting sentence embeddings using the data from the Semantic Textual
Similarity benchmark task and show that the multimodal embeddings correlate
well with human semantic similarity judgements. The system achieves
state-of-the-art results on several of these benchmarks, which shows that a
system trained solely on multimodal data, without assuming any word
representations, is able to capture sentence level semantics. Importantly, this
result shows that we do not need prior knowledge of lexical level semantics in
order to model sentence level semantics. These findings demonstrate the
importance of visual information in semantics.
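A minimal sketch of a bidirectional ranking objective of the kind commonly used for image-caption retrieval, written in PyTorch; the paper's exact architecture and loss details may differ.

```python
# Both modalities are projected into a shared space and trained so that
# matching image-caption pairs score higher than mismatched ones, with a
# hinge margin applied in both retrieval directions.
import torch
import torch.nn.functional as F

def ranking_loss(img_emb, cap_emb, margin=0.2):
    """img_emb, cap_emb: (batch, dim); row i of each is a matching pair."""
    img = F.normalize(img_emb, dim=1)
    cap = F.normalize(cap_emb, dim=1)
    scores = img @ cap.t()            # cosine similarity matrix
    pos = scores.diag().unsqueeze(1)  # matching-pair scores
    # Hinge over negatives for image->caption and caption->image retrieval.
    cost_cap = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_cap.sum() + cost_img.sum()
```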
Enhancing scene text recognition with visual context information
This thesis addresses the problem of improving text spotting systems, which aim to detect and recognize text in unrestricted images (e.g. a street sign, an advertisement, a bus destination, etc.). The goal is to improve the performance of off-the-shelf vision systems by exploiting the semantic information derived from the image itself. The rationale is that knowing the content of the image, or the visual context, can help to decide which words are the correct candidate words.
For example, the fact that an image shows a coffee shop makes it more likely that a word on a signboard reads as Dunkin and not unkind.
We address this problem by drawing on successful developments in natural language processing and machine learning, in particular learning to re-rank and neural networks, to present post-processing frameworks that improve state-of-the-art text spotting systems without the need for costly data-driven re-training or tuning procedures.
Discovering the degree of semantic relatedness of candidate words and their image context is a task related to assessing the semantic similarity between words or text fragments. However, semantic relatedness is more general than similarity (e.g. car, road, and traffic light are related but not similar) and requires certain adaptations. To meet the requirements of this broader perspective of semantic similarity, we develop two approaches to learn the semantic relatedness of the spotted word and its environmental context: word-to-word (object) or word-to-sentence (caption). In the word-to-word approach, word-embedding-based re-rankers are developed. The re-ranker takes the words from the text spotting baseline and re-ranks them based on the visual context from the object classifier. For the second approach, an end-to-end neural model is designed to exploit the image description (caption) at the sentence level as well as the word level (objects), and to re-rank candidate words based not only on the visual context but also on their co-occurrence with the caption.
As an additional contribution, to meet the requirements of data-driven approaches such as neural networks, we propose a visual context dataset for this task, in which the publicly available COCO-text dataset [Veit et al. 2016] has been extended with information about the scene (including the objects and places appearing in the image) to enable researchers to include the semantic relations between texts and scene in their text spotting systems, and to offer a common evaluation baseline for such approaches.
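A minimal sketch of the word-to-word re-ranking idea, assuming pretrained word vectors are available; the interpolation weight alpha and the max-similarity relatedness score are illustrative choices, not the thesis's learned model.

```python
# Candidate words from a text-spotting baseline are re-scored by their
# embedding similarity to object labels detected in the same image.
import numpy as np

def rerank(candidates, context_objects, word_vectors, alpha=0.5):
    """candidates: list of (word, spotting_score); returns re-ranked list."""
    def relatedness(word):
        if word not in word_vectors:
            return 0.0
        w = word_vectors[word]
        sims = [
            np.dot(w, word_vectors[o])
            / (np.linalg.norm(w) * np.linalg.norm(word_vectors[o]))
            for o in context_objects if o in word_vectors
        ]
        return max(sims) if sims else 0.0
    # Interpolate the baseline's confidence with the visual-context score.
    rescored = [(w, alpha * s + (1 - alpha) * relatedness(w)) for w, s in candidates]
    return sorted(rescored, key=lambda x: -x[1])

# E.g. with context objects ["coffee", "shop"], the relatedness term should
# push "dunkin" above "unkind" among the spotted candidates.
```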
Addressing Class Imbalance in Multi-Class Image Classification by Means of Auxiliary Feature Space Restrictions
Learning from imbalanced class distributions generally leads to a classifier that is unable to distinguish classes with few training examples from the other classes. In the context of cultural heritage, addressing this problem becomes important when existing digital online collections, consisting of images depicting artifacts and assigned semantic annotations, are to be completed automatically; images with known annotations can be used to train a classifier that predicts missing information, where the training data is often highly imbalanced. In the present paper, combining a classification loss with an auxiliary clustering loss is proposed to improve the classification performance, particularly for underrepresented classes; in addition, different sampling strategies are applied. The proposed auxiliary loss aims to cluster feature vectors with respect to the semantic annotations as well as to visual properties of the images to be classified, and is thus supposed to help the classifier distinguish individual classes. We conduct an ablation study on a dataset consisting of images depicting silk fabrics, accompanied by annotations for different silk-related classification tasks. Experimental results show improvements of up to 10.5% in average F1-score and up to 20.8% in the F1-score averaged over the underrepresented classes in some classification tasks.
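A minimal sketch of the joint objective in PyTorch; a center-loss-style pull toward per-class feature centroids stands in for the paper's clustering term, and the weighting lam is an illustrative hyperparameter.

```python
# Joint loss sketch: cross-entropy for classification plus an auxiliary term
# that pulls each feature vector toward its class centroid, encouraging
# compact, well-separated feature clusters even for small classes.
import torch
import torch.nn.functional as F

def joint_loss(logits, features, labels, centers, lam=0.1):
    """logits: (B, C); features: (B, D); centers: (C, D) learnable centroids."""
    cls_loss = F.cross_entropy(logits, labels)
    # Auxiliary clustering term: squared distance to the class center.
    cluster_loss = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    return cls_loss + lam * cluster_loss
```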
Unsupervised Outlier Detection and Semi-Supervised Learning; CU-CS-976-04