4,406 research outputs found
Relay-Linking Models for Prominence and Obsolescence in Evolving Networks
The rate at which nodes in evolving social networks acquire links (friends,
citations) shows complex temporal dynamics. Preferential attachment and link
copying models, while enabling elegant analysis, only capture rich-gets-richer
effects, not aging and decline. Recent aging models are complex and heavily
parameterized; most involve estimating 1-3 parameters per node. These
parameters are intrinsic: they explain decline in terms of events in the past
of the same node, and do not explain, using the network, where the linking
attention might go instead. We argue that traditional characterizations of
linking dynamics are insufficient to judge the faithfulness of models. We
propose a new temporal sketch of an evolving graph, and introduce several new
characterizations of a network's temporal dynamics. Then we propose a new
family of frugal aging models with no per-node parameters and only two global
parameters. Our model is based on a surprising inversion or undoing of triangle
completion, where an old node relays a citation to a younger follower in its
immediate vicinity. Despite very few parameters, the new family of models shows
remarkably better fit with real data. Before concluding, we analyze temporal
signatures for various research communities yielding further insights into
their comparative dynamics. To facilitate reproducible research, we shall soon
make all the code and the processed dataset available in the public domain.
Learning Multimodal Graph-to-Graph Translation for Molecular Optimization
We view molecular optimization as a graph-to-graph translation problem. The
goal is to learn to map from one molecular graph to another with better
properties based on an available corpus of paired molecules. Since molecules
can be optimized in different ways, there are multiple viable translations for
each input graph. A key challenge is therefore to model diverse translation
outputs. Our primary contributions include a junction tree encoder-decoder for
learning diverse graph translations along with a novel adversarial training
method for aligning distributions of molecules. Diverse output distributions in
our model are explicitly realized by low-dimensional latent vectors that
modulate the translation process. We evaluate our model on multiple molecular
optimization tasks and show that our model outperforms previous
state-of-the-art baselines.
Linked Data Supported Information Retrieval
Search engines have become indispensable for finding content on the World Wide Web. Semantic Web and Linked Data technologies enable content to be structured in a more detailed and unambiguous way and open up entirely new approaches to solving information retrieval problems. This thesis examines how information retrieval applications can benefit from incorporating Linked Data. New methods for computer-assisted semantic text analysis, semantic search, information prioritization, and information visualization are presented and evaluated comprehensively. Linked Data resources and their relationships are integrated into these methods to increase their effectiveness or their usability. First, an introduction to the fundamentals of information retrieval and Linked Data is given. Next, new manual and automated methods are presented for semantically annotating documents by linking them to Linked Data resources (Entity Linking). The methods are evaluated comprehensively, and the underlying evaluation system is extensively improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. The models are based on the generalized vector space model and incorporate semantic similarity, derived from taxonomy-based relationships among the Linked Data resources in documents and search queries, into the computation of the search result ranking. To further refine the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented that aim to improve explorability and navigability within a semantically annotated document corpus.
Two applications are presented for this purpose: a Linked Data based exploratory extension that complements a traditional keyword-based search engine, and a Linked Data based recommender system.
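As a rough illustration of the generalized vector space model mentioned in this abstract, the sketch below scores a query against documents using an entity-to-entity similarity matrix. The function name gvsm_score, the three example entities, and the similarity values in S are assumptions made for illustration only; the abstract does not state how the thesis derives its taxonomy-based similarities or weights its vectors.

```python
import numpy as np

def gvsm_score(q, d, S):
    """Generalized vector space similarity between query vector q and
    document vector d, given a symmetric entity-entity similarity matrix S.
    With S equal to the identity matrix this reduces to cosine similarity."""
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    S = np.asarray(S, dtype=float)
    numerator = q @ S @ d
    denominator = np.sqrt(q @ S @ q) * np.sqrt(d @ S @ d)
    return float(numerator / denominator) if denominator > 0 else 0.0

# Hypothetical entities: index 0 = Berlin, 1 = Hamburg, 2 = Physics.
# Assumed similarities: the two cities share a taxonomy parent, so they
# receive 0.5; Physics is treated as unrelated to both.
S = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

query = [1, 0, 0]        # query annotated with Berlin
doc_hamburg = [0, 1, 0]  # document annotated with Hamburg
doc_physics = [0, 0, 1]  # document annotated with Physics
```

Under these assumed values, a document about a related entity receives a non-zero score even though it shares no entity with the query, which a plain cosine over the same vectors would not give.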
Identification of MIR-Flickr near-duplicate images : a benchmark collection for near-duplicate detection
There are many contexts where the automated detection of near-duplicate images is important, for example the detection of copyright infringement or images of child abuse. There are many published methods for the detection of similar and near-duplicate images; however, it is still uncommon for methods to be objectively compared with each other, probably because of the lack of a good framework in which to do so. Published sets of near-duplicate images exist, but are typically small, specialist, or generated. Here, we give a new test set based on a large, serendipitously selected collection of high quality images. Having observed that the MIR-Flickr 1M image set contains a significant number of near-duplicate images, we have discovered the majority of these. We disclose a set of 1,958 near-duplicate clusters from within the set, and show that this is very likely to contain almost all of the near-duplicate pairs that exist. The main contribution of this publication is the identification of these images, which may then be used by other authors to make comparisons as they see fit. In particular, however, near-duplicate classification functions may now be accurately tested for sensitivity and specificity over a general collection of images.
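To make the sensitivity and specificity test concrete, the following minimal sketch evaluates a pairwise near-duplicate classifier against a set of ground-truth clusters. The function name, the integer image ids, and the toy predictor are hypothetical and are not taken from the benchmark itself; enumerating every pair is only practical for small collections, so a collection of a million images would need sampling or blocking instead.

```python
from itertools import combinations

def sensitivity_specificity(clusters, n_images, predict):
    """Evaluate a pairwise near-duplicate classifier.

    clusters: iterable of sets of image ids (0 .. n_images - 1); every pair
              of ids inside one cluster counts as a true near-duplicate pair.
    predict:  function (a, b) -> bool, the classifier under test.
    """
    truth = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            truth.add((a, b))

    tp = fp = tn = fn = 0
    for a, b in combinations(range(n_images), 2):
        predicted = predict(a, b)
        actual = (a, b) in truth
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy data: six images, two ground-truth clusters, and a deliberately
# naive predictor that flags images with adjacent ids as near-duplicates.
toy_clusters = [{0, 1, 2}, {3, 4}]
sens, spec = sensitivity_specificity(toy_clusters, 6, lambda a, b: abs(a - b) == 1)
```

On this toy input the predictor finds three of the four true pairs and raises two false alarms among the eleven non-duplicate pairs.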
Enhancing the Ranking Context of Dense Retrieval Methods through Reciprocal Nearest Neighbors
Sparse annotation poses persistent challenges to training dense retrieval
models; for example, it distorts the training signal when unlabeled relevant
documents are used spuriously as negatives in contrastive learning. To
alleviate this problem, we introduce evidence-based label smoothing, a novel,
computationally efficient method that prevents penalizing the model for
assigning high relevance to false negatives. To compute the target relevance
distribution over candidate documents within the ranking context of a given
query, we assign a non-zero relevance probability to those candidates most
similar to the ground truth based on the degree of their similarity to the
ground-truth document(s).
To estimate relevance we leverage an improved similarity metric based on
reciprocal nearest neighbors, which can also be used independently to rerank
candidates in post-processing. Through extensive experiments on two large-scale
ad hoc text retrieval datasets, we demonstrate that reciprocal nearest
neighbors can improve the ranking effectiveness of dense retrieval models, both
when used for label smoothing, as well as for reranking. This indicates that by
considering relationships between documents and queries beyond simple geometric
distance we can effectively enhance the ranking context.
Comment: EMNLP 202
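The two ideas in this abstract can be sketched as follows, under loudly stated assumptions. The first function finds reciprocal nearest neighbors (pairs that appear in each other's top-k lists) from a square similarity matrix; the second spreads a small probability mass eps over the reciprocal neighbors of the ground-truth document in proportion to their similarity. The function names, the eps parameter, the proportional weighting, and the use of one square item-to-item matrix are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np

def reciprocal_neighbors(sim, k):
    """Boolean matrix R with R[i, j] True when i is among the k nearest
    neighbors of j and j is among the k nearest neighbors of i."""
    n = sim.shape[0]
    k = min(k, n - 1)                 # never count an item as its own neighbor
    s = sim.astype(float).copy()
    np.fill_diagonal(s, -np.inf)      # exclude self-similarity from the ranking
    topk = np.argsort(-s, axis=1)[:, :k]
    in_topk = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    in_topk[rows, topk.ravel()] = True
    return in_topk & in_topk.T

def smoothed_targets(sim_to_gt, neighbor_mask, gt_index, eps=0.2):
    """Assumed smoothing scheme: the ground truth keeps 1 - eps, and eps is
    shared among its reciprocal neighbors in proportion to similarity."""
    sim_to_gt = np.asarray(sim_to_gt, dtype=float)
    weights = np.where(neighbor_mask, np.clip(sim_to_gt, 0.0, None), 0.0)
    weights[gt_index] = 0.0
    target = np.zeros_like(sim_to_gt)
    total = weights.sum()
    if total == 0.0:                  # no reciprocal neighbors: one-hot target
        target[gt_index] = 1.0
        return target
    target[gt_index] = 1.0 - eps
    target += eps * weights / total
    return target

# Toy similarity matrix over four candidates; candidate 0 is the ground truth.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.3],
    [0.8, 0.2, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
])
R = reciprocal_neighbors(sim, k=1)
targets = smoothed_targets(sim[0], R[0], gt_index=0, eps=0.2)
```

In the toy matrix, candidate 2 ranks candidate 0 first but is not ranked first by it, so only candidates 0 and 1 are reciprocal neighbors; candidate 2 therefore receives no smoothed mass despite its high raw similarity.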
DWIE : an entity-centric dataset for multi-task document-level information extraction
This paper presents DWIE, the 'Deutsche Welle corpus for Information Extraction', a newly created multi-task dataset that combines four main Information Extraction (IE) annotation subtasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document. This contrasts with currently dominant mention-driven approaches that start from the detection and classification of named entity mentions in individual sentences. Further, DWIE presents two main challenges when building and evaluating IE models for it. First, the use of traditional mention-level evaluation metrics for NER and RE tasks on the entity-centric DWIE dataset can result in measurements dominated by predictions on more frequently mentioned entities. We tackle this issue by proposing a new entity-driven metric that takes into account the number of mentions that compose each of the predicted and ground truth entities. Second, the document-level multi-task annotations require the models to transfer information between entity mentions located in different parts of the document, as well as between different tasks, in a joint learning setting. To realize this, we propose to use graph-based neural message passing techniques between document-level mention spans. Our experiments show an improvement of up to 5.5 F1 percentage points when incorporating neural graph propagation into our joint model. This demonstrates DWIE's potential to stimulate further research in graph neural networks for representation learning in multi-task IE. We make DWIE publicly available at https://github.com/klimzaporojets/DWIE