10 research outputs found
A novel approach to task abstraction to make better sense of provenance data
Working Group Report in 'Provenance and Logging for Sense Making', report from Dagstuhl Seminar 18462: Provenance and Logging for Sense Making, Dagstuhl Reports, Volume 8, Issue 11
Minimal dominating sets enumeration with FPT-delay parameterized by the degeneracy and maximum degree
At STOC 2002, Eiter, Gottlob, and Makino presented a technique called ordered
generation that yields an $n^{O(d)}$-delay algorithm listing all minimal
transversals of an $n$-vertex hypergraph of degeneracy $d$. Recently at IWOCA
2019, Conte, Kanté, Marino, and Uno asked whether this XP-delay algorithm
parameterized by $d$ could be made FPT-delay parameterized by $d$ and the
maximum degree $\Delta$, i.e., an algorithm with delay $f(d,\Delta)\cdot n^{O(1)}$
for some computable function $f$. Moreover, as a first step toward
answering that question, they note that the same delay is open for the
intimately related problem of listing all minimal dominating sets in graphs. In
this paper, we answer the latter question in the affirmative.
Comment: 18 pages, 2 figures
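To unpack the terminology for readers outside parameterized enumeration: the two delay regimes contrasted above differ in where the parameter dependence sits. As a standard-definitions aside (paraphrased, not quoted from the paper):

    % XP-delay: the degree of the polynomial grows with the parameter d
    \mathrm{delay} = n^{O(d)}
    % FPT-delay: the parameter dependence is a multiplicative factor only
    \mathrm{delay} = f(d, \Delta) \cdot n^{O(1)} \quad \text{for some computable } f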
Covert Computation in the Abstract Tile-Assembly Model
There have been many advances in molecular computation that offer benefits such as targeted drug delivery, nanoscale mapping, and improved classification of nanoscale organisms. This power led to recent work exploring privacy in computation, specifically, covert computation in self-assembling circuits. Here, we prove several important results related to the concept of a hidden computation in the most well-known model of self-assembly, the Abstract Tile-Assembly Model (aTAM). We show that in 2D, surprisingly, the model is capable of covert computation, but only with an exponential-sized assembly. We also show that the model is capable of covert computation with polynomial-sized assemblies with only one step in the third dimension (just-barely 3D). Finally, we investigate types of functions that can be covertly computed as members of P/Poly.
DSI++: Updating Transformer Memory with New Documents
Differentiable Search Indices (DSIs) encode a corpus of documents in model
parameters and use the same model to answer user queries directly. Despite the
strong performance of DSI models, deploying them in situations where the corpus
changes over time is computationally expensive because reindexing the corpus
requires re-training the model. In this work, we introduce DSI++, a continual
learning challenge for DSI to incrementally index new documents while being
able to answer queries related to both previously and newly indexed documents.
Across different model scales and document identifier representations, we show
that continual indexing of new documents leads to considerable forgetting of
previously indexed documents. We also hypothesize and verify that the model
experiences forgetting events during training, leading to unstable learning. To
mitigate these issues, we investigate two approaches. The first focuses on
modifying the training dynamics. Flatter minima implicitly alleviate
forgetting, so we optimize for flatter loss basins and show that the model
stably memorizes more documents. Next, we introduce a generative
memory to sample pseudo-queries for documents and supplement them during
continual indexing to prevent forgetting for the retrieval task. Extensive
experiments on novel continual indexing benchmarks based on Natural Questions
(NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting
significantly. Concretely, it improves the average Hits@10 over competitive
baselines for NQ and requires several times fewer model updates compared to
re-training the DSI model for incrementally indexing five corpora in a sequence.
Comment: Accepted at EMNLP 2023 main conference
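The generative-memory rehearsal described above can be sketched in a few lines; everything here (the trainer and generator interfaces, the 50/50 mixing ratio) is a hypothetical illustration, not the paper's actual implementation:

    # Sketch: continual indexing with generative-memory rehearsal.
    # model.train_step and query_generator.sample are assumed interfaces.
    import random

    def continual_index(model, new_docs, old_docs, query_generator,
                        steps, batch_size=32):
        """Mix pseudo-query rehearsal for old documents with indexing
        examples for new documents to reduce forgetting."""
        for _ in range(steps):
            batch = []
            for _ in range(batch_size):
                if old_docs and random.random() < 0.5:
                    # Rehearsal: a sampled pseudo-query supervises an old doc id.
                    doc = random.choice(old_docs)
                    batch.append((query_generator.sample(doc.text), doc.id))
                else:
                    # Indexing: the document text maps to its own identifier.
                    doc = random.choice(new_docs)
                    batch.append((doc.text, doc.id))
            model.train_step(batch)  # one gradient update on the mixed batch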
SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification
Extreme classification (XC) involves predicting over large numbers of classes
(thousands to millions), with real-world applications like news article
classification and e-commerce product tagging. The zero-shot version of this
task requires generalization to novel classes without additional supervision.
In this paper, we develop SemSup-XC, a model that achieves state-of-the-art
zero-shot and few-shot performance on three XC datasets derived from legal,
e-commerce, and Wikipedia data. To develop SemSup-XC, we use automatically
collected semantic class descriptions to represent classes and facilitate
generalization through a novel hybrid matching module that matches input
instances to class descriptions using a combination of semantic and lexical
similarity. Trained with contrastive learning, SemSup-XC significantly
outperforms baselines and establishes state-of-the-art performance on all three
datasets considered, gaining up to 12 precision points on zero-shot and more
than 10 precision points on one-shot tests, with similar gains for recall@10.
Our ablation studies highlight the relative importance of our hybrid matching
module and automatically collected class descriptions.
Comment: Published at ICML 2023. V2: camera-ready version at ICML 2023
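As a rough illustration of hybrid matching in general (a generic sketch under assumed interfaces, not SemSup-XC's actual module): score each class description by a weighted sum of a dense semantic similarity and a simple lexical overlap.

    # Sketch: hybrid semantic + lexical matching of an input against class
    # descriptions. The encoder is assumed to return unit-norm vectors.
    import numpy as np

    def lexical_overlap(a: str, b: str) -> float:
        """Jaccard overlap of token sets, a simple lexical signal."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)

    def hybrid_scores(encode, instance, class_descriptions, alpha=0.7):
        """Weighted sum of cosine (semantic) and Jaccard (lexical) scores."""
        q = encode(instance)
        scores = []
        for desc in class_descriptions:
            semantic = float(np.dot(q, encode(desc)))  # cosine for unit norms
            lexical = lexical_overlap(instance, desc)
            scores.append(alpha * semantic + (1 - alpha) * lexical)
        return scores  # rank classes by descending score

Combining the two signals lets the lexical term catch exact term matches that a dense encoder can miss, while the semantic term generalizes to unseen class names.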
Rule mining on extended knowledge graphs
Master's thesis in Informatics (INF399, MAMN-PROG, MAMN-INF)
Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data
Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even as intangible heritage such as biographies of people, performing arts, cultural customs and rites.
The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis.
These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods.
Such computational methods are the focus of Computational and Digital Humanities projects and research.
For the longest time, the digital humanities community has focused on textual corpora, applying text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations.
In recent years, the digital humanities community has started to shift the focus to include other modalities, such as audio-visual data.
In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora.
Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand.
This includes both methods and systems that support classical Close Reading of the material and Distant Reading methods that give an overview of larger collections, as well as methods in between, such as Meso Reading.
Furthermore, machine learning methods are being applied more widely to cultural heritage collections, but they are rarely combined with visualizations that would allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting.
Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest.
However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth.
One form of ground truth is class labels, e.g., labels of the entities depicted in an image collection, assigned to the individual images.
Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details.
Another problem arises when collections are curated in different institutions: a common standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis.
This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data.
First, we define cultural data with regard to heritage and contemporary data, then we look at the state-of-the-art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections.
After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse.
Next, we engage in a more complex case of text reuse, the collation of medieval vernacular text editions.
For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages, supported by labeling of the relations between lines and the relations between words; a rough sketch of the embedding-based alignment idea follows this abstract.
We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks.
With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts and so combine some of the machine learning methods and visualizations that were used for textual data with computer vision methods.
Finally, we reflect on the interdisciplinary projects and the lessons learned, before discussing existing challenges when working with cultural heritage data from the computer science perspective to outline potential research directions for machine learning and visual analytics of cultural heritage data.
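As referenced in the abstract, here is a minimal sketch of how word embeddings can propose line alignments between two text editions for human review; the toy embedding lookup, the greedy matching, and the similarity threshold are illustrative assumptions, not the thesis's actual pipeline:

    # Sketch: propose alignments between lines of two editions by comparing
    # averaged word-embedding vectors; low-confidence pairs are left to a human.
    import numpy as np

    def line_vector(line, embeddings, dim=50):
        """Average the embeddings of known words; zero vector otherwise."""
        vecs = [embeddings[w] for w in line.lower().split() if w in embeddings]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def propose_alignments(edition_a, edition_b, embeddings, threshold=0.6):
        """Greedily pair each line in A with its most similar line in B."""
        proposals = []
        for i, a in enumerate(edition_a):
            va = line_vector(a, embeddings)
            best_j, best_sim = None, threshold
            for j, b in enumerate(edition_b):
                vb = line_vector(b, embeddings)
                denom = np.linalg.norm(va) * np.linalg.norm(vb)
                sim = float(np.dot(va, vb) / denom) if denom else 0.0
                if sim > best_sim:
                    best_j, best_sim = j, sim
            if best_j is not None:
                proposals.append((i, best_j, best_sim))  # for expert review
        return proposals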