Exact solutions to the Erdős-Rothschild problem
Let k := (k1,...,ks) be a sequence of natural numbers. For a graph G, let F(G; k) denote the number of colourings of the edges of G with colours 1,...,s such that, for every c ∈ {1,...,s}, the edges of colour c contain no clique of order kc. Write F(n; k) to denote the maximum of F(G; k) over all graphs G on n vertices. There are currently very few known exact (or asymptotic) results for this problem, posed by Erdős and Rothschild in 1974. We prove some new exact results for n → ∞:
(i) A sufficient condition on k which guarantees that every extremal graph is a complete multipartite graph, which systematically recovers all existing exact results.
(ii) Addressing the original question of Erdős and Rothschild, in the case k = (3,..., 3) of length 7, the unique extremal graph is the complete balanced 8-partite graph, with colourings coming from Hadamard matrices of order 8.
(iii) In the case k = (k + 1, k), for which the sufficient condition in (i) does not hold, for 3 ≤ k ≤ 10, the unique extremal graph is complete k-partite with one part of size less than k and the other parts as equal in size as possible
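The definition of F(G; k) above can be made concrete with a brute-force count on very small graphs. The following is only an illustrative sketch, not the method of the paper: the function names `count_valid_colourings` and `has_mono_clique` are hypothetical, colours are numbered from 0 rather than 1, and the enumeration is exponential in the number of edges, so it is usable only for a handful of vertices.

```python
from itertools import combinations, product


def has_mono_clique(n, colour_of, c, order):
    """Return True if the edges of colour c contain a clique on `order` vertices."""
    for vertices in combinations(range(n), order):
        # combinations() yields sorted tuples, matching the sorted edge keys.
        if all(colour_of.get(pair) == c for pair in combinations(vertices, 2)):
            return True
    return False


def count_valid_colourings(n, edges, k):
    """Brute-force F(G; k) for a graph G on vertices 0..n-1.

    `k` is the sequence (k_1, ..., k_s); colour c (0-based here) must not
    contain a clique of order k[c].
    """
    s = len(k)
    edges = [tuple(sorted(e)) for e in edges]
    count = 0
    for colouring in product(range(s), repeat=len(edges)):
        colour_of = dict(zip(edges, colouring))
        if not any(has_mono_clique(n, colour_of, c, k[c]) for c in range(s)):
            count += 1
    return count


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    # 2^3 colourings minus the 2 monochromatic ones.
    print(count_valid_colourings(3, triangle, (3, 3)))  # prints 6
```

For the 4-cycle with k = (3, 3) every one of the 2^4 colourings is valid, since the graph itself has no triangle; this is the reason complete multipartite graphs are natural candidates for extremality.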
Cross-domain interactions confer stability to benthic biofilms in proglacial streams
Cross-domain interactions are an integral part of the success of biofilms in natural environments but remain poorly understood. Here, we describe cross-domain interactions in stream biofilms draining proglacial floodplains in the Swiss Alps. These streams, as a consequence of the retreat of glaciers, are characterised by multiple environmental gradients and perturbations (e.g., changes in channel geomorphology, discharge) that depend on the time since deglaciation. We evaluate co-occurrence of bacteria and eukaryotic communities along streams and show that key community members have disproportionate effects on the stability of community networks. The topology of the networks, here quantified as the arrangement of the constituent nodes formed by specific taxa, was independent of stream type and their apparent environmental stability. However, network stability against fragmentation was higher in the streams draining proglacial terrain that was more recently deglaciated. We find that bacteria, eukaryotic photoautotrophs, and fungi are central to the stability of these networks, which fragment upon the removal of both pro- and eukaryotic taxa. Key taxa are not always abundant, suggesting an underlying functional component to their contributions. Thus, we show that there is a key role played by individual taxa in determining microbial community stability of glacier-fed streams
Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks
An embedding technique in the study of word-representability of graphs
Word-representable graphs, which are the same as semi-transitively orientable graphs, generalize several fundamental classes of graphs. In this paper we propose a novel approach to study word-representability of graphs using a technique of homomorphisms. As a proof of concept, we apply our method to show word-representability of the simplified graph of overlapping permutations that we introduce in this paper. For another application, we obtain results on word-representability of certain subgraphs of simplified de Bruijn graphs that were introduced recently by Petyuk and studied in the context of word-representability
Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review
Within the scientific community, there is a certain consensus in defining "Big Data" as a global set formed through a complex integration that embraces several dimensions: research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered across different sources, which yields a mix that reflects diverse philosophies, a great diversity of structures, different denominations, etc. Managing them poses great technological and methodological challenges: the discovery and selection of data, their extraction and final processing, preservation, visualization, accessibility, and greater or lesser structuring, among other aspects, which together reveal a huge domain of study at the level of analysis and implementation across different knowledge domains. However, given the availability of data and their possible opening, what problems does opening the data face? This paper presents a literature review of these security aspects
On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model
It is known that, for every , -freeness can be decided by a
generic Monte-Carlo algorithm running in rounds in the
CONGEST model. For , faster Monte-Carlo algorithms do exist,
running in rounds, based on upper bounding the number of
messages to be forwarded, and aborting search sub-routines for which this
number exceeds certain thresholds. We investigate the possible extension of
these threshold-based algorithms, for the detection of larger cycles. We first
show that, for every , there exists an infinite family of graphs
containing a -cycle for which any threshold-based algorithm fails to detect
that cycle. Hence, in particular, neither -freeness nor
-freeness can be decided by threshold-based algorithms. Nevertheless,
we show that -freeness can still be decided by a
threshold-based algorithm, running in rounds,
which is faster than using the generic algorithm, which would run in
rounds. Moreover, we exhibit an
infinite collection of families of cycles such that threshold-based algorithms
can decide -freeness for every in this collection. Comment: to be published in SIROCCO 202
Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative – short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being presented on major e-commerce platforms, home assistants
can also vocalize them. This approach can improve user satisfaction by assisting in making
faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to annotated data
scarcity where only a few datasets are available. Data scarcity is the central theme of our
works, and we propose a number of approaches to alleviate the problem. The thesis consists
of two parts where we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small datasets
leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules – adapters – that are
more robust to overfitting. As we show, despite their size, these modules can be used to
store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and
‘resolution.’ This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000 human-written
summaries, each linked to up to thousands of reviews. This, however, makes it
challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods
LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction
Fully supervised log anomaly detection methods suffer from the heavy burden of
annotating massive unlabeled log data. Recently, many semi-supervised methods
have been proposed to reduce annotation costs with the help of parsed
templates. However, these methods consider each keyword independently, which
disregards the correlation between keywords and the contextual relationships
among log sequences. In this paper, we propose a novel weakly supervised log
anomaly detection framework, named LogLG, to explore the semantic connections
among keywords from sequences. Specifically, we design an end-to-end iterative
process, where the keywords of unlabeled logs are first extracted to construct
a log-event graph. Then, we build a subgraph annotator to generate pseudo
labels for unlabeled log sequences. To ameliorate the annotation quality, we
adopt a self-supervised task to pre-train a subgraph annotator. After that, a
detection model is trained with the generated pseudo labels. Conditioned on the
classification results, we re-extract the keywords from the log sequences and
update the log-event graph for the next iteration. Experiments on five
benchmarks validate the effectiveness of LogLG for detecting anomalies on
unlabeled log data and demonstrate that LogLG, as the state-of-the-art weakly
supervised method, achieves significant performance improvements compared to
existing methods. Comment: 12 page