MARIO: Model Agnostic Recipe for Improving OOD Generalization of Graph Contrastive Learning
In this work, we investigate the problem of out-of-distribution (OOD)
generalization for unsupervised learning methods on graph data. This scenario
is particularly challenging because graph neural networks (GNNs) have been
shown to be sensitive to distributional shifts, even when labels are available.
To address this challenge, we propose a Model-Agnostic
Recipe for Improving OOD generalizability
of unsupervised graph contrastive learning methods, which we refer to as MARIO.
MARIO introduces two principles aimed at developing distributional-shift-robust
graph contrastive methods to overcome the limitations of existing frameworks:
(i) Information Bottleneck (IB) principle for achieving generalizable
representations and (ii) Invariant principle that incorporates adversarial data
augmentation to obtain invariant representations. To the best of our knowledge,
this is the first work that investigates the OOD generalization problem of
graph contrastive learning, with a specific focus on node-level tasks. Through
extensive experiments, we demonstrate that our method achieves state-of-the-art
performance on the OOD test set, while maintaining comparable performance on
the in-distribution test set when compared to existing approaches. The source
code for our method can be found at: https://github.com/ZhuYun97/MARIO
Comment: 20 pages, 15 figures
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Recent advancements in reference-free learned metrics for open-domain
dialogue evaluation have been driven by the progress in pre-trained language
models and the availability of dialogue data with high-quality human
annotations. However, current studies predominantly concentrate on English
dialogues, and the generalization of these metrics to other languages has not
been fully examined. This is largely due to the absence of a multilingual
dialogue evaluation benchmark. To address the issue, we introduce xDial-Eval,
built on top of open-source English dialogue evaluation datasets. xDial-Eval
includes 12 turn-level and 6 dialogue-level English datasets, comprising 14930
annotated turns and 8691 annotated dialogues respectively. The English dialogue
data are extended to nine other languages with commercial machine translation
systems. On xDial-Eval, we conduct comprehensive analyses of previous
BERT-based metrics and the recently-emerged large language models. Lastly, we
establish strong self-supervised and multilingual baselines. In terms of
average Pearson correlations over all datasets and languages, the best baseline
outperforms OpenAI's ChatGPT by absolute improvements of 6.5% and 4.6% at the
turn and dialogue levels respectively, albeit with far fewer parameters. The
data and code are publicly available at https://github.com/e0397123/xDial-Eval.
Comment: Accepted to EMNLP-2023 Findings
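The headline comparison above is an average of Pearson correlations between metric scores and human ratings, taken over all datasets and languages. A minimal sketch of that arithmetic follows, assuming a hypothetical `results` mapping from (dataset, language) pairs to score lists; the benchmark's own evaluation scripts may aggregate differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def average_correlation(results):
    """Unweighted mean of per-(dataset, language) Pearson correlations.

    `results` is a hypothetical structure used only for illustration:
    {(dataset, language): (metric_scores, human_scores)}.
    """
    correlations = [pearson(metric, human) for metric, human in results.values()]
    return sum(correlations) / len(correlations)
```

An unweighted mean treats every dataset-language pair equally regardless of size; weighting by the number of annotated items would be an alternative design choice.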
The Spatial and Temporal Dynamics of Rabies in China
Rabies is a major problem in developing countries and is responsible for more than 55,000 deaths annually. More than half of the cases occur in Asia, and China has the second-highest incidence of rabies after India. Human rabies cases in China decreased during the early 1990s, but the virus began to re-emerge in the latter half of the decade and spread rapidly across the country with a corresponding increase in cases. To learn more about the epidemic, in 2006 the government implemented a trial surveillance program to sample and screen canine populations in locations where human cases were reported. In this work, we selected a subset of samples (representative of the entire epidemic region) for sequencing, investigated the history and origin of the virus in China, and examined its variation from a geographical perspective. Our results indicate that the epidemic is primarily composed of a younger strain, whose geographical dispersion is consistent with the recorded spread of the virus, and a second, older strain that corresponds to a previous epidemic. This second group exhibits a different geographical pattern, and it appears that this strain remained at low levels throughout the country and was able to re-emerge as the epidemic took hold.
Comparative genomics of Toll-like receptor signalling in five species
Background: Over the last decade, several studies have identified quantitative trait loci (QTL) affecting variation of immune related traits in mammals. Recent studies in humans and mice suggest that part of this variation may be caused by polymorphisms in genes involved in Toll-like receptor (TLR) signalling. In this project, we used a comparative approach to investigate the importance of TLR-related genes in comparison with other immunologically relevant genes for resistance traits in five species by associating their genomic location with previously published immune-related QTL regions.
Results: We report the genomic localisation of TLR1-10 and ten associated signalling molecules in sheep and pig using in-silico and/or radiation hybrid (RH) mapping techniques and compare their positions with their annotated homologues in the human, cattle and mouse whole genome sequences. We also report medium-density RH maps for porcine chromosomes 8 and 13. A comparative analysis of the positions of previously published relevant QTLs allowed the identification of homologous regions that are associated with similar health traits in several species and which contain TLR related and other immunologically relevant genes. Additional evidence was gathered by examining relevant gene expression and association studies.
Conclusion: This comparative genomic approach identified eight genes as potentially causative genes for variations of health related traits. These include susceptibility to clinical mastitis in dairy cattle, general disease resistance in sheep, cattle, humans and mice, and tolerance to protozoan infection in cattle and mice. Four TLR-related genes (TLR1, TLR6, MyD88, IRF3) appear to be the most likely candidate genes underlying QTL regions which control the resistance to the same or similar pathogens in several species. Further studies are required to investigate the potential role of polymorphisms within these genes.
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary objective of the
current paper is to summarize the results and lessons learned based on the
twelve sub-systems and their fusion submitted to SRE'18. It is also our
intention to present a shared view on the advancements, progress, and major
paradigm shifts that we have witnessed as SRE participants in the past decade
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representation to deep speaker embedding, and a switch
of research challenge from channel compensation to domain adaptation.
Comment: 5 pages
A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms
We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the variant sites are true SNPs, and at least 70% are common SNPs that segregate in many domestic breeds. Mean nucleotide diversity is about five SNPs per kilobase for almost every possible comparison between red jungle fowl and domestic lines, between two different domestic lines, and within domestic lines, in contrast to the notion that domestic animals are highly inbred relative to their wild ancestors. In fact, most of the SNPs originated before domestication, and there is little evidence of selective sweeps for adaptive alleles on length scales greater than 100 kilobases.
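The "SNPs per kilobase" figure quoted above is a density: differing sites divided by compared sites, scaled to 1,000 bases. The sketch below shows only that arithmetic for two already-aligned sequences; the function name and the choice to skip non-ACGT positions are assumptions for illustration, and the study itself relied on read alignment and SNP calling that this does not reproduce.

```python
def snps_per_kilobase(seq_a, seq_b):
    """Mismatch density between two aligned sequences, per 1,000 compared bases.

    Positions where either sequence has a character other than A, C, G or T
    (for example N or a gap) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1000.0 * mismatches / compared
```

At the density reported in the abstract, a 10,000-base comparison would contain roughly 50 mismatching sites.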