Social Inclusion and Foreigner Support in the Post-COVID-19 Era: An Interview-based Survey of Highly Educated Foreigners
departmental bulletin paper
Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools
This paper introduces a high-throughput software tool framework called
sam2bam that enables users to significantly speed up pre-processing for
next-generation sequencing (NGS) data. sam2bam is especially efficient on
single-node, multi-core, large-memory systems. On a single-node system, it can
reduce the runtime of the duplicate-marking step of data pre-processing by
156-186x compared with de facto standard tools. sam2bam consists of parallel
software components that can fully utilize multiple processors, available
memory, high-bandwidth storage, and hardware compression accelerators when
available.
As a basic feature, sam2bam converts between well-known genome file formats,
from SAM to BAM. Additional features, such as analyzing, filtering, and
converting the input data, are provided by plug-in tools (e.g., duplicate
marking) that can be attached to sam2bam at runtime.
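To make the plug-in idea concrete, here is a minimal Python sketch of a duplicate-marking step applied to an in-memory batch of reads. Everything in it is an assumption for illustration: the `Read` record, the `MarkDuplicatesPlugin` class and its `process` method, and `run_pipeline` are hypothetical names, not sam2bam's actual plug-in interface. It also simplifies the usual duplicate criterion by keying reads only on reference name, leftmost position, and strand, whereas standard tools typically use the unclipped 5' end and handle read pairs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    name: str
    chrom: str
    pos: int          # 1-based leftmost position (SAM POS field)
    reverse: bool     # strand, i.e. SAM FLAG bit 0x10
    qual_sum: int     # sum of base qualities, used to pick the representative
    duplicate: bool = False


class MarkDuplicatesPlugin:
    """Hypothetical plug-in: marks single-end duplicates in a batch of reads."""

    def process(self, reads: List[Read]) -> List[Read]:
        # Best-scoring read seen so far for each (chrom, pos, strand) key.
        best: Dict[Tuple[str, int, bool], Read] = {}
        for read in reads:
            key = (read.chrom, read.pos, read.reverse)
            kept = best.get(key)
            if kept is None:
                best[key] = read
            elif read.qual_sum > kept.qual_sum:
                # New read is better: demote the previously kept one.
                kept.duplicate = True
                best[key] = read
            else:
                read.duplicate = True
        return reads


def run_pipeline(reads: List[Read], plugins) -> List[Read]:
    # Hypothetical host loop: each attached plug-in transforms the batch in turn.
    for plugin in plugins:
        reads = plugin.process(reads)
    return reads
```

Keeping the read with the highest base-quality sum mirrors a common heuristic; in this sketch, ties keep whichever read was seen first.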
We demonstrated that sam2bam could significantly reduce the runtime of NGS
data pre-processing from about two hours to about one minute for a whole-exome
data set on a 16-core single-node system using up to 130 GB of memory. For
whole-genome sequencing data, sam2bam could reduce the runtime from about 20
hours to about nine minutes on the same system, using up to 711 GB of memory.
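The end-to-end figures above imply rough speedup ratios, computed below. They are approximate because the abstract reports "about" two hours, one minute, and so on, and they need not match the 156-186x quoted for the duplicate-marking step alone.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    # Ratio of old runtime to new runtime.
    return before_minutes / after_minutes


exome = speedup(2 * 60, 1)     # about two hours -> about one minute
genome = speedup(20 * 60, 9)   # about 20 hours -> about nine minutes
print(f"whole exome: ~{exome:.0f}x, whole genome: ~{genome:.0f}x")
# prints "whole exome: ~120x, whole genome: ~133x"
```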
Personalized Dialogue Generation with Diversified Traits
Endowing a dialogue system with particular personality traits is essential to
deliver more human-like conversations. However, due to the challenge of
embodying personality via language expression and the lack of large-scale
persona-labeled dialogue data, this research problem is still far from
well-studied. In this paper, we investigate the problem of incorporating
explicit personality traits in dialogue generation to deliver personalized
dialogues.
To this end, we first construct PersonalDialog, a large-scale multi-turn
dialogue dataset containing various traits from a large number of speakers. The
dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers.
Each utterance is associated with a speaker who is marked with traits like Age,
Gender, Location, Interest Tags, etc. Several anonymization schemes are
designed to protect the privacy of each speaker. This large-scale dataset will
facilitate not only the study of personalized dialogue generation, but also
other research in sociolinguistics and social science.
Secondly, to study how personality traits can be captured and addressed in
dialogue generation, we propose persona-aware dialogue generation models within
the sequence-to-sequence learning framework. Explicit personality traits
(structured by key-value pairs) are embedded using a trait fusion module.
During the decoding process, two techniques, namely persona-aware attention and
persona-aware bias, are devised to capture and address trait-related
information. Experiments demonstrate that our model is able to address proper
traits in different contexts. Case studies also show interesting results for
this challenging research problem.
Comment: Please contact [zhengyinhe1 at 163 dot com] for the PersonalDialog
dataset
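The trait fusion and persona-aware bias ideas can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's formulation: trait fusion is taken to be the element-wise mean of per-trait embeddings, and persona-aware bias is taken to be a gated mixture of the decoder's word distribution with a persona-conditioned one, where the gate is passed in as a fixed scalar rather than predicted from the decoder state. Persona-aware attention is not shown, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_traits(trait_vectors: Dict[str, List[float]]) -> List[float]:
    # Hypothetical fusion: element-wise mean of the per-trait embeddings
    # (e.g. one vector each for Age, Gender, Location).
    vectors = list(trait_vectors.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def persona_logits(persona: List[float], output_embeddings: List[List[float]]) -> List[float]:
    # Score every vocabulary word by its dot product with the fused persona vector.
    return [sum(p * w for p, w in zip(persona, row)) for row in output_embeddings]


def persona_aware_bias(decoder_logits: List[float], persona_scores: List[float], gate: float) -> List[float]:
    # Mix the decoder's word distribution with the persona-conditioned one;
    # gate in [0, 1] controls how strongly the persona biases the next word.
    p_dec = softmax(decoder_logits)
    p_per = softmax(persona_scores)
    return [(1.0 - gate) * a + gate * b for a, b in zip(p_dec, p_per)]
```

Because both inputs to the mixture are probability distributions and the gate lies in [0, 1], the result is again a valid distribution over the vocabulary; a gate of 0 recovers the plain decoder output.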
L vector spaces and L fields
We construct in ZFC an L topological vector space -- a topological vector
space that is an L space -- and an L field -- a topological field that is an L
space. This generalizes results in [5] and [8].
Comment: It has been accepted for publication in SCIENCE CHINA Mathematics
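For reference, the standard definitions behind the title are collected below, under the usual convention that an L space is regular (some authors additionally require Hausdorff); the paper's precise conventions may differ.

```latex
\begin{align*}
X \text{ is an L space} &\iff X \text{ is regular, hereditarily Lindel\"of, and not hereditarily separable},\\
(V,\tau) \text{ is a TVS over } K &\iff {+}\colon V\times V\to V \text{ and } {\cdot}\colon K\times V\to V \text{ are continuous},\\
(F,\tau) \text{ is a topological field} &\iff {+},\ {\cdot}\colon F\times F\to F,\ x\mapsto -x, \text{ and } x\mapsto x^{-1}\ (x\neq 0) \text{ are continuous}.
\end{align*}
```

An L topological vector space is then a TVS whose topology is an L space, and an L field is a topological field whose topology is an L space, exactly as the abstract states.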
MA does not imply
We construct a model in which MA(S)[S] holds and
fails. This shows that MA(S)[S] does not imply and
answers an old question of Larson and Todorcevic in [3]. We also investigate
different strong colorings in models of MA(S)[S].
The Difference Between Event Marketing and Activity Marketing
In today's commodity economy, companies have long regarded event marketing and activity marketing as golden opportunities in marketing management, and many enterprises have therefore built both into their marketing strategies. In recent years, however, scholars and companies alike have tended to conflate the two concepts, holding that event marketing is activity marketing and vice versa, with no essential difference between them apart from their names. Few realize that this view mixes up the distinct attributes of the two. Such a conflation not only hinders the achievement of intended marketing goals but also makes it harder to manage the different business opportunities that event marketing and activity marketing each present.
Out-of-domain Detection for Natural Language Understanding in Dialog Systems
Natural Language Understanding (NLU) is a vital component of dialogue
systems, and its ability to detect Out-of-Domain (OOD) inputs is critical in
practical applications, since the acceptance of the OOD input that is
unsupported by the current system may lead to catastrophic failure. However,
most existing OOD detection methods rely heavily on manually labeled OOD
samples and cannot take full advantage of unlabeled data. This limits the
feasibility of these models in practical applications.
In this paper, we propose a novel model that generates high-quality pseudo OOD
samples akin to In-Domain (IND) input utterances, thereby improving the
performance of OOD detection. To this end, an autoencoder is trained to map
an input utterance into a latent code, and the codes of IND and OOD samples are
trained to be indistinguishable by utilizing a generative adversarial network.
To provide more supervision signals, an auxiliary classifier is introduced to
regularize the generated OOD samples to have indistinguishable intent labels.
Experiments show that these pseudo OOD samples generated by our model can be
used to effectively improve OOD detection in NLU. We also demonstrate
that the effectiveness of these pseudo OOD data can be further improved by
efficiently utilizing unlabeled data.
Comment: Accepted by TALS
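The auxiliary classifier's role, pushing generated OOD samples toward indistinguishable intent labels, can be illustrated with a regularizer that penalizes confident intent predictions. This is a minimal sketch assuming the constraint is expressed as the KL divergence from the uniform distribution over intent labels to the classifier's predicted distribution; the abstract does not state the exact loss, and the function names are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def uniform_kl_loss(logits: List[float]) -> float:
    """Assumed regularizer KL(U || p): zero iff the predicted intent distribution is uniform."""
    k = len(logits)
    log_p = log_softmax(logits)
    return sum((1.0 / k) * (math.log(1.0 / k) - lp) for lp in log_p)


def batch_loss(batch_logits: List[List[float]]) -> float:
    # Average the per-sample penalty over a batch of generated pseudo OOD samples.
    return sum(uniform_kl_loss(l) for l in batch_logits) / len(batch_logits)
```

Minimizing this term with respect to the generator encourages pseudo OOD samples that the intent classifier cannot assign to any single IND intent.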
Abstracts: Challenges of Comprehensive Disaster Assistance for Foreigners Based on Recent Disaster Experience
departmental bulletin paper
- …