Study on Evaluation and Influence Factors of Floating Population Social Integration in Changsha-Zhuzhou-Xiangtan
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. Metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data has been proposed.
However, each existing survey offers only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing the time and storage complexity analysis
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate index performance during search because
differences in similarity query processing are hard to see from complexity
analysis, and because query performance depends on pruning and validation
abilities that are related to the data distribution. This article aims at revealing
the strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and at directing future research on metric indexes.
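As an illustration of the pruning and validation ideas mentioned in this abstract, here is a minimal sketch of a pivot-based range query that relies only on the triangle inequality. It is not taken from the surveyed article; the function names `build_pivot_table` and `range_query` are hypothetical, and a real index would add partitioning and disk-aware storage on top of this.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_pivot_table(
    data: Sequence[T], pivots: Sequence[T], dist: Callable[[T, T], float]
) -> List[List[float]]:
    """Precompute d(object, pivot) for every object and every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in data]


def range_query(
    query: T,
    radius: float,
    data: Sequence[T],
    pivots: Sequence[T],
    table: List[List[float]],
    dist: Callable[[T, T], float],
) -> Tuple[List[T], int]:
    """Return all objects within `radius` of `query`, plus the number of
    object-to-query distance computations that could not be avoided.

    Assumes `dist` is a metric and `pivots` is non-empty.
    """
    q_to_pivots = [dist(query, p) for p in pivots]
    results: List[T] = []
    computed = 0
    for obj, row in zip(data, table):
        # Pruning: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p,
        # so a lower bound above the radius excludes the object.
        lower = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
        if lower > radius:
            continue
        # Validation: d(q,o) <= d(q,p) + d(o,p) for every pivot p,
        # so an upper bound within the radius accepts the object.
        upper = min(qp + op for qp, op in zip(q_to_pivots, row))
        if upper <= radius:
            results.append(obj)
            continue
        # Neither bound decides: fall back to the real distance.
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed
```

Because both bounds follow from the triangle inequality, the sketch returns exactly the same answer as a linear scan; how many distance computations it saves depends on the pivots and on the data distribution.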
A hybrid Bayesian hierarchical model combining cohort and case–control studies for meta-analysis of diagnostic tests: Accounting for partial verification bias
To account for between-study heterogeneity in meta-analysis of diagnostic accuracy studies, bivariate random effects models have been recommended to jointly model the sensitivities and specificities. As study design and population vary, the definition of disease status or severity could differ across studies. Consequently, sensitivity and specificity may be correlated with disease prevalence. To account for this dependence, a trivariate random effects model has been proposed. However, the proposed approach can only include cohort studies that provide information for estimating study-specific disease prevalence. In addition, some diagnostic accuracy studies select only a subset of samples to be verified by the reference test. It is known that ignoring unverified subjects may lead to partial verification bias in the estimation of prevalence, sensitivities and specificities in a single study. However, the impact of this bias on a meta-analysis has not been investigated. In this paper, we propose a novel hybrid Bayesian hierarchical model that combines cohort and case-control studies and corrects partial verification bias at the same time. We investigate the performance of the proposed methods through a set of simulation studies. Two case studies are presented: one assessing the diagnostic accuracy of gadolinium-enhanced magnetic resonance imaging in detecting lymph node metastases, and one assessing adrenal fluorine-18 fluorodeoxyglucose positron emission tomography in characterizing adrenal masses.
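To make the single-study partial verification bias mentioned above concrete, the following sketch contrasts naive estimates (verified subjects only) with the classical Begg-Greenes correction. This is not the hybrid Bayesian hierarchical model proposed in the paper; it assumes that the decision to verify depends only on the index test result, and the function names and counts are hypothetical.

```python
from typing import Tuple


def naive_accuracy(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Sensitivity and specificity from verified subjects only.

    a: test+ and diseased,  b: test+ and non-diseased,
    c: test- and diseased,  d: test- and non-diseased.
    """
    return a / (a + c), d / (b + d)


def begg_greenes(
    n_pos: int, n_neg: int, a: int, b: int, c: int, d: int
) -> Tuple[float, float, float]:
    """Corrected sensitivity, specificity and prevalence.

    n_pos / n_neg are the numbers of ALL test-positive / test-negative
    subjects, verified or not. Assumes verification depends only on the
    test result, so P(disease | test result) is estimated without bias
    from the verified subset.
    """
    p_dis_given_pos = a / (a + b)
    p_dis_given_neg = c / (c + d)
    # Expected diseased and non-diseased counts in the full sample.
    dis_pos = n_pos * p_dis_given_pos
    dis_neg = n_neg * p_dis_given_neg
    healthy_pos = n_pos - dis_pos
    healthy_neg = n_neg - dis_neg
    sensitivity = dis_pos / (dis_pos + dis_neg)
    specificity = healthy_neg / (healthy_neg + healthy_pos)
    prevalence = (dis_pos + dis_neg) / (n_pos + n_neg)
    return sensitivity, specificity, prevalence
```

For example, with 100 test-positive subjects all verified (80 diseased, 20 not) and 900 test-negative subjects of whom only 90 are verified (9 diseased, 81 not), the naive sensitivity is about 0.90, while the corrected sensitivity is about 0.47.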
MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control
Personalized dialogue systems aim to endow the chatbot agent with more
anthropomorphic traits for human-like interactions. Previous approaches have
explored explicit user profile modeling using text descriptions, implicit
derivation of user embeddings, or handcrafted prompts for ChatGPT-like
models. However, textual personas are limited in describing multi-faceted
attributes (e.g., language style, inner character nuances),
implicit embedding suffers from personality sparsity, and handcrafted prompts
lack fine-grained and stable controllability. Hence, these approaches may
struggle with complex personalized dialogue generation tasks that require
generating controllable responses with multiple personal attributes. To this
end, we propose MIRACLE, a novel personalized dialogue
generation method through MultIple PeRsonal Attributes Control within
Latent-Space Energy-based Models. Specifically, our approach
first disentangles complex personality into multi-faceted attributes.
Subsequently, we employ a conditional variational auto-encoder to align with
the dense personalized responses within a latent joint attribute space. We have
also tailored a dedicated energy function and customized the ordinary
differential equations sampling method to offer flexible attribute composition
and precise attribute control. Extensive experiments demonstrate that
MIRACLE outperforms several strong baselines in terms of personality
controllability and response generation quality. Our dataset and code are
available at https://github.com/LZY-the-boys/MIRACLE
Comment: Accepted by EMNLP 2023 Findings
Efficient Document-level Event Extraction via Pseudo-Trigger-aware Pruned Complete Graph
Most previous studies of document-level event extraction focus on
building argument chains in an autoregressive way, which achieves some
success but is inefficient in both training and inference. In contrast to the
previous studies, we propose a fast and lightweight model named PTPCG. In
our model, we design a novel strategy for event argument combination together
with a non-autoregressive decoding algorithm via pruned complete graphs, which
are constructed under the guidance of the automatically selected pseudo
triggers. Compared to the previous systems, our system achieves competitive
results with 19.8% of the parameters and much lower resource consumption, taking
only 3.8% of the GPU hours for training and running up to 8.5 times faster at
inference. Besides, our model shows superior compatibility with datasets with (or
without) triggers, and the pseudo triggers can supplement annotated
triggers to make further improvements. Code is available at
https://github.com/Spico197/DocEE
Comment: Accepted to IJCAI'202