83 research outputs found
Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data
The probabilistic threshold query is one of the most common queries in
uncertain databases, where a result satisfying the query must be also with
probability meeting the threshold requirement. In this paper, we investigate
probabilistic threshold keyword queries (PrTKQ) over XML data, which is not
studied before. We first introduce the notion of quasi-SLCA and use it to
represent results for a PrTKQ with the consideration of possible world
semantics. Then we design a probabilistic inverted (PI) index that can be used
to quickly return the qualified answers and filter out the unqualified ones
based on our proposed lower/upper bounds. After that, we propose two efficient
and comparable algorithms: Baseline Algorithm and PI index-based Algorithm. To
accelerate the performance of algorithms, we also utilize probability density
function. An empirical study using real and synthetic data sets has verified
the effectiveness and the efficiency of our approaches
Efficient Truss Maintenance in Evolving Networks
Truss was proposed to study social network data represented by graphs. A
k-truss of a graph is a cohesive subgraph, in which each edge is contained in
at least k-2 triangles within the subgraph. While truss has been demonstrated
as superior to model the close relationship in social networks and efficient
algorithms for finding trusses have been extensively studied, very little
attention has been paid to truss maintenance. However, most social networks are
evolving networks. It may be infeasible to recompute trusses from scratch from
time to time in order to find the up-to-date -trusses in the evolving
networks. In this paper, we discuss how to maintain trusses in a graph with
dynamic updates. We first discuss a set of properties on maintaining trusses,
then propose algorithms on maintaining trusses on edge deletions and
insertions, finally, we discuss truss index maintenance. We test the proposed
techniques on real datasets. The experiment results show the promise of our
work
CHIEF : clustering With higher-order motifs in big networks
Clustering network vertices is an enabler of various applications such as social computing and Internet of Things. However, challenges arise for clustering when networks increase in scale. This paper proposes CHIEF (Clustering with HIgher-ordEr motiFs), a solution which consists of two motif clustering techniques: standard acceleration CHIEF-ST and approximate acceleration CHIEF-AP. Both algorithms firstly find the maximal -edge-connected subgraphs within the target networks to lower the network scale by optimizing the network structure with maximal -edge-connected subgraphs, and then use heterogeneous four-node motifs clustering in higher-order dense networks. For CHIEF-ST, we illustrate that all target motifs will be kept after this procedure when the minimum node degree of the target motif is equal or greater than . For CHIEF-AP, we prove that the eigenvalues of the adjacency matrix and the Laplacian matrix are relatively stable after this step. CHIEF offers an improved efficiency of motif clustering for big networks, and it verifies higher-order motif significance. Experiments on real and synthetic networks demonstrate that the proposed solutions outperform baseline approaches in large network analysis, and higher-order motifs outperform traditional triangle motifs in clustering. © 2022 IEEE Computer Society. All rights reserved
Multi-tissue integrative analysis of personal epigenomes
Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements and transcriptional activity across more than 25 tissues from these donors. Integrative analysis revealed over a million variants with allele-specific activity, coordinated, locus-scale allelic imbalances, and structural variants impacting proximal chromatin structure. We relate the personal genome analysis to the ENCODE encyclopedia, annotating allele- and tissue-specific elements that are strongly enriched for variants impacting expression and disease phenotypes. These experimental and statistical approaches, and the corresponding EN-TEx resource, provide a framework for personalized functional genomics
Context-based diversification for keyword queries over XML data
While keyword query empowers ordinary users to search vast amount of data, the ambiguity of keyword query makes it difficult to effectively answer keyword queries, especially for short and vague keyword queries. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. Given a short and vague keyword query and XML data to be searched, we first derive keyword search candidates of the query by a simple feature selection model. And then, we design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates are most relevant to the given query while they have to cover maximal number of distinct results. At last, a comprehensive evaluation on real and synthetic data sets demonstrates the effectiveness of our proposed diversification model and the efficiency of our algorithms
- …