152 research outputs found
ASAP : towards accurate, stable and accelerative penetrating-rank estimation on large graphs
Pervasive web applications increasingly require a measure of similarity among objects. Penetrating-Rank (P-Rank) is one of the promising link-based similarity metrics, as it jointly encodes both incoming and outgoing links into the computation for emerging applications. In this paper, we investigate the P-Rank efficiency problem, which encompasses its accuracy, stability and computational time. (1) We provide an accuracy estimate for iteratively computing P-Rank. A symmetric problem is to find the iteration number K needed to achieve a given accuracy ε. (2) We also analyze the stability of P-Rank, showing that small choices of the damping factors make P-Rank more stable and well-conditioned. (3) For undirected graphs, we explicitly characterize the P-Rank solution in terms of matrices. This results in a novel non-iterative algorithm, termed ASAP, for efficiently computing P-Rank, which improves the CPU time from O(n^4) to O(n^3). Using real and synthetic data, we empirically verify the effectiveness and efficiency of our approaches.
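The iterative computation mentioned in point (1) can be illustrated with a minimal Python sketch. It assumes the commonly cited P-Rank recurrence, in which the score of a pair is a weighted mix of the average similarity of its in-neighbours and of its out-neighbours. The names p_rank, lam, c_in, c_out and tol are illustrative, and the stopping rule (largest change between successive iterates) is a stand-in for the paper's accuracy estimate, which the abstract does not state. This is the plain iteration, not the ASAP algorithm.

```python
import numpy as np

def _col_normalise(m):
    # Divide each column by its sum; all-zero columns stay zero.
    col_sums = m.sum(axis=0)
    return np.divide(m, col_sums, out=np.zeros_like(m), where=col_sums > 0)

def p_rank(adj, lam=0.5, c_in=0.8, c_out=0.6, tol=1e-4, max_iter=100):
    """Naive P-Rank iteration; adj[i, j] = 1 means an edge i -> j.

    Returns the similarity matrix and the number of iterations used.
    All parameter names and defaults are assumptions for illustration.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    w_in = _col_normalise(adj)     # w_in[i, a] = 1/|I(a)| if i -> a
    w_out = _col_normalise(adj.T)  # w_out[j, a] = 1/|O(a)| if a -> j
    s = np.eye(n)
    for k in range(1, max_iter + 1):
        nxt = (lam * c_in * (w_in.T @ s @ w_in)
               + (1.0 - lam) * c_out * (w_out.T @ s @ w_out))
        np.fill_diagonal(nxt, 1.0)  # every node is fully similar to itself
        delta = np.abs(nxt - s).max()
        s = nxt
        if delta < tol:
            return s, k
    return s, max_iter

# Two nodes (0 and 1) that both point to node 2.
example = np.array([[0, 0, 1],
                    [0, 0, 1],
                    [0, 0, 0]])
scores, iterations = p_rank(example)
```

In this toy graph nodes 0 and 1 have no in-links, so their similarity comes entirely from the shared out-neighbour. Setting lam=1.0 drops the out-link term and leaves a SimRank-style score.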
Taming computational complexity: efficient and parallel SimRank optimizations on undirected graphs
SimRank has been considered one of the promising link-based ranking algorithms for evaluating similarities of web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm to estimate the SimRank between vertices in O(n^3 + Kn^2) time, where n is the number of vertices and K is the number of iterations. In comparison, the most efficient implementation of the SimRank algorithm in [1] takes O(Kn^3) time in the worst case. To efficiently handle large-scale computations, we also propose a parallel implementation of the SimRank algorithm on multiple processors. The experimental evaluations on both synthetic and real-life data sets demonstrate the better computational time and parallel efficiency of our proposed techniques.
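For orientation, the baseline iteration that costs O(Kn^3) with dense matrices can be sketched in Python as below. This is the standard matrix form of SimRank, not the O(n^3 + Kn^2) algorithm of the paper, whose details the abstract does not give. The function name simrank and the parameters c and iters are illustrative assumptions.

```python
import numpy as np

def simrank(adj, c=0.8, iters=10):
    """Naive SimRank in matrix form; adj[i, j] = 1 means an edge i -> j.

    Each iteration performs two dense n x n matrix products, hence
    O(n^3) work per iteration and O(K n^3) for K iterations.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    in_deg = adj.sum(axis=0)
    # w[i, a] = 1/|I(a)| if i is an in-neighbour of a, else 0.
    w = np.divide(adj, in_deg, out=np.zeros_like(adj), where=in_deg > 0)
    s = np.eye(n)
    for _ in range(iters):
        s = c * (w.T @ s @ w)
        np.fill_diagonal(s, 1.0)  # s(a, a) = 1 by definition
    return s

# Undirected path 0 - 1 - 2, stored as a symmetric adjacency matrix.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
sim = simrank(path)
```

On this path the two end nodes share their only neighbour, so their score equals the decay factor c, while adjacent nodes score zero because the graph is bipartite.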
Computational Approaches for Estimating Life Cycle Inventory Data
Data gaps in life cycle inventory (LCI) are stumbling blocks for investigating the life cycle performance and impact of emerging technologies. It can be tedious, expensive and time consuming for LCI practitioners to collect LCI data or to wait for experimental data to become available. I propose a computational approach to estimate missing LCI data using link prediction techniques in network science. LCI data in Ecoinvent 3.1 is used to test the method. The proposed approach is based on the similarities between different processes or environmental interventions in the LCI database. By comparing two processes' material inputs and emission outputs, I measure the similarity of these processes. I hypothesize that similar processes tend to have similar material inputs and emission outputs, which are the life cycle inventory data I want to estimate. In particular, I measure similarity using four metrics, including average difference, Pearson correlation coefficient, Euclidean distance, and SimRank, with or without data normalization. I test these four metrics and the normalization method for their performance in estimating missing LCI data. The results show that processes in the same industrial classification have higher similarities, which validates the approach of measuring the similarity between unit processes. I remove a small set of data (from one data point to 50) for each process and then use the rest of the LCI data to train the model for estimating the removed data. It is found that approximately 80% of removed data can be successfully estimated with less than 10% error. This study is the first attempt in the search for an effective computational method for estimating missing LCI data. It is anticipated that this approach will significantly transform LCI compilation and LCA studies in the future.
Master of Science, Natural Resources and Environment, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/134693/3/Cai_Jiarui_Document.pd
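The idea of estimating a removed value from similar processes can be sketched in Python as follows. The abstract does not say how similarities are turned into an estimate, so the similarity-weighted average used here is an assumption, as are the function name estimate_missing and the toy matrix. Pearson correlation is one of the four metrics the abstract names.

```python
import numpy as np

def estimate_missing(data, row, col):
    """Estimate data[row, col] from the other rows (hypothetical helper).

    Rows are processes and columns are flows (inputs or emissions).
    Similarity is the Pearson correlation over all columns except `col`;
    only positively correlated rows contribute to the weighted average.
    """
    data = np.asarray(data, dtype=float)
    keep = [c for c in range(data.shape[1]) if c != col]
    target = data[row, keep]
    weights, values = [], []
    for r in range(data.shape[0]):
        if r == row:
            continue
        sim = np.corrcoef(target, data[r, keep])[0, 1]
        if np.isfinite(sim) and sim > 0:
            weights.append(sim)
            values.append(data[r, col])
    if not weights:
        return float("nan")  # no similar process to borrow from
    return float(np.average(values, weights=weights))

# Toy inventory: the last value of process 0 is treated as removed
# (the 0.0 is only a placeholder and is never read).
toy = np.array([[1.0, 2.0, 3.0, 0.0],
                [1.0, 2.0, 3.0, 4.0],
                [3.0, 2.0, 1.0, 10.0]])
estimate = estimate_missing(toy, row=0, col=3)
```

Process 1 matches the known part of process 0 exactly, while process 2 is negatively correlated and is ignored, so the estimate is taken from process 1 alone.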
Cross-Lingual Data Quality for Knowledge Base Acceleration across Wikipedia Editions
International audienc
Link Prediction Based on Local Random Walk
The problem of missing link prediction in complex networks has attracted much
attention recently. Two difficulties in link prediction are the sparsity and
huge size of the target networks. Therefore, the design of an efficient and
effective method is of both theoretical interest and practical significance.
In this Letter, we propose a method based on local random walk, which gives
predictions competitive with, or even better than, those of other
random-walk-based methods while having a lower computational complexity.
Comment: 6 pages, 2 figure
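A local-random-walk score can be sketched in Python as below, assuming the usual definition for an undirected graph: the score of a pair (x, y) after t steps is q_x * pi_xy(t) + q_y * pi_yx(t), where pi_xy(t) is the probability that a walker started at x is at y after t steps and q_x is the degree of x divided by the sum of all degrees. The function name lrw_scores and the dense matrix power are illustrative choices suited to small graphs only; they do not reflect the low-cost implementation the abstract refers to.

```python
import numpy as np

def lrw_scores(adj, steps=3):
    """Local-random-walk similarity for an undirected adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Row-stochastic transition matrix; isolated nodes get a zero row.
    p = np.divide(adj, deg[:, None], out=np.zeros_like(adj),
                  where=deg[:, None] > 0)
    # pi[x, y] = probability that a walker from x is at y after `steps` steps.
    pi = np.linalg.matrix_power(p, steps)
    q = deg / deg.sum()        # initial resource proportional to degree
    directed = q[:, None] * pi  # directed[x, y] = q_x * pi[x, y]
    return directed + directed.T

# Undirected path 0 - 1 - 2.
path_graph = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
lrw = lrw_scores(path_graph, steps=2)
```

To predict links, the unconnected pairs would be ranked by this score, with higher values suggesting more likely missing links.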
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
Link Prediction in Complex Networks: A Survey
Link prediction in complex networks has attracted increasing attention from
both physical and computer science communities. The algorithms can be used to
extract missing information, identify spurious interactions, evaluate network
evolving mechanisms, and so on. This article summarizes recent progress on
link prediction algorithms, emphasizing the contributions from physical
perspectives and approaches, such as the random-walk-based methods and the
maximum likelihood methods. We also introduce three typical applications:
reconstruction of networks, evaluation of network evolving mechanisms and
classification of partially labelled networks. Finally, we introduce some
applications and outline future challenges of link prediction algorithms.
Comment: 44 pages, 5 figure