An Iterative Scheme for Leverage-based Approximate Aggregation
The current data explosion poses great challenges to approximate aggregation
in terms of both efficiency and accuracy. To address this problem, we
propose a novel approach to calculate the aggregation answers with a high
accuracy using only a small portion of the data. We introduce leverages to
reflect individual differences in the samples from a statistical perspective.
Two kinds of estimators, the leverage-based estimator and the sketch estimator
(a "rough picture" of the aggregation answer), are placed in a constraint
relation and iteratively refined according to the actual conditions until their
difference falls below a threshold. Due to the iteration mechanism and the
leverages, our
approach achieves a high accuracy. Moreover, some features, such as not
requiring recording the sampled data and easy to extend to various execution
modes (e.g., the online mode), make our approach well suited to deal with big
data. Experiments show that our approach has an extraordinary performance, and
when compared with the uniform sampling, our approach can achieve high-quality
answers with only 1/3 of the same sample size.Comment: 17 pages, 9 figure
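
The abstract describes the scheme only at a high level, so the sketch below is
a toy Python reading of it under explicit assumptions: the aggregate is a mean,
the distance to the current sketch stands in for the paper's statistical
leverages (blended with a uniform component to keep the importance weights
bounded), and every name here (approximate_mean, grow_frac, the damped sketch
update) is illustrative rather than taken from the paper.

import numpy as np

def approximate_mean(data, init_frac=0.01, grow_frac=0.01, tol=1e-2, seed=None):
    """Toy iterative scheme: grow a leverage-weighted sample until the
    leverage-based estimate and the coarse 'sketch' estimate agree."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    m = max(1, int(init_frac * n))
    # Sketch estimator: a rough picture from a tiny uniform sample.
    sketch = data[rng.choice(n, size=m, replace=False)].mean()
    while True:
        # Leverage proxy: distance to the current sketch, blended with a
        # uniform component so no point gets a vanishing probability.
        lev = np.abs(data - sketch) + 1e-12
        p = 0.5 / n + 0.5 * lev / lev.sum()
        idx = rng.choice(n, size=m, replace=True, p=p)
        # Importance-weighted mean, so unequal sampling stays unbiased.
        est = np.mean(data[idx] / (p[idx] * n))
        if abs(est - sketch) < tol or m >= n:
            return est
        # Constraint relation: pull the sketch toward the new estimate and
        # enlarge the sample before the next round.
        sketch = 0.5 * (sketch + est)
        m = min(n, m + max(1, int(grow_frac * n)))

On synthetic data, e.g. approximate_mean(np.random.default_rng(0).normal(10.0,
2.0, 100_000), seed=1), the returned estimate lands close to the true mean of 10.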
Improving Performance of Iterative Methods by Lossy Checkpointing
Iterative methods are commonly used to solve large, sparse linear systems,
a fundamental operation in many modern scientific simulations. When
large-scale iterative methods run with a large number of ranks in parallel,
they have to checkpoint their dynamic variables periodically to survive
unavoidable fail-stop errors, which requires fast I/O systems and large
storage space. Consequently, significantly reducing the checkpointing
overhead is critical to improving the overall performance of iterative
methods. Our contribution is fourfold. (1) We propose a novel lossy
checkpointing scheme that can significantly improve the checkpointing
performance of iterative methods by leveraging lossy compressors. (2) We
formulate a lossy checkpointing performance model and derive theoretically an
upper bound for the extra number of iterations caused by the distortion of data
in lossy checkpoints, in order to guarantee the performance improvement under
the lossy checkpointing scheme. (3) We analyze the impact of lossy
checkpointing (i.e., the extra number of iterations caused by lossy checkpoint
files) for multiple types of iterative methods. (4) We evaluate the lossy
checkpointing scheme with optimal checkpointing intervals in a high-performance
computing environment with 2,048 cores, using the well-known scientific
computation package PETSc and a state-of-the-art checkpoint/restart toolkit.
Experiments show that our optimized lossy checkpointing scheme can
significantly reduce the fault tolerance overhead for iterative methods by
23%-70% compared with traditional checkpointing and 20%-58% compared with
lossless-compressed checkpointing, in the presence of system failures.
Comment: 14 pages, 10 figures, HPDC'1
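
The abstract refers to a checkpointing performance model without giving its
form, so the following Python snippet is only a back-of-the-envelope,
Young/Daly-style sketch of the trade-off it describes; the function names, the
cost numbers and the extra-iteration count are placeholder assumptions, not
values or formulas from the paper.

import math

def optimal_interval(checkpoint_cost, mtbf):
    """Young/Daly-style approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def fault_tolerance_overhead(checkpoint_cost, restart_cost, mtbf,
                             iter_time, extra_iters=0):
    """Rough per-unit-time overhead of checkpoint/restart: the cost of writing
    checkpoints at the optimal interval, the expected rework and restart after
    a failure, and (for lossy checkpoints) the extra iterations needed to
    recover the accuracy lost to compression."""
    interval = optimal_interval(checkpoint_cost, mtbf)
    checkpoint_overhead = checkpoint_cost / interval          # write cost
    rework_overhead = (interval / 2.0 + restart_cost) / mtbf  # expected redo
    lossy_penalty = extra_iters * iter_time / mtbf            # per failure
    return checkpoint_overhead + rework_overhead + lossy_penalty

# Illustrative numbers only: lossy compression shrinks the checkpoint and
# restart costs (smaller files, faster I/O) at the price of a few extra
# iterations after each recovery.
lossless = fault_tolerance_overhead(checkpoint_cost=60.0, restart_cost=30.0,
                                    mtbf=4 * 3600.0, iter_time=0.5)
lossy = fault_tolerance_overhead(checkpoint_cost=15.0, restart_cost=10.0,
                                 mtbf=4 * 3600.0, iter_time=0.5, extra_iters=20)
print(f"lossless-overhead fraction: {lossless:.4f}")
print(f"lossy-overhead fraction:    {lossy:.4f}")

The point of the toy is merely that a lossy checkpoint, being much smaller,
cuts the write and restart costs enough to absorb a bounded number of extra
iterations after each recovery, which is the balance the paper's upper bound
is meant to guarantee.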
Weakly supervised segment annotation via expectation kernel density estimation
Since the labelling of the positive images/videos is ambiguous in weakly
supervised segment annotation, negative-mining-based methods that use only the
intra-class information have emerged. In these methods, negative instances are
used to penalize unknown instances in order to rank their likelihood of being
an object, which can be viewed as similarity-based voting. However, these
methods 1) ignore the information contained in the positive bags and 2) only
rank the likelihood but cannot generate an explicit decision function. In this
paper, we propose a voting scheme involving not only the definite negative
instances but also the ambiguous positive instances to make use of the extra
useful information in the weakly labelled positive bags. In the scheme, each
instance votes for its label with a magnitude arising from the similarity, and
the ambiguous positive instances are assigned soft labels that are iteratively
updated during the voting. It overcomes the limitations of voting using only
the negative bags. We also propose an expectation kernel density estimation
(eKDE) algorithm to gain further insight into the voting mechanism.
Experimental results demonstrate the superiority of our scheme over the
baselines.
Comment: 9 pages, 2 figures
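
As a rough illustration of the voting idea (not the paper's eKDE derivation),
the Python sketch below lets definite negatives vote against unknown positives
while the ambiguous positives vote with soft labels that are re-estimated each
round; the Gaussian kernel, the sigmoid squashing and the random 2-D features
are assumptions made for the example.

import numpy as np

def soft_label_voting(negatives, positives, bandwidth=1.0, iters=10):
    """Toy similarity voting: definite negative instances push scores down,
    ambiguous positive instances vote with soft labels updated each round."""
    def kernel(a, b):
        # Gaussian kernel similarity between two sets of feature vectors.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    soft = np.full(len(positives), 0.5)      # initial soft labels
    k_pp = kernel(positives, positives)
    k_pn = kernel(positives, negatives)
    for _ in range(iters):
        # Score of each positive instance: support from soft-labelled
        # positives minus penalties from definite negatives ...
        votes = k_pp @ soft - k_pn.sum(axis=1)
        # ... squashed into updated soft labels for the next round.
        soft = 1.0 / (1.0 + np.exp(-votes))
    return soft

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=(30, 2))     # definite negative instances
pos = rng.normal(2.0, 1.0, size=(20, 2))     # ambiguous positive instances
print(soft_label_voting(neg, pos).round(2))

Instances that sit close to the negative cloud tend toward soft labels near 0,
while those deep in the positive region tend toward 1.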
Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
Crowdsourcing systems commonly face the problem of aggregating multiple
judgments provided by potentially unreliable workers. In addition, several
aspects of the design of efficient crowdsourcing processes, such as defining
workers' bonuses, fair prices, and task time limits, involve knowledge of the
likely duration of the task at hand. Bringing this together, in this work we
introduce a new time-sensitive Bayesian aggregation method that
simultaneously estimates a task's duration and obtains reliable aggregations of
crowdsourced judgments. Our method, called BCCTime, builds on the key insight
that the time taken by a worker to perform a task is an important indicator of
the likely quality of the produced judgment. To capture this, BCCTime uses
latent variables to represent the uncertainty about the workers' completion
time, the tasks' duration and the workers' accuracy. To relate the quality of a
judgment to the time a worker spends on a task, our model assumes that each
task is completed within a latent time window within which all workers with a
propensity to genuinely attempt the labelling task (i.e., no spammers) are
expected to submit their judgments. In contrast, workers with a lower
propensity to valid labelling, such as spammers, bots, or lazy labellers, are
assumed to perform tasks considerably faster or slower than the time required
by normal workers. Specifically, we use efficient message-passing Bayesian
inference to learn approximate posterior probabilities of (i) the confusion
matrix of each worker, (ii) the propensity to valid labelling of each worker,
(iii) the unbiased duration of each task and (iv) the true label of each task.
Using two real-world public datasets for entity linking tasks, we show that
BCCTime produces up to 11% more accurate classifications and up to 100% more
informative estimates of a task's duration compared to state-of-the-art
methods.
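
The full model relies on message-passing Bayesian inference over latent
confusion matrices, propensities, durations and true labels, which is too much
to reproduce here; the Python toy below only illustrates the key insight that
completion time signals judgment quality. The fixed time window and the
down-weighting factor are assumptions of the example, not quantities inferred
as in BCCTime.

import numpy as np

def time_weighted_aggregate(labels, times, window, n_classes=2,
                            outside_weight=0.2):
    """Toy aggregation in the spirit of BCCTime's key insight: judgments
    whose completion times fall inside a plausible task-time window count
    more than suspiciously fast or slow ones. The window stands in for the
    latent time window the paper infers; here it is simply given."""
    labels = np.asarray(labels)
    times = np.asarray(times, dtype=float)
    lo, hi = window
    weights = np.where((times >= lo) & (times <= hi), 1.0, outside_weight)
    votes = np.zeros(n_classes)
    for lbl, w in zip(labels, weights):
        votes[lbl] += w
    return int(votes.argmax()), votes / votes.sum()

# One task, five workers: two implausibly fast responders vote 1, three
# workers with plausible completion times vote 0.
label, posterior = time_weighted_aggregate(labels=[1, 1, 0, 0, 0],
                                           times=[2.0, 3.0, 42.0, 51.0, 47.0],
                                           window=(20.0, 120.0))
print(label, posterior.round(2))   # -> 0, with most of the mass on label 0

In BCCTime itself the window, the per-worker propensities and the confusion
matrices are all latent variables learned jointly rather than fixed inputs.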
Dynamic Radio Cooperation for Downlink Cloud-RANs with Computing Resource Sharing
A novel dynamic radio-cooperation strategy is proposed for Cloud Radio Access
Networks (C-RANs) consisting of multiple Remote Radio Heads (RRHs) connected to
a central Virtual Base Station (VBS) pool. In particular, the key capabilities
of C-RANs in computing-resource sharing and real-time communication among the
VBSs are leveraged to design a joint dynamic radio clustering and cooperative
beamforming scheme that maximizes the downlink weighted sum-rate system utility
(WSRSU). Due to the combinatorial nature of the radio clustering process and
the non-convexity of the cooperative beamforming design, the underlying
optimization problem is NP-hard, and is extremely difficult to solve for a
large network. Our approach aims for a suboptimal solution by transforming the
original problem into a Mixed-Integer Second-Order Cone Program (MI-SOCP),
which can be solved efficiently using a proposed iterative algorithm. Numerical
simulation results show that our low-complexity algorithm provides
close-to-optimal performance in terms of WSRSU while significantly
outperforming conventional radio clustering and beamforming schemes.
Additionally, the results also demonstrate the significant improvement in
computing-resource utilization of C-RANs over traditional RANs with distributed
computing resources.
Comment: 9 pages, 6 figures, accepted to IEEE MASS 201
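
The actual design couples clustering and beamforming through an MI-SOCP, which
is beyond a short snippet; the Python toy below only evaluates the weighted
sum-rate system utility (WSRSU) for a given clustering under a deliberately
simplified SINR model, with a nearest-RRH greedy clustering as a stand-in
baseline. The channel model and function names are assumptions for
illustration.

import numpy as np

def weighted_sum_rate(gains, clusters, weights, noise=1e-3):
    """Evaluate the weighted sum-rate for a given RRH clustering.
    gains[k, r]: channel power gain from RRH r to user k;
    clusters[k]: set of RRH indices jointly serving user k."""
    n_users = gains.shape[0]
    rates = np.zeros(n_users)
    for k in range(n_users):
        serving = list(clusters[k])
        signal = gains[k, serving].sum()
        # All power not aimed at user k is counted as interference.
        interference = gains[k].sum() - signal
        sinr = signal / (interference + noise)
        rates[k] = np.log2(1.0 + sinr)
    return float(np.dot(weights, rates))

def greedy_clustering(gains, cluster_size=2):
    """Attach each user to its strongest RRHs (a simple baseline, not the
    joint MI-SOCP clustering of the paper)."""
    order = np.argsort(-gains, axis=1)
    return [set(order[k, :cluster_size]) for k in range(gains.shape[0])]

rng = np.random.default_rng(1)
gains = rng.exponential(1.0, size=(4, 6))     # 4 users, 6 RRHs (toy channel)
weights = np.ones(4)
clusters = greedy_clustering(gains)
print(f"WSRSU (greedy clusters): {weighted_sum_rate(gains, clusters, weights):.2f}")

A smarter clustering (or the paper's joint solution) would be compared simply
by plugging its cluster sets into the same weighted_sum_rate evaluation.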
An algebraic multigrid method for mixed discretizations of the Navier-Stokes equations
Algebraic multigrid (AMG) preconditioners are considered for discretized
systems of partial differential equations (PDEs) where unknowns associated with
different physical quantities are not necessarily co-located at mesh points.
Specifically, we investigate a mixed finite element discretization of
the incompressible Navier-Stokes equations where the number of velocity nodes
is much greater than the number of pressure nodes. Consequently, some velocity
degrees-of-freedom (dofs) are defined at spatial locations where there are no
corresponding pressure dofs. Thus, AMG approaches leveraging this co-located
structure are not applicable. This paper instead proposes an automatic AMG
coarsening that mimics certain pressure/velocity dof relationships of the
discretization. The main idea is to first automatically define coarse
pressures in a somewhat standard AMG fashion and then to carefully (but
automatically) choose coarse velocity unknowns so that the spatial location
relationship between pressure and velocity dofs resembles that on the finest
grid. To define coefficients within the inter-grid transfers, an energy
minimization AMG (EMIN-AMG) is utilized. EMIN-AMG is not tied to specific
coarsening schemes and grid transfer sparsity patterns, and so it is applicable
to the proposed coarsening. Numerical results highlighting solver performance
are given on Stokes and incompressible Navier-Stokes problems.
Comment: Submitted to a journal
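
The coarsening itself operates on the matrix graph and is paired with
energy-minimization (EMIN-AMG) interpolation, neither of which fits in a short
example; the Python toy below only mimics the geometric intent on a point
cloud: coarse pressure dofs are picked first, and coarse velocity dofs are then
chosen near them so the coarse level keeps a similar pressure/velocity layout.
All names and the stride-based pressure coarsening are assumptions of the
illustration.

import numpy as np

def coarsen_mixed_dofs(pressure_xy, velocity_xy, coarsen_stride=2):
    """Toy two-stage coarsening on a 2-D point cloud: pick coarse pressure
    dofs first, then choose, for each coarse pressure, the nearest unused
    velocity dof so the coarse grid keeps a similar pressure/velocity
    layout. (Illustrative only; real AMG coarsening uses the matrix graph.)"""
    coarse_p = np.arange(0, len(pressure_xy), coarsen_stride)  # stand-in C-points
    coarse_v = []
    taken = set()
    for p_idx in coarse_p:
        d = np.linalg.norm(velocity_xy - pressure_xy[p_idx], axis=1)
        for v_idx in np.argsort(d):
            if v_idx not in taken:          # keep velocity choices distinct
                taken.add(int(v_idx))
                coarse_v.append(int(v_idx))
                break
    return coarse_p, np.array(coarse_v)

# Toy Taylor-Hood-like layout: velocities on a fine lattice, pressures on a
# coarser lattice (more velocity dofs than pressure dofs).
v_xy = np.array([[i, j] for i in range(9) for j in range(9)], dtype=float)
p_xy = np.array([[i, j] for i in range(0, 9, 2) for j in range(0, 9, 2)], dtype=float)
cp, cv = coarsen_mixed_dofs(p_xy, v_xy)
print(len(p_xy), "pressures ->", len(cp), "coarse;",
      len(v_xy), "velocities ->", len(cv), "coarse")

A real scheme would keep more coarse velocities than coarse pressures; the
one-to-one pick here is only to keep the illustration short.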