Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework
Incomplete data is one major kind of multi-dimensional dataset, with randomly distributed missing values in its dimensions. Retrieving information from such a dataset is very difficult once it becomes huge, and finding its top-k dominant values, that is, answering a top-k dominance (TKD) query for the k objects that dominate the most other objects, is a challenging procedure. Some existing algorithms speed up this process, but most are efficient only on small incomplete datasets. One algorithm that makes TKD queries practical is the Bitmap Index Guided (BIG) algorithm. It strongly improves performance on incomplete data, but it was neither designed for nor originally capable of finding top-k dominant values in incomplete big data. Several other algorithms have been proposed for TKD queries, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. These earlier algorithms were among the first attempts to apply TKD queries to incomplete data; however, they either performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to address these issues. MRBIG uses the MapReduce parallel computing framework to improve the performance of top-k dominance queries on huge incomplete datasets: the framework splits the work among several computing nodes that find the result independently and simultaneously. This method achieved up to two times faster processing than previously presented algorithms in finding the TKD query result.
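As a rough illustration of the query itself, the sketch below computes a top-k dominating result on incomplete data with one map step per data partition and a reduce step that sums the partial scores. It assumes larger values are better, marks a missing value as `None`, and compares two objects only on the dimensions observed in both. It uses naive pairwise comparisons and simulates the partitions inside a single Python process, so it is neither the BIG bitmap index nor the MRBIG algorithm; `dominates`, `map_partition`, and `top_k_dominating` are hypothetical names.

```python
from collections import Counter
from functools import reduce
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: one object is a tuple of values, None marks a missing value.
Record = Tuple[Optional[float], ...]


def dominates(a: Record, b: Record) -> bool:
    """Return True if a dominates b, comparing only dimensions observed in both.

    Assumes larger values are better: a must be >= b on every shared
    dimension and strictly > on at least one.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return False
    return all(x >= y for x, y in shared) and any(x > y for x, y in shared)


def map_partition(data: Sequence[Record], partition: List[int]) -> Counter:
    """Map step: for every object, count the objects in one partition that it dominates."""
    counts: Counter = Counter()
    for i in range(len(data)):
        for j in partition:
            if i != j and dominates(data[i], data[j]):
                counts[i] += 1
    return counts


def top_k_dominating(
    data: Sequence[Record], k: int, num_partitions: int = 2
) -> List[Tuple[int, int]]:
    """Return the k (index, score) pairs with the highest dominance scores."""
    ids = list(range(len(data)))
    # Round-robin split, standing in for distributing the data across computing nodes.
    partitions = [ids[p::num_partitions] for p in range(num_partitions)]
    partials = [map_partition(data, part) for part in partitions]
    # Reduce step: add up the partial counts from all partitions.
    total = reduce(lambda acc, c: acc + c, partials, Counter())
    # Sort by descending score, breaking ties by index.
    ranked = sorted(ids, key=lambda i: (-total[i], i))
    return [(i, total[i]) for i in ranked[:k]]


if __name__ == "__main__":
    example = [(3, 3, None), (2, None, 2), (1, 1, 1), (None, 2, 3)]
    print(top_k_dominating(example, 2))  # [(0, 3), (3, 2)]
```

The split works because an object's score is a sum of counts over disjoint subsets of the data, so partial counts from different partitions can be added in any order.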
Solving DCOPs with Distributed Large Neighborhood Search
The field of Distributed Constraint Optimization has gained momentum in
recent years, thanks to its ability to address various applications related to
multi-agent cooperation. Nevertheless, solving Distributed Constraint
Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale,
complex applications, incomplete DCOP algorithms are necessary. Current
incomplete DCOP algorithms suffer from one or more of the following limitations:
they (a) find local minima without providing quality guarantees; (b) provide
loose quality assessment; or (c) are unable to benefit from the structure of
the problem, such as domain-dependent knowledge and hard constraints.
Therefore, capitalizing on strategies from the centralized constraint solving
community, we propose a Distributed Large Neighborhood Search (D-LNS) framework
to solve DCOPs. The proposed framework (with its novel repair phase) provides
guarantees on solution quality, refining upper and lower bounds during the
iterative process, and can exploit domain-dependent structures. Our
experimental results show that D-LNS outperforms other incomplete DCOP
algorithms on both structured and unstructured problem instances.
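The destroy-and-repair loop behind large neighborhood search can be sketched in a few lines. This is a centralized toy that assumes binary cost tables and a repair step that enumerates the freed variables exhaustively; it does not model distribution across agents or the upper and lower bounds that D-LNS maintains, and `total_cost` and `large_neighborhood_search` are hypothetical names.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

# A binary cost function: (var_i, var_j, cost table over (value_i, value_j)).
Constraint = Tuple[int, int, Dict[Tuple[int, int], float]]


def total_cost(assignment: Sequence[int], constraints: Sequence[Constraint]) -> float:
    """Sum of all binary constraint costs under a complete assignment."""
    return sum(table[(assignment[i], assignment[j])] for i, j, table in constraints)


def large_neighborhood_search(
    num_vars: int,
    domain: Sequence[int],
    constraints: Sequence[Constraint],
    neighborhood_size: int = 2,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimize total cost by repeatedly freeing and re-optimizing a few variables."""
    rng = random.Random(seed)
    current = [rng.choice(domain) for _ in range(num_vars)]
    cost = total_cost(current, constraints)
    for _ in range(iterations):
        # Destroy: release a random subset of variables, keeping the rest fixed.
        freed = rng.sample(range(num_vars), neighborhood_size)
        # Repair: solve the small subproblem exactly by enumerating its assignments.
        best, best_cost = current, cost
        for values in itertools.product(domain, repeat=len(freed)):
            candidate = list(current)
            for var, val in zip(freed, values):
                candidate[var] = val
            c = total_cost(candidate, constraints)
            if c < best_cost:
                best, best_cost = candidate, c
        # The old assignment is among the enumerated candidates, so cost never increases.
        current, cost = best, best_cost
    return current, cost
```

For example, a 4-cycle with two colors and cost 1 per equal-colored edge starts from a random coloring and is repaired toward a conflict-free one.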
Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data
We propose an efficient family of algorithms to learn the parameters of a
Bayesian network from incomplete data. In contrast to textbook approaches such
as EM and the gradient method, our approach is non-iterative, yields closed
form parameter estimates, and eliminates the need for inference in a Bayesian
network. Our approach provides consistent parameter estimates when data are
missing completely at random (MCAR), missing at random (MAR), and, in some
cases, missing not at random (MNAR). Empirically, our approach is orders of
magnitude faster than EM, since it requires no inference. Given sufficient
data, it learns parameters that can be orders of magnitude more accurate.
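To make "closed-form, no inference" concrete, here is the simplest estimator with those properties: per-family available-case counting, which is consistent only when data are MCAR. It is a swapped-in baseline, not the authors' algorithm (which also covers MAR and some MNAR cases); the record format and the name `estimate_cpts` are assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Record = Dict[str, Optional[int]]  # variable name -> value, None if missing


def estimate_cpts(
    records: Sequence[Record], parents: Dict[str, Tuple[str, ...]]
) -> Dict[str, Dict[Tuple[int, Tuple[int, ...]], float]]:
    """Closed-form CPT estimates from incomplete records (available-case counting).

    For each variable X with parents U, estimate P(X = x | U = u) as
    N(x, u) / N(u), counting only records in which X and all of U are observed.
    Consistent under MCAR; this is not the method described in the abstract.
    """
    cpts: Dict[str, Dict[Tuple[int, Tuple[int, ...]], float]] = {}
    for var, pa in parents.items():
        family = (var,) + tuple(pa)
        joint: Counter = Counter()
        marginal: Counter = Counter()
        for r in records:
            if any(r.get(v) is None for v in family):
                continue  # this family is not fully observed in the record
            u = tuple(r[p] for p in pa)
            joint[(r[var], u)] += 1
            marginal[u] += 1
        # One pass over the data, no iteration and no inference.
        cpts[var] = {(x, u): n / marginal[u] for (x, u), n in joint.items()}
    return cpts
```

For a two-node network A → B, pass `parents={"A": (), "B": ("A",)}`; `cpts["B"][(1, (0,))]` is then the estimate of P(B = 1 | A = 0).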
New Algorithms for M-Estimation of Multivariate Scatter and Location
We present new algorithms for M-estimators of multivariate scatter and
location and for symmetrized M-estimators of multivariate scatter. The new
algorithms are considerably faster than currently used fixed-point and related
algorithms. The main idea is to use a second-order Taylor expansion of the
target functional and to devise a partial Newton-Raphson procedure. In
connection with symmetrized M-estimators, we use incomplete
U-statistics to accelerate the initial phase of our procedures.
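For context on the baseline being improved, the fixed-point iteration for one common M-estimator of scatter, Tyler's estimator with location assumed known and equal to zero, can be sketched with NumPy. Each step sets `Sigma <- (p/n) * sum_i x_i x_i^T / (x_i^T Sigma^{-1} x_i)` and rescales to trace p. This is the kind of fixed-point scheme the abstract says the new algorithms outperform, not the partial Newton-Raphson procedure itself; the function name `tyler_shape`, the tolerance, and the trace normalization are assumptions.

```python
import numpy as np


def tyler_shape(X: np.ndarray, tol: float = 1e-9, max_iter: int = 1000) -> np.ndarray:
    """Fixed-point iteration for Tyler's M-estimator of scatter.

    Assumes location is known and zero. Returns a shape matrix scaled to trace p.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        # Squared Mahalanobis distances d_i = x_i^T Sigma^{-1} x_i.
        d = np.einsum("ij,jk,ik->i", X, inv, X)
        # Weighted sum of outer products: sum_i x_i x_i^T / d_i, times p/n.
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)  # Tyler's estimator is only defined up to scale
        if np.linalg.norm(new - sigma, ord="fro") < tol:
            return new
        sigma = new
    return sigma
```

Each step costs one p-by-p inversion plus O(n p^2) work, and convergence is only linear, which is the cost a Newton-type scheme aims to reduce.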
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework runs existing community detection
algorithms on networks imputed by our link-prediction-based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average, our method boosts the performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook.
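A toy version of the pipeline described above might impute the highest-scoring missing edges by Jaccard similarity, run a randomized community detector (asynchronous label propagation) several times, and merge the outputs by grouping nodes that share a community in at least half of the runs. The Jaccard scorer, label propagation, and co-association threshold are assumptions standing in for the authors' link predictor, base algorithms, and merging step; all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def impute_edges(graph: Graph, num_new: int) -> Graph:
    """Return a copy of graph with the num_new highest-Jaccard missing edges added."""
    scores = []
    for u, v in combinations(sorted(graph), 2):
        common = graph[u] & graph[v]
        if v not in graph[u] and common:
            scores.append((len(common) / len(graph[u] | graph[v]), u, v))
    scores.sort(reverse=True)
    imputed = {u: set(nbrs) for u, nbrs in graph.items()}
    for _, u, v in scores[:num_new]:
        imputed[u].add(v)
        imputed[v].add(u)
    return imputed


def label_propagation(graph: Graph, seed: int, max_sweeps: int = 100) -> Dict[int, int]:
    """Asynchronous label propagation; a node keeps its label if it is already a majority label."""
    rng = random.Random(seed)
    labels = {u: u for u in graph}
    nodes = list(graph)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for u in nodes:
            if not graph[u]:
                continue
            counts = Counter(labels[v] for v in graph[u])
            top = max(counts.values())
            best = sorted(lbl for lbl, c in counts.items() if c == top)
            if labels[u] not in best:
                labels[u] = rng.choice(best)
                changed = True
        if not changed:
            break
    return labels


def consensus(graph: Graph, runs: int = 30, threshold: float = 0.5) -> List[Set[int]]:
    """Merge several runs: join nodes that co-occur in at least `threshold` of them."""
    partitions = [label_propagation(graph, seed) for seed in range(runs)]
    parent = {u: u for u in graph}

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(sorted(graph), 2):
        together = sum(p[u] == p[v] for p in partitions)
        if together / runs >= threshold:
            parent[find(u)] = find(v)
    groups: Dict[int, Set[int]] = {}
    for u in graph:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=min)
```

On two 5-node cliques joined by a single edge with one intra-clique edge removed, the removed edge has Jaccard similarity 1.0, so `impute_edges(g, 1)` restores it before detection runs.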
Module extraction via query inseparability in OWL 2 QL
We show that deciding conjunctive query inseparability for OWL 2 QL ontologies is PSpace-hard and in ExpTime. We give polynomial-time (incomplete) algorithms and demonstrate experimentally that they can be used for practical module extraction.
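For intuition about polynomial-time module extraction, here is a much simpler toy than OWL 2 QL: a TBox of atomic concept inclusions `A ⊑ B` only, no roles. Keeping every inclusion whose left-hand side is reachable from the signature gives a safe but possibly non-minimal module for instance queries over ABoxes in that signature, since every derivation from a signature concept follows such a chain. This reachability rule is an assumption chosen for illustration, not the paper's algorithm; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Axiom = Tuple[str, str]  # (sub, sup) means sub ⊑ sup


def reachability_module(tbox: List[Axiom], signature: Iterable[str]) -> List[Axiom]:
    """Keep the inclusions whose left-hand side is reachable from the signature."""
    reached: Set[str] = set(signature)
    frontier = list(reached)
    while frontier:
        concept = frontier.pop()
        for sub, sup in tbox:
            if sub == concept and sup not in reached:
                reached.add(sup)
                frontier.append(sup)
    return [(sub, sup) for sub, sup in tbox if sub in reached]


def entailed_concepts(tbox: List[Axiom], asserted: Iterable[str]) -> Set[str]:
    """Close a set of asserted concept names under the inclusions (polynomial time)."""
    closed = set(asserted)
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            if sub in closed and sup not in closed:
                closed.add(sup)
                changed = True
    return closed
```

Restricted to signature concepts, `entailed_concepts(module, a)` and `entailed_concepts(tbox, a)` agree for any set `a` of signature concepts, which is the toy analogue of the module being inseparable from the full TBox.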