621 research outputs found

Efficient Processing of k Nearest Neighbor Joins using MapReduce

Authors: Chen Su, Lu Wei, Ooi Beng Chin, Shen Yanyan
Publication date: 01/01/2012

The k nearest neighbor join (kNN join), designed to find the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, which is a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable.

Comment: VLDB201
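
The group-then-join idea in this abstract can be illustrated with a minimal single-process sketch. It is not the paper's algorithm: the pivot points, the function names, and the choice to replicate all of S to every group are assumptions made here for illustration. The paper's pruning rules, which are not given in the abstract, are what would shrink that replication.

```python
import heapq
import math
from collections import defaultdict


def nearest_pivot(point, pivots):
    """Return the index of the pivot closest to `point`."""
    return min(range(len(pivots)), key=lambda i: math.dist(point, pivots[i]))


def map_phase(R, S, pivots):
    """Group R by nearest pivot; naively send all of S to every group."""
    groups = defaultdict(lambda: {"R": [], "S": []})
    for r in R:
        groups[nearest_pivot(r, pivots)]["R"].append(r)
    for group in groups.values():
        group["S"] = list(S)  # assumption: no pruning, full replication
    return groups


def reduce_phase(group, k):
    """For each r in the group, keep the k closest objects of S."""
    return {
        r: heapq.nsmallest(k, group["S"], key=lambda s: math.dist(r, s))
        for r in group["R"]
    }


def knn_join(R, S, pivots, k):
    """Run the map step, then join each group independently."""
    result = {}
    for group in map_phase(R, S, pivots).values():
        result.update(reduce_phase(group, k))
    return result


# Hypothetical toy data in two dimensions.
R = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
S = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0), (8.0, 1.0), (5.0, 3.0)]
pivots = [(0.0, 0.0), (7.0, 3.0)]
joined = knn_join(R, S, pivots, k=2)
```

Because every group sees all of S, the result matches a brute-force join; the cost is that S is shuffled once per group, which is the overhead the abstract says the pruning rules and approximate algorithms aim to reduce.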

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarBank@NUS

Maximal Quantum Fisher Information in a Mach-Zehnder Interferometer without initial parity

Authors: Liu Jing, Shao Yanyan, Shen Luyi, Wang Xiaoguang, Yu Xu, Zhao Xiang
Publication venue: 'The Optical Society'
Publication date: 01/01/2018

The Mach-Zehnder interferometer is a common device in quantum phase estimation, and the photon losses in it are an important issue for achieving a high phase accuracy. Here we thoroughly discuss the precision limit of the phase in the Mach-Zehnder interferometer with a coherent state and a superposition of coherent states as input states. By providing a general analytical expression of quantum Fisher information, the phase-matching condition and optimal initial parity are given. In particular, in the photon loss scenario, the sensitivity behaviors are analyzed and specific strategies are provided to restore the phase accuracies for symmetric and asymmetric losses.

Comment: 10 pages, 3 figures
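
For orientation, the standard textbook relations behind the "precision limit" mentioned in this abstract are given below. They are general definitions for a pure state $|\psi_\phi\rangle$ that depends on the phase $\phi$, not the paper's analytical expression, which the abstract does not reproduce. The symbol $\nu$ (number of independent repetitions) is notation assumed here.

```latex
% Quantum Fisher information of a pure state |\psi_\phi>
F_Q = 4\left( \langle \partial_\phi \psi_\phi | \partial_\phi \psi_\phi \rangle
      - \left| \langle \psi_\phi | \partial_\phi \psi_\phi \rangle \right|^2 \right)

% Quantum Cramer-Rao bound on the phase uncertainty after \nu repetitions
\Delta\phi \ge \frac{1}{\sqrt{\nu F_Q}}
```

A larger $F_Q$ therefore means a tighter achievable bound on $\Delta\phi$. With photon loss the state becomes mixed, and the pure-state formula above no longer applies directly.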

arXiv.org e-Print Archive

Crossref

HAL-Polytechnique

Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

Authors: Cheng Weiyu, Huang Linpeng, Shen Yanyan
Publication date: 03/04/2020

Various factorization-based methods have been proposed to leverage second-order or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order, and then identify useful feature interactions through model training, an approach that suffers from two drawbacks. First, they have to make a trade-off between the expressiveness of higher-order cross features and the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer to convert the power of each feature in a feature combination into the coefficient to be learned. The experimental results on four real datasets demonstrate the superior predictive performance of AFN against the state of the art.

Comment: Accepted by AAAI'2
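
The logarithmic transformation described in this abstract rests on the identity that a product of powers equals the exponential of a weighted sum of logarithms, so each exponent becomes an ordinary coefficient. The sketch below shows only that identity. The function name, the example values, and the requirement that inputs be strictly positive are assumptions made here; how AFN itself guarantees positive inputs and trains the exponents is not stated in the abstract.

```python
import math


def cross_feature(x, w):
    """Compute prod_j x_j ** w_j as exp(sum_j w_j * ln x_j).

    Assumes every x_j is strictly positive so the logarithm is defined.
    """
    return math.exp(sum(wj * math.log(xj) for xj, wj in zip(x, w)))


x = [2.0, 3.0, 4.0]
w = [1.0, 2.0, 0.0]  # hypothetical exponents: x1 * x2**2, with x3 ignored
value = cross_feature(x, w)
direct = 2.0 * 3.0 ** 2  # the same cross feature written out by hand
```

An exponent of zero removes a feature from the combination, and fractional exponents are allowed, which is one way to read the abstract's claim about arbitrary-order cross features.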

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

COMPLEX QUERY PROCESSING AND RECOVERY IN DISTRIBUTED SYSTEMS

Author: SHEN YANYAN
Publication date: 13/05/2015

Ph.D (DOCTOR OF PHILOSOPHY)

ScholarBank@NUS

Microkernel mechanisms for improving the trustworthiness of commodity hardware

Author: Shen Yanyan
Publication venue: UNSW, Sydney
Publication date: 01/01/2019

The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware that can malfunction when the hardware is impacted by transient hardware faults. The hardware anomalies, if undetected, can cause data corruptions, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults. We take advantage of the functional correctness and isolation guarantee provided by the formally verified seL4 microkernel and hardware redundancy provided by multicore processors, design the redundant co-execution (RCoE) architecture that replicates a whole software system (including the microkernel) onto different CPU cores, and implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR). Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, detecting fault-induced execution divergences between the replicated systems, with the flexibility of tuning the error detection latency and coverage. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption. 
We run synthetic benchmarks and system benchmarks to evaluate the performance overhead of the approach, observe that the overhead varies based on the characteristics of workloads and the variants (LC-RCoE or CC-RCoE), and conclude that the approach is applicable for real-world applications. The effectiveness of the error detection mechanisms is assessed by conducting fault injection campaigns on real hardware, and the results demonstrate compelling improvement

UNSWorks
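
The thesis abstract above describes replicas that run as state machines, with fingerprint validation detecting divergence and error masking allowing a downgrade from TMR to DMR. A toy user-level sketch of that idea follows. Everything in it is an assumption made for illustration: the `Replica` class, the CRC-based fingerprint, the majority vote, and the simulated bit flip. The actual mechanisms are implemented inside the seL4 kernel and are not detailed in the abstract.

```python
import zlib
from collections import Counter


class Replica:
    """A hypothetical replica modeled as a tiny deterministic state machine."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.fingerprint = 0

    def step(self, event, fault=False):
        """Apply one input event and fold the new state into the fingerprint."""
        self.state = (self.state * 31 + event) & 0xFFFFFFFF
        if fault:
            self.state ^= 1 << 3  # simulate a single-bit upset
        data = self.state.to_bytes(4, "little")
        self.fingerprint = zlib.crc32(data, self.fingerprint)


def validate(replicas):
    """Return the replicas whose fingerprint disagrees with the majority."""
    counts = Counter(r.fingerprint for r in replicas)
    majority, votes = counts.most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("divergence detected, but no majority to mask it")
    return [r for r in replicas if r.fingerprint != majority]


# Three replicas see the same inputs; replica B suffers a fault on the last event.
replicas = [Replica("A"), Replica("B"), Replica("C")]
events = [7, 11, 13]
for i, event in enumerate(events):
    for r in replicas:
        r.step(event, fault=(r.name == "B" and i == len(events) - 1))

faulty = validate(replicas)
survivors = [r for r in replicas if r not in faulty]  # continue with two replicas
```

With three replicas the vote can identify and drop the faulty one; with only two, a mismatch can be detected but not attributed, which is why `validate` raises in that case.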