50,917 research outputs found

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
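
    The meta-analysis step can be pictured with a standard p-value combiner. Below is a minimal sketch, assuming Fisher's method is used to merge per-partition p-values of one conditional independence test; the partitioning, heuristics, and the test itself are not shown, and PFBP's actual combination rule may differ:

        import math
        from scipy.stats import chi2

        def fisher_combine(p_values):
            """Combine independent per-partition p-values via Fisher's method:
            -2 * sum(log p_i) follows a chi-squared distribution with 2k
            degrees of freedom under the global null of independence."""
            statistic = -2.0 * sum(math.log(p) for p in p_values)
            return chi2.sf(statistic, df=2 * len(p_values))  # 1 - CDF

        # p-values of the same test computed locally on four row partitions:
        print(fisher_combine([0.04, 0.10, 0.02, 0.07]))  # combined evidence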

    Adaptive control in rollforward recovery for extreme scale multigrid

    With the increasing number of compute components, failures in future exascale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is reconstructed by an asynchronous on-line recovery. The computations in the faulty and healthy subdomains must be carefully coordinated; in particular, both under- and over-solving must be avoided, since both waste computational resources and therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited to parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to $6.9\cdot10^{11}$ unknowns on more than 245,766 parallel processes are reported on a state-of-the-art peta-scale supercomputer, demonstrating the robustness of the method.
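
    A schematic sketch of the re-coupling control described above, with hypothetical names: local_solver, local_estimator, healthy_estimate, and weight are placeholders, and the paper's hierarchical weighted residual estimator is not itself shown:

        def recover_faulty_subdomain(local_solver, local_estimator,
                                     healthy_estimate, weight=1.0, max_cycles=100):
            """Asynchronous local recovery with an estimator-based stopping test:
            iterate in the faulty subdomain until its weighted error estimate no
            longer dominates the healthy part's, avoiding both under-solving
            (re-coupling too early) and over-solving (wasting cycles)."""
            for cycle in range(1, max_cycles + 1):
                local_solver()            # one multigrid cycle, faulty part only
                if weight * local_estimator() <= healthy_estimate:
                    return cycle          # re-couple with the global iteration
            return max_cycles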

    Learning when to skim and when to read

    Many recent advances in deep learning for natural language processing have come at increasing computational cost, but the power of these state-of-the-art models is not needed for every example in a dataset. We demonstrate two approaches to reducing unnecessary computation in cases where a fast but weak baseline classifier and a stronger, slower model are both available. Applying an AUC-based metric to the task of sentiment classification, we find significant efficiency gains with both a probability-threshold method for reducing computational cost and one that uses a secondary decision network. Comment: 8 pages (4 article, 1 references, 3 appendix), 11 figures, 3 tables, published at ACL 2017 workshop Repl4NLP.
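
    A minimal sketch of the probability-threshold method, assuming sklearn-style models; fast_model, slow_model, and the threshold value are illustrative placeholders, not the paper's:

        import numpy as np

        def cascade_predict(texts, fast_model, slow_model, threshold=0.9):
            """Skim with the cheap classifier; read with the expensive model
            only when the cheap one is not confident enough."""
            predictions = []
            for text in texts:
                probs = fast_model.predict_proba([text])[0]  # e.g. bag-of-words model
                if probs.max() >= threshold:
                    predictions.append(int(np.argmax(probs)))          # confident: skim
                else:
                    predictions.append(slow_model.predict([text])[0])  # unsure: read
            return predictions

    Sweeping the threshold traces out a cost/accuracy curve, which is the kind of trade-off an AUC-style summary metric can score.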

    Sampling from Stochastic Finite Automata with Applications to CTC Decoding

    Stochastic finite automata arise naturally in many language and speech processing tasks. They include stochastic acceptors, which represent certain probability distributions over random strings. We consider the problem of efficient sampling: drawing random string variates from the probability distributions represented by stochastic automata and transformations thereof. We show that path-sampling is effective, and can be efficient if the epsilon-graph of a finite automaton is acyclic. We provide an algorithm that ensures this by conflating epsilon-cycles within strongly connected components. Sampling is also effective in the presence of non-injective transformations of strings. We illustrate this in the context of decoding for Connectionist Temporal Classification (CTC), where the predictive probabilities yield auxiliary sequences which are transformed into shorter labeling strings. We can sample efficiently from the transformed labeling distribution and use this in two different strategies for finding the most probable CTC labeling.
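
    In the CTC case the non-injective transformation is the collapse that merges repeated symbols and deletes blanks. A naive sketch of labeling sampling, pushing frame-level path samples through that collapse (the paper's automaton construction and epsilon-cycle conflation, which make this efficient, are not shown):

        import numpy as np

        def sample_ctc_labeling(log_probs, blank=0, rng=None):
            """Sample a frame-level path from per-frame (normalized) log
            distributions, then apply the CTC collapse -- merge repeats,
            drop blanks -- to obtain one labeling-string variate."""
            rng = rng or np.random.default_rng()
            n_frames, n_symbols = log_probs.shape
            path = [rng.choice(n_symbols, p=np.exp(log_probs[t]))
                    for t in range(n_frames)]
            labeling, previous = [], None
            for symbol in path:
                if symbol != previous and symbol != blank:
                    labeling.append(int(symbol))
                previous = symbol
            return labeling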

    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms related to artificial neural networks, and in particular for Deep Learning, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
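
    As a concrete anchor for the kind of hyper-parameters such a guide covers, here is a toy SGD loop for logistic regression; the particular values are illustrative defaults, not the chapter's recommendations:

        import numpy as np

        learning_rate = 0.01   # often the single most important hyper-parameter,
                               # typically tuned on a log scale
        batch_size = 32        # trades gradient variance against per-step cost
        n_epochs = 10

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 20))                  # toy inputs
        y = (X @ rng.normal(size=20) > 0).astype(float)  # toy binary labels
        w = np.zeros(20)

        for epoch in range(n_epochs):
            order = rng.permutation(len(X))
            for start in range(0, len(X), batch_size):
                idx = order[start:start + batch_size]
                p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # logistic prediction
                grad = X[idx].T @ (p - y[idx]) / len(idx)  # mini-batch gradient
                w -= learning_rate * grad                  # SGD update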

    Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence

    We study the tradeoff between computational effort and accuracy in a cascade of deep neural networks. During inference, early termination in the cascade is controlled by confidence levels derived directly from the softmax outputs of intermediate classifiers. The advantage of early termination is that classification is performed using less computation, thus adjusting the computational effort to the complexity of the input. Moreover, dynamic modification of confidence thresholds allows one to trade accuracy for computational effort without requiring retraining. Basing early termination on softmax classifier outputs is justified by experiments that demonstrate an almost linear relation between confidence levels in intermediate classifiers and accuracy. Our experiments with ResNet-based architectures obtained the following results: (i) a speedup of 1.5 that sacrifices 1.4% accuracy on the CIFAR-10 test set; (ii) a speedup of 1.19 that sacrifices 0.7% accuracy on the CIFAR-100 test set; (iii) a speedup of 2.16 that sacrifices 1.4% accuracy on the SVHN test set.
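
    A minimal sketch of the early-termination rule, assuming each stage returns class logits; stages and thresholds are placeholders, and because the thresholds only gate inference, they can be changed without retraining, as noted above:

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def cascaded_inference(x, stages, thresholds):
            """Run intermediate classifiers in order of cost; terminate at the
            first stage whose softmax confidence clears its threshold, so easy
            inputs exit early and hard inputs reach the full network."""
            for stage, tau in zip(stages, thresholds):
                probs = softmax(stage(x))        # stage(x) returns class logits
                if probs.max() >= tau:           # confident enough: stop here
                    return int(np.argmax(probs))
            return int(np.argmax(probs))         # deepest stage always answers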

    Fast Cross-Validation via Sequential Testing

    With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which uses nonparametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. By eliminating underperforming candidates quickly and keeping promising candidates as long as possible, the method speeds up the computation while preserving the capability of full cross-validation. Theoretical considerations underline the statistical power of our procedure. Experimental evaluation shows that our method reduces the computation time by a factor of up to 120 compared to a full cross-validation, with negligible impact on accuracy.
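
    A much-simplified sketch of the elimination idea, using a sign test as the nonparametric component; the paper's actual tests and sequential-analysis schedule differ, and candidates, evaluate, and subset_sizes are placeholders:

        from scipy.stats import binomtest  # sign test: a simple nonparametric choice

        def fast_cv(candidates, evaluate, subset_sizes, alpha=0.05):
            """Model selection on linearly increasing data subsets: eliminate a
            configuration once a sign test shows it loses to the current leader
            significantly often, so underperformers exit early while promising
            candidates survive to the full data."""
            record = {c: [] for c in candidates}  # win/loss vs. leader per step
            alive = list(candidates)
            for n in subset_sizes:
                scores = {c: evaluate(c, n) for c in alive}  # e.g. CV accuracy on n samples
                leader = max(alive, key=scores.get)
                for c in alive:
                    record[c].append(scores[c] >= scores[leader])
                alive = [c for c in alive if c == leader or
                         binomtest(sum(record[c]), len(record[c]),
                                   alternative='less').pvalue >= alpha]
            return leader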

    Multiscale sampling model for motion integration

    Biologically plausible strategies for visual scene integration across spatial and temporal domains continue to be a challenging topic. The fundamental question we address is whether classical problems in motion integration, such as the aperture problem, can be solved in a model that samples the visual scene at multiple spatial and temporal scales in parallel. We hypothesize that fast interareal connections that allow feedback of information between cortical layers are the key processes that disambiguate motion direction. We developed a neural model showing how the aperture problem can be solved using different spatial sampling scales between LGN, V1 layer 4, V1 layer 6, and area MT. Our results suggest that multiscale sampling, rather than feedback explicitly, is the key process that gives rise to end-stopped cells in V1 and enables area MT to solve the aperture problem without the need for calculating intersecting constraints or crafting intricate patterns of spatiotemporal receptive fields. Furthermore, the model explains why end-stopped cells no longer emerge in the absence of V1 layer 6 activity (Bolz & Gilbert, 1986), why V1 layer 4 cells are significantly more end-stopped than V1 layer 6 cells (Pack, Livingstone, Duffy, & Born, 2003), and how it is possible to have a solution to the aperture problem in area MT with no solution in V1 in the presence of driving feedback. In summary, while much research in the field focuses on how a laminar architecture can give rise to complicated spatiotemporal receptive fields to solve problems in the motion domain, we show that one can reframe motion integration as an emergent property of multiscale sampling achieved concurrently within lamina and across multiple visual areas. This work was supported in part by CELEST, a National Science Foundation Science of Learning Center; NSF SBE-0354378 and OMA-0835976; ONR (N00014-11-1-0535); and AFOSR (FA9550-12-1-0436). Published version.