    Impact of Digital Video Analytics on Accuracy of Chemobehavioural Phenotyping in Aquatic Toxicology

    [Abstract] Chemobehavioural phenotypic analysis using small aquatic model organisms is becoming an important toolbox in aquatic ecotoxicology and neuroactive drug discovery. The analysis of the organisms’ behaviour is usually performed by combining digital video recording with animal tracking software. This software detects the organisms in the video frames and reconstructs their movement trajectories using image processing algorithms. In this work we investigated the impact of video file characteristics, video optimization techniques and differences in animal tracking algorithms on the accuracy of quantitative neurobehavioural endpoints. We employed larval stages of the free-swimming euryhaline crustacean Artemia franciscana, commonly used for marine ecotoxicity testing, as a proxy model to assess the effects of video analytics on quantitative behavioural parameters. We evaluated parameters such as data processing speed, tracking precision, and the capability to perform high-throughput batch processing of video files. Using a model toxicant, the software algorithms were also benchmarked against one another. Our data indicate that variability in video file parameters, such as resolution, frame rate, file container types, codecs and compression levels, can be a source of experimental bias in behavioural analysis. Similarly, the variability in data outputs between different tracking algorithms should be taken into account when designing standardized behavioural experiments and conducting chemobehavioural phenotyping.
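The core step the abstract describes — detecting organisms per frame and reconstructing their movement trajectories — can be illustrated with a minimal sketch. This is a hypothetical greedy nearest-neighbour linker over per-frame centroids, not the algorithm used by any of the benchmarked tracking packages; the function name and the `max_jump` threshold are assumptions for illustration.

```python
import math

def link_trajectories(frames, max_jump=50.0):
    """Greedily link per-frame centroid detections into trajectories.

    `frames` is a list of frames; each frame is a list of (x, y) centroids.
    A detection is appended to the nearest open trajectory whose last point
    lies within `max_jump` pixels; leftover detections start new tracks.
    """
    tracks = []  # each track is a list of (x, y) points
    for detections in frames:
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            last = track[-1]
            # nearest unmatched detection to this track's last position
            best = min(unmatched, key=lambda p: math.dist(p, last))
            if math.dist(best, last) <= max_jump:
                track.append(best)
                unmatched.remove(best)
        for p in unmatched:
            tracks.append([p])
    return tracks

# Two organisms drifting rightwards across three frames are kept apart.
frames = [[(0, 0), (100, 100)], [(5, 1), (103, 99)], [(11, 2), (106, 101)]]
tracks = link_trajectories(frames)
```

Frame rate and resolution changes alter the inter-frame displacements this linker sees, which is one mechanism by which the video file parameters discussed above can bias the recovered trajectories.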

    Statistics in the Big Data era

    It is estimated that about 90% of the currently available data has been produced over the last two years. Of this, only 0.5% is effectively analysed and used. However, these data can be a great wealth, the oil of the 21st century, when analysed with the right approach. In this article, we illustrate some specificities of these data and the great interest that they can represent in many fields. Then we consider some challenges to statistical analysis that emerge from their analysis, suggesting some strategies.

    Is One Hyperparameter Optimizer Enough?

    Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical Software Engineering, there has not been much discussion on which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to the defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be 'best' and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
    Comment: 7 pages, 2 columns, accepted for SWAN1
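Two of the optimizers the paper compares, grid search and random search, can be sketched in a few lines. The objective below is a hypothetical stand-in for a cross-validated score on a defect-prediction dataset; the parameter names and ranges are assumptions for illustration, not the paper's actual setup.

```python
import itertools
import random

def loss(c, gamma):
    """Toy surrogate for a classifier's tuning objective (hypothetical;
    stands in for, e.g., 1 - F-measure under cross-validation)."""
    return (c - 3.0) ** 2 + (gamma - 0.1) ** 2

def grid_search():
    # Exhaustively evaluate every combination on a fixed grid.
    grid_c = [0.1, 1.0, 3.0, 10.0]
    grid_gamma = [0.01, 0.1, 1.0]
    return min(itertools.product(grid_c, grid_gamma), key=lambda p: loss(*p))

def random_search(n=50, seed=0):
    # Sample the same box uniformly at random instead of on a grid.
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.1, 10.0), rng.uniform(0.01, 1.0))
                  for _ in range(n)]
    return min(candidates, key=lambda p: loss(*p))

best_grid = grid_search()      # here the grid happens to contain the optimum
best_random = random_search()
```

Which strategy wins depends on where the optimum sits relative to the grid and on the evaluation budget, which is consistent with the paper's observation that no single optimizer dominates.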

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall-and-skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
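The tall-and-skinny structure the abstract mentions is what makes these factorizations map onto a data-parallel model: with few columns, the small Gram matrix can be accumulated as a sum over rows (a map/reduce pattern), and the factorization is then computed locally. A minimal single-machine sketch of that kernel for PCA, using power iteration to extract the leading component, is shown below; it is an illustration of the idea, not the paper's Spark or C+MPI implementation, and omits mean-centering for brevity.

```python
def top_principal_component(rows, iters=100):
    """Leading eigenvector of the Gram matrix A^T A via power iteration.

    `rows` is a tall matrix given as a list of short rows, so the d x d
    Gram matrix fits in memory even when the number of rows is huge.
    """
    d = len(rows[0])
    # Accumulate G = A^T A as a sum of per-row outer products
    # (this sum is the step a data-parallel framework would distribute).
    g = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Rows lie close to the direction (1, 1); the component recovers it.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.05], [4.0, 3.9]]
v = top_principal_component(rows)
```

Because only the d x d Gram matrix and the d-vector iterate are ever materialized, communication cost depends on the (small) column count rather than the (huge) row count, which is why tall-and-skinny problems suit both Spark and MPI implementations.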