    Challenges of Big Data Analysis

    Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and of how these features drive paradigm changes in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Such unvalidated assumptions can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
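    Spurious correlation, one of the challenges listed above, can be shown with a small simulation. This is a minimal sketch, not the authors' code. It assumes a response and p candidate features that are all independent standard normal draws, so every true correlation is zero. It then reports the largest absolute sample correlation as p grows with the sample size n held fixed. The values of n, the dimensions, and the seed are arbitrary illustrative choices.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def max_spurious_correlation(y, features):
    """Largest |sample correlation| between y and any feature in the list."""
    return max(abs(pearson(y, f)) for f in features)


def simulate(n=60, dims=(10, 100, 1000), seed=0):
    """Draw y and a pool of features independently from N(0, 1).

    Every true correlation is zero. Each dimension p uses the first p
    features of one shared pool, so the reported maximum cannot decrease
    as p grows.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pool = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(max(dims))]
    return {p: max_spurious_correlation(y, pool[:p]) for p in dims}


if __name__ == "__main__":
    for p, r in simulate().items():
        print(f"p = {p:5d}: max |sample corr| = {r:.3f}")
```

    None of the features is related to the response. Even so, the maximum sample correlation increases with p, which is why screening many variables against a fixed number of samples can turn up apparently strong associations that are pure noise.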

    Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling

    The lattice thermal conductivity ($\kappa_\omega$) is a key property for many potential applications of compounds. Discovering materials with very low or very high $\kappa_\omega$ remains an experimental challenge because synthesis procedures are costly and time-consuming. High-throughput computational pre-screening is a valuable approach for significantly reducing the set of candidate compounds. In this article, we introduce efficient methods for reliably estimating the bulk $\kappa_\omega$ of a large number of compounds. The methods combine machine-learning algorithms, physical insights, and automatic ab initio calculations. We scanned approximately 79,000 half-Heusler entries in the AFLOWLIB.org database. Among the 450 mechanically stable ordered semiconductors identified, we find that $\kappa_\omega$ spans more than two orders of magnitude, a much larger range than previously thought. $\kappa_\omega$ is lowest for compounds whose elements in equivalent positions have large atomic radii. We then perform a thorough screening of thermodynamic stability that reduces the list to 77 systems, for which we provide quantitative estimates of $\kappa_\omega$. Three semiconductors with $\kappa_\omega < 5$ W/(m·K) are proposed for further experimental study. Comment: 9 pages, 4 figures
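    The screening funnel described above can be sketched as a sequence of filters. This is a hypothetical illustration, not the authors' pipeline. The `Candidate` record, its field names, and the toy entries `X1` to `X4` are invented placeholders rather than data from the paper. The 5 W/(m·K) cutoff is the only value taken from the abstract. A helper also shows how a spread of "more than two orders of magnitude" is measured, as log10(max/min).

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one compound in a screening run."""
    name: str
    mechanically_stable: bool
    thermodynamically_stable: bool
    kappa: float  # estimated lattice thermal conductivity, W/(m K)


def screen(candidates: Iterable[Candidate], kappa_max: float = 5.0) -> List[Candidate]:
    """Keep mechanically and thermodynamically stable candidates whose
    estimated kappa is below kappa_max, sorted from lowest kappa upward."""
    stable = [c for c in candidates
              if c.mechanically_stable and c.thermodynamically_stable]
    return sorted((c for c in stable if c.kappa < kappa_max),
                  key=lambda c: c.kappa)


def orders_of_magnitude(values: Iterable[float]) -> float:
    """Spread of positive values in decades: log10(max / min)."""
    vals = list(values)
    return math.log10(max(vals) / min(vals))


if __name__ == "__main__":
    # Toy placeholder values for illustration only; not results from the paper.
    toy = [
        Candidate("X1", True, True, 3.2),
        Candidate("X2", True, False, 1.0),
        Candidate("X3", False, True, 2.0),
        Candidate("X4", True, True, 40.0),
    ]
    print([c.name for c in screen(toy)])  # → ['X1']
```

    Applying the cheap stability filters before the conductivity threshold mirrors the abstract's order of operations: candidates are narrowed first, and only the survivors need a quantitative $\kappa_\omega$ estimate.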

    Machine learning for automatic prediction of the quality of electrophysiological recordings

    The quality of electrophysiological recordings varies considerably due to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions with a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method, and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data classified by the same or by another expert. We observe that the learning machine can be as consistent in its judgements as individual experts are with one another, if not more so. Best performance is achieved with a limited number of informative features, and the optimal feature set differs from one data set to another. With 80–90% correct judgements, the performance of the system is very promising within each expert's data sets, but its judgements are less reliable when it is applied across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided its parameters are adjusted to different types of experiments and to individual experimenters.
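    A wrapper method chooses features by repeatedly training and scoring the classifier itself on candidate subsets. The sketch below shows one common variant, greedy forward selection. It is an assumption-laden illustration, not the authors' procedure. The abstract names neither the classifier nor the search strategy, so a nearest-centroid classifier scored by leave-one-out accuracy is swapped in here for simplicity. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def loo_nearest_centroid_accuracy(X: Sequence[Sequence[float]],
                                  y: Sequence[int],
                                  features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier that uses
    only the given feature indices (hypothetical stand-in classifier)."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Class centroids computed without the held-out sample i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            vec = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                vec[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        best_label, best_dist = None, math.inf
        for label, vec in sums.items():
            centroid = [v / counts[label] for v in vec]
            dist = sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / n


def forward_selection(X, y, max_features: int) -> Tuple[List[int], float]:
    """Greedy wrapper: add the feature that most improves the classifier's
    score, and stop when no remaining feature improves it."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best_score = 0.0
    while remaining and len(selected) < max_features:
        score, f = max((loo_nearest_centroid_accuracy(X, y, selected + [f]), f)
                       for f in remaining)
        if score <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score
```

    With toy data in which column 0 separates the two classes and column 1 is noise, the search keeps only column 0. This matches the abstract's observation that a small set of informative features gives the best performance.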