Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and of how these
features drive paradigm shifts in statistical and computational methods as
well as in computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in high-confidence sets and point out that
the exogeneity assumptions made by most statistical methods for Big Data cannot be
validated due to incidental endogeneity; they can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
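To make the spurious-correlation challenge concrete, here is a minimal simulation (an illustration, not code from the paper): with a fixed sample size, the largest sample correlation between a response and a growing set of completely independent predictors becomes large purely by chance.

```python
# Illustrative sketch: spurious correlation in high dimensions.
# With n samples and d >> n independent predictors, the maximum absolute
# sample correlation with an independent response grows with d.
import numpy as np

rng = np.random.default_rng(0)
n = 60  # sample size

for d in (10, 1_000, 100_000):
    X = rng.standard_normal((n, d))   # predictors, independent of y by construction
    y = rng.standard_normal(n)        # response
    # Pearson correlation of y with every column of X.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    corr = Xc.T @ yc / n
    print(f"d={d:>7}: max |corr| = {np.abs(corr).max():.2f}")
# The maximum correlation increases with d even though every predictor is pure noise.
```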
Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling
The lattice thermal conductivity ($\kappa_\ell$) is a key property for
many potential applications of compounds. Discovery of materials with very low
or high $\kappa_\ell$ remains an experimental challenge due to high costs
and time-consuming synthesis procedures. High-throughput computational
pre-screening is a valuable approach for significantly reducing the set of
candidate compounds. In this article, we introduce efficient methods for
reliably estimating the bulk $\kappa_\ell$ for a large number of compounds.
The algorithms are based on a combination of machine-learning algorithms,
physical insights, and automatic ab-initio calculations. We scanned
approximately 79,000 half-Heusler entries in the AFLOWLIB.org database. Among
the 450 mechanically stable ordered semiconductors identified, we find that
$\kappa_\ell$ spans more than two orders of magnitude, a much larger range
than previously thought. $\kappa_\ell$ is lowest for compounds whose
elements in equivalent positions have large atomic radii. We then perform a
thorough screening of thermodynamic stability that allows us to reduce the list
to 77 systems, and we provide a quantitative estimate of $\kappa_\ell$
for this selected set. Three semiconductors with
$\kappa_\ell$ < 5 W/(m K) are proposed for further experimental study.
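The pre-screening idea can be sketched as follows (hypothetical data and descriptors, not the authors' actual model): a regressor trained on cheap compositional features predicts log $\kappa_\ell$, so that expensive ab-initio runs are spent on the most promising candidates first.

```python
# Illustrative sketch of ML-based pre-screening with placeholder inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Placeholder descriptors for compounds already computed ab initio,
# e.g. mean atomic radius, mean atomic mass, radius mismatch between sites.
X_known = rng.random((300, 3))
# Placeholder targets: log10 of the computed lattice thermal conductivity.
y_known = 1.5 - 2.0 * X_known[:, 0] + 0.3 * rng.standard_normal(300)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("CV R^2:", cross_val_score(model, X_known, y_known, cv=5).mean())

model.fit(X_known, y_known)
X_candidates = rng.random((5000, 3))               # descriptors of unscreened entries
ranking = np.argsort(model.predict(X_candidates))  # lowest predicted kappa first
print("top candidates for ab-initio follow-up:", ranking[:10])
```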
Statistical Workflow for Feature Selection in Human Metabolomics Data.
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for greater standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.
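One common instance of such a feature-selection workflow can be sketched as follows (an illustration of the kind of pipeline discussed, on simulated data, not the authors' prescribed code): univariate screening with false-discovery-rate control, followed by a sparse multivariable model.

```python
# Minimal sketch of a cohort-metabolomics feature-selection pipeline.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 500, 200                       # subjects x metabolite features (simulated)
X = rng.standard_normal((n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(n)  # outcome tied to 2 features

# Step 1: univariate association tests with Benjamini-Hochberg FDR correction.
pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(p)])
keep = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]
print("features passing FDR:", np.flatnonzero(keep))

# Step 2: sparse multivariable model on standardized features.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)
print("features selected by LASSO:", np.flatnonzero(lasso.coef_))
```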
Machine learning for automatic prediction of the quality of electrophysiological recordings
The quality of electrophysiological recordings varies considerably due to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions with a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method, and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data, classified by the same or by another expert. We observe that the learning machine can be at least as consistent in its judgements as individual experts are with one another. Best performance is achieved for a limited number of informative features, with the optimal feature set differing from one data set to another. With 80–90% correct judgements, the performance of the system is very promising within the data sets of each expert, but judgements are less reliable when it is used across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided parameters are adjusted to different types of experiments and to individual experimenters.
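The overall scheme can be sketched as follows (synthetic data; the 16 real features such as spike height and width are replaced by random stand-ins): wrapper-style feature selection plus a classifier trained on one expert's labels and tested on recordings labelled by another expert.

```python
# Minimal sketch of the recording-quality classification approach.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
n_feat = 16                                  # stand-ins for spike height, width, etc.
X_a = rng.standard_normal((400, n_feat))     # recordings labelled by expert A
y_a = (X_a[:, 0] + 0.5 * X_a[:, 3] > 0).astype(int)  # synthetic "good" labels
X_b = rng.standard_normal((200, n_feat))     # recordings labelled by expert B
y_b = (X_b[:, 0] + 0.5 * X_b[:, 3] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Wrapper method: greedily keep the most informative subset of features.
selector = SequentialFeatureSelector(clf, n_features_to_select=5, cv=5)
pipeline = make_pipeline(selector, clf).fit(X_a, y_a)

print("within-expert accuracy :", pipeline.score(X_a, y_a))  # optimistic (train set)
print("across-expert accuracy :", pipeline.score(X_b, y_b))  # the harder test
```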