Predicting performance of non-contiguous I/O with machine learning
Data sieving in ROMIO promises to optimize individual non-contiguous I/O. However, deciding whether to use it and choosing its buffer size are non-trivial tasks, because the resulting performance is difficult to predict. Since data sieving does not take many performance factors into account, it is often impossible to extract the optimal performance for a given access pattern and system. Additionally, Lustre settings such as the stripe size and the number of servers are tunable, but again, identifying rules for a data centre is challenging.
In this paper, we (1) discuss limitations of data sieving, (2) apply machine learning techniques to build a performance predictor, and (3) learn and extract best practices for the settings from the data. We use decision trees because they capture non-linear behavior, are easy to understand, and allow the learned rules to be extracted. Although this initial study relies on decision trees trained on sparse data, the algorithm predicts many cases sufficiently well. Compared to a standard setting, the resulting decision trees improve performance significantly, and extracting rules from the learned tree yields expert knowledge. Applying the scheme to a set of experimental data improved the average throughput by 25–50 % of the best parametrization’s gain. Additionally, we demonstrate the versatility of this approach by applying it to the porting system of DKRZ’s next-generation supercomputer and discuss the achievable performance gains.
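The decision-tree idea above can be sketched in a few lines. The following is a minimal regression tree (CART-style, squared-error splits) written from scratch in Python; it is not the authors' implementation. The two features, `access_kib` (a hypothetical access size) and `buffer_kib` (a hypothetical sieving buffer size), and the throughput values are made-up toy numbers, not measurements from the paper. The sketch fits a tree, picks the buffer size with the highest predicted throughput, and prints the learned splits as if-then rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    feature: Optional[int] = None  # index of the split feature (None for a leaf)
    threshold: float = 0.0         # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0             # mean target of the training rows in this node


def mean(ys: Sequence[float]) -> float:
    return sum(ys) / len(ys)


def sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the CART regression impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth=0, max_depth=3, min_samples=1) -> Node:
    node = Node(value=mean(y))
    if depth >= max_depth or len(y) < 2 * min_samples:
        return node
    best = None  # (cost, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None or best[0] >= sse(y):
        return node  # no split reduces the error, so this node stays a leaf
    _, f, t = best
    node.feature, node.threshold = f, t
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    node.left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_samples)
    node.right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_samples)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def rules(node: Node, names: Sequence[str], prefix=()) -> List[str]:
    """Flatten the tree into human-readable if-then rules, one per leaf."""
    if node.feature is None:
        cond = " and ".join(prefix) or "always"
        return [f"if {cond}: predict {node.value:.1f}"]
    name = names[node.feature]
    return (rules(node.left, names, prefix + (f"{name} <= {node.threshold:g}",))
            + rules(node.right, names, prefix + (f"{name} > {node.threshold:g}",)))


def best_buffer(tree: Node, access_kib: float, candidates: Sequence[float]) -> float:
    """Pick the candidate buffer size with the highest predicted throughput."""
    return max(candidates, key=lambda b: predict(tree, [access_kib, b]))


# Hypothetical toy data: [access_kib, buffer_kib] -> throughput (MiB/s).
TOY_X = [[4, 64], [4, 512], [8, 64], [8, 512],
         [1024, 64], [1024, 512], [2048, 64], [2048, 512]]
TOY_Y = [40, 90, 45, 95, 300, 310, 320, 330]

if __name__ == "__main__":
    tree = build_tree(TOY_X, TOY_Y, max_depth=2)
    for rule in rules(tree, ["access_kib", "buffer_kib"]):
        print(rule)
    print("best buffer for 4 KiB accesses:", best_buffer(tree, 4, [64, 512]))
```

The extracted rules are what make trees attractive here: each leaf is a condition on the tunables that an administrator can read and check. A realistic setup would use a library implementation, more features (stripe size, server count, access pattern) and held-out validation.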
Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and RNA Binding Sites in Rev Proteins of HIV-1 and EIAV Agree with Experimental Data
Protein-protein and protein nucleic acid interactions are vitally important
for a wide range of biological processes, including regulation of gene
expression, protein synthesis, and replication and assembly of many viruses. We
have developed machine learning approaches for predicting which amino acids of
a protein participate in its interactions with other proteins and/or nucleic
acids, using only the protein sequence as input. In this paper, we describe an
application of classifiers trained on datasets of well-characterized
protein-protein and protein-RNA complexes for which experimental structures are
available. We apply these classifiers to the problem of predicting protein and
RNA binding sites in the sequence of a clinically important protein for which
the structure is not known: the regulatory protein Rev, essential for the
replication of HIV-1 and other lentiviruses. We compare our predictions with
published biochemical, genetic and partial structural information for HIV-1 and
EIAV Rev and with our own published experimental mapping of RNA binding sites
in EIAV Rev. The predicted and experimentally determined binding sites are in
very good agreement. The ability to reliably predict the residues of a protein
that directly contribute to specific binding events, without requiring
structural information about either the protein or the complexes in which it
participates, could potentially lead to new disease intervention strategies.
Comment: Pacific Symposium on Biocomputing, Hawaii, In press, Accepted, 200
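A common way for sequence-only predictors to describe each residue is by a window of its neighbouring residues. The Python sketch below shows one such encoding: a one-hot vector per residue over the 20 standard amino acids plus a padding slot, concatenated over a window centred on the target residue. The window half-width of 4 and the names `one_hot` and `window_features` are assumptions for illustration; the abstract does not state which encoding or classifier the authors used.

```python
from typing import List

# The 20 standard amino acids; anything else (including padding) maps to slot 20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOTS = len(AMINO_ACIDS) + 1  # 20 residues + 1 padding/unknown slot


def one_hot(residue: str) -> List[int]:
    vec = [0] * SLOTS
    vec[INDEX.get(residue, SLOTS - 1)] = 1
    return vec


def window_features(sequence: str, half_width: int = 4) -> List[List[int]]:
    """Return one feature vector per residue, built from a sliding window.

    Each vector concatenates the one-hot codes of the 2*half_width + 1
    residues centred on the target; positions past either end are padded.
    """
    padded = "-" * half_width + sequence + "-" * half_width
    features = []
    for center in range(len(sequence)):
        window = padded[center:center + 2 * half_width + 1]
        vec: List[int] = []
        for residue in window:
            vec.extend(one_hot(residue))
        features.append(vec)
    return features


if __name__ == "__main__":
    rows = window_features("MAGRS", half_width=2)
    print(len(rows), "residues,", len(rows[0]), "features each")
```

Each row could then be paired with a binding/non-binding label for that residue and passed to any per-residue classifier, for example naive Bayes or a support vector machine.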
A comparison of statistical and machine learning methods for creating national daily maps of ambient PM concentration
A typical problem in air pollution epidemiology is exposure assessment for
individuals for which health data are available. Due to the sparsity of
monitoring sites and the limited temporal frequency with which measurements of
air pollutant concentrations are collected (for most pollutants, once every 3
or 6 days), epidemiologists have been moving away from characterizing ambient
air pollution exposure solely using measurements. In the last few years,
substantial research effort has gone into developing statistical methods
or machine learning techniques to generate estimates of air pollution at finer
spatial and temporal scales (daily, usually) with complete coverage. Some of
these methods include: geostatistical techniques, such as kriging; spatial
statistical models that use the information contained in air quality model
outputs (statistical downscaling models); linear regression modeling approaches
that leverage the information in GIS covariates (land use regression); or
machine learning methods that mine the information contained in relevant
variables (neural network and deep learning approaches). Although some of these
exposure modeling approaches have been used in several air pollution
epidemiological studies, it is not clear how much the predicted exposures
generated by these methods differ, and which method generates more reliable
estimates. In this paper, we aim to address this gap by evaluating a variety of
exposure modeling approaches, comparing their predictive performance and
computational difficulty. Using PM in 2011 over the continental U.S. as a case
study, we examine the methods' performance across seasons, rural versus urban
settings, and levels of PM concentration (low, medium, high).
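To make the comparison of predictive performance concrete, here is a minimal leave-one-out evaluation in Python. It uses inverse-distance weighting (IDW), a simpler interpolator swapped in for kriging, and a global-mean baseline; neither is claimed to be one of the paper's models. The monitor coordinates and concentrations are made-up toy values, and `rmse_by_group` mirrors the idea of stratifying errors (for example by concentration level or rural/urban setting) using hypothetical group labels.

```python
import math
from typing import Dict, List, Sequence, Tuple

Station = Tuple[float, float, float]  # (x, y, observed concentration)


def idw(stations: Sequence[Station], target: Tuple[float, float], power: float = 2.0) -> float:
    """Inverse-distance-weighted interpolation at `target`."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a monitor
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def global_mean(stations: Sequence[Station], target: Tuple[float, float]) -> float:
    """Baseline that ignores location entirely."""
    return sum(v for _, _, v in stations) / len(stations)


def loo_errors(stations: Sequence[Station], predictor) -> List[float]:
    """Leave-one-out: predict each monitor from all the others."""
    errors = []
    for i, (x, y, value) in enumerate(stations):
        others = list(stations[:i]) + list(stations[i + 1:])
        errors.append(predictor(others, (x, y)) - value)
    return errors


def rmse(errors: Sequence[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_by_group(errors: Sequence[float], groups: Sequence[str]) -> Dict[str, float]:
    """RMSE computed separately within each stratum label."""
    buckets: Dict[str, List[float]] = {}
    for e, g in zip(errors, groups):
        buckets.setdefault(g, []).append(e)
    return {g: rmse(es) for g, es in buckets.items()}


if __name__ == "__main__":
    # Hypothetical monitors along a transect with a linear trend.
    toy = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (2.0, 0.0, 14.0),
           (3.0, 0.0, 16.0), (4.0, 0.0, 18.0)]
    print("IDW  LOO RMSE:", rmse(loo_errors(toy, idw)))
    print("mean LOO RMSE:", rmse(loo_errors(toy, global_mean)))
```

On this toy transect IDW has a lower leave-one-out RMSE than the global mean because it exploits spatial proximity; a real comparison would hold out monitors in the same way but use the full set of candidate models and covariates.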
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the idea of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method on three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP generalizes well to unseen tasks with increasing lengths,
variable topologies, and changing objectives.
Comment: ICRA 201
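The recursive decomposition can be illustrated with a symbolic stand-in. In NTP the choice of sub-program and its arguments is produced by a learned neural network; in the Python sketch below it is a fixed, hypothetical lookup table (`DECOMPOSE`), and the primitives `move_to`, `grip` and `release` only append to a trace instead of driving a robot. What the sketch does show is the control flow: a task specification is expanded recursively until only callable bottom-level subroutines remain.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (program name, argument)

# Hypothetical decomposition rules standing in for NTP's learned program selection.
# "a>b" means "put object a on b"; ";" separates consecutive sub-tasks.
DECOMPOSE: Dict[str, Callable[[str], List[Step]]] = {
    "stack": lambda arg: [("pick_and_place", pair) for pair in arg.split(";")],
    "pick_and_place": lambda arg: [("pick", arg.split(">")[0]), ("place", arg.split(">")[1])],
    "pick": lambda obj: [("move_to", obj), ("grip", obj)],
    "place": lambda target: [("move_to", target), ("release", target)],
}
# Bottom-level programs: the only ones that "touch" the environment.
PRIMITIVES = {"move_to", "grip", "release"}


def run(program: str, arg: str, trace: List[str], depth: int = 0, max_depth: int = 10) -> None:
    """Recursively expand `program` until only primitive calls remain."""
    if depth > max_depth:
        raise RecursionError("decomposition too deep")
    if program in PRIMITIVES:
        trace.append(f"{program}({arg})")  # stand-in for a call to the robot API
        return
    for sub_program, sub_arg in DECOMPOSE[program](arg):
        run(sub_program, sub_arg, trace, depth + 1, max_depth)


if __name__ == "__main__":
    trace: List[str] = []
    run("stack", "a>b;c>a", trace)
    print(trace)
```

Because the same rules apply at every level, a longer specification such as `"a>b;c>a;d>c"` expands without any new code, which loosely parallels the generalization to longer tasks described above.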