488 research outputs found
Robust Principal Component Analysis-based Prediction of Protein-Protein Interaction Hot Spots (RBHS)
Proteins often exert their function by binding to other cellular partners. Hot spots are the key residues for protein-protein binding. Their identification may shed light on the impact of disease-associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods for predicting hot spots suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by recovering a low-rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method.
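The abstract does not say which Robust PCA solver the pipeline uses. As an illustration only, the sketch below uses one common formulation, Principal Component Pursuit solved with a fixed-penalty augmented Lagrangian iteration, to split a data matrix M into a low-rank part L and a sparse "gross error" part S. The function names (`shrink`, `svd_threshold`, `robust_pca`) and the default values for `lam` and `mu` are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft-thresholding: pull every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S with M ~= L + S.

    Hypothetical helper: a generic Principal Component Pursuit solver,
    not the pipeline described in the abstract above.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # commonly used sparsity weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # heuristic penalty parameter (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)  # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)         # sparse update
        residual = M - L - S
        Y = Y + mu * residual                        # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

In a pre-processing setting of the kind the abstract describes, the recovered `L` would presumably replace the raw feature matrix before a classifier is trained, while `S` collects the entries treated as gross errors. That usage is an interpretation of the abstract, not a documented step.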
Robust principal component analysis-based prediction of protein-protein interaction hot spots.
Abstract: Proteins often exert their function by binding to other cellular partners. Hot spots are the key residues for protein-protein binding. Their identification may shed light on the impact of disease-associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods for predicting hot spots suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by recovering a low-rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method.
A transferable machine-learning framework linking interstice distribution and plastic heterogeneity in metallic glasses
When metallic glasses (MGs) are subjected to mechanical loads, the plastic
response of atoms is non-uniform. However, the extent and manner in which
atomic environment signatures present in the undeformed structure determine
this plastic heterogeneity remain elusive. Here, we demonstrate that novel site
environment features that characterize interstice distributions around atoms
combined with machine learning (ML) can reliably identify plastic sites in
several Cu-Zr compositions. Using only quenched structural information as
input, the ML-based plastic probability estimates ("quench-in softness" metric)
can identify plastic sites that could activate at high strains, losing
predictive power only upon the formation of shear bands. Moreover, we reveal
that a quench-in softness model trained on a single composition and quenching
rate substantially improves upon previous models in generalizing to different
compositions and completely different MG systems (Ni62Nb38, Al90Sm10 and
Fe80P20). Our work presents a general, data-centric framework that could
potentially be used to address the structural origin of any site-specific
property in MGs.
A CFD-informed model for subchannel resolution crud prediction
A physics-directed, statistically based surrogate model of the small scale flow features
that impact Chalk River unidentified deposit (crud) growth is presented in this work. The objective of the surrogate is to provide additional details of the rod surface
temperature, heat flux, and near-wall turbulent kinetic energy fields which cannot be
explicitly captured by a subchannel code.
Operating as a mapping from the high fidelity computational fluid dynamics (CFD) data to the low fidelity subchannel grid (hi2lo), the model provides CFD-informed
boundary conditions to the crud model executed on the subchannel pin surface mesh. The
surface temperature, heat flux, and turbulent kinetic energy, henceforth referred to as
the fields of interest (FOI), govern the growth rate of crud on the surface of the rod and
the precipitation of boron in the porous crud layer. Therefore the model predicts the
behavior of the FOIs as a function of position in the core and local thermal-hydraulic
(TH) conditions.
The subchannel code produces an estimate for all crud-relevant TH quantities at a
coarse spatial resolution everywhere in the core and executes substantially faster than
CFD. In the hi2lo approach, the solution provided by the subchannel code is augmented
by a predicted stochastic component of the FOI informed by CFD results to provide a
more detailed description of the target FOIs than subchannel can provide alone. To this
end, a novel method based on the marriage of copula and gradient boosting techniques is proposed. This methodology forgoes a spatial interpolation procedure for a statistically
driven approach, which predicts the fractional area of a rod's surface in excess of some
critical temperature but not precisely where such maxima occur on the rod surface. The
resultant model retains the ability to account for the presence of hot and cold spots on the
rod surface induced by turbulent flow downstream of spacer grids when producing crud
estimates. Sklar's theorem is leveraged to decompose multivariate probability densities
of the FOI into independent copula and marginal models. The free parameters within the
copula model are predicted using a combination of supervised regression and classification
machine learning techniques with training data sets supplied by a suite of precomputed
CFD results spanning a typical pressurized water reactor TH envelope.
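For reference, the decomposition that Sklar's theorem provides can be written as follows. The symbols are generic and chosen for this note: x_1, ..., x_d stand for the fields of interest (d = 3 if surface temperature, heat flux, and turbulent kinetic energy are modeled jointly, which is an assumption here), F_i and f_i are the marginal distribution and density functions, and C and c are the copula and its density. The density form additionally assumes continuous marginals and a differentiable copula.

```latex
\begin{align}
  % Sklar's theorem: the joint CDF is a copula applied to the marginal CDFs
  F(x_1, \dots, x_d) &= C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr), \\
  % Density form: copula density times the product of the marginal densities
  f(x_1, \dots, x_d) &= c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{i=1}^{d} f_i(x_i).
\end{align}
```

Written this way, the dependence structure (the copula) and each marginal can be fitted separately, which matches the abstract's description of independent copula and marginal models.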
Results show that, compared to the subchannel standalone case, the hi2lo method
more accurately preserves the influence of spacer grids on the crud growth rate. More
precisely, the hi2lo method recovers key statistical properties of the FOI which impact
crud growth. Compared to gold standard high fidelity CFD/crud coupled results in a
single assembly test case, the hi2lo model produced a relative total crud mass difference
of -8.9%, whereas the standalone subchannel code produced a relative crud mass difference of 192.1%.
Mechanical Engineerin
Rapidly predicting Kohn-Sham total energy using data-centric AI
Predicting material properties by solving the Kohn-Sham (KS) equation, which is the basis of modern computational approaches to electronic structures, has provided significant improvements in materials sciences. Despite these contributions, both density functional theory (DFT) and density functional tight binding (DFTB) calculations are limited by the number of electrons and atoms, which translates into increasingly long run times. In this work we introduce a novel, data-centric machine learning framework that is used to rapidly and accurately predict the KS total energy of anatase TiO2 nanoparticles (NPs) at different temperatures using only a small amount of theoretical data. The proposed framework, which we call co-modeling, eliminates the need for experimental data and is general enough to be applied to any NPs to determine electronic structure and, consequently, to study physical and chemical properties more efficiently. We include a web service to demonstrate the effectiveness of our approach. © 2022, The Author(s)
The spatial dynamics of invasive para grass on a monsoonal floodplain, Kakadu National Park, northern Australia
Abstract: African para grass (Urochloa mutica) is an invasive weed that has become prevalent across many important freshwater wetlands of the world. In northern Australia, including the World Heritage landscape of Kakadu National Park (KNP), its dense cover can displace ecologically, genetically and culturally significant species, such as the Australian native rice (Oryza spp.). In regions under management for biodiversity conservation, para grass is often beyond eradication. However, its targeted control is also necessary to manage and preserve site-specific wetland values. This requires an understanding of para grass spread patterns and its potential impacts on valuable native vegetation. We apply a multi-scale approach to examine the spatial dynamics and impact of para grass cover across a 181 km² floodplain of KNP. First, we measure the overall displacement of different native vegetation communities across the floodplain from 1986 to 2006. Using high spatial resolution satellite imagery in conjunction with historical aerial-photo mapping, we then measure finer-scale, inter-annual changes between successive dry seasons from 1990 to 2010 (for a 48 km² focus area). Para grass presence-absence maps from satellite imagery (2002 to 2010) were produced with an object-based machine-learning approach (stochastic gradient boosting). Changes over time in mapped para grass areas were then related to maps of depth-habitat and inter-annual fire histories. Para grass invasion and establishment patterns varied greatly in time and space. Wild rice communities were the most frequently invaded, but the establishment and persistence of para grass fluctuated greatly between years, even within previously invaded communities. However, these different patterns were also shown to vary with different depth-habitat and recent fire history.
These dynamics have not been previously documented, and this understanding presents opportunities for intensive para grass management in areas of high conservation value, such as those occupied by wild rice.