14 research outputs found
A multi-species functional embedding integrating sequence and network structure
A key challenge to transferring knowledge between species is that different species have fundamentally different genetic architectures. Initial computational approaches to transfer knowledge across species have relied on measures of heredity such as genetic homology, but these approaches suffer from limitations. First, only a small subset of genes have homologs, limiting the amount of knowledge that can be transferred, and second, genes change or repurpose functions, complicating the transfer of knowledge. Many approaches address this problem by expanding the notion of homology by leveraging high-throughput genomic and proteomic measurements, such as through network alignment. In this work, we take a new approach to transferring knowledge across species by expanding the notion of homology through explicit measures of functional similarity between proteins in different species. Specifically, our kernel-based method, HANDL (Homology Assessment across Networks using Diffusion and Landmarks), integrates sequence and network structure to create a functional embedding in which proteins from different species are embedded in the same vector space. We show that inner products in this space and the vectors themselves capture functional similarity across species, and are useful for a variety of functional tasks. We present the first whole-genome method for predicting phenologs, generating many that were previously identified, but also predicting new phenologs supported by the biological literature. We also demonstrate that the HANDL embedding captures pairwise gene function, in that gene pairs with synthetic lethal interactions are significantly separated in HANDL space, and the direction of separation is conserved across species. Software for the HANDL algorithm is available at http://bit.ly/lrgr-handl.
Carboplatin-Induced Gene Expression Changes In Vitro are Prognostic of Survival in Epithelial Ovarian Cancer
Background: We performed a time-course microarray experiment to define the transcriptional response to carboplatin in vitro, and to correlate this with clinical outcome in epithelial ovarian cancer (EOC). RNA was isolated from carboplatin and control-treated 36M2 ovarian cancer cells at several time points, followed by oligonucleotide microarray hybridization. Carboplatin-induced changes in gene expression were assessed at the single-gene as well as at the pathway level. Clinical validation was performed in publicly available microarray datasets using disease-free and overall survival endpoints. Results: Time-course and pathway analyses identified 317 genes and 40 pathways (designated time-course and pathway signatures) deregulated following carboplatin exposure. Both types of signatures were validated in two separate platinum-treated ovarian and NSCLC cell lines using published microarray data. Expression of time-course and pathway signature genes distinguished between patients with unfavorable and favorable survival in two independent ovarian cancer datasets. Among the pathways most highly induced by carboplatin in vitro, the NRF2, NF-kB, and cytokine and inflammatory response pathways were also found to be upregulated prior to chemotherapy exposure in poor prognosis tumors. Conclusion: Dynamic assessment of gene expression following carboplatin exposure in vitro can identify both genes and pathways that are correlated with clinical outcome. The functional relevance of this observation for better understanding the mechanisms of drug resistance in EOC will require further evaluation
Functional protein representations from biological networks enable diverse cross-species inference
Partial funding for Open Access provided by the UMD Libraries' Open Access Publishing Fund. Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches for this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First, we show that proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ("orthologous phenotype") to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research
Integrated Analysis of Multiple Microarray Datasets Identifies a Reproducible Survival Predictor in Ovarian Cancer
Background
Public data integration may help overcome challenges in clinical implementation of microarray profiles. We integrated several ovarian cancer datasets to identify a reproducible predictor of survival.
Methodology/Principal Findings
Four microarray datasets from different institutions comprising 265 advanced stage tumors were uniformly reprocessed into a single training dataset, also adjusting for inter-laboratory variation ("batch-effect"). Supervised principal component survival analysis was employed to identify prognostic models. Models were independently validated in a 61-patient cohort using a custom array genechip and a publicly available 229-array dataset. Molecular correspondence of high- and low-risk outcome groups between training and validation datasets was demonstrated using Subclass Mapping. Previously established molecular phenotypes in the 2nd validation set were correlated with high and low-risk outcome groups. Functional representational and pathway analysis was used to explore gene networks associated with high and low risk phenotypes. A 19-gene model showed optimal performance in the training set (median OS 31 and 78 months, p<0.01), 1st validation set (median OS 32 months versus not-yet-reached, p = 0.026) and 2nd validation set (median OS 43 versus 61 months, p = 0.013) maintaining independent prognostic power in multivariate analysis. There was strong molecular correspondence of the respective high- and low-risk tumors between training and 1st validation set. Low and high-risk tumors were enriched for favorable and unfavorable molecular subtypes and pathways, previously defined in the public 2nd validation set.
Conclusions/Significance
Integration of previously generated cancer microarray datasets may lead to robust and widely applicable survival predictors. These predictors are not simply a compilation of prognostic genes but appear to track true molecular phenotypes of good- and poor-outcome
A Grain Carried by the Flood: methods and data for global change ecology amidst a data deluge
Thesis (Ph.D.)--University of Washington, 2020
Forecasting the responses of ecological systems to a changing environment is a critical area of modern ecology research. An overwhelming amount of openly-available ecological and environmental data is emerging in service of this goal, but often the methodology for producing ecological insight from these heterogeneous data sources is out of reach of standard ecological practice. In this dissertation, I investigate opportunities to use open, heterogeneous ecological data to produce new insight via contributions in modeling methodology, emerging data sources, and global-scale mechanistic analysis. In the first of three chapters, I find that modern nonlinear modeling methods are able to improve range shift predictions made via species traits. Second, I develop a snow cover data product for montane ecological research from an emerging satellite observation platform with unprecedented spatial and temporal resolution. Finally, I contribute testable predictions of phytoplankton physiological responses to marine heatwave events by pairing a globally-distributed observational dataset with empirically-derived thermal reaction norms of fitness. Taken together, these contributions represent both independent discoveries toward more accurate ecological forecasting and the extraordinary potential of an approach to ecological research driven by open ecological and environmental data sources and modern methods
Accounting for nonlinear responses to traits improves range shift predictions
Accurately predicting species' range shifts in response to environmental change is paramount for understanding ecological processes and global change. In synthetic analyses, traits emerge as significant but weak predictors of species' range shifts across recent climate change. These studies assume linear responses to traits, while detailed empirical work often reveals trait responses that are unimodal and contain thresholds or other nonlinearities. We hypothesize that the use of linear modeling approaches fails to capture these nonlinearities and, therefore, may be under-powering traits to predict range shifts. We evaluate the predictive performance of approaches that can capture nonlinear relationships (ridge-regularized linear regression, support vector regression with linear and nonlinear kernels, and random forests). We apply our models using six multidecadal range shift datasets for plants, moths, marine fish, birds, and small mammals. We show that nonlinear approaches can perform better than least-squares linear modeling in reproducing historical range shifts. Consistent with expectations, we identify dispersal and climatic niche traits as primary determinants of distribution shifts. Traits identified as important predictors and the direction of trait effects are generally consistent across models, but there are notable exceptions. Among important predictors, there are more consistent responses to climatic niches than dispersal ability. Modest improvements in predictability when accounting for nonlinearities and interactions, and the overall low amount of variance accounted for by trait predictors suggest limits to trait-based statistical predictive frameworks
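As a toy illustration of why a unimodal trait response can be invisible to a straight-line fit, the sketch below compares an ordinary least-squares fit on a raw trait with the same fit on a squared trait feature. This is not the study's method (which evaluates ridge regression, support vector regression, and random forests); the data, the function names `fit_line` and `r_squared`, and the quadratic response are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical standardized trait values and a synthetic unimodal response
# (largest "range shift" at intermediate trait values). Not real data.
traits = [i / 10 for i in range(-10, 11)]
shifts = [1.0 - t * t for t in traits]

# Straight-line fit on the raw trait: the symmetric hump averages out.
a_lin, b_lin = fit_line(traits, shifts)
r2_linear = r_squared(traits, shifts, a_lin, b_lin)

# Same estimator, but on a squared-trait feature that can express the hump.
squared = [t * t for t in traits]
a_sq, b_sq = fit_line(squared, shifts)
r2_squared = r_squared(squared, shifts, a_sq, b_sq)

print(f"R^2 raw trait: {r2_linear:.3f}, R^2 squared trait: {r2_squared:.3f}")
```

On this noise-free synthetic response the raw-trait fit explains essentially none of the variance while the squared-trait fit explains all of it. Real range shift data are noisy, so the gap would be far smaller, in line with the modest improvements the abstract reports.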
High-Resolution Snow-Covered Area Mapping in Forested Mountain Ecosystems Using PlanetScope Imagery
Improving high-resolution (meter-scale) mapping of snow-covered areas in complex and forested terrains is critical to understanding the responses of species and water systems to climate change. Commercial high-resolution imagery from Planet Labs, Inc. (Planet, San Francisco, CA, USA) can be used in environmental science, as it has both high spatial (0.7-3.0 m) and temporal (1-2 day) resolution. Deriving snow-covered areas from Planet imagery using traditional radiometric techniques has limitations due to the lack of a shortwave infrared band that is needed to fully exploit the difference in reflectance to discriminate between snow and clouds. However, recent work demonstrated that snow cover area (SCA) can be successfully mapped using only the PlanetScope 4-band (Red, Green, Blue and NIR) reflectance products and a machine learning (ML) approach based on convolutional neural networks (CNN). To evaluate how additional features improve the existing model performance, we: (1) build on previous work to augment a CNN model with additional input data including vegetation metrics (Normalized Difference Vegetation Index) and DEM-derived metrics (elevation, slope and aspect) to improve SCA mapping in forested and open terrain, (2) evaluate the model performance at two geographically diverse sites (Gunnison, Colorado, USA and Engadin, Switzerland), and (3) evaluate the model performance over different land-cover types. The best augmented model used the Normalized Difference Vegetation Index (NDVI) along with visible (red, green, and blue) and NIR bands, with an F-score of 0.89 (Gunnison) and 0.93 (Engadin) and was found to be 4% and 2% better than when using canopy height- and terrain-derived measures at Gunnison, respectively. The NDVI-based model improves not only upon the original band-only model's ability to detect snow in forests, but also across other various land-cover types (gaps and canopy edges).
We examined the model's performance in forested areas using three forest canopy quantification metrics and found that augmented models can better identify snow in canopy edges and open areas but still underpredict snow cover under forest canopies. While the new features improve model performance over band-only options, the models still have challenges identifying the snow under trees in dense forests, with performance varying as a function of the geographic area. The improved high-resolution snow maps in forested environments can support studies involving climate change effects on mountain ecosystems and evaluations of hydrological impacts in snow-dominated river basins. ISSN:2072-429
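The abstract above names two quantities that have standard definitions: NDVI, computed from the red and NIR bands as (NIR - Red) / (NIR + Red), and the F-score, taken here as the F1 score 2TP / (2TP + FP + FN). The sketch below shows both in plain Python. The reflectance values, the pixel counts, and the function names `ndvi` and `f_score` are hypothetical, and reading the abstract's F-score as F1 is itself an assumption; this does not reproduce the CNN or its inputs.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Returns 0.0 when both bands are zero to avoid division by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def f_score(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical surface reflectances (illustrative, not from the paper).
pixels = [
    {"red": 0.05, "nir": 0.45},  # vegetation-like: NIR much brighter than red
    {"red": 0.80, "nir": 0.75},  # snow-like: bright in both bands
]

# NDVI as one extra per-pixel feature alongside the four original bands.
features = [ndvi(p["nir"], p["red"]) for p in pixels]

# Hypothetical snow/no-snow confusion counts for a classified tile.
score = f_score(tp=8, fp=1, fn=1)

print(features, round(score, 3))
```

Vegetated pixels give NDVI values near the top of the [-1, 1] range, while surfaces that are bright in both bands, such as snow, give values near zero, which is one plausible reason the index helps separate snow from canopy.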
Climate change impacts on natural icons: Do phenological shifts threaten the relationship between peak wildflowers and visitor satisfaction?
Climate change will affect the timing of natural features of recreational interest, like fall colors, salmon migration, and wildflower blooms, and may therefore alter social-ecological relationships. For example, if fewer recreational visits are aligned with seasonal events of interest, visitor satisfaction could be affected. To explore this possibility at Mount Rainier National Park, we combined data from a community science program (MeadoWatch, MW) with hiking trip reports posted to a hiking organization (Washington Trails Association, WTA). We first explored how peak flowering, WTA trip reports, and visitation varied across years that differed in snow disappearance, a climatic factor that correlates with flowering phenology. We found that wildflower blooms tracked snow disappearance more closely than did trip reports and park visitation, implying a decreasing proportion of future visitors will experience peak wildflower blooms. We next extracted sentiment related to specific trail-experiences (e.g., wildflowers, views) and overall hike satisfaction from WTA trip reports. While wildflowers were a positive component in overall hiker satisfaction, other non-seasonal trail experiences also had positive effects. In all, a shifting wildflower season that is less accessible to visitors could alter perceptions of natural areas like Mount Rainier National Park. Countering negative social-ecological impacts could be achieved by highlighting non-seasonal aspects of the visitor experience, or alternatively, communicating the altered timing of the peak wildflower season while also increasing accessibility during this time. Such actions likely require partnerships between managers of natural areas, interpretive staff, and scientists that study seasonal phenomena of recreational interest. ISSN:2666-900
β3-Integrin Expression on Tumor Cells Inhibits Tumor Progression, Reduces Metastasis, and Is Associated with a Favorable Prognosis in Patients with Ovarian Cancer
The role of the vitronectin receptor (αvβ3-integrin) as a tumor promoter seems well established, and, consequently, therapies that block this integrin are currently in clinical testing. We undertook the current study to determine whether αvβ3-integrin is an appropriate target in ovarian cancer treatment. Expression of β3-integrin in SKOV3ip1 ovarian cancer cells led to the overexpression of αvβ3-integrin on the cell surface and increased adhesion. However, αvβ3-integrin-overexpressing cells showed impaired invasion, protease expression, and colony formation. These results were recapitulated in xenograft studies: αvβ3-integrin-expressing cells showed increased adhesion to mouse peritoneum, but the overall number of metastatic nodules (105 versus 68 tumors) and tumor weight were significantly lower than those in the parental SKOV3ip1 cells. The αvβ3-integrin-overexpressing cells had a decreased proliferation rate mediated by inhibition of cyclin B1 and induction of phospho-Cdc2 and p53 expression, consistent with a G2M cell cycle arrest. Confirming the above results, inhibition of β3-integrin in cultured or primary OvCa cells decreased adhesion but increased invasion and proliferation. Patients with tumors expressing high β3-integrin had significantly better disease-free and overall survival (52 months versus 27 months, P < 0.05). This study shows that αvβ3-integrin expression on tumor cells actually slows tumor progression and acts as a tumor suppressor. Therefore, the vitronectin receptor might not be an appropriate therapeutic target in ovarian cancer