Detection of Recurrent Copy Number Alterations in the Genome: a Probabilistic Approach
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases (including cancer, HIV acquisition, autoimmune and neurodegenerative diseases), and array-based CGH (aCGH) is currently the main technology to locate CNVs. Several methods can analyze aCGH data at the single sample level, but disease-critical genes are more likely to be found in regions that are common or recurrent among samples. Unfortunately, defining recurrent CNV regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for CNVs that affect only some subsets of the samples (without prior knowledge of which regions and subsets of samples are affected), but this is neglected by current methods.
We have developed two methods to define recurrent CNV regions. Our methods are unique and qualitatively different from existing approaches: they detect both regions over the complete set of arrays and alterations that are common only to some subsets of the samples and, thus, CNV alterations that might characterize previously unknown groups; they use probabilities of alteration as input (not discretized gain/loss calls, which discard uncertainty and variability) and return probabilities of being a shared common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and, by using probabilities, that researchers can modify, as needed, thresholds of immediate interpretability to answer specific research questions.
These methods are a qualitative advance in the location of recurrent CNV regions and will be instrumental in efforts to standardize definitions of recurrent CNVs and cluster samples with respect to patterns of CNV, and ultimately in the search for genomic regions harboring disease-critical genes.
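To make the idea of using probabilities of alteration as input concrete, the following is a minimal sketch, not the authors' method. It assumes, purely for illustration, that alteration probabilities are independent across probes and samples, so the probability that a contiguous region is altered in every sample is a simple product. The function name `shared_regions`, the greedy left-to-right scan, and the example numbers are all hypothetical.

```python
from math import prod


def shared_regions(probs, threshold):
    """Greedy scan for contiguous probe ranges altered in all samples.

    probs[s][j] is the probability that probe j is altered in sample s.
    Assumption (illustrative only): probabilities are independent, so the
    joint probability of a region is the product over samples and probes.
    Returns a list of (start, end, joint_probability), with end exclusive.
    """
    n_probes = len(probs[0])
    regions = []
    start = 0
    while start < n_probes:
        end = start
        joint = 1.0
        while end < n_probes:
            # Probability that this probe is altered in every sample.
            column = prod(sample[end] for sample in probs)
            if joint * column < threshold:
                break
            joint *= column
            end += 1
        if end > start:
            regions.append((start, end, joint))
            start = end
        else:
            start += 1
    return regions


# Hypothetical probabilities for two samples and four probes.
example = [
    [0.1, 0.9, 0.9, 0.2],
    [0.2, 0.9, 0.9, 0.1],
]
regions = shared_regions(example, threshold=0.5)
```

Because the output is a probability rather than a discrete call, the threshold can be raised or lowered after the fact without rerunning the upstream analysis. Restricting `probs` to a subset of rows gives the analogous quantity for a subset of samples.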
Finding Recurrent Regions of Copy Number Variation: A Review
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases, and array-based CGH (aCGH) is currently the main technology to locate CNVs. Although many methods have been developed to analyze aCGH from a single array/subject, disease-critical genes are more likely to be found in regions that are common or recurrent among subjects. Unfortunately, finding recurrent CNV regions remains a challenge. We review existing methods for the identification of recurrent CNV regions. The working definition of a "common" or "recurrent" region differs between methods, leading to approaches that use different types of input (discretized output from a previous CGH segmentation analysis or intensity ratios), or that incorporate biological considerations to varying degrees (these considerations play a role in the identification of "interesting" regions and in the details of null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not easily available for several methods. We suggest that finding recurrent CNVs could benefit from reframing the problem in a biclustering context. We also emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNVs that affect only a subset of subjects. We make some recommendations about choice among existing methods, and we suggest further methodological research.
Predicting COVID-19 progression from diagnosis to recovery or death linking primary care and hospital records in Castilla y León (Spain).
This paper analyses COVID-19 patients' dynamics during the first wave in the region of Castilla y León (Spain), with around 2.4 million inhabitants, using multi-state competing risk survival models. From the date registered as the start of the clinical process, it is assumed that a patient can progress through three intermediate states until reaching an absorbing state of recovery or death. Demographic characteristics, epidemiological factors such as the time of infection and previous vaccinations, clinical history, complications during the course of the disease and drug therapy for hospitalised patients are considered as candidate predictors. Regarding risk factors associated with mortality and severity, results consistent with many other studies have been found, such as older age, being male, and chronic diseases. Specifically, the hospitalisation (death) rate for those over 69 is 27.2% (19.8%) versus 5.3% (0.7%) for those under 70, and for males is 14.5% (7%) versus 8.3% (4.6%) for females. Among patients with chronic diseases the highest rates of hospitalisation are 26.1% for diabetes and 26.3% for kidney disease, while the highest death rate is 21.9% for cerebrovascular disease. Moreover, specific predictors for different transitions are given, and estimates of the probability of recovery and death for each patient are provided by the model. Notably, for patients infected at the end of the period, the hazard of transition from hospitalisation to ICU is significantly lower (p < 0.001) and the hazard of transition from hospitalisation to recovery is higher (p < 0.001). For patients previously vaccinated against pneumococcus, the hazard of transition to recovery is higher (p < 0.001). Finally, internal validation and calibration of the model are also performed.
Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases
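The role of interprobe distance in a nonhomogeneous hidden Markov model can be illustrated with a small sketch. This is not RJaCGH: the number of states and all parameters are fixed here, there is no reversible jump sampling or Bayesian model averaging, and the exponential decay of the "stay" probability with distance is an assumed form chosen only for illustration. The names `transition_matrix`, `posterior_states`, and `scale` are hypothetical.

```python
from math import exp, pi, sqrt


def transition_matrix(distance, n_states, scale):
    """Distance-dependent transitions (assumed exponential form).

    Adjacent probes (distance 0) always share a state; very distant probes
    are effectively independent (uniform transition probabilities).
    """
    stay = 1.0 / n_states + (1.0 - 1.0 / n_states) * exp(-distance / scale)
    move = (1.0 - stay) / (n_states - 1)
    return [[stay if i == j else move for j in range(n_states)]
            for i in range(n_states)]


def gaussian_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def posterior_states(log_ratios, positions, means, sd, scale):
    """Scaled forward-backward; returns per-probe posterior state probabilities."""
    n, k = len(log_ratios), len(means)
    emis = [[gaussian_pdf(x, m, sd) for m in means] for x in log_ratios]
    trans = [transition_matrix(positions[t + 1] - positions[t], k, scale)
             for t in range(n - 1)]

    # Forward pass with a uniform initial distribution, normalized per probe.
    fwd = []
    cur = [e / k for e in emis[0]]
    for t in range(n):
        if t > 0:
            prev = fwd[-1]
            cur = [emis[t][j] * sum(prev[i] * trans[t - 1][i][j] for i in range(k))
                   for j in range(k)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalized per probe.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        vals = [sum(trans[t][i][j] * emis[t + 1][j] * bwd[t + 1][j] for j in range(k))
                for i in range(k)]
        total = sum(vals)
        bwd[t] = [v / total for v in vals]

    # Combine and renormalize to obtain posterior probabilities.
    post = []
    for t in range(n):
        w = [fwd[t][i] * bwd[t][i] for i in range(k)]
        total = sum(w)
        post.append([v / total for v in w])
    return post


# Hypothetical data: states are loss, normal, gain (means in log2-ratio units).
post = posterior_states(
    log_ratios=[0.0, 0.02, 0.55, 0.5, -0.01],
    positions=[0, 1000, 2000, 3000, 4000],
    means=[-0.5, 0.0, 0.5],
    sd=0.2,
    scale=1000.0,
)
```

Each row of `post` gives, for one probe, the probability of loss, normal, and gain, which is the kind of direct probabilistic answer the abstract asks for. Uneven spacing in `positions` changes how strongly neighbouring probes inform each other.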
Asterias: a parallelized web-based suite for the analysis of expression and aCGH data
Asterias (http://www.asterias.info) is an integrated collection of
freely-accessible web tools for the analysis of gene expression and aCGH data.
Most of the tools use parallel computing (via MPI). Most of our applications
allow the user to obtain additional information for user-selected genes by
using clickable links in tables and/or figures. Our tools include:
normalization of expression and aCGH data; converting between different types
of gene/clone and protein identifiers; filtering and imputation; finding
differentially expressed genes related to patient class and survival data;
searching for models of class prediction; using random forests to search for
minimal models for class prediction or for large subsets of genes with
predictive capacity; searching for molecular signatures and predictive genes
with survival data; detecting regions of genomic DNA gain or loss. The
capability to send results between different applications, access to additional
functional information, and parallelized computation make our suite unique and
exploit features only available to web-based applications.
Asterias: integrated analysis of expression and aCGH data using an open-source, web-based, parallelized software suite
Asterias (http://www.asterias.info) is an open-source, web-based suite for the analysis of gene expression and aCGH data. Asterias implements validated statistical methods, and most of the applications use parallel computing, which permits taking advantage of multicore CPUs and computing clusters. Access to, and further analysis of, additional biological information and annotations (PubMed references, Gene Ontology terms, KEGG and Reactome pathways) are available either for individual genes (from clickable links in tables and figures) or sets of genes. The applications range from array normalization, imputation and preprocessing to differential gene expression analysis, class and survival prediction, and aCGH analysis. The source code is available, allowing for extension and reuse of the software. The links to, and analysis of, additional functional information, the parallelization of computation, and the open-source availability of the code make Asterias a unique suite that can exploit features specific to web-based environments.
Asterias: A Parallelized Web-based Suite for the Analysis of Expression and aCGH Data
The analysis of expression and CGH arrays plays a central role in the study of complex diseases, especially cancer, including finding markers for early diagnosis and prognosis, choosing an optimal therapy, or increasing our understanding of cancer development and metastasis. Asterias (http://www.asterias.info) is an integrated collection of freely-accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI) and run on a server with 60 CPUs for computation; compared to a desktop application, or to a server-based application that is not parallelized, parallelization provides speedups of up to 50-fold. Most of our applications allow the user to obtain additional information for user-selected genes (chromosomal location, PubMed ids, Gene Ontology terms, etc.) by using clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data (DNMAD); converting between different types of gene/clone and protein identifiers (IDconverter/IDClight); filtering and imputation (preP); finding differentially expressed genes related to patient class and survival data (Pomelo II); searching for models of class prediction (Tnasas); using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity (GeneSrF); searching for molecular signatures and predictive genes with survival data (SignS); detecting regions of genomic DNA gain or loss (ADaCGH). The capability to send results between different applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications.
FGFR1 amplification or overexpression and hormonal resistance in luminal breast cancer: rationale for a triple blockade of ER, CDK4/6, and FGFR1.
BACKGROUND: FGFR1 amplification, but not overexpression, has been related to adverse prognosis in hormone-positive breast cancer (HRPBC). Whether FGFR1 overexpression and amplification are correlated, how they are distributed among luminal A and B HRPBC, and whether amplification and overexpression have different prognostic roles are currently unknown. The role of FGFR1 inhibitors in HRPBC is also unclear. METHODS: FGFR1 amplification (FISH) and overexpression (RNAscope) were investigated in a cohort of N = 251 HRPBC patients and in the METABRIC cohort; effects on survival and FISH-RNAscope concordance were determined. We generated hormonal deprivation resistant (LTED-R) and FGFR1-overexpressing cell line variants of the ER+ MCF7 and T47-D and the ER+, FGFR1-amplified HCC1428 cell lines. The roles of ER, CDK4/6, and/or FGFR1 blockade, alone or in combination, in Rb phosphorylation, cell cycle, and survival were studied. RESULTS: FGFR1 overexpression and amplification were non-concordant in > 20% of the patients, but both were associated with a similar relapse risk (~ 2.5-fold; P < 0.05). FGFR1 amplification or overexpression occurred regardless of the luminal subtype, but the incidence was higher in luminal B (16.3%) than in luminal A (6.6%) tumors (P < 0.05). The Kappa index for overexpression and amplification was 0.69 (P < 0.001). Twenty-four per cent of the patients showed amplification and/or overexpression of FGFR1, which was associated with a hazard ratio for relapse of 2.6 (95% CI 1.44-4.62, P < 0.001). In vitro, hormonal deprivation led to FGFR1 overexpression. Primary FGFR1 amplification, engineered mRNA overexpression, or LTED-R-acquired FGFR1 overexpression led to resistance against hormonotherapy alone or in combination with the CDK4/6 inhibitor palbociclib. Blocking FGFR1 with the kinase inhibitor rogaratinib led to suppression of Rb phosphorylation, abrogation of the cell cycle, and resistance reversion in all FGFR1 models.
CONCLUSIONS: FGFR1 amplification and overexpression are associated with a similar adverse prognosis in hormone-positive breast cancer. Capturing all the patients with adverse prognosis-linked FGFR1 aberrations requires assessing both features. Hormonal deprivation leads to FGFR1 overexpression, and FGFR1 overexpression and/or amplification are associated with resistance to hormonal therapy, alone or in combination with palbociclib. Both resistances are reverted with triple ER, CDK4/6, and FGFR1 blockade.
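For readers unfamiliar with the Kappa index reported above, the sketch below shows how a chance-corrected agreement statistic can be computed from a cross-tabulation of two binary calls (for example, amplified vs not amplified against overexpressed vs not overexpressed). It assumes the index is Cohen's kappa, which the abstract does not state explicitly, and the counts are invented for illustration; they are not the study's data.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] counts cases given category i by the first assay and
    category j by the second assay.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the marginal frequencies.
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(table[i][j] for i in range(size)) / total
                for j in range(size)]
    expected = sum(r * c for r, c in zip(row_marg, col_marg))
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts (rows: assay A positive/negative; columns: assay B).
made_up_table = [
    [20, 5],
    [10, 15],
]
kappa = cohens_kappa(made_up_table)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a kappa well below 1 is compatible with a sizeable fraction of non-concordant cases.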
Fbxl17 is rearranged in breast cancer and loss of its activity leads to increased global O-GlcNAcylation
Funder: Wildy Fellowship Department of Pathology. Funder: Addenbrooke's Charitable Trust, Cambridge University Hospitals; doi: http://dx.doi.org/10.13039/501100002927. Funder: The Mark Foundation. Abstract: In cancer, many genes are mutated by genome rearrangement, but our understanding of the functional consequences of this remains rudimentary. Here we report that the F-box protein encoded by FBXL17 is disrupted in the region of the gene that encodes its substrate-binding leucine rich repeat (LRR) domain. Truncating Fbxl17 LRRs impaired its association with the other SCF holoenzyme subunits Skp1, Cul1 and Rbx1, and decreased ubiquitination activity. Loss of the LRRs also differentially affected Fbxl17 binding to its targets. Thus, genomic rearrangements in FBXL17 are likely to disrupt SCFFbxl17-regulated networks in cancer cells. To investigate the functional effect of these rearrangements, we performed a yeast two-hybrid screen to identify Fbxl17-interacting proteins. Among the 37 binding partners, Uap1, an enzyme involved in O-GlcNAcylation of proteins, was identified most frequently. We demonstrate that Fbxl17 binds to UAP1 directly and inhibits its phosphorylation, which we propose regulates UAP1 activity. Knockdown of Fbxl17 expression elevated O-GlcNAcylation in breast cancer cells, arguing for a functional role for Fbxl17 in this metabolic pathway.