3,881 research outputs found
Systematic bioinformatic analysis of expression levels of 17,330 human genes across 9,783 samples from 175 types of healthy and pathological tissues
Our knowledge on tissue- and disease-specific functions of human genes is rather limited and highly context-specific. Here, we have developed a method for the comparison of mRNA expression levels of most human genes across 9,783 Affymetrix gene expression array experiments representing 43 normal human tissue types, 68 cancer types, and 64 other diseases. This database of gene expression patterns in normal human tissues and pathological conditions covers 113 million datapoints and is available from the GeneSapiens website
Decorin protein core affects the global gene expression profile of the tumor microenvironment in a triple-negative orthotopic breast carcinoma xenograft model
Decorin, a member of the small leucine-rich proteoglycan gene family, exists and functions wholly within the tumor microenvironment to suppress tumorigenesis by directly targeting and antagonizing multiple receptor tyrosine kinases, such as the EGFR and Met. This leads to potent and sustained signal attenuation, growth arrest, and angiostasis. We thus sought to evaluate the tumoricidal benefits of systemic decorin on a triple-negative orthotopic breast carcinoma xenograft model. To this end, we employed a novel high-density mixed expression array capable of differentiating and simultaneously measuring gene signatures of both Mus musculus (stromal) and Homo sapiens (epithelial) tissue origins. We found that decorin protein core modulated the differential expression of 374 genes within the stromal compartment of the tumor xenograft. Further, our top gene ontology classes strongly suggests an unexpected and preferential role for decorin protein core to inhibit genes necessary for immunomodulatory responses while simultaneously inducing expression of those possessing cellular adhesion and tumor suppressive gene properties. Rigorous verification of the top scoring candidates led to the discovery of three genes heretofore unlinked to malignant breast cancer that were reproducibly found to be induced in several models of tumor stroma. Collectively, our data provide highly novel and unexpected stromal gene signatures as a direct function of systemic administration of decorin protein core and reveals a fundamental basis of action for decorin to modulate the tumor stroma as a biological mechanism for the ascribed anti-tumorigenic properties
Mitmemõõtmeliste andmete statistiline analüüs bioinformaatikas
Väitekirja elektrooniline versioon ei sisalda publikatsioone.Valgud on organismide ühed tähtsaimad ehituskivid. Nende kogust ja omavahelisi seoseid uurides on võimalik saada infot organismi seisundi kohta. Tänapäevased seadmed võimaldavad koguda lühikese ajaga palju valkudega seotud andmeid. Nende analüüs on aga suhteliselt keerukas ja on loonud uue teadusharu nimega bioinformaatika.
Käesoleva doktoritöö eesmärgiks on kirjeldada mitmemõõtmeliste andmete statistilise analüüsiga seotud probleeme ja nende lahendusi. Näidatakse, kuidas sellised andmed saab esitada maatriksi kujul. Antakse ülevaade andmeallikatest ja analüüsimeetoditest ning näidatakse, kuidas neid saab praktikas kasutada. Kirjeldatakse üleeuroopalist vähiuuringute projekti PREDECT, kus paljud organisatsioonid osalevad vähimudelite täiustamises. Antakse ülevaade metaandmete kogumisest paljudelt partneritelt, samuti veebitööriistadest, mis loodi esmaseks andmeanalüüsiks. Kirjeldatakse uudse rinnavähi mudeliga seotud analüüsi ja koelõikude võrdlust erinevates laboritingimustes. Tutvustatakse vabalt kasutatavat veebitööriista, millega saab teha kirjeldavat andmeanalüüsi. Järgmistes peatükkides kirjeldatakse andmeanalüüsi erinevates uuringutes. Inimese platsentas leiti mitmeid uusi alleelispetsiifilise ekspressiooniga geene. Uuriti atoopilise dermatiidi molekulaarseid mehhanisme, täpsemalt valgu gamma-interferoon mõju sellele haigusele. Leiti mikroRNAsid, mida saab kasutada endometrioosi markeritena, ja loodi klassifitseerija endometrioosihaigete eristamiseks tervetest.Proteins are one of the most important building blocks of an organism. By investigating the abundance and relations between different proteins, it is possible to get information about the current state of the organism. Modern technologies allow to collect a large amount of data related to proteins in a short period of time. This type of analysis is quite complicated and has created a new field of science called bioinformatics.
The aim of the dissertation is to describe problems and solutions related to statistical analysis of multivariate data. It is shown how this type of data can be presented as a matrix. An overview of data sources and analysis methods is given and it is shown how they can be used in practice. A pan-European project PREDECT is described where many organizations are contributing to develop better cancer models. An overview is given about collecting metadata from multiple partners, and about web tools created for initial data analysis. An analysis concerning a novel breast cancer model is described, and a comparison of tissue slices in different cultivation conditions is made. A freely available web tool is introduced which allows to perform exploratory data analysis. Next chapters describe data analysis in various projects. Multiple novel genes were found in the human placenta that have an allele-specific expression. Molecular mechanisms of a disease called atopic dermatitis were examined, more specifically the influence of the protein interferon-gamma. MicroRNAs were found that can be used as markers for a disease called endometriosis, and a classifier was built to differentiate people with endometriosis from healthy people
Previously Unidentified Changes in Renal Cell Carcinoma Gene Expression Identified by Parametric Analysis of Microarray Data
BACKGROUND. Renal cell carcinoma is a common malignancy that often presents as a metastatic-disease for which there are no effective treatments. To gain insights into the mechanism of renal cell carcinogenesis, a number of genome-wide expression profiling studies have been performed. Surprisingly, there is very poor agreement among these studies as to which genes are differentially regulated. To better understand this lack of agreement we profiled renal cell tumor gene expression using genome-wide microarrays (45,000 probe sets) and compare our analysis to previous microarray studies. METHODS. We hybridized total RNA isolated from renal cell tumors and adjacent normal tissue to Affymetrix U133A and U133B arrays. We removed samples with technical defects and removed probesets that failed to exhibit sequence-specific hybridization in any of the samples. We detected differential gene expression in the resulting dataset with parametric methods and identified keywords that are overrepresented in the differentially expressed genes with the Fisher-exact test. RESULTS. We identify 1,234 genes that are more than three-fold changed in renal tumors by t-test, 800 of which have not been previously reported to be altered in renal cell tumors. Of the only 37 genes that have been identified as being differentially expressed in three or more of five previous microarray studies of renal tumor gene expression, our analysis finds 33 of these genes (89%). A key to the sensitivity and power of our analysis is filtering out defective samples and genes that are not reliably detected. CONCLUSIONS. The widespread use of sample-wise voting schemes for detecting differential expression that do not control for false positives likely account for the poor overlap among previous studies. Among the many genes we identified using parametric methods that were not previously reported as being differentially expressed in renal cell tumors are several oncogenes and tumor suppressor genes that likely play important roles in renal cell carcinogenesis. This highlights the need for rigorous statistical approaches in microarray studies.National Institutes of Healt
CAncer bioMarker Prediction Pipeline (CAMPP) - A standardized framework for the analysis of quantitative biological data
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework
Mammary molecular portraits reveal lineage-specific features and progenitor cell vulnerabilities.
The mammary epithelium depends on specific lineages and their stem and progenitor function to accommodate hormone-triggered physiological demands in the adult female. Perturbations of these lineages underpin breast cancer risk, yet our understanding of normal mammary cell composition is incomplete. Here, we build a multimodal resource for the adult gland through comprehensive profiling of primary cell epigenomes, transcriptomes, and proteomes. We define systems-level relationships between chromatin-DNA-RNA-protein states, identify lineage-specific DNA methylation of transcription factor binding sites, and pinpoint proteins underlying progesterone responsiveness. Comparative proteomics of estrogen and progesterone receptor-positive and -negative cell populations, extensive target validation, and drug testing lead to discovery of stem and progenitor cell vulnerabilities. Top epigenetic drugs exert cytostatic effects; prevent adult mammary cell expansion, clonogenicity, and mammopoiesis; and deplete stem cell frequency. Select drugs also abrogate human breast progenitor cell activity in normal and high-risk patient samples. This integrative computational and functional study provides fundamental insight into mammary lineage and stem cell biology
iGPSe: A Visual Analytic System for Integrative Genomic Based Cancer Patient Stratification
Background: Cancers are highly heterogeneous with different subtypes. These
subtypes often possess different genetic variants, present different
pathological phenotypes, and most importantly, show various clinical outcomes
such as varied prognosis and response to treatment and likelihood for
recurrence and metastasis. Recently, integrative genomics (or panomics)
approaches are often adopted with the goal of combining multiple types of omics
data to identify integrative biomarkers for stratification of patients into
groups with different clinical outcomes. Results: In this paper we present a
visual analytic system called Interactive Genomics Patient Stratification
explorer (iGPSe) which significantly reduces the computing burden for
biomedical researchers in the process of exploring complicated integrative
genomics data. Our system integrates unsupervised clustering with graph and
parallel sets visualization and allows direct comparison of clinical outcomes
via survival analysis. Using a breast cancer dataset obtained from the The
Cancer Genome Atlas (TCGA) project, we are able to quickly explore different
combinations of gene expression (mRNA) and microRNA features and identify
potential combined markers for survival prediction. Conclusions: Visualization
plays an important role in the process of stratifying given population
patients. Visual tools allowed for the selection of possibly features across
various datasets for the given patient population. We essentially made a case
for visualization for a very important problem in translational informatics.Comment: BioVis 2014 conferenc
Recommended from our members
Interaction-Based Learning for High-Dimensional Data with Continuous Predictors
High-dimensional data, such as that relating to gene expression in microarray experiments, may contain substantial amount of useful information to be explored. However, the information, relevant variables and their joint interactions are usually diluted by noise due to a large number of non-informative variables. Consequently, variable selection plays a pivotal role for learning in high dimensional problems. Most of the traditional feature selection methods, such as Pearson's correlation between response and predictors, stepwise linear regressions and LASSO are among the popular linear methods. These methods are effective in identifying linear marginal effect but are limited in detecting non-linear or higher order interaction effects. It is well known that epistasis (gene - gene interactions) may play an important role in gene expression where unknown functional forms are difficult to identify. In this thesis, we propose a novel nonparametric measure to first screen and do feature selection based on information from nearest neighborhoods. The method is inspired by Lo and Zheng's earlier work (2002) on detecting interactions for discrete predictors. We apply a backward elimination algorithm based on this measure which leads to the identification of many in influential clusters of variables. Those identified groups of variables can capture both marginal and interactive effects. Second, each identified cluster has the potential to perform predictions and classifications more accurately. We also study procedures how to combine these groups of individual classifiers to form a final predictor. Through simulation and real data analysis, the proposed measure is capable of identifying important variable sets and patterns including higher-order interaction sets. The proposed procedure outperforms existing methods in three different microarray datasets. Moreover, the nonparametric measure is quite flexible and can be easily extended and applied to other areas of high-dimensional data and studies
- …