Integrative disease classification based on cross-platform microarray data
Background: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, because microarray data generated by different laboratories or on different platforms cannot be compared directly owing to systematic variation. This issue has severely limited the practical use of microarray-based disease classification. Results: In this study, we tested the feasibility of disease classification by integrating the large number of heterogeneous microarray datasets in public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets; vectors of log-rank ratios can then be compared across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We design a new classification approach named ManiSVM, which integrates manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving accuracy higher than 80%. Conclusion: Our results not only demonstrate the feasibility of the integrated disease classification approach but also show that classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will increase with the continuous accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data.
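The abstract does not spell out how the log-rank ratios are computed, so the following is only a minimal sketch of the general idea under stated assumptions: genes are ranked within each sample, and each gene's ratio is the log2 of its mean rank in disease samples over its mean rank in control samples of the same dataset. The function names and the toy matrices are hypothetical and are not taken from the paper. Because ranks are unchanged by any monotone rescaling, two datasets that measure the same biology on different intensity scales yield the same vector.

```python
import numpy as np

def rank_within_sample(expr):
    """Rank genes within each sample (column); 1 = lowest expression."""
    # argsort of argsort yields 0-based ranks; ties are broken by position.
    return np.argsort(np.argsort(expr, axis=0), axis=0) + 1

def log_rank_ratio(expr, is_disease):
    """Per-gene log2 ratio of mean rank in disease vs. control samples.

    Assumed definition for illustration; the paper's exact formula may differ.
    """
    ranks = rank_within_sample(np.asarray(expr, dtype=float))
    is_disease = np.asarray(is_disease, dtype=bool)
    disease_mean = ranks[:, is_disease].mean(axis=1)
    control_mean = ranks[:, ~is_disease].mean(axis=1)
    return np.log2(disease_mean / control_mean)

# Toy data: rows are 4 genes, columns are 2 control then 2 disease samples.
dataset_a = np.array([[1.0, 2.0, 9.0, 8.0],
                      [5.0, 6.0, 5.5, 6.5],
                      [9.0, 8.0, 1.0, 2.0],
                      [3.0, 4.0, 3.5, 4.5]])
# A second "platform": same ordering of genes, different intensity scale.
dataset_b = dataset_a * 1000 + 50
labels = [False, False, True, True]

ratio_a = log_rank_ratio(dataset_a, labels)
ratio_b = log_rank_ratio(dataset_b, labels)
print(ratio_a)
print(np.allclose(ratio_a, ratio_b))
```

The resulting vectors live on a common scale, so they could be passed to any downstream classifier; the manifold transformation and SVM steps of ManiSVM are not reproduced here.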
Income Smoothing over the Business Cycle: Changes in Banks’ Coordinated Management of Provisions for Loan Losses and Loan Charge-offs from the Pre-1990 Bust to the 1990s Boom
We provide evidence that banks smooth income by managing provisions for loan losses and loan charge-offs in a coordinated fashion that varies across the bust and boom phases of the business cycle and across homogeneous and heterogeneous loan types. In particular, during the 1990s boom, we predict and find that banks accelerated provisioning for loan losses and made this less obvious by accelerating loan charge-offs, especially for homogeneous loans, for which charge-offs are determined using number-of-days-past-due rules. We also provide evidence that the valuation implications of banks’ provisions for loan losses and loan charge-offs vary across the phases of the business cycle and across loan types, reflecting the effect of these factors on banks’ income smoothing. In particular, during the 1990s boom, we predict and find that charge-offs of homogeneous loans have a positive association with current returns and future cash flows, because these charge-offs are recorded primarily by healthy banks with good future prospects that are reducing overstated allowances for loan losses. We also predict and find that these charge-offs have a positive association with future returns that is explained by their positive association with future net income and recoveries. Our results are consistent with the market only partially appreciating healthy banks’ overstatement of charge-offs of homogeneous loans based on number-of-days-past-due rules during the 1990s boom, because of the perceived non-discretionary nature of these charge-offs.
REACTIN: Regulatory Activity Inference of Transcription Factors Underlying Human Diseases with Application to Breast Cancer
Genetic alterations of transcription factors (TFs) have been implicated in the tumorigenesis of cancers. In many cancers, alteration of TFs results in their aberrant activity without changing their gene expression level. Gene expression data from microarray or RNA-seq experiments can capture changes in gene expression; however, revealing changes in TF activity remains a challenge. Here we propose a method, called REACTIN (REgulatory ACTivity INference), which integrates TF binding data with gene expression data to identify TFs with significantly differential activity between disease and normal samples. REACTIN successfully detects differential activity of estrogen receptor (ER) between ER+ and ER- samples in 10 breast cancer datasets. When applied to compare tumor and normal breast samples, it reveals TFs that are critical for carcinogenesis of breast cancer. Moreover, REACTIN can be used to identify transcriptional programs that are predictive of survival time in breast cancer patients.
Metabolic Pathways Enhancement Confers Poor Prognosis in p53 Exon Mutant Hepatocellular Carcinoma.
RNA-Sequencing (RNA-Seq), the most commonly used sequencing application, is not only a method for measuring gene expression but also an excellent medium for detecting important structural variants such as single nucleotide variants (SNVs), insertions/deletions (indels), or fusion transcripts. The Cancer Genome Atlas (TCGA) contains genomic data from a variety of cancer types and also provides the raw data generated by the TCGA consortium. Mutation of p53 is among the top 10 somatic mutations associated with hepatocellular carcinoma (HCC). The aim of the present study was to analyze concordant differences in gene expression profiles for a priori defined sets of genes according to p53 mutation status in HCC using RNA-Seq data. In the study, the expression profile of 11 799 genes in 42 paired tumor and adjacent normal tissues was collected, processed, and further stratified by mutated versus normal p53. Furthermore, we used a knowledge-based approach, Gene Set Enrichment Analysis (GSEA), to compare gene expression profiles between normal and mutated p53. The statistical significance (nominal P value) of the enrichment score (ES) was calculated. The ranked gene list that reflects differential expression between p53 wild-type and mutant genotypes was then mapped to metabolic processes using KEGG (Kyoto Encyclopedia of Genes and Genomes) to assign functional meaning. These approaches enable us to identify pathways and potential target genes that are highly expressed in p53-mutated HCC. Our analysis revealed 2 genes, hexokinase 2 (HK2) and enolase 1 (ENO1), that stood out as red pixels in the heatmap. To further explore the role of these genes in HCC, overall survival curves were plotted by the Kaplan-Meier method for HK2 and ENO1; they revealed that patients with HCC who have high HK2 and ENO1 expression have a poor prognosis. These results suggest that these glycolysis genes are associated with mutated p53 in HCC, which may contribute to poor prognosis.
In this proof-of-concept study, we proposed an approach for identifying novel potential therapeutic targets in human HCC with mutated p53. These approaches can take advantage of the massive next-generation sequencing (NGS) data generated worldwide and make more of it by exploring new potential therapeutic targets.
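The enrichment score mentioned in the abstract can be illustrated with a minimal sketch of the standard GSEA running-sum statistic, assuming the commonly described weighted form (hits weighted by the absolute ranking metric raised to a power `p`, misses weighted uniformly). The gene names, scores, and gene sets below are toy values invented for illustration; they are not the study's data, and the permutation step that produces the nominal P value is omitted.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score of gene_set over a ranked gene list.

    ranked_genes: gene names sorted by decreasing ranking metric.
    scores: ranking metric for each gene, in the same order.
    p: exponent applied to |score| when weighting hits (p=0 is unweighted).
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    if in_set.all() or not in_set.any():
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit = np.where(in_set, weights, 0.0)
    hit_cdf = np.cumsum(hit) / hit.sum()            # weighted fraction of hits so far
    miss_cdf = np.cumsum(~in_set) / (~in_set).sum() # fraction of misses so far
    running = hit_cdf - miss_cdf
    # The score is the running-sum value with the largest absolute deviation.
    return running[np.argmax(np.abs(running))]

# Toy ranked list (hypothetical): positive scores = higher in mutant samples.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(genes, metric, {"geneA", "geneB"})
es_bottom = enrichment_score(genes, metric, {"geneE", "geneF"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one; significance would then be assessed by recomputing the score on permuted sample labels.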