
    Hybrid Method for Prediction of Metastasis in Breast Cancer Patients Using Gene Expression Signals

    Primary tumor gene expression has been shown to reveal metastasis-driving gene markers for predicting breast cancer recurrence (BCR). However, difficulties in analyzing microarray data have led to poor predictive power and inconsistency among previously introduced gene signatures. In this study, a hybrid method was proposed for identifying more predictive gene signatures from microarray datasets. Initially, the parameters of a Rough-Set (RS) theory based feature selection method were tuned to construct a customized gene extraction algorithm. Afterward, the most informative genes were selected from six independent breast cancer datasets using the RS gene selection method. Then, the combined set of these six signature sets, containing 114 genes, was evaluated for prediction of BCR. Finally, a meta-signature containing 18 genes was selected from the combination of datasets, and its prediction accuracy was compared to that of the combined signature. The results of a 10-fold cross-validation test showed an acceptable misclassification error rate (MCR) over 1338 cases of breast cancer patients. In comparison to a recent similar work, our approach reached more than 5% reduction in MCR using fewer genes for prediction. The results also demonstrated a 7% improvement in average accuracy across the six utilized datasets, using the combined set of 114 genes in comparison with the 18-gene meta-signature. In this study, a more informative gene signature was selected for prediction of BCR using an RS-based gene extraction algorithm. To conclude, combining different signatures demonstrated more stable prediction over independent datasets.
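The evaluation metric in this abstract, the misclassification error rate under k-fold cross-validation, can be illustrated with a minimal sketch. The abstract does not name the classifier, so the nearest-centroid rule below is an assumption chosen for brevity, and the two-feature toy data stands in for real expression values.

```python
from typing import Dict, List, Sequence


def nearest_centroid_predict(train_x: Sequence[Sequence[float]],
                             train_y: Sequence[int],
                             sample: Sequence[float]) -> int:
    """Assign `sample` to the class whose training centroid is closest.

    Assumption: a stand-in classifier; the abstract does not specify one.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = -1, float("inf")
    for label, acc in sums.items():
        # Squared Euclidean distance to the class centroid.
        dist = sum((v - s / counts[label]) ** 2 for v, s in zip(sample, acc))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validated_mcr(x: Sequence[Sequence[float]],
                        y: Sequence[int],
                        k: int = 10) -> float:
    """Misclassification rate: wrong held-out predictions / all samples."""
    n = len(x)
    errors = 0
    for fold in range(k):
        # Deterministic folds: sample i belongs to fold i % k.
        test_idx = set(range(fold, n, k))
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_x, train_y, x[i]) != y[i]:
                errors += 1
    return errors / n


# Hypothetical toy data: two well-separated classes, two "genes" each.
toy_x = [[i * 0.1, i * 0.1] for i in range(10)] + \
        [[5 + i * 0.1, 5 + i * 0.1] for i in range(10)]
toy_y = [0] * 10 + [1] * 10
mcr = cross_validated_mcr(toy_x, toy_y, k=10)
print(f"10-fold MCR: {mcr:.2f}")
```

Each sample is held out exactly once, so the MCR is the fraction of all samples misclassified while unseen by the model.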

    Using Protein Interaction Database and Support Vector Machines to Improve Gene Signatures for Prediction of Breast Cancer Recurrence

    Numerous studies have used microarray gene expression data to extract metastasis-driving gene signatures for the prediction of breast cancer relapse. However, the accuracy and generality of the previously introduced biomarkers are not acceptable for reliable use in independent datasets. This inadequacy is attributed to simple feature selection methods ignoring gene interactions because of the computational burden of modeling them. In this study, an integrated approach with low computational cost was proposed for identifying a more predictive gene signature for the prediction of breast cancer recurrence. First, a small set of genes was selected as the primary signature by an appropriate filter feature selection (FFS) method. Then, a binary sub-class of the protein-protein interaction (PPI) network was used to expand the primary set by adding the adjacent proteins of each signature gene from the PPI network. Subsequently, the support vector machine-based recursive feature elimination (SVMRFE) method was applied to the expression levels of all the genes in the expanded set. Finally, the genes with the highest SVMRFE scores were selected as the new biomarkers. The accuracy of the final selected biomarkers was evaluated by classifying four datasets of breast cancer patients, comprising 800 cases, into two cohorts of poor and good prognosis. The results of the five-fold cross-validation test, using the support vector machine as a classifier, showed more than 13% improvement in average accuracy after modifying the primary selected signatures. Moreover, the method used in this study showed a lower computational cost than other PPI-based methods. The proposed method produced more robust and accurate biomarkers using the PPI network, at a low computational cost. This approach could be used as a supplementary procedure in microarray studies after applying various gene selection methods.
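The network expansion step described in this abstract (adding the direct PPI neighbours of each signature gene) can be sketched as follows. This is a minimal illustration, assuming the PPI network is available as an adjacency mapping. The gene identifiers and the toy network are hypothetical, and the later SVMRFE ranking is not shown.

```python
from typing import Dict, Iterable, Set


def expand_signature(primary: Iterable[str],
                     ppi: Dict[str, Set[str]]) -> Set[str]:
    """Return the primary genes plus their direct (one-hop) PPI neighbours."""
    seeds = set(primary)
    expanded = set(seeds)
    for gene in seeds:
        # Genes missing from the network simply contribute no neighbours.
        expanded |= ppi.get(gene, set())
    return expanded


# Hypothetical toy network; undirected edges are stored in both directions.
toy_ppi: Dict[str, Set[str]] = {
    "G1": {"G2", "G3"},
    "G2": {"G1"},
    "G3": {"G1", "G4"},
    "G4": {"G3"},
    "G5": set(),
}

primary_signature = ["G1", "G5"]
expanded = expand_signature(primary_signature, toy_ppi)
print(sorted(expanded))
```

Only adjacent proteins are added, so `G4`, which is two hops from `G1`, stays out of the expanded set. In the workflow the abstract describes, this expanded set would then be ranked by SVMRFE.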

    Applying Two Different Bioinformatic Approaches to Discover Novel Genes Associated with Hereditary Hearing Loss via Whole-Exome Sequencing: ENDEAVOUR and HomozygosityMapper

    Background: Hearing loss (HL) is a highly prevalent, heterogeneous deficiency of the sensory-neural system involving several dozen genes. Whole-exome sequencing (WES) is capable of discovering known and novel genes involved in HL. Materials and Methods: Two pedigrees with an HL background from Khuzestan province of Iran were selected. Polymerase chain reaction-sequencing of GJB2 and homozygosity mapping of 16 DFNB loci were performed. One patient from the first pedigree and two affected individuals from the second pedigree were subjected to WES. The result files were analyzed using tools on Ubuntu 16.04. Short reads were mapped to the reference genome (hg19, NCBI Build 37). Sorting and duplicate removal were performed. Variants were called and annotated with an online software tool. Variant filtration was performed. In the first family, ENDEAVOUR was applied to prioritize candidate genes. In the second family, a combination of shared variants, homozygosity mapping, and gene expression was used to identify the disease-causing gene. Results: GJB2 sequencing and linkage analysis established no homozygosity-by-descent at any of the DFNB loci. Using ENDEAVOUR, BBX: c.C857G (p.A286G) and MYH15: c.C5557T (p.R1853C) were put forward, but neither variant co-segregated with the phenotype. Two genes, UNC13B and TRAK1, were prioritized in the homozygous regions detected by HomozygosityMapper. Conclusion: WES is regarded as a powerful approach for discovering the molecular etiology of Mendelian inherited disorders. However, because it fails to enrich GC-rich regions, cannot capture noncoding regulatory regions, and relies on copy number variation detection tools with limited specificity and accuracy on exome data, it is considered an insufficient procedure.
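The idea behind the homozygosity mapping mentioned in this abstract can be illustrated with a simplified sketch that scans ordered markers for stretches where all affected individuals are homozygous for the same allele. This is not the HomozygosityMapper algorithm. The genotype encoding, the single-chromosome input, and the `min_markers` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

HOMOZYGOUS = ("0/0", "1/1")


def shared_homozygous_runs(markers: Sequence[Tuple[int, Sequence[str]]],
                           min_markers: int = 3) -> List[Tuple[int, int, int]]:
    """Find runs of consecutive markers homozygous and identical in everyone.

    `markers` is a position-sorted list of (position, genotype calls for the
    affected individuals) on one chromosome. Returns (start, end, n_markers).
    """
    runs: List[Tuple[int, int, int]] = []
    start, last, count = None, None, 0
    for position, calls in markers:
        shared = len(set(calls)) == 1 and calls[0] in HOMOZYGOUS
        if shared:
            if start is None:
                start, count = position, 0
            count += 1
            last = position
        else:
            # A heterozygous or discordant marker ends the current run.
            if start is not None and count >= min_markers:
                runs.append((start, last, count))
            start = None
    if start is not None and count >= min_markers:
        runs.append((start, last, count))
    return runs


# Hypothetical genotypes for two affected siblings at eight markers.
toy_markers = [
    (100, ["1/1", "1/1"]),
    (200, ["0/0", "0/0"]),
    (300, ["1/1", "1/1"]),
    (400, ["0/1", "1/1"]),  # heterozygous call breaks the run
    (500, ["1/1", "1/1"]),
    (600, ["1/1", "0/0"]),  # homozygous but for different alleles
    (700, ["0/0", "0/0"]),
    (800, ["0/0", "0/0"]),
]
runs = shared_homozygous_runs(toy_markers, min_markers=3)
print(runs)
```

Candidate genes would then be sought inside the reported intervals. Real tools score runs by length and marker density rather than applying a fixed marker count.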

    Medium term load forecasting in distribution systems based on multi linear regression & principal component analysis: A novel approach

    An accurate medium term load forecast (MTLF) is essential for expansion planning studies of distribution systems. However, the mid-term electric load as a function of time has a complex nonlinear behavior, which makes ordinary linear prediction methods seem insufficient. In this paper, a combination of multi linear regression and principal component analysis is used to predict the weekly electrical peak load of the Yazd city distribution system. According to the prediction results, the main benefits of the proposed method are simplicity of calculation and high forecasting accuracy for multi-horizon predictions. MATLAB is used to implement the forecaster model.
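One common way to combine principal component analysis with linear regression is principal component regression: project the predictors onto their leading component, then regress the load on that score. The abstract does not say how the authors combine the two, so the sketch below is an assumption. It uses Python rather than the MATLAB implementation mentioned, and the predictors and load values are hypothetical toy numbers.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float], float, float]


def leading_component(centered: Sequence[Sequence[float]],
                      iterations: int = 200) -> List[float]:
    """First principal direction of centered data, via power iteration."""
    p = len(centered[0])
    # The scatter matrix is proportional to the covariance matrix,
    # so it has the same eigenvectors.
    scatter = [[sum(r[a] * r[b] for r in centered) for b in range(p)]
               for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def fit_pcr(x: Sequence[Sequence[float]], y: Sequence[float]) -> Model:
    """Regress y on the score along the first principal component of x."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    centered = [[v - m for v, m in zip(row, means)] for row in x]
    direction = leading_component(centered)
    scores = [sum(c * d for c, d in zip(row, direction)) for row in centered]
    y_mean = sum(y) / n
    # Scores have zero mean, so the least-squares slope has this closed form.
    slope = (sum(s * (t - y_mean) for s, t in zip(scores, y))
             / sum(s * s for s in scores))
    return means, direction, slope, y_mean


def predict_pcr(model: Model, row: Sequence[float]) -> float:
    means, direction, slope, y_mean = model
    score = sum((v - m) * d for v, m, d in zip(row, means, direction))
    return y_mean + slope * score


# Hypothetical weekly data: two perfectly collinear predictors, where
# ordinary multiple regression would be ill-posed, and a peak load.
weekly_x = [[float(t), 2.0 * t] for t in range(1, 7)]
weekly_y = [3.0 * t + 1.0 for t in range(1, 7)]
model = fit_pcr(weekly_x, weekly_y)
forecast = predict_pcr(model, [7.0, 14.0])
print(f"forecast for next week: {forecast:.2f}")
```

Real load data would normally be standardized first and would use several components. A single component keeps the example short.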
