    PKMiner: a database for exploring type II polyketide synthases

    Reuse of imputed data in microarray analysis increases imputation efficiency

    BACKGROUND: Imputing missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the values they imputed had not been fully checked. RESULTS: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and uses the imputed values in later imputations. Despite this reuse of imputed values, the new method greatly improves on the conventional KNN-based method and on methods based on maximum likelihood estimation in both accuracy and computational complexity. SKNN outperformed the other imputation methods particularly on data with high missing rates and large numbers of experiments. Applying Expectation Maximization (EM) to the SKNN method improved accuracy but increased computation time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, was about as accurate as SKNN, with a slightly stronger dependence on the type of data set. CONCLUSIONS: Sequential reuse of imputed data in KNN-based imputation greatly increases imputation efficiency. The SKNN method should be practically useful for salvaging data from microarray experiments with large numbers of missing entries, and it generates reliable imputed values that can be used in further cluster-based analysis of microarray data.
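
    The sequential idea can be illustrated with a minimal Python sketch. It assumes that missing values are marked as None, that at least one complete gene exists to seed the reference set, that distances are Euclidean over the target gene's observed experiments, and that the K neighbours are combined by inverse-distance weighting. The function name sknn_impute and these choices are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one gene's values across experiments; None = missing


def sknn_impute(data: List[Row], k: int = 3) -> List[List[float]]:
    """Sketch of sequential KNN imputation (hypothetical helper, not the authors' code)."""
    n_missing = [sum(v is None for v in row) for row in data]
    result: List[List[float]] = [[] for _ in data]
    reference: List[List[float]] = []

    # Genes with no missing values seed the reference set.
    for i, row in enumerate(data):
        if n_missing[i] == 0:
            result[i] = [float(v) for v in row]
            reference.append(result[i])
    if not reference:
        raise ValueError("at least one complete gene is needed to start")

    # Impute incomplete genes in order of increasing missing count.
    order = sorted((i for i in range(len(data)) if n_missing[i] > 0),
                   key=lambda i: n_missing[i])
    for i in order:
        row = data[i]
        observed = [j for j, v in enumerate(row) if v is not None]
        # Euclidean distance computed only over this gene's observed experiments.
        dists = [math.sqrt(sum((row[j] - ref[j]) ** 2 for j in observed))
                 for ref in reference]
        nearest = sorted(range(len(reference)), key=lambda r: dists[r])[:k]
        # Inverse-distance weights (assumed); the epsilon avoids division by zero.
        weights = [1.0 / (dists[r] + 1e-9) for r in nearest]
        total = sum(weights)
        filled = [
            float(v) if v is not None
            else sum(w * reference[r][j] for w, r in zip(weights, nearest)) / total
            for j, v in enumerate(row)
        ]
        result[i] = filled
        # Reuse: the newly imputed gene becomes a candidate neighbour for later genes.
        reference.append(filled)
    return result
```

    For example, with `sknn_impute([[1.0, 1.0, 1.0], [5.0, None, 5.0], [5.0, 5.0, None]], k=1)`, the second gene is filled from the only complete gene, and the third gene then takes its nearest neighbour from the second gene's imputed profile instead of the original complete gene. This reuse of imputed values is what the abstract describes.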

    FiGS: a filter-based gene selection workbench for microarray data

    BACKGROUND: Selecting genes that discriminate disease classes in microarray data is widely used to identify diagnostic biomarkers. Although various gene selection methods are available and some have shown excellent performance, no single method performs best on all types of microarray datasets. It is therefore desirable to take a comparative approach that rigorously tests different methodological strategies on a given microarray dataset and retains the best gene selection result. RESULTS: FiGS is a web-based workbench that automatically compares various gene selection procedures and returns the optimal gene selection result for an input microarray dataset. FiGS builds diverse gene selection procedures by pairing different feature selection techniques with different classifiers. Beyond these well-established techniques, FiGS further diversifies the procedures by offering gene clustering options in the feature selection step and different data pre-processing options in the classifier training step. All candidate procedures are evaluated by their .632+ bootstrap errors and listed with their classification accuracies and selected gene sets. FiGS runs on parallel computing nodes that can handle heavy computation. FiGS is freely accessible at http://gexp.kaist.ac.kr/figs. CONCLUSION: FiGS is a web-based application that automates an extensive search for the optimal gene selection analysis of a microarray dataset in a parallel computing environment. FiGS provides an efficient and comprehensive means of obtaining optimal gene sets that discriminate disease states in microarray datasets.
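
    The .632+ bootstrap error used to rank the candidate procedures can be sketched as follows, using the standard definition for 0/1 loss (Efron and Tibshirani). The estimator blends the apparent error with the leave-one-out bootstrap error and shifts the weight toward the latter as overfitting grows. The nearest-centroid classifier, the function names, and n_boot=50 are illustrative assumptions, not FiGS's actual classifiers or settings.

```python
import random
from collections import Counter
from typing import Callable, List

Predictor = Callable[[List[float]], int]
Fit = Callable[[List[List[float]], List[int]], Predictor]


def nearest_centroid(X: List[List[float]], y: List[int]) -> Predictor:
    """Toy classifier used only for illustration: predict the closest class mean."""
    counts = Counter(y)
    sums: dict = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x: List[float]) -> int:
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def err_632plus(X: List[List[float]], y: List[int], fit: Fit,
                n_boot: int = 50, seed: int = 0) -> float:
    n = len(y)
    rng = random.Random(seed)

    # Apparent (resubstitution) error of a model trained on all samples.
    model = fit(X, y)
    preds = [model(x) for x in X]
    err_bar = sum(p != t for p, t in zip(preds, y)) / n

    # Leave-one-out bootstrap error: each sample is scored only by models
    # whose bootstrap training set did not contain it.
    losses: List[List[int]] = [[] for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model_b = fit([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                losses[i].append(int(model_b(X[i]) != y[i]))
    per_sample = [sum(ls) / len(ls) for ls in losses if ls]
    err1 = sum(per_sample) / len(per_sample)

    # No-information error rate gamma for 0/1 loss.
    p, q = Counter(y), Counter(preds)
    gamma = sum((p[c] / n) * (1 - q[c] / n) for c in p)

    # Relative overfitting rate R and the .632+ weight w.
    err1 = min(err1, gamma)
    R = (err1 - err_bar) / (gamma - err_bar) if err1 > err_bar and gamma > err_bar else 0.0
    w = 0.632 / (1 - 0.368 * R)
    return (1 - w) * err_bar + w * err1
```

    Because R lies between 0 and 1, the weight w ranges from 0.632 (no overfitting, the plain .632 estimator) up to 1 (the leave-one-out bootstrap error alone).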

    HMPAS: Human Membrane Protein Analysis System

    An interactive retrieval system for clinical trial studies with context-dependent protocol elements

    A well-defined protocol for a clinical trial guarantees a successful outcome report. When designing a protocol, most researchers consult electronic databases and extract protocol elements with keyword searches. However, state-of-the-art database systems offer only text-based searches for user-entered keywords. In this study, we present a database system with context-dependent protocol-element selection for designing clinical trial protocols. To this end, we first introduce a database for a protocol retrieval system, built from individual protocol data extracted from 184,634 clinical trials and 13,210 frame structures of clinical trial protocols. The database contains a variety of semantic information that allows protocols to be filtered during search. On top of this database, we developed a web application called the clinical trial protocol database system (CLIPS; available at https://corus.kaist.edu/clips). CLIPS enables interactive search using protocol elements: to support searches over combinations of elements, it offers optional next elements conditioned on the previously selected element, organized as a connected tree. Validation results show that our method outperforms existing databases in predicting phenotypic features.
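
    One way to picture the connected-tree selection is a prefix tree in which each path from the root is an ordered sequence of protocol elements, and the options offered after a selection are the children of the current node, ranked by how many indexed protocols pass through them. The sketch below assumes that model; the class name ProtocolTree and the element labels are hypothetical, and the abstract does not describe CLIPS's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    count: int = 0                                   # protocols passing through this node
    children: Dict[str, "Node"] = field(default_factory=dict)


class ProtocolTree:
    """Hypothetical prefix tree over ordered protocol elements."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, elements: Sequence[str]) -> None:
        """Index one protocol as a path of elements from the root."""
        node = self.root
        node.count += 1
        for el in elements:
            node = node.children.setdefault(el, Node())
            node.count += 1

    def next_options(self, selected: Sequence[str]) -> List[Tuple[str, int]]:
        """Elements that may follow the current selection, most frequent first."""
        node = self.root
        for el in selected:
            if el not in node.children:
                return []  # no indexed protocol matches this selection
            node = node.children[el]
        return sorted(((el, child.count) for el, child in node.children.items()),
                      key=lambda kv: (-kv[1], kv[0]))


# Hypothetical element labels, for illustration only.
tree = ProtocolTree()
tree.add(["Phase 2", "Randomized", "Placebo"])
tree.add(["Phase 2", "Randomized", "Active comparator"])
tree.add(["Phase 2", "Single arm"])
tree.add(["Phase 3", "Randomized", "Placebo"])
```

    Here `tree.next_options(["Phase 2"])` returns `[("Randomized", 2), ("Single arm", 1)]`, so each choice narrows the next set of options in the way the abstract describes.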

    Elucidation of Binding Determinants and Functional Consequences of Ras/Raf-Cysteine-rich Domain Interactions

    Raf-1 is a critical downstream target of Ras and contains two distinct domains that bind Ras. The first Ras-binding site (RBS1) in Raf-1 has been shown to be essential for Ras-mediated translocation of Raf-1 to the plasma membrane, whereas the second site, in the Raf-1 cysteine-rich domain (Raf-CRD), has been implicated in regulating Raf kinase activity. While recognition elements that promote Ras·RBS1 complex formation have been characterized, relatively little is known about Ras/Raf-CRD interactions. In this study, we characterized the interactions important for Ras binding to the Raf-CRD. Reconciling conflicting reports, we found that these interactions are essentially independent of the guanine nucleotide-bound state of Ras but are instead enhanced by its post-translational modification. Specifically, our findings indicate that Ras farnesylation is sufficient for stable association of Ras with the Raf-CRD. We also identified a Raf-CRD variant that is specifically impaired in its interactions with Ras, and NMR data suggest that residues near this mutation site on the Raf-CRD contact Ras. This Raf-CRD mutant impairs the ability of Ras to activate Raf kinase, providing further support for the view that Ras interactions with the Raf-CRD are important for Ras-mediated activation of Raf-1.

    CLIC: clustering analysis of large microarray datasets with individual dimension-based clustering

    Large microarray data sets have recently become common. However, most available clustering methods do not easily handle large microarray data sets because of their high computational complexity and memory requirements. Furthermore, typical clustering methods construct oversimplified clusters that ignore subtle but meaningful changes in the expression patterns present in large microarray data sets. An efficient clustering method is needed that identifies both absolute expression differences and expression profile patterns at different expression levels in large microarray data sets. This study presents CLIC, which meets these requirements for clustering analysis of large microarray data sets in particular, though it is not limited to them. CLIC is based on a novel concept: genes are first clustered in each individual dimension, and the ordinal labels of the clusters in each dimension are then used for further clustering across all dimensions. CLIC enables iterative sub-clustering into more homogeneous groups and identifies common expression patterns among genes that were separated into different groups because of large differences in their expression levels. In addition, the clustering computation is parallelized, the number of clusters is detected automatically, and functional enrichment is provided for each cluster and pattern. CLIC is freely available at http://gexp2.kaist.ac.kr/clic.
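
    The two-stage idea (cluster each dimension, then cluster on the resulting ordinal labels) can be sketched in Python. This sketch assumes 1-D k-means for the per-dimension step and, as the simplest possible dimension-wide step, groups genes that share the same vector of ordinal labels. These choices, k=3, and the function names are assumptions for illustration. CLIC's own algorithms, automatic detection of the number of clusters, and parallelization are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[int]:
    """Cluster one dimension and return ordinal labels (0 = lowest cluster centre)."""
    k = min(k, len(values))
    s = sorted(values)
    # Start the centres at evenly spaced quantiles of the sorted values.
    centres = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]

    def nearest(v: float) -> int:
        return min(range(k), key=lambda c: abs(v - centres[c]))

    for _ in range(iters):
        labels = [nearest(v) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break  # converged
        centres = new
    # Renumber clusters so that label order follows centre order.
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centres[c]))}
    return [rank[nearest(v)] for v in values]


def dimension_wise_clusters(data: List[List[float]],
                            k: int = 3) -> Dict[Tuple[int, ...], List[int]]:
    """Group genes (rows) whose per-dimension ordinal labels agree in every dimension."""
    n_dims = len(data[0])
    per_dim = [kmeans_1d([row[d] for row in data], k) for d in range(n_dims)]
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for g in range(len(data)):
        groups[tuple(per_dim[d][g] for d in range(n_dims))].append(g)
    return dict(groups)
```

    Because the labels are ordinal, a signature such as (0, 2) reads as "low in the first experiment, high in the second". This makes the dimension-wide step operate on expression levels instead of raw values.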

    Development and Validation of Tumor Immunogenicity Based Gene Signature for Skin Cancer Risk Stratification

    Melanoma is one of the most aggressive types of skin cancer, with significant heterogeneity in overall survival. Tumor-node-metastasis (TNM) staging is currently insufficient for accurate survival prediction and appropriate treatment decisions in several tumor types, including melanoma. More reliable prognostic biomarkers are therefore urgently needed. Recent studies have shown that low immune cell infiltration is significantly associated with unfavorable clinical outcomes in melanoma patients. Here we constructed a prognostic gene signature for melanoma risk stratification by quantifying the levels of several cancer hallmarks, and we identified Wnt/β-catenin pathway activation as a primary risk factor for low tumor immunity. A series of bioinformatics and statistical methods were combined to construct a Wnt-immune-related prognostic gene signature. Using this signature, we computed risk scores for individual patients that can predict overall survival. To evaluate the robustness of the result, we validated the signature in multiple independent GEO datasets. Finally, we established an overall-survival nomogram based on the gene signature and clinicopathological features. The Wnt-immune-related prognostic risk score predicted overall survival better than standard clinicopathological features. Our results provide a comprehensive map of the oncogene-immune-related gene signature, which can serve as a valuable biomarker for better clinical decision making.
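
    The abstract does not give the scoring formula. A common construction for such signatures, assumed here, is a linear risk score: each signature gene's expression is multiplied by a coefficient (for example, one from a Cox model) and the products are summed. Patients are then split into high- and low-risk groups at the median score. The gene names, coefficients, and expression values below are hypothetical placeholders, not the published signature.

```python
from statistics import median
from typing import Dict, List


def risk_scores(expression: List[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Linear risk score per patient: sum of coefficient * expression over signature genes."""
    return [sum(c * patient[gene] for gene, c in coefs.items()) for patient in expression]


def stratify(scores: List[float]) -> List[str]:
    """Label patients above the median score 'high' risk and the rest 'low' risk."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical two-gene signature and three patients (placeholder values).
coefs = {"GENE_A": 0.8, "GENE_B": -0.5}
patients = [
    {"GENE_A": 1.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0},
    {"GENE_A": 1.0, "GENE_B": 1.0},
]
scores = risk_scores(patients, coefs)
groups = stratify(scores)
```

    A positive coefficient raises the score (higher risk) as expression rises, and a negative one lowers it. Survival curves for the resulting groups would then be compared to assess the signature.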