
    Revisiting the Training of Logic Models of Protein Signaling Networks with a Formal Approach based on Answer Set Programming

    A fundamental task in systems biology is the construction of mathematical models and their training to data. Logic formalisms have become very popular for modeling signaling networks because their simplicity allows us to model large systems encompassing hundreds of proteins. An approach to train (Boolean) logic models to high-throughput phospho-proteomics data was recently introduced and solved using optimization heuristics based on stochastic methods. Here we demonstrate how this problem can be solved using Answer Set Programming (ASP), a declarative problem-solving paradigm in which a problem is encoded as a logic program such that its answer sets represent solutions to the problem. ASP offers significant improvements over heuristic methods in terms of efficiency and scalability: it guarantees global optimality of solutions and provides a complete set of solutions. We illustrate the application of ASP with in silico cases based on realistic networks and data.
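
    The contrast between heuristic search and exhaustive, globally optimal enumeration can be illustrated with a small sketch. It is not the ASP encoding from the paper: it is a brute-force stand-in in Python, and the candidate gates, the two-input network and the observations are all hypothetical. It only shows what "all models with minimal mismatch to the data" means.

```python
# Hypothetical candidate Boolean gates for one readout with two inputs, a and b.
# A real ASP encoding would describe this search space declaratively instead.
CANDIDATES = {
    "a": lambda a, b: a,
    "b": lambda a, b: b,
    "a AND b": lambda a, b: a and b,
    "a OR b": lambda a, b: a or b,
    "NOT a": lambda a, b: not a,
}


def mismatch(gate, observations):
    """Count observations (a, b, readout) that the gate fails to reproduce."""
    return sum(int(bool(gate(a, b)) != bool(y)) for a, b, y in observations)


def all_optimal_models(observations):
    """Return the minimal mismatch and every candidate gate that attains it."""
    errors = {name: mismatch(gate, observations) for name, gate in CANDIDATES.items()}
    best = min(errors.values())
    return best, sorted(name for name, err in errors.items() if err == best)


# Made-up observations: (input a, input b, measured readout).
toy_data = [(1, 1, 1), (0, 0, 0)]
print(all_optimal_models(toy_data))

# One extra hypothetical experiment narrows the set of optimal models.
print(all_optimal_models(toy_data + [(1, 0, 1)]))
```

    With only two observations, several gates fit equally well, which is why returning the complete set of optimal models, rather than a single heuristic solution, is informative.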

    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal with respect to the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
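
    As a rough illustration of the concordance index mentioned above, the following Python sketch computes a simple Harrell-style estimate for right-censored data, together with a sigmoid-smoothed variant. The abstract does not specify the estimator or the smoothing used in the paper, so the function names, the pair-counting rule, the `sigma` parameter and the toy data are assumptions made for illustration only.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _comparable_pairs(times, events):
    """Yield (i, j) where subject i had an observed event before time j."""
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def concordance_index(times, events, risk):
    """Fraction of comparable pairs in which the earlier event has the higher risk."""
    pairs = list(_comparable_pairs(times, events))
    if not pairs:
        raise ValueError("no comparable pairs")
    score = 0.0
    for i, j in pairs:
        if risk[i] > risk[j]:
            score += 1.0
        elif risk[i] == risk[j]:
            score += 0.5  # ties in the risk score count as half
    return score / len(pairs)


def smoothed_concordance_index(times, events, risk, sigma=0.1):
    """Replace the pairwise indicator with a sigmoid so the criterion is differentiable."""
    pairs = list(_comparable_pairs(times, events))
    if not pairs:
        raise ValueError("no comparable pairs")
    return sum(_sigmoid((risk[i] - risk[j]) / sigma) for i, j in pairs) / len(pairs)


# Made-up data: survival times, event indicators (0 = censored), risk scores.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [4.0, 3.0, 2.0, 1.0]
print(concordance_index(times, events, risk))
print(smoothed_concordance_index(times, events, risk))
```

    The smoothed variant approaches the plain index as `sigma` shrinks; a differentiable surrogate of this kind is what makes gradient-based boosting on the concordance index feasible.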

    A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots

    This work explores a new approach to using a genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Seeding the genetic algorithm with optimal pseudoknot-free structures is therefore more likely to lead it to the true structure than starting from a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation is used for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it outperforms some of the currently popular algorithms.

    Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

    Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of the state-of-the-art approaches, based on physico-chemical foundations, to modeling the nucleic acid hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized.

    Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes.

    RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies.

    Essential guidelines for computational method benchmarking

    In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

    Development of Artificial Intelligence systems as a prediction tool in ovarian cancer

    PhD Thesis. Ovarian cancer is the 5th most common cancer in females and the UK has one of the highest incidence rates in Europe. In the UK only 36% of patients will live for at least 5 years after diagnosis. The number of prognostic markers, treatments and sequences of treatments in ovarian cancer is rising. Therefore, it is getting more difficult for the human brain to perform clinical decision making. There is a need for an expert computer system (e.g. Artificial Intelligence (AI)) that is capable of investigating the possible outcomes for each marker, treatment and sequence of treatment. Such expert systems may provide a tool which could help clinicians to analyse and predict outcome using different treatment pathways. Whilst prediction of a patient's overall survival is difficult, it may bring benefits, as this is not only useful information for the patient but may also determine treatment modality. In this project a dataset was constructed of 352 patients who had been treated at a single centre. Clinical data were extracted from the health records. Expert systems were then investigated to determine the optimum model to predict overall survival of a patient. The five-year survival period (a standard survival outcome measure in cancer research) was investigated; in addition, the system was developed with the flexibility to predict patient survival rates for many other categories. Comparisons with currently used prognostic models in ovarian cancer demonstrated a significant improvement in performance for the AI model (Area under the Curve (AUC) of 0.72 for AI and AUC of 0.62 for the statistical model). Using various methods, the most important variables in this prediction were identified as: FIGO stage, outcome of the surgery and CA125. This research investigated the effects of increasing the number of cases in prediction models. Results indicated that by increasing the number of cases, the prediction performance improved. Categorization of continuous data did not improve the prediction performance. The project next investigated the possibility of predicting surgical outcomes in ovarian cancer using AI, based on the variables that are available to clinicians prior to the surgery. Such a tool could have direct clinical relevance. Diverse models that can predict the outcome of the surgery were investigated and developed. The developed AI models were also compared against the standard statistical prediction model, and outperformed it in each task: the prediction of all outcomes (complete, optimal or suboptimal) (AUC of AI: 0.71 and AUC of statistical model: 0.51), the prediction of complete or optimal cytoreduction versus suboptimal cytoreduction (AUC of AI: 0.73 and AUC of statistical model: 0.50) and finally the prediction of complete cytoreduction versus optimal or suboptimal cytoreduction (AUC of AI: 0.79 and AUC of statistical model: 0.47). The most important variables for this prediction were identified as: FIGO stage, tumour grade and histology. The application of transcriptomic analysis to cancer research raises the question of which genes are significantly involved in a particular cancer and which genes can accurately predict survival outcomes in a given cancer. Therefore, AI techniques were employed to identify the most important genes for the prediction of Homologous Recombination (HR), an important DNA repair pathway in ovarian cancer, identifying LIG1 and POLD3 as novel prognostic biomarkers. Finally, AI models were used to predict the HR status for any given patient (AUC: 0.87). This project has demonstrated that AI may have an important role in ovarian cancer. AI systems may provide tools to help clinicians and researchers in ovarian cancer and may allow more informed decisions resulting in better management of this cancer.
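
    For readers less familiar with the AUC figures quoted above, the following Python sketch shows one common way to compute the area under the ROC curve: as the fraction of positive/negative pairs in which the positive case receives the higher score. It is a generic illustration, not the thesis's evaluation code, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: model outputs, higher = more likely 1.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted scores for four patients.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

    Under this reading, an AUC near 0.5 corresponds to ranking no better than chance, which helps put the reported values of 0.47 to 0.51 for the statistical model in context.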