38 research outputs found

    Ouroboros: early identification of at-risk students without models based on legacy data

    This paper addresses the problem of identifying students who are at risk of failing their course. The proposed method works in the absence of data from previous courses, which are usually used for training machine learning models; this situation typically occurs in new courses. We present the concept of a "self-learner" that builds the machine learning models from data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models. The paper makes three main contributions: (1) the concept of training the models for identifying at-risk students using data from the current course, (2) specifying the problem as a classification task, and (3) tackling the challenge of imbalanced data, which appears in both the training and the testing data. The results are compared with the traditional approach of learning the models from legacy course data, validating the proposed concept.
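The class imbalance mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that students are labelled 1 if they have already submitted an assessment and 0 otherwise, and it applies inverse-frequency class weighting. This is one common remedy for imbalance and is not necessarily the technique used in the paper; the function name and the data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights
    below 1, so each class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical snapshot taken early in the course:
# 1 = assessment already submitted, 0 = not yet submitted.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, the two early submitters carry the same total weight as the eight students who have not yet submitted, which keeps a classifier from simply predicting the majority class.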

    FireProt: web server for automated design of thermostable proteins

    There is continuing interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for predicting the effect of mutations on protein stability have been developed recently. However, the existing tools typically predict only single-point mutations with a small effect on protein stability, and these predictions have to be followed by laborious protein expression, purification, and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify the designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.

    SoluProt: prediction of soluble protein expression in Escherichia coli

    Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins. Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies.

    EnzymeMiner: Exploration of sequence space of enzymes


    EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities

    Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Although genomic databases are growing exponentially, classical biochemical characterization techniques are time-demanding, cost-ineffective and low-throughput. Therefore, computational methods are being developed to explore the unmapped sequence space efficiently. Selection of putative enzymes for biochemical characterization based on rational and robust analysis of all available sequences remains an unsolved problem. To address this challenge, we have developed EnzymeMiner, a web server for automated screening and annotation of diverse family members that enables selection of hits for wet-lab experiments. EnzymeMiner prioritizes sequences that are more likely to preserve the catalytic activity and to be heterologously expressible in a soluble form in Escherichia coli. The solubility prediction employs the in-house SoluProt predictor developed using machine learning. EnzymeMiner reduces the time devoted to data gathering, multi-step analysis, sequence prioritization and selection from days to hours. A successful use case for the haloalkane dehalogenase family is described in a comprehensive tutorial available on the EnzymeMiner web page.

    System for functional annotation of single nucleotide polymorphisms

    A single nucleotide polymorphism is a substitution of one nucleotide in the DNA sequence that may or may not have phenotypic consequences. Here we describe a new system for ranking non-synonymous protein substitutions by their deleterious effects. The computational core of the proposed system is based on a rational combination of the results from a selected subset of publicly available tools. The weight coefficients for the individual tools are calculated on the basis of their confidence scores, and their reliabilities are assigned according to their performance measured on an extensive dataset. Validation on a dataset consisting of 5,000 substitutions shows that the overall accuracy of the system improved by 6% in comparison with a simple majority vote.
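The difference between a simple majority vote and a weighted combination of tools can be sketched as follows. This is an illustrative sketch only: the abstract does not give the actual weighting formula, so the product of a per-prediction confidence and a per-tool reliability is an assumption, and the tool names and numbers are hypothetical.

```python
def majority_vote(predictions):
    """Return True (deleterious) if more tools vote deleterious than neutral.

    A tie is resolved as neutral (False).
    """
    votes = sum(1 if deleterious else -1 for deleterious in predictions.values())
    return votes > 0


def weighted_vote(predictions, confidences, reliabilities):
    """Combine tool outputs using weights = confidence * reliability.

    The product form is an assumption made for illustration; it is not
    taken from the described system.
    """
    score = 0.0
    for tool, deleterious in predictions.items():
        weight = confidences[tool] * reliabilities[tool]
        score += weight if deleterious else -weight
    return score > 0


# Hypothetical outputs of three tools for one substitution.
predictions = {"tool_a": True, "tool_b": False, "tool_c": False}
confidences = {"tool_a": 0.95, "tool_b": 0.55, "tool_c": 0.60}
reliabilities = {"tool_a": 0.90, "tool_b": 0.60, "tool_c": 0.65}

print(majority_vote(predictions))
print(weighted_vote(predictions, confidences, reliabilities))
```

In this made-up case the two less reliable, less confident tools outvote the third under majority voting, while the weighted scheme lets the single confident and reliable tool decide the outcome.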

    Distributed information system as a system of asynchronous concurrent processes

    Enterprise information systems are nowadays designed as distributed network systems, in which existing information systems and new components are connected via middleware. In most cases, the architectures of these systems can be described informally or semiformally by means of common design tools. However, there are also critical applications involving an information system for which a formal architecture specification is necessary. This paper describes the design of a framework for distributed information systems with a mobile architecture and outlines its implementation. The framework automatically derives a formal specification from the implementation of a system, without requiring an explicit formal description in the design phase of the project. The derived specification can be used for a quick formal proof of correctness after radical changes in the implementation phase, without the need to maintain a formal design.