64 research outputs found
Recommended from our members
Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data
This paper addresses the problem of a finite set of entities that are required to achieve a goal within a predefined deadline. For example, a group of students is supposed to submit a homework assignment by a specified cutoff. We are interested in predicting which entities will achieve the goal within the deadline. The predictive models are built based only on the data from that population. The predictions are computed at various time instants by taking into account updated data about the entities. The first contribution of the paper is a formal description of the problem. The important characteristic of the proposed method for model building is the use of the properties of entities that have already achieved the goal. We call such an approach “Self-Learning”. Since typically only a few entities have achieved the goal at the beginning and their number gradually grows, the problem is inherently imbalanced. To mitigate the curse of imbalance, we improved the Self-Learning method by tackling information loss and by applying several sampling techniques. The original Self-Learning and the modifications have been evaluated in a case study for predicting submission of the first assessment in distance higher education courses. The results show that the proposed improvements outperform the two specified baseline models and the original Self-Learner, and also that the best results are achieved if domain-driven techniques are utilised to tackle the imbalance problem. We also showed that these improvements are statistically significant using the Wilcoxon signed-rank test.
Ouroboros: early identification of at-risk students without models based on legacy data
This paper focuses on the problem of identifying students who are at risk of failing their course. The presented method proposes a solution in the absence of data from previous courses, which are usually used for training machine learning models. This situation typically occurs in new courses. We present the concept of a "self-learner" that builds the machine learning models from the data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models.
There are three main contributions of this paper: (1) the concept of training the models for identifying at-risk students using data from the current course, (2) specifying the problem as a classification task, and (3) tackling the challenge of imbalanced data, which appears both in training and testing data.
The results show a comparison with the traditional approach of learning the models from the legacy course data, validating the proposed concept.
The effect of methamphetamine on biotransformation of ethanol: pilot study
Our results suggest that repeated pre-treatment with methamphetamine led to a decrease in ethanol (ET) levels at time points between 0 and 120 minutes after acute alcohol administration.
SoluProt: prediction of soluble protein expression in Escherichia coli
Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins. Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies.
EnzymeMiner: Exploration of sequence space of enzymes
FireProt: web server for automated design of thermostable proteins
There is a continuous interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for the prediction of the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools and have to be followed by laborious protein expression, purification, and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.
EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities
Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Although genomic databases are growing exponentially, classical biochemical characterization techniques are time-demanding, cost-ineffective and low-throughput. Therefore, computational methods are being developed to explore the unmapped sequence space efficiently. Selection of putative enzymes for biochemical characterization based on rational and robust analysis of all available sequences remains an unsolved problem. To address this challenge, we have developed EnzymeMiner, a web server for automated screening and annotation of diverse family members that enables selection of hits for wet-lab experiments. EnzymeMiner prioritizes sequences that are more likely to preserve the catalytic activity and are heterologously expressible in a soluble form in Escherichia coli. The solubility prediction employs the in-house SoluProt predictor developed using machine learning. EnzymeMiner reduces the time devoted to data gathering, multi-step analysis, sequence prioritization and selection from days to hours. The successful use case for the haloalkane dehalogenase family is described in a comprehensive tutorial available on the EnzymeMiner web page.
Methods for separation of ammonia from waste water
The main aim of this thesis is to provide a comprehensive overview of most of the methods known so far for separating ammonia from wastewater. The methods are divided by technological principle into biological and physical-chemical. The thesis also examines them from a "conventional or progressive" point of view, that is, whether a method is already commonly used in practice or is still at an experimental stage. It further considers their ability to effectively eliminate or recover ammonia so that it can be reused. This is closely connected to the economic aspects of each method, which the thesis also addresses. Some of the methods are described in further detail because many modifications and improvements have been reported in recent years, namely air stripping and membrane methods. Bioelectrochemical systems are also discussed more closely as methods with great potential for the future. The last part of the thesis compares selected methods in terms of ammonia elimination or recovery efficiency and energy demand.
- …