
    Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

    This research addresses the prediction of protein-protein interactions (PPI) when integrating diverse kinds of biological information. The task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems that can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can rest on unreliable assumptions, which could bias the classification results. Here we propose the use of one-class classification (OCC) methods for the prediction of PPI. OCC methods utilise examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques under different scenarios, for example varying the number of negative examples used for training. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
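
The idea behind a Parzen one-class classifier can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the fixed `bandwidth`, the `reject_fraction` rule for choosing the threshold, and the synthetic three-dimensional feature vectors are all assumptions made for the example. The model is fitted on positive examples only, so no negative pairs are needed for training.

```python
import math
import random


def parzen_log_density(train, point, bandwidth):
    """Log of a Gaussian Parzen-window density estimate at `point`.

    `train` is a list of feature vectors for the positive class only.
    """
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2) - math.log(len(train))
    log_kernels = [
        -0.5 * sum((a - b) ** 2 for a, b in zip(row, point)) / bandwidth ** 2
        for row in train
    ]
    # log-sum-exp keeps the sum stable when every kernel value is tiny
    peak = max(log_kernels)
    return log_norm + peak + math.log(sum(math.exp(v - peak) for v in log_kernels))


def fit_threshold(train, bandwidth, reject_fraction=0.05):
    """Pick the score below which roughly `reject_fraction` of the positives fall."""
    scores = sorted(parzen_log_density(train, row, bandwidth) for row in train)
    return scores[int(reject_fraction * len(scores))]


def is_predicted_positive(train, point, bandwidth, threshold):
    """Accept a candidate pair if its estimated density reaches the threshold."""
    return parzen_log_density(train, point, bandwidth) >= threshold


# Hypothetical data: 200 known interacting pairs, each described by 3 features.
random.seed(0)
positives = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
threshold = fit_threshold(positives, bandwidth=0.5)

typical_pair = [0.0, 0.0, 0.0]   # lies in the dense region of the positives
unusual_pair = [8.0, 8.0, 8.0]   # far from every known interacting pair
print(is_predicted_positive(positives, typical_pair, 0.5, threshold))
print(is_predicted_positive(positives, unusual_pair, 0.5, threshold))
```

In practice the bandwidth would be tuned (for example by cross-validated likelihood on the positives), and the threshold trades off how many true interactions are rejected against how many unlikely pairs are accepted.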

    How to (and How Not to) Analyze Deficient Height Samples

    Reliable microsatellite genotyping of the Eurasian badger (Meles meles) using faecal DNA

    The potential link between badgers and bovine tuberculosis has made it vital to develop accurate techniques to census badgers. Here we investigate the potential of using genetic profiles obtained from faecal DNA as a basis for population size estimation. After trialling several methods we obtained a high amplification success rate (89%) by storing faeces in 70% ethanol and using the guanidine thiocyanate/silica method for extraction. Using 70% ethanol as a storage agent had the added advantage that it is an antiseptic. In order to obtain reliable genotypes with fewer amplification reactions than the standard multiple-tubes approach, we devised a comparative approach in which genetic profiles were compared and replication was directed at similar, but not identical, genotypes. This modified method achieved a reduction in polymerase chain reactions comparable with the maximum-likelihood model when just using reliability criteria, and was slightly better when using reliability criteria with the additional proviso that alleles must be observed twice to be considered reliable. Our comparative approach would be best suited to studies that include multiple faeces from each individual. We applied our approach to a well-studied population of badgers from which individuals had been sampled and reliable genotypes obtained. In a study of 53 faeces sampled from three social groups over 10 days, we found that direct enumeration could not be used to estimate population size, but that the application of mark–recapture models has the potential to provide more accurate results.
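
To show why mark-recapture models can succeed where direct enumeration fails, here is a small Python sketch that treats each distinct faecal genotype as a "mark". The abstract does not say which mark-recapture model was used, so the two-session design and the Chapman-corrected Lincoln-Petersen estimator below are assumptions chosen for simplicity, and the genotype labels are invented.

```python
def chapman_estimate(first_session, second_session):
    """Chapman-corrected Lincoln-Petersen estimate of population size.

    Each argument is an iterable of genotype identifiers detected in one
    sampling session; repeated detections of a genotype count once.
    """
    first = set(first_session)
    second = set(second_session)
    recaptured = len(first & second)  # genotypes seen in both sessions
    return (len(first) + 1) * (len(second) + 1) / (recaptured + 1) - 1


# Hypothetical genotype labels, not data from the study.
session_one = ["G01", "G02", "G03", "G04", "G05", "G06", "G02"]
session_two = ["G04", "G05", "G06", "G07", "G08"]

directly_counted = len(set(session_one) | set(session_two))
estimated = chapman_estimate(session_one, session_two)
print(directly_counted, estimated)
```

Direct enumeration only counts the individuals that happened to be sampled, whereas the estimator uses the proportion of re-detected genotypes to infer how many individuals were missed.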

    Evaluating probabilistic forecasts with scoringRules

    Probabilistic forecasts in the form of probability distributions over future events have become popular in several fields including meteorology, hydrology, economics, and demography. In typical applications, many alternative statistical models and data sources can be used to produce probabilistic forecasts. Hence, evaluating and selecting among competing methods is an important task. The scoringRules package for R provides functionality for comparative evaluation of probabilistic models based on proper scoring rules, covering a wide range of situations in applied work. This paper discusses implementation and usage details, presents case studies from meteorology and economics, and points to the relevant background literature.
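
As an illustration of what a proper scoring rule computes, the Python sketch below evaluates two standard scores for a normal forecast distribution: the continuous ranked probability score (CRPS), using its well-known closed form, and the logarithmic score. It is written in Python for consistency with the other examples and does not reproduce the scoringRules API; the function names and the simulated observations are assumptions. Both scores are negatively oriented, so lower is better.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def crps_normal(y, mean, sd):
    """Closed-form CRPS of the forecast N(mean, sd^2) for observation y."""
    z = (y - mean) / sd
    return sd * (z * (2 * normal_cdf(z) - 1) + 2 * normal_pdf(z) - 1 / math.sqrt(math.pi))


def log_score_normal(y, mean, sd):
    """Negative log density of the forecast N(mean, sd^2) at observation y."""
    z = (y - mean) / sd
    return 0.5 * math.log(2 * math.pi) + math.log(sd) + 0.5 * z * z


def mean_score(score, observations, mean, sd):
    """Average a scoring rule over a set of observations."""
    return sum(score(y, mean, sd) for y in observations) / len(observations)


# Simulated observations from N(0, 1); compare a well-specified forecast
# with a biased one and an overdispersed one.
random.seed(0)
observations = [random.gauss(0.0, 1.0) for _ in range(2000)]
for mean, sd in [(0.0, 1.0), (1.0, 1.0), (0.0, 3.0)]:
    print(mean, sd,
          mean_score(crps_normal, observations, mean, sd),
          mean_score(log_score_normal, observations, mean, sd))
```

Because both rules are proper, the forecast that matches the data-generating distribution is expected to receive the lowest average score, which is what makes such scores suitable for ranking competing forecasting methods.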