A Comparative Study of Various Probability Density Estimation Methods for Data Analysis

Author: Assenza Alex, Valle Maurizio, Verleysen Michel
Publication date: 02/05/2013

Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

Author: Gilbert D, Reyes J A
Publication venue: JIB
Publication date: 01/01/2007

This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can be based on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of predicting PPI. OCC methods utilise examples of just one class to generate a predictive model, which consequently is independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.

CiteSeerX

Directory of Open Access Journals

Brunel University Research Archive

How to (and How Not to) Analyze Deficient Height Samples

Author: Komlos John
Publication date: 01/07/2003

Open Access LMU

Reliable microsatellite genotyping of the Eurasian badger (Meles meles) using faecal DNA

Author: Boom R, Bruford MW, Christian SF, Dytham C, Evett IW, Jansman HAH, Krebs JR, Miller CR, Murphy MA, Neal E, Otis DL, Sambrook J, Seymour SB, Valière N, Walsh PA, Woods JG
Publication venue: Wiley
Publication date: 01/06/2003

The potential link between badgers and bovine tuberculosis has made it vital to develop accurate techniques to census badgers. Here we investigate the potential of using genetic profiles obtained from faecal DNA as a basis for population size estimation. After trialling several methods, we obtained a high amplification success rate (89%) by storing faeces in 70% ethanol and using the guanidine thiocyanate/silica method for extraction. Using 70% ethanol as a storage agent had the added advantage that it is an antiseptic. In order to obtain reliable genotypes with fewer amplification reactions than the standard multiple-tubes approach, we devised a comparative approach in which genetic profiles were compared and replication was directed at similar, but not identical, genotypes. This modified method achieved a reduction in polymerase chain reactions comparable with the maximum-likelihood model when just using reliability criteria, and was slightly better when using reliability criteria with the additional proviso that alleles must be observed twice to be considered reliable. Our comparative approach would be best suited for studies that include multiple faeces from each individual. We applied our approach to a well-studied population of badgers from which individuals had been sampled and reliable genotypes obtained. In a study of 53 faeces sampled from three social groups over 10 days, we found that direct enumeration could not be used to estimate population size, but that the application of mark–recapture models has the potential to provide more accurate results.

Crossref

White Rose Research Online

Sussex Research Online

University of Queensland eSpace

Evaluating probabilistic forecasts with scoringRules

Author: Jordan Alexander, Krüger Fabian, Lerch Sebastian
Publication date: 30/07/2018

Probabilistic forecasts in the form of probability distributions over future events have become popular in several fields including meteorology, hydrology, economics, and demography. In typical applications, many alternative statistical models and data sources can be used to produce probabilistic forecasts. Hence, evaluating and selecting among competing methods is an important task. The scoringRules package for R provides functionality for comparative evaluation of probabilistic models based on proper scoring rules, covering a wide range of situations in applied work. This paper discusses implementation and usage details, presents case studies from meteorology and economics, and points to the relevant background literature.

arXiv.org e-Print Archive

Journal of Statistical Software

Bern Open Repository and Information System (BORIS)