973 research outputs found

How does predicate invention affect human comprehensibility?

Author: AA Freitas
AM Turing
B Letham
JR Quinlan
KD Forbus
L Sterling
M Mozina
SH Muggleton
SH Muggleton
SH Muggleton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

City Research Online

Crossref

Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant

Author: P Weiland
S Jaenicke
R Wirth
M Zakrzewski
Y Stolze
A Schlüter
U Gubler
AM Bolger
S Boisvert
B Langmead
H Li
D Hyatt
M Kanehisa
C Camacho
AR Quinlan
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2008

Bremges A, Maus I, Belmann P, et al. Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant. GigaScience. 2015;4(1):33.

Background: The production of biogas takes place under anaerobic conditions and involves microbial decomposition of organic matter. Most of the participating microbes are still unknown and non-cultivable. Accordingly, shotgun metagenome sequencing currently is the method of choice to obtain insights into community composition and the genetic repertoire.

Findings: Here, we report on the deeply sequenced metagenome and metatranscriptome of a complex biogas-producing microbial community from an agricultural production-scale biogas plant. We assembled the metagenome and, as an example application, show that we reconstructed most genes involved in the methane metabolism, a key pathway involving methanogenesis performed by methanogenic Archaea. This result indicates that there is sufficient sequencing coverage for most downstream analyses.

Conclusions: Sequenced at least one order of magnitude deeper than previous studies, our metagenome data will enable new insights into community composition and the genetic potential of important community members. Moreover, mapping of transcripts to reconstructed genome sequences will enable the identification of active metabolic pathways in target organisms.

Crossref

Springer - Publisher Connector

Adelaide Research & Scholarship

Queensland University of Technology ePrints Archive

Publications at Bielefeld University

Using Bayesian Networks and Machine Learning to Predict Computer Science Success

Author: A Gupta
AM Shahiri
C Romero
E Osmanbegović
J Friedman
J Heaton
JR Quinlan
KB Korb
L Breiman
M Hall
MS Andrade
R Asif
RS Baker
RS Baker
S Boughorbel
W Xing
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2019

Bayesian Networks and Machine Learning techniques were evaluated and compared for predicting academic performance of Computer Science students at the University of Cape Town. Bayesian Networks performed similarly to other classification models. The causal links inherent in Bayesian Networks allow for understanding of the contributing factors for academic success in this field. The most effective indicators of success in first-year ‘core’ courses in Computer Science included the student’s scores for Mathematics and Physics as well as their aptitude for learning and their work ethos. It was found that unsuccessful students could be identified with ≈91% accuracy. This could help to increase throughput as well as student wellbeing at university.

Crossref

UCT Computer Science Research Document Archive

Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases

Author: AM Dattilo
B Brouwers
C Haddow
C Manning
C Sudlow
CS Fox
IH Witten
IH Witten
J Faith
J Gu
J-P Després
JR Quinlan
JR Quinlan
M Bekkar
M Uusitupa
N Landwehr
NV Chawla
P Golabi
Q Yang
S Maheshwari
S Sam
S-H Chin
T Ayer
T Jonsdottir
YC Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

The problem of classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used in the areas of risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) based on several features. Like statistical inference models, ML models are subject to the common problem of class imbalance. As a result, they are biased toward the majority class, which increases the false-negative rate. In this paper, we built and evaluated eighteen ML models classifying approximately 4300 female participants from the UK Biobank into three categorical risk statuses based on responses for the discretised visceral adipose tissue values from magnetic resonance imaging. We also examined the effect of sampling techniques on classification modelling when dealing with class imbalance. Results showed that the use of sampling techniques had a significant impact. They not only drove an improvement in predicting patients' risk status but also facilitated an increase in the information contained within each variable. Based on domain experts' criteria, the three best models for classification were finally identified. These encouraging results will guide further developments of classification models for predicting visceral adipose tissue without the need for a costly scan.

Crossref

Kent Academic Repository

WestminsterResearch

Direct access: how is it working?

Author: A Northcott
AM Shelly
D Locker
D Naughton
JV Cook
K Quinlan
L Glidewell
M Ross
M. Ross
MK Ross
P Brocklehurst
P Parashos
P Warr
PC Hardigan
R Macey
S. Turner
T Dyer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2017

Crossref

University of Dundee Online Publications

Cancer cells exploit an orphan RNA to drive metastatic progression.

Author: A Fabregat
A Helwak
AA Margolin
AJ Minn
Alicia Y. Zhou
AM Ainsztein
Andrei Goga
AR Kodahl
AR Quinlan
B Kim
B Langmead
Bruce Culbertson
C Roth
CI Wu
CJ David
D Huo
D Ray
DK Simanshu
DN Cooper
EL Nostrand Van
H Goodarzi
H Goodarzi
H Goodarzi
H Goodarzi
Hani Goodarzi
JM Loo
Johnny X. Yu
K Cuk
L Fish
Lisa Fish
LY Chen
M Rehmsmeier
MI Love
MJ Moore
NN Hooten
O Elemento
PD Bos
R Bhargava
R Ren
RK Lin
RO Bak
S Kishore
S Vanharanta
SF Tavazoie
Steven Zhang
T Fiskaa
W Zhou
X Wu
YCT Yang
YS DeRose
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Here we performed a systematic search to identify breast-cancer-specific small noncoding RNAs, which we have collectively termed orphan noncoding RNAs (oncRNAs). We subsequently discovered that one of these oncRNAs, which originates from the 3' end of TERC, acts as a regulator of gene expression and is a robust promoter of breast cancer metastasis. This oncRNA, which we have named T3p, exerts its prometastatic effects by acting as an inhibitor of RISC complex activity and increasing the expression of the prometastatic genes NUPR1 and PANX2. Furthermore, we have shown that oncRNAs are present in cancer-cell-derived extracellular vesicles, raising the possibility that these circulating oncRNAs may also have a role in non-cell autonomous disease pathogenesis. Additionally, these circulating oncRNAs present a novel avenue for cancer fingerprinting using liquid biopsies

Crossref

eScholarship - University of California

Ultra-Strong Machine Learning: comprehensibility of programs learned with ILP

Author: A Srinivasan
AA Freitas
Alireza Tamaddoni-Nezhad
AM Turing
B Chandrasekaran
B Letham
Christina Zeller
E Kitzelmann
EA Lemke
F Bergadano
H Kahney
H Schielzeth
J Huysmans
JR Quinlan
JR Quinlan
KD Forbus
L Sterling
M Mozina
MD Hauser
MR Wick
SH Muggleton
SH Muggleton
SH Muggleton
SH Muggleton
SH Muggleton
Stephen H. Muggleton
Tarek Besold
TM Mitchell
U Schmid
Ute Schmid
WJ Clancey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

During the 1980s Michie defined Machine Learning in terms of two orthogonal axes of performance: predictive accuracy and comprehensibility of generated hypotheses. Since predictive accuracy was readily measurable and comprehensibility not so, later definitions in the 1990s, such as Mitchell’s, tended to use a one-dimensional approach to Machine Learning based solely on predictive accuracy, ultimately favouring statistical over symbolic Machine Learning approaches. In this paper we provide a definition of comprehensibility of hypotheses which can be estimated using human participant trials. We present two sets of experiments testing human comprehensibility of logic programs. In the first experiment we test human comprehensibility with and without predicate invention. Results indicate comprehensibility is affected not only by the complexity of the presented program but also by the existence of anonymous predicate symbols. In the second experiment we directly test whether any state-of-the-art ILP systems are ultra-strong learners in Michie’s sense, and select the Metagol system for use in human trials. Results show participants were not able to learn the relational concept on their own from a set of examples but they were able to apply the relational definition provided by the ILP system correctly. This implies the existence of a class of relational concepts which are hard to acquire for humans, though easy to understand given an abstract explanation. We believe improved understanding of this class could have potential relevance to contexts involving human learning, teaching and verbal interaction.

City Research Online

Crossref

University of Surrey

Spiral - Imperial College Digital Repository

Multiple Imputation Ensembles (MIE) for dealing with missing data

Author: A Farhangfar
AM Sefidian
B Schölkopf
C Cortes
CT Tran
DA Newman
DB Rubin
DB Rubin
DH Wolpert
EL Silva-Ramírez
GE Batista
GJ van der Heijden
H Gao
IH Witten
J Demšar
J Honaker
J Honaker
J Scheffer
JA Sterne
JL Schafer
JL Schafer
JR Quinlan
K Abayomi
KM Ting
L Breiman
L Breiman
L Rokach
M Fichman
M Khalilia
M Spratt
MA Klebanoff
MJ Azur
NJ Horton
PJ García-Laencina
PJ Kelly
PN Tan
RJ Little
S García
S Van Buuren
S Van Buuren
SS Chae
SS Choi
U Garciarena
V Vapnik
X Chen
Y Dong
Y Freund
Y He
Z Che
Z Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms others, particularly as missing data increases.

Crossref

University of East Anglia digital repository

Predicting cell types and genetic variations contributing to disease by combining GWAS and epigenetic data

Author: A Milosavljevic
A Pekowska
A Visel
AJ Saldanha
AM Mondul
Anjana Rao
Anna Gerasimova
AR Quinlan
BE Bernstein
BE Himes
BE Stranger
Bin Li
Bjoern Peters
C Ober
C Zang
Consortium The International HapMap
D Karolchik
DB Hancock
DG Torgerson
E Birney
E Noguchi
EK Miller
F Castro-Giner
G Hon
GE Zentner
Gregory Seumois
J Ernst
J Ernst
J Harrow
Jason Greenbaum
JH Kim
JJ Farrell
KM Ansel
LD Ward
Lukas Chavez
MA Schaub
MF Moffatt
MF Moffatt
MJ de Hoon
MP Creyghton
MT Maurano
ND Heintzman
NU Rashid
Pandurangan Vijayanand
PB Talbert
PM Sleiman
PM Visscher
R Bhandare
R Jaenisch
S Chanock
S Michel
S Weidinger
T Hirota
TH Pham
W Yu
Y Li
Y Zhang
Yi-Hsiang Hsu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Genome-wide association studies (GWASs) identify single nucleotide polymorphisms (SNPs) that are enriched in individuals suffering from a given disease. Most disease-associated SNPs fall into non-coding regions, so that it is not straightforward to infer phenotype or function; moreover, many SNPs are in tight genetic linkage, so that a SNP identified as associated with a particular disease may not itself be causal, but rather signify the presence of a linked SNP that is functionally relevant to disease pathogenesis. Here, we present an analysis method that takes advantage of the recent rapid accumulation of epigenomics data to address these problems for some SNPs. Using asthma as a prototypic example, we show that non-coding disease-associated SNPs are enriched in genomic regions that function as regulators of transcription, such as enhancers and promoters. Identifying enhancers based on the presence of histone modification marks such as H3K4me1 in different cell types, we show that the location of enhancers is highly cell-type specific. We use these findings to predict which SNPs are likely to be directly contributing to disease based on their presence in regulatory regions, and in which cell types their effect is expected to be detectable. Moreover, we can also predict which cell types contribute to a disease based on overlap of the disease-associated SNPs with the locations of enhancers present in a given cell type. Finally, we suggest that it will be possible to re-analyze GWAS studies with much higher power by limiting the SNPs considered to those in coding or regulatory regions of cell types relevant to a given disease.

CiteSeerX

Public Library of Science (PLOS)

Southampton (e-Prints Soton)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare