Edinburgh Research Explorer

mGene.web: a web service for accurate computational gene finding

Author: A. Zien
Bernal
Besemer
Brent
C. S. Ong
G. Ratsch
G. Schweikert
G. Zeller
J. Behr
S. Sonnenburg
Salamov
Publication venue: Oxford University Press
Publication date: 01/01/2009

Transcript quantification with RNA-Seq data

Author: A Mortazavi
G Schweikert
Gunnar Rätsch
H Jiang
Jonas Behr
M Sammeth
Regina Bohnert
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Motivation: Novel high-throughput sequencing technologies open exciting new approaches to transcriptome profiling. Sequencing transcript populations of interest, e.g. from different tissues or variable stress conditions, with RNA sequencing (RNA-Seq) [1] generates millions of short reads. Accurately aligned to a reference genome, they provide digital counts and thus facilitate transcript quantification. As the observed read counts only provide the summation of all expressed sequences at one locus, the inference of the underlying transcript abundances is crucial for further quantitative analyses. Methods: To approach this problem, we have developed a new technique, called rQuant, based on quadratic programming. Given a gene annotation and position-wise exon/intron read coverage from read alignments, we determine the abundances for each annotated transcript by minimising a suitable loss function. It penalises the deviation of the observed from the expected read coverage given the transcript weights. The observed read coverage is typically non-uniformly distributed over the transcript due to several biases in the generation of the sequencing libraries and the sequencing. This leads to distortions of the transcript abundances, if not corrected properly. We therefore extended our approach to jointly optimise transcript profiles, modeling the coverage deviations depending on the position in the transcript. Our method can be applied without knowledge of the underlying transcript abundances and equally benefits from loci with and without alternative transcripts. Results: To quantitatively evaluate the quality of our abundance predictions, we used a set of simulated reads from transcripts with known expression as a benchmark set. It was generated using the Flux Simulator [2] modeling biases in RNA-Seq as well as preparation experiments. Table 1 shows preliminary results with segment- and position-based loss as well as with and without the transcript profiles. Our results indicate that the position-based modeling together with transcript profiles allows us to accurately infer the underlying expression of single transcripts as well as of multiple isoforms of one gene locus.
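
The core idea in the Methods part, fitting non-negative transcript weights so that expected coverage matches observed coverage under a squared loss, can be illustrated with a small sketch. This is not the rQuant implementation: the toy locus, the function name `estimate_abundances`, the fixed step size and the projected-gradient solver are all assumptions made for illustration, and the transcript profiles described in the abstract are left out.

```python
# Toy sketch (assumption, not rQuant itself): recover transcript weights from
# position-wise coverage by minimising squared deviation subject to w >= 0.

def estimate_abundances(structure, coverage, step=0.1, iterations=2000):
    """structure[t][p] is 1 if transcript t covers position p, else 0.
    coverage[p] is the observed read coverage at position p."""
    n_t, n_p = len(structure), len(coverage)
    weights = [0.0] * n_t
    for _ in range(iterations):
        # Expected coverage is the sum of the weights of all transcripts
        # that overlap each position.
        expected = [sum(weights[t] * structure[t][p] for t in range(n_t))
                    for p in range(n_p)]
        residual = [coverage[p] - expected[p] for p in range(n_p)]
        for t in range(n_t):
            grad = -2.0 * sum(structure[t][p] * residual[p] for p in range(n_p))
            # Projected gradient step: move downhill, then clip at zero.
            weights[t] = max(0.0, weights[t] - step * grad)
    return weights

# Hypothetical locus with three exonic segments: transcript 0 uses all three,
# transcript 1 skips the middle one.
structure = [[1, 1, 1],
             [1, 0, 1]]
coverage = [30.0, 10.0, 30.0]
weights = estimate_abundances(structure, coverage)
print(weights)  # approximately [10.0, 20.0]
```

The middle segment is covered only by the first transcript, which pins its weight at 10; the remaining coverage of 20 on the outer segments is then attributed to the second transcript. A dedicated quadratic-programming solver would normally replace the hand-written loop.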

arXiv.org e-Print Archive

Reproducing Kernels of Generalized Sobolev Spaces via a Green Function Approach with Distributional Operators

Author: A. Berlinet
A. Bouhamidi
A.J. Smola
B. Schölkopf
D.G. Schweikert
E.M. Stein
G. Wahba
G.E. Fasshauer
Gregory E. Fasshauer
H. Wendland
J. Duchon
J. Kybic
M.D. Buhmann
M.L. Stein
Qi Ye
R. Schaback
R.A. Adams
W.A. Light
W.R. Madych
Publication venue: Springer Science and Business Media LLC
Publication date: 04/03/2013

In this paper we introduce a generalized Sobolev space by defining a semi-inner product formulated in terms of a vector distributional operator $\mathbf{P}$ consisting of finitely or countably many distributional operators $P_n$, which are defined on the dual space of the Schwartz space. The types of operators we consider include not only differential operators, but also more general distributional operators such as pseudo-differential operators. We deduce that a certain appropriate full-space Green function $G$ with respect to $L:=\mathbf{P}^{\ast T}\mathbf{P}$ now becomes a conditionally positive definite function. In order to support this claim we ensure that the distributional adjoint operator $\mathbf{P}^{\ast}$ of $\mathbf{P}$ is well-defined in the distributional sense. Under sufficient conditions, the native space (reproducing-kernel Hilbert space) associated with the Green function $G$ can be isometrically embedded into or even be isometrically equivalent to a generalized Sobolev space. As an application, we take linear combinations of translates of the Green function with possibly added polynomial terms and construct a multivariate minimum-norm interpolant $s_{f,X}$ to data values sampled from an unknown generalized Sobolev function $f$ at data sites located in some set $X \subset \mathbb{R}^d$. We provide several examples, such as Matérn kernels or Gaussian kernels, that illustrate how many reproducing-kernel Hilbert spaces of well-known reproducing kernels are isometrically equivalent to a generalized Sobolev space. These examples further illustrate how we can rescale the Sobolev spaces by the vector distributional operator $\mathbf{P}$. Introducing the notion of scale as part of the definition of a generalized Sobolev space may help us to choose the "best" kernel function for kernel-based approximation methods.

Comment: Updated version of the publication in Num. Math., close to Qi Ye's Ph.D. thesis (\url{http://mypages.iit.edu/~qye3/PhdThesis-2012-AMS-QiYe-IIT.pdf})
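
As a rough illustration of the interpolation step described in this abstract, the sketch below builds an interpolant from translates of a Gaussian kernel in one dimension. It is an assumption-laden toy rather than the paper's construction: the shape parameter `sigma`, the helper names `solve` and `fit_interpolant`, and the sample data are invented here, and no polynomial term is added because the Gaussian kernel is strictly positive definite.

```python
import math

def gaussian_kernel(x, y, sigma=0.5):
    # Gaussian kernel; sigma is an assumed shape (scale) parameter.
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_interpolant(sites, values, kernel=gaussian_kernel):
    """Return s(x) = sum_j c_j * kernel(x, x_j) with s(x_i) = values[i]."""
    K = [[kernel(xi, xj) for xj in sites] for xi in sites]
    coeffs = solve(K, values)
    def s(x):
        return sum(c * kernel(x, xj) for c, xj in zip(coeffs, sites))
    return s

# Hypothetical data: samples of sin at four data sites.
sites = [0.0, 0.5, 1.0, 1.5]
values = [math.sin(x) for x in sites]
s = fit_interpolant(sites, values)
print([round(s(x), 6) for x in sites])
```

Changing `sigma` changes the scale of the kernel and therefore which interpolant is selected between the data sites, which is the practical side of the "scale" question raised at the end of the abstract.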

Erasmus University Digital Repository

Androgen receptor abnormalities

Author: Brinkmann A.O. (Albert)
Kuiper G.G.J.M. (George)
Mulder E. (Eppo)
Pinsky L. (L.)
Ris-Stalpers C. (Carolyn)
Romalo G. (G.)
Rooij H.C.J. (Henri) van
Schweikert H.U. (H.)
Trapman J. (Jan)
Trifirò G. (Gianluca)
Publication venue: Elsevier BV
Publication date: 01/01/1991

The human androgen receptor is a member of the superfamily of steroid hormone receptors. Proper functioning of this protein is a prerequisite for normal male sexual differentiation and development. The cloning of the human androgen receptor cDNA and the elucidation of the genomic organization of the corresponding gene has enabled us to study androgen receptors in subjects with the clinical manifestation of androgen insensitivity and in a human prostate carcinoma cell line (LNCaP). Using PCR amplification, subcloning and sequencing of exons 2–8, we identified a G → T mutation in the androgen receptor gene of a subject with the complete form of androgen insensitivity, which inactivates the splice donor site at the exon 4/intron 4 boundary. This mutation causes the activation of a cryptic splice donor site in exon 4, which results in the deletion of 41 amino acids from the steroid binding domain. In two other independently arising cases we identified two different nucleotide alterations in codon 686 (GAC; aspartic acid) located in exon 4. One mutation (G → C) results in an aspartic acid → histidine substitution (with negligible androgen binding), whereas the other mutation (G → A) leads to an aspartic acid → asparagine substitution (normal androgen binding, but a rapidly dissociating androgen receptor complex). Sequence analysis of the androgen receptor in human LNCaP-cells (lymph node carcinoma of the prostate) revealed a point mutation (A → G) in codon 868 in exon 8 resulting in the substitution of threonine by alanine. This mutation is the cause of the altered steroid binding specificity of the LNCaP-cell androgen receptor. The functional consequences of the observed mutations with respect to protein expression, specific ligand binding and transcriptional activation were established after transient expression of the mutant receptors in COS and HeLa cells. These findings illustrate that functional error

Variation of health-related quality of life assessed by caregivers and patients affected by severe childhood infections.

Author: A David
A Vuorialho
AE Carroll
AF Klassen
AS Pickard
AW Glaser
B Schweikert
BC Stade
Bureau of Epidemiology
C Eiser
CB Baca
CM McDonough
DM Bushnell
E Gibbons
E Speyer
FH Bess
G Naglie
G Samsa
H Mohan
H Sinno
Health Utilities Inc
HJ Loonen
HJ Zhou
HL Wee
I Griebsch
J Horsman
JE Bennett
JN Doctor
John Cairns
JP Grutters
L Sung
M Drummond
MA Joore
MJ Harrison
MT Dyer
N Guo
N Szecket
P Sakthong
Pattara Leelahavarong
R Development Core Team
R Legood
R Oostenbrink
RJ Klaassen
S Petrou
S Tongsiri
SN Davison
Virasakdi Chongsuvivatwong
Vorasith Sornsrivichai
Wantanee Kulpeng
Waranya Rattanavipapong
X Sun
Yoel Lubell
Yot Teerawattananon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: The agreement between self-reported and proxy measures of health status in ill children is not well established. This study aimed to quantify the variation in health-related quality of life (HRQOL) derived from young patients and their carers using different instruments. METHODS: A hospital-based cross-sectional survey was conducted between August 2010 and March 2011. Children with meningitis, bacteremia, pneumonia, acute otitis media, hearing loss, chronic lung disease, epilepsy, mild mental retardation, severe mental retardation, and mental retardation combined with epilepsy, aged between five and 14 years in seven tertiary hospitals were selected for participation in this study. The Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3), and the EuroQoL Descriptive System (EQ-5D) and Visual Analogue Scale (EQ-VAS) were applied to both paediatric patients (self-assessment) and caregivers (proxy-assessment). RESULTS: The EQ-5D scores were lowest for acute conditions such as meningitis, bacteremia, and pneumonia, whereas the HUI3 scores were lowest for most chronic conditions such as hearing loss and severe mental retardation. Comparing patient and proxy scores (n = 74), the EQ-5D exhibited high correlation (r = 0.77), while the HUI2 and HUI3 patient and caregiver scores were moderately correlated (r = 0.58 and 0.67 respectively). The mean differences between self- and proxy-assessment using the HUI2, HUI3, EQ-5D and EQ-VAS scores were 0.03, 0.05, -0.03 and -0.02, respectively. In hearing-impaired and chronic lung patients the self-rated HRQOL differed significantly from that rated by their caregivers. CONCLUSIONS: The use of caregivers as proxies for measuring HRQOL in young patients affected by pneumococcal infection and its sequelae should be employed with caution. Given the high correlation between instruments, each of the HRQOL instruments appears acceptable apart from the EQ-VAS, which exhibited low correlation with the others.

LSHTM Research Online

Oxford University Research Archive

DGW: an exploratory data analysis tool for clustering and visualisation of epigenomic marks

Author: A Barski
A Kundaje
BW Matthews
C Taslim
D Benveniste
ENCODE Project Consortium
G Jurman
G Schweikert
Gabriele B. Schweikert
GJ Filion
Guido Sanguinetti
H Sakoe
M Müller
MB Eisen
NI Bieberstein
Roberto Visintainer
Saulius Lukauskas
TA Knijnenburg
TS Furey
Y Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. Results: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.
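
For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook algorithm applied to two toy coverage profiles. It is not taken from the DGW package: the function name `dtw_distance`, the absolute-difference local cost and the example profiles are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences.
    Local cost is the absolute difference (an assumed choice)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] matched again
                                 D[i][j - 1],      # b[j-1] matched again
                                 D[i - 1][j - 1])  # both sequences advance
    return D[n][m]

# Hypothetical read-coverage profiles: the second is a stretched copy of the
# first, the third has no peak at all.
peak = [0, 1, 3, 1, 0]
stretched = [0, 1, 1, 3, 3, 1, 0]
flat = [0, 0, 0, 0, 0]

print(dtw_distance(peak, stretched))  # 0.0: same shape, different width
print(dtw_distance(peak, flat))       # 5.0: shapes differ
```

Because warping absorbs the difference in width, regions with the same peak shape but different genomic extent end up close together, which is the property a clustering step can then exploit.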

Archivio della ricerca - Fondazione Bruno Kessler

Edinburgh Research Explorer

University of Dundee Online Publications

Adiposity has differing associations with incident coronary heart disease and mortality in the Scottish population: cross-sectional surveys with follow-up

Author: A Abdullah
A H Leyland
A Romero-Corral
A Rosengren
A Shaw
AE Taylor
AJ Sogaard
B Schweikert
B Unal
B Wells
BJ Steinberg
C A Davies
C Bromley
D Wormser
DB Allison
DL McGee
DM Mann
DP Guh
E Pullen
E Strandhagen
EL Eisenstein
EW Gregg
G Whitlock
H Tunstall-Pedoe
HM Orpana
J Bigaard
J Rasbash
J Stevens
J W Hotchkiss
JA Simpson
Joel William Hotchkiss
JW Hotchkiss
K Barakat
KM Flegal
KM Rexrode
L Gray
LL Yan
M Myerson
M Walker
MEJ Lean
MM Finucane
P Jousilahti
P McLoone
P Royston
PT Katzmarzyk
R Lawder
RH Eckel
RW Yeh
S Capewell
S Czernichow
S Yusuf
T Pischon
TE Strandberg
W Dong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Objective: Investigation of the association of excess adiposity with three different outcomes: all-cause mortality, coronary heart disease (CHD) mortality and incident CHD. Design: Cross-sectional surveys linked to hospital admissions and death records. Subjects: 19 329 adults (aged 18–86 years) from a representative sample of the Scottish population. Measurements: Gender-stratified Cox proportional hazards models were used to estimate hazard ratios (HRs) for all-cause mortality, CHD mortality and incident CHD. Separate models incorporating the anthropometric measurements body mass index (BMI), waist circumference (WC) or waist–hip ratio (WHR) were created adjusted for age, year of survey, smoking status and alcohol consumption. Results: For both genders, BMI-defined obesity (≥30 kg m−2) was not associated with either an increased risk of all-cause mortality or CHD mortality. However, there was an increased risk of incident CHD among the obese men (hazard ratio (HR)=1.78; 95% confidence interval=1.37–2.31) and obese women (HR=1.93; 95% confidence interval=1.44–2.59). There was a similar pattern for WC with regard to the three outcomes; for incident CHD, the HR=1.70 (1.35–2.14) for men and 1.71 (1.28–2.29) for women in the highest WC category (men ≥102 cm, women ≥88 cm), synonymous with abdominal obesity. For men, the highest category of WHR (≥1.0) was associated with an increased risk of all-cause mortality (1.29; 1.04–1.60) and incident CHD (1.55; 1.19–2.01). Among women with a high WHR (≥0.85) there was an increased risk of all outcomes: all-cause mortality (1.56; 1.26–1.94), CHD mortality (2.49; 1.36–4.56) and incident CHD (1.76; 1.31–2.38). Conclusions: In this study excess adiposity was associated with an increased risk of incident CHD but not necessarily death. One possibility is that modern medical intervention has contributed to improved survival of first CHD events. The future health burden of increased obesity levels may manifest as an increase in the prevalence of individuals living with CHD and its consequences.

Enlighten

Exploiting physico-chemical properties in string kernels

Author: B Peters
B Shen
C Leslie
Christian Widmer
CS Ong
CW Tung
G Rätsch
G Schweikert
Gunnar Rätsch
H Rangwala
H Saigo
J Weston
L Jacob
M Röttig
M Venkatarajan
N Pfeifer
Nora C Toussaint
Oliver Kohlbacher
R Kuang
RM Clark
S Henikoff
S Kawashima
S Sonnenburg
SJ Schultheiss
V Roth
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: String kernels are commonly used for the classification of biological sequences, nucleotide as well as amino acid sequences. Although string kernels are already very powerful, when it comes to amino acids they have a major shortcoming. They ignore an important piece of information when comparing amino acids: the physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is less abundant. There have been only very few approaches so far that aim at combining these two ideas. Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that the incorporation of amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels. Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference. Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.
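
To make the baseline concrete, here is a minimal sketch of a standard spectrum (k-mer counting) string kernel, the kind of kernel the abstract says ignores physico-chemical properties. It does not implement the kernels proposed in the paper or the Shogun API; the function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter

def spectrum(seq, k):
    """Count every length-k substring (k-mer) occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of s and t."""
    counts_s, counts_t = spectrum(s, k), spectrum(t, k)
    # Counter returns 0 for k-mers missing from t, so only shared k-mers
    # contribute to the sum.
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())

# Toy sequences (hypothetical): "ACAC" has 2-mers AC, CA, AC and
# "ACA" has 2-mers AC, CA.
print(spectrum_kernel("ACAC", "ACA", k=2))  # 3  (AC: 2*1, CA: 1*1)
```

Two k-mers contribute only when they match exactly, so substituting one amino acid for a chemically similar one scores the same as substituting a very different one. That is the gap the physico-chemical descriptors in the abstract are meant to close.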

Directory of Open Access Journals

Inferring latent task structure for Multitask Learning by Multiple Kernel Learning

Author: B Schölkopf
C Chang
C Leslie
Christian Widmer
F Bach
G Rätsch
G Schweikert
Gunnar Rätsch
H Daumé
H Daumé III
J Blitzer
J Robinson
L Bottou
L Jacob
M Kloft
Nora C Toussaint
P Gehler
R Caruana
S Sonnenburg
Schuller Ben-David
T Evgeniou
T Joachims
V Vapnik
Y Xue
Yasemin Altun
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published q-Norm MKL algorithm. Results: We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice site dataset with given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities ab initio along with the Multitask classifiers. In both cases, we outperform baseline methods that we compare against. Conclusions: We present a novel approach to Multitask Learning that is capable of learning task similarity along with the classifiers. The framework is very general as it allows prior knowledge about task relationships to be incorporated if available, but is also able to identify task similarities in the absence of such prior information. Both variants show promising results in applications from Computational Biology.

Directory of Open Access Journals