112 research outputs found

QiSampler: evaluation of scoring schemes for high-throughput datasets using a repetitive sampling strategy on gold standards

Author: A Subramanian
Bernhard Suter
C Jacques
E Marcotte
F Ramirez
Jean F Fontaine
JF Fontaine
K Venkatesan
ME Sowa
Miguel A Andrade-Navarro
O Mete
P Smialowski
R Jansen
RDC Team
RM Ewing
T Barrett
T Fawcett
T Sing
W Xu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High-throughput biological experiments can produce a large amount of data showing little overlap with current knowledge. This may be a problem when evaluating alternative scoring mechanisms for such data according to a gold standard dataset because standard statistical tests may not be appropriate. Findings: To address this problem we have implemented the QiSampler tool that uses a repetitive sampling strategy to evaluate several scoring schemes or experimental parameters for any type of high-throughput data given a gold standard. We provide two example applications of the tool: selection of the best scoring scheme for a high-throughput protein-protein interaction dataset by comparison to a dataset derived from the literature, and evaluation of functional enrichment in a set of tumour-related differentially expressed genes from a thyroid microarray dataset. Conclusions: QiSampler is implemented as an open source R script and a web server, which can be accessed at http://cbdm.mdc-berlin.de/tools/sampler/.
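
The idea of comparing scoring schemes by repeated sampling against a gold standard can be illustrated with a small Python sketch. This is not the QiSampler procedure itself (the abstract does not give its details, and the tool is an R script); the function name `mean_top_k_hit_rate`, the top-k hit-rate metric, the parameters, and the toy data are all assumptions made purely for illustration.

```python
import random


def mean_top_k_hit_rate(scores, gold, top_k, sample_size, n_rounds, seed=0):
    """Average fraction of gold-standard items among the top_k highest-scored
    items, taken over n_rounds random subsamples of the scored items.
    Hypothetical helper, not part of QiSampler."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    items = sorted(scores)             # stable item order before sampling
    rates = []
    for _ in range(n_rounds):
        sample = rng.sample(items, sample_size)
        top = sorted(sample, key=lambda item: scores[item], reverse=True)[:top_k]
        rates.append(sum(1 for item in top if item in gold) / top_k)
    return sum(rates) / len(rates)


# Toy data (hypothetical): 100 items, the first 20 form the gold standard.
gold_standard = set(range(20))
scheme_a = {i: 100 - i for i in range(100)}   # ranks gold items highest
scheme_b = {i: i for i in range(100)}         # ranks gold items lowest

for name, scheme in (("scheme_a", scheme_a), ("scheme_b", scheme_b)):
    rate = mean_top_k_hit_rate(scheme, gold_standard, top_k=5,
                               sample_size=50, n_rounds=200)
    print(name, round(rate, 3))
```

A scheme whose high scores coincide with gold-standard items should yield a higher average hit rate across samples than one whose scores do not, which is the kind of comparison the abstract describes.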

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MDC Repository

PROMPT: a protein mapping and comparison tool

Author: A Bairoch
A Krogh
AD Neverov
B Boeckmann
CI Castillo-Davis
D Frishman
D Frishman
DA Benson
DH Haft
Dmitrij Frishman
EV Koonin
FC Holstege
G Cochrane
G Gianese
IH Witten
IK Jordan
K Michalickova
KD Pruitt
M Di Giulio
M Gerstein
MJ Kerner
MJ Thompson
ML Riley
P Pagel
P Smialowski
P Wong
R Das
S Ghaemmaghami
SF Altschul
SP Kennedy
T Rattei
Thorsten Schmidt
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Comparison of large protein datasets has become a standard task in bioinformatics. Typically researchers wish to know whether one group of proteins is significantly enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. In order to conduct such comparisons it is often required to integrate molecular sequence data and experimental information from disparate incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available. RESULTS: PROMPT is a comprehensive bioinformatics software environment which enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. It allows automatic retrieval and integration of data from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures to address the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another one, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized in the form of plots and spreadsheets and exported in various formats, including Microsoft Excel. CONCLUSION: PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse in analysing and mining protein sequences and associated annotation. 
The availability of the Java Application Programming Interface and scripting capabilities on one hand, and the intuitive Graphical User Interface with context-sensitive help system on the other, make it equally accessible to professional bioinformaticians and biologically-oriented users. PROMPT is freely available for academic users from

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Small-scale, semi-automated purification of eukaryotic proteins for structure determination

Author: A Crameri
A Kato
AA Yee
Ahyoung Lim
Brian G. Fox
C Prodromou
C Scheich
Craig A. Bingman
CS Goh
DA King
David J. Aceti
DJ Leahy
DR Casimiro
DR Casimiro
FJ Sugar
FR Blattner
FW Studier
George N. Phillips
GH Patterson
GN Murshudov
H Nguyen
HK Sreenath
I Rayment
J Sambrook
Jason Bunge
Jason G. McCoy
JD Watson
Jikui Song
JM Canaves
John G. Primm
John Kunert
John L. Markley
Jung Whan Yoon
Lai Bergeman
LM Galvao-Botton
Louise Meske
Lucas J. Bailey
M Kawasaki
Megan Riters
Michael Cassidy
MS Kimber
NE Chayen
Nicholas A. Dillon
O Brodsky
P Emsley
P Smialowski
P Smialowski
P Zhou
Paul G. Blommel
PG Blommel
PG Blommel
PG Blommel
R Page
R Vincentelli
RA Welch
RC Stevens
RC Tyler
RC Tyler
RD Klein
RK Knaust
Ronnie O. Frederick
S Reich
S Thao
SE Brenner
SM Garrard
TT Yang
W Arber
W Peti
WB Jeon
WB Wood
ZS Derewenda
Publication venue: Springer Netherlands
Publication date: 01/01/2007

A simple approach that allows cost-effective automated purification of recombinant proteins at levels sufficient for functional characterization or structural studies is described. Studies with four human stem cell proteins, an engineered version of green fluorescent protein, and other proteins are included. The method combines an expression vector (pVP62K) that provides in vivo cleavage of an initial fusion protein, a factorial designed auto-induction medium that improves the performance of small-scale production, and rapid, automated metal affinity purification of His8-tagged proteins. For initial small-scale production screening, single colony transformants were grown overnight in 0.4 ml of auto-induction medium, produced proteins were purified using the Promega Maxwell 16, and purification results were analyzed by Caliper LC90 capillary electrophoresis. The yield of purified [U-15N]-His8-Tcl-1 was 7.5 μg/ml of culture medium, of purified [U-15N]-His8-GFP was 68 μg/ml, and of purified selenomethionine-labeled AIA–GFP (His8 removed by treatment with TEV protease) was 172 μg/ml. The yield information obtained from a successful automated purification from 0.4 ml was used to inform the decision to scale up for a second meso-scale (10–50 ml) cell growth and automated purification. 1H–15N NMR HSQC spectra of His8-Tcl-1 and of His8-GFP prepared from 50 ml cultures showed excellent chemical shift dispersion, consistent with well folded states in solution suitable for structure determination. Moreover, AIA–GFP obtained by proteolytic removal of the His8 tag was subjected to crystallization screening, and yielded crystals under several conditions. Single crystals were subsequently produced and optimized by the hanging drop method. The structure was solved by molecular replacement at a resolution of 1.7 Å.
This approach provides an efficient way to carry out several key target screening steps that are essential for successful operation of proteomics pipelines with eukaryotic proteins: examination of total expression, determination of proteolysis of fusion tags, quantification of the yield of purified protein, and suitability for structure determination.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

SProtP: A Web Server to Recognize Those Short-Lived Proteins Based on Sequence-Derived Features in Human Cells

Author: A Bachmair
A Belle
A Donner
A Krogh
A Mogk
A Varshavsky
A Varshavsky
B Schwanhausser
C Cai
CH Lin
CJ Cox
CJ Reuter
CM Pfleger
CM Pfleger
E Eden
E Mathes
Emanuele Buratti
G Lederkremer
Hao Jia
I Dubchak
J Cui
J Cui
J Cui
JC Wootton
JD Bendtsen
Jiahao Sha
K Bryson
KH Choo
M Fuxreiter
M-LT Lee
MH Kubbutat
MK Doherty
MS Kostelansky
N de Souza
P Smialowski
P Tompa
Ping Han
R Debigaré
R-E Fan
S Polo
S Rogers
SH Lecker
T Golub
T Huang
Tao Zhou
WE Mitch
X-F Song
Xiaobai Zhang
Xiaofeng Song
XL Ang
Xuejiang Guo
YH-C Sherry
Z Dosztányi
Publication venue: Public Library of Science
Publication date: 16/11/2011

Protein turnover metabolism plays important roles in cell cycle progression, signal transduction, and differentiation. Proteins with short half-lives are involved in various regulatory processes. To better understand the regulation of cellular processes, it is important to study the key sequence-derived factors affecting short-lived protein degradation. Most protein half-lives are still unknown because they are difficult to measure in human cells with traditional experimental methods. To investigate the molecular determinants that affect short-lived proteins, this work proposes a computational method to recognize short-lived proteins in human cells based on sequence-derived features. We systematically analyzed many features that may correlate with short-lived protein degradation and found that a large fraction of human proteins with signal peptides and transmembrane regions have short half-lives. Because short-lived proteins play pivotal roles in the control of various cellular processes, we constructed an SVM-based classifier to recognize them. Applying the SVM model to a human dataset, we achieved 80.8% average sensitivity and 79.8% average specificity on ten testing datasets (TE1-TE10). We also obtained average accuracies of 89.9%, 99% and 83.9% on the independent validation datasets iTE1, iTE2 and iTE3, respectively. The approach proposed in this paper provides a valuable alternative for recognizing short-lived proteins in human cells, and is more accurate than the traditional N-end rule. Furthermore, the web server SProtP (http://reprod.njmu.edu.cn/sprotp) has been developed and is freely available to users.
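
For readers unfamiliar with the metrics quoted above, sensitivity is the fraction of true positives (here, short-lived proteins) that a classifier recovers, and specificity is the fraction of true negatives it correctly rejects. The sketch below computes both from binary labels. The function name and the toy labels are hypothetical and are not taken from the SProtP paper or its datasets.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 marks the
    positive class (e.g. short-lived) and 0 the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical): 4 positives and 6 negatives.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

An average over several test sets, as reported for TE1-TE10, would simply be the mean of the per-set values.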

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Network-Based Prediction and Analysis of HIV Dependency Factors

HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Binary Classification of Aqueous Solubility Using Support Vector Machines with Reduction and Recombination Feature Selection

Crossref

PubMed Central

Network Compression as a Quality Measure for Protein Interaction Networks

Author: A Barabasi
A Breitkreutz
A Ceol
A Grigoriev
A Kocsor
A Langville
A Shevchenko
A Shevchenko
A Sorribas
A Whitty
A. Francis Stewart
AC Gavin
AC Gavin
B Aranda
B Titz
BD MacArthur
BJ Breitkreutz
C von Mering
CE Shannon
CM Deane
D Hannah
D Minoli
DA Schneider
DE Knuth
DJ LaCount
DJ Watts
DL Lindstrom
E Formstecher
E Torreira
EL Hong
F Jin
G Butland
G Lima-Mendez
GD Bader
GD Bader
GJ Chaitin
GT Hart
H Dortay
H Herzel
H Lu
H Yu
HB Fraser
HW Mewes
I Lee
I Lemmens
J Leskovec
J Sun
J White
J Zhong
JC Claussen
JC Rain
JF Rual
JJ Heymans
JR Parrish
K Anand
K Norlen
K Tarassov
K Venkatesan
KH Randall
L Demetrius
L Giot
L Ji
L Kiemer
L Royer
L Salwinski
LJ Jensen
Loic Royer
M Arifuzzaman
M Dehmer
M Dehmer
M Deng
M Harata
M Kao
M Li
M Pellegrini
Matthias Reimann
ME Cusick
MEJ Newman
Michael Schroeder
N Deo
N Simonis
NJ Krogan
O Weiss
P Boldi
P Braun
P Erdős
P Smialowski
P Uetz
Patrick Aloy
PM Kim
PW Holland
R Diestel
R Jansen
R Solé
RJ Deshaies
RM Ewing
S Fields
S Jukna
S Li
S Maslov
S Sato
SR Collins
T Feder
T Ito
T Ito
T Manke
T Pawson
T Reguly
TSK Prasad
U Stelzl
V Colizza
W Cleveland
WH Wu
WK Huh
WT Tutte
X Shen
X Xin
Y Ho
Publication venue: Public Library of Science
Publication date: 01/01/2012

With the advent of large-scale protein interaction studies, there is much debate about data quality. Can different noise levels in the measurements be assessed by analyzing network structure? Because proteomic regulation is inherently co-operative, modular and redundant, it is inherently compressible when represented as a network. Here we propose that network compression can be used to compare false positive and false negative noise levels in protein interaction networks. We validate this hypothesis by first confirming the detrimental effect of false positives and false negatives. Second, we show that gold standard networks are more compressible. Third, we show that compressibility correlates with co-expression, co-localization, and shared function. Fourth, we also observe correlation with better protein tagging methods, physiological expression in contrast to over-expression of tagged proteins, and smart pooling approaches for yeast two-hybrid screens. Overall, this new measure is a proxy for both sensitivity and specificity and gives complementary information to standard measures such as average degree and clustering coefficients

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Genetic variants and their interactions in disease risk prediction – machine learning and network perspectives

Author: 1000 Genomes Project
A Ashworth
A Burga
A Califano
A Galvan
A Gyenesei
A Statnikov
A Torkamani
A Torkamani
AL Barabási
AL Hopkins
B Lehner
B Lehner
B Maher
B Rakitsch
BA McKinney
BA McKinney
BS Srinivasan
C Ambroise
C Kooperberg
C Tian
C Winter
CG Lambert
CS Greene
D Merico
D Urbach
DJ Balding
DM Evans
DW Aha
DW Huang
DW Huang
E Lee
EA Ashley
EE Eichler
EE Schadt
ES Lander
F Barrenäs
G Bebek
G Gibson
G Hannum
G Peng
GK Chen
GM Clarke
H Eleftherohorinou
H Holm
H Zhong
HJ Cordell
HY Chuang
I Feldman
I Guyon
I König
I Surakka
J Corander
J Jakobsdottir
J Kruppa
J Tuikkala
J Yang
JD Iglehart
JH Moore
JH Moore
K Askland
K Wang
KA Pattin
KS Reynolds
L Luo
M Ladouceur
M Michaut
M Mooney
M Smoot
M Vidal
MA Heiskanen
MD Ritchie
MJ Sillanpää
NA Lavender
NF Marko
O Lavi
O Zuk
P Beltrao
P Donnelly
P Kraft
P Sebastiani
P Smialowski
PC Phillips
PJ Castaldi
Q He
R Braun
R Jelier
R Makowsky
R Simon
RO Lindén
S Lee
S Okser
S Ripatti
S Varma
SE Baranzini
Sebastian Okser
SJ Dixon
SW Hartley
T Hu
T Ideker
T Pahikkala
T Peltola
T Schupbach
TA Manolio
Tapio Pahikkala
Tero Aittokallio
TS Deisboeck
TT Wu
U Ober
U Ober
V Bansal
VK Ramanan
W Huang
Wellcome Trust Case Control Consortium
WG Kaelin Jr
Y Saeys
Z Wang
Z Wei
Publication venue: Springer Science and Business Media LLC

Crossref

Negated bio-events: Analysis and identification

Author: A MacKinlay
A Rzhetsky
C Cortes
C Kingsford
D Hull
DW Aha
E Buyko
F Sarafraz
G Escudero
G Tottie
H Kilicoglu
H Shatkay
H Tolentino
H Zhang
IG Councill
IM Goldin
J Knight
J-D Kim
JR Quinlan
JR Quinlan
KB Cohen
L Breiman
L Rokach
LR Horn
M Ashburner
M Averbuch
M Hall
M Joshi
M Krallinger
M Miwa
M Wiegand
O Sanchez-Graillet
P Smialowski
P Thompson
P Zweigenbaum
Paul Thompson
PG Mutalik
PL Elkin
R Caruana
R Langacker
R Morante
R Morante
R Morante
R Nawaz
R Nawaz
R Sauri
Raheel Nawaz
S Agarwal
S Ananiadou
S Ananiadou
S Boytcheva
S Dumais
S Goryachev
S Harabagiu
S Pyysalo
S Pyysalo
S Van Landeghem
Sophia Ananiadou
T Mitchell
T Wilson
T Wilson
V Vincze
V Vincze
W Ceusters
WJ Wilbur
WW Chapman
X-W Chen
Y Garten
Y Huang
Y Miyao
Y Miyao
Y Miyao
Y Qi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations. Results: We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP'09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP'09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events. Conclusions: Recently, in the field of biomedical text mining, the development and enhancement of event-based systems has received significant interest. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems.
The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events. © 2013 Nawaz et al.; licensee BioMed Central Ltd

Crossref

E-space: Manchester Metropolitan University's Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

The Drosophila speciation factor HMR localizes to genomic insulator sites

Author: A Golovnin
A Stark
AA Gorchakov
AG Clark
AM Bushey
AM Olszak
Andrea Lukacs
Andreas Walter Thomae
AW Thomae
Axel Imhof
B Langmead
Barbara Jennings
BD Ross
BJ Bolkan
Bo Sun
C Spana
C Trapnell
C Vieira
C-T Ong
C-Y Pai
D Bohla
D Ghosh
DA Barbash
E Emberly
E Lerat
ES Kelleher
F Bantignies
F Rus
F Wilcoxon
G Bosco
H Dai
H Thorvaldsdóttir
J Padeken
J Yang
JC Yasuhara
JJ Bayes
JP Abad
JS Kaminker
JW Nicol
K Sawamura
KD Tartof
KHC Wei
LJ Zhu
M Bartkuhn
M Frasch
M Gause
M Labrador
MA Rodriguez
N Jiang
N Nègre
N Phadnis
NC Riddle
NJ Brideau
P Heger
Pawel Smialowski
PB Talbert
PRV Satyaki
RM Baxley
S Aruna
S Heinz
S Maheshwari
S Maheshwari
S Roy
T Sexton
T Straub
T-W Chen
TC James
Thomas Andreas Gerland
TI Gerasimova
TI Gerasimova
TJ Parnell
TK Barth
VB Indjeian
W Bao
X Ni
X Sun
X Sun
YB Schwartz
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2017

Hybrid incompatibility between Drosophila melanogaster and D. simulans is caused by a lethal interaction of the proteins encoded by the Hmr and Lhr genes. In D. melanogaster the loss of HMR results in mitotic defects, an increase in transcription of transposable elements and a deregulation of heterochromatic genes. To better understand the molecular mechanisms that mediate HMR's function, we measured genome-wide localization of HMR in D. melanogaster tissue culture cells by chromatin immunoprecipitation. Interestingly, we find HMR localizing to genomic insulator sites that can be classified into two groups. One group belongs to gypsy insulators and another one borders HP1a bound regions at active genes. The transcription of the latter group genes is strongly affected in larvae and ovaries of Hmr mutant flies. Our data suggest a novel link between HMR and insulator proteins, a finding that implicates a potential role for genome organization in the formation of species

Crossref

Directory of Open Access Journals

Open Access LMU

PubMed Central

FigShare