
136 research outputs found

Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

Author: Binz P.A.
Campbell D.S.
Deutsch E.W.
Farrah T.
Mendoza L.
Moritz R.L.
Omenn G.S.
Shteynberg D.
Sun Z.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 12/09/2016

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem that is especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. 
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/
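The two-step strategy described in this abstract (search against a simpler database, then check peptide uniqueness against a more complex one) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, the protein IDs, and the choice to treat I and L as interchangeable are all assumptions made for illustration.

```python
def proteins_containing(peptide, database):
    """Return the sorted IDs of all database entries whose sequence contains the peptide."""
    # Assumption of this sketch: I and L are treated as the same residue,
    # because they are isobaric and cannot be told apart by mass alone.
    pep = peptide.replace("I", "L")
    return sorted(pid for pid, seq in database.items() if pep in seq.replace("I", "L"))


def classify_peptides(peptides, small_db, large_db):
    """For each peptide, list its matches in both databases and flag uniqueness in the larger one."""
    report = {}
    for pep in peptides:
        large_hits = proteins_containing(pep, large_db)
        report[pep] = {
            "small": proteins_containing(pep, small_db),
            "large": large_hits,
            "unique_in_large": len(large_hits) == 1,
        }
    return report


# Hypothetical toy databases: tier4 extends tier1 with an isoform and an unrelated entry.
tier1 = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSLLTEVETPIRNEWGSR"}
tier4 = dict(tier1)
tier4.update({"P1-2": "MKTAYIAKQRGGSFVK", "V9": "MAAGGCCK"})

report = classify_peptides(["TAYIAK", "VETPIR"], tier1, tier4)
for pep, info in report.items():
    print(pep, info)
```

In this toy case, TAYIAK maps to one protein in the small database but to two entries in the large one, so it would no longer count as unique after the second check.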

Crossref

Serveur académique lausannois

PubMed Central

FigShare

A Tandem Mass Spectrometry Sequence Database Search Method for Identification of O-Fucosylated Proteins by Mass Spectrometry.

Author: Deutsch Eric W
Eng Jimmy K
Kappe Stefan H I
Mendoza Luis
Moritz Robert L
Sather D Noah
Shteynberg David
Springer Timothy A
Swearingen Kristian E
Vigdorovich Vladimir
Publication venue: Providence St. Joseph Health Digital Commons
Publication date: 21/12/2018

Thrombospondin type 1 repeats (TSRs), small adhesive protein domains with a wide range of functions, are usually modified with O-linked fucose, which may be extended to O-fucose-β1,3-glucose. Collision-induced dissociation (CID) spectra of O-fucosylated peptides cannot be sequenced by standard tandem mass spectrometry (MS/MS) sequence database search engines because O-linked glycans are highly labile in the gas phase and are effectively absent from the CID peptide fragment spectra, resulting in a large mass error. Electron transfer dissociation (ETD) preserves O-linked glycans on peptide fragments, but only a subset of tryptic peptides with low m/z can be reliably sequenced from ETD spectra compared to CID. Accordingly, studies to date that have used MS to identify O-fucosylated TSRs have required manual interpretation of CID mass spectra even when ETD was also employed. In order to facilitate high-throughput, automatic identification of O-fucosylated peptides from CID spectra, we re-engineered the MS/MS sequence database search engine Comet and the MS data analysis suite Trans-Proteomic Pipeline to enable automated sequencing of peptides exhibiting the neutral losses characteristic of labile O-linked glycans. We used our approach to reanalyze published proteomics data from Plasmodium parasites and identified multiple glycoforms of TSR-containing proteins.
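The "large mass error" mentioned in this abstract can be made concrete with a small Python sketch: when the glycan is lost from the fragments but still counted in the precursor mass, the gap between the observed precursor mass and the bare peptide mass equals the glycan mass. This is only an illustration, not the modified Comet algorithm; the function name, the tolerance, and the example masses are assumptions, while the residue masses are the standard monoisotopic values for deoxyhexose and hexose.

```python
# Monoisotopic residue masses in Da.
FUCOSE = 146.0579   # deoxyhexose, C6H10O4
HEXOSE = 162.0528   # e.g. glucose, C6H10O5

GLYCOFORMS = {
    "O-fucose": FUCOSE,
    "O-fucose-glucose": FUCOSE + HEXOSE,
}


def explain_mass_gap(observed_mass, peptide_mass, tol=0.02):
    """Return the glycoform whose mass matches the precursor mass gap, or None.

    observed_mass: neutral precursor mass measured by the instrument (Da)
    peptide_mass:  theoretical mass of the unmodified candidate peptide (Da)
    tol:           absolute tolerance in Da (an assumed value for this sketch)
    """
    gap = observed_mass - peptide_mass
    for name, glycan_mass in GLYCOFORMS.items():
        if abs(gap - glycan_mass) <= tol:
            return name
    return None


# Hypothetical peptide of mass 1500.7000 Da carrying the disaccharide.
peptide_mass = 1500.7000
observed = peptide_mass + FUCOSE + HEXOSE
print(explain_mass_gap(observed, peptide_mass))
print(explain_mass_gap(peptide_mass + 57.0215, peptide_mass))
```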

Providence St. Joseph Health Digital Commons

FigShare

Gene finding in the chicken genome

Author: Antonarakis Stylianos E
Birney Ewan
Brent Michael R
Bye Jacqueline M
Camara Francisco
Castelo Robert
Eyras Eduardo
Flicek Paul
Guigo Roderic
Huckle Elizabeth J
Parra Genis
Reymond Alexandre
Rogers Jane
Shteynberg David D
Wyss Carine
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. RESULTS: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. CONCLUSIONS: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.

Springer - Publisher Connector

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

UPF Digital Repository

Secretaría de Estado de Cultura

Archive ouverte UNIGE

Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies

Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. 
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
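As a rough illustration of the target:decoy estimate this abstract refers to, the following Python sketch computes a simple FDR at a score threshold as the ratio of decoy to target matches. It assumes equally sized target and decoy databases and uses invented scores; it does not reproduce the statistical modeling examined in the paper.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above the threshold.

    psms: list of (score, is_decoy) pairs, higher score meaning a better match.
    Assumes target and decoy databases of equal size, so the number of decoy
    hits approximates the number of false target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical PSM scores; True marks a decoy match.
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, True), (6.1, False),
]

print(fdr_at_threshold(psms, 7.0))  # 1 decoy / 4 targets
print(fdr_at_threshold(psms, 8.0))  # 0 decoys / 3 targets
```

A database inflated with sequences that are unlikely to code for protein would change how many target and decoy matches pass a given threshold, which is the kind of effect the abstract attributes to six-frame translations.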

Queen's University Belfast Research Portal

Crossref

PubMed Central

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

Use of expressed sequence tags as an alternative approach for the identification of Taenia solium metacestode excretion/secretion proteins

Author: A Conesa
A Ito
A Keller
AI Nesvizhskii
André M Deelder
AP Yatsuda
B Victor
Bjorn Victor
CR Almeida
D Shteynberg
F Liu
F Vaca-Paniagua
H Choi
HH García
IJ Tsai
IK Phiri
J Lundström
J Mulvenna
Johan Lindh
K Hancock
Katja Polman
Kirezi Kanobana
KM Monteiro
M Zheng
Magnus Palmblad
P Dorny
P Millares
P Rice
Pierre Dorny
R Craig
Sarah Gabriël
VG Virginio
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Data mining for the discovery of patterns in respiratory diseases in Bogotá, Colombia (Minería de datos para el descubrimiento de patrones en enfermedades respiratorias en Bogotá, Colombia)

Author: A Keller
A Moghaddas Gholami
A Shevchenko
AC Peterson
AG Paulovich
AM Edwards
AR Kristensen
AW Bell
B Domon
B MacLean
BC Collins
C Escher
C Karlsson
D Shteynberg
DL Tabb
EL de Graaf
G Marko-Varga
G. Rosenberger
GS Omenn
H Lam
H Röst
H Schägger
HL Röst
J Griss
J Sherman
J-P Lambert
JA Vizcaíno
JE Elias
JK Eng
JM Burkhart
JR Wisniewski
L Lane
L Reiter
LC Gillet
LY Geer
M Beck
M Claassen
M Uhlen
M Wilhelm
M-S Kim
OT Schubert
P Picotti
PA Rudnick
R Aebersold
R Apweiler
RJ Chalkley
RR Craig
RT Schumacher
T Farrah
T Geiger
T Glatter
UH Toprak
V Marx
Y Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/09/2014

Research project. This project applies data mining, using the K-means clustering algorithm, to build a descriptive model from the data, with the aim of identifying possible patterns in respiratory diseases in the city of Bogotá. The set of clusters generated with the RapidMiner tool is based on data collected over the five-year period from 2012 to 2016, covering the number of cases associated with 184 respiratory disease diagnoses in patients aged 0 to 5 years.
Contents: 1. General background 2. Objectives 3. Justification 4. Scope 5. Reference framework 6. Methodology 7. Data sources and their variables 8. Design 9. Selection of clustering algorithms 10. Recognizing patterns from the collected information 11. Conclusions 12. Future work 13. Bibliographic references 14. Appendices
Undergraduate. Ingeniero de Sistema
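The K-means clustering named in this abstract can be sketched in a few lines of plain Python. The study itself used RapidMiner, so this is only a stand-in to show the idea: the `kmeans` function, the deterministic initialization from the first k points, and the toy (age, case count) records are all assumptions and not data from the project.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd's algorithm) on tuples of numbers.

    Initial centroids are simply the first k points, which keeps the sketch
    deterministic; real tools usually use randomized initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


# Hypothetical records: (patient age in years, number of cases for a diagnosis).
records = [(0, 10), (5, 80), (1, 12), (4, 85), (0, 8), (5, 90)]
centroids, labels = kmeans(records, k=2)
print(centroids)
print(labels)
```

With these toy records the algorithm separates the low-count and high-count groups; on real data the features would normally be scaled first so that no single variable dominates the distance.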

Queen's University Belfast Research Portal

Crossref

PubMed Central

HAL Descartes

Repositorio Institucional Universidad Católica de Colombia

Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome

Author: A Gelman
A Ramos-Fernandez
AH Thompson
AI Nesvizhskii
Andrew S. Wilson
AS Wilson
B Ghesquiere
B Miyake
Blythe P. Durbin-Johnson
Bradley R. Hart
Brian Bothner
BS Weir
C Der Sarkissian
C Ottoni
C Phillips
C Solazzo
C Wadsworth
CA Guenther
CF Bengtsson
CH Lee
Chad Nelson
CM Triggs
CN Laatsch
CT Oien
D Fenyo
D McNevin
D Shteynberg
Daniel J. Fairbanks
David M. Rocke
DE Reich
Deon S. Anex
DG Altman
DM Altshuler
DW Deedrick
E Callaway
EA Graffy
F Liu
FL Mendez
Francesc Calafell
Glendon J. Parker
GM Sheynkman
GR Abecasis
H Jeffreys
H Lam
HM Cann
HN Poinar
I Lazaridis
IW Evert
J Edson
J Jeong
J Jia
JA Tennessen
Jeffery Stevens
JJ Kim
JL Bada
JM Butler
JM Curran
Jonathan K. Hilmer
JS Cottrell
K Bryc
KA Lanning
KK Kidd
Krishna Parsawar
KS Robertson
L Orlando
Lisa Baird
M Beck
M Pruner-Bey
M Rasmussen
MA Rogers
Mark Leppert
ME Allentoft
MH Zweig
MK Bunger
MM Houck
MR Hoopmann
N Brautbar
NE Robinson
Nori Matsunami
NR Barthelemy
P Soares
PA Coulombe
PE Bowden
R Craig
R Pinhasi
RA van Oorschot
RC Marshall
RE Handsaker
RH Rice
RM Durbin
Robert H. Rice
S Paabo
Scott R. Woodward
T Lindahl
T Melton
Tami Leppert
TR Disotell
W Fu
X Wang
Y Ishihama
YJ Lee
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein, however, is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles were correctly imputed in 32 loci from 22 genes of subjects’ DNA and directly validated using Sanger sequencing. Estimates of the probability of resulting individual non-synonymous single nucleotide polymorphism allelic profiles in the European population, using the product rule, resulted in a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European–American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts.
The Technology Commercialization Innovation Program (Contracts #121668, #132043) of the Utah Governors Office of Commercial Development, the Scholarship Activitie
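The product rule mentioned in this abstract multiplies per-locus genotype frequencies, under the assumption that the loci are independent, to estimate how common a whole profile is. A minimal Python sketch follows; the function names and the three example frequencies are invented for illustration and are not values from the study.

```python
from math import prod


def profile_frequency(genotype_frequencies):
    """Product rule: multiply per-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequencies)


def one_in(frequency):
    """Express a profile frequency as '1 in N'."""
    return 1.0 / frequency


# Hypothetical genotype frequencies at three independent loci.
frequencies = [0.5, 0.25, 0.1]
freq = profile_frequency(frequencies)
print(freq)
print(f"1 in {one_in(freq):,.0f}")
```

Each additional informative locus multiplies the profile frequency by a number below one, which is why the power of discrimination grows with the number of genetically variant peptides detected.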

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

eScholarship - University of California

Bradford Scholars

FigShare

Performance-based vs socially supportive culture: a cross-national study of descriptive norms and entrepreneurship

Author: A Bandura
A Edmondson
A O’Donnell
A Portes
A Rauch
A Swaminathan
A Swidler
A Van Stel
A Werner
AN Licht
AW Wicker
AWE Wennekers
B Uzzi
B Verplanken
CJ Collins
CM Van Praag
D Landis
DB Audretsch
DC McClelland
DC North
E Schmitt-Rodermund
EM Uslaner
F Delmar
F Fukuyama
FL Pryor
G Hofstede
G Shteynberg
H Aarts
H Leibenstein
HE Aldrich
HP Bowen
I Verheul
J Brinckmann
J Bruederl
J Levie
JC Hayton
JF Hair
JM Nolan
JN Choi
JP Van Oudenhoven
JR Smith
K Peng
KJ Klein
L Hanifan
L Klapper
L Uhlaner
Lorraine M Uhlaner
LW Busenitz
M Baer
M Javidan
M Weber
M Woolcock
MF Peterson
MJ Hatch
MS Granovetter
MW Peng
NF Krueger
NR Anderson
P Arenius
P Davidsson
P Koellinger
PA Frazier
PB Smith
PD Reynolds
PH Thornton
PJ Hanges
PK Wong
PS Adler
R Fischer
R Maseland
RB Cialdini
RF Hébert
RJ House
RL Tung
RS Burt
RW Jackman
RW Lent
S Djankov
S Shane
S Venaik
S Wennekers
S-W Kwon
SR Barley
ST Hunter
Ute Stephan
W Arthur
WB Gartner
WJ Baumol
WW Powell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

This paper is a cross-national study testing a framework relating cultural descriptive norms to entrepreneurship in a sample of 40 nations. Based on data from the Global Leadership and Organizational Behavior Effectiveness project, we identify two higher-order dimensions of culture – socially supportive culture (SSC) and performance-based culture (PBC) – and relate them to entrepreneurship rates and associated supply-side and demand-side variables available from the Global Entrepreneurship Monitor. Findings provide strong support for a social capital/SSC and supply-side variable explanation of entrepreneurship rate. PBC predicts demand-side variables, such as opportunity existence and the quality of formal institutions to support entrepreneurship.

Lirias

Crossref

Aston Publications Explorer

The evolution of extreme cooperation via shared dysphoric experiences

Author: A Diekmann
A Gómez
A Vázquez
B Mullen
C McCauley
C Navarrete
D Lieberman
DG Rand
DS Wilson
E Aronson
G Shteynberg
H Whitehouse
HN Qirko
J Greenberg
J Yoo
JHW Tan
JR Harrington
JR Ordonana
JR Spoor
K De Jaegher
K Langergraber
L King
L Lehmann
M Archetti
MM Lahr
P Smaldino
R Sosis
RS Walker
S Gavrilets
WB Swann Jr.
WD Hamilton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Willingness to lay down one’s life for a group of non-kin, well documented historically and ethnographically, represents an evolutionary puzzle. Building on research in social psychology, we develop a mathematical model showing how conditioning cooperation on previous shared experience can allow individually costly pro-group behavior to evolve. The model generates a series of predictions that we then test empirically in a range of special sample populations (including military veterans, college fraternity/sorority members, football fans, martial arts practitioners, and twins). Our empirical results show that sharing painful experiences produces “identity fusion” – a visceral sense of oneness – which in turn can motivate self-sacrifice, including willingness to fight and die for the group. Practically, our account of how shared dysphoric experiences produce identity fusion helps us better understand such pressing social issues as suicide terrorism, holy wars, sectarian violence, gang-related violence, and other forms of intergroup conflict.

Queen's University Belfast Research Portal

Crossref

Royal Holloway - Pure

PubMed Central

Oxford University Research Archive

Kent Academic Repository

Coventry University Pure Portal

University of Melbourne Institutional Repository