169 research outputs found

Biocurators and Biocuration: surveying the 21st century challenges

Author: A. Bateman
Burkhardt
C. O'Donovan
Field
I. Xenarios
M. Cherry
P. Gaudet
S. Burge
Sanderson
St-Pierre
T. K. Attwood
T. Z. Berardini
Publication venue: Oxford University Press
Publication date: 01/01/2012

Curated databases are an integral part of the tool set that researchers use on a daily basis for their work. For most users, however, how databases are maintained, and by whom, is rather obscure. The International Society for Biocuration (ISB) represents biocurators, software engineers, developers and researchers with an interest in biocuration. Its goals include fostering communication between biocurators, promoting and describing their work, and highlighting the added value of biocuration to the world. The ISB recently conducted a survey of biocurators to better understand their educational and scientific backgrounds, their motivations for choosing a curatorial job and their career goals. The results are reported here. From the responses received, it is evident that biocuration is performed by highly trained scientists and perceived to be a stimulating career, offering both intellectual challenges and the satisfaction of performing work essential to the modern scientific community. It is also apparent that the ISB has at least a dual role to play to facilitate biocurators’ work: (i) to promote biocuration as a career within the greater scientific community; (ii) to aid the development of resources for biomedical research through promotion of nomenclature and data-sharing standards that will allow interconnection of biological databases and better exploit the pivotal contributions that biocurators are making.

Crossref

Serveur académique lausannois

PubMed Central

The University of Manchester - Institutional Repository

Archive ouverte UNIGE

Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium

Author: Berardini Tanya Z.
Clarke Jennifer L.
Cooper Laurel D.
Elser Justin
Farmer Andrew D.
Ficklin Stephen
Kumari Sunita
Laporte Marie-Angélique
Nelson Rex T.
Poelchau Monica F.
Sadohara Rie
Selby Peter
Sen Taner Z.
Thessen Anne E.
Whitehead Brandon
Publication date: 17/07/2023

Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, conducted a survey to assess the status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data sharing practices by AgBioData databases are in a healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. We recommend 1) providing training for database personnel in specific data sharing techniques, as well as in ontology use; 2) further study on what metadata is shared, and how well it is shared among databases; 3) promoting an understanding of data sharing and ontologies in the stakeholder community; 4) improving data sharing and ontologies for specific phenotypic data types and formats; and 5) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means.

Comment: 17 pages, 8 figures

arXiv.org e-Print Archive

MetWAMer: eukaryotic translation initiation site prediction

Author: A Delcher
A Hatzigeorgiou
A Nadershahi
A Pedersen
A Prats
A Rakotondrafara
A Sachs
A Salamov
A Zien
C Bishop
C Iseli
C Lottaz
C Mathé
D Abramczyk
D Cavener
E Birney
G Crooks
G Gremme
G Li
G Stormo
H Li
H Liu
H Liu
J Allen
J Allen
J Crow
L Balvay
L Xing
M de Hoon
M Hirosawa
M Kozak
M Kozak
M Kozak
M Kozak
M Medveczky
M Sparks
M Sparks
M Stanke
M Stanke
M Tech
M Tech
Michael E Sparks
Q Dong
S Altschul
S Hebsgaard
S Russell
S Salzberg
T Berardini
T Mitchell
T Nishikawa
T Preiss
T Schiex
T Schneider
T Sing
V Brendel
Volker Brendel
Y Saeys
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Translation initiation site (TIS) identification is an important aspect of the gene annotation process, requisite for the accurate delineation of protein sequences from transcript data. We have developed the MetWAMer package for TIS prediction in eukaryotic open reading frames of non-viral origin. MetWAMer can be used as a stand-alone, third-party tool for post-processing gene structure annotations generated by external computational programs and/or pipelines, or directly integrated into gene structure prediction software implementations. Results: MetWAMer currently implements five distinct methods for TIS prediction, the most accurate of which is a routine that combines weighted, signal-based translation initiation site scores and the contrast in coding potential of sequences flanking TISs using a perceptron. Also, our program implements clustering capabilities through use of the k-medoids algorithm, thereby enabling cluster-specific TIS parameter utilization. In practice, our static weight array matrix-based indexing method for parameter set lookup can be used with good results in data sets exhibiting moderate levels of 5'-complete coverage. Conclusion: We demonstrate that improvements in statistically-based models for TIS prediction can be achieved by taking the class of each potential start-methionine into account pending certain testing conditions, and that our perceptron-based model is suitable for the TIS identification task. MetWAMer represents a well-documented, extensible, and freely available software system that can be readily re-trained for differing target applications and/or extended with existing and novel TIS prediction methods, to support further research efforts in this area.
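The abstract above mentions a perceptron that combines a signal-based TIS score with a coding-potential contrast. As a rough illustration of that general idea only, the sketch below trains a plain two-input perceptron on made-up feature pairs. It is not MetWAMer's code: the function names, the feature values, the learning rate and the epoch count are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Each sample is (signal_score, coding_contrast, label); label 1 = true TIS.
# The feature values used below are invented for illustration.
Sample = Tuple[float, float, int]


def train_perceptron(samples: List[Sample], epochs: int = 20,
                     lr: float = 0.1) -> Tuple[List[float], float]:
    """Classic perceptron rule: adjust weights only on misclassified samples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x1, x2, y in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred  # -1, 0 or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def predict(w: List[float], b: float, x1: float, x2: float) -> int:
    """Return 1 if the weighted sum of the two features exceeds zero."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


# Hypothetical, linearly separable toy data.
TOY_SAMPLES: List[Sample] = [
    (2.0, 1.5, 1), (1.5, 2.0, 1), (2.5, 1.0, 1),
    (-1.0, -0.5, 0), (-2.0, -1.0, 0), (-0.5, -1.5, 0),
]

if __name__ == "__main__":
    weights, bias = train_perceptron(TOY_SAMPLES)
    for x1, x2, label in TOY_SAMPLES:
        print(x1, x2, label, predict(weights, bias, x1, x2))
```

A single-layer perceptron can only separate the two classes with a straight line in the feature plane, so this toy version only makes sense when the two scores are roughly linearly separable.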

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Preclinical efficacy of the bioreductive alkylating agent RH1 against paediatric tumours

Author: A Begleiter
A Hogg
AC Sartorelli
AM Malkinson
C Dive
C Yan
D Hussein
D Hussein
D Siegel
DL Dehn
DL Dehn
E J Estlin
G Gatta
G Tudor
G W J Makin
HD Beall
HF Bligh
HH Chen
HM Katzenstein
J Cummings
J Cummings
J K Adamski
JM Brown
K E Brookes
KK Matthay
MD Berardini
NW Gibson
RD Traver
RM Phillips
S Danson
S Danson
S V Holt
SL Winski
SY Sharp
T Cresteil
T Digby
T Klymenko
T Ward
TC Chou
TC Chou
TC Chou
TH Ward
TH Ward
V Vichai
WF Hodnick
Publication venue: Nature Publishing Group
Publication date: 07/07/2009

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Expressed sequence tag analysis of khat (Catha edulis) provides a putative molecular biochemical basis for the biosynthesis of phenylpropylamino alkaloids

Author: Abd El-Mawla AMA
Altschul SF
Balint EE
Berardini TZ
Boatright J
Bredholt T
Chou H-H
Colzato LS
Costelloe SJ
Davis EM
Efraim Lewinsohn
Engel S
Ewing B
Frédéric Marsolais
Gebissa E
Gonda I
Green JBA
Grue-Sørensen G
Grue-Sørensen G
Grue-Sørensen G
Ibdah M
Jillian M. Hagel
Jãna G
Kassie F
Kataoka M
Kataoka M
Klein A
Kliebenstein DJ
Korey Kilpatrick
Krizevski R
Krizevski R
Krizevski R
Leete E
Long MC
Mateen FJ
Meyer D
Miller RT
Murata J
Nierop-Groot MN
Okada T
Peter J. Facchini
Pohl M
Prabhu PR
Raz Krizevski
Schilmiller AL
Thomas BC
Van Moerkercke A
Venglat P
Yamasaki K
Yamasaki K
Yaron Sitrit
Ziegler J
Publication venue: Sociedade Brasileira de Genética
Publication date: 01/01/2011

Khat (Catha edulis Forsk.) is a flowering perennial shrub cultivated for its neurostimulant properties resulting mainly from the occurrence of (S)-cathinone in young leaves. The biosynthesis of (S)-cathinone and the related phenylpropylamino alkaloids (1S,2S)-cathine and (1R,2S)-norephedrine is not well characterized in plants. We prepared a cDNA library from young khat leaves and sequenced 4,896 random clones, generating an expressed sequence tag (EST) library of 3,293 unigenes. Putative functions were assigned to > 98% of the ESTs, providing a key resource for gene discovery. Candidates potentially involved at various stages of phenylpropylamino alkaloid biosynthesis from L-phenylalanine to (1S,2S)-cathine were identified.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

Annotation of gene product function from high-throughput studies using the Gene Ontology

Author: Attrill H
Berardini TZ
Chibucos MC
Drabkin H
Engel SR
Fey P
Garmiri P
Gaudet P
Gene Ontology Consortium
Georghiou G
Harris MA
Huntley RP
Lovering RC
Poux S
Reiser L
Sawford T
Tauber R
Toro S
Van Auken KM
Wood V
Publication date: 01/01/2019

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.

UCL Discovery

EMF1 and PRC2 Cooperate to Repress Key Regulators of Arabidopsis Development

Author: A Chini
A Kuzmichev
A Mallory
A To
AR Gendall
B Czermin
C Alexandre
C Bowler
CC Carles
CC Carles
CC Wood
CH Yang
CM Ha
CM Ha
D Aubert
D Bouyer
D Chen
D Schubert
D Zilberman
D Zilberman
Daniel Zilberman
DM Bond
EM Kallin
ER Alvarez-Buylla
F Bratzel
F Bratzel
F Turck
G Wu
GI Dellino
HY Park
I Rubio-Somoza
I Weinhofer
IF King
IJ Majewski
J Goodrich
J Jun
JAKR Simon
Jason W. Reed
Jungeun Lee
L Chen
L Hennig
L Hennig
L Ringrose
L Sanchez-Pulido
L Xu
Leor Eshed-Williams
LJ Chen
M Aida
M Aida
M Bemer
M Calonje
M Calonje
M Kieffer
M Ku
M Lafos
M Luo
ME Griffith
MJ Buck
MR Karim
N Schatlowski
N Yoshida
NJ Francis
PB Brewer
R Cao
R Cao
R Cao
R Muller
R Sanchez
S Masiero
S Schoeftner
Sang Yeol Kim
SY Kim
T Kinoshita
T Klymenko
T Kotake
T Murashige
TI Lee
TJ Strabala
TZ Berardini
U Grossniklaus
X Zhang
Y Chanvivattana
YB Schwartz
YB Schwartz
YH Moon
Z. Renee Sung
ZR Sung
Publication venue: Public Library of Science
Publication date: 01/01/2012

EMBRYONIC FLOWER1 (EMF1) is a plant-specific gene crucial to Arabidopsis vegetative development. Loss of function mutants in the EMF1 gene mimic the phenotype caused by mutations in Polycomb Group protein (PcG) genes, which encode epigenetic repressors that regulate many aspects of eukaryotic development. In Arabidopsis, Polycomb Repressor Complex 2 (PRC2), made of PcG proteins, catalyzes trimethylation of lysine 27 on histone H3 (H3K27me3) and PRC1-like proteins catalyze H2AK119 ubiquitination. Despite functional similarity to PcG proteins, EMF1 lacks sequence homology with known PcG proteins; thus, its role in the PcG mechanism is unclear. To study the EMF1 functions and its mechanism of action, we performed genome-wide mapping of EMF1 binding and H3K27me3 modification sites in Arabidopsis seedlings. The EMF1 binding pattern is similar to that of H3K27me3 modification on the chromosomal and genic level. ChIPOTLe peak finding and clustering analyses both show that the highly trimethylated genes also have high enrichment levels of EMF1 binding, termed EMF1_K27 genes. EMF1 interacts with regulatory genes, which are silenced to allow vegetative growth, and with genes specifying cell fates during growth and differentiation. H3K27me3 marks not only these genes but also some genes that are involved in endosperm development and maternal effects. Transcriptome analysis, coupled with the H3K27me3 pattern, of EMF1_K27 genes in emf1 and PRC2 mutants showed that EMF1 represses gene activities via diverse mechanisms and plays a novel role in the PcG mechanism.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Network Analysis Identifies ELF3 as a QTL for the Shade Avoidance Response in Arabidopsis

Quantitative Trait Loci (QTL) analyses in immortal populations are a powerful method for exploring the genetic mechanisms that control interactions of organisms with their environment. However, QTL analyses frequently do not culminate in the identification of a causal gene due to the large chromosomal regions often underlying QTLs. A reasonable approach to inform the process of causal gene identification is to incorporate additional genome-wide information, which is becoming increasingly accessible. In this work, we perform QTL analysis of the shade avoidance response in the Bayreuth-0 (Bay-0, CS954) x Shahdara (Sha, CS929) recombinant inbred line population of Arabidopsis. We take advantage of the complex pleiotropic nature of this trait to perform network analysis using co-expression, eQTL and functional classification from publicly available datasets to help us find good candidate genes for our strongest QTL, SAR2. This novel network analysis detected EARLY FLOWERING 3 (ELF3; AT2G25930) as the most likely candidate gene affecting the shade avoidance response in our population. Further genetic and transgenic experiments confirmed ELF3 as the causative gene for SAR2. The Bay-0 and Sha alleles of ELF3 differentially regulate developmental time and circadian clock period length in Arabidopsis, and the extent of this regulation is dependent on the light environment. This is the first time that ELF3 has been implicated in the shade avoidance response and that different natural alleles of this gene are shown to have phenotypic effects. In summary, we show that development of networks to inform candidate gene identification for QTLs is a promising technique that can significantly accelerate the process of QTL cloning.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

eScholarship - University of California

ProdInra

Hal-Diderot

Pyrosequencing of the Camptotheca acuminata transcriptome reveals putative genes involved in camptothecin biosynthesis and transport

Author: A Belhadj
A Lorence
A Lorence
A Valletta
ACA Yendo
Aiping Lv
Chao Sun
CR Hutchinson
CW Liang
DR Nelson
Enzo Tramontano
G Collua
G Pasqua
H Lu
H Rischer
H Seki
H Warzechaa
H Yao
HM Luo
Hongmei Luo
HT Keat
IE Maldonado-Mendoza
JE Crawford
Jingyuan Song
JR Ketudat Cairns
K Sakai
K Terasaka
L Barleben
Liang Dong
LS Gaertner
M Lopez-Meyer
M López-Meyer
M López-Meyer
M Morant
MJ Coon
N Shitan
NH Oberlies
P Li
RJ Aerts
RJ Burnett
S Chen
S Irmler
S Sirikantaramas
SEO Connor
SH Song
Shilin Chen
SJ Emrich
SY Li
SY Li
T Nomura
TZ Berardini
Y Pi
Y Pommier
Ying Li
Yingjie Zhu
Yongzhen Sun
Yunyun Niu
ZJ Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Camptotheca acuminata is a Nyssaceae plant, often called the "happy tree", which is indigenous in Southern China. C. acuminata produces the terpenoid indole alkaloid, camptothecin (CPT), which exhibits clinical effects in various cancer treatments. Despite its importance, little is known about the transcriptome of C. acuminata and the mechanism of CPT biosynthesis, as only a few nucleotide sequences are included in the GenBank database. Results: From a constructed cDNA library of young C. acuminata leaves, a total of 30,358 unigenes, with an average length of 403 bp, were obtained after assembly of 74,858 high quality reads using GS De Novo assembler software. Through functional annotation, a total of 21,213 unigenes were annotated at least once against the NCBI nucleotide (Nt), non-redundant protein (Nr), Uniprot/SwissProt, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Arabidopsis thaliana proteome (TAIR) databases. Further analysis identified 521 ESTs representing 20 enzyme genes that are involved in the backbone of the CPT biosynthetic pathway in the library. Three putative genes in the upstream pathway, including genes for geraniol-10-hydroxylase (CaPG10H), secologanin synthase (CaPSCS), and strictosidine synthase (CaPSTR) were cloned and analyzed. The expression level of the three genes was also detected using qRT-PCR in C. acuminata. With respect to the branch pathway of CPT synthesis, six cytochrome P450s transcripts were selected as candidate transcripts by detection of transcript expression in different tissues using qRT-PCR. In addition, one glucosidase gene was identified that might participate in CPT biosynthesis. For CPT transport, three of 21 transcripts for multidrug resistance protein (MDR) transporters were also screened from the dataset by their annotation result and gene expression analysis. Conclusion: This study produced a large amount of transcriptome data from C. acuminata by 454 pyrosequencing. According to EST annotation, catalytic features prediction, and expression analysis, novel putative transcripts involved in CPT biosynthesis and transport were discovered in C. acuminata. This study will facilitate further identification of key enzymes and transporter genes in C. acuminata.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Cagliari

Investigating the validity of current network analysis on static conglomerate networks by protein network stratification

Author: A Bossi
A Ma'ayan
AC Gavin
AH Tong
AK Ramani
AL Barabasi
AL Hopkins
B Zybailov
C Alfarano
C von Mering
D Eisenberg
D Greenbaum
D Swarbreck
G Balazsi
G Joshi-Tope
G Palla
H Huang
H Jeong
H Ma
H Rutschow
H Yu
H Yu
H Yu
H Zhang
HW Ma
HY Chuang
IW Taylor
J Cui
JD Han
JF Rual
K Baerenfaller
K Yang
KY Yip
LH Hartwell
Long J Lu
M Arita
M Ashburner
M Girvan
M Hamacher
M Miyamoto
M Zhang
ME Newman
Minlu Zhang
MJ Herrgard
MP Samanta
N Bertin
N Guelzim
N Lemke
NM Luscombe
NN Batada
NN Batada
P Braun
P Qiu
PV Missiuro
R Guimera
R Kelley
R Milo
R Sharan
RJ Prill
S Li
S Peri
S Wuchty
SA Teichmann
SE Calvano
T Ideker
T Kislinger
TZ Berardini
U de Lichtenberg
U Stelzl
WH Lin
Y Xia
YR Cho
Z Wang
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A molecular network perspective forms the foundation of systems biology. A common practice in analyzing protein-protein interaction (PPI) networks is to perform network analysis on a conglomerate network that is an assembly of all available binary interactions in a given organism from diverse data sources. Recent studies on network dynamics suggested that this approach might have ignored the dynamic nature of context-dependent molecular systems. Results: In this study, we employed a network stratification strategy to investigate the validity of the current network analysis on conglomerate PPI networks. Using the genome-scale tissue- and condition-specific proteomics data in Arabidopsis thaliana, we present here the first systematic investigation into this question. We stratified a conglomerate A. thaliana PPI network into three levels of context-dependent subnetworks. We then focused on three types of most commonly conducted network analyses, i.e., topological, functional and modular analyses, and compared the results from these network analyses on the conglomerate network and five stratified context-dependent subnetworks corresponding to specific tissues. Conclusions: We found that the results based on the conglomerate PPI network are often significantly different from those of context-dependent subnetworks corresponding to specific tissues or conditions. This conclusion depends neither on relatively arbitrary cutoffs (such as those defining network hubs or bottlenecks), nor on specific network clustering algorithms for module extraction, nor on the possible high false positive rates of binary interactions in PPI networks. We also found that our conclusions are likely to be valid in human PPI networks. Furthermore, network stratification may help resolve many controversies in current research of systems biology.
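The stratification idea in the abstract above can be sketched in a few lines: keep only the interactions whose two partners are both detected in a given context, then rerun the same analysis on the reduced network. The example below is a minimal sketch under stated assumptions, not the authors' pipeline. The protein identifiers, the "leaf" context, and the degree cutoff used to call a hub are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count interaction partners per protein in an undirected edge list."""
    deg: Dict[str, int] = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def stratify(edges: Iterable[Edge], expressed: Set[str]) -> List[Edge]:
    """Keep only edges whose two endpoints are both present in the context."""
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


def hubs(edges: Iterable[Edge], min_degree: int) -> Set[str]:
    """Proteins with at least min_degree partners (an arbitrary toy cutoff)."""
    return {p for p, d in degrees(edges).items() if d >= min_degree}


# Hypothetical conglomerate network and one hypothetical tissue context.
CONGLOMERATE: List[Edge] = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5"),
]
LEAF_EXPRESSED: Set[str] = {"P1", "P2", "P3"}

if __name__ == "__main__":
    leaf_net = stratify(CONGLOMERATE, LEAF_EXPRESSED)
    print("conglomerate hubs:", hubs(CONGLOMERATE, 3))
    print("leaf hubs:", hubs(leaf_net, 3))
```

In this toy data P1 counts as a hub in the conglomerate network but not in the leaf subnetwork, which is the kind of discrepancy the abstract reports between conglomerate and context-dependent analyses.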

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central