118 research outputs found

PINT: Pathways INtegration Tool

Author: Babur
Babur
Bruford
C.-L. Hsu
Casalini
Chonghaile
de Matos
Gullick
Hayashi
Hucka
Janga
Jin
Jonsson
Kitano
Le Novere
Le Novere
Le Novere
Lomonosova
Mi
Pruitt
Resnitzky
Rodriguez
Schaefer
Shannon
U.-C. Yang
Viswanathan
Y.-C. Chen
Y.-H. Huang
Y.-T. Wang
Publication venue: Oxford University Press
Publication date: 01/01/2010

New pathway databases generally display pathways by retrieving information from a database dynamically, and some also provide their pathways in SBML or other exchangeable formats. Integrating these models is challenging because they were not built in the same way. The Pathways INtegration Tool (PINT) can integrate standard SBML files. Because these files may come from different sources, inconsistencies in component names can be revised with an annotation editor when a pathway model is uploaded. This integration function greatly simplifies building a complex model from small models. To get new users started, PINT collects about 190 curated public models of human pathways. Relevant models can be selected and sent to the workbench through a query interface, which also accepts a gene list derived from high-throughput experiments. Models on the workbench, from either a public or a private source, can be integrated and painted. The painting function highlights important genes, or even their expression levels, on a merged pathway diagram so that their biological significance can be revealed. This tool is freely available at http://csb2.ym.edu.tw/pint/

CiteSeerX

Crossref

PubMed Central

Epstein-Barr virus transcription factor Zta acts through distal regulatory elements to directly control cellular gene expression

Author: Alison J. Sinclair
Anja Godfrey
Bailey
Bark-Jones
Beatty
Ben-Bassat
Bergbauer
Bhende
Bhende
Broderick
Carey
Chang
Chi
Farnham
Feederle
Feng
Flemington
Flower
Gentleman
Gordon Peters
Grabherr
Gupta
Gustems
Halder
Harshil Patel
Heather
Hollyoake
Hsu
Huang da
Huang da
Ijiel B. Naranjo Perez-Fernandez
Inman
Israel
Jiang
Jianmin Zuo
Kalla
Kalla
Kay Osborn
Kenney
Kenney
Klug
Krig
Kuppers
Li
Lieberman
Lieberman
Lieberman
Longnecker
Ma
Machanick
Martin Rowe
McClellan
Miller
Molyneux
Nicolae Balan
Pruitt
Rajaei Al-Mohammad
Ramasubramanyan
Ramasubramanyan
Richard G. Jenner
Rickinson
Rowe
Rowe
Scacheri
Shannon-Lowe
Sharada Ramasubramanyan
Takada
Thorvaldsdottir
Tsai
Tsang
Vetsika
Whyte
Yang
Young
Young
Yuan
Zuo
Zuo
Publication venue: Oxford University Press (OUP)
Publication date: 16/03/2015

Lytic replication of the human gamma herpes virus Epstein-Barr virus (EBV) is an essential prerequisite for the spread of the virus. Differential regulation of a limited number of cellular genes has been reported in B-cells during the viral lytic replication cycle. We asked whether a viral bZIP transcription factor, Zta (BZLF1, ZEBRA, EB1), drives some of these changes. Using genome-wide chromatin immunoprecipitation coupled to next-generation DNA sequencing (ChIP-seq) we established a map of Zta interactions across the human genome. Using sensitive transcriptome analyses we identified 2263 cellular genes whose expression is significantly changed during the EBV lytic replication cycle. Zta binds 278 of the regulated genes and the distribution of binding sites shows that Zta binds mostly to sites that are distal to transcription start sites. This differs from the prevailing view that Zta activates viral genes by binding exclusively at promoter elements. We show that a synthetic Zta binding element confers Zta regulation at a distance and that distal Zta binding sites from cellular genes can confer Zta-mediated regulation on a heterologous promoter. This leads us to propose that Zta directly reprograms the expression of cellular genes through distal elements

Crossref

University of Birmingham Research Portal

PubMed Central

Sussex Research Online

Association of Accelerometry-Measured Physical Activity and Cardiovascular Events in Mobility-Limited Older Adults: The LIFE (Lifestyle Interventions and Independence for Elders) Study.

Author: Abbie Wrights
Abby C. King
Alvito Rego
Ami Parks McGucken
Anne B. Newman
Anthony P. Marsh
Barbara Fennelly
Ben P. Butitta
Bhanuprasad D. Sandesara
Bonnie Spring
Bret H. Goodpaster
Bridget M. Mignosa
Carlos A. Vaz Fragoso
Catrine Tudor‐Locke
Curt D. Furberg
Cynthia L. Stowe
Cynthia M. Castro
Daniel P. Beavers
Deborah Barr
Delilah Cook
Denise Bonds
Denise M. Shepard
Diana Kerwin
Diane G. Ives
Don Hire
Eileen Handberg
Fang‐Chi Hsu
Floris F. Singletary
Gail M. Flynn
George Grove
Heidi K. Millet
Holly L. Morris
Hugh C. Hendrie
Irina Korytov
Jackie Causer
Jamehl S. Demons
Janet T. Bonk
Janine Jennings
Janine M. Jennings
Jeffrey A. Katula
Jeffrey D. Knaggs
Jennifer Rush
Joanne M. McGloin
Jodi D. Fitzgerald
Joe Verghese
John A. Dodson
John Hepler
John L. Hankinson
Joshua Hauser
Julia Rushing
Julie A. Bugaj
June J. Pierce
Karen C. Wu
Kathryn Domanchuk
Kathy Berra
Kathy Williams
Katie A. Radcliff
Kaycee M. Sink
Kaycee M. Sink
Kimberly Kennedy
Kushang V. Patel
Laura Lovato
Lea N. Harvin
Leora Henkin
Leslie A. Pruitt
Lynne P. Iannone
Marco Pahor
Margo Fitch
Maria A. Zenoni
Mark A. Espeland
Mark A. Newman
Mark Espeland
Mary M. McDermott
Matthew J. Brennan
Megan S. Lorow
Melissa Nauta Harris
Michael Marsiske
Michael P. Walkup
Nancy W. Glynn
Nancy Woolard
Nathalie de Rekeneire
Nathan E. Britt
Neelesh K. Nadkarni
Oscar Lopez
Peter H. Brubaker
Piera Kost
Rachel Shertzer‐Skinner
Raeleen Mautner
Randall S. Stafford
Rex Graff
Robert M. Kaplan
Robert P. Byington
Robert S. Axtell
Roger A. Fielding
Ron Monce
Rose Fries
Ruben Rodarte
Scott Rushing
Sean N. Halpin
Sergei Romashkan
Shannon H. Cocreham
Shannon K. Cochrane
Shannon L. Mihalko
Sheletta G. Donatto
Shyh‐Huei Chen
Stephanie A. Studenski
Stephen D. Anton
Stephen R. Rapp
Steven N. Blair
Susan Nayfield
Susan S. Kashaf
Theresa Sweeney Barnett
Thomas M. Gill
Thomas W. Buford
Tina E. Brinkley
Todd M. Manini
Valerie H. Myers
Valerie K. Wilson
Veronica Yank
W. Jack Rejeski
Walter T. Ambrosius
Wesley Roberson
William Applegate
William C. Marena
William L. Haskell
Publication venue: eScholarship, University of California
Publication date: 01/12/2017

BACKGROUND: Data are sparse regarding the value of physical activity (PA) surveillance among older adults, particularly among those with mobility limitations. The objective of this study was to examine longitudinal associations between objectively measured daily PA and the incidence of cardiovascular events among older adults in the LIFE (Lifestyle Interventions and Independence for Elders) study. METHODS AND RESULTS: Cardiovascular events were adjudicated based on medical records review, and cardiovascular risk factors were controlled for in the analysis. Home-based activity data were collected by hip-worn accelerometers at baseline and at 6, 12, and 24 months postrandomization to either a physical activity or health education intervention. LIFE study participants (n=1590; age 78.9±5.2 [SD] years; 67.2% women) at baseline had an 11% lower incidence of experiencing a subsequent cardiovascular event per 500 steps taken per day based on activity data (hazard ratio, 0.89; 95% confidence interval, 0.84-0.96; P=0.001). At baseline, every 30 minutes spent performing activities ≥500 counts per minute (hazard ratio, 0.75; confidence interval, 0.65-0.89 [P=0.001]) was also associated with a lower incidence of cardiovascular events. Throughout follow-up (6, 12, and 24 months), both the number of steps per day (per 500 steps; hazard ratio, 0.90; confidence interval, 0.85-0.96 [P=0.001]) and duration of activity ≥500 counts per minute (per 30 minutes; hazard ratio, 0.76; confidence interval, 0.63-0.90 [P=0.002]) were significantly associated with lower cardiovascular event rates. CONCLUSIONS: Objective measurements of physical activity via accelerometry were associated with cardiovascular events among older adults with limited mobility (summary score >10 on the Short Physical Performance Battery) both using baseline and longitudinal data. CLINICAL TRIAL REGISTRATION: URL: http://www.clinicaltrials.gov. Unique identifier: NCT01072500

Crossref

eScholarship - University of California

The Biomolecular Interaction Network Database in PSI-MI 2.5

Author: Alfarano
Apweiler
Aranda
Ashburner
Bader
Bader
Bader
Bader
Breitkreutz
Brown
Chaurasia
Chen
Cote
Demir
Gary D. Bader
Han
Hermjakob
Hogue
Jensen
Karsch-Mizrachi
Keshava Prasad
Kirchmair
Kulikova
Laibe
Lynn
McDowall
Montecchi-Palazzi
Ng
Orchard
Prieto
Pruitt
Rashad A. El-Badrawi
Razick
Ruth Isserlin
Salama
Satoru
Shannon
Tarcea
Wu
Publication venue: Oxford University Press

The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions that has been unmaintained for the last few years. This trend will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standard Initiative-Molecular Interaction 2.5, starting from the last curated data release (from 2005), which was available in a custom XML format, and made the core components (interactions and complexes) plus additional valuable curated information available for download (http://download.baderlab.org/BINDTranslation/). Major work during the conversion process was required to update out-of-date molecule identifiers, resulting in a more comprehensive conversion of BIND, by measures including number of species and interactor types covered, than what is currently accessible elsewhere. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases

Crossref

PubMed Central

Background frequencies for residue variability estimates: BLOSUM revisited

Author: A del Sol Mesa
AG Murzin
C Sander
C Shannon
H Berman
I Mihalek
I Mihalek
I Mihalek
I Mihalek
I Nooren
I Reš
J Donald
J Pei
K Pruitt
O Lichtarge
O Lichtarge
P Shenkin
R Development Core Team
R Edgar
S Altschul
S Henikoff
S Jones
S Kullback
S Veerassamy
T Pupko
W Atchley
W Valdar
W Valdar
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolution power. Results: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties, that, at the same time, includes observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices, and re-interpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure. Conclusion: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular in the cases when the available sequences are of narrow evolutionary scope.
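
For orientation, the baseline score this abstract starts from (plain Shannon entropy of an alignment column) can be sketched as follows. This is only the unweighted measure, not the authors' BLOSUM-based generalization; the function names, the gap handling and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of one alignment column.

    Gap characters ('-' and '.') are ignored; that is an assumption of
    this sketch, not something stated in the abstract.
    """
    residues = [c for c in column.upper() if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = -sum(p * log2(p)) over the residue types observed in the column
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Per-column entropies for equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*sequences)]


# Hypothetical toy alignment: column 0 is fully conserved, column 1 is split.
print(alignment_entropies(["AC", "AG"]))
```

A fully conserved column scores zero and a column split evenly between two residue types scores one bit. As the abstract notes, this score treats all residue types as equally different, which is the limitation the revised measure addresses.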

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations

Author: A Nimgaonkar
Affymetrix
Affymetrix
AI Su
AJ Butte
AJ Butte
AT Adai
BH Mecham
BH Mecham
C Wu
CL Wilson
Crispin J Miller
E Birney
G Liu
G Sherlock
H Wang
HS Leong
J Harbig
J Stuart
KD Pruitt
L Gautier
L Gautier
M Dai
Michał J Okoniewski
O Teuffel
R Gentleman
R Irizarry
S Carter
S Zakharkin
T Attwood
W Shannon
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Microarrays measure the binding of nucleotide sequences to a set of sequence specific probes. This information is combined with annotation specifying the relationship between probes and targets and used to make inferences about transcript- and, ultimately, gene expression. In some situations, a probe is capable of hybridizing to more than one transcript; in others, multiple probes can target a single sequence. These 'multiply targeted' probes can result in non-independence between measured expression levels. RESULTS: An analysis of these relationships for Affymetrix arrays considered both the extent and influence of exact matches between probe and transcript sequences. For the popular HGU133A array, approximately half of the probesets were found to interact in this way. Both real and simulated expression datasets were used to examine how these effects influenced the expression signal. These effects were found not only to increase signal strength for the affected probesets but, more importantly, to significantly increase their correlation, even when only a single probe from a probeset was involved. By building a network of probe-probeset-transcript relationships, it is possible to identify families of interacting probesets. More than 10% of the families contain members annotated to different genes or even different Unigene clusters. Within a family, a mixture of genuine biological and artefactual correlations can occur. CONCLUSION: Multiple targeting is not only prevalent, but also significant. The ability of probesets to hybridize to more than one gene product can lead to false positives when analysing gene expression. Comprehensive annotation describing multiple targeting is required when interpreting array data
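
One plausible way to derive "families of interacting probesets" from such a network is to treat probesets that share any matched transcript as connected and take the connected components. The sketch below does this with a small union-find; the input layout, the function name and the identifiers are hypothetical, and the abstract does not say this is how the authors built their families.

```python
from collections import defaultdict


def probeset_families(targets):
    """Group probesets that are linked through shared transcripts.

    targets: dict mapping a probeset id to the transcript ids that its
    probes match (a hypothetical input format for this sketch).
    Returns a sorted list of families, each a sorted list of probeset ids.
    """
    parent = {p: p for p in targets}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # transcript id -> first probeset seen matching it
    for probeset, transcripts in targets.items():
        for transcript in transcripts:
            if transcript in first_seen:
                # Shared transcript: merge the two probesets' families.
                parent[find(probeset)] = find(first_seen[transcript])
            else:
                first_seen[transcript] = probeset

    families = defaultdict(list)
    for probeset in targets:
        families[find(probeset)].append(probeset)
    return sorted(sorted(members) for members in families.values())


# Hypothetical example: ps1 and ps3 never share a transcript directly,
# but both are linked to ps2, so all three fall into one family.
example = {"ps1": ["tA"], "ps2": ["tA", "tB"], "ps3": ["tB"], "ps4": ["tC"]}
print(probeset_families(example))
```

Families with more than one member are the candidates for the artefactual correlations the abstract describes.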

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identification of β-catenin binding regions in colon cancer cells using ChIP-Seq

Author: Andersson
Bailey
Blahnik
Bordonaro
Chinenov
Clevers
Daniel Bottomly
Davis
Dekker
Eferl
Farnham
Fearon
Fullwood
Gan
Gregory S. Yochum
Groden
Hart
Hasselblatt
Hatzis
He
Hedgepeth
Heintzman
Ji
Ji
Jothi
Kharchenko
Kimbrel
Kinzler
Klein
Korinek
Lieberman-Aiden
Miele
Morin
Mosimann
Nateri
Park
Pepke
Polakis
Pomerantz
Pruitt
Rubinfeld
Sancho
Sancho
Schmidt
Shannon K. McWeeney
Sydney L. Kyler
Toualbi
Tuupanen
Wang
Wright
Yochum
Yochum
Yochum
Yochum
Yu
Publication venue: Oxford University Press
Publication date

Deregulation of the Wnt/β-catenin signaling pathway is a hallmark of colon cancer. Mutations in the adenomatous polyposis coli (APC) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation. Cells harboring mutant APC contain elevated levels of the β-catenin transcription coactivator in the nucleus which leads to abnormal expression of genes controlled by β-catenin/T-cell factor 4 (TCF4) complexes. Here, we use chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) to identify β-catenin binding regions in HCT116 human colon cancer cells. We localized 2168 β-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms. Motif discovery algorithms found a core TCF4 motif (T/A–T/A–C–A–A–A–G), an extended TCF4 motif (A/T/G–C/G–T/A–T/A–C–A–A–A–G) and an AP-1 motif (T–G–A–C/T–T–C–A) to be significantly represented in β-catenin enriched regions. Furthermore, 417 regions contained both TCF4 and AP-1 motifs. Genes associated with TCF4 and AP-1 motifs bound β-catenin, TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors. Our work provides evidence that Wnt/β-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes
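
The consensus strings quoted in this abstract can be turned directly into patterns for a quick scan of a sequence. The sketch below is simple consensus matching on the forward strand only, which is a much cruder technique than the motif discovery algorithms the abstract refers to; the function names and test sequences are made up for illustration.

```python
import re

# Consensus motifs as written in the abstract, expressed as regular expressions.
# The lookahead form (?=(...)) lets overlapping occurrences be reported.
CORE_TCF4 = re.compile(r"(?=([TA][TA]CAAAG))")  # T/A-T/A-C-A-A-A-G
AP1 = re.compile(r"(?=(TGA[CT]TCA))")           # T-G-A-C/T-T-C-A


def find_motif(pattern, seq):
    """Return (start, match) pairs for a motif on the forward strand."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def has_both(seq):
    """True if the sequence carries both a core TCF4 site and an AP-1 site."""
    return bool(find_motif(CORE_TCF4, seq)) and bool(find_motif(AP1, seq))


# Hypothetical sequences, not taken from the paper.
print(find_motif(CORE_TCF4, "GGATCAAAGCC"))
print(has_both("ATCAAAGNNNTGATTCA"))
```

A real analysis would also scan the reverse complement and score matches against a position weight matrix rather than an exact consensus.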

Crossref

PubMed Central

‘Genome design’ model and multicellular complexity: golden middle

Author: Aguilera
Alexander E. Vinogradov
Alon
Barrera
Bateman
Castillo-Davis
Chen
Chen
Claverie
Cohen-Gihon
Comeron
Crow
Drummond
Eisenberg
Fan
Fedorova
Jongeneel
Joshi-Tope
Kanehisa
Kashtan
Keightley
Kudla
Le Hir
Li
Lin
Majewski
Mattick
Mattick
Mulder
Nott
Ogata
Orengo
Pang
Powell
Pozzoli
Pruitt
Remenyi
Romero
Russell
Shannon
Sironi
Storey
Su
Subramanian
Suzuki
Szathmary
The Gene Ontology Consortium
Urrutia
Vinogradov
Vinogradov
Vinogradov
Vinogradov
Vinogradov
Vinogradov
Vinogradov
Vogel
von Mering
Wheeler
Yu
Zaslaver
Zhang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2006

Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). Competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be the most complex. In the genomewide context, they are found here to have the higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories through transcription factor regulatory sets and protein functional domains to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have the higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the ‘genome design’ model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes

CiteSeerX

Crossref

PubMed Central

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Parra
G Tamazian
GH Yue
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailo-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Y Sato
Y Sato
Z Lai
Ø Hammer
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2016

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics
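
The contig and scaffold N50 figures quoted here follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation, with a hypothetical function name and made-up lengths:

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and return the length at which
    the cumulative sum first reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up lengths: total is 20, and 6 + 5 = 11 is the first sum to reach 10.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 is computed against the assembled total, not the true genome size, two assemblies of the same genome are only directly comparable when their total lengths are similar.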

Public Library of Science (PLOS)

ResearchOnline@JCU

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe

Comprehensive splice-site analysis using comparative genomics

Author: Abril
Adrian R. Krainer
Ast
Barbosa-Morais
Brow
Brunak
Burge
Burset
Burset
Carmel
Chanfreau
Chong
Cocquet
Collins
Deirdre
Dietrich
Dietrich
Felsenstein
Frilander
Frilander
Hall
Hastings
Hastings
Hastings
Hertz
Hollins
Jackson
Kent
Kullback
Lallena
Lander
Latijnhouwers
Lee
Levine
Lim
Matlin
McConnell
Merendino
Michelle L. Hastings
Minovitsky
Montzka
Moore
Mount
Mount
Nihar Sheth
Otake
Parker
Patel
Patel
Pollard
Pruitt
Ravi Sachidanandam
Reed
Roca
Roca
Schneider
Schneider
Schneider
Senapathy
Shannon
Shapiro
Sharp
Smith
Sparks
Staley
Stephens
Tarn
Tarn
Ted Roeder
Thanaraj
Wassarman
Will
Wu
Wu
Wu
Xavier Roca
Yeo
Yeung
Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2006

We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT–AG and GC–AG and U12-type GT–AG and AT–AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT–AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3′ splice sites (3′ss) and (iv) distinct evolutionary histories of 5′ and 3′ss. Our study proves that the identification of broad patterns in naturally-occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing

CiteSeerX

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

DR-NTU (Digital Repository of NTU)