Correlation between amino acid residues converted by RNA editing and functional residues in protein three-dimensional structures in plant organelles

Author: Go Mitiko
Yura Kei
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: In plant organelles, specific messenger RNAs (mRNAs) are subjected to conversion editing, a process that often converts the first or second nucleotide of a codon and hence the encoded amino acid. No systematic patterns in converted sites were found on mRNAs, and the converted sites rarely encoded residues located at the active sites of proteins. The role and origin of RNA editing in plant organelles remain to be elucidated. Results: Here we study the relationship between amino acid residues encoded by edited codons and the structural characteristics of these residues within proteins, e.g., in protein-protein interfaces, elements of secondary structure, or protein structural cores. We find that the residues encoded by edited codons are significantly biased toward involvement in helices and protein structural cores. RNA editing can convert codons for hydrophilic to hydrophobic amino acids. Hence, only the edited form of an mRNA can be translated into a polypeptide with helix-preferring and core-forming residues at the appropriate positions, which is often required for a protein to form a functional three-dimensional (3D) structure. Conclusion: We have performed a novel analysis of the location of residues affected by RNA editing in proteins in plant organelles. This study documents that RNA editing sites are often found in positions important for 3D structure formation. Without RNA editing, protein folding will not occur properly, thus affecting gene expression. We suggest that RNA editing may have conferred an evolutionary advantage by acting as a mechanism to reduce susceptibility to DNA damage by allowing the increase in GC content in DNA while maintaining RNA codons essential to encode residues required for protein folding and activity.
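The codon-level effect described in this abstract can be illustrated with a minimal sketch. It assumes cytidine-to-uridine conversion (the editing type named for land plant organelles in the RESOPS entry on this page) and the standard genetic code. The function name `edit_c_to_u` and the two example codons are hypothetical choices for illustration, not taken from the paper; the hydropathy numbers are Kyte-Doolittle values, where higher means more hydrophobic.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values for the residues used in this example only.
HYDROPATHY = {"S": -0.8, "P": -1.6, "L": 3.8}


def edit_c_to_u(codon: str, position: int) -> str:
    """Return the codon with the C at the given 0-based position converted to U."""
    if codon[position] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:position] + "U" + codon[position + 1:]


# Hypothetical example codons: edit the second nucleotide of each.
for codon in ("UCA", "CCU"):
    edited = edit_c_to_u(codon, 1)
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    print(f"{codon} ({before}, {HYDROPATHY[before]}) -> "
          f"{edited} ({after}, {HYDROPATHY[after]})")
```

In both examples a second-position edit replaces a less hydrophobic residue (serine or proline) with leucine, which is consistent with the hydrophilic-to-hydrophobic shift the abstract reports.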

Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility

Author: Go Mitiko
Hijikata Atsushi
Noguti Tosiyuki
Yura Kei
Publication venue: Wiley Subscription Services, Inc., A Wiley Company

In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. Improvements in the gap penalty have not been a major focus, however, following the development of the affine gap penalty and of the secondary structure dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated to an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved on a pair of sequences with identity in the “twilight zone” between 20 and 40%. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at http://cib.cf.ocha.ac.jp/target_protein/. Proteins 2011; © 2011 Wiley-Liss, Inc
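The idea of an accessibility-dependent gap penalty can be sketched as follows. This is not the ALAdeGAP formula, which the abstract does not give. It only assumes that gap frequency grows exponentially with relative solvent accessibility, as the abstract states, and turns that frequency into a penalty by taking the negative logarithm. The constants `F0` and `K` and both function names are hypothetical.

```python
import math

# Hypothetical parameters, chosen only for illustration.
F0 = 0.01  # assumed gap frequency at zero relative accessibility
K = 3.0    # assumed exponential growth rate of gap frequency


def gap_frequency(accessibility: float) -> float:
    """Assumed model: f(a) = F0 * exp(K * a) for relative accessibility a in [0, 1]."""
    if not 0.0 <= accessibility <= 1.0:
        raise ValueError("relative accessibility must lie in [0, 1]")
    return F0 * math.exp(K * accessibility)


def gap_open_penalty(accessibility: float) -> float:
    """Negative log-frequency: gaps at buried positions cost more than at exposed ones."""
    return -math.log(gap_frequency(accessibility))


for a in (0.0, 0.5, 1.0):
    print(f"accessibility={a:.1f}  penalty={gap_open_penalty(a):.3f}")
```

Under this assumed model the penalty falls linearly with accessibility, so an aligner using it would tend to place gaps at exposed residues rather than in the buried core.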

RESOPS: A Database for Analyzing the Correspondence of RNA Editing Sites to Protein Three-Dimensional Structures

Author: Altschul
Benson
Berman
Bock
Bock
Bonnard
Cai
Chateigner-Boutin
Chen
Cochrane
Covello
Covello
Creighton
Cummings
Du
Du
Freyer
Giege
Go
Gott
Gray
Hiesel
Hoch
Keegan
Kei Yura
Kim
Kotera
Kozaki
Kugita
Loladze
Masafumi Shionyu
Mitiko Go
Mower
Phreaner
Powell
Robbins
Sasaki
Shrake
Sintawee Sulaiman
Sugawara
Thompson
Vos
Wakasugi
Yoshinaga
Yosuke Hatta
Yu
Yura
Yura
Zehrmann
Zhou
Zito
Publication venue: Oxford University Press

Transcripts from mitochondrial and chloroplast DNA of land plants often undergo cytidine to uridine conversion-type RNA editing events. RESOPS is a newly built database that specializes in displaying RNA editing sites of land plant organelles on protein three-dimensional (3D) structures to help elucidate the mechanisms of RNA editing for gene expression regulation. RESOPS contains the following information: unedited and edited cDNA sequences with notes for the target nucleotides of RNA editing, conceptual translation from the edited cDNA sequence in pseudo-UniProt format, a list of proteins under the influence of RNA editing, multiple amino acid sequence alignments of edited proteins, the location of amino acid residues coded by codons under the influence of RNA editing in protein 3D structures, and the statistics of biased distributions of the edited residues with respect to protein structures. Most of the data processing procedures are automated; hence, it is easy to keep abreast of updated genome and protein 3D structural data. In the RESOPS database, we clarified that the locations of residues switched by RNA editing are significantly biased to protein structural cores. The integration of different types of data in the database also helps advance the understanding of RNA editing mechanisms. RESOPS is accessible at http://cib.cf.ocha.ac.jp/RNAEDITING/

Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

Author: Akiva
Burset
Carninci
Chie Motono
Croft
Danielle Thierry-Mieg
Ewing
Ewing
Faber
Fairbrother
Fairbrother
Gilbert
Hide
Hiroko Hata
Imanishi
Jean Thierry-Mieg
Jun-ichi Takeda
Kanako O. Koyanagi
Karin
Kei Yura
Keiichi Nagai
Kim
Kimura
Kochiwa
Ladd
Lander
Landry
Lee
Lejeune
Lev-Maor
Lihua Jin
Lopez
Magrangeas
Masafumi Shionyu
Mitiko Go
Mitsuteru Nakao
Modrek
Modrek
Modrek
Nakao
Nakao
Nobuo Nomura
Ota
Oyama
Peters
Roberto A. Barrero
Scharf
Schmucker
Smith
Stamm
Stefan Wiemann
Strausberg
Sumio Sugano
Tadashi Imanishi
Takao Isogai
Takashi Gojobori
Tetsuji Otsuki
Vladimir Kuryshev
Wiemann
Wiemann
Will
Wojtowicz
Xing
Yeo
Yutaka Suzuki
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2006

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or to have overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants.

Coverage of whole proteome by structural genomics observed through protein homology modeling database

Author: A Andreeva
A Krogh
A McPherson
A Stark
A Yamaguchi
AE Todd
Akihiro Yamaguchi
B Contreras-Moreira
B Contreras-Moreira
B John
C Chothia
CA Orengo
CJ Oldfield
D Baker
D Petrey
D Vitkup
DD Leipe
E Dobrovetsky
Editorial Board
EV Koonin
FS Domingues
G Liu
HJ Dyson
HM Berman
IM Wallace
J Kopp
J Kyte
J-M Chandonia
K Kinoshita
K Lundstrom
Kei Yura
L Lo Conte
L Stein
L Xie
LJ DeLucas
M Iwadate
M Ota
MA Marti-Renom
Mitiko Go
MJ Sippl
MO Dayhoff
N O’Toole
O Lichtarge
P Walian
R Linding
R Service
RA Laskowski
RA Laskowski
RF Doolittle
RR Copley
S Goldsmith-Fischman
S Tsoka
S Yokoyama
S-H Kim
S-H Kim
SE Brenner
SJ Campbell
SJ Wodak
SK Burley
SK Burley
T Hirokawa
T Kawabata
U Pieper
V Serre
Y Kyogoku
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/2006

We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be an upcoming issue of structural genomics.
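The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative.

```python
# Genome counts by group, as quoted in the abstract.
genomes = {"archaebacterial": 17, "eubacterial": 130, "eukaryotic": 18, "phage": 111}

modeled_orfs = 368_724    # ORFs with a modeled 3D structure
predicted_orfs = 734_193  # ORFs predicted across all genomes

total_genomes = sum(genomes.values())
coverage = modeled_orfs / predicted_orfs

print(total_genomes)       # 276
print(f"{coverage:.1%}")   # 50.2%
```

The group counts sum to the stated 276 genomes, and the ratio of modeled to predicted ORFs matches the "approximately 50%" given in the text.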

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Author: Amid Clara
Apweiler Rolf
Ashurst Jennifer
Auffray Charles
Barrero Roberto A
Bellgard Matthew
Bonaldo Maria de Fatima
Bono Hidemasa
Bromberg Susan K
Brookes Anthony J
Bruford Elspeth
Carninci Piero
Chakraborty Ranajit
Chelala Claude
Chen Zhu
Couillault Christine
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Go Mitiko
Gojobori Takashi
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hayashizaki Yoshihide
Hide Winston
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Isogai Takao
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kanehisa Minoru
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makalowski Wojciech
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakai Kenta
Nakao Mitsuteru
Nigam Rajni
Nishikawa Ken
Nishikawa Tetsuo
Nomura Nobuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Oishi Michio
Okada Norihiro
Okazaki Yasushi
Okido Toshihisa
Okubo Kousaku
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Quackenbush John
R. Gopinath Gopal
Ren Shuang-Xi
Richard Roberts
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakaki Yoshiyuki
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Shimizu Nobuyoshi
Shimoyama Mary
Simpson Andrew J
Soares Bento
Souza Sandro J. de
Steward Charles
Stodolsky Marvin
Strausberg Robert L
Sugano Sumio
Sugawara Hideaki
Suwa Makiko
Suzuki Mami
Suzuki Yoshiyuki
Suzuki Yutaka
Takagi Toshihisa
Takahashi Aiko
Takeda Jun-ichi
Tamiya Gen
Tamura Takuro
Tanaka Hiroshi
Tanaka Susumu
Tanino Motohiko
Tateno Yoshio
Taylor Todd
Terwilliger Joseph D
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A
Tonellato Peter
Unneberg Per
Veeramachaneni Vamsi
Wagner Lukas
Watanabe Shinya
Wiemann Stefan
Wilming Laurens
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Norikazu
Yasuda Tomohiro
Yoo Hyang-Sook
Yura Kei
Publication venue: Public Library of Science
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
