
105 research outputs found

Efficient learning of context-free grammars from positive structural examples

Author: Sakakibara Yasubumi
Publication venue: Published by Elsevier Inc.
Publication date: 31/03/1992

Abstract: In this paper, we introduce a new normal form for context-free grammars, called reversible context-free grammars, for the problem of learning context-free grammars from positive-only examples. A context-free grammar G = (N, Σ, P, S) is said to be reversible if (1) A → α and B → α in P implies A = B and (2) A → αBβ and A → αCβ in P implies B = C. We show that the class of reversible context-free grammars can be identified in the limit from positive samples of structural descriptions and that there exists an efficient algorithm to identify them from positive samples of structural descriptions, where a structural description of a context-free grammar is an unlabelled derivation tree of the grammar. This implies that if positive structural examples of a reversible context-free grammar for the target language are available to the learning algorithm, the full class of context-free languages can be learned efficiently from positive samples.
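
The two reversibility conditions in the abstract above can be checked mechanically. The following is a minimal sketch, not the paper's learning algorithm: the function name `is_reversible` and the grammar encoding (productions as `(lhs, rhs)` pairs with `rhs` a tuple of symbols) are assumptions made for illustration, and condition (2) is read with B and C ranging over nonterminals.

```python
from itertools import combinations


def is_reversible(productions, nonterminals):
    """Check the two reversibility conditions on a context-free grammar.

    productions: iterable of (lhs, rhs) pairs, rhs being a sequence of symbols.
    nonterminals: set of nonterminal symbols (assumed encoding, not from the paper).
    """
    prods = {(lhs, tuple(rhs)) for lhs, rhs in productions}

    # Condition (1): A -> alpha and B -> alpha implies A = B,
    # i.e. no right-hand side is shared by two different nonterminals.
    owner = {}
    for lhs, rhs in prods:
        if owner.setdefault(rhs, lhs) != lhs:
            return False

    # Condition (2): A -> alpha B beta and A -> alpha C beta implies B = C,
    # i.e. two distinct productions with the same left-hand side must not
    # differ in exactly one position that holds a nonterminal in both.
    for (a, r1), (b, r2) in combinations(prods, 2):
        if a != b or len(r1) != len(r2):
            continue
        diff = [i for i in range(len(r1)) if r1[i] != r2[i]]
        if len(diff) == 1:
            i = diff[0]
            if r1[i] in nonterminals and r2[i] in nonterminals:
                return False
    return True


# S -> a S b | a b satisfies both conditions.
balanced = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
# S -> a A | a B violates condition (2); A -> a and B -> a violate condition (1).
clash_rule2 = [("S", ("a", "A")), ("S", ("a", "B")), ("A", ("a",)), ("B", ("b",))]
clash_rule1 = [("S", ("A", "B")), ("A", ("a",)), ("B", ("a",))]
```

Calling `is_reversible(balanced, {"S"})` accepts the first grammar, while the other two are rejected for the reasons given in the comments.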

Elsevier - Publisher Connector

Learning context-free grammars from structural data in polynomial time

Author: Sakakibara Yasubumi
Publication venue: Published by Elsevier B.V.
Publication date: 21/11/1990

Abstract: We consider the problem of learning a context-free grammar from its structural descriptions. Structural descriptions of a context-free grammar are unlabelled derivation trees of the grammar. We present an efficient algorithm for learning context-free grammars using two types of queries: structural equivalence queries and structural membership queries. The learning protocol is based on what is called a “minimally adequate teacher”, and it is shown that a grammar learned by the algorithm is not only a correct grammar, i.e. equivalent to the unknown grammar, but also structurally equivalent to it. Furthermore, the algorithm runs in time polynomial in the number of states of the minimum frontier-to-root tree automaton for the set of structural descriptions of the unknown grammar and the maximum size of any counter-example returned by a structural equivalence query.

Elsevier - Publisher Connector

Robust and accurate prediction of noncoding RNAs from aligned sequences

Author: Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: BioMed Central
Publication date: 15/10/2010

Springer - Publisher Connector

PubMed Central

Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

Author: Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Improved Measurements of RNA Structure Conservation with Generalized Centroid Estimators

Author: Okada Yohei
Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

Identification of non-protein-coding RNAs (ncRNAs) in genomes is a crucial task for not only molecular cell biology but also bioinformatics. Secondary structures of ncRNAs are employed as a key feature of ncRNA analysis since biological functions of ncRNAs are deeply related to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as the most stable structure, MFE alone is not an appropriate measure for identifying ncRNAs since the free energy is heavily biased by the nucleotide composition. Therefore, instead of MFE itself, several alternative measures for identifying ncRNAs have been proposed, such as the structure conservation index (SCI) and the base pair distance (BPD), both of which employ MFE structures. However, these measurements are not suitable for identifying ncRNAs in some cases, including genome-wide search, and incur a high false discovery rate. In this study, we propose improved measurements based on SCI and BPD, applying generalized centroid estimators to incorporate robustness against low quality multiple alignments. Our experiments show that our proposed methods achieve higher accuracy than the original SCI and BPD for not only human-curated structural alignments but also low quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than or comparable with the original SCI on structural alignments generated with RAF, a high quality structural aligner, which requires twice the computational time on average. We conclude that our methods are more suitable than the original SCI and BPD for genome-wide alignments, which are of low quality from the point of view of secondary structures.
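
For orientation, the two baseline measures named in the abstract above can be sketched in a few lines. This is an illustrative sketch of the original MFE-based quantities only, not of the centroid-based variants the paper proposes. It assumes the common definitions: SCI as the consensus folding energy of the alignment divided by the mean MFE of the individual sequences, and base pair distance as the number of base pairs present in exactly one of two structures. The function names are hypothetical, and the energies are assumed to have been computed beforehand by an external folding tool.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Count base pairs that occur in exactly one of the two structures."""
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


def structure_conservation_index(consensus_energy, single_energies):
    """SCI = consensus energy / mean of the single-sequence MFEs.

    Energies are assumed to be precomputed (kcal/mol, typically negative).
    """
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero")
    return consensus_energy / mean_single
```

For example, `base_pair_distance("((..))", "(....)")` counts the one pair that only the first structure contains. An SCI near 1 indicates that the consensus structure is about as stable as the individual folds, and a value near 0 indicates little conserved structure.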

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

On case-based learnability of languages

Author: Globig Christoph
Jantke Klaus P.
Lange Steffen
Sakakibara Yasubumi
Publication venue
Publication date: 17/01/2019

Case-based reasoning is deemed an important technology to alleviate the bottleneck of knowledge acquisition in Artificial Intelligence (AI). In case-based reasoning, knowledge is represented in the form of particular cases with an appropriate similarity measure rather than any form of rules. The case-based reasoning paradigm adopts the view that an AI system is dynamically changing during its life-cycle, which immediately leads to learning considerations. Within the present paper, we investigate the problem of case-based learning of indexable classes of formal languages. Prior to learning considerations, we study the problem of case-based representability and show that every indexable class is case-based representable with respect to a fixed similarity measure. Next, we investigate several models of case-based learning and systematically analyze their strengths as well as their limitations. Finally, the general approach to case-based learnability of indexable classes of formal languages is prototypically applied to so-called containment decision lists, since they seem particularly tailored to case-based knowledge processing.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Qucosa - Publikationsserver der Universität Leipzig

Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

Author: Fujiyama Asao
Hachiya Tsuyoshi
Itaya Mitsuhiro
Nishito Yukari
Osana Yasunori
Popendorf Kris
Sakakibara Yasubumi
Toyoda Atsushi
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. Results We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases.
Conclusions The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Construction of a genetic AND gate under a new standard for assembly of genetic parts

Author: Ayukawa Shotaro
Hagiya Masami
Hamada Shogo
Kiga Daisuke
Kobayashi Akio
Murata Satoshi
Nakashima Yusaku
Sakakibara Yasubumi
Takagi Hidemasa
Uchiyama Masahiko
Yamamura Masayuki
Yugi Katsuyuki
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Appropriate regulation of respective gene expressions is a bottleneck for the realization of artificial biological systems inside living cells. The modification of several promoter sequences is required to achieve appropriate regulation of the systems. However, a time-consuming process is required for the insertion of an operator, a binding site of a protein for gene expression, to the gene regulatory region of a plasmid. Thus, a standardized method for integrating operator sequences to the regulatory region of a plasmid is required. Results We developed a standardized method for integrating operator sequences to the regulatory region of a plasmid and constructed a synthetic promoter that functions as a genetic AND gate. By standardizing the regulatory region of a plasmid and the operator parts, we established a platform for modular assembly of the operator parts. Moreover, by assembling two different operator parts on the regulatory region, we constructed a regulatory device with an AND gate function. Conclusions We implemented a new standard to assemble operator parts for construction of functional genetic logic gates. The logic gates at the molecular scale have important implications for reprogramming cellular behavior.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Dugesia ryukyuensis Database as a Molecular Resource for Studying Switching of the Reproductive System

Author: Agata Kiyokazu
Hirao Yukako
Hoshi Motonori
Ishizuka Hideyuki
Kawauchi Junpei
Kobayashi Kazuya
Maezawa Takanobu
Matsumoto Midori
Nakagawa Haruka
Nishimura Osamu
Nodono Hanae
Sakakibara Yasubumi
Sekii Kiyono
Tarui Hiroshi
Tasaka Kenta
Publication venue: 'Zoological Society of Japan'
Publication date: 01/01/2007

The planarian Dugesia ryukyuensis reproduces both asexually and sexually, and can switch from one mode of reproduction to the other. We recently developed a method for experimentally switching reproduction of the planarian from the asexual to the sexual mode. We constructed a cDNA library from sexualized D. ryukyuensis and sequenced and analyzed 8,988 expressed sequence tags (ESTs). The ESTs were analyzed and grouped into 3,077 non-redundant sequences, leaving 1,929 singletons that formed the basis of unigene sets. Fifty-six percent of the cDNAs analyzed shared similarity (E-value < 1E-20) with sequences deposited in NCBI. Highly redundant sequences encoded granulin and actin, which are expressed in the whole body, and other redundant sequences encoded a Vasa-like protein, which is known to be a component of germ-line cells and is expressed in the ovary, and Y-protein, which is expressed in the testis. The sexualized planarian expressed sequence tag database (http://planaria.bio.keio.ac.jp/planaria/) is an open-access, online resource providing access to sequence, classification, clustering, and annotation data. This database should constitute a powerful tool for analyzing sexualization in planarians.

Kyoto University Research Information Repository

Directed acyclic graph kernels for structural RNA analysis

Author: Kengo Sato
Kiyoshi Asai
Toutai Mituyama
Yasubumi Sakakibara
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central