Search CORE

8 research outputs found

An Improved Data Structure for Left-Right Maximal Generic Words Problem

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Publication date: 01/01/2019

For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in fewer than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and any positive integer d, all the superstrings of p that are d-left-right maximal can be reported quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of the documents in D, m is the number of documents in D, and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r log n + o log^2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in fewer than d documents.
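The definition above can be illustrated with a brute-force check. This is only a minimal sketch for tiny inputs, not the data structure from the paper; the function names and the sample documents are hypothetical. It relies on the observation that every proper superstring of w contains a one-character left or right extension of w, so testing those extensions suffices.

```python
def document_frequency(w, docs):
    """Number of documents in which w occurs as a substring."""
    return sum(1 for doc in docs if w in doc)


def is_d_left_right_maximal(w, d, docs, alphabet):
    """Check conditions (1) and (2) of the definition by brute force."""
    if document_frequency(w, docs) < d:
        return False
    # Any proper superstring of w contains c+w or w+c for some character c,
    # and document frequency never increases when a string is extended.
    for c in alphabet:
        if document_frequency(c + w, docs) >= d:
            return False
        if document_frequency(w + c, docs) >= d:
            return False
    return True


def query(p, d, docs):
    """All d-left-right maximal strings that contain p (naive enumeration)."""
    alphabet = sorted(set("".join(docs)))
    candidates = {
        doc[i:j]
        for doc in docs
        for i in range(len(doc))
        for j in range(i + 1, len(doc) + 1)
    }
    return sorted(
        w for w in candidates
        if p in w and is_d_left_right_maximal(w, d, docs, alphabet)
    )


docs = ["abab", "bab", "cab"]   # hypothetical sample documents
print(query("a", 2, docs))      # ['bab']
print(query("a", 3, docs))      # ['ab']
```

The enumeration is polynomial in the total document length and is meant only to make the definition concrete.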

Dagstuhl Research Online Publication Server

Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Algorithms and Computation (ISAAC 2017)
Publication date: 01/01/2017

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also present an algorithm for computing all maximal repetitions in O(m alpha(m)) time and O(m) space, where alpha denotes the inverse Ackermann function.
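The m+k-1 bound can be checked on small examples with a naive enumeration. The sketch below is an assumption-laden illustration, not the O(m alpha(m)) algorithm from the paper: it works on the uncompressed string, and the function names are hypothetical. A maximal repetition is taken here to be an interval that has some period p, has length at least 2p, and cannot be extended in either direction while keeping that period.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the RLE of s as a list of (character, exponent) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def maximal_repetitions(s):
    """Map each maximal repetition (start, end), inclusive, to its smallest period."""
    n = len(s)
    runs = {}
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p - 1  # s[start..end] has period p and is not extendable
            if end - start + 1 >= 2 * p:
                # Periods are tried in increasing order, so the first one
                # recorded for an interval is its smallest period.
                runs.setdefault((start, end), p)
    return runs


s = "aabaabaab"
rle = run_length_encode(s)
m = len(rle)                               # size of the run-length encoding
k = sum(1 for _, e in rle if e >= 2)       # factors with exponent >= 2
runs = maximal_repetitions(s)
print(len(runs), m + k - 1)                # 4 8
```

The enumeration takes quadratic time in the uncompressed length, so it is useful only for sanity checks on short strings.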

Dagstuhl Research Online Publication Server

Faster STR-IC-LCS Computation via RLE

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Kuboi Keita
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always at least as fast as Deorowicz's algorithm, and is faster when the input strings are compressible via RLE.
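The problem statement can be made concrete with a simple dynamic-programming baseline on uncompressed strings. This is a sketch under stated assumptions, not the RLE-based algorithm of the paper and not a faithful reproduction of Deorowicz's implementation: it combines prefix and suffix LCS tables with the shortest windows of A and B that contain C as a subsequence. All function names are hypothetical, and the window search here is naive.

```python
def lcs_table(a, b):
    """T[i][j] = length of an LCS of a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def minimal_windows(s, c):
    """For each start i with s[i] == c[0], the shortest window s[i:e]
    that contains c as a subsequence, found greedily. Returns (i, e) pairs."""
    windows = []
    for i in range(len(s)):
        if s[i] != c[0]:
            continue
        pos, matched = i, 0
        while pos < len(s) and matched < len(c):
            if s[pos] == c[matched]:
                matched += 1
            pos += 1
        if matched == len(c):
            windows.append((i, pos))
    return windows


def str_ic_lcs_length(a, b, c):
    """Length of a longest common subsequence of a and b that contains
    the non-empty string c as a substring, or -1 if none exists."""
    prefix = lcs_table(a, b)
    suffix = lcs_table(a[::-1], b[::-1])  # suffix[x][y]: LCS of last x, last y chars
    best = -1
    for i, end_a in minimal_windows(a, c):
        for j, end_b in minimal_windows(b, c):
            tail = suffix[len(a) - end_a][len(b) - end_b]
            best = max(best, prefix[i][j] + len(c) + tail)
    return best


print(str_ic_lcs_length("abcab", "acbab", "ab"))  # 4
print(str_ic_lcs_length("abcab", "acbab", "cc"))  # -1
```

Using the greedy shortest window is safe because any embedding of C that starts at the same position ends no earlier, which can only shrink the suffix available for the remaining LCS.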

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Takeda Masayuki
Tsujimaru Yuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)
Publication date: 01/01/2016

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
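To show what a DAWG looks like in practice, here is a minimal sketch of the classic online (suffix automaton) construction with suffix links. It is not the suffix-tree-based linear-time algorithm for integer alphabets that the paper presents; it uses hash maps for transitions, and the class and function names are hypothetical.

```python
class State:
    __slots__ = ("next", "link", "length")

    def __init__(self, length=0):
        self.next = {}      # outgoing transitions: character -> state index
        self.link = -1      # suffix link
        self.length = length  # length of the longest string reaching this state


def build_dawg(y):
    """Build the DAWG of y; return (states, accepting state indices)."""
    states = [State()]
    last = 0
    for ch in y:
        cur = len(states)
        states.append(State(states[last].length + 1))
        p = last
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone keeps the shorter strings.
                clone = len(states)
                states.append(State(states[p].length + 1))
                states[clone].next = dict(states[q].next)
                states[clone].link = states[q].link
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # States on the suffix-link chain from the last state accept suffixes of y.
    accepting = set()
    p = last
    while p != -1:
        accepting.add(p)
        p = states[p].link
    return states, accepting


def accepts(states, accepting, w):
    """True if w is a suffix of the string the DAWG was built from."""
    s = 0
    for ch in w:
        if ch not in states[s].next:
            return False
        s = states[s].next[ch]
    return s in accepting


states, accepting = build_dawg("abcbc")
print(accepts(states, accepting, "bc"))   # True: "bc" is a suffix
print(accepts(states, accepting, "cb"))   # False: a substring but not a suffix
```

Every substring of y labels a path from the initial state, which is why the same structure also serves as a substring index.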

Dagstuhl Research Online Publication Server

Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Takeda Masayuki
Tsujimaru Yuki
Publication date: 03/07/2023

The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA which recognizes all suffixes of y with only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y, in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string of y in O(n) time. Then, we present our algorithm that builds the DAWG for y in O(n) time for integer alphabets, from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs in linear time in the case of integer alphabets. As a further application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Comment: This is an extended version of the paper "Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets" from MFCS 201
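For readers unfamiliar with minimal absent words, the following brute-force sketch may help. It assumes the usual definition (a word that does not occur in y although all of its proper substrings do) and is not the DAWG-based O(n + |MAW(y)|) algorithm from the paper; the function name and the sample input are hypothetical.

```python
def minimal_absent_words(y, alphabet):
    """Return all minimal absent words of y over the given alphabet, sorted."""
    # All substrings of y, including the empty string.
    present = {y[i:j] for i in range(len(y)) for j in range(i, len(y) + 1)}
    present.add("")
    maws = set()
    # A single character is a MAW if it never occurs in y.
    for a in alphabet:
        if a not in present:
            maws.add(a)
    # A longer word a+u+b is a MAW if a+u and u+b occur but a+u+b does not;
    # every proper substring of a+u+b is then a substring of a+u or u+b.
    for u in present:
        for a in alphabet:
            if a + u not in present:
                continue
            for b in alphabet:
                if u + b in present and a + u + b not in present:
                    maws.add(a + u + b)
    return sorted(maws)


print(minimal_absent_words("abaab", "ab"))  # ['aaa', 'aaba', 'bab', 'bb']
```

The enumeration of all substrings makes this cubic or worse in the length of y, so it is suitable only for checking small cases by hand.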

arXiv.org e-Print Archive

Explainable and Local Correction of Classification Models Using Decision Trees

Author: Fujishige Yuta
Goto Keisuke
Hara Satoshi
Iwashita Hiroaki
Suzuki Hirofumi
Takagi Takuya
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 28/06/2022

In practical machine learning, models are frequently updated, or corrected, to adapt to new datasets. In this study, we pose two challenges to model correction. First, the effects of corrections on the end-users need to be described explicitly, as in standard software, where corrections are described in release notes. Second, the amount of corrections needs to be small so that the corrected models perform similarly to the old models. In this study, we propose the first model correction method for classification models that resolves these two challenges. Our idea is to use an additional decision tree to correct the output of the old models. Thanks to the explainability of decision trees, the corrections are describable to the end-users, which resolves the first challenge. We resolve the second challenge by incorporating the amount of corrections when training the additional decision tree so that the effects of corrections are small. Experiments on real data confirm the effectiveness of the proposed method compared to existing correction methods.

Association for the Advancement of Artificial Intelligence: AAAI Publications

The R2R3-MYB transcription factor MiMYB1 regulates light dependent red coloration of ‘Irwin’ mango fruit skin

Author: Albert
Allan
An
Asuka Ichihi
Azuma
Bajpai
Ban
Berardini
Buganic
Chagne
Chonhenchob
Coelho
Dorta
Dubos
Espley
Feller
Feng
Gesell
Gonzalez
Grotewold
Hellens
Heppel
Hoang
Hofman
Holton
Honda
Hudina
Jia
Ju
Kanzaki
Karanjalker
Kobayashi
Koeda
Koes
Kosuke Shimizu
Koyama
Kui
Kumar
Lai
Li
Liu
Lopez-Cobo
Matus
Moriya
Niu
Palapol
Ramsay
Ravaglia
Saito
Senghor
Shi
Shiina Fujishige
Shinya Kanzaki
Sivankalyani
Sooriyapathirana
Sota Koeda
Stracke
Sudheeran
Takos
Tanaka
Umemura
Viola
Walker
Wan
Wang
Wei
Wu
Yamagishi
Yuta Tanaka
Zhang
Zhu
Zoratti
Publication venue: Elsevier BV

Crossref