    Compressed Subsequence Matching and Packed Tree Coloring

    We present a new algorithm for subsequence matching in grammar-compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.
    Comment: To appear at CPM '1
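
    The "next occurrence of a given character after a given position" primitive can be illustrated on a plain, uncompressed string. The following is a minimal sketch under that assumption only: it is not the grammar-based structure of the abstract, it uses a table of roughly $N\sigma$ entries rather than the stated space bound, and the names `build_next_occurrence` and `match_from` are hypothetical.

```python
def build_next_occurrence(text):
    """Return a table nxt where nxt[i][c] is the smallest j >= i with text[j] == c.

    A character with no occurrence at or after i is simply absent from nxt[i].
    """
    n = len(text)
    nxt = [dict() for _ in range(n + 1)]
    # Fill from right to left: position i inherits the answers of i + 1
    # and then overrides the entry for its own character.
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][text[i]] = i
    return nxt


def match_from(nxt, pattern, start):
    """Greedily embed pattern as a subsequence starting at or after start.

    Returns the position just past the last matched character,
    or None if the pattern is not a subsequence of the remaining text.
    """
    pos = start
    for ch in pattern:
        j = nxt[pos].get(ch)
        if j is None:
            return None
        pos = j + 1
    return pos


if __name__ == "__main__":
    table = build_next_occurrence("abracadabra")
    print(match_from(table, "aca", 0))
    print(match_from(table, "abz", 0))
```

    Each pattern character costs one table lookup here; the abstract's contribution, by contrast, is answering the same kind of query directly on the grammar.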

    Intra-plant variability in seed size and seed quality in Lupinus albus L.

    The origin of variation in seed size and seed quality was studied in 3 genotypes with different architectures (indeterminate, dwarf, determinate) of autumn-sown white lupin under a range of cropping conditions. The environmental conditions (year and density) and the origin of the seed (genotype and pod level) influenced the mean seed weight. The different pod levels corresponded to the pods produced on the mainstem, the primary branches and the secondary branches. The year significantly influenced the mean seed weight of all the genotypes, whatever the density and the pod level. The reduction of the density from 20 to 12 plants/m² had no effect on the yield and slightly reduced the mean seed weight. For a given genotype, the pod level that produced the greatest number of seeds also produced the largest seeds. The major origin of the variation of the individual seed mass was the within-pod-level variation, as also reported in other species with a determinate or indeterminate growth habit. The autumn-sown white lupin genotype with a determinate architecture showed a reduced within-pod-level variation in individual seed size. The consequences of the genotype and the husbandry techniques on the quality of the seed were analysed. The protein content was mainly influenced by the genotype and was not affected by the density or the pod level. The oil content was highest in seeds from the highest pod level. The possible origin of this feature is discussed. The oil content was also influenced by the genotype but the stand density had no effect. Within-inflorescence variation in seed size had little effect on the oil content and none on the protein content.

    Substring Range Reporting

    We revisit various string indexing problems with range reporting features, namely, position-restricted substring searching, indexing substrings with gaps, and indexing substrings with intervals. We obtain the following main results. We give efficient reductions for each of the above problems to a new problem, which we call substring range reporting. Hence, we unify the previous work by showing that we may restrict our attention to a single problem rather than studying each of the above problems individually. We show how to solve substring range reporting with optimal query time and little space. Combined with our reductions this leads to significantly improved time-space trade-offs for the above problems. In particular, for each problem we obtain the first solutions with optimal query time and $O(n\log^{O(1)} n)$ space, where $n$ is the length of the indexed string. We show that our techniques for substring range reporting generalize to substring range counting and substring range emptiness variants. We also obtain non-trivial time-space trade-offs for these problems. Our bounds for substring range reporting are based on a novel combination of suffix trees and range reporting data structures. The reductions are simple and general and may apply to other combinations of string indexing with range reporting.
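
    As a concrete instance of one of the problems named above, position-restricted substring searching asks for the occurrences of a pattern whose starting positions fall in a given range. The sketch below is a naive baseline, not the method of the abstract: it assumes a suffix array built by plain sorting in place of a suffix tree, and it filters positions linearly in place of a range reporting structure. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return the starting positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def position_restricted_search(text, sa, pattern, lo, hi):
    """Report occurrences of pattern in text that start in the range [lo, hi].

    Step 1 finds the block of suffixes that begin with pattern.
    Step 2 keeps only the positions inside the query range.
    """
    m = len(pattern)
    # Truncating lexicographically sorted suffixes to length m keeps them sorted,
    # so binary search applies. Rebuilding this list per query is deliberately naive.
    prefixes = [text[i:i + m] for i in sa]
    left = bisect_left(prefixes, pattern)
    right = bisect_right(prefixes, pattern)
    return sorted(p for p in sa[left:right] if lo <= p <= hi)


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(position_restricted_search(text, sa, "ana", 0, 2))
    print(position_restricted_search(text, sa, "ana", 0, 5))
```

    The two steps mirror the combination described in the abstract: a string index narrows the candidates to one contiguous block, and a range query over positions selects the answers.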