    String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure

    Burrows-Wheeler transform (BWT) is an invertible text transformation that, given a text $T$ of length $n$, permutes its symbols according to the lexicographic order of suffixes of $T$. BWT is one of the most heavily studied algorithms in data compression with numerous applications in indexing, sequence analysis, and bioinformatics. Its construction is a bottleneck in many scenarios, and settling the complexity of this task is one of the most important unsolved problems in sequence analysis that has remained open for 25 years. Given a binary string of length $n$, occupying $O(n/\log n)$ machine words, the BWT construction algorithm due to Hon et al. (SIAM J. Comput., 2009) runs in $O(n)$ time and $O(n/\log n)$ space. Recent advancements (Belazzougui, STOC 2014, and Munro et al., SODA 2017) focus on removing the alphabet-size dependency in the time complexity, but they still require $\Omega(n)$ time. In this paper, we propose the first algorithm that breaks the $O(n)$-time barrier for BWT construction. Given a binary string of length $n$, our procedure builds the Burrows-Wheeler transform in $O(n/\sqrt{\log n})$ time and $O(n/\log n)$ space. We complement this result with a conditional lower bound proving that any further progress in the time complexity of BWT construction would yield faster algorithms for the very well studied problem of counting inversions: it would improve the state-of-the-art $O(m\sqrt{\log m})$-time solution by Chan and Pătrașcu (SODA 2010). Our algorithm is based on a novel concept of string synchronizing sets, which is of independent interest. As one of the applications, we show that this technique lets us design a data structure of the optimal size $O(n/\log n)$ that answers Longest Common Extension queries (LCE queries) in $O(1)$ time and, furthermore, can be deterministically constructed in the optimal $O(n/\log n)$ time. Comment: Full version of a paper accepted to STOC 201
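
    To make the definition in the abstract concrete, the following is a minimal sketch of the transform computed by explicitly sorting suffixes. It is not the sublinear-time algorithm the paper proposes; it is a naive baseline for illustration only. The function name `naive_bwt` and the `$` sentinel (assumed absent from the text and smaller than every text symbol) are assumptions of this sketch.

```python
def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT by explicit suffix sorting (illustration only).

    Assumes `sentinel` does not occur in `text` and compares smaller
    than every symbol of `text` (true for '$' versus letters and digits).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Order suffix start positions by the lexicographic order of the suffixes.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # For each suffix in sorted order, emit the symbol that precedes it
    # (index -1 wraps around to the sentinel for the suffix starting at 0).
    return "".join(s[i - 1] for i in order)


if __name__ == "__main__":
    print(naive_bwt("banana"))
```

    Because every suffix comparison may inspect many characters, this sketch is far slower than linear time; it only shows what the output of the transform is.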

    Counting "exotics"

    An introduced or exotic species is commonly defined as an organism accidentally or intentionally introduced to a new location by human activity (Williamson 1996; Richardson et al. 2000; Guo and Ricklefs 2010). However, the counting of exotics is often inconsistent. For example, in the US, previously published plant richness data for each state classify species only as native or exotic to the US as a whole (USDA and NRCS 2004), not to the state itself. Yet within-country species introductions (e.g., among states or counties), which form “homegrown exotics” (Cox 1999) or “native invaders” (Simberloff 2011), are undoubtedly numerous. The growing human population and associated activity increase species introductions at all levels, both international and internal, but to date intercontinental species introductions have always been the focus. Species introduced among neighboring areas often go unnoticed, yet they are actually far more frequent owing to proximity and environmental similarity. Many domestic exotic species exhibit high invasiveness, such as the plant Spartina alterniflora (smooth cordgrass; introduced from the east coast to California) and the bird Molothrus ater (brown-headed cowbird; introduced from the Great Plains to California).

    Counting Carambolas

    We give upper and lower bounds on the maximum and minimum number of geometric configurations of various kinds present (as subgraphs) in a triangulation of $n$ points in the plane. Configurations of interest include convex polygons, star-shaped polygons and monotone paths. We also consider related problems for directed planar straight-line graphs. Comment: update reflects journal version, to appear in Graphs and Combinatorics; 18 pages, 13 figures

    Counting monomials

    This paper presents two enumeration techniques based on Hilbert functions. The paper illustrates these techniques by solving two chessboard problems.

    Double Counting in LDA+DMFT - The Example of NiO

    An intrinsic issue of the LDA+DMFT approach is the so-called double counting of interaction terms. How to choose the double-counting potential in a manner that is both physically sound and consistent is unknown. We have conducted an extensive study of the charge transfer system NiO in the LDA+DMFT framework using quantum Monte Carlo and exact diagonalization as impurity solvers. By explicitly treating the double-counting correction as an adjustable parameter, we systematically investigated the effects of different choices for the double counting on the spectral function. Different methods for fixing the double counting can drive the result from Mott insulating to almost metallic. We propose a reasonable scheme for the determination of double-counting corrections for insulating systems. Comment: 7 pages, 6 figures

    Field-normalized citation impact indicators and the choice of an appropriate counting method

    Bibliometric studies often rely on field-normalized citation impact indicators in order to make comparisons between scientific fields. We discuss the connection between field normalization and the choice of a counting method for handling publications with multiple co-authors. Our focus is on the choice between full counting and fractional counting. Based on an extensive theoretical and empirical analysis, we argue that properly field-normalized results cannot be obtained when full counting is used. Fractional counting does provide results that are properly field normalized. We therefore recommend the use of fractional counting in bibliometric studies that require field normalization, especially in studies at the level of countries and research organizations. We also compare different variants of fractional counting. In general, it seems best to use either the author-level or the address-level variant of fractional counting.
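
    The distinction between full and fractional counting can be illustrated with a small sketch. It assumes the author-level variant, in which each author carries an equal share of a publication, and it uses made-up country labels. The function `count_publications` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def count_publications(papers, fractional):
    """Credit publications to countries.

    `papers` is a list of publications, each given as a list with one
    country label per author (hypothetical input format for this sketch).
    """
    totals = defaultdict(float)
    for authors in papers:
        if fractional:
            # Author-level fractional counting: each author contributes
            # an equal share, so every publication adds up to exactly 1.
            share = 1.0 / len(authors)
            for country in authors:
                totals[country] += share
        else:
            # Full counting: every country involved gets a whole publication.
            for country in set(authors):
                totals[country] += 1.0
    return dict(totals)


if __name__ == "__main__":
    sample = [["NL", "NL", "US"], ["US"]]  # made-up example data
    print(count_publications(sample, fractional=False))
    print(count_publications(sample, fractional=True))
```

    In this sketch the fractional totals sum to the number of publications, whereas the full-counting totals exceed it whenever a publication involves more than one country.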

    Analysis of General Power Counting Rules in Effective Field Theory

    We derive the general counting rules for a quantum effective field theory (EFT) in $\mathsf{d}$ dimensions. The rules are valid for strongly and weakly coupled theories, and predict that all kinetic energy terms are canonically normalized. They determine the energy dependence of scattering cross sections in the range of validity of the EFT expansion. We show that the size of cross sections is controlled by the $\Lambda$ power counting of EFT, not by chiral counting, even for chiral perturbation theory ($\chi$PT). The relation between $\Lambda$ and $f$ is generalized to $\mathsf{d}$ dimensions. We show that the naive dimensional analysis $4\pi$ counting is related to $\hbar$ counting. The EFT counting rules are applied to $\chi$PT, low-energy weak interactions, Standard Model EFT and the non-trivial case of Higgs EFT. Comment: V2: more details and examples added; version published in journal. 17 pages, 4 figures, 2 tables

    Counting Supertubes

    The quantum states of the supertube are counted by directly quantizing the linearized Born-Infeld action near the round tube. The result is an entropy $S = 2\pi \sqrt{2 (Q_{D0}Q_{F1}-J)}$, in accord with conjectures in the literature. As a result, supertubes may be the generic D0-F1 bound state. Our approach also shows directly that supertubes are marginal bound states with a discrete spectrum. We also discuss the relation to recent suggestions of Mathur et al. involving three-charge black holes. Comment: 15 pages, v2: reference corrected; v3: few corrections and explicit derivation of a relation are added to appendix

    People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting

    In this paper we propose a technique to adapt a convolutional neural network (CNN) based object counter to additional visual domains and object types while still preserving the original counting function. Domain-specific normalisation and scaling operators are trained to allow the model to adjust to the statistical distributions of the various visual domains. The developed adaptation technique is used to produce a singular patch-based counting regressor capable of counting various object types including people, vehicles, cell nuclei and wildlife. As part of this study, a challenging new cell counting dataset in the context of tissue culture and patient diagnosis is constructed. This new collection, referred to as the Dublin Cell Counting (DCC) dataset, is the first of its kind to be made available to the wider computer vision community. State-of-the-art object counting performance is achieved in both the Shanghaitech (parts A and B) and Penguins datasets, while competitive performance is observed on the TRANCOS and Modified Bone Marrow (MBM) datasets, all using a shared counting model. Comment: 10 pages