
    Spectral approach to linear programming bounds on codes

    We give new proofs of asymptotic upper bounds of coding theory obtained within the framework of Delsarte's linear programming method. The proofs rely on the analysis of eigenvectors of certain finite-dimensional operators related to orthogonal polynomials. The examples of the method considered in the paper include binary codes, binary constant-weight codes, spherical codes, and codes in projective spaces. Comment: 11 pages, submitted.
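    For readers unfamiliar with the method the abstract refers to, Delsarte's linear program for binary codes can be sketched as follows (standard formulation; the paper's spectral proofs are not reproduced here). The maximum size A(n, d) of a binary code of length n and minimum distance d is bounded by the optimum of:

```latex
A(n,d) \;\le\; \max \sum_{i=0}^{n} B_i
\quad\text{subject to}\quad
\begin{cases}
B_0 = 1, \qquad B_i \ge 0, \\[2pt]
B_i = 0 \quad \text{for } 1 \le i \le d-1, \\[2pt]
\sum_{i=0}^{n} B_i\, K_k(i) \;\ge\; 0 \quad \text{for } k = 0, \dots, n,
\end{cases}
```

    where $K_k(i) = \sum_{j} (-1)^j \binom{i}{j}\binom{n-i}{k-j}$ are the Krawtchouk polynomials, the orthogonal-polynomial family whose eigenvector structure the paper analyses.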

    Online Pattern Matching for String Edit Distance with Moves

    Edit distance with moves (EDM) is a string-to-string distance measure that allows substring moves in addition to the ordinary edit operations for turning one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit-sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different occurrences of the same substring in a string. ESP can be used to compute an approximate EDM as the L1 distance between characteristic vectors built from node labels in parsing trees. However, ESP is not applicable to streaming text data, where the whole text is not known in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging recent results on dynamic succinct trees. We experimentally test OESP's ability to compute EDM in an online manner on benchmark datasets, and we show its efficiency. Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2014).
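    The abstract's core idea, comparing strings via the L1 distance between characteristic vectors, can be illustrated with a toy sketch. ESP's actual node-label vectors are not reproduced here; as a hypothetical stand-in, this uses plain q-gram counts as the characteristic vectors:

```python
from collections import Counter

def char_vector(s: str, q: int = 2) -> Counter:
    """Count the q-grams of s: a toy stand-in for ESP's node-label vectors."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def l1_distance(s: str, t: str, q: int = 2) -> int:
    """L1 distance between the q-gram count vectors of s and t."""
    vs, vt = char_vector(s, q), char_vector(t, q)
    return sum(abs(vs[g] - vt[g]) for g in set(vs) | set(vt))
```

    Because a substring move mostly preserves the multiset of q-grams (only the grams at the move boundaries change), such vector distances are far less sensitive to moves than plain edit distance, which is the intuition behind approximating EDM this way.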

    Fast phonetic similarity search over large repositories

    Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode the phonetic information needed to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment on a data set using a Portuguese variant of a well-known repository, automatically correcting words with spelling errors.
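    The abstract does not specify PhoneticMap's internals, but the general idea of a phonetic-key index can be sketched. This toy version uses a much-simplified Soundex-style key (English consonant classes, for illustration only; the paper targets Portuguese phonetics):

```python
from collections import defaultdict

def phonetic_key(word: str) -> str:
    """Very simplified Soundex-style key (illustrative only)."""
    codes = {"b": "1", "f": "1", "p": "1", "v": "1",
             "c": "2", "g": "2", "j": "2", "k": "2", "q": "2",
             "s": "2", "x": "2", "z": "2",
             "d": "3", "t": "3", "l": "4", "m": "5", "n": "5", "r": "6"}
    word = word.lower()
    key, last = word[0], codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            key += code
        last = code
    return (key + "000")[:4]

class PhoneticMap:
    """Toy index: phonetic key -> set of dictionary words sharing that key."""
    def __init__(self, dictionary):
        self.buckets = defaultdict(set)
        for w in dictionary:
            self.buckets[phonetic_key(w)].add(w)

    def candidates(self, word):
        """All dictionary words that sound like `word` under the key."""
        return self.buckets[phonetic_key(word)]
```

    A misspelled word is then corrected by looking up its phonetic key and ranking the bucket's candidates by string similarity, so the expensive comparison runs only over phonetically plausible words instead of the whole dictionary.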

    Use of Linear Error-Correcting Subcodes in Flow Watermarking for Channels with Substitution and Deletion Errors

    An invisible flow watermarking QIM scheme based on linear error-correcting subcodes for channels with substitution and deletion errors is proposed in this paper. The evaluation of the scheme demonstrates performance similar to that of a known scheme, but with lower complexity, since its implementation is mainly based on linear decoding operations.
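    The quantization index modulation (QIM) primitive the scheme builds on can be sketched in its basic scalar form; the error-correcting-subcode layer and the flow-timing channel of the paper are omitted here. A bit is embedded by quantizing a value onto one of two interleaved lattices:

```python
def qim_embed(x: float, bit: int, delta: float = 1.0) -> float:
    """Quantize x onto the lattice delta*Z + bit*delta/2 (basic scalar QIM)."""
    offset = bit * delta / 2
    return delta * round((x - offset) / delta) + offset

def qim_detect(y: float, delta: float = 1.0) -> int:
    """Recover the bit by checking which lattice y is closer to."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1
```

    Detection tolerates any perturbation smaller than delta/4, which is why QIM is robust to moderate channel noise; handling deletions, as the paper does, additionally requires the coding layer.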

    Quantifying similarity in animal vocal sequences: Which metric performs best?

    1. Many animals communicate using sequences of discrete acoustic elements which can be complex, vary in their degree of stereotypy, and are potentially open-ended. Variation in sequences can provide important ecological, behavioural, or evolutionary information about the structure and connectivity of populations, mechanisms for vocal cultural evolution, and the underlying drivers responsible for these processes. Various mathematical techniques have been used to form a realistic approximation of sequence similarity for such tasks.
    2. Here, we use both simulated and empirical datasets from animal vocal sequences (rock hyrax, Procavia capensis; humpback whale, Megaptera novaeangliae; bottlenose dolphin, Tursiops truncatus; and Carolina chickadee, Poecile carolinensis) to test which of eight sequence analysis metrics are most likely to reconstruct the information encoded in the sequences, and to test the fidelity of estimation of model parameters when the sequences are assumed to conform to particular statistical models.
    3. Results from the simulated data indicated that multiple metrics were equally successful in reconstructing the information encoded in the sequences of simulated individuals (Markov chains, n-gram models, repeat distribution, and edit distance), and data generated by different stochastic processes (entropy rate and n-grams). However, the string edit (Levenshtein) distance performed consistently and significantly better than all other tested metrics (including entropy, Markov chains, n-grams, and mutual information) for all empirical datasets, despite being less commonly used in the field of animal acoustic communication.
    4. The Levenshtein distance metric provides a robust analytical approach that should be considered in the comparison of animal acoustic sequences in preference to other commonly employed techniques (such as Markov chains, hidden Markov models, or Shannon entropy). The recent discovery that non-Markovian vocal sequences may be more common in animal communication than previously thought provides a rich area for future research, one that requires non-Markovian analysis techniques to investigate animal grammars and, potentially, the origin of human language.
    We thank Melinda Rekdahl, Todd Freeberg and his graduate students, Amiyaal Ilany, Elizabeth Hobson, and Jessica Crance for providing comments on a previous version of this manuscript. We thank Mike Noad, Melinda Rekdahl, and Claire Garrigue for assistance with humpback whale song collection and initial categorisation of the song; Vincent Janik and Laela Sayigh for assistance with signature whistle collection; Todd Freeberg for the chickadee recordings; and Eli Geffen and Amiyaal Ilany for assistance with hyrax song collection and analysis. E.C.G. is supported by a Newton International Fellowship. Part of this work was conducted while E.C.G. was supported by a National Research Council (National Academy of Sciences) Postdoctoral Fellowship at the National Marine Mammal Laboratory, AFSC, NMFS, NOAA. The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the National Marine Fisheries Service. We would also like to thank Randall Wells and the Sarasota Dolphin Research Program for the opportunity to record the Sarasota dolphins, where data were collected under a series of National Marine Fisheries Service Scientific Research Permits issued to Randall Wells. A.K. is supported by the Herchel Smith Postdoctoral Fellowship Fund. Part of this work was conducted while A.K. was a Postdoctoral Fellow at the National Institute for Mathematical and Biological Synthesis, an institute sponsored by the National Science Foundation through NSF Award #DBI-1300426, with additional support from The University of Tennessee, Knoxville.
    This is the author accepted manuscript. The final version is available from Wiley via http://dx.doi.org/10.1111/2041-210X.1243
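    The Levenshtein distance that this study found to perform best is a standard dynamic-programming algorithm, sketched here in its two-row form (O(len(b)) memory):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]                          # cost of deleting the first i chars of a
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution (0 if equal)
        prev = curr
    return prev[-1]
```

    For vocal sequences, the "characters" would be the discrete acoustic element labels rather than letters; the algorithm is unchanged as long as elements can be compared for equality.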

    Taking SPARQL 1.1 extensions into account in the SWIP system

    The SWIP system aims at hiding the complexity of expressing a query in a graph query language such as SPARQL. We propose a mechanism by which a query expressed in natural language is translated into a SPARQL query. Our system analyses the sentence in order to identify concepts, instances, and relations. It then generates a query in an internal format called the pivot language. Finally, it selects pre-written query patterns and instantiates them with respect to the keywords of the initial query. These queries are presented as explanatory natural-language sentences, among which the user can select the query he or she is actually interested in. We are currently focusing on new kinds of queries handled by the new version of our system, which is now based on version 1.1 of SPARQL.
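    The abstract does not define the pivot language's syntax, but the final pivot-to-SPARQL step can be illustrated with a hypothetical minimal format: keyword triples whose "?"-prefixed items become SPARQL variables (real SWIP patterns map terms to ontology IRIs, which is omitted here):

```python
def pivot_to_sparql(triples):
    """Toy translation of pivot (subject, relation, object) keyword
    triples into a SPARQL SELECT query. Items starting with '?' become
    variables; everything else becomes a plain literal."""
    def term(t):
        return t if t.startswith("?") else f'"{t}"'
    patterns = " .\n  ".join(f"{term(s)} {term(p)} {term(o)}"
                             for s, p, o in triples)
    variables = sorted({t for tr in triples for t in tr if t.startswith("?")})
    return f"SELECT {' '.join(variables)} WHERE {{\n  {patterns}\n}}"
```

    For example, a natural-language question like "Which films did Kubrick direct?" could yield the pivot triple ("?film", "directedBy", "Kubrick"), which this sketch turns into a one-pattern SELECT query.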

    RefConcile – automated online reconciliation of bibliographic references

    Comprehensive bibliographies often rely on community contributions. In such a setting, de-duplication is mandatory for the bibliography to be useful. Ideally, it works online, i.e., during the addition of new references, so that the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aimed specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation, based on a large real-world collection of bibliographic references, shows that RefConcile scales well and that it detects and reconciles duplicates with high accuracy.
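    The blocking-then-matching pattern the abstract mentions can be sketched generically. This is not RefConcile's actual scheme; as a hypothetical example, references are blocked on (first-author surname, year) so that expensive title comparisons run only within each block:

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(ref):
    """Blocking key: normalized first-author surname + year
    (a common heuristic, not RefConcile's actual scheme)."""
    return (ref["authors"][0].split()[-1].lower(), ref["year"])

def find_duplicates(refs, threshold=0.9):
    """Compare titles only within blocks; flag pairs whose title
    similarity meets the threshold as duplicate candidates."""
    blocks = defaultdict(list)
    for ref in refs:
        blocks[block_key(ref)].append(ref)
    pairs = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                sim = SequenceMatcher(None,
                                      group[i]["title"].lower(),
                                      group[j]["title"].lower()).ratio()
                if sim >= threshold:
                    pairs.append((group[i], group[j]))
    return pairs
```

    Blocking turns the quadratic all-pairs comparison into a sum of much smaller quadratics, which is what makes online de-duplication feasible at bibliography scale; the quality of the block key then bounds recall, since duplicates split across blocks are never compared.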