Online Dictionary - Tool for Preservation of Language Heritage
The paper presents a bilingual online dictionary as a useful
tool for preserving natural languages. The author focuses on the
approach taken to develop a compatible bilingual lexical database for the
Bulgarian-Polish online dictionary. A formal model for the dictionary encoding
is developed in accordance with the complex structures of the dictionary entries. These structures vary depending on the grammatical characteristics of
Bulgarian headwords. The Web application for presenting the bilingual
dictionary is also described.
Apprentissage de Dictionnaires Multivariés et Décomposition Parcimonieuse Invariante par Translation et par Rotation 2D
This article presents a new tool, the Multivariate Dictionary Learning Algorithm, which learns online the elementary structures associated with a set of multivariate signals. Once the dictionary is learned, Multivariate Orthogonal Matching Pursuit sparsely codes all signals in this set. These methods are specialized to the 2D rotation-invariant case, which yields a small dictionary of kernels. The methods are applied to 2D handwritten data to extract the characteristic patterns of this signal set.
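The sparse coding step mentioned in this abstract can be illustrated with a minimal sketch of standard Orthogonal Matching Pursuit. This is the ordinary univariate algorithm, not the multivariate, rotation-invariant variant the article describes. The function names (`dot`, `solve`, `omp`) and the toy dictionary are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination
    with partial pivoting (assumes A is non-singular)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def omp(atoms, signal, n_nonzero):
    """Standard OMP: greedily select atoms, then re-fit all selected
    coefficients by least squares. `atoms` is a list of unit-norm vectors.
    Returns a dict mapping atom index to coefficient."""
    residual = list(signal)
    support, coeffs = [], []
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        scores = [abs(dot(a, residual)) for a in atoms]
        best = max(range(len(atoms)), key=lambda k: scores[k])
        if scores[best] < 1e-12 or best in support:
            break
        support.append(best)
        # Orthogonal step: least squares on the support (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(gram, rhs)
        approx = [sum(c * atoms[k][t] for c, k in zip(coeffs, support))
                  for t in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs))


# Toy dictionary (hypothetical): the three canonical basis vectors.
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
code = omp(atoms, [0, 3, -2], n_nonzero=2)
```

In this toy case the signal is coded with two atoms, indices 1 and 2. The least-squares re-fit after each selection is what distinguishes OMP from plain matching pursuit.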
Learning brain regions via large-scale online structured sparse dictionary-learning
We propose a multivariate online dictionary-learning method for obtaining decompositions of brain images with structured and sparse components (aka atoms). Sparsity is to be understood in the usual sense: the dictionary atoms are constrained to contain mostly zeros. This is imposed via an ℓ1-norm constraint. By "structured", we mean that the atoms are piece-wise smooth and compact, thus making up blobs, as opposed to scattered patterns of activation. We propose to use a Sobolev (Laplacian) penalty to impose this type of structure. Combining the two penalties, we obtain decompositions that properly delineate brain structures from functional images. This non-trivially extends the online dictionary-learning work of Mairal et al. (2010), at the price of only a factor of 2 or 3 on the overall running time. Just like the Mairal et al. (2010) reference method, the online nature of our proposed algorithm allows it to scale to arbitrarily sized datasets. Experiments on brain data show that our proposed method extracts structured and denoised dictionaries that are more interpretable and better capture inter-subject variability in small, medium, and large-scale regimes alike, compared to state-of-the-art models.
Online Self-Indexed Grammar Compression
Although several grammar-based self-indexes have been proposed thus far,
their applicability is limited to offline settings in which the whole input text is
available in advance, so the index structures must be rebuilt whenever additional
input arrives, which is often the case in the big data era. In this paper, we present
the first online self-indexed grammar compression method, named OESP-index, which can
gradually build the index structure by reading input characters one by one.
This property also saves working space during
construction, because the input text does not need to be stored in memory. We
experimentally test OESP-index on its ability to build index structures and
search for query texts, and we show OESP-index's efficiency, especially its
space efficiency when building index structures.
Comment: To appear in the Proceedings of the 22nd edition of the International
Symposium on String Processing and Information Retrieval (SPIRE2015)
Beyond definition: Organising semantic information in bilingual dictionaries
This paper considers the process of organising semantic information in bilingual dictionaries with diachronic coverage, from selecting the textual source-material to designing the entries. The discussion centres on practical aspects of ancient Greek lexicography. First, the traditional semantic frameworks are described. Then, more recent approaches are noted, notably those of Adrados and of Chadwick, both of which aim to integrate contextual data within a semantic framework. Since the relevance of contextual information varies with the part of speech of the lemma, different configurations are required for entries describing nouns, adjectives, and verbs. These are illustrated by three entries from a Greek-English dictionary currently being written at Cambridge. In order to organise data to this level of specificity, stylistic templates are indispensable, and software offers a means of supplying them. However, systems designed for writing new dictionaries require different features from those designed for encoding pre-existing texts. A description is given of how the lexicographic requirements of the Cambridge dictionary were met by a user-designed system.