1,329 research outputs found

    Vorosweep: a fast generalized crystal growing Voronoi diagram generation algorithm

    We propose a new algorithm for quickly generating approximate generalized Voronoi diagrams of point sites associated with an arbitrary convex distance metric in the Euclidean plane. This algorithm produces connected cells by emulating the growth of crystals starting at the point sites, in order to reduce the complexity of the diagram. The main practical contribution is the Vorosweep package, which is the reference implementation of the algorithm. Experimental results and benchmarks are given to demonstrate the versatility of this approach. WIST 3 grant 1017074 DOMHEX (Dominant Hexahedral Mesh Generation)

    Benchmarking benchmarks: introducing new automatic indicators for benchmarking Spoken Language Understanding corpora

    Empirical evaluation is nowadays the main paradigm in Natural Language Processing for assessing the relevance of a new machine-learning-based model. While large corpora are available for tasks such as Automatic Speech Recognition, this is not the case for other tasks such as Spoken Language Understanding (SLU), which consists in translating spoken transcriptions into a formal representation often based on semantic frames. Corpora such as ATIS or SNIPS are widely used to compare systems; however, differences in performance among systems are often very small, not statistically significant, and can be produced by biases in the data collection or the annotation scheme, as we showed on the ATIS corpus ("Is ATIS too shallow?", IS2018). We propose in this study a new methodology for assessing the relevance of an SLU corpus. We claim that taking into account only system performance does not provide enough insight into what is covered by current state-of-the-art models and what is left to be done. We apply our methodology to a set of 4 SLU systems and 5 benchmark corpora (ATIS, SNIPS, M2M, MEDIA) and automatically produce several indicators assessing the relevance (or not) of each corpus for benchmarking SLU models

    Fouille de texte : une approche séquentielle pour découvrir des relations spatiales [Text mining: a sequential approach to discovering spatial relations]

    In this article, we present the first steps of a textual data mining project. More precisely, we apply an algorithm for extracting sequential patterns under multiple constraints in order to identify relations between spatial entities. The first results show both the value of this approach and its limits. We lay out the initial groundwork for more ambitious work whose aim is to provide crucial information for complementing the analysis of satellite images

    Learnability of Pregroup Grammars

    This paper investigates the learnability of Pregroup Grammars from positive examples in the sense of Gold. In the first part, Pregroup Grammars are presented and a new parsing strategy is proposed. Then, theoretical learnability and non-learnability results for subclasses of Pregroup Grammars are proved. In the last two parts, we focus on learning Pregroup Grammars from a special kind of input called feature-tagged examples. A learning algorithm based on the parsing strategy presented in the first part is given. Its validity is proved and its properties are exemplified

    The effect of Time Scales in Photosynthesis on microalgae Productivity

    Microalgae are often seen as a potential biofuel producer. In order to predict achievable productivities in the so-called raceway culturing system, the dynamics of photosynthesis has to be taken into account. In particular, the dynamical effect of inhibition by an excess of light (photoinhibition) must be represented. We propose a model considering both photosynthesis and growth dynamics. This model involves three different time scales. We study the response of this model to fluctuating light with different frequencies by slow/fast approximations. We thereby identify three different regimes for which a simplified expression for the model can be derived. These expressions give a hint on the productivity improvement that can be expected by stimulating photosynthesis with faster hydrodynamics

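    Coopération de méthodes statistiques et symboliques pour l'adaptation non-supervisée d'un système d'étiquetage en entités nommées [Combining statistical and symbolic methods for the unsupervised adaptation of a named entity tagging system]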

    Named entity recognition and typing is achieved both by symbolic and probabilistic systems. We report on an experiment in which the rule-based system NP, a high-precision system developed on AFP news corpora that relies on the Aleda named entity database, interacts with LIANE, a high-recall probabilistic system trained on oral transcriptions from the ESTER corpus. We show that a probabilistic system such as LIANE can be adapted to a new type of corpus in an unsupervised way thanks to large-scale corpora automatically annotated by NP. This adaptation does not require any additional manual annotation and illustrates the complementarity between numeric and symbolic techniques for tackling linguistic tasks

    Comparing Sanskrit Texts for Critical Editions: the sequences move problem

    A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that are missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics which make Sanskrit text comparison a specific matter. It presents two different methods for comparing Sanskrit texts, which can be used to develop a computer-assisted critical edition. The first method uses the longest common subsequence (L.C.S.), while the second uses the global alignment algorithm. Comparing them, we see that the second method provides better results, but that neither of these methods can detect when a word or a sentence fragment has been moved. We then present a method based on N-grams that can detect such a movement when it is not too far from its original location. We show how the method behaves on several examples and consider possible future developments
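    The limitation mentioned in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the texts are already split into words and uses placeholder tokens (`a`, `b`, `c`, `d`) rather than real Sanskrit; the function names `lcs_table` and `lcs_diff` are hypothetical. A plain L.C.S. comparison reports a displaced word as an unrelated deletion and insertion instead of as a move.

```python
def lcs_table(a, b):
    """Dynamic-programming table: t[i][j] is the LCS length of a[:i] and b[:j]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def lcs_diff(a, b):
    """Return a list of (op, word): '=' kept, '-' only in a, '+' only in b."""
    t = lcs_table(a, b)
    i, j = len(a), len(b)
    ops = []
    # Walk back from the bottom-right corner of the table.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            ops.append(("=", a[i - 1]))
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or t[i][j - 1] >= t[i - 1][j]):
            ops.append(("+", b[j - 1]))
            j -= 1
        else:
            ops.append(("-", a[i - 1]))
            i -= 1
    ops.reverse()
    return ops


# Placeholder tokens (assumption): version_2 is version_1 with "a" moved to the end.
version_1 = ["a", "b", "c", "d"]
version_2 = ["b", "c", "d", "a"]
diff = lcs_diff(version_1, version_2)
print(diff)
```

    In this toy case the word `a` shows up once with `-` and once with `+`, and nothing in the output links the two; recovering that link is the kind of problem the N-gram method in the abstract addresses.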

    Label Pre-annotation for Building Non-projective Dependency Treebanks for French

    The current interest in accurate dependency parsing makes it necessary to build dependency treebanks for French containing both projective and non-projective dependencies. In order to alleviate the work of the annotator, we propose to automatically pre-annotate the sentences with the labels of the dependencies ending on the words. The selection of the dependency labels reduces the ambiguity of the parsing. We show that a maximum entropy Markov model method reaches the label accuracy score of a standard dependency parser (MaltParser). Moreover, this method makes it possible to find more than one label per word, i.e. the most probable ones, in order to improve the recall score. It improves the quality of the parsing step of the annotation process. Therefore, including the method in the annotation process makes the work quicker and more natural for annotators

    Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning

    This paper presents a new semantic frame parsing model, based on Berkeley FrameNet, adapted to process spoken documents in order to perform information extraction from broadcast contents. Building upon previous work that had shown the effectiveness of adversarial learning for domain generalization in the context of semantic parsing of encyclopedic written documents, we propose to extend this approach to elocutionary style generalization. The underlying question throughout this study is whether adversarial learning can be used to combine data from different sources and train models on a higher level of abstraction in order to increase their robustness to lexical and stylistic variations as well as automatic speech recognition errors. The proposed strategy is evaluated on a French corpus of encyclopedic written documents and a smaller corpus of radio podcast transcriptions, both annotated with a FrameNet paradigm. We show that adversarial learning increases the generalization capabilities of all models, both on manual and automatic speech transcriptions and on encyclopedic data