13 research outputs found

    Chunking with Max-Margin Markov Networks

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Improving chunker performance using a web-based semi-automatic training data analysis tool

    Parameters estimation for spatio-temporal maximum entropy distributions: application to neural spike trains

    We propose a numerical method to learn Maximum Entropy (MaxEnt) distributions with spatio-temporal constraints from experimental spike trains. This extends two papers, [10] and [4], which proposed parameter estimation where only spatial constraints were taken into account. The extension we propose makes it possible to properly handle memory effects in spike statistics for large neural networks. Comment: 34 pages, 33 figures

    Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus

    This paper presents a novel approach to training an unsupervised shallow parsing model on the unannotated Chinese text of a parallel Chinese-English corpus. In this approach, no information from the Chinese side is used. The use of graph-based label propagation for bilingual knowledge transfer, together with the use of the projected labels as features in the unsupervised model, contributes to better performance. Experimental comparisons with state-of-the-art algorithms show that the proposed approach achieves markedly higher accuracy in terms of F-score.

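    Relação entre a tolerância de espécies arbóreas à inundação e sua distribuição em ecossistemas neotropicais [Relationship between the flood tolerance of tree species and their distribution in Neotropical ecosystems]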

    The present study investigated the Neotropical distribution of groups of Amazonian floodplain tree species of contrasting flood tolerance. The underlying aims were to test whether tree species of Amazonian white- and black-water floodplains show different distribution patterns at the Neotropical scale, and to test whether macro-scale distributions of tree species growing at higher levels in seasonally inundated habitats differ from those growing at lower levels. Indicator species (IndVal) from several várzea and igapó inventories performed by the INPA/MAUA Working Group were selected and classified into groups of low and high flood tolerance. The actual distributions of the species were analyzed using georeferenced records from herbarium collections and published floristic inventories. The potential species distributions were estimated using ecological niche modeling in MAXENT software. All investigated tree species showed Neotropical distributions concentrated in tropical and subtropical moist broadleaf forests. High várzea tree species had wider spatial distributions than low várzea and igapó tree species in the southern Neotropical region, and the same high várzea species plus one low igapó tree species had wider distributions to the north. Geomorphology could be the main factor in the habitat preference of the species in Amazonian floodplains.
    [Translated from the Portuguese abstract] The present work investigated the Neotropical distribution of tree species from Amazonian floodplains in relation to their flood tolerance. The underlying questions were (i) to test whether tree species from Amazonian white-water and black-water floodplains show different distribution patterns at the Neotropical scale, and (ii) to test whether the macro-scale distribution of high-flooding tree species differs from that of low-flooding tree species. Indicator species (IndVal) from 51 ha of várzea and igapó inventories carried out by the INPA/MAUA Working Group were selected and classified into groups of low and high flood tolerance. The actual distribution of the species was analyzed using georeferenced records from herbaria, collections, and published floristic inventories. The potential distribution of the species was estimated using ecological niche models in the MAXENT program. All investigated tree species showed a Neotropical distribution, concentrated in tropical and subtropical moist broadleaf forests. One high várzea tree species (Guarea guidonia) had a wider spatial distribution in the south of the Neotropical region than the low várzea and igapó tree species, and one high várzea species and one low igapó species (Guarea guidonia and Hirtella racemosa, respectively) had wider distributions to the north. Geomorphology is possibly the main factor in the habitat preference of species in Amazonian floodplains. Flooding was not identified as a variable that differentiates the distribution of the species investigated in this work. However, the climatic factors of precipitation and temperature showed a strong influence and are good indicators of the distribution of these species at the Neotropical scale.

    Unsupervised Syntactic Structure Induction in Natural Language Processing

    This work addresses unsupervised chunking as a task for syntactic structure induction, which could help in understanding the linguistic structures of human languages, especially low-resource languages. In chunking, the words of a sentence are grouped into phrases (also known as chunks) in a non-hierarchical fashion. Understanding text fundamentally requires finding noun and verb phrases, which makes unsupervised chunking an important step in several real-world applications. In this thesis, we establish several baselines and discuss our three-step knowledge transfer approach for unsupervised chunking. In the first step, we take advantage of state-of-the-art unsupervised parsers, and in the second, we heuristically induce chunk labels from them. We propose a simple heuristic that does not require any supervision from annotated grammars and generates reasonable (albeit noisy) chunks. In the third step, we design a hierarchical recurrent neural network (HRNN) that learns from these pseudo ground-truth labels. The HRNN explicitly models the composition of words into chunks and smooths out the noise from heuristically induced labels. Our HRNN a) maintains both word-level and phrase-level representations and b) explicitly handles the chunking decisions by providing autoregressiveness at each step. Furthermore, we make a case for exploring self-supervised learning objectives for unsupervised chunking. Finally, we discuss our attempt to transfer knowledge from chunking back to parsing in an unsupervised setting. We conduct comprehensive experiments on three datasets: CoNLL-2000 (English), CoNLL-2003 (German), and the English Web Treebank. Results show that our HRNN improves upon the teacher model (Compound PCFG) in terms of both phrase F1 and tag accuracy. Our HRNN can smooth out the noise from induced chunk labels and accurately capture the chunking patterns.
We evaluate different chunking heuristics and show that maximal left-branching performs best, reinforcing the observation that left-branching structures indicate closely related words. We also present a rigorous analysis of the HRNN's architecture and discuss the performance of vanilla recurrent neural networks.

    Arabic named entity recognition

    This doctoral thesis describes the research carried out to determine the best techniques for building an Arabic Named Entity Recognizer. Such a system would be able to identify and classify the named entities found in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing tasks (for example, Information Retrieval, Question Answering, and Machine Translation) achieve better results thanks to the enrichment it adds to the text. The literature contains various works that investigate the NER task for a specific language or from a language-independent perspective. However, very few works studying this task for Arabic have been published so far. Arabic has a special orthography and a complex morphology, and these aspects pose new challenges for research on the NER task. A complete investigation of NER for Arabic would not only provide the techniques needed to achieve high performance, but would also provide an error analysis and a discussion of the results that benefit the NER research community. The main objective of this thesis is to meet that need. To this end, we have: 1. Prepared a study of the different aspects of Arabic related to this task; 2. Analyzed the state of the art in NER; 3. Carried out a comparison of the results obtained by different machine learning techniques; 4. Developed a method based on combining different classifiers, where each classifier deals with a single class of named entities and uses the feature set and machine learning technique best suited to that class.
Our experiments were evaluated on nine test sets. Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318

    Genetic Resources and Adaptive Management of Conifers in a Changing World

    Climatic change causes a mismatch between tree populations on the sites they currently occupy and the climate to which they have adapted in the past. Maintaining productivity and ecological and societal services requires resilient populations and ecosystems, particularly close to the vulnerable trailing (xeric) range limits. The studies confirm the selective effect of diverse habitat and climate conditions across the species ranges. Soil conditions may mask climate effects and should be considered separately. The unique potential of provenance tests is illustrated by growth response projections that may be less dramatic than those provided by the usual inventory data analyses. Assisted migration appears to be a feasible management action to compensate for climatic warming. However, the choice of populations needs special care under extreme conditions and outside the limits of current natural distribution ranges. Properly differentiating measures according to present and future adaptive challenges requires continuing long-term analyses and establishing better-focused field trials in disparate climates that contain populations from a representative range of habitats. The studies present results obtained from diverse regions of the temperate forest zone: Central and Northwestern Europe, the Mediterranean, Russia, China, and North and Central America.

    Metodología orientada a la optimización automática de la calidad de los requisitos [A methodology for the automatic optimization of requirements quality]

    The initial phases of software projects shape their development and final outcome. Defects introduced in the initial phases considerably affect quality and alter completion dates. International organizations have taken note of this problem, and a great deal of research effort is devoted to improving quality in the early stages of development. Requirements engineering arose from this initiative: a discipline responsible for providing engineering processes for developing the requirements specifications needed to define projects of a certain complexity. As a result, numerous guides and standards have emerged to ensure the quality of the requirements that make up specifications, thus preventing possible defects in the requirements from causing errors in development and in the final product. One of the greatest difficulties related to quality in requirements specifications is its dependence on the demands of individual projects and on the constraints imposed by different domains. This thesis presents a methodology that allows these constraints to be incorporated by processing corpora of requirements classified according to their quality by project and domain experts. The goal of the methodology is to provide automatic methods for optimizing the quality of engineering requirements. To this end, it proposes a process for developing a classifier that emulates the quality estimate a domain expert would give a requirement, an automatic advisory system for improving the quality of defective requirements, and a method for automatically generating syntactic-semantic patterns that can be used as a guide when writing new requirements, thereby ensuring a structurally correct composition.
To corroborate the research proposals, case studies are presented based on a corpus of requirements provided by the Working Group of the INCOSE organization (International Council on Systems Engineering 2016), and the results obtained are analyzed. Official Doctoral Program in Computer Science and Technology. President: José Ambrosio Toval Álvarez. Secretary: María Isabel Sánchez Segura. Member: Susana Irene Díaz Rodrígue