3 research outputs found

    Impact of Entity Graphs on Extracting Semantic Relations

    Relation extraction (RE) between a pair of entity mentions in text is an important and challenging task, especially for open-domain relations. Generally, relations are extracted based on lexical and syntactic information at the sentence level. However, global information about known entities has not yet been explored for the RE task. In this paper, we propose to extract a graph of entities from the overall corpus and to compute features on this graph that capture evidence of a relationship holding between a pair of entities. The proposed features boost RE performance significantly when they are combined with some linguistic features.

    Slang feature extraction by analysing topic change on social media

    On social networking sites (SNSs), the authors often see words, such as youth slang, neologisms and Internet slang, that are not registered in dictionaries. Since the documents posted to SNSs include a lot of fresh information, they are thought to be useful for collecting information. It is important to analyse these words (hereinafter referred to as ‘slang’) and capture their features in order to improve the accuracy of automatic information collection. This study aims to analyse what features can be observed in slang by focusing on the topic. They construct topic models, using latent Dirichlet allocation, from groups of Twitter documents that include the target slang. With the models, they chronologically analyse the change of topics over a certain period of time to find the differences in features between slang and general words. Then, they propose a slang classification method based on the change of features.

    Hacia una tipología de los fenómenos de variación morfológica en el Shipibo-Konibo: una contribución para su traducción automática

    The Shipibo-Konibo (SK) language is one of the largest in the Peruvian Amazon. Owing to its large native-speaker population, which reaches 23,000 speakers, it was favored by a project that seeks to provide the minority languages of Peru with computational tools. Developed within the framework of the project of the Fondo Nacional de Desarrollo Científico Tecnológico y de Innovación Tecnológica (FONDECYT), the initiative aims to develop a platform for machine translation (MT) from Shipibo-Konibo to Spanish and vice versa. This project, called Chana, has formed an interdisciplinary team of engineers and linguists with the goal of developing the software and corpus needed to implement the translator. In this context, the present investigation proposes a typology that describes the language for the practical purposes of MT and thereby helps to solve problems that will arise at the level of the translator's morphological programming. The typology classifies allomorphic variation along four dimensions: first, the linguistic level at which the allomorphy is conditioned; second, the number of formal variants found in SK; third, the level of predictability of those variants; and finally, the formal similarity between the allomorphs of a morpheme. These elements will allow the project's engineers to identify the allomorphies by means of the inventory provided and to recommend some adjustments that I consider necessary in the programming at the morphological level of the MT system, and that will help solve the problems that arise when the online translator translates from Spanish to SK.