2 research outputs found

    TÉCNICAS DE PROCESSAMENTO DE LINGUAGEM NATURAL APLICADAS AO PROCESSO DE MINERAÇÃO DE TEXTOS: RESULTADOS PRELIMINARES DE UM MAPEAMENTO SISTEMÁTICO

    Text mining is an activity that aims to discover knowledge in unstructured (textual) data. Besides its own algorithms, the process uses well-known, consolidated techniques, among them Natural Language Processing (NLP), which has improved the results obtained and justified the computational effort required. Objective: This study aimed to identify and evaluate the NLP techniques available for mining textual databases, in order to discuss these techniques on the basis of the experiences published in this context. Method: We applied a systematic mapping study, whose purpose is to identify, evaluate and interpret available and relevant studies on a given research question through a rigorous and reliable review process. Results: We analyzed 24 studies applying 11 different NLP techniques to text mining; among these techniques, ontology was the most recurrent and was presented as the most efficient over the years.

    Developing a Dataset for Technology Structure Mining

    Conference paper. This paper describes the steps taken to construct a development dataset for the task of Technology Structure Mining. We define the proposed task as the process of mapping a scientific corpus into a labeled digraph, named a Technology Structure Graph, as described in the paper. The generated graph expresses the domain semantics in terms of interdependencies between pairs of technologies that are named (introduced) in the target scientific corpus. The dataset comprises a set of sentences extracted from the ACL Anthology Corpus. Each sentence is annotated with at least two technologies in the domain of Human Language Technology and the interdependence between them. The annotations (technology mark-up and interdependencies) are expressed at two layers: lexical and termino-conceptual. The lexical representation of a technology comprises its varying lexicalizations; at the termino-conceptual layer, all these lexical variations refer to the same concept. We have adopted the same approach for representing semantic relations: at the lexical layer, a semantic relation is a predicate, i.e. it is defined based on the sentence surface structure, whereas at the termino-conceptual layer semantic relations are classified into conceptual relations, either taxonomic or non-taxonomic. Moreover, the contexts from which interdependencies are extracted are classified into five groups based on the linguistic criteria and syntactic structure identified by the human annotators. The dataset initially comprises 482 sentences. We hope this effort results in a benchmark that can be used for the technology structure mining task as defined in the paper.
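    The two-layer annotation described in this abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' format or tooling: the `Annotation` class, the `CONCEPT_OF` and `RELATION_OF` lookup tables, the relation labels, and the two example annotations are all hypothetical. It shows how differently worded lexical annotations could collapse into a single labeled edge at the termino-conceptual layer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated sentence, reduced to its lexical-layer triple."""
    source_term: str  # surface form of the first technology
    predicate: str    # relation as worded in the sentence
    target_term: str  # surface form of the second technology


# Hypothetical termino-conceptual layer: surface form -> concept.
CONCEPT_OF = {
    "POS tagger": "part-of-speech tagging",
    "part-of-speech tagging": "part-of-speech tagging",
    "HMM": "hidden Markov model",
    "hidden Markov models": "hidden Markov model",
}

# Hypothetical mapping: predicate -> (relation class, conceptual relation).
RELATION_OF = {
    "is based on": ("non-taxonomic", "depends-on"),
    "is a kind of": ("taxonomic", "is-a"),
}


def build_graph(annotations):
    """Map lexical annotations to a labeled digraph over concepts."""
    graph = defaultdict(set)  # concept -> {(class, label, concept)}
    for ann in annotations:
        source = CONCEPT_OF[ann.source_term]
        target = CONCEPT_OF[ann.target_term]
        relation_class, label = RELATION_OF[ann.predicate]
        graph[source].add((relation_class, label, target))
    return dict(graph)


# Two invented annotations with different lexicalizations of the same pair.
annotations = [
    Annotation("POS tagger", "is based on", "HMM"),
    Annotation("part-of-speech tagging", "is based on", "hidden Markov models"),
]
graph = build_graph(annotations)
```

    Storing edges in a set means that lexical variants of the same statement yield one edge between concepts, which mirrors the abstract's point that all lexical variations refer to the same concept.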