14 research outputs found

    Suggesting new words to extract keywords from title and abstract

    Keywords appear in most research papers, though not in all of them: some papers omit them entirely. Keywords are the words or phrases that accurately reflect the content of a research paper; they are a precise abbreviation of what the research contains. The right keywords increase the chance that an article will be found and that it will reach the readers it should reach. Well-chosen keywords are essential for attracting highly specialized and highly influential readers, who read selectively within their fields and cannot read everything. In this paper, we extract new keywords by suggesting a set of cue words, chosen according to how frequently they appear in research papers across multiple computing disciplines. Our system takes a number of words (configurable in the program) that come before each suggested cue word and treats them as new keywords. The system proved effective in finding keywords that correspond, to some extent, with the keywords chosen by the authors of the papers
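    The cue-word idea can be sketched in a few lines; the cue words and window size below are illustrative assumptions, not the paper's actual list:

```python
import re

def extract_keywords(text, cue_words, window=2):
    """For each occurrence of a cue word, take the `window` words
    immediately before it as a candidate keyword phrase."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", text.lower())
    cues = {c.lower() for c in cue_words}
    keywords = set()
    for i, tok in enumerate(tokens):
        if tok in cues and i >= window:
            keywords.add(" ".join(tokens[i - window:i]))
    return sorted(keywords)

# Hypothetical cue words that often follow a topic phrase in abstracts.
cues = ["algorithm", "system"]
text = "We propose a keyword extraction algorithm and a document annotation system."
print(extract_keywords(text, cues, window=2))
# → ['document annotation', 'keyword extraction']
```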

    A Text Mining Approach to the Analysis of BTS Fever

    K-POP is steadily growing in global competitiveness. The rise of K-POP's popularity has continued to produce Korean idol groups. However, many idol groups were disbanded, and there is a lack of measures for advancing and succeeding overseas. This study therefore aims to analyze the success factors of BTS using text mining. After collecting Twitter's online postings with a crawling technique, we analyze them with three text mining techniques: topic modeling, keyword extraction, and term frequency analysis. By analyzing the data with these three methods, we derive how BTS could succeed globally and form a huge fandom, and from the derived key factors we suggest a success strategy based on the analysis results. In contrast to previous studies centered on case studies or interviews, this study has implications in that actual data was collected and analyzed through three text mining techniques

    Dynamic Document Annotation for Efficient Data Retrieval

    Document annotation is one of the most popular methods in which metadata present in a document is used to retrieve documents from a large database of text documents. Application domains such as scientific networks and blogs share large amounts of information, usually as unstructured text documents, so manually annotating each document becomes a tedious task. Annotations make it easier to identify a document's topic and help the reader quickly overview and understand it. Dynamic document annotation provides a solution to this type of problem. Dynamic annotation of documents is generally treated as a semi-supervised learning task: documents are dynamically assigned to one of a set of predefined classes based on features extracted from their textual content. This paper presents a survey of the Collaborative Adaptive Data Sharing platform (CADS) for document annotation and of the use of the query workload to direct the annotation process. A key novelty of CADS is that it learns, over time, the most important data attributes of the application and uses this knowledge to guide data insertion and querying
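    The workload-guided idea can be illustrated with a small sketch; the attribute names and workload below are invented for illustration, and CADS itself is considerably more elaborate:

```python
from collections import Counter

def suggest_attributes(query_workload, top_k=2):
    """Rank annotation attributes by how often past queries filter on them,
    so the most-queried attributes are suggested first at insertion time."""
    counts = Counter(attr for query in query_workload for attr in query)
    return [attr for attr, _ in counts.most_common(top_k)]

# Hypothetical workload: each query is the set of attributes it filtered on.
workload = [
    {"author", "year"},
    {"author", "topic"},
    {"author", "year"},
]
print(suggest_attributes(workload, top_k=2))  # 'author' is queried most often
```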

    Improving keyword extraction in multilingual texts

    The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords using information from different languages improves the accuracy of keyword extraction. In this paper, the available information from all languages is applied to improve a traditional keyword extraction algorithm on multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text if, in addition to ranking highly in that language, it also holds a high rank according to the keyword criteria in the other languages. To achieve this aim, the average TF-IDF of each candidate word is calculated over both the same language and the other languages; the words with the highest average TF-IDF are then chosen as the extracted keywords. The obtained results indicate that the accuracies on multilingual texts are 80% for the term frequency-inverse document frequency (TF-IDF) algorithm, 60.65% for the graph-based algorithm, and 91.3% for the improved proposed algorithm
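    The cross-language averaging step can be sketched as follows, assuming per-language TF-IDF scores have already been computed for translation-aligned candidate words (the scores and language codes below are invented):

```python
def rank_by_average_tfidf(scores_by_language, top_k=2):
    """Average each candidate word's TF-IDF score over all languages and
    return the top-k candidates, highest average first."""
    averages = {}
    for word, scores in scores_by_language.items():
        averages[word] = sum(scores.values()) / len(scores)
    return sorted(averages, key=averages.get, reverse=True)[:top_k]

# Hypothetical per-language TF-IDF scores for aligned candidate words.
candidates = {
    "keyword":   {"en": 0.42, "fa": 0.38},
    "algorithm": {"en": 0.30, "fa": 0.12},
    "text":      {"en": 0.10, "fa": 0.09},
}
print(rank_by_average_tfidf(candidates))  # → ['keyword', 'algorithm']
```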

    Word Frequency Analysis for Extracting Keywords from Indonesian-Language Scientific Articles

    Publishing research results is a required step in any research activity. Publication can take the form of a presentation at a scientific seminar or of a scientific journal article. Before entering the selection process, scientific articles are sorted according to the competencies of the review team. This sorting is usually done manually by the seminar's organizing committee, which takes time and requires accuracy in assigning the reviewers best matched to each article. Sorting scientific articles can be done by applying a string similarity algorithm, i.e., by finding the keywords contained in a scientific work. The keywords in an article are produced based on the frequency of the words that occur. Before the most frequent words are sought, a filtering step removes frequently occurring conjunctions so that they are not treated as article keywords; filtering uses the stopword list compiled by Tala. The system was built as a web application using the PHP programming language and a MySQL database with responsive web design techniques. The results of this study show that, for articles entered into the system, appropriate keywords can be reproduced by recording the words that occur most frequently
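    The frequency-with-stopword-filtering step can be sketched as follows; the stopword list here is a tiny illustrative stand-in for Tala's Indonesian list:

```python
import re
from collections import Counter

def frequency_keywords(text, stopwords, top_k=3):
    """Count word occurrences, drop stopwords, and return the
    top-k most frequent words as candidate keywords."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return [w for w, _ in counts.most_common(top_k)]

# Tiny illustrative stopword list (stand-in for the Tala list).
stopwords = {"dan", "yang", "di", "untuk"}
text = "Ekstraksi kata kunci dan analisis frekuensi kata untuk artikel ilmiah"
print(frequency_keywords(text, stopwords, top_k=1))  # 'kata' occurs twice
```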

    Query Refinement Using Conversational Context: a Method and an Evaluation Resource

    This paper introduces a query refinement method applied to queries asked by users during a meeting or conversation. Current approaches perform poorly at this task, but we argue that their performance can be improved by focusing on the local context of the conversation. The proposed technique first represents the local context by extracting keywords from the transcript of the conversation. It then expands the queries with the keywords that best represent the topic of the query (i.e., pairs of expansion keywords together with a weight indicating their topical similarity to the query). Moreover, we present a dataset called AREX and an evaluation metric. We compared our query expansion approach with other methods on topics extracted from the AREX dataset, based on relevance judgments collected in a crowdsourcing experiment. The comparisons indicate the superiority of our method on both manual and ASR transcripts of the AMI Meeting Corpus
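    A minimal sketch of weighted expansion follows; the similarity weights are invented here, whereas the actual method derives them from topical similarity between context keywords and the query:

```python
def expand_query(query_terms, context_keywords, threshold=0.5):
    """Append context keywords whose topical-similarity weight to the
    query exceeds a threshold, keeping the weight for later scoring."""
    expansion = [(kw, w) for kw, w in context_keywords if w >= threshold]
    return list(query_terms) + expansion

# Hypothetical keywords extracted from the conversation transcript,
# each paired with an invented similarity weight to the query.
context = [("meeting", 0.9), ("agenda", 0.7), ("weather", 0.1)]
print(expand_query(["project", "schedule"], context))
# → ['project', 'schedule', ('meeting', 0.9), ('agenda', 0.7)]
```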

    PROCESS OF CONSTRUCTING A THEORETICAL FRAMEWORK VIA DATA EXTRACTION FROM THE SCOPUS PLATFORM AND PRODUCTION INDICATORS

    Background: Researching and elaborating a relevant theoretical framework for scientific productions is an arduous and time-consuming process for a novice or inexperienced author. Objective: This manuscript presents a bibliographical research process for the initial construction of the theoretical framework, using Information Technology (IT) for access, retrieval, and management of the bibliography. Materials and methods: The experiments used qualitative and inductive methodologies, carrying out online searches in the Scopus electronic journal base through keyword searches and filtering parameters. To select the best references, the h-index and the journal rankings by Qualis-CAPES were used. In addition, the Mendeley reference manager was the IT tool used for managing the bibliography. Results: The proposed research strategies for keyword selection, together with the analysis of publication quality, made it possible to reduce 18,087 articles to 22, with 2,263 citations relevant to the construction of the theoretical framework. The Mendeley program allowed the details of the articles to be organized and explored with ease and practicality. Conclusion: Finally, it is inferred that the research strategies, used in conjunction with the proposed methodology and the selected IT tool, proved efficient in supporting the process of searching, retrieving, and cataloging scientific articles for the elaboration of a relevant theoretical framework

    Extracting keywords from tweets

    In recent years, an enormous amount of information has become available on the Internet, and social networks are among the largest contributors to this growth in data volume. Twitter in particular paved the way, as a social platform, for people and organizations to interact with one another, generating large volumes of data from which useful information can be extracted. Such a quantity of data can prove important, for example, if and when several individuals report symptoms of illness at the same time and in the same place. Automatically processing such a volume of information and obtaining useful knowledge from it is, however, an impossible task for any human being. Keyword extractors emerge in this context as a valuable tool that aims to ease this work by providing quick access to a set of terms that characterize a document. In this work, we try to contribute to a better understanding of this problem by evaluating the effectiveness of YAKE! (an unsupervised keyword extraction algorithm) on a set of tweets, a type of text characterized not only by its reduced length but also by its unstructured nature. Although keyword extractors have been widely applied to generic texts such as reports and articles, their applicability to tweets is scarce, and until now no dataset had been formally made available. In this work, to address that problem, we chose to develop and make available a new data collection, an important contribution for the scientific community to promote new solutions in this domain. KWTweet was labeled by 15 annotators, resulting in 7736 annotated tweets. Based on this information, we were then able to evaluate the effectiveness of YAKE! against 9 unsupervised keyword extraction baselines (TextRank, KP-Miner, SingleRank, PositionRank, TopicPageRank, MultipartiteRank, TopicRank, RAKE and TF.IDF). The results obtained show that YAKE! outperforms its competitors, proving its effectiveness on this type of text. Finally, we provide a demo that showcases YAKE! in operation: in this web platform, users can search by user or hashtag and obtain the most relevant keywords through a word cloud
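    For illustration, one of the unsupervised baselines, RAKE, can be sketched in a few lines; this is a simplified version of the published algorithm, and the stopword list is a tiny illustrative stand-in:

```python
import re
from collections import defaultdict

def rake_keywords(text, stopwords):
    """Split text into candidate phrases at stopwords, then score each
    word by degree/frequency and each phrase by the sum of its word scores."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # w co-occurs with len(phrase) words
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)

stopwords = {"a", "of", "and", "the", "on"}
text = "keyword extraction on a set of short tweets and noisy text"
print(rake_keywords(text, stopwords))  # multi-word phrases score highest
```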

    Modeling Users' Information Needs in a Document Recommender for Meetings

    People are surrounded by an unprecedented wealth of information. Access to it depends on the availability of suitable search engines, but even when these are available, people often do not initiate a search, because their current activity does not allow them, or they are not aware of the existence of this information. Just-in-time retrieval brings a radical change to the process of query-based retrieval, by proactively retrieving documents relevant to users' current activities, in an easily accessible and non-intrusive manner. This thesis presents a novel set of methods intended to improve the relevance of a just-in-time retrieval system, specifically a document recommender system designed for conversations, in terms of precision and diversity of results. Additionally, we designed an evaluation protocol to compare the proposed methods in the thesis with other ones using crowdsourcing. In contrast to previous systems, which model users' information needs by extracting keywords from clean and well-structured texts, this system models them from the conversation transcripts, which contain noise from automatic speech recognition (ASR) and have a free structure, often switching between several topics. To deal with these issues, we first propose a novel keyword extraction method which preserves both the relevance and the diversity of topics of the conversation, to properly capture possible users' needs with minimum ASR noise. Implicit queries are then built from these keywords. However, the presence of multiple unrelated topics in one query introduces significant noise into the retrieval results. To reduce this effect, we separate users' needs by topically clustering keyword sets into several subsets or implicit queries. We introduce a merging method which combines the results of multiple queries which are prepared from users' conversation to generate a concise, diverse and relevant list of documents. 
    This method ensures that the system does not distract its users from their current conversation by frequently recommending a large number of documents. Moreover, we address the problem of explicit queries that may be asked by users during a conversation. We introduce a query refinement method which leverages the conversation context to answer the users' information needs without asking for additional clarifications, again avoiding distracting users during their conversation. Finally, we implemented the end-to-end document recommender system by integrating the ideas proposed in this thesis and then proposed an evaluation scenario with human users in a brainstorming meeting
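    The merging of per-query result lists into one concise, diverse list can be illustrated with a simple round-robin merge; this is a deliberately naive stand-in for the thesis's actual merging method:

```python
def merge_rankings(ranked_lists, limit=5):
    """Interleave several per-query ranked result lists round-robin,
    skipping duplicates, to get one concise and diverse list."""
    merged, seen = [], set()
    for rank in range(max(len(r) for r in ranked_lists)):
        for results in ranked_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
                if len(merged) == limit:
                    return merged
    return merged

# Hypothetical retrieval results for two implicit queries.
query1 = ["doc_a", "doc_b", "doc_c"]
query2 = ["doc_d", "doc_b", "doc_e"]
print(merge_rankings([query1, query2], limit=4))
# → ['doc_a', 'doc_d', 'doc_b', 'doc_c']
```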