2,726 research outputs found

    Multi Domain Semantic Information Retrieval Based on Topic Model

    Over the last few decades, there have been remarkable shifts in the area of Information Retrieval (IR) as huge amounts of information accumulate on the Web. This explosion of information increases the need for new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques for searching and extracting important information from numerous database sources have become a key challenge for current IR systems. Topic modeling is one of the most recent techniques for discovering hidden thematic structures in large data collections without human supervision. Several topic models have been proposed in various fields of study and have been utilized extensively in many applications. Latent Dirichlet Allocation (LDA) is the best-known topic model; it generates topics from large corpora of resources such as text, images, and audio. It has been widely used in information retrieval and data mining, providing an efficient way of identifying latent topics in document collections. However, LDA has a drawback: topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDA does not consider the meaning of words, but rather infers hidden topics through a purely statistical approach. As a result, LDA can cause either a reduction in the quality of topic words or an increase in loose relations between topics. To solve these problems, we propose a domain-specific topic model that combines domain concepts with LDA. Two domain-specific algorithms are suggested for addressing the difficulties associated with LDA. The main strength of our proposed model is that it narrows semantic concepts from broad domain knowledge to a specific one, which solves the unknown-domain problem. Our proposed model is extensively tested on various applications (query expansion, classification, and summarization) to demonstrate its effectiveness. Experimental results show that the proposed model significantly increases the performance of these applications.
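
    Since the abstract builds on LDA as its baseline, a minimal sketch of plain LDA topic extraction may help; the snippet below uses scikit-learn, and the toy corpus and topic count are illustrative assumptions rather than the authors' setup.

    # A minimal sketch of vanilla LDA topic extraction with scikit-learn,
    # assuming a toy corpus; this is not the paper's domain-specific model
    # or its experimental setup.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "information retrieval systems index large document collections",
        "topic models discover hidden thematic structure in text",
        "query expansion improves retrieval of relevant documents",
        "latent dirichlet allocation infers topics from word counts",
    ]

    # LDA operates on raw term counts rather than tf-idf weights.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)

    # Show the top words per topic; on data this sparse the topics are
    # noisy, which mirrors the infrequent-word weakness the abstract notes.
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = weights.argsort()[-5:][::-1]
        print(f"topic {k}:", ", ".join(terms[i] for i in top))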

    Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

    Traditionally, online databases of web resources have been compiled by a human editor or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.
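
    To make the ‘appropriate to a given database’ classification step concrete, here is a hedged sketch of the kind of text classifier such a harvester might use; the training snippets, labels, and acceptance threshold are invented for illustration and are not the paper's method.

    # A hedged sketch of the 'appropriate to a given database' decision:
    # a text classifier flagging a harvested page as academic/educational
    # or not. The training snippets, labels, and the 0.5 threshold are
    # invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_texts = [
        "abstract introduction methodology results discussion references",
        "lecture notes seminar reading list coursework assessment",
        "buy now special offer free shipping customer reviews",
        "celebrity gossip photos trending entertainment news",
    ]
    train_labels = [1, 1, 0, 0]  # 1 = academic/educational, 0 = other

    classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
    classifier.fit(train_texts, train_labels)

    # A harvester would run this on the extracted text of each crawled
    # page and keep only pages scoring above a tuned acceptance threshold.
    page_text = "this paper presents survey results and methodology in geography"
    p_academic = classifier.predict_proba([page_text])[0][1]
    print("accept" if p_academic > 0.5 else "reject", f"(p={p_academic:.2f})")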

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    Analyzing the Overturning of Roe vs Wade on Twitter using Natural Language Processing Techniques

    In 1973, the historic U.S. Supreme Court (SCOTUS) case of Roe v. Wade established the constitutional right to abortion. However, on May 2, 2022, Politico magazine leaked the draft opinion in Dobbs v. Jackson Women’s Health Organization. The leak generated a surge of users posting their opinions on the case that would eliminate abortion as a constitutional right. Then, on June 24, 2022, SCOTUS overturned Roe v. Wade. In this thesis, we aim to investigate public opinion and reaction to the overturning of Roe v. Wade. We collected 20,640,166 tweets using the Twitter API for Academic Research and an open-sourced dataset published during two periods. The first period spanned the week before Politico magazine leaked the SCOTUS decision and the week after. The second period spanned the week before and over a week after the overturning of Roe v. Wade. Using natural language processing techniques, including sentiment analysis, emotion recognition, topic modeling, and bi-grams, we developed insight into public opinion based on the posted tweets. Our research investigates whether there is a change in sentiment over time, a change in the emotion expressed within the text over time, and which topics are most common within the collection of tweets. The results demonstrate a significant increase on the day of the Politico leak, with most of the tweets published that day expressing a positive sentiment. However, in the weeks before and after the overturning of Roe v. Wade, we observe a decrease from the beginning of the period up to the day of the overturn. Regarding emotion recognition, there is a significant decrease in the proportion of tweets classified as expressing optimism, and an increase in the proportion of tweets expressing anger when comparing the day of the Politico leak with the day of the overturn. The topic model we applied to the tweets published on the day of the Politico leak revealed that states’ rights and children were discussed. Using bigrams of the most negative tweets, we found that ‘gun control’ and ‘healthcare’ occurred frequently within the collection.
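
    As a concrete illustration of two of the named techniques, the sketch below runs VADER sentiment scoring and bigram counting over invented sample tweets; it is not the thesis's pipeline, and the data is placeholder.

    # An illustrative sketch of two techniques the abstract names:
    # sentiment analysis (here via NLTK's VADER) and bigram counting.
    # The sample tweets are invented, not drawn from the collected dataset.
    from collections import Counter

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    tweets = [
        "states rights are back in focus after the court overturned roe",
        "gun control gets debate but healthcare for women gets overturned",
    ]

    # VADER's compound score runs from -1 (most negative) to +1 (most positive).
    for tweet in tweets:
        print(f"{sia.polarity_scores(tweet)['compound']:+.3f}", tweet)

    # Count adjacent word pairs, the statistic used to surface frequent
    # phrases such as "gun control" in the most negative tweets.
    bigram_counts = Counter()
    for tweet in tweets:
        words = tweet.split()
        bigram_counts.update(zip(words, words[1:]))
    print(bigram_counts.most_common(3))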

    DARIAH and the Benelux


    Modeling Users Feedback Using Bayesian Methods for Data-Driven Requirements Engineering

    Data-driven requirements engineering represents a vision for a shift from static, traditional methods of doing requirements engineering to dynamic, data-driven, user-centered methods. App developers now receive abundant user feedback, from user comments in app stores and social media, i.e., explicit feedback, to usage data and system logs, i.e., implicit feedback. In this dissertation, we describe two novel Bayesian approaches that utilize the available user feedback to support requirements decisions and activities in the context of applications delivered through software marketplaces (web and mobile). In the first part, we propose to exploit implicit user feedback in the form of usage data to support requirements prioritization and validation. We formulate the problem as a popularity prediction problem and present a novel Bayesian model that is highly interpretable and offers early-on insights that can be used to support requirements decisions. Experimental results demonstrate that the proposed approach achieves high prediction accuracy and outperforms competitive models. In the second part, we discuss the limitations of previous approaches that use explicit user feedback for requirements extraction and, as an alternative, propose a novel Bayesian approach that addresses those limitations and offers a more efficient and maintainable framework. The proposed approach (1) simplifies the pipeline by accomplishing the classification and summarization tasks with a single model, (2) replaces manual steps in the pipeline with unsupervised alternatives that accomplish the same tasks, and (3) offers an alternative way to extract requirements using example-based summaries that retain context. Experimental results demonstrate that the proposed approach achieves equal or better classification accuracy and outperforms competitive models in terms of summarization accuracy. Specifically, we show that the proposed approach can capture 91.3% of the discussed requirements with only 19% of the dataset, i.e., reducing the human effort needed to extract the requirements by 80%.
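
    As an illustration of the interpretable Bayesian style of model the abstract describes, here is a minimal conjugate Beta-Binomial sketch for estimating a popularity-style quantity from usage counts; it is a generic example with invented numbers, not the dissertation's actual model.

    # A hedged sketch of the kind of interpretable, conjugate Bayesian
    # update such a popularity model could build on; this Beta-Binomial
    # example is generic, not the dissertation's model, and the prior
    # and counts are invented.
    from scipy import stats

    # Weak Beta(1, 1) prior over a feature's adoption rate.
    alpha_prior, beta_prior = 1.0, 1.0

    # Early implicit feedback: sessions observed so far and how many of
    # them used the feature in question.
    sessions, uses = 50, 18

    # Conjugacy makes the posterior another Beta, so the update is exact
    # and cheap enough to rerun as new usage data streams in.
    posterior = stats.beta(alpha_prior + uses, beta_prior + sessions - uses)

    # A point estimate plus a 95% credible interval that a requirements
    # engineer can read directly when prioritizing features.
    low, high = posterior.ppf([0.025, 0.975])
    print(f"adoption ~ {posterior.mean():.2f}, 95% CI [{low:.2f}, {high:.2f}]")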