106 research outputs found

    Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem

    The hashtag is an iconic feature for retrieving hot topics of discussion on Twitter and other social networks. This paper applies pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) first transforms the set of tweets into a transactional database using one of two strategies (trivial and temporal). The set of relevant patterns is then discovered and used as a knowledge-based system for finding the relevant tweets for users' queries through a similarity search process. Extensive experiments are carried out on large and varied tweet collections; the proposed PM-HR outperforms the baseline hashtag retrieval approaches in terms of runtime and is very competitive in terms of accuracy.
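    The pipeline described in this abstract (tweets to transactions, pattern discovery, similarity search) can be illustrated with a minimal sketch. It is not the PM-HR algorithm itself: the abstract does not define the two strategies, so the sketch assumes "trivial" means one transaction per tweet and "temporal" means merging tweets that fall in the same time window. The function names, the brute-force itemset counting, and the Jaccard-based ranking are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def to_transactions(tweets, strategy="trivial", window=3600):
    """Turn (timestamp_seconds, text) pairs into transactions (sets of terms).

    Assumption: "trivial" = one transaction per tweet; "temporal" = one
    transaction per time window of `window` seconds.
    """
    if strategy == "trivial":
        return [frozenset(text.lower().split()) for _, text in tweets]
    buckets = {}
    for ts, text in tweets:
        buckets.setdefault(ts // window, set()).update(text.lower().split())
    return [frozenset(terms) for _, terms in sorted(buckets.items())]


def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Brute-force count of itemsets up to max_size; keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        for size in range(1, max_size + 1):
            for items in combinations(sorted(transaction), size):
                counts[frozenset(items)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, tweets, patterns):
    """Rank tweets by similarity to the query expanded with matching patterns."""
    q = frozenset(query.lower().split())
    expanded = set(q)
    for pattern in patterns:
        if pattern & q:  # pattern shares at least one term with the query
            expanded |= pattern
    scored = [(jaccard(frozenset(text.lower().split()), expanded), text)
              for _, text in tweets]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [text for score, text in ranked if score > 0]


# Toy data, invented for this example.
tweets = [
    (0, "#flu season starts"),
    (10, "#flu shot today"),
    (4000, "#nba finals tonight"),
]
patterns = frequent_itemsets(to_transactions(tweets, "trivial"))
results = retrieve("#flu", tweets, patterns)
```

    On this toy data, the only frequent pattern is the single hashtag shared by the first two tweets, so only those two tweets are returned. A real system would replace the brute-force counting with a proper frequent-pattern miner.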

    A Survey on Visual Analytics of Social Media Data

    The unprecedented availability of social media data offers substantial opportunities for data owners, system operators, solution providers, and end users to explore and understand social dynamics. However, the exponential growth in the volume, velocity, and variability of social media data prevents people from fully utilizing such data. Visual analytics, which is an emerging research direction, ha..

    Debunking rumors on Twitter with tree transformer

    Big Data Management for Cloud-Enabled Geological Information Services

    Efficient Text Classification with Linear Regression Using a Combination of Predictors for Flu Outbreak Detection

    Early prediction of disease outbreaks and seasonal epidemics such as influenza may reduce their impact on daily lives. Today, the web can be used for disease surveillance. Search engines and Social Networking Sites (SNS) can be used to track trends of different diseases more quickly than government agencies such as the Centers for Disease Control and Prevention (CDC). SNS are widely used by diverse demographic populations, so SNS data can be used effectively to track disease outbreaks and provide necessary warnings. Although the data generated by microblogging sites is valuable for real-time analysis and outbreak prediction, the volume is huge. One of the main challenges in analyzing this volume of data is therefore to find an approach that is both accurate and time-efficient. Many studies report only the accuracy of different machine learning approaches and disregard the analysis time. Current SNS-based flu detection and prediction frameworks apply conventional machine learning approaches that require lengthy training and testing, which is not optimal for new outbreaks with new signs and symptoms. The aim of this study is to propose an efficient and accurate framework that uses SNS data to track disease outbreaks and provide early warnings, even for the newest outbreaks. The presented outbreak prediction framework consists of three main modules: text classification, mapping, and linear regression for weekly flu rate predictions. The text classification module uses sentiment analysis features and predefined keyword occurrences. Various classifiers, including FastText and six conventional machine learning algorithms, are evaluated to identify the most efficient and accurate one for the proposed framework. The text classifiers have been trained and tested using a pre-labeled dataset of flu-related and unrelated Twitter postings. The selected text classifier is then used to classify over 8,400,000 tweet documents. The flu-related documents are then mapped on a weekly basis by the mapping module. Lastly, the mapped results are passed together with historical CDC data to a linear regression module for weekly flu rate predictions. The evaluation of flu tweet classification shows that FastText, together with the extracted features, achieves accurate results, with an F-measure of 89.9%, in addition to its efficiency. FastText has therefore been chosen as the classification module to work with the other modules in the proposed framework, including the linear regression module, for flu trend predictions. The prediction results are compared with the recent data available from the CDC as the ground truth and show a strong correlation of 96.2%.
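    The mapping and linear regression modules described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the actual features or model form, so the sketch assumes a single predictor (the weekly count of flu-related tweets) fitted against a weekly flu rate by ordinary least squares. The function names and all numbers are hypothetical, and the classification step is assumed to have already produced the dates of flu-related tweets.

```python
from collections import Counter
from datetime import date
from math import sqrt


def weekly_counts(flu_tweet_dates):
    """Map dates of flu-related tweets to counts per (ISO year, ISO week)."""
    counts = Counter()
    for d in flu_tweet_dates:
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs, ys):
    """Pearson correlation, e.g. between predictions and reference rates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy data, invented for this example (not taken from the study).
dates = [date(2020, 1, 6), date(2020, 1, 7), date(2020, 1, 13)]
counts = weekly_counts(dates)

tweet_counts = [1.0, 2.0, 3.0]    # hypothetical weekly flu-tweet counts
reference_rates = [2.0, 4.0, 6.0]  # hypothetical weekly reference flu rates
slope, intercept = fit_line(tweet_counts, reference_rates)
predictions = [slope * x + intercept for x in tweet_counts]
correlation = pearson(predictions, reference_rates)
```

    Comparing predictions with reference rates through a correlation coefficient mirrors the kind of evaluation the abstract reports, although the study's exact procedure is not described here.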

    Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces

    The Social Web is undergoing continued evolution, changing the paradigm of information production, processing and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms have enabled people to find relevant information efficiently. However, noisy, false and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered along with information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery and can also improve user experience by communicating aspects of information reliability. This thesis first shows the results of a cross-disciplinary study into perceived reliability by reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes such as source credibility, competence, influence and timeliness is examined through different case studies. Results show that perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces, are discussed with respect to their impact on the perception of information reliability in a range of application domains.

    Spatial and Temporal Sentiment Analysis of Twitter data

    The public uses Twitter worldwide to express opinions. This study focuses on the spatio-temporal variation of the sentiment polarity of georeferenced Tweets, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the question this study tests is whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim of the study is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within a 1 km buffer around the Curtin Bentley campus boundary in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time zones. A sentiment analysis was then conducted for each zone using the sentiment analyser tool in the Starlight Visual Information System software. The Feature Manipulation Engine was employed to convert non-spatial files into spatial and temporal feature classes. The distribution of Twitter sentiment polarity patterns over space and time was mapped using Geographic Information Systems (GIS). Some interesting results were identified. For example, the highest percentage of positive Tweets occurred in the social science area, while the science and engineering and dormitory areas had the highest percentage of negative postings. The number of negative Tweets increases in the library and in the science and engineering areas as the end of the semester approaches, reaching a peak around the exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory areas. This study provides some insights into students' and staff's sentiment variation on Twitter, which could be useful for university teaching and learning management.