
    Unsupervised keyword extraction from microblog posts via hashtags

    © River Publishers. Huge amounts of text are now generated for social networking purposes on the Web. Keyword extraction from such texts, for example microblog posts, benefits many applications such as advertising, search, and content filtering. Unlike traditional web pages, a microblog post usually has special social features, such as hashtags, that are topical in nature and generated by users. Extracting keywords related to hashtags can reflect users' intents and thus provide a better understanding of post content. In this paper, we propose a novel unsupervised keyword extraction approach for microblog posts that treats hashtags as topical indicators. Our approach consists of two hashtag-enhanced algorithms. One is a topic model algorithm that infers topic distributions biased toward hashtags on a collection of microblog posts. The words are ranked by their average topic probabilities. This algorithm not only finds the topics of a collection but also extracts hashtag-related keywords. The other is a random-walk-based algorithm. It first builds a weighted word-post graph by taking into account the posts themselves. Then a hashtag-biased random walk is applied on this graph, which guides the algorithm to extract keywords according to hashtag topics. Finally, the ranking score of a word is determined by its stationary probability after a number of iterations. We evaluate the proposed approach on a collection of real Chinese microblog posts. Experiments show that our approach is more effective in terms of precision than traditional approaches that do not consider hashtags. The combination of the two algorithms performs better than either algorithm alone.
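    The random-walk component described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edge weights (term counts), the damping factor, the fixed iteration count, and the function name hashtag_biased_walk are all assumptions made for illustration. The walk restarts only at hashtag nodes, so words connected to hashtagged posts accumulate more probability.

```python
from collections import defaultdict


def hashtag_biased_walk(posts, hashtags, damping=0.85, iterations=50):
    """Rank words by a random walk with restart biased toward hashtag words.

    posts: list of token lists; hashtags: set of tokens used as topical seeds.
    Edge weights, damping and iteration count are illustrative assumptions.
    """
    # Undirected bipartite graph: word nodes <-> post nodes,
    # weighted by how often the word occurs in the post.
    edges = defaultdict(dict)
    for i, tokens in enumerate(posts):
        post = ("post", i)
        for tok in tokens:
            word = ("word", tok)
            edges[post][word] = edges[post].get(word, 0) + 1
            edges[word][post] = edges[word].get(post, 0) + 1

    nodes = list(edges)
    # Restart distribution: uniform over hashtag words present in the graph.
    seeds = {n for n in nodes if n[0] == "word" and n[1] in hashtags}
    if not seeds:
        raise ValueError("no hashtag occurs in the posts")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    score = dict(restart)
    for _ in range(iterations):
        # Each step: restart with probability (1 - damping),
        # otherwise follow an edge chosen in proportion to its weight.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(edges[n].values())
            for m, w in edges[n].items():
                nxt[m] += damping * score[n] * w / total
        score = nxt

    words = [(n[1], s) for n, s in score.items() if n[0] == "word"]
    return sorted(words, key=lambda x: x[1], reverse=True)
```

    Words that occur only in posts unreachable from any hashtag receive a score of zero under this sketch, which is one simple way of expressing the bias toward hashtag topics.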

    Understanding and exploiting user intent in community question answering

    A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and also domain-specific forums like StackOverflow. These services help users obtain information from a community: a user can post his or her questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it along five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state-of-the-art in this area. In addition to the textual features, a variety of metadata features (such as the category in which the question was posted) are used to model a user's intent, which in turn helps the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach which blends several language modelling techniques, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. 
Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using our proposed hybrid approach, and then validates whether the answer of the top candidate can be served as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search lists validation
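    The idea of blending several language models for question retrieval can be sketched as a word-level mixture. This is only an illustration under stated assumptions: the abstract does not say how the three models are combined, so the linear interpolation, the weights, the translation table format, and the intent model (here a plain word-probability dictionary) are all hypothetical, as is the function name hybrid_score.

```python
import math
from collections import Counter


def hybrid_score(query, candidate, collection, translate, intent_model,
                 weights=(0.4, 0.3, 0.2), floor=1e-12):
    """Log-probability of `query` under a mixture of three word models.

    weights = (query-likelihood, translation, intent); the remaining mass
    goes to a collection model for smoothing. All values are illustrative.
    translate maps (query_word, candidate_term) -> probability (hypothetical).
    intent_model maps word -> probability under the candidate's intent class
    (hypothetical stand-in for an intent-based language model).
    """
    w_ql, w_tr, w_in = weights
    w_bg = 1.0 - (w_ql + w_tr + w_in)
    cand, coll = Counter(candidate), Counter(collection)
    log_prob = 0.0
    for word in query:
        # Classic query likelihood: relative frequency in the candidate.
        p_ql = cand[word] / len(candidate)
        # Translation model: a candidate term t "translates" into the word.
        p_tr = sum(translate.get((word, t), 0.0) * n / len(candidate)
                   for t, n in cand.items())
        p_in = intent_model.get(word, 0.0)
        p_bg = coll[word] / len(collection)
        p = w_ql * p_ql + w_tr * p_tr + w_in * p_in + w_bg * p_bg
        log_prob += math.log(max(p, floor))
    return log_prob
```

    In this sketch a candidate question that shares no words with the query can still score well if its terms translate into the query words, which is the usual motivation for adding a translation component.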

    Twitter card : exploring the feature and adding new services

    This study examines the Twitter card, a feature that adds a preview for rich content links included in tweets. Previous research has not covered this recently added feature, though it has an obvious impact on the user's interaction with his/her timeline and provides a new way of representing the user interface. At the same time, the default Twitter client has a major drawback: the number of web services that are officially supported is quite small. This study explores the possible future of the Twitter card and presents a prototype application that shows what happens if all links have previews, how this can be implemented, and what users think about it. The prototype application was developed using HTML/CSS/JavaScript and the PhoneGap framework, and was ported to the Android operating system. The source code of the application is provided as part of this research. This study also provides a user experience evaluation of the Twitter card and the prototype application, and finds that the hedonistic perception of the service improves as the number of supported services increases

    Topical relevance models

    An inherent characteristic of information retrieval (IR) is that the query expressing a user's information need is often multi-faceted, that is, it encapsulates more than one specific potential sub-information need. This multifacetedness of queries manifests itself as a topic distribution in the retrieved set of documents, where each document can be considered as a mixture of topics, one or more of which may correspond to the sub-information needs expressed in the query. In some specific domains of IR, such as patent prior art search, where the queries are full patent articles and the objective is to (in)validate the claims contained therein, the queries themselves are multi-topical in addition to the retrieved set of documents. The overall objective of the research described in this thesis involves investigating techniques to recognize and exploit these multi-topical characteristics of the retrieved documents and the queries in IR and relevance feedback in IR. First, we hypothesize that segments of documents in close proximity to the query terms are indicative of these segments being topically related to the query terms. An intuitive choice for the unit of such segments, in close proximity to query terms within documents, is the sentences, which characteristically represent a collection of semantically related terms. This way of utilizing term proximity through the use of sentences is empirically shown to select potentially relevant topics from among those present in a retrieved document set and thus improve relevance feedback in IR. Secondly, to handle the very long queries of patent prior art search which are essentially multi-topical in nature, we hypothesize that segmenting these queries into topically focused segments and then using these topically focused segments as separate queries for retrieval can retrieve potentially relevant documents for each of these segments. 
The results for each of these segments then need to be merged to obtain a final retrieval result set for the whole query. These two conceptual approaches for utilizing the topical relatedness of terms in both the retrieved documents and the queries are then integrated more formally within a single statistical generative model, called the topical relevance model (TRLM). This model utilizes the underlying multi-topical nature of both retrieved documents and the query. Moreover, the model is used as the basis for construction of a novel search interface, called TopicVis, which lets the user visualize the topic distributions in the retrieved set of documents and the query. This visualization of the topics is beneficial to the user in the following ways. Firstly, through visualization of the ranked retrieval list, TopicVis helps the user choose one or more facets of interest from the query in a feedback step, after which it retrieves documents primarily composed of the selected facets at top ranks. Secondly, the system provides an access link to the first segment within a document focusing on the selected topic and also supports navigation links to subsequent segments on the same topic in other documents. The methods proposed in this thesis are evaluated on datasets from the TREC IR benchmarking workshop series, and the CLEF-IP 2010 data, a patent prior art search data set. Experimental results show that relevance feedback using sentences and segmented retrieval for patent prior art search queries significantly improve IR effectiveness for the standard ad-hoc IR and patent prior art search tasks. Moreover, the topical relevance model (TRLM), designed to encapsulate these two complementary approaches within a single framework, significantly improves IR effectiveness for both standard ad-hoc IR and patent prior art search. 
Furthermore, a task based user study experiment shows that novel features of topic visualization, topic-based feedback and topic-based navigation, implemented in the TopicVis interface, lead to effective and efficient task completion achieving good user satisfaction
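    The segmented-retrieval idea in this abstract (run each topically focused query segment as a separate query, then merge the result lists) can be sketched as follows. The thesis does not specify the merging method in this abstract, so the per-segment score normalisation and summation used here, the toy term-overlap retriever, and the names retrieve and segmented_retrieve are assumptions for illustration only. The segments are taken as given; topical segmentation itself is not shown.

```python
from collections import Counter, defaultdict


def retrieve(query_terms, docs, k=10):
    """Toy retriever: score = number of query-term occurrences matched."""
    q = Counter(query_terms)
    scored = []
    for doc_id, terms in docs.items():
        d = Counter(terms)
        s = sum(min(q[t], d[t]) for t in q)
        if s > 0:
            scored.append((doc_id, s))
    # Highest score first; ties broken by document id for determinism.
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:k]


def segmented_retrieve(segments, docs, k=10):
    """Run each query segment separately and merge the ranked lists.

    Merging is an assumed scheme: scores are divided by the segment's top
    score, then summed per document across segments.
    """
    merged = defaultdict(float)
    for seg in segments:
        results = retrieve(seg, docs, k)
        if not results:
            continue
        top = results[0][1]
        for doc_id, s in results:
            merged[doc_id] += s / top
    return sorted(merged.items(), key=lambda x: (-x[1], x[0]))[:k]
```

    Normalising per segment keeps one long or term-rich segment from dominating the merged list; other fusion schemes would be equally plausible here.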

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace

    Reviews and Perspectives on Smart and Sustainable Metropolitan and Regional Cities

    The notion of smart and sustainable cities offers an integrated and holistic approach to urbanism by aiming to achieve the long-term goals of urban sustainability and resilience. In essence, a smart and sustainable city is an urban locality that functions as a robust system of systems with sustainable practices to generate desired outcomes and futures for all humans and non-humans. This book contributes to improving research and practice in smart and sustainable metropolitan as well as regional cities and urbanism by bringing together literature reviews and scholarly perspective pieces, forming an open access knowledge warehouse. It contains contributions that offer insights into research and practice in smart and sustainable metropolitan and regional cities by producing in-depth conceptual debates and perspectives, insights from the literature and best practice, and thoroughly identified research themes and development trends. This book serves as a repository of relevant information, material, and knowledge to support research, policymaking, practice, and the transferability of experiences to address challenges in establishing smart and sustainable metropolitan as well as regional cities and urbanism in the era of climate change, biodiversity collapse, natural disasters, pandemics, and socioeconomic inequalities

    Making the Palace Machine Work

    This volume brings the studies of institutions, labour, and material cultures to bear on the history of science and technology by tracing the workings of the Imperial Household Department (Neiwufu) in the Qing court and empire. An enormous apparatus that employed 22,000 men and women at its heyday, the Department operated a "machine" with myriad moving parts. The first part of the book portrays the people who kept it running, from technical experts to menial servants, and scrutinises the paper trails they left behind. Part two uncovers the working principles of the machine by following the production chains of some of its most splendid products: gilded statues, jade, porcelain, and textiles. Part three tackles the most complex task of all, managing living organisms in nature, including lotus plants grown in imperial ponds in Beijing, fresh medicines sourced from disparate regions, and tribute elephants from Southeast Asia

    Remoulding the Chinese mind : mental hygiene promotion in Republican Shanghai

    In this thesis, I uncover the history of the Mental Hygiene Movement in Republican Shanghai. I show that it had a far-reaching role to play in the city, promoting mental hygiene throughout China and influencing other cities like Canton, Peking, and Chongqing. This movement can be dated from the 1910s but achieved a dramatic expansion in the 1930s before being attenuated by warfare in the 1940s and coming to an end with the foundation of the PRC. Chinese intellectuals and foreign missionaries and experts, despite their different aims, jointly promoted this movement, and it reached out to the Chinese populace through the public media. Mental hygiene arose from the conjuncture of a new understanding of the mind and mental problems, the establishment of asylums and psychiatric hospitals, new tools of mass publicity available to the government, and a range of non-governmental institutes. In addition, development in Shanghai was powerfully shaped by politics and ideology. Initially, mental hygiene emerged in relation to a colonial aim of regulating the unwanted on streets and creating public hygiene and order. Subsequently, Shanghai would increasingly see a strong revolutionary and political influence in remoulding the ‘national’ mind. In particular, nationalism and modernism were powerful factors in development. More generally, the progressive ideology of scientification lay behind the development of disciplines and clinics. Shanghai was part of the international history of mental hygiene, but it also demonstrates the importance of locality. For both Chinese governments and the populace, the significance of the Mental Hygiene Movement was more symbolic than pragmatic. The ideological project of remoulding the mind outweighed medical research and the treatment of mental illnesses. I argue that psychiatric policies were not implemented in China on a large scale because of the lack of a powerful government force. 
The acceptance of mind remoulding and self-improvement, however, was more pervasive. One reason for this was that it benefited from the inheritance of a tradition of self-introspection. The Chinese Mental Hygiene movement, therefore, reflected novelty and Western influence but also a convention. Intellectual radicalism was questioned in the process of popularisation and was modified to become more pragmatic in line with everyday practices. Traditional thinking about the mind, while under fierce critique from the modernisers, showed resilience in compromising and integrating new knowledge and ideologies