26 research outputs found

    Utilizing external resources for enriching information retrieval

    Information retrieval (IR) seeks to support users in finding information relevant to their information needs. One obstacle that prevents many IR algorithms from achieving better results is that there is insufficient information available to identify relevant content. For example, users typically enter very short queries, textual annotations in text-based image retrieval often describe the content of the images inadequately, and user log data is often insufficient to personalize the search process. This thesis explores the problem of inadequate data in IR tasks. We propose methods for Enriching Information Retrieval (ENIR) which address various challenges relating to insufficient data in IR. Applying standard methods to address these problems can face unexpected challenges. For example, standard query expansion methods assume that the target collection contains sufficient data to identify relevant terms to add to the original query to improve retrieval effectiveness. In the case of short documents, this assumption is not valid. One strategy to address this problem is document-side expansion, which has been largely overlooked in past research. Similarly, topic modeling in personalized search often lacks the knowledge required to form adequate models, leading to mismatch problems when trying to apply these models to improve search. This thesis focuses on methods of ENIR for tasks affected by problems of insufficient data. Our overall solution is to draw on external resources. This research focuses on developing methods for two typical ENIR tasks: text-based image retrieval and personalized web data search. The main relevant areas within existing IR research are relevance feedback and personalized modeling. ENIR is shown to be effective in augmenting existing knowledge in these classical areas. The areas of relevance feedback and personalized modeling are strongly correlated, since user modeling and document modeling in personalized retrieval enrich the data on both the query side and the document side, which is similar to query and document expansion in relevance feedback. Enriching IR is the key challenge in these areas. By addressing these two research areas, this thesis provides a prototype for an external-resource-based search solution. The experimental results show that external resources can play a key role in enriching IR.

    EMIR: A novel emotion-based music retrieval system

    Music is inherently expressive of emotional meaning and affects people's mood. In this paper, we present a novel EMIR (Emotional Music Information Retrieval) system that uses latent emotion elements both in music and in non-descriptive queries (NDQs) to detect implicit emotional associations between users and music, in order to enhance Music Information Retrieval (MIR). We try to understand the latent emotional intent of queries via machine learning for emotion classification, and we compare the performance of emotion detection approaches on different feature sets. For this purpose, we extract music emotion features from lyrics and social tags crawled from the Internet, label some for training, model them in a high-dimensional emotion space, and recognize the latent emotion of users by query emotion analysis. The similarity between queries and music is computed by a verified BM25 model.
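The abstract above does not say how its BM25 model is configured, so the following is only a minimal sketch of standard Okapi BM25 scoring, not the EMIR implementation. The function name `bm25_scores`, the token-list inputs, and the defaults `k1=1.5` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Assumes docs_tokens is a non-empty list of non-empty token lists.
    k1 and b are common textbook defaults, not values from the paper.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical emotion-term "documents" standing in for songs.
songs = [["sad", "lonely", "rain"], ["happy", "sun", "dance"], ["sad", "sad", "tears"]]
ranking = bm25_scores(["sad"], songs)
```

In this toy example the third song repeats the query term, so it scores higher than the first, and the second song, which lacks the term, scores zero.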

    Building user interest profiles from wikipedia clusters

    Users of search systems are often reluctant to explicitly build profiles to indicate their search interests. Thus, automatically building user profiles is an important research area for personalized search. One difficult component of doing this is accessing a knowledge system which provides broad coverage of user search interests. In this work, we describe a method to build category-ID-based user profiles from a user's historical search data. Our approach makes significant use of Wikipedia as an external knowledge resource.

    Progress of Fintech Industry from Venture Capital Point of View

    Fintech (financial technology) is a term that broadly describes innovation in financial products and services using information technology. Fintech draws on many new business models, such as FXP2P and supply chain finance, and new technologies, such as blockchain and cryptocurrency. According to a World Economic Forum report, fintech brings disruptive innovations that are reshaping financial services. The World Economic Forum report also structures its research framework around six (6) functions of financial services and eleven (11) clusters of innovation. CrunchBase is a Web 2.0, wiki-like database of startup companies that includes information on founders, key team members, basic financials, venture capital funding, and important events. This paper uses Python to scrape the CrunchBase website in order to study the progress of the fintech industry and the development of its technology and innovation. The objectives of this paper are to compare the CrunchBase data with the World Economic Forum report's research framework and to determine the actual progress of the fintech industry.

    A Cross-Sectional and Temporal Analysis of Information Consumption on Twitter

    We report on an exploratory analysis of the similarities and differences among three different forms of information consumption on Twitter, viz. following, listing, and subscribing. We construct a cross-sectional and temporal framework to analyze the relationships among these three forms. Our analysis reveals several interesting patterns of information consumption on Twitter. First, we find that people consume information not only by following others explicitly but also by listing and subscribing to lists, and that the people they list or subscribe to are not the same as the ones they follow. Second, we find that listing and following are more similar to each other than listing and subscribing or subscribing and following. Using temporal analysis, we find that people initially prefer following as a form of information consumption, while subscription is a more volatile form of information consumption than following or listing.

    EMIR: a novel music retrieval system for mobile devices incorporating analysis of user emotion

    We present an Emotional Music Information Retrieval system for mobile devices that utilizes a machine learning approach to detect latent emotion from within both user queries (non-descriptive queries) and the lyrics of songs and uses both elements to develop an effective Music Information Retrieval system. Emotion is extracted from the songs and queries and mapped into a high-dimensional emotion space, which allows for the employment of conventional text retrieval techniques to calculate the similarity between a user query and the latent emotion in song lyrics, thereby producing a ranked list of songs for playback

    Followee recommendation based on text analysis of micro-blogging activity

    Nowadays, more and more users keep up with news through information streams coming from real-time micro-blogging activity offered by services such as Twitter. In these sites, information is shared via a followers/followees social network structure in which a follower receives all the micro-blogs from his/her followees. Recent research efforts on understanding micro-blogging as a novel form of communication and news-spreading medium have identified three different categories of users in these systems: information sources, information seekers and friends. As social networks grow in the number of registered users, finding relevant and reliable users from whom to receive interesting information becomes essential. In this paper we propose a followee recommender system based both on the analysis of the content of micro-blogs to detect users' interests and on the exploration of the topology of the network to find candidate users for recommendation. Experimental evaluation was conducted in order to determine the impact of different profiling strategies based on the text analysis of micro-blogs, as well as several factors that allow the identification of users acting as good information sources. We found that user-generated content available in the network is a rich source of information for profiling users and finding like-minded people.
    Affiliation: Armentano, Marcelo Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Affiliation: Godoy, Daniela Lis. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Affiliation: Amandi, Analia Adriana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina

    How people share information about food: Insights from tweets regarding two Italian Regions

    Sharing information about food through Twitter contributes to the evolution of food cultures, accelerating the exchange of information and knowledge about food. The aim of this study is to describe the type of information regarding food shared on Twitter and the kind of network established between Twitter users in those cases when the #food in question is associated with a geographical area (#Tuscany or #Sicily). Using two different methodological approaches, combining quantitative tools with Network Analysis, the study highlights differences between the two networks surveyed, both with regard to the actors involved and to the way in which they share information on Twitter.

    SMS: A Framework for Service Discovery by Incorporating Social Media Information

    © 2008-2012 IEEE. With the explosive growth of services, including Web services, cloud services, APIs and mashups, discovering the appropriate services for consumers is becoming an imperative issue. Traditional service discovery approaches mainly face two challenges: 1) the single source of description documents limits the effectiveness of discovery due to the insufficiency of semantic information; 2) more factors should be considered given the generally increasing functional and nonfunctional requirements of consumers. In this paper, we propose a novel framework, called SMS, for effectively discovering the appropriate services by incorporating social media information. Specifically, we present different methods to measure four social factors (semantic similarity, popularity, activity, decay factor) collected from Twitter. A Latent Semantic Indexing (LSI) model is applied to mine semantic information about services from the metadata of the Twitter Lists that contain them. In addition, we assume the target query-service matching function to be a linear combination of multiple social factors, and we design a weight learning algorithm to learn an optimal combination of the measured social factors. Comprehensive experiments based on a real-world dataset crawled from Twitter demonstrate the effectiveness of the proposed SMS framework compared with several other approaches.
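The linear matching function mentioned in the abstract above can be sketched as a weighted sum of the four social factors. This is only an illustration under stated assumptions: the factor keys, the function names `match_score` and `rank_services`, and the example weights and values are hypothetical, and the paper's weight learning algorithm is not described in the abstract, so the weights here are simply given by hand.

```python
# Hypothetical keys for the four social factors named in the abstract.
FACTORS = ("semantic_similarity", "popularity", "activity", "decay")


def match_score(factors, weights):
    """Linear combination of the social factors: sum of w_i * f_i."""
    return sum(weights[name] * factors[name] for name in FACTORS)


def rank_services(candidates, weights):
    """Return service names ordered by descending match score."""
    return sorted(
        candidates,
        key=lambda name: match_score(candidates[name], weights),
        reverse=True,
    )


# Illustrative weights and factor values only; none come from the paper.
weights = {"semantic_similarity": 0.5, "popularity": 0.25, "activity": 0.125, "decay": 0.125}
candidates = {
    "service_a": {"semantic_similarity": 0.5, "popularity": 1.0, "activity": 0.0, "decay": 0.0},
    "service_b": {"semantic_similarity": 1.0, "popularity": 0.5, "activity": 0.0, "decay": 0.0},
}
ranked = rank_services(candidates, weights)
```

With these hand-picked weights, semantic similarity dominates, so `service_b` ranks ahead of `service_a`. A learned weighting would replace the `weights` dictionary.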

    Temporal Web Image Retrieval

    Temporal Web Image Retrieval can be defined as the process that retrieves sets of Web images with their temporal dimension from explicit or implicit temporal text queries. Supposing that (a) the temporal dimension is included in image indexing and (b) the query is explicitly expressed with a time tag (e.g. "Fukushima 2011"), the retrieval task can be straightforward, as image retrieval has been studied for several years with success. However, text queries are usually implicit in time (e.g. "Second World War"), and automatically capturing the time dimension included in Web images is a challenge that, to the best of our knowledge, has not been studied so far. In this paper, we discuss different research issues in Temporal Web Image Retrieval and the current progress of our research in temporal ephemeral clustering and temporal image filtering.