19 research outputs found

    User Profiling for Personalized Search and Partnership Selection

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Dental Science, February 2014. 김홍기.
    "The secret of change is to focus all of your energy not on fighting the old, but on building the new." - Socrates
    The automatic identification of user intention is an important but highly challenging research problem whose solution can greatly benefit information systems. In this thesis, I look at the problem of identifying sources of user interests, extracting latent semantics from them, and modelling them as a user profile. I present algorithms that automatically infer user interests and extract hidden semantics from them, specifically aimed at improving personalized search. I also present a methodology to model a user profile as a buyer profile or a seller profile, where the attributes of the profile are populated from a controlled vocabulary. The buyer profiles and seller profiles are used in partnership match. In the domain of personalized search, first, a novel method to construct a profile of user interests is proposed, based on mining anchor text. Second, two methods are proposed to build a user profile that gathers terms from a folksonomy system, where a matrix factorization technique is explored to discover hidden relationships between them. The objective of the methods is to discover latent relationships between terms such that contextually, semantically, and syntactically related terms can be grouped together, thus disambiguating the context of term usage. The profile of user interests is also analysed to judge its clustering tendency and clustering accuracy. Extensive evaluation indicates that a profile of user interests that can correctly or precisely disambiguate the context of a user query has a significant impact on personalized search quality. In the domain of partnership match, an ontology termed the partnership ontology is proposed. The attributes or concepts in the partnership ontology are features representing the context of work. It is used by users to lay down their requirements as buyer profiles or seller profiles. A semantic similarity measure is defined to compute a ranked list of matching seller profiles for a given buyer profile.

    Contents:
    1 Introduction
      1.1 User Profiling for Personalized Search
        1.1.1 Motivation
        1.1.2 Research Problems
      1.2 User Profiling for Partnership Match
        1.2.1 Motivation
        1.2.2 Research Problems
      1.3 Contributions
      1.4 System Architecture - Personalized Search
      1.5 System Architecture - Partnership Match
      1.6 Organization of this Dissertation
    2 Background
      2.1 Introduction to Social Web
      2.2 Matrix Decomposition Methods
      2.3 User Interest Profile For Personalized Web Search Non Folksonomy based
      2.4 User Interest Profile for Personalized Web Search Folksonomy based
      2.5 Personalized Search
      2.6 Partnership Match
    3 Mining anchor text for building User Interest Profile: A non-folksonomy based personalized search
      3.1 Exclusively Yours'
        3.1.1 Infer User Interests
        3.1.2 Weight Computation
        3.1.3 Query Expansion
      3.2 Exclusively Yours' Algorithm
      3.3 Experiments
        3.3.1 DataSet
        3.3.2 Evaluation Metrics
        3.3.3 User Profile Efficacy
        3.3.4 Personalized vs. Non-Personalized Results
      3.4 Conclusions
    4 Matrix factorization for building Clustered User Interest Profile: A folksonomy based personalized search
      4.1 Aggregating tags from user search history
      4.2 Latent Semantics in UIP
        4.2.1 Computing the tag-tag Similarity matrix
        4.2.2 Tag Clustering to generate svdCUIP and modSvdCUIP
      4.3 Personalized Search
      4.4 Experimental Evaluation
        4.4.1 Data Set and Experiment Methodology
          4.4.1.1 Custom Data Set and Evaluation Metrics
          4.4.1.2 AOL Query Data Set and Evaluation Metrics
          4.4.1.3 Experiment set up to estimate the value of k and d
          4.4.1.4 Experiment set up to compare the proposed approaches with other approaches
        4.4.2 Experiment Results
          4.4.2.1 Clustering Tendency
          4.4.2.2 Determining the value for dimension parameter, k, for the Custom Data Set
          4.4.2.3 Determining the value of distinctness parameter, d, for the Custom data set
          4.4.2.4 CUIP visualization
          4.4.2.5 Determining the value of the dimension reduction parameter k for the AOL data set
          4.4.2.6 Determining the value of distinctness parameter, d, for the AOL data set
          4.4.2.7 Time to generate svdCUIP and modSvdCUIP
          4.4.2.8 Comparison of the svdCUIP, modSvdCUIP, and tfIdfCUIP for different classes of queries
          4.4.2.9 Comparing all five methods - Improvement
        4.4.3 Discussion
    5 User Profiling for Partnership Match
      5.1 Supplier Selection
      5.2 Criteria for Partnership Establishment
      5.3 Partnership Ontology
      5.4 Case Study
        5.4.1 Buyer Profile and Seller Profile
        5.4.2 Semantic Similarity Measure
      5.5 Discussion
      5.6 Conclusions
    6 Conclusion
      6.1 Future Work
        6.1.1 Degree of Personalization
        6.1.2 Filter Bubble
        6.1.3 IPR issues in Partnership Match
    Bibliography
    Appendices
      .1 Pairs of Query and target URL
      .2 Examples of Expanded Queries
      .3 An example of svdCUIP, modSvdCUIP, tfIdfCUIP

    On Two Web IR Boosting Tools: Clustering and Ranking

    This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and rank results so that the most "relevant" come first. The first breakthrough technique was the exploitation of the link structure of the Web graph to rank result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have been improved and extended, yet they seem insufficient to provide a satisfying search experience. In a number of situations a flat list of search results is not enough, and users might want search results grouped on-the-fly into folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. To address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly. There are also situations where users might want to access fresh information. In these cases, traditional link analysis may not be suitable: there may not have been enough time for many links to point to a recently produced piece of information. To address this need, we will discuss the algorithmic and numerical ideas behind a new ranking algorithm suitable for ranking fresh types of information, such as news articles or blogs. When link analysis suffices to produce good quality search results, the huge amount of Web information calls for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eigenvector-like computation commonly used by link analysis. An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that clustering and ranking have a mutual reinforcement property which has not yet been studied intensively. This property can be exploited to boost the precision of both methodologies.
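
    To make the "eigenvector-like computation" used by link analysis concrete, here is a minimal sketch of plain PageRank computed by power iteration. It is not the accelerated method the thesis proposes, only the baseline such methods speed up. The function name `pagerank`, the damping factor of 0.85, the tolerance, and the toy graph in the usage note are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each node to the list of nodes it links to.
    Returns a dict mapping node -> rank; ranks sum to 1.
    """
    # Collect every node that appears as a source or a target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Each node passes its rank in equal shares along its out-links.
        for v in nodes:
            outs = links.get(v, [])
            for t in outs:
                new[t] += damping * rank[v] / len(outs)

        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

    For a hypothetical three-page graph, `pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})` ranks `a` above `b` above `c`, since `c` receives no in-links. A freshly published page is in the same position as `c`, which is the limitation the abstract points out for fresh information.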

    USING EXTERNAL SOURCES TO IMPROVE RESEARCH TALK RECOMMENDATION IN SMALL COMMUNITIES

    In academic research communities, a typical way to spread ideas or seek collaboration is through research talks, which might be presented at departmental colloquia or given at conferences. Given a large number of research talks, some of them happening in parallel, it becomes increasingly hard to focus on those that are of most interest. To solve this problem, talk recommendation systems can help academics identify the most useful talks among many. This dissertation investigates methods to improve research talk recommendations, both for conference attendees and for faculty and students at a research university. More specifically, the focus of this thesis is the use of external information about user interests as a way to address the challenges of having limited data about target users. The thesis examines several kinds of external sources, such as user home pages, bibliographies, external bookmarks, and user profiles from external information systems, and explores the impact of this information on the quality of talk recommendation in a general situation and in a cold-start context. For this study, the dissertation uses data from two existing talk recommendation systems, CoMeT and Conference Navigator 3, and an academic paper search system, SciNet.

    Addressing the cold start problem in tag-based recommender systems

    Folksonomies have become a powerful tool to describe, discover, search, and navigate online resources (e.g., pictures, videos, blogs) on the Social Web. Unlike taxonomies and ontologies, which impose a hierarchical categorisation on content, folksonomies directly allow end users to freely create and choose the categories (in this case, tags) that best describe a piece of information. However, the freedom afforded to users comes at a cost: as tags are defined informally, the retrieval of information becomes more challenging. Different solutions have been proposed to help users discover content in this highly dynamic setting. However, they have proved to be effective only for users who have already heavily used the system (active users) and who are interested in popular items (i.e., items tagged by many other users). In this thesis we explore principles to help both active users and, more importantly, new or inactive users (cold starters) to find content they are interested in, even when this content falls into the long tail of medium-to-low popularity items (cold start items). We investigate the tagging behaviour of users on content and show how the similarities between users and tags can be used to produce better recommendations. We then analyse how users create new content on social tagging websites and show how the preferences of only a small portion of active users (leaders), responsible for the vast majority of the tagged content, can be used to improve the recommender system's scalability. We also investigate the growth of the number of users, items and tags in the system over time. We then show how this information can be used to decide whether the benefits of an update of the data structures modelling the system outweigh the corresponding cost. In this work we formalize the ideas introduced above and describe their implementation. To demonstrate the improvements of our proposal in recommendation efficacy and efficiency, we report the results of an extensive evaluation conducted on three different social tagging websites: CiteULike, Bibsonomy and MovieLens. Our results demonstrate that our approach achieves higher accuracy than state-of-the-art systems for cold start users and for users searching for cold start items. Moreover, while the accuracy of our technique is comparable to other techniques for active users, the computational cost that it requires is much smaller. In other words, our approach is more scalable and thus more suitable for large and quickly growing settings.

    Benefits of the application of web-mining methods and techniques for the field of analytical customer relationship management of the marketing function in a knowledge management perspective

    Web Mining (WM) remains a relatively little-known technology. However, if used properly, it proves very useful for identifying the profiles and behaviours of prospective and existing customers in an internet context. Technical advances in WM greatly improve the analytical side of Customer Relationship Management (CRM). This study follows an exploratory approach to determine whether WM alone achieves all the fundamental objectives of CRM or whether, failing that, it should be used jointly with traditional marketing research and the classical methods of analytical CRM (aCRM) to optimise CRM, and hence marketing, in an internet context. The knowledge obtained through WM can then be managed within the organisation in a Knowledge Management (KM) framework, in order to optimise relationships with new and/or existing customers, improve their customer experience and, ultimately, provide them with better value. Within an exploratory research framework, semi-structured, in-depth interviews were conducted to obtain the views of several experts in (web) data mining. The study revealed that WM is well suited to segmenting prospective and existing customers, to understanding the online transactional behaviours of existing and prospective customers, and to determining the loyalty (or defection) status of existing customers. As such, it is a tool of formidable effectiveness, both predictive, through classification and estimation, and descriptive, through segmentation and association. On the other hand, WM performs less well at understanding the underlying, less obvious dimensions of customer behaviour. Using WM is less appropriate for objectives related to describing how existing or prospective customers develop loyalty, satisfaction, defection or attachment towards a brand on the internet. This exercise is all the more difficult because the multichannel communication in which consumers operate strongly influences the relationships they develop with a brand. Online behaviour would thus be only a transposition, or at least an extension, of the consumer's behaviour when not online. WM is also a relatively incomplete tool for identifying the development of defection to and from competitors, as well as the development of loyalty towards them. WM always needs to be complemented by traditional marketing research in order to achieve these more difficult but essential objectives of aCRM. Finally, the conclusions of this research are addressed mainly to firms and managers rather than to customers and internet users, since the former, more than the latter, have the resources and processes to implement the WM research projects described.
    AUTHOR'S KEYWORDS: Web mining, Knowledge management, Customer relationship management, Internet data, Consumer behaviour, Data mining, Consumer knowledge

    Using community trained recommender models for enhanced information retrieval

    Research in Information Retrieval (IR) seeks to develop methods which better assist users in finding information relevant to their current information needs. Personalization is a significant focus of research for the development of the next generation of IR systems. Commercial search engines are exploring methods to incorporate models of the user’s interests to facilitate personalization in IR and improve retrieval effectiveness. However, in some situations there may be no opportunity to learn about the interests of a specific user on a certain topic. This is a significant challenge for IR researchers attempting to improve search effectiveness by exploiting user search behaviour. We propose a solution to this problem based on recommender systems (RSs): a novel IR model which combines a recommender model with traditional IR methods to improve retrieval results for search tasks where the IR system has no opportunity to acquire prior information about the user’s knowledge of a domain for which they have not previously entered a query. We use search behaviour data from previous users to build topic category models based on topic interests. When a user enters a query on a topic which is new to this user, but related to a topical search category, the appropriate topic category model is selected and used to predict a ranking which this user may find interesting based on previous search behaviour. The recommender outputs are used in combination with the output of a standard IR system to produce the overall output to the user. In this thesis, the IR and recommender components of this integrated model are investigated.
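
    The abstract does not say how the recommender output and the IR output are combined. One common option, shown here purely as an assumed illustration, is a weighted linear combination of min-max normalised scores. The function names `min_max` and `fuse`, the weight `alpha`, and the document identifiers in the usage note are hypothetical and not taken from the thesis.

```python
def min_max(scores):
    """Rescale a dict of doc -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking information to preserve.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(ir_scores, rec_scores, alpha=0.5):
    """Rank documents by alpha * IR score + (1 - alpha) * recommender score.

    A document missing from one of the two lists contributes 0 from that list.
    Ties are broken by document id so the output is deterministic.
    """
    ir, rec = min_max(ir_scores), min_max(rec_scores)
    docs = set(ir) | set(rec)
    combined = {
        doc: alpha * ir.get(doc, 0.0) + (1.0 - alpha) * rec.get(doc, 0.0)
        for doc in docs
    }
    return sorted(docs, key=lambda doc: (-combined[doc], doc))
```

    With `alpha=1.0` the ranking reduces to the IR system alone, and with `alpha=0.0` to the recommender alone. For example, `fuse({"d1": 2.0, "d2": 1.0, "d3": 0.0}, {"d2": 1.0, "d3": 0.9, "d4": 0.0})` promotes `d2` to the top because both components score it well.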

    B!SON: A Tool for Open Access Journal Recommendation

    Finding a suitable open access journal in which to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.