
    IMPLEMENTATION OF EXPERT SEARCH IN CONFINED WEB ENVIRONMENT

    Expert search has become a hot research area since the launch of the TREC enterprise track. Searching for experts on the web differs from organizational expert search in that ordinary web pages, as well as people's names, must be considered. Early approaches to expert search involved building a knowledge base containing descriptions of people's skills within an organization; keeping such profiles up to date requires intelligent techniques. In the physical world, heat diffuses through a medium from regions of higher temperature to regions of lower temperature; the diffusion model follows this analogy. Given a large amount of co-occurrence information, noise can be suppressed, since noisy co-occurrences do not appear regularly on the web. The idea behind the diffusion representation is that, by constructing a matrix, we combine co-occurrence information between people and words to reflect the association strength between each pair of objects. Facilitating collaboration through expert-finding applications is essential to ensuring that the expertise in an organization is effectively exploited. The launch of the Expert Finding task at TREC has generated considerable interest in expertise retrieval, with rapid progress in modelling, algorithms, and evaluation.
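
    The diffusion idea sketched above can be illustrated in a few lines of code. The following is a minimal sketch, not the authors' implementation: the toy person-word co-occurrence matrix C, the row normalization, and the random-walk-with-restart style iteration are all illustrative choices.

        import numpy as np

        # Hypothetical person-word co-occurrence counts: entry (i, j) is how
        # often person i appears near word j on the web.
        C = np.array([
            [5.0, 1.0, 0.0],
            [2.0, 4.0, 1.0],
            [0.0, 1.0, 3.0],
        ])

        n_people, n_words = C.shape

        # Combine people and words into one graph, as the abstract describes,
        # so that association strength can diffuse between both object types.
        A = np.zeros((n_people + n_words, n_people + n_words))
        A[:n_people, n_people:] = C
        A[n_people:, :n_people] = C.T

        # Row-normalize into a transition matrix (an illustrative choice).
        P = A / A.sum(axis=1, keepdims=True)

        # Heat-diffusion-style propagation: seed the query word, then let the
        # score flow to people through co-occurrence links.
        seed = np.zeros(n_people + n_words)
        seed[n_people + 1] = 1.0  # hypothetical query word

        alpha, score = 0.8, seed.copy()
        for _ in range(50):
            score = alpha * P.T @ score + (1 - alpha) * seed

        print("person relevance scores:", score[:n_people].round(3))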

    Recommendation on the Web Search by Using Co-Occurrence

    ABSTRACT: The use of the internet and of information search is increasing rapidly. As a result, we now face problems such as whether the retrieved information is noise-free, along with considerable confusion over which keywords yield the desired result. To address these problems, we propose two concepts: co-occurrence and recommendation. Together they increase the effectiveness of the results, and the recommendation concept offers multiple choices for selecting the desired item. As web search grows dramatically [1], user search behaviour leads to a large number of ambiguities. We examine a general expert search problem: finding experts on the web, where millions of web pages and thousands of names are considered. There are two main issues: web pages may be untrustworthy and noisy, and the expertise evidence found in web pages is frequently vague and ambiguous. Expert search has been studied in various contexts, e.g., enterprises and academic communities. We propose to leverage the large amount of co-occurrence information to assess the relevance and reputation of a person's name for a given query. This makes the recommendation system central, and the trustworthiness of the system can be analysed more thoroughly. Personalization is based on the individual user's behaviour in web search and is applied mainly in e-commerce applications.
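
    As a concrete illustration of the co-occurrence idea, the sketch below scores candidate names by how often they co-occur with query terms across retrieved snippets. The snippet list, the candidate names, and the simple counting rule are all hypothetical; the paper's actual relevance and reputation computation is more involved.

        from collections import Counter

        # Hypothetical search-result snippets for the query "machine learning".
        snippets = [
            "Jane Doe gave a tutorial on machine learning at the workshop",
            "slides on machine learning by John Smith and Jane Doe",
            "John Smith wrote a post about databases",
        ]
        candidates = ["Jane Doe", "John Smith"]
        query_terms = {"machine", "learning"}

        # Count, for each candidate name, the snippets in which the name
        # co-occurs with at least one query term; aggregating over many
        # snippets is what suppresses accidental, noisy pairings.
        scores = Counter()
        for text in snippets:
            words = set(text.lower().split())
            for name in candidates:
                if name.lower() in text.lower() and words & query_terms:
                    scores[name] += 1

        for name, score in scores.most_common():
            print(name, score)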

    Modeling User Expertise in Folksonomies by Fusing Multi-type Features

    Abstract. A folksonomy is an online collaborative tagging system that offers a new open platform for content annotation with an uncontrolled vocabulary. As folksonomies gain in popularity, expert search and spammer detection in folksonomies are attracting more and more attention. However, most previous work is limited to a few folksonomy features. In this paper, we introduce a generic and flexible user expertise model for expert search and spammer detection. We first investigate a comprehensive set of expertise evidence related to users, objects, and tags in folksonomies. We then discuss the rich interactions between them and propose a unified Continuous CRF model to integrate these features and interactions. The model's applications to expert recommendation and spammer detection are also explored. Extensive experiments conducted on a real tagging dataset demonstrate the model's advantages over previous methods, in both performance and coverage.
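
    For orientation, a Continuous CRF of the kind named above typically models real-valued expertise scores y for users, conditioned on observed features x; a generic form (not necessarily the paper's exact parameterization) is:

        p(\mathbf{y} \mid \mathbf{x}) =
          \frac{1}{Z(\mathbf{x})}
          \exp\Big( \sum_{i}\sum_{k} \alpha_k f_k(y_i, \mathbf{x})
                  + \sum_{(i,j)}\sum_{l} \beta_l g_l(y_i, y_j, \mathbf{x}) \Big)

    Here the node features f_k tie a user's score to their own evidence (tags, tagged objects), the edge features g_l couple the scores of interacting users, and Z(x) normalizes the density.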

    PERICLES Deliverable 4.3:Content Semantics and Use Context Analysis Techniques

    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be accurately interpreted as initially intended.

    Mining and Analyzing the Academic Network

    Social network research has attracted the interest of many researchers, not only in analyzing online social networking applications, such as Facebook and Twitter, but also in providing comprehensive services in the scientific research domain. We define an academic network as a social network that integrates scientific factors, such as authors, papers, affiliations, and publishing venues, and their relationships, such as co-authorship among authors and citations among papers. By mining and analyzing the academic network, we can provide users with comprehensive services such as searching for research experts, published papers, and conferences, as well as detecting research communities or the evolution of hot research topics. We can also provide recommendations to users on with whom to collaborate, whom to cite, and where to submit.

    In this dissertation, we investigate two main tasks that have fundamental applications in academic network research. In the first, we address the problem of expertise retrieval, also known as expert finding or ranking, in which we identify and return a ranked list of researchers, based upon their estimated expertise or reputation, in response to user-specified queries. In the second, we address the problem of research action recommendation (prediction), specifically the tasks of publishing venue recommendation, citation recommendation, and coauthor recommendation. For both tasks, our principal goal is to effectively mine and integrate heterogeneous information and thereby develop well-functioning ranking or recommender systems.

    For the task of expertise retrieval, we first proposed or applied three modified versions of PageRank-like algorithms in citation network analysis; we then proposed an enhanced author-topic model that simultaneously models citation and publishing venue information; finally, we incorporated a pair-wise learning-to-rank algorithm into the traditional topic modeling process and further improved the model by integrating groups of author-specific features. For the task of research action recommendation, we first proposed an improved neighborhood-based collaborative filtering approach for publishing venue recommendation; we then applied our enhanced author-topic model and demonstrated its effectiveness in both cited-author prediction and publishing venue prediction; finally, we proposed an extended latent factor model that can jointly model several relations in an academic environment in a unified way and verified its performance on four recommendation tasks: author co-authorship, author-paper citation, paper-paper citation, and paper-venue submission. Extensive experiments conducted on large-scale real-world data sets demonstrated the superiority of our proposed models over existing state-of-the-art methods.
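
    The first ingredient listed above, a PageRank-like algorithm over the citation network, is easy to sketch. The toy graph and damping factor below are illustrative; the dissertation's three modified variants differ in their details.

        import numpy as np

        # Hypothetical citation graph: edges[i] lists the papers cited by paper i.
        edges = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
        n = 4

        # Column-stochastic link matrix: each citation passes a share of the
        # citing paper's score to the cited paper.
        M = np.zeros((n, n))
        for src, cited in edges.items():
            for dst in cited:
                M[dst, src] = 1.0 / len(cited)

        # Standard power iteration with damping (d = 0.85 is the common default).
        d = 0.85
        rank = np.full(n, 1.0 / n)
        for _ in range(100):
            rank = (1 - d) / n + d * M @ rank

        print("paper ranks:", rank.round(3))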

    Intelligent methods for information filtering of research resources

    This thesis presents several content-based methods to address the task of filtering research resources. The explosive growth of the Web in recent decades has led to a substantial increase in the scientific information available. This has contributed to the need for tools that help researchers deal with huge amounts of data. Examples of such tools are digital libraries, dedicated search engines, and personalized information filters. The latter, also known as recommenders, have proved useful for non-academic purposes, and in recent years they have started to be considered for the recommendation of scholarly resources. This thesis explores new developments in this context. In particular, we focus on two different tasks.

    First, we explore how to make maximal use of the semi-structured information typically available for research papers, such as keywords, authors, or journal, to assess research paper similarity. This is important since in many cases the full text of the articles is not available, and the information used for tasks such as article recommendation is often limited to the abstracts. To exploit all the available information, we propose several methods based on both the vector space model and language modeling. In the first case, we study how the popular combination of tf-idf and cosine similarity can be used not only with the abstract, but also with the keywords and the authors. We also combine the abstract and these extra features by using Explicit Semantic Analysis. In the second case, we estimate separate language models based on each of the features and subsequently interpolate them. Moreover, we employ Latent Dirichlet Allocation (LDA) to discover latent topics which can enrich the models, and we explore how to use the keywords and the authors to improve the performance of the standard LDA algorithm.

    Next, we study the information available in calls for papers (CFPs) of conferences in order to exploit it in content-based methods for matching users with CFPs. Specifically, we distinguish between textual content, such as the introductory text and the topics in the scope of the conference, and the names of the program committee. This second type of information can be used to retrieve the research papers written by these people, which provides the system with new data about the conference. Moreover, the research papers written by the users are employed to represent their interests. Again, we explore methods based on both the vector space model and language modeling to combine the different types of information. The experimental results indicate that the use of these extra features can lead to significant improvements. In particular, our methods based on the interpolation of language models perform well for the task of assessing the similarity between research papers. In contrast, when addressing the problem of filtering CFPs, the methods based on the vector space model prove more robust.
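
    The interpolation of field-specific language models described above can be sketched briefly. Everything below is an assumption for illustration, including the three fields, the Laplace smoothing, and the fixed interpolation weights; the thesis tunes and evaluates such choices properly.

        from collections import Counter

        def unigram_lm(text, vocab, mu=1.0):
            # Laplace-smoothed unigram language model over a fixed vocabulary.
            counts = Counter(text.lower().split())
            total = sum(counts.values())
            return {w: (counts[w] + mu) / (total + mu * len(vocab)) for w in vocab}

        # Hypothetical paper with three semi-structured fields.
        paper = {
            "abstract": "language models for research paper recommendation",
            "keywords": "language models recommendation",
            "authors": "jane doe",
        }
        query = "paper recommendation models"

        vocab = set(" ".join(paper.values()).split()) | set(query.lower().split())

        # One language model per field, mixed with fixed weights (assumed here).
        weights = {"abstract": 0.6, "keywords": 0.3, "authors": 0.1}
        models = {f: unigram_lm(text, vocab) for f, text in paper.items()}

        # Query likelihood under the interpolated model.
        score = 1.0
        for term in query.lower().split():
            score *= sum(weights[f] * models[f][term] for f in weights)

        print("query likelihood:", score)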