Organization and Usage of Learning Objects within Personal Computers
Research report of the ProLearn Network of Excellence (IST 507310), Deliverable 7.6. To promote the integration of desktop-related Knowledge Management and Technology Enhanced Learning, this deliverable aims to raise awareness of Desktop research within the Professional Learning community and to familiarize e-Learning researchers with the state of the art in the relevant areas of Personal Information Management (PIM), as well as with ongoing activities and some of the regular PIM publication venues.
Emerging Applications of Link Analysis for Ranking
The booming growth of digitally available information has greatly increased the popularity of search engine technology over the past years. At the same time, when interacting with this overwhelming quantity of data, people usually inspect only the very few items most relevant to their task. It is thus very important to use high-quality ranking measures which efficiently identify these items across the various information retrieval activities we pursue. In this thesis we provide a twofold contribution to the Information Retrieval field. First, we identify those application areas in which a user-oriented ranking is missing, though sorely needed to facilitate high-quality access to relevant resources. Second, for each of these areas we propose appropriate ranking algorithms which exploit their underlying social characteristics, either at the macroscopic or at the microscopic level. We achieve this by using link analysis techniques, which build on the graph-based representation of relations between resources in order to rank them or simply to identify social patterns in the investigated data set. W
services
Searching the web has become a routine task in many people’s work, without which subsequent tasks would be hard or even impossible to carry out. But as people tend to have less time for querying the web, or even for searching their personal computer for the information they need, it becomes common to skip information gathering activities such as finding useful resources on the web because of the “effort” it takes to query a web search engine. In this paper we propose to use software agents that collect useful web information which would otherwise not be viewed at all. More specifically, we present two new algorithms to automatically search the web and recommend URLs relevant to the user’s current work, defined through his or her active personal desktop documents. Our experiments show that our proposed algorithms, Sentence Selection and Lexical Compounds, yield a significant improvement over simple Term Frequency based web query generation, which we used as a baseline.
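To make the Term Frequency baseline mentioned in this abstract concrete, the sketch below builds a web query from the most frequent terms of a desktop document. It is only an illustration under stated assumptions: the function name `tf_query`, the tiny stopword list, and the tokenization rule are hypothetical and are not taken from the paper, which does not describe its baseline at this level of detail.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def tf_query(document, k=3):
    """Return the k most frequent non-stopword terms of a document."""
    # Lowercase the text and keep alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

# The resulting terms would be joined into a query string for a web search engine.
query_terms = tf_query("Ranking desktop items: desktop search ranking and desktop links", k=2)
```

The Sentence Selection and Lexical Compounds algorithms named in the abstract would replace the term-selection step; their details are not given here, so they are not sketched.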
Analyzing user behavior to rank desktop items
Existing desktop search applications, trying to keep up with the rapidly increasing storage capacities of our hard disks, are an important step towards more efficient personal information management, yet they offer an incomplete solution. While their indexing functionalities are impressive in terms of the file types they can handle, their ranking capabilities are basic and rely only on textual retrieval measures, comparable to the first generation of web search engines. In this paper we propose to connect semantically related desktop items by exploiting usage analysis information about sequences of accesses to local resources, as well as about each user’s local resource organization structures. We investigate and evaluate in detail the possibilities to translate this information into a desktop linkage structure, and we propose several algorithms that exploit these newly created links in order to efficiently rank desktop items. Finally, we empirically show that the access-based links lead to ranking results comparable with TFxIDF ranking, and significantly surpass TFxIDF when used in combination with it, making them a very valuable source of input to desktop search ranking algorithms.
Abstract
This paper investigates the influence of different page features on the ranking of search engine results. We use Google (via its API) as our testbed and analyze the result rankings for several queries of different categories using statistical methods. We reformulate the problem of learning the underlying, hidden scores as a binary classification problem. To this problem we then apply both linear and non-linear methods. In all cases, we split the data into a training set and a test set to obtain a meaningful, unbiased estimator for the quality of our predictor. Although our results clearly show that the scoring function cannot be approximated well using only the observed features, we do obtain many interesting insights along the way and discuss ways of obtaining a better estimate and main limitations in trying to do so.
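One common way to turn "learning hidden scores from an observed ranking" into binary classification is the pairwise reformulation, sketched below. This is an assumption made for illustration: the abstract does not say which reformulation or which classifiers the authors used. The function names and the toy feature vectors are hypothetical, and a plain perceptron stands in for the linear method.

```python
from itertools import combinations

def pairwise_examples(ranked_features):
    """Build binary examples from feature vectors listed best-ranked first.

    Each pair (xi, xj) with xi ranked above xj yields the difference vector
    xi - xj with label +1, plus its mirror xj - xi with label -1.
    """
    examples = []
    for xi, xj in combinations(ranked_features, 2):
        diff = [a - b for a, b in zip(xi, xj)]
        examples.append((diff, +1))
        examples.append(([-d for d in diff], -1))
    return examples

def train_perceptron(examples, epochs=20):
    """Fit a linear separator w so that sign(w . x) matches the pair labels."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            # Update only on misclassified (or zero-margin) examples.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy data: three results in ranked order, two hypothetical page features each.
ranked = [[3.0, 1.0], [2.0, 5.0], [1.0, 2.0]]
examples = pairwise_examples(ranked)
weights = train_perceptron(examples)
scores = [sum(w * f for w, f in zip(weights, x)) for x in ranked]
```

If the learned weights reproduce the ranking, the scores come out in descending order. On real data one would hold out a test set of pairs, as the abstract describes, to estimate how well the observed features explain the ranking.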
Finding Related Pages Using the Link Structure of the WWW
Most of the current algorithms for finding related pages are exclusively based on text corpora of the WWW or incorporate only authority or hub values of pages. In this paper, we present HubFinder, a new fast algorithm for finding related pages exploring the link structure of the Web graph. Its criterion for filtering output pages is “pluggable”, depending on the user’s interests, and may vary from global page ranks to text content, etc. We also introduce HubRank, a new ranking algorithm which gives a more complete view of page “importance” by biasing the authority measure of PageRank towards hub values of pages. Finally, we present an evaluation of these algorithms in order to prove their qualities experimentally.
Knowing Where to Search: Personalized Search Strategies for Peers In P2P Networks
Optimizing and focusing search and results ranking in P2P networks becomes more and more important with the increasing size of these networks. Even though a few approaches have already started to investigate the computation of PageRank-like values in P2P environments, none so far has investigated how personalization could be added to it. This paper tackles the problem of computing Personalized PageRank values in a distributed manner in such an environment and presents an algorithm which uses them to optimize and focus search in the P2P network. The paper also discusses how these algorithms improve current distributed search in power-law networks and gives some simulation results.
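For readers unfamiliar with Personalized PageRank, the sketch below shows the standard centralized power iteration in which the random jump returns to a set of preferred nodes rather than to all nodes uniformly. It does not reproduce the distributed P2P algorithm of the paper; the function name, the toy graph, and the handling of dangling nodes are assumptions made for illustration.

```python
def personalized_pagerank(graph, preferred, damping=0.85, iterations=50):
    """Centralized Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of its out-neighbours
           (every neighbour must also be a key of the dict).
    preferred: non-empty set of nodes the random jump returns to.
    """
    nodes = list(graph)
    # Teleport vector: uniform over the preferred nodes, zero elsewhere.
    teleport = {n: (1.0 / len(preferred) if n in preferred else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread the damped rank evenly over the out-links.
                share = damping * rank[n] / len(out)
                for m in out:
                    new_rank[m] += share
            else:
                # Dangling node: hand its damped rank back to the preferred set.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank

# Toy graph: a cycle a -> b -> c -> a, plus a node d that only links to a.
toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ppr = personalized_pagerank(toy_graph, preferred={"a"})
```

In this toy run, node d receives no rank because nothing links to it and it is not in the preferred set, while rank decays along the cycle with distance from the preferred node a. In a P2P setting each peer would hold only part of the graph and exchange rank updates with other peers; how the paper does this is not specified in the abstract.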
PROS: A Personalized Ranking Platform for Web Search
Current search engines rely on centralized page ranking algorithms which compute page rank values as single (global) values for each Web page. Recent work on topic-sensitive PageRank [6] and personalized PageRank [8] has explored how to extend PageRank values with personalization aspects. To achieve personalization, these algorithms need specific input: [8] for example needs a set of personalized hub pages with high PageRank to drive the computation. In this paper we show how to automate this hub selection process and build upon the latter algorithm to implement a platform for personalized ranking. We start from the set of bookmarks collected by a user and extend it to contain a set of hubs with high PageRank related to them. To get additional input about the user, we implemented a proxy server which tracks and analyzes the user’s surfing behavior and outputs a set of pages preferred by the user. This set is then enriched using our HubFinder algorithm, which finds related pages, and used as extended input for the [8] algorithm. All algorithms are integrated into a prototype of a personalized Web search system, for which we present a first evaluation.
1 Introduction
Using the link structure of the World Wide Web to rank pages in search engines has been investigated heavily in recent years. The success of the Google Search Engine [5, 3] has inspired much of this work, but has also led to the realization that further improvements are needed. The amount and diversity of Web pages (Google now indicates about 4.3 billion pages) lead researchers to explore faster and more personalized page ranking algorithms, in order to address the fact that some topics are covered by only a few thousand pages and some are covered by millions. For many topics, the existing PageRank algorithm is not sufficient to filter the results of a search engine query.
Take for example the well-known query with the word “Java”, which should return top results for either the programming language or the island in the Pacific: Google definitely prefers the programming language because there are many more important pages on it than on the island. Moreover, most of the existing search engines focus only on answering user queries, although personalization will become more and more important as the amount of information available on the Web increases. Recently, several approaches to solve such problems have been investigated, building upon content analysis or on algorithms which build page ranks personalized for users or classes of users. The mos
Search Strategies for Scientific Collaboration Networks
Can we improve P2P search by looking into our social network? In this paper, we argue that P2P networks built upon specific communities (e.g., scientific social networks) could achieve such a goal by providing an implicit personalization of the output result set. Existing work in social networks investigating co-authorship relations has shown that scientific collaboration networks are scale-free. At the same time, P2P systems based on synthesized small-world networks have emerged, with a positive impact on search efficiency. We propose to use existing social collaboration graphs as the foundation for the P2P topology instead of creating purely technological topologies. To gain insight into the relationship between scientific collaboration and co-authorship, we compared both for an existing collaboration network. Based on this analysis, we then generated a large P2P collaboration network derived from co-authorship data collections as the basis for our experiments. The most prevalent search type in the scientific context is keyword search for relevant publications. We investigate different search strategies suitable in that context and show our initial experimental results.