263 research outputs found

    Reliability and effectiveness of clickthrough data for automatic image annotation

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manual annotation effort. We investigate and evaluate this approach using a collection of 97,628 photographic images. The results indicate that the contribution of search log based training data is positive despite their inherent noise; in particular, the combination of manual and automatically generated training data outperforms the use of manual data alone. It is therefore possible to use clickthrough data to perform large-scale image annotation with little manual annotation effort or, depending on performance, using only the automatically generated training data. An extensive presentation of the experimental results and the accompanying data can be accessed at http://olympus.ee.auth.gr/~diou/civr2009/

    Text Extraction and Web Searching in a Non-Latin Language

    Recent studies of queries submitted to Internet search engines have shown that non-English queries and unclassifiable queries have nearly tripled during the last decade. Most search engines were originally engineered for English. They do not take full account of inflectional semantics or, for example, of diacritics or the use of capitals, which are common features in languages other than English. The literature concludes that searching using non-English and non-Latin based queries results in lower success and requires additional user effort to achieve acceptable precision. The primary aim of this research study is to develop an evaluation methodology for identifying the shortcomings and measuring the effectiveness of search engines with non-English queries. It also proposes a number of solutions for the existing situation. A Greek query log is analyzed considering the morphological features of the Greek language. A text extraction experiment also revealed some problems related to the encoding and to the morphological and grammatical differences among semantically equivalent Greek terms. A first stopword list for Greek based on a domain independent collection has been produced and its application in Web searching has been studied. The effect of lemmatization of query terms and the factors influencing text based image retrieval in Greek are also studied. Finally, an instructional strategy is presented for teaching non-English students how to effectively utilize search engines. The evaluation of the capabilities of the search engines showed that international and nationwide search engines ignore most of the linguistic idiosyncrasies of Greek and other complex European languages. There is a lack of freely available non-English resources to work with (test corpus, linguistic resources, etc.). The research showed that the application of standard IR techniques, such as stopword removal, stemming, lemmatization and query expansion, in Greek Web searching increases precision.
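
As a rough illustration of the diacritics, capitalization and stopword issues this abstract describes, the sketch below normalizes Greek query terms before matching. It is not the method used in the thesis: the function names are hypothetical, and the tiny stopword set is an assumed placeholder rather than the stopword list the study produced. Note that stripping all combining marks also removes the dialytika, which a real system might want to keep.

```python
import unicodedata

def normalize_term(term):
    """Case-fold a term and strip combining marks (accents) from it."""
    folded = term.casefold()  # also maps final sigma to a regular sigma
    decomposed = unicodedata.normalize("NFD", folded)
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

# Assumed placeholder stopwords, for illustration only.
STOPWORDS = {normalize_term(w) for w in ["και", "το", "η", "ο", "στην", "της"]}

def prepare_query(query):
    """Normalize each query term and drop stopwords."""
    terms = [normalize_term(t) for t in query.split()]
    return [t for t in terms if t not in STOPWORDS]

print(prepare_query("Ξενοδοχεία στην Αθήνα"))
```

With this normalization, an accented mixed-case spelling and an unaccented all-capitals spelling of the same word map to the same index term, which is one simple way to address the mismatch the abstract reports.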

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Towards memory supporting personal information management tools

    In this article we discuss re-retrieving personal information objects and relate the task to recovering from lapses in memory. We propose that fundamentally it is lapses in memory that impede users from successfully re-finding the information they need. Our hypothesis is that by learning more about memory lapses in non-computing contexts and how people cope with and recover from these lapses, we can better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, we present a series of principles that we hypothesize will improve the design of personal information management tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to our findings. The evaluation suggests that users' performance when re-finding objects can be improved by building personal information management tools to support characteristics of human memory.

    Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop


    Ashesi network traffic analysis

    Thesis submitted to the Department of Computer Science, Ashesi University College, in partial fulfillment of the Bachelor of Science degree in Management Information Systems, April 2017. The web has evolved from an information exchange system to a data mining, knowledge creation or knowledge dissemination platform, and the internet is viewed as a critical and vital component of success by students, lecturers and researchers in higher institutions, particularly universities and colleges. There is thus pressure on network directors managing network services within these institutions to ensure regular and correct utilization of the available bandwidth. Network directors are being pressured to ensure that there is sufficient bandwidth available for every network user and that the bandwidth is used productively. With the expansion of digitally produced content and Internet computing demands in the last couple of years, network users often complain of insufficient bandwidth to satisfy their needs. In attending to users' complaints, network directors ought to determine the main purpose of the offered bandwidth and identify unproductive applications consuming it. In this paper, an attempt is made to investigate network traffic on the Ashesi University network system and to prioritize applications on the network based on user needs and on what the bandwidth is purchased for. A policy framework is additionally outlined for best utilization and management of the university's network system. The findings of the study indicate that there is a need to prioritize applications on the network, because users access certain types of applications during school hours and other types after school hours. There is also a need to install certain servers locally as an alternative to scaling back the traffic generated during peak days and peak hours on the Ashesi network system.

    Search engine optimisation using past queries

    World Wide Web search engines process millions of queries per day from users all over the world. Efficient query evaluation is achieved through the use of an inverted index, where, for each word in the collection, the index maintains a list of the documents in which the word occurs. Query processing may also require access to document specific statistics, such as document length; access to word statistics, such as the number of unique documents in which a word occurs; and collection specific statistics, such as the number of documents in the collection. The index maintains individual data structures for each of these sources of information, and repeatedly accesses each to process a query. A by-product of a web search engine is a list of all queries entered into the engine: a query log. Analyses of query logs have shown repetition of query terms in the requests made to the search system. In this work we explore techniques that take advantage of the repetition of user queries to improve the accuracy or efficiency of text search. We introduce an index organisation scheme that favours those documents that are most frequently requested by users and show that, in combination with early termination heuristics, query processing time can be dramatically reduced without reducing the accuracy of the search results. We examine the stability of such an ordering and show that an index based on as little as 100,000 training queries can support at least 20 million requests. We show the correlation between frequently accessed documents and relevance, and attempt to exploit the demonstrated relationship to improve search effectiveness. Finally, we deconstruct the search process to show that query time redundancy can be exploited at various levels of the search process. We develop a model that illustrates the improvements that can be achieved in query processing time by caching different components of a search system. This model is then validated by simulation using a document collection and query log. Results on our test data show that a well-designed cache can reduce disk activity by more than 30%, with a cache that is one tenth the size of the collection.
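
The inverted index and the statistics mentioned in this abstract can be sketched in a few lines. This is a minimal in-memory illustration under simplifying assumptions (whitespace tokenization, no compression, no access-ordered lists or early termination); the function names and sample documents are hypothetical and not taken from the thesis.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Return (postings, doc_lengths) for a dict of doc_id -> text."""
    seen = defaultdict(set)
    doc_lengths = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        doc_lengths[doc_id] = len(words)  # document specific statistic
        for word in words:
            seen[word].add(doc_id)
    # Each word maps to the sorted list of documents in which it occurs.
    postings = {word: sorted(ids) for word, ids in seen.items()}
    return postings, doc_lengths

def document_frequency(postings, word):
    """Word statistic: number of unique documents containing the word."""
    return len(postings.get(word.lower(), []))

def conjunctive_query(postings, query):
    """Return the ids of documents that contain every query word."""
    lists = [set(postings.get(w, ())) for w in query.lower().split()]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

docs = {
    1: "web search engines process queries",
    2: "query logs record user queries",
    3: "search engines keep query logs",
}
postings, doc_lengths = build_inverted_index(docs)
print(conjunctive_query(postings, "query logs"))
```

Here `len(docs)` plays the role of the collection specific statistic. A production index would keep these structures on disk, which is where the caching discussed in the abstract becomes relevant.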

    Using community trained recommender models for enhanced information retrieval

    Research in Information Retrieval (IR) seeks to develop methods which better assist users in finding information which is relevant to their current information needs. Personalization is a significant focus of research for the development of the next generation of IR systems. Commercial search engines are exploring methods to incorporate models of the user’s interests to facilitate personalization in IR to improve retrieval effectiveness. However, in some situations there may be no opportunity to learn about the interests of a specific user on a certain topic. This is a significant challenge for IR researchers attempting to improve search effectiveness by exploiting user search behaviour. We propose a solution to this problem based on recommender systems (RSs) in a novel IR model which combines a recommender model with traditional IR methods to improve retrieval results for search tasks where the IR system has no opportunity to acquire prior information about the user’s knowledge of a domain for which they have not previously entered a query. We use search behaviour data from other previous users to build topic category models based on topic interests. When a user enters a query on a topic which is new to this user, but related to a topical search category, the appropriate topic category model is selected and used to predict a ranking which this user may find interesting based on previous search behaviour. The recommender outputs are used in combination with the output of a standard IR system to produce the overall output to the user. In this thesis, the IR and recommender components of this integrated model are investigated.
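
The abstract does not say how the recommender output and the IR output are combined. One common option, shown here purely as an assumption, is a weighted sum of min-max normalized scores. The function names, the weight `alpha` and the sample scores are all hypothetical.

```python
def min_max(scores):
    """Rescale a dict of doc_id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(ir_scores, rec_scores, alpha=0.7):
    """Rank documents by alpha * IR score + (1 - alpha) * recommender score.

    A document missing from one source contributes 0 for that source.
    Ties are broken by document id so the ordering is deterministic.
    """
    ir_n, rec_n = min_max(ir_scores), min_max(rec_scores)
    combined = {
        doc: alpha * ir_n.get(doc, 0.0) + (1 - alpha) * rec_n.get(doc, 0.0)
        for doc in set(ir_n) | set(rec_n)
    }
    return sorted(combined, key=lambda doc: (-combined[doc], doc))

ir = {"a": 2.0, "b": 1.0, "c": 0.0}   # assumed scores from the IR system
rec = {"c": 5.0, "b": 1.0}            # assumed scores from a topic category model
print(fuse(ir, rec, alpha=0.7))
```

Setting `alpha` close to 1 trusts the standard IR ranking, while values close to 0 let the community-trained recommender dominate.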