
    A User-Aware and Semantic Approach for Enterprise Search

    This article describes how, in addition to general-purpose search engines, specialized search engines have appeared and gained their share of the market. An enterprise search engine enables search within an enterprise's information, mainly web pages but also other kinds of documents; the search is performed by people inside the enterprise or by customers. This article proposes an enterprise search engine called AMBIT-SE that relies on two enhancements: first, it is user-aware, in the sense that it takes into account the profile of the user performing the query; second, it exploits semantic techniques to consider not only exact matches but also synonyms and related terms. It performs two main activities: (1) information processing, to analyse the documents and build the user profile, and (2) search and retrieval, to find information that matches the user's query and profile. An experimental evaluation of the proposed approach on several real websites shows its benefits over other well-established approaches.
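To make the two enhancements concrete, here is a minimal sketch of a scoring function that rewards exact matches fully, synonym or related-term matches partially, and adds a small boost for document terms that appear in the user's profile. It is not AMBIT-SE's implementation: the synonym dictionary `SYNONYMS`, the weights `exact_w`, `syn_w` and `profile_w`, the function names and the example documents are all hypothetical assumptions.

```python
from typing import Dict, List, Set

# Hypothetical synonym / related-term dictionary; the semantic resources
# actually used by AMBIT-SE are not described in the abstract.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "price": {"cost", "fee"},
}

def expand(term: str) -> Set[str]:
    """Return the synonyms / related terms of a query term (excluding itself)."""
    return SYNONYMS.get(term, set())

def score(query: List[str], doc: List[str], profile: Set[str],
          exact_w: float = 1.0, syn_w: float = 0.5, profile_w: float = 0.2) -> float:
    """Score a tokenized document against a query and a user profile.

    Exact matches count fully, synonym matches count partially, and each
    document term also found in the user's profile adds a small boost.
    All weights are illustrative assumptions.
    """
    doc_terms = set(doc)
    s = 0.0
    for term in query:
        if term in doc_terms:
            s += exact_w
        elif expand(term) & doc_terms:
            s += syn_w
    s += profile_w * len(doc_terms & profile)
    return s

# Hypothetical documents and user profile.
docs = {
    "d1": "cheap car price list".split(),
    "d2": "automobile cost comparison for students".split(),
    "d3": "bicycle repair guide".split(),
}
profile = {"students"}
ranking = sorted(docs, key=lambda d: score(["car", "price"], docs[d], profile), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```

Here `d2` contains none of the query words, yet it still ranks above `d3` because of its synonym matches and its overlap with the user's profile.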

    Social impact retrieval: measuring author influence on information retrieval

    The increased presence of technologies collectively referred to as Web 2.0 means that the entire process of new media production and dissemination has moved away from an author-centric approach. Casual web users and browsers are increasingly able to play a more active role in the information creation process. This means that the traditional ways in which information sources are validated and scored must adapt accordingly. In this thesis we propose a new way of looking at a user's contributions to the network in which they are present, using these interactions to provide a measure of the user's authority and centrality. This measure is then used to attribute a query-independent interest score to each of the contributions the author makes, enabling us to provide other users with relevant information that has been of greatest interest to a community of like-minded users. This is done through the development of two algorithms: AuthorRank and MessageRank. We present two real-world user experiments focused on multimedia annotation and browsing systems that we built; these systems were novel in themselves, bringing together video and text browsing as well as free-text annotation. Using these systems as examples of real-world applications for our approaches, we then look at a larger-scale experiment based on the author and citation networks of the ACM SIGIR conference on information retrieval over the ten-year period from 1997 to 2007. We use the citation context of SIGIR publications as a proxy for annotations, constructing large social networks between authors. Against these networks we show the effectiveness of incorporating user-generated content, or annotations, to improve information retrieval.
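AuthorRank and MessageRank are not specified in this abstract. As a hedged illustration of the general idea (a query-independent authority score computed over a social network and passed on from authors to their contributions), the sketch below runs standard PageRank by power iteration over a hypothetical author interaction graph and lets each message inherit its author's score. The graph, the damping factor of 0.85 and the inheritance rule are assumptions, not the thesis's definitions.

```python
from typing import Dict, List

def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-12) -> Dict[str, float]:
    """Standard PageRank by power iteration; an edge u -> v means u endorses v
    (e.g. replies to, annotates or cites a contribution by v)."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical author graph: alice and carol both endorse bob; bob endorses alice.
authors = {"alice": ["bob"], "carol": ["bob"], "bob": ["alice"]}
author_rank = pagerank(authors)

# Hypothetical messages; each inherits its author's query-independent score.
messages = {"m1": "bob", "m2": "carol", "m3": "alice"}
message_score = {m: author_rank[a] for m, a in messages.items()}
print(sorted(message_score, key=message_score.get, reverse=True))  # ['m1', 'm3', 'm2']
```

Because the score depends only on the network, it can be computed offline and combined with any query-dependent relevance score at search time.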

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    These proceedings contain the papers accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, held at the Technological University Dublin, Dublin, Ireland, from 29 to 31 August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).

    A novel approach integrating ranking functions discovery, optimization and inference to improve retrieval performance

    The significant role played by ranking functions in the performance and success of Information Retrieval (IR) systems and search engines cannot be overstated. Diverse ranking functions are available in the IR literature. However, empirical studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). In this study, a novel three-stage integrated ranking framework is proposed for discovering, optimizing and inferring the rankings used in IR systems. The first phase, discovery, is based on a Genetic Programming (GP) approach that combines structural and content features of the documents, while the second phase, optimization, is based on a Genetic Algorithm (GA) that combines the document retrieval scores of various well-known ranking functions. In the third phase, fuzzy inference provides soft search constraints to be applied to documents. We demonstrate how these features are combined to bring new tasks and processes into the three conceptual stages of the integrated framework for effective IR.
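The second (optimization) phase can be illustrated with a small genetic algorithm that searches for the weights of a linear combination of per-document scores from several ranking functions, using average precision on training relevance judgements as fitness. This is a generic sketch under assumptions: the score table, the relevance set, the operators (elitist truncation selection, uniform crossover, Gaussian mutation) and all parameters are hypothetical, and the GP discovery and fuzzy inference phases are not shown.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Scores = Dict[str, Tuple[float, ...]]  # doc id -> one score per ranking function

def combined_ranking(scores: Scores, weights: Sequence[float]) -> List[str]:
    """Rank documents by a weighted sum of their ranking-function scores."""
    return sorted(scores, key=lambda d: sum(w * s for w, s in zip(weights, scores[d])),
                  reverse=True)

def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def evolve(scores: Scores, relevant: Set[str], pop_size: int = 20,
           generations: int = 30, sigma: float = 0.1, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    n = len(next(iter(scores.values())))

    def fitness(weights: Sequence[float]) -> float:
        return average_precision(combined_ranking(scores, weights), relevant)

    # Start from equal weights plus random individuals.
    population = [[1.0 / n] * n] + [[rng.random() for _ in range(n)]
                                    for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; best is kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]              # uniform crossover
            child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in child]  # Gaussian mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical scores from three ranking functions for one training query.
scores = {"d1": (0.9, 0.1, 0.2), "d2": (0.8, 0.2, 0.1), "d3": (0.1, 0.9, 0.9),
          "d4": (0.2, 0.3, 0.4), "d5": (0.7, 0.1, 0.0)}
relevant = {"d1", "d2", "d5"}
baseline = average_precision(combined_ranking(scores, [1 / 3, 1 / 3, 1 / 3]), relevant)
best = evolve(scores, relevant)
print(baseline, average_precision(combined_ranking(scores, best), relevant))
```

Because the equal-weight combination is in the initial population and the best individual always survives selection, the evolved weights can never score below that baseline on the training query.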

    Search beyond traditional probabilistic information retrieval

    This thesis focuses on search beyond probabilistic information retrieval. Three approaches are proposed that go beyond traditional probabilistic modelling. First, term association is examined in depth. Term association captures term dependency using a factor-analysis-based model, instead of treating each term independently. Latent factors, considered equivalent to the hidden "eliteness" variables introduced by Robertson et al. to understand the relation between term occurrences and relevance, are measured by the dependencies and occurrences of term sequences and subsequences. Second, an entity-based ranking approach is proposed in an entity system named "EntityCube", which has been released by Microsoft for public use. A summarization page summarizes the entity information across multiple documents, so that truly relevant entities are more likely to be retrieved from multiple documents by integrating the local relevance contributed by proximity with a global enhancer from a topic model. Third, multi-source fusion sets up a meta-search engine to combine the "knowledge" from different sources. Meta-features, distilled as high-level categories, are deployed to diversify the baselines. Three modified fusion methods are employed: reciprocal, CombMNZ and CombSUM, with three expanded versions. Through extensive experiments on the standard large-scale TREC Genomics data sets, the TREC HARD data sets and the Microsoft EntityCube Web collections, the proposed extended models beyond probabilistic information retrieval show their effectiveness and superiority.
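The fusion methods named above can be sketched in their basic forms. CombSUM adds each document's normalized scores across runs, and CombMNZ multiplies that sum by the number of runs that retrieved the document. The min-max normalization, the example runs, and the reading of "reciprocal" as reciprocal rank fusion with the commonly used constant k = 60 are assumptions; the thesis's modified and expanded versions are not reproduced.

```python
from collections import defaultdict
from typing import Dict, List

Run = Dict[str, float]  # doc id -> retrieval score from one source

def min_max(run: Run) -> Run:
    """Normalize one run's scores to [0, 1] so that runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 1.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}

def comb_sum(runs: List[Run]) -> Dict[str, float]:
    fused: Dict[str, float] = defaultdict(float)
    for run in runs:
        for d, s in min_max(run).items():
            fused[d] += s
    return dict(fused)

def comb_mnz(runs: List[Run]) -> Dict[str, float]:
    hits: Dict[str, int] = defaultdict(int)
    for run in runs:
        for d in run:
            hits[d] += 1  # number of runs that retrieved d
    return {d: s * hits[d] for d, s in comb_sum(runs).items()}

def reciprocal_rank(runs: List[Run], k: int = 60) -> Dict[str, float]:
    fused: Dict[str, float] = defaultdict(float)
    for run in runs:
        for r, d in enumerate(sorted(run, key=run.get, reverse=True), start=1):
            fused[d] += 1.0 / (k + r)
    return dict(fused)

# Hypothetical runs from two sources with incompatible score scales.
run_a = {"d1": 12.0, "d2": 8.0, "d3": 2.0}
run_b = {"d2": 9.0, "d3": 7.0, "d4": 1.0}
for name, fuse in [("CombSUM", comb_sum), ("CombMNZ", comb_mnz), ("RRF", reciprocal_rank)]:
    fused = fuse([run_a, run_b])
    print(name, sorted(fused, key=fused.get, reverse=True))
# CombSUM ['d2', 'd1', 'd3', 'd4']
# CombMNZ ['d2', 'd3', 'd1', 'd4']
# RRF ['d2', 'd3', 'd1', 'd4']
```

With these toy runs, CombSUM ranks `d1` above `d3`, whereas CombMNZ and the rank-based fusion promote `d3` because both sources retrieved it.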

    The development of a model of information seeking behaviour of students in higher education when using internet search engines.

    This thesis develops a model of the Web information seeking behaviour of postgraduate students, with a specific focus on the use of Web search engines. It extends Marchionini's eight-stage model of information seeking, geared towards electronic environments, to holistically encompass the physical, cognitive, affective and social dimensions of Web users' behaviour. The study recognises the uniqueness of the Web environment as a vehicle for information dissemination and retrieval, draws on the distinction between information searching and information seeking, and emphasises the importance of following user-centred, holistic approaches to study information seeking behaviour. It reviews the research in the field and demonstrates that there is no comprehensive model that explains the behaviour of Web users when employing search engines for information retrieval. The methods used to develop the study are explained, with a detailed analysis of the four dimensions of information seeking (physical, cognitive, affective, social). Emphasis is placed on the significance of combined methods (qualitative and quantitative) and the ways in which they can enrich the examination of human behaviour. This part concludes with a discussion of methodological issues. The study is supported by an empirical investigation that examines the relationship between interactive information retrieval using Web search engines and human information seeking processes. This investigation examines the influence of cognitive elements (such as learning and problem style, and creative ability) and affective characteristics (e.g. confidence, loyalty, familiarity, ease of use), as well as the role that system experience, domain knowledge and demographics play in information seeking behaviour and in users' overall satisfaction with the retrieval results. The influence of these factors is analysed by identifying users' patterns of behaviour and the tactics they adopt to solve specific problems.
    The findings of the empirical study are incorporated into an enriched information-seeking model, encompassing the use of search engines, which reveals a complex interplay between physical, cognitive, affective and social elements and shows that none of these characteristics can be seen in isolation when attempting to explain the complex phenomenon of information seeking behaviour. Although the model is presented in a linear fashion, the dynamic, reiterative and circular character of the information seeking process is explained through an emphasis on transition patterns between the different stages. The research concludes with a discussion of the problems encountered by Web information seekers, providing a detailed analysis of the reasons why users express satisfaction or dissatisfaction with the results of Web searching. It identifies areas in which Web search engines can be improved, as well as issues related to the need for students to be given additional training and support. These include planning and organising information, recognising different dimensions of information intents and needs, emphasising the importance of variety in Web information seeking, promoting effective query formulation and ranking, reducing information overload, and assisting the effective selection of Web sites and the critical examination of results.

    Towards effective cross-lingual search of user-generated internet speech

    Get PDF
    The very rapid growth of user-generated social spoken content on online platforms is creating new challenges for Spoken Content Retrieval (SCR) technologies. There are many potential choices for how to design a robust SCR framework for user-generated speech (UGS) content, but the current lack of detailed investigation means that the specific challenges are poorly understood and little or no guidance is available to inform these choices. This thesis investigates the challenges of effective SCR for UGS content and proposes novel SCR methods designed to cope with them. The work presented in this thesis can be divided into three areas of contribution. The first contribution is a critique of the issues and challenges that influence the effectiveness of searching UGS content in both mono-lingual and cross-lingual settings. The second contribution is an effective Query Expansion (QE) method for UGS. This research reports that the variation in the length, quality and structure of the relevant documents encountered in UGS content can harm the effectiveness of QE techniques across different queries. Seeking to address this issue, this work examines the use of Query Performance Prediction (QPP) techniques for improving QE in UGS, and presents a novel framework specifically designed for predicting the effectiveness of QE. Third, this work extends the use of QPP in UGS search to improve cross-lingual search for UGS by predicting translation effectiveness. The thesis proposes novel methods to estimate the quality of translation for cross-lingual UGS search. An empirical evaluation demonstrates the quality of the proposed method on alternative translation outputs extracted from several Machine Translation (MT) systems developed for this task.
    The research then shows how this framework can be integrated into cross-lingual UGS search to find relevant translations for improved retrieval performance.
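As a rough illustration of using a query performance predictor to choose among alternative translations, the sketch below scores each candidate query with a simple pre-retrieval predictor (average inverse document frequency, with out-of-vocabulary terms scored as zero) and keeps the highest-scoring candidate. The predictor, the collection statistics and the candidate translations are hypothetical; the thesis's own QPP framework is not described in this abstract.

```python
import math
from typing import Dict, List

def idf(term: str, doc_freq: Dict[str, int], n_docs: int) -> float:
    """IDF of an in-vocabulary term; out-of-vocabulary terms score 0, so a
    mistranslated word that never occurs in the collection adds nothing."""
    df = doc_freq.get(term, 0)
    return math.log(n_docs / df) if df else 0.0

def avg_idf(query: List[str], doc_freq: Dict[str, int], n_docs: int) -> float:
    """Pre-retrieval predictor: higher values suggest a more specific query."""
    return sum(idf(t, doc_freq, n_docs) for t in query) / len(query) if query else 0.0

def pick_translation(candidates: Dict[str, List[str]],
                     doc_freq: Dict[str, int], n_docs: int) -> str:
    """Return the name of the candidate translation with the highest predicted quality."""
    return max(candidates, key=lambda name: avg_idf(candidates[name], doc_freq, n_docs))

# Hypothetical collection statistics and outputs of three MT systems for one query.
n_docs = 1000
doc_freq = {"speech": 50, "retrieval": 30, "the": 990, "video": 200, "talk": 100}
candidates = {
    "mt_a": ["speech", "retrieval"],
    "mt_b": ["the", "talk", "retreival"],  # contains an out-of-vocabulary typo
    "mt_c": ["video", "talk"],
}
print(pick_translation(candidates, doc_freq, n_docs))  # mt_a
```

The same predictor value could equally be thresholded to decide whether to apply query expansion to a given query; which predictor works best for UGS content is exactly the kind of question the thesis studies.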

    Using Search Term Positions for Determining Document Relevance

    Technological advances in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information. In particular, textual information is increasingly available in electronic form. Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only the pieces of information relevant to their needs. Traditional information retrieval (IR) methods based on search term frequency have, to some extent, reached their limits, and newer ranking methods based on hyperlink information are not applicable to unlinked documents. Retrieving documents based on the positions of search terms within them has the potential to yield improvements, because the other terms in the environment where a search term appears (i.e. its neighborhood) are taken into account. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term. However, the additional analysis required makes position-based methods slower than methods based on term frequency, and they need more storage to save term positions. These drawbacks directly affect the performance of the most user-critical phase of the retrieval process, namely query evaluation time, which explains the scarce use of positional information in contemporary retrieval systems. This thesis explores the possibility of extending traditional information retrieval systems with positional information efficiently, so that term positions can be handled at query evaluation time to optimize retrieval performance. To this end, several abstract representations of term positions are investigated for efficiently storing and operating on term positional data.
    In the Gauss model, descriptive statistics methods are used to estimate term positional information, because they reduce the influence of outliers and irregularities in the data. The Fourier model uses Fourier series to represent positional information. In the Hilbert model, functional analysis methods are used to provide reliable estimates of term positions and simple mathematical operators to handle positional data. The proposed models are evaluated experimentally using standard resources of the IR research community (the Text Retrieval Conference, TREC). All experiments demonstrate that the use of positional information can enhance the quality of search results, and the suggested models outperform state-of-the-art retrieval utilities. The term position models open new possibilities for analyzing and handling textual data; for instance, document clustering and compression of positional data based on these models could be interesting topics for future research.
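One way to picture a compact statistical representation of term positions, in the spirit of the Gauss model, is to store only the mean and standard deviation of each term's positions in a document and to compare two query terms through a Gaussian kernel on those summaries. This is a hypothetical sketch: the proximity formula, the `min_sigma` floor, the function names and the example documents are assumptions, not the thesis's definitions.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def gauss_model(tokens: List[str], min_sigma: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Summarize each term's positions by (mean, standard deviation).

    min_sigma avoids a zero spread for terms that occur only once."""
    positions: Dict[str, List[int]] = {}
    for i, term in enumerate(tokens):
        positions.setdefault(term, []).append(i)
    return {t: (float(mean(p)), max(pstdev(p), min_sigma)) for t, p in positions.items()}

def proximity(model: Dict[str, Tuple[float, float]], t1: str, t2: str) -> float:
    """Gaussian kernel in (0, 1]: 1 when the mean positions coincide, decaying
    with their distance relative to the combined spread; 0 if a term is absent."""
    if t1 not in model or t2 not in model:
        return 0.0
    (m1, s1), (m2, s2) = model[t1], model[t2]
    return math.exp(-((m1 - m2) ** 2) / (2 * (s1 ** 2 + s2 ** 2)))

# Hypothetical documents: the two query terms are adjacent in one, far apart in the other.
near = gauss_model("fast positional index for search".split())
far = gauss_model("positional data is stored separately from the final index".split())
print(proximity(near, "positional", "index"), proximity(far, "positional", "index"))
```

Keeping two numbers per term instead of a full position list is what makes such a representation cheaper to store and combine at query time, at the cost of losing exact positions for terms that occur in several places.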