42 research outputs found

    COSPO/CENDI Industry Day Conference

    Get PDF
    The conference's objective was to provide a forum where government information managers and industry information technology experts could openly exchange ideas, discuss their respective needs, and compare those needs with available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed, heterogeneous, and large database support; and communications - speed, bandwidth, and wireless.

    Managing tail latency in large scale information retrieval systems

    Get PDF
    As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem - how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high percentile latency that is observed from a system - in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence, the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions which more accurately capture user experience such as "how many queries take more than 200ms to return answers?" or "what is the worst case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations, and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness. 
We then propose and solve a new problem: processing a number of related queries together, known as multi-queries, to yield higher-quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs of each. Ultimately, we find that some solutions yield a low tail latency and are hence suitable for use in real-time search environments. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.
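The shift the abstract describes, from median measurements to questions like "how many queries take more than 200 ms?", can be made concrete with a small sketch (not from the dissertation; the function names and the 200 ms threshold are illustrative):

```python
import math

def tail_latency(latencies_ms, percentile=99.0):
    """Nearest-rank percentile of observed per-query latencies (ms).

    The P99 latency is the smallest observed value such that at least
    99% of queries completed at or below it.
    """
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * percentile / 100.0)
    return ordered[max(rank, 1) - 1]

def slow_fraction(latencies_ms, threshold_ms=200.0):
    """Fraction of queries slower than a service-level threshold."""
    return sum(1 for t in latencies_ms if t > threshold_ms) / len(latencies_ms)

# A workload where 95% of queries are fast and 5% are slow: the median
# looks healthy, but the tail exposes the worst-case user experience.
workload = [50] * 95 + [300] * 5
print(tail_latency(workload, 99.0))   # 300
print(slow_fraction(workload, 200.0))  # 0.05
```

The point of the sketch is that averages hide the 5% of slow queries entirely, while the P99 figure surfaces them directly.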

    Efficient and effective retrieval using Higher-Order proximity models

    Get PDF
    Information Retrieval systems are widely used to retrieve documents that are relevant to a user's information need. Systems leveraging proximity heuristics to estimate the relevance of a document have been shown to be effective. However, the computational cost of proximity-based models is rarely considered, which is an important concern over large-scale document collections. Large-scale collections also make collection-based evaluation challenging, since only a small number of documents can be judged on a limited budget. Effectiveness, efficiency, and reliable evaluation are complementary components that should all be considered when developing a good retrieval system, and this thesis makes contributions to each of the three. Many proximity-based retrieval models are effective, but it is also important to find efficient ways to extract proximity features, especially for models using higher-order proximity statistics. We therefore propose a one-pass algorithm based on the PlaneSweep approach. We demonstrate that the new one-pass algorithm reduces the cost of capturing the full dependency relation of a query, regardless of the input representation. Although our proposed methods can capture higher-order proximity features efficiently, the trade-offs between effectiveness and efficiency when using proximity-based models remain largely unexplored. We consider different variants of proximity statistics and demonstrate that using local proximity statistics achieves an improved trade-off between effectiveness and efficiency. Another important aspect of IR is reliable system comparison. We conduct a series of experiments that explore the interaction between pooling and evaluation depth, the interaction between evaluation metrics and evaluation depth, and the correlation between two different evaluation metrics.
We show that different evaluation configurations on large test collections, where only a limited number of relevance labels are available, can lead to different system-comparison conclusions. We also demonstrate the pitfalls of choosing an arbitrary evaluation depth regardless of the metrics employed and the pooling depth of the test collections. Lastly, we provide suggestions on evaluation configurations for reliable comparison of retrieval systems on large test collections. On these large test collections, a shallow judgment pool is often employed because assessment budgets are limited, which may lead to an imprecise evaluation of system performance, especially when a deep evaluation metric is used. We propose a framework for estimating deep metric scores from shallow judgment pools. Starting from an initial shallow judgment pool, rank-level estimators estimate the effectiveness gain at each rank. Based on these rank-level estimates, we propose an optimization framework to obtain a more precise score estimate.
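As an illustration of the kind of positional sweep the PlaneSweep family of algorithms relies on, here is a minimal sketch (my own, not the thesis's algorithm) that finds the smallest window of a document covering every query term at least once, given each term's sorted position list:

```python
def min_cover_window(positions):
    """Smallest document span containing every query term at least once.

    `positions` maps each query term to its sorted list of token
    offsets within one document. Returns (start, end) offsets, or
    None if some term does not occur in the document.
    """
    if not positions or any(not offs for offs in positions.values()):
        return None
    # Merge all (offset, term) pairs into one position-ordered stream,
    # then sweep a shrinking window over it.
    stream = sorted((off, term)
                    for term, offs in positions.items()
                    for off in offs)
    need = len(positions)          # distinct terms required in the window
    count = {}                     # term -> occurrences inside the window
    have = 0                       # distinct terms currently covered
    left = 0
    best = None
    for off, term in stream:
        count[term] = count.get(term, 0) + 1
        if count[term] == 1:
            have += 1
        # Window covers all terms: shrink from the left while it still does.
        while have == need:
            start = stream[left][0]
            if best is None or off - start < best[1] - best[0]:
                best = (start, off)
            left_term = stream[left][1]
            count[left_term] -= 1
            if count[left_term] == 0:
                have -= 1
            left += 1
    return best

# Terms "a", "b", "c" with these offsets: tightest cover is tokens 1..6.
print(min_cover_window({"a": [1, 10], "b": [4], "c": [6, 20]}))  # (1, 6)
```

Proximity models score tighter windows higher; the sweep above touches each posting once after sorting, which is the efficiency property one-pass algorithms of this kind aim for.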

    Satellite Networks: Architectures, Applications, and Technologies

    Get PDF
    Since global satellite networks are moving to the forefront in enhancing the national and global information infrastructures, owing to communication satellites' unique networking characteristics, a workshop was organized to assess the progress made to date and chart the future. The workshop provided the forum to assess the current state of the art, identify key issues, and highlight emerging trends in next-generation architectures, data protocol development, communication interoperability, and applications. Presentations covering overviews, the state of the art in research, development, deployment, and applications, and future trends in satellite networks are assembled.

    Interrelation of the Term Extraction and Query Expansion techniques applied to the retrieval of textual documents

    Get PDF
    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Engineering and Knowledge Management. According to Singhal (2006), people recognise the importance of storing and searching for information and, with the advent of computers, it became possible to store large amounts of it in databases. Consequently, cataloguing the information in these databases became essential. In this context, the field of Information Retrieval emerged in the 1950s with the aim of building computational tools that let users exploit these databases more efficiently. The main objective of the present research is to develop a computational model for retrieving textual documents ranked by semantic similarity, based on the intersection of the Term Extraction and Query Expansion techniques.

    Personalisation Concepts for Digital Libraries

    Get PDF
    A decisive drawback of today's digital library solutions is their impersonal character. Users of a digital library want, for example, to define their own view of the holdings, annotate documents, or be notified when data of interest to them changes. This diploma thesis therefore presents concepts for personalising digital libraries and integrates them, in the form of services, into the architecture of an existing digital library. The implementation of several of these services concludes the thesis.

    Efficient Source Selection For SPARQL Endpoint Query Federation

    Get PDF
    The Web of Data has grown enormously over the last years. Currently, it comprises a large compendium of linked and distributed datasets from multiple domains. Due to the decentralised architecture of the Web of Data, several of these datasets contain complementary data. Running complex queries on this compendium thus often requires accessing data from different data sources within one query. The abundance of datasets and the need to run complex queries have thus motivated a considerable body of work on SPARQL query federation systems, the dedicated means to access data distributed over the Web of Data. This thesis addresses two key areas of federated SPARQL query processing: (1) efficient source selection, and (2) comprehensive SPARQL benchmarks to test and rank federated SPARQL engines as well as triple stores. Efficient Source Selection: Efficient source selection is one of the most important optimization steps in federated SPARQL query processing. An overestimation of query-relevant data sources increases network traffic, results in irrelevant intermediate results, and can significantly affect the overall query processing time. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple-pattern-wise source selection has received little attention, as has the effect of duplicated data on federated querying. This thesis presents HiBISCuS and TBSS, novel hypergraph-based source selection approaches, and DAW, a duplicate-aware source selection approach for federated querying over the Web of Data. Each of these approaches can be combined directly with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We combined the three source selection approaches (HiBISCuS, DAW, and TBSS) with query rewriting to form a complete SPARQL query federation engine named Quetsal.
Furthermore, we present TopFed, a Cancer Genome Atlas (TCGA) tailored federated query processing engine that exploits the data distribution to perform intelligent source selection while querying over large TCGA SPARQL endpoints. Finally, we address the issue of rights management and privacy while accessing sensitive resources. To this end, we present SAFE: a global source selection approach that enables decentralised, policy-aware access to sensitive clinical information represented as distributed RDF Data Cubes. Comprehensive SPARQL Benchmarks: Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate federated SPARQL engines and triple stores, they mostly provide a one-size-fits-all solution to the benchmarking problem. This approach is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. The fitness of current SPARQL query federation approaches for real applications is difficult to evaluate with current benchmarks, as these are either synthetic or too small in size and complexity. Furthermore, state-of-the-art federated SPARQL benchmarks have mostly focused on a single performance criterion, i.e., the overall query runtime, and thus cannot provide a fine-grained evaluation of the systems. We address these drawbacks by presenting FEASIBLE, an automatic approach for generating benchmarks out of the query history of applications, i.e., query logs, and LargeRDFBench, a billion-triple benchmark for SPARQL query federation that encompasses real data as well as real queries pertaining to real biomedical use cases. Our evaluation results show that HiBISCuS, TBSS, TopFed, DAW, and SAFE can all significantly reduce the total number of sources selected and thus improve overall query performance.
In particular, TBSS is the first source selection approach to keep the overall overestimation of relevant sources below 5%. Quetsal reduces the number of sources selected (without losing recall), the source selection time, and the overall query runtime compared to state-of-the-art federation engines. The LargeRDFBench evaluation results suggest that the performance of current SPARQL query federation systems on simple queries does not reflect the systems' performance on more complex queries. Moreover, current federation systems seem unable to deal with many of the challenges that await them in the age of Big Data. Finally, the FEASIBLE evaluation results show that it generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings that partly differ from the rankings generated in previous works.
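For readers unfamiliar with the triple-pattern-wise baseline that HiBISCuS, TBSS, and DAW improve upon, a minimal sketch follows (illustrative only; the predicate-index structure and all names are my assumptions, not Quetsal's actual implementation). Each triple pattern of a federated query is matched independently against an index of which endpoints hold which predicates:

```python
def select_sources(triple_patterns, predicate_index):
    """Baseline triple-pattern-wise source selection.

    `triple_patterns` is a list of (subject, predicate, object)
    strings, with variables written as '?name'. `predicate_index`
    maps each predicate IRI to the set of endpoints known to hold
    triples with that predicate. Returns, per pattern, the set of
    endpoints that must be contacted for it.
    """
    all_sources = set().union(*predicate_index.values()) if predicate_index else set()
    selection = {}
    for s, p, o in triple_patterns:
        if p.startswith("?"):
            # Unbound predicate: every endpoint could contribute.
            selection[(s, p, o)] = set(all_sources)
        else:
            selection[(s, p, o)] = set(predicate_index.get(p, ()))
    return selection

# Two endpoints, A and B; only B holds dbo:author triples.
index = {"foaf:name": {"A", "B"}, "dbo:author": {"B"}}
query = [("?x", "foaf:name", "?n"), ("?x", "dbo:author", "?a")]
print(select_sources(query, index))
```

Because each pattern is matched in isolation, this baseline routinely overestimates: endpoint A is selected for the first pattern even though the join with the second pattern can only succeed at B. Pruning exactly this kind of join-unaware overestimation is what the hypergraph-based approaches described above target.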

    Understanding the determinants of evaluation, adoption and routinisation of ERP technology (Enterprise Resource Planning) in the context of agricultural farms

    Get PDF
    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Information and Decision Systems. The purpose of this thesis is to investigate the determinants of the adoption of ERP (Enterprise Resource Planning) technology on agricultural farms in Brazil. The data were collected in 502 personal interviews with farmers of soy, corn, cotton, coffee, beans, wheat, peanuts, fruits, and sugarcane, and with cattle raisers. The data-gathering instrument used for the quantitative research was built on the results of the qualitative study in combination with three theories: Diffusion of Innovation Theory (DOI), the Technology-Organization-Environment Framework (TOE), and Interorganizational Relations (IORs). Structural Equation Modelling (SEM) was used to analyse the data and test the hypotheses. The results identify the significant drivers of evaluation, adoption, and routinisation. We also analysed the impact of ERP on farm performance based on the resource-based view (RBV). We hope this work makes a theoretical and practical contribution to the agribusiness field and stimulates debate about cloud-computing platforms based on ERP, Enterprise 2.0, and Industry 4.0. The results of this thesis provide information that helps agribusiness owners, managers, and administrators promote and incentivise the use of ERP.