
    Evaluating Entity Relationship Recommenders in a Complex Information Retrieval Context

    Information Retrieval, as a field, has long subscribed to an orthodox evaluation approach known as the Cranfield paradigm. This approach and the assumptions that underpin it have been essential to building the traditional search engine infrastructure that drives today’s modern information economy. In order to build the information economy of tomorrow, however, we must be prepared to reexamine these assumptions and create new, more sophisticated standards of evaluation to match the more complex information retrieval systems on the horizon. In this thesis, we begin this introspective process and introduce our own evaluation method for one of these complex IR systems: entity-relationship recommenders. We will begin building a new user model adapted to the needs of a different user experience. To support these endeavors, we will also conduct a study with a mockup of our complex system to collect real behavior data and evaluation results. By the end of this work, we present a new evaluative approach for one kind of entity-relationship system and point the way for other advanced systems to come.

    Evaluation in audio music similarity

    Audio Music Similarity is a task within Music Information Retrieval that deals with systems that retrieve songs musically similar to a query song according to their audio content. Evaluation experiments are the main scientific tool in Information Retrieval to determine which systems work better and to advance the state of the art accordingly. It is therefore essential that the conclusions drawn from these experiments are both valid and reliable, and that we can reach them at a low cost. This dissertation studies these three aspects of evaluation experiments for the particular case of Audio Music Similarity, with the general goal of improving how these systems are evaluated. The traditional paradigm for Information Retrieval evaluation based on test collections is approached as a statistical estimator of certain probability distributions that characterize how users employ systems. In terms of validity, we study how well the measured system distributions correspond to the target user distributions, and how this correspondence affects the conclusions we draw from an experiment. In terms of reliability, we study the optimal characteristics of test collections and statistical procedures, and in terms of efficiency we study models and methods to greatly reduce the cost of running an evaluation experiment.
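
    As an illustration of the estimator view described above, the following minimal sketch (with an invented score distribution; none of these numbers or names come from the dissertation) treats a test collection as a finite sample of queries and shows how the measured mean effectiveness behaves as an estimate of a user-level mean: validity concerns estimating the right quantity, reliability concerns its variance, and efficiency concerns the cost of the sample.

        import random
        import statistics

        random.seed(42)

        # Hypothetical per-query score distribution standing in for how a
        # population of users would rate one system; the "true" system
        # quality is the mean of this distribution.
        def user_score():
            return min(1.0, max(0.0, random.gauss(0.65, 0.20)))

        TRUE_MEAN = statistics.mean(user_score() for _ in range(200_000))

        # A test collection is a finite sample of queries, so the measured
        # mean is an estimator of the user-level mean: larger collections
        # give lower variance (more reliable conclusions) at a higher cost.
        for n_queries in (10, 50, 250):
            estimates = [statistics.mean(user_score() for _ in range(n_queries))
                         for _ in range(1_000)]
            print(f"{n_queries:4d} queries: "
                  f"bias={statistics.mean(estimates) - TRUE_MEAN:+.4f}, "
                  f"std. error={statistics.stdev(estimates):.4f}")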

    Sustained growth in small enterprises: a process management approach

    This thesis illustrates that, given the necessary resources and a structured Business Growth Framework, Small and Medium Enterprises can lay the foundation for sustained growth. The author investigated the essence of Small and Medium Enterprises, conducted a literature review of SME growth, and asserted the importance of applying structure to business processes in achieving sustainable business growth. The author introduced the SME business process structure deficit, assessed its implications for business growth, and elaborated that the deficit can be addressed through the methodical application of six internationally accepted UK initiatives already available in the SME domain. The thesis establishes the characteristics of Business Growth for SMEs, leading to the development of a Business Growth Framework based upon a defined set of business processes. The framework supports business growth by providing diagnostic assessment of business process performance, process-specific improvements embracing better practice through the innovative application of, for example, DTI publications, and internal benchmarking linking, if desired, to the UK Benchmarking Index. The resulting Business Growth Framework, along with the Business Growth Framework Implementation Methodology, evolved during this research and are the key tools for sustained business growth developed by the author and discussed in this thesis. The benefits of close integration of financial and manufacturing systems, such as ERP, with business processes are also discussed. The author demonstrated that Business Growth could successfully occur amongst Small and Medium Enterprises if approached through a structured methodology. Intentionally, no new and complex business models have been proposed; the research showed that there is sufficient literature available in this area already.

    What is the influence of genre during the perception of structured text for retrieval and search?

    This thesis presents an investigation into the value of structured text (or form) in the context of genre within Information Retrieval; in particular, how are these structured texts perceived, and why are they not more heavily used within the Information Retrieval & Search communities? The main motivation is to show the features through which people can exploit genre within Information Search & Retrieval, in particular in categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) which reported pilot studies consisting of genre categorisation and information searching. Both studies, and other findings within the literature review, inspired the work contained within this thesis. Genre is notoriously hard to define, but a very useful framework of Purpose and Form, developed by Yates & Orlikowski (1992), was utilised to design the two user studies for the research reported within the thesis. The two studies consisted of, first, a categorisation task (e-mails) and, second, a set of six simulated situations in Wikipedia, both of which collected quantitative data from eye tracking experiments as well as qualitative user data. The results of both studies showed the extent to which the participants utilised the form features of the stimuli presented: in particular, how these were used, which ocular behaviours (skimming or scanning) and actual features were used, and which were the most important. The main contributions to research made by this thesis are, first, that the task-based user evaluations employing simulated search scenarios revealed how and why users make decisions while interacting with the textual features of structure and layout within a discourse community, and, second, that an extensive evaluation of the quantitative data revealed the features that were used by the participants in the user studies, the effects of the interpretation of genre in the search and categorisation process, and the perceptual processes used in the various communities. This will be of benefit for the redevelopment of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, the value of form, the perception of features, and the layout of genre using eye tracking in online communities such as Wikipedia.

    Identifying the critical success factors to improve information security incident reporting

    There is a perception amongst security professionals that the true scale of information security incidents is unknown due to under-reporting. This potentially leads to an absence of sufficient empirical incident report data to enable informed risk assessment and risk management judgements. As a result, there is a real possibility that decisions related to resourcing and expenditure may be focussed only on what is believed to be occurring, based on those incidents that are reported. There is also an apparent shortage of research into the subject of information security incident reporting. This research examines whether this perception is valid and the potential reasons for such under-reporting. It also examines the viability of re-using research into incident reporting conducted elsewhere, for example in the healthcare sector. Following a review of the existing security-related incident reporting research, together with incident reporting in general, a scoping study using a group of information security professionals from a range of business sectors was undertaken. This identified a strong belief that security incidents were significantly under-reported and that research from other sectors did have the potential to be applied across sectors. A conceptual framework was developed, upon which was based a proposal that incident reporting could be improved through the identification of Critical Success Factors (CSFs). A Delphi study was conducted across two rounds to seek consensus from information security professionals on those CSFs. The thesis confirms the concerns that there is under-reporting and, through the Delphi study, identifies a set of CSFs required to improve security incident reporting. An Incident Reporting Maturity Model (IRMM) was subsequently designed as a method for assisting organisations in judging their position against these factors, and was tested using the same Delphi participants as well as a control group. The thesis demonstrates a contribution to research through the rigorous testing of the applicability of incident reporting research from other sectors to support the identification of solutions to improve reporting in the information security sector. It also provides a practical, novel approach that combines the CSFs with the IRMM, allowing organisations to judge their level of maturity against each of the four CSFs and to make changes to strategy and process accordingly.
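
    As a rough illustration only, the sketch below shows one way an organisation's position against a small set of critical success factors could be recorded and compared; the factor names and the five-point scale are invented placeholders, not the four CSFs or the IRMM defined in the thesis.

        from dataclasses import dataclass

        # Placeholder factors and scale; the actual four CSFs and maturity
        # levels come from the thesis' Delphi study and are not reproduced here.
        CSFS = ["management_commitment", "reporting_culture",
                "defined_process", "feedback_to_reporters"]
        LEVELS = {1: "initial", 2: "repeatable", 3: "defined",
                  4: "managed", 5: "optimising"}

        @dataclass
        class MaturityAssessment:
            scores: dict  # CSF name -> maturity level 1..5

            def weakest(self):
                """The factor with the lowest maturity, i.e. the improvement priority."""
                return min(self.scores, key=self.scores.get)

            def report(self):
                for csf in CSFS:
                    level = self.scores[csf]
                    print(f"{csf:25s} level {level} ({LEVELS[level]})")
                print(f"improvement priority: {self.weakest()}")

        MaturityAssessment({"management_commitment": 4, "reporting_culture": 2,
                            "defined_process": 3, "feedback_to_reporters": 2}).report()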

    Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility

    Events like hurricanes, earthquakes, or accidents can impact a large number of people. Not only are people in the immediate vicinity of the event affected, but concerns about their well-being are shared by the local government and well-wishers across the world. The latest information about news events could be of use to government and aid agencies in order to make informed decisions on providing necessary support, security and relief. The general public receives news updates via dedicated news feeds or broadcasts, and lately via social media services like Facebook or Twitter. Retrieving the latest information about newsworthy events from the world-wide web is thus of importance to a large section of society. As new content on a multitude of topics is continuously being published on the web, specific event-related information needs to be filtered from the resulting stream of documents. We present in this thesis a user-centric evaluation measure for evaluating systems that filter news-related information from document streams. Our proposed evaluation measure, Modeled Stream Utility (MSU), models users accessing information from a stream of sentences produced by a news update filtering system. The user model allows for simulating a large number of users with different characteristic stream browsing behavior. Through simulation, MSU estimates the utility of a system for an average user browsing a stream of sentences. Our results show that system performance is sensitive to a user population's stream browsing behavior and that existing evaluation metrics correspond to very specific types of user behavior. To evaluate systems that filter sentences from a document stream, we need a set of judged sentences. This judged set is a subset of all the sentences returned by all systems, and is typically constructed by pooling together the highest quality sentences, as determined by the respective system-assigned scores for each sentence. Sentences in the pool are manually assessed and the resulting set of judged sentences is then used to compute system performance metrics. In this thesis, we investigate the effect of including duplicates of judged sentences in the judged set on system performance evaluation. We also develop an alternative pooling methodology that, given the MSU user model, selects sentences for pooling based on the probability of a sentence being read by modeled users. Our research lays the foundation for interesting future work on utilizing user models in different aspects of the evaluation of stream filtering systems. The MSU measure enables the incorporation of different user models, and its applicability could be extended through calibration based on observed user behavior.
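
    The following toy simulation sketches the core idea behind a measure like MSU under an assumed user model; the persistence distribution, gain values, and function names are illustrative assumptions only, not the calibrated model from the thesis.

        import random

        random.seed(7)

        # A system's output: a stream of sentences, each with a gain
        # (e.g. a novel, relevant nugget = 1.0; redundant or off-topic = 0.0).
        stream_gains = [1.0, 0.0, 0.5, 1.0, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0]

        def simulate_user(gains, persistence):
            """One simulated user reads the stream top-down and stops with
            probability (1 - persistence) after every sentence."""
            utility = 0.0
            for gain in gains:
                utility += gain
                if random.random() > persistence:
                    break
            return utility

        def modeled_stream_utility(gains, n_users=10_000):
            """Average utility over a simulated population whose browsing
            persistence is drawn from an assumed distribution (uniform here)."""
            total = 0.0
            for _ in range(n_users):
                total += simulate_user(gains, random.uniform(0.5, 0.95))
            return total / n_users

        print(f"MSU-style utility estimate: {modeled_stream_utility(stream_gains):.3f}")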

    Using flight data in Bayesian networks and other methods to quantify airline operational risks.

    The risk assessment methods used in airline operations are usually qualitative rather than quantitative, despite the routine collection of vast amounts of safety data through programmes such as flight data monitoring (FDM). The overall objective of this research is to exploit airborne recorded flight data to provide enhanced operational safety knowledge and quantitative risk assessments. Runway veer-off at landing, accounting for over 10% of air transport incidents and accidents, is used as an example risk. Literature on FDM, risk assessment and veer-off accidents is reviewed, leading to the identification of three potential areas for further examination: variability in operational parameters as a measure of risk; measures of workload derived from flight data as a measure of risk; and Bayesian networks. Methods relating to variability and workload are briefly explored and preliminary results are presented, before the main methods of the thesis relating to Bayesian networks are introduced. The literature shows that Bayesian networks are a suitable method for quantifying risk, and a causal network for lateral deviation at landing is developed based on accident investigation data. Flight data from over 300,000 flights are used to provide empirical probabilities for causal factors, and data for some causal factors are modelled to estimate the probabilities of extreme events. As an alternative to predefining the Bayesian network structure from accident data, a series of networks is learnt from flight data and an assessment is made of the performance of different learning algorithms, such as Bayesian Search and Greedy Thick Thinning. Finally, a network with parameters and structure learnt from flight data is adapted to incorporate causal knowledge from accident data, and the performance of the resulting “combined” network is assessed. All three types of network were able to use flight data to calculate relative probabilities of a lateral deviation event, given different scenarios of causal factors present and for different airports; however, the “combined” approach is preferred due to the relative ease of running scenarios for different airports and the avoidance of the lengthy process of modelling data for causal factor nodes. The preferred method provides airlines with a practicable way to use their existing flight data to quantify operational risks. The resulting quantitative risk assessments could be used to provide pilots with enhanced pre-flight briefings and to provide airlines with up-to-date risk information on operations to different airports, as well as enhanced safety oversight.
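
    A small hand-rolled sketch of the general idea: the nodes, probabilities, and structure below are invented for illustration (the thesis derives them from accident investigation data and FDM records), but it shows how a causal network can turn per-factor probabilities into a relative risk of a lateral-deviation event under different scenarios.

        from itertools import product

        # Invented priors for two causal factors (probabilities per landing).
        P_CROSSWIND = {True: 0.08, False: 0.92}   # strong crosswind present
        P_UNSTABLE  = {True: 0.03, False: 0.97}   # unstable approach

        # Invented conditional probability table: P(lateral deviation | factors).
        P_DEVIATION = {
            (True,  True):  0.020,
            (True,  False): 0.004,
            (False, True):  0.002,
            (False, False): 0.0001,
        }

        def p_deviation(evidence=None):
            """Marginal P(deviation), optionally fixing observed causal factors,
            e.g. evidence={"crosswind": True} for a windy-airport scenario."""
            evidence = evidence or {}
            total = 0.0
            for crosswind, unstable in product([True, False], repeat=2):
                if evidence.get("crosswind", crosswind) != crosswind:
                    continue
                if evidence.get("unstable", unstable) != unstable:
                    continue
                weight = 1.0
                if "crosswind" not in evidence:
                    weight *= P_CROSSWIND[crosswind]
                if "unstable" not in evidence:
                    weight *= P_UNSTABLE[unstable]
                total += weight * P_DEVIATION[(crosswind, unstable)]
            return total

        baseline = p_deviation()
        windy = p_deviation({"crosswind": True})
        print(f"baseline risk per landing: {baseline:.5f}")
        print(f"strong-crosswind scenario: {windy:.5f} ({windy / baseline:.1f}x baseline)")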

    Scalability of findability: decentralized search and retrieval in large information networks

    Amid the rapid growth of information today is the increasing challenge for people to navigate and make sense of its magnitude. The dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. Collecting information in advance and centralizing IR operations are hardly possible because systems are dynamic and information is distributed. While monolithic search systems continue to struggle with today's scalability problems, the future of search likely requires a decentralized architecture in which many information systems can participate. As individual systems interconnect to form a global structure, finding relevant information in distributed environments transforms into a problem concerning not only information retrieval but also complex networks. Understanding network connectivity will provide guidance on how decentralized search and retrieval methods can function in these information spaces. The dissertation studies one aspect of the scalability challenges facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large-scale networks. It focuses on the impact of network structure on search performance and investigates a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit. Experiments involving large-scale benchmark collections provide evidence of the Clustering Paradox in the IR context. In an increasingly large, distributed environment, decentralized searches for relevant information can continue to function well only when systems interconnect in certain ways. Relying on partial indexes of distributed systems, a certain level of network clustering enables very efficient and effective discovery of relevant information in large-scale networks; increasing or reducing network clustering degrades search performance. At this level of network clustering, search time is well explained by a poly-logarithmic relation to network size, indicating a high scalability potential for searching in a continuously growing information space.
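
    As a loose, self-contained illustration of decentralized search in networks with tunable clustering (not a reproduction of the dissertation's experiments; the topology, parameters, and the ring "content space" are all assumptions), the sketch below routes queries greedily using only local knowledge and reports how hop counts vary with the rewiring probability that controls clustering.

        import random

        random.seed(1)

        def small_world(n=1000, k=4, p=0.1):
            """Ring lattice with k/2 neighbours per side; each edge is rewired to
            a random node with probability p (Watts-Strogatz style). Low p keeps
            clustering high; high p adds more random long-range links."""
            nbrs = {v: set() for v in range(n)}
            for v in range(n):
                for d in range(1, k // 2 + 1):
                    u = random.randrange(n) if random.random() < p else (v + d) % n
                    if u != v:
                        nbrs[v].add(u)
                        nbrs[u].add(v)
            return nbrs

        def greedy_search(nbrs, src, dst, n, max_hops=400):
            """Decentralized search: a node knows only its own neighbours and
            forwards the query to the neighbour closest to the target on the ring."""
            ring = lambda a, b: min((a - b) % n, (b - a) % n)
            node, hops = src, 0
            while node != dst and hops < max_hops:
                if not nbrs[node]:
                    return None
                node = min(nbrs[node], key=lambda u: ring(u, dst))
                hops += 1
            return hops if node == dst else None

        n = 1000
        for p in (0.01, 0.1, 0.5):
            nbrs = small_world(n=n, p=p)
            results = [greedy_search(nbrs, random.randrange(n), random.randrange(n), n)
                       for _ in range(300)]
            found = [h for h in results if h is not None]
            print(f"rewiring p={p}: success {len(found) / len(results):.0%}, "
                  f"mean hops {sum(found) / max(len(found), 1):.1f}")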