6 research outputs found
Recommended from our members
An experimental comparison of a genetic algorithm and a hill-climber for term selection
Purpose – The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and the use of more sophisticated search techniques that attempt to break out of these optima is worth investigating. To this end, we apply a genetic algorithm (GA) to the same problem.
Design/Methodology/Approach – We use a standard TREC test collection from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hill-climber and the GA. We also vary elements of the GA, such as the probability of a word being included, the probability of mutation and the population size, in order to measure the effect of these variables. Different strategies such as Elitist and Non-Elitist methods are used, as well as Roulette Wheel and Rank selection methods.
Findings – The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method makes much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.
Research limitations/implications – The evidence provided here suggests that being stuck in a local optimum for the term selection optimization problem is not detrimental to the overall success of the hill-climber. Term rank order appears to provide additional useful evidence which hill-climbers can use efficiently and effectively to narrow the search space.
Originality/Value – The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type.
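The difference between the two selection schemes named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions (roulette wheel: probability proportional to raw fitness; linear rank selection: probability proportional to rank); the function names and the example scores are hypothetical and are not taken from the paper.

```python
def roulette_probabilities(fitness):
    """Selection probability proportional to raw fitness (values must be positive)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def rank_probabilities(fitness):
    """Linear rank selection: probability proportional to rank (1 = worst, n = best)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # indices, worst first
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = n * (n + 1) // 2  # sum of the ranks 1..n
    return [r / total for r in ranks]


if __name__ == "__main__":
    # Hypothetical average-precision scores for three candidate queries.
    scores = [0.10, 0.12, 0.90]
    print(roulette_probabilities(scores))  # dominated by the single high score
    print(rank_probabilities(scores))      # depends only on the ordering
```

Under roulette wheel selection one very fit individual can dominate the mating pool, whereas rank selection depends only on the ordering of the scores, which is one commonly cited reason the two schemes can behave differently.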
Ontology Based Query Expansion with a Probabilistic Retrieval Model
This paper examines the use of ontologies for defining query context. The information retrieval system used is based on the probabilistic retrieval model. We extend the use of relevance feedback (RFB) and pseudo-relevance feedback (PF) query expansion techniques using information from a news domain ontology. The aim is to assess the impact of the ontology on the query expansion results with respect to recall and precision. We also test the effect of varying the relevance feedback parameters (number of terms or number of documents). The factors which influence the success of ontology based query expansion are outlined. Our findings show that ontology based query expansion has had mixed success. The use of the ontology has vastly increased the number of relevant documents retrieved. However, we conclude that for both types of query expansion, the PF results are better than the RFB results.
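As a rough illustration of pseudo-relevance feedback and its two parameters (number of documents, number of terms), the sketch below treats the top-ranked documents as relevant and adds their most frequent new terms to the query. This is a simplification under stated assumptions: term selection here uses plain document frequency rather than the probabilistic term weighting of the retrieval model, the ontology step is not shown, and expand_query and the sample documents are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=2, num_terms=2):
    """Add the most frequent new terms from the top-ranked documents to the query."""
    top_docs = ranked_docs[:num_docs]  # assumed relevant, no user judgement
    counts = Counter()
    for doc in top_docs:
        counts.update(set(doc.lower().split()))  # document frequency per term
    candidates = [(t, c) for t, c in counts.items() if t not in query_terms]
    # Sort by frequency (descending), then alphabetically for a deterministic order.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return list(query_terms) + [t for t, _ in candidates[:num_terms]]


if __name__ == "__main__":
    docs = [
        "bank interest rate rise",
        "central bank rate decision",
        "river bank erosion",
    ]
    print(expand_query(["bank"], docs, num_docs=2, num_terms=2))
```

With true relevance feedback, top_docs would instead be the documents a user judged relevant; the rest of the sketch would stay the same.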
Smart information retrieval: domain knowledge centric optimization approach
In the age of the Internet of Things (IoT), online data has witnessed significant growth in terms of volume and diversity, and information retrieval has become one of the important research themes in Internet-oriented data science. In information retrieval, machine-learning techniques have been widely adopted to automate the challenging process of relation extraction from text data, which is critical to the accuracy and efficiency of information retrieval-based applications including recommender systems and sentiment analysis. In this context, this paper introduces a novel, domain knowledge centric methodology that aims to improve the accuracy of machine-learning methods for relation classification, and then utilises Genetic Algorithms (GAs) to optimise the feature selection for the learning algorithms. The proposed methodology makes a significant contribution to the processes of domain knowledge-based relation extraction, including interrogating Linked Open Datasets to generate the relation classification training data, addressing the imbalanced classification in the training datasets, determining the probability threshold of the best learning algorithm, and establishing the optimum parameters for the genetic algorithm utilised in feature selection. The experimental evaluation of the proposed methodology reveals that the adopted machine-learning algorithms exhibit higher precision and recall in relation extraction in the reduced feature space optimised by the implementation. The machine-learning algorithms considered are Support Vector Machine, Perceptron Algorithm Uneven Margin and K-Nearest Neighbours. The outcome is verified by comparing against the Random Mutation Hill-Climbing optimisation algorithm using Wilcoxon signed-rank statistical analysis.
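The Random Mutation Hill-Climbing baseline mentioned above can be sketched for feature selection as follows. This is a minimal illustration, not the paper's implementation: the feature mask is a list of booleans, one randomly chosen bit is flipped per iteration, and the change is kept if the score does not get worse. The toy_fitness function is a hypothetical stand-in for a classifier's validation score.

```python
import random


def rmhc(fitness, num_features, iterations=200, seed=0):
    """Random Mutation Hill-Climbing over a boolean feature mask."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(num_features)]
    current_score = fitness(current)
    for _ in range(iterations):
        candidate = list(current)
        i = rng.randrange(num_features)  # pick one feature at random
        candidate[i] = not candidate[i]  # flip its inclusion bit
        score = fitness(candidate)
        if score >= current_score:       # accept if not worse
            current, current_score = candidate, score
    return current, current_score


def toy_fitness(mask):
    """Hypothetical score: features 0 and 2 help, every selected feature costs 0.1."""
    useful = {0, 2}
    gain = sum(1.0 for i, on in enumerate(mask) if on and i in useful)
    return gain - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = rmhc(toy_fitness, num_features=5, iterations=500, seed=1)
    print(mask, score)
```

A GA would replace the single mask with a population of masks combined by selection, crossover and mutation; the fitness function can be shared between the two optimisers, which is what makes a paired comparison possible.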
Optimisation of hedging-integrated rule curves for reservoir operation
Reservoir managers use operational rule curves as guides for managing and operating reservoir systems. However, this approach saves no water for impending droughts, resulting in large shortages during such droughts. This problem can be tempered by integrating hedging with the rule curves to curtail the water releases during normal periods of operation and use the saved water to limit the amount and impact of water shortages during droughts. However, determining the timing and amount of hedging is a challenge.
This thesis presents the application of genetic algorithms (GA) for the optimisation of hedging-integrated reservoir rule curves. However, due to the challenge of establishing the boundary of the feasible region in the standard GA (SGA), a new development of the GA, the dynamic GA (DGA), is proposed. Both the new development and its hedging policies were tested through extensive simulations of the Ubonratana reservoir (Thailand). The first observation was that the new DGA was faster and more efficient than the SGA in arriving at an optimal solution. Additionally, the derived hedging policies produced significant changes in reservoir performance when compared to no-hedging policies. The performance indices analysed were reliability (time and volume), resilience, vulnerability and sustainability; the results showed that the vulnerability (i.e. average single-period shortage) in particular was significantly reduced with the optimised hedging rules as compared to using the no-hedging rule curves.
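The performance indices listed above can be computed from a simulated demand and supply series. The sketch below assumes commonly used definitions (time reliability as the fraction of periods without shortage, volume reliability as the fraction of demand supplied, resilience as the number of failure sequences divided by the number of failure periods, vulnerability as the average shortage per failure period); the thesis may define them differently, the sustainability index is omitted, and the function name and data are hypothetical.

```python
def performance_indices(demand, supply):
    """Reliability, resilience and vulnerability from per-period demand and supply."""
    shortages = [max(d - s, 0.0) for d, s in zip(demand, supply)]
    failed = [sh > 0 for sh in shortages]
    n_periods = len(shortages)
    n_failed = sum(failed)
    # A failure sequence starts at a failed period not preceded by a failed one.
    n_sequences = sum(
        1 for i, f in enumerate(failed) if f and (i == 0 or not failed[i - 1])
    )
    return {
        "time_reliability": 1 - n_failed / n_periods,
        "volume_reliability": 1 - sum(shortages) / sum(demand),
        "resilience": n_sequences / n_failed if n_failed else 1.0,
        "vulnerability": sum(shortages) / n_failed if n_failed else 0.0,
    }


if __name__ == "__main__":
    demand = [10, 10, 10, 10, 10, 10]
    supply = [10, 6, 8, 10, 10, 5]
    for name, value in performance_indices(demand, supply).items():
        print(f"{name}: {value:.3f}")
```

Hedging trades a larger number of small shortages for fewer large ones, so under these definitions it would typically lower vulnerability at some cost in time reliability.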
This study also developed a monthly inflow forecasting model using artificial neural networks (ANN) to aid reservoir operational decision-making. Extensive testing of the model showed that it was able to provide inflow forecasts with reasonable accuracy. The simulated effect on reservoir performance of forecasted inflows vis-à-vis other assumed reservoir inflow knowledge situations showed that the ANN forecasts were superior, further reinforcing the importance of good inflow information for reservoir operation.
The ability of hedging to harness the inherent buffering capacity of existing water resources systems for tempering water shortage (or vulnerability) without the need for expensive new-builds is a major outcome of this study. Although applied to Ubonratana, the study has utility for other regions of the world where, for example, climate and other environmental changes are stressing water availability.
Investigating ontology based query expansion using a probabilistic retrieval model
This research briefly outlines the problems of traditional information retrieval systems and discusses the different approaches to inferring context in document retrieval. By context we mean word disambiguation, which is achieved by exploring the generalisation-specialisation hierarchies within a given ontology. Specifically, we examine the use of ontology based query expansion for defining query context. Query expansion can be done in many ways, and in this work we consider the use of relevance feedback and pseudo-relevance feedback for query expansion. We compare the two to ascertain whether their performance differs. The information retrieval system used is based on the probabilistic retrieval model and the query expansion method is extended using information from a news domain ontology. The aim of this project is to assess the impact of the use of the ontology on the query expansion results. Our results show that ontology based query expansion has resulted in a higher number of relevant documents being retrieved compared to the standard relevance feedback process. Overall, ontology based query expansion improves recall but does not produce any significant improvements for the precision results. Pseudo-relevance feedback has achieved better results than relevance feedback. We also found that reducing or increasing the relevance feedback parameters (number of terms or number of documents) does not correlate with the results. When comparing the effect of varying the number of terms parameter with the number of documents parameter, the former benefits the pseudo-relevance feedback results but the latter has an additional effect on the relevance feedback results. There are many factors which influence the success of ontology based query expansion. The thesis discusses these factors and gives some guidelines on using ontologies for the purpose of query expansion.
A knowledge-based framework for information extraction and exploration
Harnessing insights from the colossal amount of online information requires the computerised processing of unstructured text in order to satisfy the information needs of particular applications such as recommender systems and sentiment analysis. The increasing availability of online documents that describe domain-specific information provides an opportunity to employ a knowledge-based approach in extracting information from Web data.
In this thesis, a novel comprehensive knowledge-based framework is proposed to construct and exploit a domain-specific semantic knowledgebase. The proposed framework introduces a methodology for linking several components of different techniques and tools. It focuses on providing reusable and configurable data and application templates, which allow developers to apply it in a diversity of domains. The objectives of this framework are: extracting information from unstructured data, constructing a semantic knowledgebase from the extracted information, enriching the resultant semantic knowledgebase by sourcing appropriate semi-structured and structured datasets, and consuming the resultant semantic knowledgebase to facilitate the intelligent exploration and search of information. For the purpose of investigating the challenges of extracting and modelling information in a specific domain, the financial domain was employed as a use-case in the context of a stock investment motivating scenario.
The developed knowledge-based approach exploits the semantic and syntactic characteristics of the problem domain knowledge in implementing a hybrid approach of Rule-based and Machine Learning based relation classification. The rule-based approach is adopted in the Natural Language Processing tasks associated with linguistic and structural features, Named Entity Recognition, instance labelling and feature generation processes. The results of these tasks are used to classify the relations between the named entities by employing the Machine Learning based relation classification. In addition, the domain knowledge is analysed to benefit knowledge modelling by translating the domain key concepts into a formal ontology. This ontology is employed in constructing a semantic knowledgebase from unstructured online data of a specific domain, enriching the resulting semantic knowledgebase by sourcing semi-structured and structured online data sources, and applying advanced classification and inference technologies to infer new and interesting facts to improve the decision-making and intelligent exploration activities. However, because of the specific characteristics of the problem domain knowledge, most of its relations are non-binary; hence an appropriate N-ary relation pattern technique was adopted and investigated.
A series of novel experiments was conducted to implement and configure a Machine Learning based relation classification. The experimental evaluation evidenced that the developed knowledge-assisted ML relation classification model, which was further boosted by our implementation of GAs to reduce the feature space, has resulted in significant improvement in the process of relation extraction. The experimental results also indicate that amongst the implemented ML algorithms, SVM exhibited the best relation classification accuracy in the majority of the training datasets, while retaining acceptable levels of accuracy in the remaining training datasets.
Web Ontology Language (OWL) reasoning and rule-based reasoning on the resultant semantic knowledgebase were applied to derive stock investment specific recommendations. In addition, the SPARQL query language was employed to explore the semantic knowledgebase. Moreover, taking into consideration the problem domain's requirements for modelling non-binary relations, a relation-as-class N-ary relations pattern was implemented, and the reasoning axioms and query language were adjusted to fit the intermediate resources that the N-ary relations require.
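The relation-as-class pattern can be pictured with plain subject-predicate-object triples: instead of one binary link between two entities, the relation itself becomes an intermediate resource that every participant is attached to. The sketch below uses ordinary Python tuples rather than an RDF library, and all identifiers (the ex: names, nary_relation, participants) are hypothetical and not taken from the thesis.

```python
def nary_relation(relation_id, relation_class, roles):
    """Return triples that model one N-ary relation as an instance of a class."""
    triples = [(relation_id, "rdf:type", relation_class)]
    for role, value in roles.items():
        triples.append((relation_id, role, value))  # one triple per participant
    return triples


def participants(triples, relation_id):
    """Look up every role value attached to an intermediate relation resource."""
    return {
        p: o for s, p, o in triples if s == relation_id and p != "rdf:type"
    }


if __name__ == "__main__":
    # Hypothetical three-way relation: who acquired whom, and when.
    kb = nary_relation(
        "ex:Acquisition_1",
        "ex:Acquisition",
        {"ex:acquirer": "ex:CompanyA", "ex:target": "ex:CompanyB", "ex:year": "2015"},
    )
    for triple in kb:
        print(triple)
    print(participants(kb, "ex:Acquisition_1"))
```

Queries and reasoning rules then have to pass through the intermediate resource (here ex:Acquisition_1) to reach the participants, which is the kind of adjustment to axioms and queries that the paragraph above refers to.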
This thesis also summarises the experience of addressing the challenges of implementing the proposed knowledge-based framework for constructing and exploiting a semantic knowledgebase. The lessons learned can serve domain experts and knowledge engineers as a novel methodology for employing Semantic Web Technologies so that knowledge users can intelligently exploit knowledge in similar problem domains.
The evaluation of knowledge accessibility by utilising Semantic Web Technologies in the developed application includes the ability of data retrieval to obtain either the entire or some portion of the data from the semantic knowledgebase for a particular use-case scenario. Investigating the tasks of reasoning, accessing and querying the semantic knowledgebase evidences that Semantic Web Technologies can perform an accurate and complex knowledge representation to share knowledge from a diversity of data sources and improve the decision-making process and the intelligent exploration of the semantic knowledgebase.