447 research outputs found

    Weighted Heuristic Ensemble of Filters

    Get PDF
    Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably on different data and no single filter performs consistently well under various conditions. Therefore, feature selection ensemble has been investigated recently to provide more reliable and effective results than any individual one but all the existing feature selection ensemble treat the feature selection methods equally regardless of their performance. In this paper, we present a novel framework which applies weighted feature selection ensemble through proposing a systemic way of adding different weights to the feature selection methods-filters. Also, we investigate how to determine the appropriate weight for each filter in an ensemble. Experiments based on ten benchmark datasets show that theoretically and intuitively adding more weight to ‘good filters’ should lead to better results but in reality it is very uncertain. This assumption was found to be correct for some examples in our experiment. However, for other situations, filters which had been assumed to perform well showed bad performance leading to even worse results. Therefore adding weight to filters might not achieve much in accuracy terms, in addition to increasing complexity, time consumption and clearly decreasing the stability

    Metasearch: Data Fusion for Document Retrieval

    Get PDF
    The metasearch problem is to optimally merge the ranked lists output by an arbitrary number of search systems into one ranked list. In this work: (1) We show that metasearch improves upon not just the raw performance of the input search engines, but also upon the consistency of the input search engines from query to query. (2) We experimentally prove that simply weighting input systems by their average performance can dramatically improve fusion results. (3) We show that score normalization is an important component of a metasearch engine, and that dependence upon statistical outliers appears to be the problem with the standard technique. (4) We propose a Bayesian model for metasearch that outperforms the best input system on average and has performance competetive with standard techniques. (5) We introduce the use of Social Choice Theory to the metasearch problem, modeling metasearch as a democratic election. We adapt a positional voting algorithm, the Borda Count, to create a metasearch algorithm, acheiving reasonable performance. (6) We propose a metasearch model adapted from a majoritarian voting procedure, the Condorcet algorithm. The resulting algorithm is the best performing algorithm in a number of situations. (7) We propose three upper bounds for the problem, each bounding a different class of algorithms. We present experimental results for each algorithm using two types of experiments on each of four data sets

    THE ECONOMICS OF NONLINEAR PRICING: EVIDENCE FROM AIRFARES AND GROCERY PRICES

    Get PDF
    Quantity discounts, characterised by unit prices falling as the quantity purchased rises, are a proliferate phenomenon that finds root in the economics of packaging. This paper reviews the key economic foundations of nonlinear pricing, introduces new pricing data and conducts an empirical investigation into airfares and grocery prices, which are shown to exhibit quantity discounts of an identical order of magnitude. The constancy of the quantity discount across distinct markets hints at the existence of a common force underlying the determination of prices.

    Methods for Ordinal Peer Grading

    Full text link
    MOOCs have the potential to revolutionize higher education with their wide outreach and accessibility, but they require instructors to come up with scalable alternates to traditional student evaluation. Peer grading -- having students assess each other -- is a promising approach to tackling the problem of evaluation at scale, since the number of "graders" naturally scales with the number of students. However, students are not trained in grading, which means that one cannot expect the same level of grading skills as in traditional settings. Drawing on broad evidence that ordinal feedback is easier to provide and more reliable than cardinal feedback, it is therefore desirable to allow peer graders to make ordinal statements (e.g. "project X is better than project Y") and not require them to make cardinal statements (e.g. "project X is a B-"). Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback. We formulate the ordinal peer grading problem as a type of rank aggregation problem, and explore several probabilistic models under which to estimate student grades and grader reliability. We study the applicability of these methods using peer grading data collected from a real class -- with instructor and TA grades as a baseline -- and demonstrate the efficacy of ordinal feedback techniques in comparison to existing cardinal peer grading methods. Finally, we compare these peer-grading techniques to traditional evaluation techniques.Comment: Submitted to KDD 201

    On-line Metasearch, Pooling, and System Evaluation

    Get PDF
    This thesis presents a unified method for simultaneous solution of three problems in Information Retrieval--- metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or ``active sample selection (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements. The algorithm--- Rankhedge for on-line retrieval, metasearch and system evaluation--- is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst case performance. Finally, an information-theoretically optimal method for probabilistic ``active sampling is presented with possible application to a broad range of practical and theoretical contexts

    A usability study of online library systems: A case of Sultanah Bahiyah Library, Universiti Utara Malaysia

    Get PDF
    The purpose of this study was to investigate usability of online library systems in Universiti Utara Malaysia (UUM). This study evaluated the usability of Sultanah Bahiyah Library’s web based systems by investigating the aspects of simplicity, comfort, user friendliness, control, readability, information adequacy/task match, navigability, recognition, access time, relevancy, consistency and visual presentation. This study examined user’s views about the usability of digital libraries whereas current and perceived importance. A sample of 45 students of Master of Business Administration (MBA) has been chosen. The Sultanah Bahiyah Library’s web based systems is very important especially for students and academic staffs of Universiti Utara Malaysia. The usability of the Library’s web based systems makes students easy to connect and for that the website should be helpful and attractive within good contents. The result found that the parallel nature of the users’ current views about the usability of digital libraries and users’ perceived importance of digital library usability allows direct comparison of all usability properties. The overall results yielded significant difference for the variables of user’s current views and perceived importance

    Study of result presentation and interaction for aggregated search

    Get PDF
    The World Wide Web has always attracted researchers and commercial search engine companies due to the enormous amount of information available on it. "Searching" on web has become an integral part of today's world, and many people rely on it when looking for information. The amount and the diversity of information available on the Web has also increased dramatically. Due to which, the researchers and the search engine companies are making constant efforts in order to make this information accessible to the people effectively. Not only there is an increase in the amount and diversity of information available online, users are now often seeking information on broader topics. Users seeking information on broad topics, gather information from various information sources (e.g, image, video, news, blog, etc). For such information requests, not only web results but results from different document genre and multimedia contents are also becoming relevant. For instance, users' looking for information on "Glasgow" might be interested in web results about Glasgow, Map of Glasgow, Images of Glasgow, News of Glasgow, and so on. Aggregated search aims to provide access to this diverse information in a unified manner by aggregating results from different information sources on a single result page. Hence making information gathering process easier for broad topics. This thesis aims to explore the aggregated search from the users' perspective. The thesis first and foremost focuses on understanding and describing the phenomena related to the users' search process in the context of the aggregated search. The goal is to participate in building theories and in understanding constraints, as well as providing insights into the interface design space. In building this understanding, the thesis focuses on the click-behavior, information need, source relevance, dynamics of search intents. The understanding comes partly from conducting users studies and, from analyzing search engine log data. While the thematic (or topical) relevance of documents is important, this thesis argues that the "source type" (source-orientation) may also be an important dimension in the relevance space for investigating in aggregated search. Therefore, relevance is multi-dimensional (topical and source-orientated) within the context of aggregated search. Results from the study suggest that the effect of the source-orientation was a significant factor in an aggregated search scenario. Hence adds another dimension to the relevance space within the aggregated search scenario. The thesis further presents an effective method which combines rule base and machine learning techniques to identify source-orientation behind a user query. Furthermore, after analyzing log-data from a search engine company and conducting user study experiments, several design issues that may arise with respect to the aggregated search interface are identified. In order to address these issues, suitable design guidelines that can be beneficial from the interface perspective are also suggested. To conclude, aim of this thesis is to explore the emerging aggregated search from users' perspective, since it is a very important for front-end technologies. An additional goal is to provide empirical evidence for influence of aggregated search on users searching behavior, and identify some of the key challenges of aggregated search. During this work several aspects of aggregated search will be uncovered. Furthermore, this thesis will provide a foundations for future research in aggregated search and will highlight the potential research directions
    • …
    corecore