    Probabilistic models of information retrieval based on measuring the divergence from randomness

    We introduce a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language-model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term-frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
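The binomial model mentioned in the abstract can be illustrated with a short sketch (our own function name and parameters, not the paper's exact formulae): the informative content of a term is -log2 of the probability of observing its within-document frequency if the term's collection occurrences were scattered uniformly at random over the N documents.

```python
import math

def binomial_dfr_inf(tf: int, coll_freq: int, n_docs: int) -> float:
    """Informative content -log2 P(tf) under a binomial model:
    each of the coll_freq collection occurrences of the term lands
    in this document with probability p = 1/N (random placement)."""
    p = 1.0 / n_docs
    prob = math.comb(coll_freq, tf) * p**tf * (1 - p) ** (coll_freq - tf)
    return -math.log2(prob)

# A term seen 10 times in a 1000-document collection, 4 of them here:
# the further the observed tf exceeds the random expectation (10/1000),
# the larger the divergence from randomness, hence the higher the weight.
print(binomial_dfr_inf(4, 10, 1000))
```

The sketch shows the core intuition only; the paper's actual models add the two term-frequency normalizations described above.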

    Washington meets Wall Street: a closer examination of the presidential cycle puzzle

    We show that average excess returns during the last two years of the presidential cycle are significantly higher than during the first two years: 9.8 percent over the period 1948–2008. This pattern in returns cannot be explained by business-cycle variables capturing time-varying risk premia, by differences in risk levels, or by consumer and investor sentiment. In this paper, we formally test the presidential election cycle (PEC) hypothesis, the leading explanation in the literature for the presidential cycle anomaly. The PEC hypothesis states that incumbent parties and presidents have an incentive to manipulate the economy (via budget expansions and taxes) to remain in power. We formulate eight empirically testable propositions relating to the fiscal, monetary, tax, unexpected-inflation, and political implications of the PEC hypothesis. We do not find statistically significant evidence confirming the PEC hypothesis as a plausible explanation for the presidential cycle effect. The existence of the presidential cycle effect in U.S. financial markets thus remains a puzzle that cannot easily be explained by politicians employing their economic influence to remain in power. JEL Classification: E32; G14; P16. Keywords: Political Economy, Market Efficiency, Anomalies, Calendar Effect.
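The headline comparison can be sketched in a few lines, using synthetic annual excess returns rather than the paper's 1948–2008 data; a Welch t-statistic stands in for the fuller battery of tests the authors run.

```python
from statistics import mean, stdev
from math import sqrt

def cycle_gap(returns_by_year):
    """returns_by_year: list of (cycle_year, excess_return) pairs,
    cycle_year in {1, 2, 3, 4} counted from the election year.
    Returns (mean of years 3-4, mean of years 1-2, Welch t-stat)."""
    early = [r for y, r in returns_by_year if y in (1, 2)]
    late = [r for y, r in returns_by_year if y in (3, 4)]
    t = (mean(late) - mean(early)) / sqrt(
        stdev(late) ** 2 / len(late) + stdev(early) ** 2 / len(early)
    )
    return mean(late), mean(early), t

# Illustrative (made-up) annual excess returns tagged by cycle year:
data = [(1, 0.02), (2, 0.01), (3, 0.11), (4, 0.12),
        (1, 0.03), (2, -0.01), (3, 0.09), (4, 0.10)]
late, early, t = cycle_gap(data)
print(f"years 3-4: {late:.3f}, years 1-2: {early:.3f}, t = {t:.2f}")
```

A positive and large t on real data is the anomaly; the paper's contribution is testing whether PEC-style manipulation can account for it.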

    Feature subset selection in text-learning


    ICMR 2014: 4th ACM International Conference on Multimedia Retrieval

    ICMR began as a workshop on challenges in image retrieval (in Newcastle in 1998) and later evolved into the Conference on Image and Video Retrieval (CIVR) series. In 2011 CIVR and the ACM Workshop on Multimedia Information Retrieval were combined into a single conference that now forms the ICMR series. The 4th ACM International Conference on Multimedia Retrieval took place in Glasgow, Scotland, from 1–4 April 2014. This was the largest edition of ICMR to date, with approximately 170 attendees from 25 different countries. ICMR is one of the premier scientific conferences for multimedia retrieval held worldwide, with the stated mission “to illuminate the state of the art in multimedia retrieval by bringing together researchers and practitioners in the field of multimedia retrieval.” According to the Chinese Computing Federation Conference Ranking (2013), ACM ICMR is the number one multimedia retrieval conference worldwide and the number four conference in the category of multimedia and graphics. Although ICMR is about multimedia retrieval, in a wider sense it is also about automated multimedia understanding. Much of the work in the area involves the analysis of media at the pixel, voxel, and wavelet level, but it also involves innovative retrieval, visualisation, and interaction paradigms that exploit the nature of the multimedia, be it video, images, speech, or more abstract (sensor) data. The conference aims to promote intellectual exchange and interaction among scientists, engineers, students, and multimedia researchers in academia and industry through various events, including keynote talks; oral, special, and poster sessions focused on research challenges and solutions; technical and industrial demonstrations of prototypes; tutorials; and an industrial panel. In the remainder of this report we summarise the events that took place at the 4th ACM ICMR conference.

    Evolving text classification rules with genetic programming

    We describe a novel method for using genetic programming to create compact classification rules built from combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers, in terms of precision and recall, when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text-mining applications.
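A hedged sketch of the idea (our own rule representation, not the paper's function and terminal set): a rule is a boolean tree over character-n-gram presence tests, and fitness is the F1 score of the rule over labelled training documents.

```python
def ngrams(text, n=3):
    """Set of character n-grams in a document."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def match(rule, grams):
    """Evaluate a boolean rule tree against a document's n-gram set."""
    op = rule[0]
    if op == "NGRAM":
        return rule[1] in grams
    if op == "NOT":
        return not match(rule[1], grams)
    if op == "AND":
        return match(rule[1], grams) and match(rule[2], grams)
    if op == "OR":
        return match(rule[1], grams) or match(rule[2], grams)
    raise ValueError(op)

def fitness(rule, docs):
    """F1 of the rule over (text, label) pairs -- the precision/recall
    trade-off a GP run would optimise via crossover and mutation."""
    tp = fp = fn = 0
    for text, label in docs:
        pred = match(rule, ngrams(text))
        tp += pred and label
        fp += pred and not label
        fn += (not pred) and label
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

docs = [("crude oil prices rose", True), ("wheat harvest report", False)]
print(fitness(("NGRAM", "oil"), docs))
```

In a full GP run, selection would favour trees like these with higher F1 on the training set.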

    Neighbor removal increases forager longevity, slows progression through temporal castes (Pogonomyrmex badius)

    Sustainability standards like Fair Trade (FT) and Utz Certified (Utz) are widely regarded as a promising way of improving smallholder coffee-farmer welfare. As yet, the impact of certification remains poorly understood. This chapter presents the findings of the study regarding the impact of FT and Utz in Kenya. The study was carried out in the Kiambu and Nyeri districts of Kenya (Figure 3.1). It is based on two waves of data collection, carried out in 2009 and 2013, with farmers belonging to six cooperative societies: Ndumberi, Tekangu, Kiambaa, Mikari, Rugi and Kiama. This chapter aims to answer the following central research question: What is the impact of FT/Utz involvement at producer and producer-organisation level in Kenya?

    From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions

    Recent technological breakthroughs allow the quantification of hundreds of thousands of genetic interactions (GIs) in Saccharomyces cerevisiae. The interpretation of these data is often difficult, but it can be improved by the joint analysis of GIs along with complementary data types. Here, we describe a novel methodology that integrates genetic and physical interaction data. We use our method to identify a collection of functional modules related to chromosomal biology and to investigate the relations among them. We show how the resulting map of modules provides clues for the elucidation of function, both at the level of individual genes and at the level of functional modules.

    Semantic distillation: a method for clustering objects by their contextual specificity

    Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that, when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised; and second, to propose a novel method of fuzzy hierarchical clustering, termed semantic distillation, strongly inspired by the theory of quantum measurement, which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA-array experiments and clustering the genes of the array according to their specificity. Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag.

    Evaluating implicit feedback models using searcher simulations

    In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query-modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate the models we used searcher simulations, which afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey's rule of conditioning outperformed the other models under investigation.
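Jeffrey's rule of conditioning, on which the best-performing model is based, revises a probability when the evidence itself is uncertain, as interaction signals are. A minimal sketch with illustrative numbers (not the article's actual model):

```python
def jeffrey_update(prior_given_partition, new_partition_probs):
    """Jeffrey's rule of conditioning: revise P(term) when the
    evidence partition {E_i} only shifts in probability:
        P'(term) = sum_i P(term | E_i) * P'(E_i)
    prior_given_partition: {E_i: P(term | E_i)}
    new_partition_probs:   {E_i: P'(E_i)}, must sum to 1."""
    assert abs(sum(new_partition_probs.values()) - 1.0) < 1e-9
    return sum(prior_given_partition[e] * p
               for e, p in new_partition_probs.items())

# A term occurs with probability 0.6 in relevant documents and 0.1 in
# non-relevant ones; interaction evidence raises P(relevant) to 0.7:
p_term = jeffrey_update({"rel": 0.6, "nonrel": 0.1},
                        {"rel": 0.7, "nonrel": 0.3})
print(p_term)  # 0.6*0.7 + 0.1*0.3 = 0.45
```

Terms whose updated probabilities rise most under such soft evidence are natural candidates for query modification.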

    Learning Pretopological Spaces for Lexical Taxonomy Acquisition

    In this paper, we propose a new methodology for semi-supervised acquisition of lexical taxonomies from a list of existing terms. Our approach is based on the theory of pretopology, which offers a powerful formalism to model semantic relations and transform a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy, based on genetic algorithms. The rare but accurate pieces of knowledge given by an expert (semi-supervision) or automatically extracted with existing linguistic patterns (auto-supervision) are used to parameterize the different features defining the pretopological term space. A structuring algorithm is then used to transform the pretopological space into a lexical taxonomy, i.e. a directed acyclic graph. Results over three standard datasets (two from WordNet and one from UMLS) show improved performance against existing associative and pattern-based state-of-the-art approaches.