    Info Navigator: A visualization tool for document searching and browsing

    In this paper we investigate the retrieval performance of monophonic and polyphonic queries made on a polyphonic music database. We extend the n-gram approach for full-music indexing of monophonic music data to polyphonic music using both rhythm and pitch information. We define an experimental framework for a comparative and fault-tolerance study of various n-gramming strategies and encoding levels. For monophonic queries, we focus in particular on query-by-humming systems, and for polyphonic queries on query-by-example. Error models addressed in several studies are surveyed for the fault-tolerance study. Our experiments show that different n-gramming strategies and encoding precision differ widely in their effectiveness. We present the results of our study on a collection of 6366 polyphonic MIDI-encoded music pieces.
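
The n-gramming idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `polyphonic_ngrams`, the event format (onset time plus a list of simultaneous MIDI pitches), and the choice to encode pitch as semitone intervals and rhythm as ratios of successive inter-onset intervals are all assumptions made for illustration.

```python
from itertools import product


def polyphonic_ngrams(events, n=3):
    """Build n-grams from polyphonic note events (illustrative sketch only).

    events: list of (onset_time, [midi_pitches]) with strictly increasing onsets.
    Returns a list of (pitch_intervals, onset_ratios) tuples, one for every
    monophonic path through each window of n consecutive onsets.
    """
    grams = []
    for i in range(len(events) - n + 1):
        window = events[i:i + n]
        onsets = [t for t, _ in window]
        # Inter-onset intervals between consecutive events in the window.
        iois = [b - a for a, b in zip(onsets, onsets[1:])]
        # Rhythm is encoded as ratios of successive inter-onset intervals.
        ratios = tuple(round(b / a, 2) for a, b in zip(iois, iois[1:]))
        # Every combination of one pitch per onset is one monophonic path.
        for path in product(*(pitches for _, pitches in window)):
            intervals = tuple(q - p for p, q in zip(path, path[1:]))
            grams.append((intervals, ratios))
    return grams


if __name__ == "__main__":
    # Hypothetical input: a single note, then a two-note chord, then a single note.
    example = [(0.0, [60]), (1.0, [64, 67]), (2.0, [62])]
    for gram in polyphonic_ngrams(example, n=3):
        print(gram)
```

Encoding intervals and ratios rather than absolute pitches and durations makes the n-grams invariant to transposition and tempo, which is one plausible reason to combine the two kinds of information; the encoding levels actually compared in the paper are not specified in this abstract.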

    Relevance thresholds in system evaluations

    We introduce and explore the concept of an individual's relevance threshold as a way of reconciling differences in outcomes between batch and user experiments.

    Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

    In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.

    ArborZ: Photometric Redshifts Using Boosted Decision Trees

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of Boosted Decision Trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single "best estimate" and error, and also provides a photo-z quality figure-of-merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work. Comment: 10 pages, 13 figures, submitted to Ap

    PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

    Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these algorithms to related document networks composed of automatically-generated content-similarity links. Specifically, this work tackles the problem of document retrieval in the biomedical domain, in the context of the PubMed search engine. A series of reranking experiments demonstrates that incorporating evidence extracted from link structure yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments.
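
As a rough illustration of running PageRank over content-similarity links instead of hyperlinks, the following sketch uses plain power iteration on a weighted graph. It is an assumption-laden example, not the method from the paper: the function name `pagerank`, the dictionary-based graph format, the damping factor of 0.85, the fixed iteration count, and the toy similarity weights are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (illustrative sketch only).

    links: dict mapping a document id to a dict {neighbour_id: similarity_weight}.
    Returns a dict mapping every document id to its PageRank score.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for source in nodes:
            outs = links.get(source, {})
            total = sum(outs.values())
            if total == 0:
                # A node with no outgoing links spreads its score evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[source] / count
            else:
                # Score flows along each link in proportion to its weight.
                for target, weight in outs.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical related-article network with made-up similarity weights.
    related = {
        "doc_a": {"doc_b": 0.9, "doc_c": 0.4},
        "doc_b": {"doc_a": 0.9},
        "doc_c": {"doc_a": 0.4},
    }
    scores = pagerank(related)
    for doc, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(doc, round(score, 3))
```

In a reranking setting, scores like these would be combined with the original retrieval scores; how the paper performs that combination is not stated in this abstract, so it is left out of the sketch.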

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically-similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety. Comment: Accepted for publication in the journal of Natural Language Engineering, 201