Kernelized Cost-Sensitive Listwise Ranking
This thesis studies a cost-sensitive listwise approach to learning to rank.
Learning to Rank is an application area of machine learning, typically supervised, that builds ranking models for Information Retrieval systems. The training data consists of lists of items with a partial order induced by an ordinal score or a binary judgment (relevant/not relevant). The purpose of the model is to produce a permutation of the items in each list that is close to the rankings in the training data. The technique has been applied successfully to ranking, and several approaches have been proposed, including the listwise approach.
A cost-sensitive version adapts this framework so that the documents within a list are treated with different probabilities, i.e., it imposes weights on the documents with higher cost. We then extend this algorithm by kernelizing the loss and exploring the optimization in different spaces.
Among the existing likelihood-based algorithms, we choose ListMLE as the primary focus of experimentation, since it has been shown to have the best empirical performance. The theoretical framework is presented along with its mathematical properties.
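To make the likelihood concrete, the following is a minimal sketch of a ListMLE-style loss with per-position weights, in plain Python. ListMLE takes the negative log-likelihood of the ground-truth permutation under the Plackett-Luce model; the `weights` argument is an assumption standing in for the thesis's cost terms (one hypothetical weight per rank position), not its exact formulation.

```python
import math
from typing import Optional, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_listmle_loss(scores: Sequence[float],
                          relevance: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """Negative (weighted) Plackett-Luce log-likelihood of the ground-truth order.

    scores:    model scores, one per document in the list
    relevance: graded labels; the ground-truth permutation sorts them descending
    weights:   hypothetical per-position costs w_i; all 1.0 gives plain ListMLE
    """
    n = len(scores)
    if weights is None:
        weights = [1.0] * n
    # Ground-truth permutation: document indices by decreasing relevance.
    order = sorted(range(n), key=lambda j: relevance[j], reverse=True)
    ordered = [scores[j] for j in order]
    loss = 0.0
    for i in range(n):
        # -log P(document at position i is chosen first among those remaining)
        loss -= weights[i] * (ordered[i] - logsumexp(ordered[i:]))
    return loss
```

With all weights equal to 1 this is plain ListMLE; placing larger weights on the top positions makes mistakes there cost more, which is one way to express the cost sensitivity described earlier.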
Experiments are conducted on the LETOR benchmark datasets, which contain queries, features of the retrieved documents, and human judgments of the relevance of each document to its query.
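LETOR distributions store each query-document pair on one line in an SVMlight-style format: a relevance label, a `qid:` field, `index:value` features, and an optional trailing `#` comment. A minimal parser that groups lines into per-query lists (the unit a listwise method trains on) might look like the following; it assumes that format and discards the comment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One (relevance label, sparse feature vector) pair per document.
Doc = Tuple[int, Dict[int, float]]


def parse_letor_line(line: str) -> Tuple[str, int, Dict[int, float]]:
    """Parse 'label qid:Q idx:val ... #comment' into (qid, label, features)."""
    body = line.split("#", 1)[0].split()
    label = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        idx, val = token.split(":", 1)
        features[int(idx)] = float(val)
    return qid, label, features


def group_by_query(lines: List[str]) -> Dict[str, List[Doc]]:
    """Collect documents into one list per query, preserving file order."""
    queries: Dict[str, List[Doc]] = defaultdict(list)
    for line in lines:
        if line.strip():
            qid, label, features = parse_letor_line(line)
            queries[qid].append((label, features))
    return dict(queries)
```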
On these datasets, we show how Kernel Cost-Sensitive ListMLE performs compared with the baselines Plain Cost-Sensitive ListMLE, ListNet, and RankSVM, and we examine different aspects of the proposed loss function across different families of kernels.
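Kernelizing a ranking loss typically means replacing a linear scoring function with one expressed through a kernel over training documents, f(x) = Σ_j β_j k(x_j, x), in the spirit of the representer theorem. The sketch below assumes that form and shows three common kernel families (linear, polynomial, Gaussian RBF); the parameter names `degree`, `coef0`, `gamma` and the coefficients `beta` are illustrative assumptions, not taken from the thesis.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a: Vector, b: Vector) -> float:
    return dot(a, b)


def polynomial_kernel(a: Vector, b: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    return (dot(a, b) + coef0) ** degree


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    # Gaussian kernel: exp(-gamma * ||a - b||^2)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_scores(train_x: Sequence[Vector], beta: Sequence[float],
                  docs: Sequence[Vector], kernel: Kernel) -> List[float]:
    """Score each document as sum_j beta_j * k(x_j, doc)."""
    return [sum(b * kernel(xj, d) for b, xj in zip(beta, train_x)) for d in docs]
```

Comparing kernel families then amounts to swapping the `kernel` argument while feeding the resulting scores into a listwise loss and learning `beta`; with `linear_kernel` the model reduces to a linear scorer with weights w = Σ_j β_j x_j.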
Annual Scholars/Donors Luncheon (2013)
April 3, 2013
Annual Scholars/Donors Luncheon (2012)
The Fascinating History of the Early Botanical Exploration and Investigations in Southern California
Information on plant collectors in southern California is scattered through a number of publications, some of them obscure or not well known to botanists. This paper gives a selective account of major collectors from 1793 to 1930. The appendix lists the plant collectors with references to biographical material concerning each. It is hoped that this preliminary account will stimulate further historical studies.
Search Queries in an Information Retrieval System for Arabic-Language Texts
Information retrieval aims to extract, from a large collection of data, the subset of information that is relevant to a user's needs. In this study, we are interested in information retrieval in Arabic-language text documents. We focus on the Arabic language: its morphological features, which can affect the implementation and performance of an information retrieval system, and its characters that are absent from the Latin alphabet and require specialized approaches. Specifically, we report on the design, implementation, and evaluation of search functionality using the Vector Space Model with several weighting schemes. Our implementation uses the ISRI stemming algorithm as the underlying stemming technique and the general Arabic stop word list for building inverted indices for Arabic-language documents. We evaluate our implementation on a corpus of selected technical papers published in Arabic-language journals, using Open Journal Systems (OJS) from the Public Knowledge Project as the repository for the corpus. We measure search performance with a classic recall/precision approach and compare it with one of the default multilingual search functions supported in OJS. Our experimental analysis suggests that stemming is an effective technique for searching Arabic-language texts and improves the quality of the information retrieval system.
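The pipeline described in this abstract (filter stop words, stem, build an inverted index, rank by vector-space similarity, evaluate with precision and recall) can be sketched in a few functions. This is a minimal illustration assuming TF-IDF weighting (one of many possible schemes) and cosine similarity; `stem` is a placeholder for a real Arabic stemmer such as the ISRI stemmer (NLTK ships one as `nltk.stem.isri.ISRIStemmer`), and the stop list and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple


def identity(token: str) -> str:
    return token  # placeholder: swap in a real Arabic stemmer here


def build_index(docs: Dict[str, str], stop_words: Set[str],
                stem: Callable[[str], str] = identity) -> Dict[str, Dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = [stem(t) for t in text.split() if t not in stop_words]
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index


def tfidf_search(query: str, index: Dict[str, Dict[str, int]], n_docs: int,
                 stop_words: Set[str], stem: Callable[[str], str] = identity) -> List[str]:
    """Rank documents by cosine similarity of TF-IDF vectors."""
    idf = {t: math.log(n_docs / len(postings)) for t, postings in index.items()}
    doc_norm: Dict[str, float] = defaultdict(float)
    for t, postings in index.items():
        for d, tf in postings.items():
            doc_norm[d] += (tf * idf[t]) ** 2
    q_terms = Counter(stem(t) for t in query.split() if t not in stop_words)
    q_weights = {t: tf * idf[t] for t, tf in q_terms.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    scores: Dict[str, float] = defaultdict(float)
    for t, qw in q_weights.items():
        for d, tf in index[t].items():
            scores[d] += qw * tf * idf[t]
    # s > 0 guarantees both norms are non-zero.
    ranked = [(s / (q_norm * math.sqrt(doc_norm[d])), d) for d, s in scores.items() if s > 0]
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in ranked]


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Stop words are filtered before stemming in both the index and the query, so the two sides stay consistent; comparing runs with and without a real stemmer is the kind of experiment the abstract reports.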
Machine Learning-Based Ontology Mapping Tool to Enable Interoperability in Coastal Sensor Networks
Ontologies are widely used today for data integration and for solving information heterogeneity problems on the web, because they give information explicit meaning. The growing need to resolve heterogeneities between different information systems within a domain of interest has led different organizations to develop their own ontologies rapidly. Because each ontology is designed for a particular task, it may represent its project's needs in a unique way. Integrating distributed and heterogeneous ontologies by finding semantic correspondences between their concepts has therefore become key to achieving interoperability among different representations. In this thesis, an advanced instance-based ontology matching algorithm is proposed to enable data integration in ocean sensor networks, whose data are highly heterogeneous in syntax, structure, and semantics. It addresses the ontology mapping problem in such systems using machine-learning methods and string-based methods.
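As a rough illustration only (the thesis's actual algorithm and its machine-learning component are not reproduced here), a matcher combining string-based and instance-based evidence could score each concept pair by mixing name similarity from Python's `difflib.SequenceMatcher` with the Jaccard overlap of the concepts' instance values, then keep each source concept's best target above a threshold. The weight `alpha`, the `threshold`, and the concept names in the example are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, Set[str]]  # concept name -> set of observed instance values


def name_similarity(a: str, b: str) -> float:
    """String-based evidence: similarity ratio of normalized concept names."""
    def norm(s: str) -> str:
        return s.lower().replace("_", " ")
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def instance_similarity(a: Set[str], b: Set[str]) -> float:
    """Instance-based evidence: Jaccard overlap of the two instance sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match(src: Ontology, tgt: Ontology, alpha: float = 0.5,
          threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """For each source concept, keep its best target concept if the score clears threshold."""
    if not tgt:
        return []
    pairs = []
    for s, s_inst in src.items():
        # Weighted mix of string evidence and instance evidence (alpha is hypothetical).
        candidates = [
            (t, alpha * name_similarity(s, t) + (1 - alpha) * instance_similarity(s_inst, t_inst))
            for t, t_inst in tgt.items()
        ]
        best_t, best_score = max(candidates, key=lambda pair: pair[1])
        if best_score >= threshold:
            pairs.append((s, best_t, best_score))
    return pairs
```

In a learning-based matcher, the fixed `alpha` would more likely be replaced by a classifier trained on these similarity features over labeled concept pairs.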
Infometrics: history and trends
In a new reading of the history of infometrics in all its variety, this chapter uncovers the contribution of a number of researchers from India, as well as from Eastern Europe and the former Soviet Union, the latter mainly in the domain of scientometrics. In Brazil, interest in infometrics, and more precisely in bibliometrics, during the 1970s and 1980s later declined significantly, before a recent and strong revival in numerous applications. This long chapter pays special attention to showing, with the support of numerous concrete examples, the diversity of uses of infometrics and, more importantly, "how to do it". Under a variety of names (bibliometrics, infometrics, scientometrics, webmetrics, and so on), infometric techniques open a wide and brilliant range of applications to information science in information representation, organization, management, processing, retrieval, planning, forecasting, inference, decision-making, competitiveness, and innovation, as well as in their political, social, economic, educational, and cultural ramifications.
Retrieve: An Engineering Tool for Searching Remote Sensing and Environmental Engineering Databases
The design and development of a semi-automatic information retrieval system featuring manual indexing and an inverted file structure are presented. The system requires manual indexing by an expert in the subject field to ensure high-precision searching, and high recall is achieved through the implementation of the inverted file. The system provides an interactive environment, a thesaurus for normalizing the indexing language, ranking of retrieved documents, and flexible output specifications. The purpose of this thesis is to present the design and development of in-house search-aid software for small document collections, intended for Remote Sensing and Environmental Engineering users.
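The core of such a system (a manually indexed collection stored as an inverted file, a thesaurus mapping variant terms to preferred index terms, and ranked retrieval) can be sketched as follows. The ranking rule here, counting how many query terms each document matches (coordination-level matching), is an assumption, since the abstract does not say how documents are ranked; the thesaurus entries and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical thesaurus: variant term -> preferred (normalized) index term.
THESAURUS = {"landsat imagery": "satellite imagery", "lidar": "laser altimetry"}


def normalize(term: str) -> str:
    term = term.lower().strip()
    return THESAURUS.get(term, term)


def build_inverted_file(manual_index: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """manual_index: doc_id -> index terms assigned by a subject expert."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, terms in manual_index.items():
        for term in terms:
            inverted[normalize(term)].add(doc_id)
    return dict(inverted)


def ranked_search(query_terms: List[str], inverted: Dict[str, Set[str]]) -> List[str]:
    """Rank documents by how many normalized query terms they match (ties by id)."""
    counts: Dict[str, int] = defaultdict(int)
    for term in {normalize(t) for t in query_terms}:
        for doc_id in inverted.get(term, set()):
            counts[doc_id] += 1
    return sorted(counts, key=lambda d: (-counts[d], d))
```

Normalizing both the assigned index terms and the query terms through the same thesaurus is what lets a search for a variant term reach documents indexed under the preferred term.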