
    A Simple Linear Ranking Algorithm Using Query Dependent Intercept Variables

    The LETOR website contains three information retrieval datasets used as a benchmark for testing machine learning ideas for ranking. Algorithms participating in the challenge are required to assign score values to search results for a collection of queries, and are measured using standard IR ranking measures (NDCG, precision, MAP) that depend only on the relative score-induced order of the results. Like many of the participating algorithms, we train a linear classifier. In contrast with the other participating algorithms, we define an additional free variable (an intercept, or benchmark) for each query. This expresses the fact that results for different queries are incomparable for the purpose of determining relevance. The cost of this idea is the addition of relatively few nuisance parameters. Our approach is simple, and we used a standard logistic regression library to test it. The results beat those reported for the participating algorithms. Hence, it seems promising to combine our approach with other, more complex ideas.
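    The idea above maps directly onto standard tooling. The sketch below (not the authors' code) appends a one-hot query-ID block to the feature matrix so that each query gets its own free intercept, fits an ordinary logistic regression on binary relevance labels, and keeps only the shared feature weights for ranking; the function and variable names are illustrative.

```python
# Minimal sketch (not the authors' code): a linear relevance model with one
# free intercept per query, implemented by appending one-hot query-ID columns
# to the feature matrix and fitting an ordinary logistic regression.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

def fit_query_intercept_ranker(X, y, query_ids):
    """X: (n_docs, n_features) features, y: binary relevance labels,
    query_ids: one query identifier per document."""
    enc = OneHotEncoder(handle_unknown="ignore")
    Q = enc.fit_transform(np.asarray(query_ids).reshape(-1, 1))  # nuisance intercepts
    Xq = hstack([csr_matrix(X), Q])
    # fit_intercept=False: the global intercept is absorbed by the per-query ones.
    clf = LogisticRegression(fit_intercept=False, max_iter=1000)
    clf.fit(Xq, y)
    n_feat = X.shape[1]
    return clf.coef_[0, :n_feat]  # shared weights; per-query intercepts are discarded

def rank_documents(w, X_query):
    """Rank the documents of a single query by the shared linear score.
    The per-query intercept is constant within a query, so it cannot change
    the order and can be dropped (NDCG/MAP depend only on within-query order)."""
    scores = X_query @ w
    return np.argsort(-scores)
```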

    Eye Tracking: A Perceptual Interface for Content Based Image Retrieval

    In this thesis, visual search experiments are devised to explore the feasibility of an eye-gaze-driven search mechanism. The thesis first explores gaze behaviour on images possessing different levels of saliency. Gaze was predominantly attracted to salient locations, but also made frequent reference to non-salient background regions, which indicated that information from scan paths might prove useful for image search. The thesis then specifically investigates the benefits of eye tracking as an image retrieval interface, in terms of speed relative to selection by mouse and in terms of the efficiency of eye tracking mechanisms in the task of retrieving target images. Results are analysed using ANOVA and significant findings are discussed. Results show that eye selection was faster than a computer mouse, and that experience gained during visual tasks carried out using a mouse would benefit users subsequently transferred to an eye tracking system. Results of the image retrieval experiments show that users are able to navigate to a target image within a database, confirming the feasibility of an eye-gaze-driven search mechanism. Additional histogram analysis of the fixations, saccades, and pupil diameters in the human eye movement data revealed a new method of extracting search intentions from gaze behaviour, of which the user is not aware, that promises even quicker search performance. The research has two implications for Content Based Image Retrieval: (i) improvements in query formulation for visual search and (ii) new methods for visual search using attentional weighting. Furthermore, it was demonstrated that users are able to find target images at sufficient speeds, indicating that pre-attentive activity plays a role in visual search. A review of eye tracking technology, its current applications, visual perception research, and models of visual attention is included, along with a review of the potential of the technology for commercial exploitation.

    Ranking relations using analogies in biological and information networks

    Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}{:}B^{(1)}, A^{(2)}{:}B^{(2)}, \ldots, A^{(N)}{:}B^{(N)}\}$, measures how well other pairs $A{:}B$ fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation between objects $A$ and $B$ analogous to those relations found in $\mathbf{S}$? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application to discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if only a small set of protein pairs is provided. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/09-AOAS321 by the Institute of Mathematical Statistics (http://www.imstat.org).
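    The paper's actual method combines a function-space similarity measure with Bayesian analysis; the toy sketch below illustrates only the general shape of the task, scoring a candidate pair A:B by how closely a crude relation representation (the feature-vector difference) matches those of the query set S. The representation choice and all names are assumptions made for illustration, not the authors' model.

```python
# Toy illustration (not the paper's Bayesian method): score a candidate pair
# A:B by comparing a crude relation representation, the feature difference
# B - A, against the representations of the query pairs in S.
import numpy as np

def relation_vector(features, a, b):
    """Represent the relation A:B as the difference of the objects' feature vectors."""
    return features[b] - features[a]

def analogy_score(features, query_set, candidate):
    """Average cosine similarity between the candidate's relation vector and
    those of the query set S = [(A1, B1), ..., (AN, BN)]."""
    r = relation_vector(features, *candidate)
    sims = []
    for a, b in query_set:
        s = relation_vector(features, a, b)
        denom = np.linalg.norm(r) * np.linalg.norm(s)
        sims.append(float(r @ s / denom) if denom > 0 else 0.0)
    return float(np.mean(sims))

# Rank all candidate pairs by how analogous they are to the query set S:
# ranked = sorted(candidates, key=lambda p: analogy_score(features, S, p), reverse=True)
```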

    Chronic Risk and Disease Management Model Using Structured Query Language and Predictive Analysis

    Individuals with chronic conditions are the most frequent users of health care; more than half of the top ten causes of death in the United States are chronic diseases, and these members consistently have high health risk scores. In population health management, identifying high-risk members is very important for patient care, disease management, and cost management. Disease management programs are an effective way of monitoring and preventing chronic disease and health-related complications, and risk management allows physicians and healthcare companies to reduce patients' health risk, identify members for care and disease management, and manage financial risk. The main objective of this research is to introduce an efficient and accurate risk assessment model that maintains the accuracy of risk scores relative to the existing model and, based on the calculated risk scores, identifies members for disease management programs using Structured Query Language. For the experiments we used data and information from several sources, including CMS, NCQA, existing models, and the healthcare insurance industry. The approach takes its base principle from the Chronic Illness and Disability Payment System (CDPS); following this model, weights for chronic diseases are defined to calculate each patient's risk. To keep the analysis focused, three chronic diseases were selected: breast cancer, diabetes, and congestive heart failure. Diagnosis code sets, electronic medical records, and member pharmacy information are the key data sources. An industry-standard database system was used to implement the logic, which makes the model more efficient since the data is warehoused where the model is built. Experimental results were obtained from 4,761 relevant medical records taken from Molina Healthcare's data warehouse. We tested the proposed model on risk score data from the State of Illinois using multiple linear regression and implemented the proposed logic on health plan data, from which we calculated p-values and confidence levels for our variables; as a second validation step, we ran the same data through the original risk model. We then showed that member risk scores contribute strongly to the member selection process for the disease management program. To validate the member selection criteria, we used a fast-and-frugal decision tree algorithm, and a confusion matrix was used to measure the performance of the member selection process. The results show that the proposed model achieved an overall risk assessment confidence level of 99% with an R-squared value of 98%, and for disease management member identification it achieved 99% sensitivity, 89% accuracy, and 74% specificity. These results show that, taken one step further, a risk assessment model can not only determine a member's risk but also support disease management by identifying and prioritizing members. The resulting chronic risk and disease management method is promising for patients, insurance companies, provider groups, claims processing organizations, and physician groups seeking to manage member health risk more accurately and effectively and to enroll members in the required care management programs.
The methods and design used in this research contribute to a business analytics approach to overall member risk and disease management using predictive analytics based on members' medical diagnoses, pharmacy utilization, and demographics.
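    As a rough illustration of the pipeline described above, the sketch below computes a CDPS-style risk score as a weighted sum of chronic-condition flags, selects members above a threshold for a disease management program, and derives sensitivity, specificity, and accuracy from the confusion matrix. The weights, threshold, and column names are assumptions for illustration, not the study's values or its SQL implementation.

```python
# Illustrative sketch only: CDPS-style weighted risk score, threshold-based
# member selection, and confusion-matrix metrics. Weights, threshold, and
# column names are hypothetical, not the study's values.
import pandas as pd

CONDITION_WEIGHTS = {"diabetes": 0.42, "breast_cancer": 0.55, "chf": 0.78}  # hypothetical
SELECTION_THRESHOLD = 1.0                                                   # hypothetical

def risk_scores(members: pd.DataFrame) -> pd.Series:
    """members has one 0/1 flag column per chronic condition."""
    score = sum(w * members[c] for c, w in CONDITION_WEIGHTS.items())
    return score.rename("risk_score")

def select_for_dm(members: pd.DataFrame) -> pd.Series:
    """Flag members whose risk score reaches the enrollment threshold."""
    return (risk_scores(members) >= SELECTION_THRESHOLD).rename("selected")

def selection_metrics(selected: pd.Series, truth: pd.Series) -> dict:
    """Sensitivity, specificity, and accuracy from the confusion matrix."""
    tp = int(((selected == 1) & (truth == 1)).sum())
    tn = int(((selected == 0) & (truth == 0)).sum())
    fp = int(((selected == 1) & (truth == 0)).sum())
    fn = int(((selected == 0) & (truth == 1)).sum())
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```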

    A Time-Aware Approach to Improving Ad-hoc Information Retrieval from Microblogs

    An immense number of short-text documents is produced as a result of microblogging. The content grows as the number of microbloggers grows and as active microbloggers continue to post millions of updates. The range of topics discussed is so vast that microblogs provide an abundance of useful information. In this work, the problem of retrieving the most relevant information from microblogs is addressed. Interesting temporal patterns were found in the initial analysis of the study; therefore, the focus of the current work is first to exploit a temporal variable in order to see how effectively it can be used to predict the relevance of tweets, and then to include it in a retrieval weighting model along with other tweet-specific features. Generalized Linear Mixed-effect Models (GLMMs) are used to analyze the features and to propose two re-ranking models. These two models were developed through an exploratory process on a training set and then evaluated on a test set.
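    The work above fits GLMMs to derive its two re-ranking models; the sketch below only illustrates the general form of such a time-aware re-ranker, combining a first-pass retrieval score with an exponentially decaying recency feature and one other tweet-specific feature. The coefficients, decay rate, and feature names are hypothetical stand-ins, not the fitted estimates from the thesis.

```python
# Illustrative sketch only: re-rank an initial retrieval run by combining the
# base relevance score with a recency feature (exponential decay of tweet age)
# and another tweet-specific feature. Coefficients are hypothetical, not the
# GLMM estimates fitted in the work described above.
import math
from dataclasses import dataclass

@dataclass
class Tweet:
    id: str
    base_score: float   # score from the first-pass retrieval model
    age_hours: float    # query time minus posting time, in hours
    has_url: bool       # example of an additional tweet-specific feature

# Hypothetical linear re-ranking weights (stand-ins for fitted coefficients).
W_BASE, W_RECENCY, W_URL, DECAY_RATE = 1.0, 0.8, 0.2, 0.05

def rerank_score(t: Tweet) -> float:
    recency = math.exp(-DECAY_RATE * t.age_hours)  # the temporal variable
    return W_BASE * t.base_score + W_RECENCY * recency + W_URL * float(t.has_url)

def rerank(tweets: list[Tweet]) -> list[Tweet]:
    """Return the tweets sorted by the combined time-aware score."""
    return sorted(tweets, key=rerank_score, reverse=True)
```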