Using Learning to Rank Approach to Promoting Diversity for Biomedical Information Retrieval with Wikipedia
In most of the traditional information retrieval (IR) models, the independent
relevance assumption is taken, which assumes the relevance of a document is
independent of other documents. However, this assumption leads to high redundancy
and low diversity in retrieval results. This has been seen in many scenarios, especially
in biomedical IR, where the information need of one query may refer to different
aspects. Promoting diversity in IR takes the relationship between documents into
account. Unlike previous studies, we tackle this problem from a learning-to-rank
perspective. The main challenges are how to find salient features for biomedical data
and how to integrate dynamic features into the ranking model. To address these
challenges, Wikipedia is used to detect topics of documents for generating
diversity-biased features. A combined model is proposed and studied to learn a diversified
ranking result. Experimental results show that the proposed method outperforms baseline
models.
Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability
This thesis examined two research projects: probabilistic information retrieval modeling and third-order inference on reliability.
In the first part of this dissertation, two research topics in information retrieval are studied and evaluated on large-scale text data sets. First, we conduct an in-depth study of the relationship between document length and document relevance to the user's need. Two statistical methods are proposed that incorporate document length as a substantial weighting factor to achieve higher retrieval performance. Second, we use the properties of the survival function to propose a cost-based re-ranking method that promotes ranking diversity for biomedical information retrieval, and to model the proximity between query terms to improve retrieval performance. In extensive experiments on standard TREC collections, our proposed models perform significantly better than the classical probabilistic information retrieval models.
In the second part of this dissertation, a small-sample asymptotic method is proposed for higher-order inference in the stress-strength reliability model, R=P(Y<X), where X and Y are independently distributed. A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood model. Simulation studies are conducted on two distributions: the Burr type X distribution and the exponentiated exponential distribution. Results from the simulation studies show that the proposed method is very accurate even when the sample sizes are small.
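As a rough illustration of the quantity R=P(Y<X) mentioned above, the sketch below estimates it by simulation. It does not reproduce the thesis's higher-order method or its Burr type X and exponentiated exponential settings. As an assumption made only for this example, X and Y are taken to be independent exponential variables, for which R has the closed form rate_y / (rate_x + rate_y); the function names are hypothetical.

```python
import random


def stress_strength_exact(rate_x, rate_y):
    """Closed-form R = P(Y < X) for independent exponential X and Y.

    Assumption for illustration only: the thesis studies other distributions.
    """
    return rate_y / (rate_x + rate_y)


def stress_strength_mc(rate_x, rate_y, n=100_000, seed=0):
    """Monte Carlo estimate of R = P(Y < X) from n simulated (Y, X) pairs."""
    rng = random.Random(seed)
    # Count the simulated pairs in which the stress Y falls below the strength X.
    hits = sum(
        rng.expovariate(rate_y) < rng.expovariate(rate_x) for _ in range(n)
    )
    return hits / n


if __name__ == "__main__":
    print("exact:", stress_strength_exact(1.0, 2.0))
    print("simulated:", stress_strength_mc(1.0, 2.0))
```

With many simulated pairs the estimate should sit close to the closed-form value; the small-sample inference problem addressed in the thesis is the harder case in which only a few observations of X and Y are available.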
ISBIS 2016: Meeting on Statistics in Business and Industry
This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona, June 8-10, 2016, hosted at the Universitat Politècnica de Catalunya - Barcelona TECH, by the Department of Statistics and Operations Research. The meeting took place in the ETSEIB Building (Escola Tècnica Superior d'Enginyeria Industrial) at Avda. Diagonal 647.
The meeting organizers celebrated the continued success of the ISBIS and ENBIS societies, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee consisted of:
David Banks, Duke University
Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL
Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL
Nalini Ravishankar, University of Connecticut
Xavier Tort Martorell, Universitat Politècnica de Catalunya, Barcelona TECH
Martina Vandebroek, KU Leuven
Vincenzo Esposito Vinzi, ESSEC Business School
Integrating Medical Ontology and Pseudo Relevance Feedback For Medical Document Retrieval
The purpose of this thesis is to improve the accuracy of locating relevant documents in a large amount of Electronic Medical Data (EMD). The goal of this research is to propose a new way of using medical ontology to give patients an easier and more reliable means of understanding their diseases, and to help doctors find and further improve possible methods of diagnosis and treatment. The empirical studies were based on the health care dataset provided by CLEF. In this research, I used information retrieval to find and obtain relevant information within the large data sets provided by CLEF. I then used the ranking functionality of the Terrier platform to score and evaluate the matching documents in the collection. BM25 was used as the base normalization method to retrieve the results, and a Pseudo Relevance Feedback weighting model was used to retrieve information about patients' health history and medical records in order to find more accurate results. I then used the Unified Medical Language System (UMLS) to index the queries when searching the Internet for health-related documents. UMLS software was used to link the computer system with health and biomedical terms and vocabularies through classification tools; it works as a dictionary for patients by translating medical terms. In future work, I would like to use medical ontology to relate the documents about the medical data to my retrieved results.
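For readers unfamiliar with BM25, which the abstract above names as its base method, here is a minimal sketch of a common textbook form of the scoring function. It is not Terrier's implementation and not the thesis's configuration. The whitespace tokenizer, the parameter defaults k1=1.2 and b=0.75, the smoothed IDF variant, and the function name `bm25_scores` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula.

    Assumptions: lowercase whitespace tokenization and a smoothed,
    non-negative IDF. Neither is taken from the thesis.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Length normalization: longer-than-average documents are penalized.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "diabetes treatment insulin",
        "heart disease risk",
        "insulin dosage for diabetes patients",
    ]
    print(bm25_scores("insulin diabetes", corpus))
```

In this toy corpus the first and third documents both contain the two query terms once, so the shorter first document scores higher because of length normalization, and the second document scores zero.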
Semantic concept extraction from electronic medical records for enhancing information retrieval performance
With the healthcare industry increasingly using EMRs, there emerges an opportunity for knowledge discovery within the healthcare domain that was not possible with paper-based medical records. One such opportunity is to discover UMLS concepts from EMRs. However, with opportunities come challenges that need to be addressed. Medical verbiage is very different from common English verbiage, and it is reasonable to assume that extracting information from medical text requires different protocols than those currently used for common English text. This thesis proposes two new semantic matching models: Term-Based Matching and CUI-Based Matching. These two models use specialized biomedical text mining tools that extract medical concepts from EMRs. Extensive experiments to rank the extracted concepts are conducted on the University of Pittsburgh BLULab NLP Repository for the TREC 2011 Medical Records track dataset, which consists of 101,711 EMRs that contain concepts in 34 predefined topics. This thesis compares the proposed semantic matching models against the traditional weighting equations and information retrieval tools used in the academic world today.
WEIBULL DISTRIBUTION BASED ON EDUCATION PARTLY INTERVAL CENSORED DATA
This project is concerned with applying survival analysis techniques to data that include censored observations. Survival analysis, also known as failure time analysis, has been applied successfully in the medical, engineering, economic, education and other fields. Partly Interval Censoring (PIC) is one of the censoring schemes used in survival analysis, and it can help to treat many types of data, especially incomplete data. One of the most commonly used lifetime distributions in reliability applications is the Weibull distribution. In this project we use a Weibull model based on modified education partly interval censored data as well as medical data and simulation data. Based on the medical data, we found that our model is comparable with the Turnbull method. From the education data and the simulation study for this particular case, we conclude that our proposed distribution describes the nature of the model well compared with the Turnbull method in terms of the scale and shape parameter estimates. Plots of the survival distribution function against failure time are used to examine the predicted survival patterns for the two types of failures.
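To make the Weibull model and partly interval censoring concrete, the sketch below writes out the standard Weibull survival function S(t) = exp(-(t/scale)^shape) and a log-likelihood in which exactly observed failure times contribute the density and interval-censored observations (L, R] contribute S(L) - S(R). The abstract does not state the likelihood used in the project, so this standard form is an assumption, and the function names are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape), t >= 0."""
    return math.exp(-((t / scale) ** shape))


def weibull_pdf(t, shape, scale):
    """Weibull density f(t) for t > 0."""
    return (
        (shape / scale)
        * (t / scale) ** (shape - 1)
        * weibull_survival(t, shape, scale)
    )


def pic_log_likelihood(exact_times, intervals, shape, scale):
    """Log-likelihood for partly interval-censored data (assumed standard form).

    exact_times: failure times observed exactly (each > 0).
    intervals: (left, right) pairs with left < right; the failure is only
        known to lie in (left, right].
    """
    log_lik = sum(math.log(weibull_pdf(t, shape, scale)) for t in exact_times)
    log_lik += sum(
        math.log(
            weibull_survival(left, shape, scale)
            - weibull_survival(right, shape, scale)
        )
        for left, right in intervals
    )
    return log_lik


if __name__ == "__main__":
    print(pic_log_likelihood([1.0, 2.5], [(0.5, 1.5), (2.0, 4.0)], 1.3, 2.0))
```

Maximizing this function over the shape and scale parameters with a numerical optimizer would give parameter estimates of the kind the abstract compares against the Turnbull method.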