
    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. They fall into three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop


    Data-Driven Implementation To Filter Fraudulent Medicaid Applications

    There has been much work to improve IT systems for managing and maintaining health records, and the U.S. government is trying to integrate different types of health care data for providers and patients. Health care fraud detection research has focused on claims by providers, physicians, hospitals, and other medical service providers to detect fraudulent billing, abuse, and waste, and data-mining techniques have been used to detect patterns of fraud and to reduce waste and abuse in the health care system. However, less attention has been paid to systems that detect fraudulent applications, specifically for Medicaid. In this study, a data-driven system with a layered architecture for filtering fraudulent Medicaid applications was proposed. The Medicaid Eligibility Application System utilizes a set of public and private databases that contain individual asset records. These asset records are used to determine applicants' Medicaid eligibility using a scoring model integrated with a threshold algorithm. The findings indicated that by using the proposed data-driven approach, the state Medicaid agency could filter fraudulent Medicaid applications and save over $4 million in Medicaid expenditures.
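
    The scoring-plus-threshold step can be illustrated with a small sketch. The Python example below is hypothetical: the asset fields, weights, limits, and threshold are assumptions for illustration and are not taken from the study's actual Medicaid Eligibility Application System.

    # Hypothetical scoring model with a threshold, in the spirit described above.
    ASSET_WEIGHTS = {
        "bank_balance": 0.4,    # liquid assets from financial databases
        "vehicle_value": 0.2,   # vehicle registration records
        "property_value": 0.4,  # real-estate records
    }
    FRAUD_THRESHOLD = 0.6       # applications scoring above this are flagged for review

    def asset_score(record: dict, limits: dict) -> float:
        """Score an application by how far reported assets exceed program limits."""
        score = 0.0
        for field, weight in ASSET_WEIGHTS.items():
            value, limit = record.get(field, 0.0), limits[field]
            excess = max(0.0, (value - limit) / limit) if limit else 0.0  # normalized excess
            score += weight * min(excess, 1.0)                            # cap each field's contribution
        return score

    def flag_application(record: dict, limits: dict) -> bool:
        """True if the application should be filtered for manual eligibility review."""
        return asset_score(record, limits) >= FRAUD_THRESHOLD

    limits = {"bank_balance": 2000, "vehicle_value": 10000, "property_value": 50000}
    applicant = {"bank_balance": 9500, "vehicle_value": 4000, "property_value": 120000}
    print(flag_application(applicant, limits))  # True: reported assets are well above the limits

    In practice, a flagged application would be routed to a caseworker rather than rejected outright; the threshold trades off missed fraud against manual review workload.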

    Advanced Methods for Entity Linking in the Life Sciences

    The amount of available knowledge grows rapidly with the increasing number of data sources. However, the autonomy of these sources and the resulting heterogeneity prevent comprehensive data analysis and applications. Data integration aims to overcome this heterogeneity by unifying different data sources and enriching unstructured data. Enrichment consists of several subtasks, among them the annotation process, which links document phrases to terms of a standardized vocabulary. Annotated documents enable effective retrieval, comparability across documents, and comprehensive analyses such as finding adverse drug effects in patient data. A vocabulary provides this comparability through standardized terms; an ontology can also serve as a vocabulary, with concepts, relationships, and logical constraints additionally defining it.

    The annotation process is applicable in many domains, but it differs between generic and specialized domains. This thesis emphasizes these differences and addresses the resulting challenges. Most annotation approaches are evaluated on general domains such as Wikipedia; this thesis evaluates the developed approaches on case report forms, medical documents used in clinical trials. Natural language poses challenges such as expressing similar meanings with different phrases, and the proposed annotation method, AnnoMap, accounts for this fuzziness. A further challenge is the reuse of verified annotations: existing annotations represent knowledge that can be reused in subsequent annotation processes, and AnnoMap includes a reuse strategy that exploits verified annotations to link new documents to appropriate concepts. Because the biomedical domain spans a broad range of areas, many annotation tools exist, and they perform differently depending on the subdomain. This thesis therefore proposes a combination approach that unifies the results of different tools, using existing tool results to build a classification model that labels new annotations as correct or incorrect. The results show that the reuse strategy and the machine learning-based combination improve annotation quality compared to existing approaches focusing on the biomedical domain.

    A further part of data integration is entity resolution, which builds unified knowledge bases from different data sources. A data source consists of a set of records characterized by attributes, and the goal of entity resolution is to identify records representing the same real-world entity. Many methods focus on linking such attribute-based records, but only a few can handle graph-structured knowledge bases or consider temporal aspects. Temporal aspects are essential for identifying the same entities over different time intervals, since entity descriptions change over time. Moreover, records can be related to other records, so that a small graph structure exists for each record, and these small graphs can be linked if they represent the same entity. This thesis proposes an entity resolution approach for census data consisting of person records from different time intervals, which also considers the graph structure given by family relationships.
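
    The combination of attribute similarity and family-relationship context can be sketched briefly. In the example below, the names, relationship fields, and weights are hypothetical and the scoring is a simplification; the thesis's actual approach additionally handles temporal aspects and makes collective decisions over the whole graph.

    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        """String similarity between two recorded names."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def pair_score(p1: dict, p2: dict) -> float:
        """Combine attribute similarity with overlap of family relationships."""
        name_sim = name_similarity(p1["name"], p2["name"])
        # Jaccard overlap of the small relationship graphs around each record.
        rel1, rel2 = set(p1["relatives"]), set(p2["relatives"])
        rel_sim = len(rel1 & rel2) / len(rel1 | rel2) if rel1 | rel2 else 0.0
        return 0.6 * name_sim + 0.4 * rel_sim   # hypothetical weighting of the two signals

    census_1890 = {"name": "Jan de Vries", "relatives": ["Maria de Vries", "Pieter de Vries"]}
    census_1900 = {"name": "Johannes de Vries", "relatives": ["Maria de Vries", "Anna de Vries"]}
    print(pair_score(census_1890, census_1900))  # higher scores suggest the same person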
    To achieve high-quality results, current methods apply machine learning techniques to classify record pairs as referring to the same entity or not. The classification uses a model trained on labeled data, in this case a set of record pairs labeled as duplicates or non-duplicates. Generating such training data is time-consuming, so active learning techniques are relevant for reducing the number of required training examples. The entity resolution method for temporal, graph-structured data improves on previous collective entity resolution approaches, and the developed active learning approach achieves results comparable to supervised learning while outperforming other limited-budget active learning methods. Besides the entity resolution approach, the thesis introduces the concept of evolution operators for communities. These operators express the dynamics of communities and individuals; for instance, they can state that two communities merged or split over time, and they allow observing the history of individuals. Overall, the presented annotation approaches generate high-quality annotations for medical forms, enabling comprehensive analysis across different data sources as well as accurate queries. The proposed entity resolution approaches improve on existing ones and thus contribute to the construction of high-quality knowledge graphs and to data analysis tasks.
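
    The budget-limited active learning idea mentioned above can be sketched with uncertainty sampling. The feature construction, model choice, seed size, and budget below are assumptions for illustration, not the thesis's actual configuration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def active_learning(pair_features, oracle_labels, seed_size=10, budget=50):
        """Iteratively label the most uncertain record pairs up to a fixed labeling budget."""
        rng = np.random.default_rng(0)
        # Random seed sample; assumed to contain both duplicate and non-duplicate pairs.
        labeled = list(rng.choice(len(pair_features), size=seed_size, replace=False))
        model = LogisticRegression(max_iter=1000)
        while len(labeled) < seed_size + budget:
            model.fit(pair_features[labeled], oracle_labels[labeled])
            proba = model.predict_proba(pair_features)[:, 1]
            uncertainty = np.abs(proba - 0.5)            # 0.5 = model is most unsure
            uncertainty[labeled] = np.inf                # never re-query labeled pairs
            labeled.append(int(np.argmin(uncertainty)))  # ask the oracle for this pair
        return model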

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
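
    Two of the listed challenges, missing data and class imbalance, can be illustrated with a generic supervised pipeline. The sketch below uses simulated feature blocks and plain concatenation as a stand-in for integrated multi-omics data; it is a standard scikit-learn recipe, not a method taken from the review.

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    genomics = rng.normal(size=(100, 20))                    # placeholder omics blocks
    proteomics = rng.normal(size=(100, 15))
    proteomics[rng.random(proteomics.shape) < 0.1] = np.nan  # simulate missing measurements
    X = np.hstack([genomics, proteomics])                    # early, concatenation-based integration
    y = (rng.random(100) < 0.2).astype(int)                  # imbalanced labels, roughly 20% positives

    pipeline = make_pipeline(
        SimpleImputer(strategy="median"),                    # handle missing data
        StandardScaler(),                                    # put heterogeneous blocks on one scale
        LogisticRegression(class_weight="balanced", max_iter=1000),  # counter class imbalance
    )
    pipeline.fit(X, y)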