
    Terminologia Anatomica: Considered from the Perspective of Next-Generation Knowledge Sources

    This report examines the semantic structure of Terminologia Anatomica, taking one randomly selected page as an example. The focus of analysis is the meaning imparted to an anatomical term by virtue of its location within the structured list. Terminologia's structure, expressed through hierarchies of headings, varied typographical styles, indentations, and an alphanumeric code, implies specific relationships between the terms embedded in the list. Together, terms and relationships can potentially capture essential elements of anatomical knowledge. The analysis focuses on these knowledge elements and evaluates the consistency and logic of their representation. The most critical of these elements are class inclusion and part-whole relationships, which are implied, rather than explicitly modeled, by Terminologia. This limits the use of the term list to those who have some knowledge of anatomy and excludes computer programs from navigating the terminology. Assuring consistency in the explicit representation of anatomical relationships would facilitate adoption of Terminologia as the anatomical standard by the various controlled medical terminology (CMT) projects. These projects are motivated by the need for computerizing the patient record, and their aim is to generate machine-understandable representations of biomedical concepts, including anatomy. Because of the lack of a consistent and explicit representation of anatomy, each of these CMTs has generated its own anatomy model. None of these models is compatible with the others, yet each is consistent with textbook descriptions of anatomy. The analysis of the semantic structure of Terminologia Anatomica leads to some suggestions for enhancing the term list in ways that would facilitate its adoption as the standard for anatomical knowledge representation in biomedical informatics.
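
    To make the distinction between implied and explicit relationships concrete, here is a minimal sketch in Python, using made-up anatomical terms rather than actual Terminologia entries: it stores class inclusion (is-a) and part-whole (part-of) links as explicit triples that a program can traverse, instead of leaving them implied by indentation and typography.

    # Illustrative triples only; terms and relations are not from Terminologia.
    triples = [
        ("Femur", "is-a", "Bone"),
        ("Bone", "is-a", "Organ"),
        ("Femur", "part-of", "LowerLimb"),
        ("LowerLimb", "part-of", "Body"),
    ]

    def closure(term, relation, triples):
        """Follow one relation transitively, e.g. every class a term falls under."""
        found, frontier = set(), [term]
        while frontier:
            current = frontier.pop()
            for subj, rel, obj in triples:
                if subj == current and rel == relation and obj not in found:
                    found.add(obj)
                    frontier.append(obj)
        return found

    print(closure("Femur", "is-a", triples))     # {'Bone', 'Organ'}
    print(closure("Femur", "part-of", triples))  # {'LowerLimb', 'Body'}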

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. (Pre-print of the paper accepted at the 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI.)

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation, and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. This position paper gives examples of neurocognitive inspirations and promising directions in this area.

    The devices, experimental scaffolds, and biomaterials ontology (DEB): a tool for mapping, annotation, and analysis of biomaterials' data

    The size and complexity of the biomaterials literature makes systematic data analysis an excruciating manual task. A practical solution is creating databases and information resources. Implant design and biomaterials research can greatly benefit from an open database for systematic data retrieval. Ontologies are pivotal to knowledge base creation, serving to represent and organize domain knowledge. To name but two examples, GO, the Gene Ontology, and ChEBI, the Chemical Entities of Biological Interest ontology, and their associated databases are central resources to their respective research communities. The creation of the devices, experimental scaffolds, and biomaterials ontology (DEB), an open resource for organizing information about biomaterials, their design, manufacture, and biological testing, is described. It is developed using text analysis to identify ontology terms from a biomaterials gold standard corpus, systematically curated to represent the domain's lexicon. Topics covered are validated by members of the biomaterials research community. The ontology may be used for searching terms, performing annotations for machine learning applications, standardized meta-data indexing, and other cross-disciplinary data exploitation. The input of the biomaterials community to this effort to create data-driven open-access research tools is encouraged and welcomed.
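
    As a rough, hypothetical illustration of the term-identification step described above (not the DEB pipeline itself), the sketch below flags tokens that are markedly more frequent in a domain corpus than in a general background corpus, a common first pass in ontology term extraction; the corpora and ratio threshold are placeholders.

    from collections import Counter
    import re

    def candidate_terms(domain_docs, general_docs, min_ratio=2.0):
        """Flag tokens markedly more frequent in the domain corpus than in
        a general background corpus -- a crude term-extraction pass."""
        tokenize = lambda text: re.findall(r"[a-z]{3,}", text.lower())
        domain = Counter(t for d in domain_docs for t in tokenize(d))
        general = Counter(t for d in general_docs for t in tokenize(d))
        d_total = sum(domain.values()) or 1
        g_total = sum(general.values()) or 1
        terms = []
        for tok, count in domain.items():
            d_rate = count / d_total
            g_rate = (general[tok] + 1) / g_total  # add-one smoothing
            if d_rate / g_rate >= min_ratio:
                terms.append(tok)
        return sorted(terms)

    # Toy corpora standing in for a curated biomaterials gold standard.
    print(candidate_terms(
        ["titanium scaffold implant", "hydrogel scaffold coating"],
        ["the weather was fine", "stock prices rose today"]))  # ['scaffold']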

    Doctor of Philosophy

    The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often overwhelmed by excessive, irrelevant data. Scientists have applied natural language processing technologies to improve retrieval. Text summarization, a natural language processing approach, simplifies bibliographic data while filtering it to address a user's need. Traditional text summarization can necessitate the use of multiple software applications to accommodate diverse processing refinements known as "points-of-view." A new, statistical approach to text summarization can transform this process. Combo, a statistical algorithm comprised of three individual metrics, determines which elements within input data are relevant to a user's specified information need, thus enabling a single software application to summarize text for many points-of-view. In this dissertation, I describe this algorithm and the research process used in developing and testing it. Four studies comprised the research process. The goal of the first study was to create a conventional schema accommodating a genetic disease etiology point-of-view, and an evaluative reference standard. This was accomplished through simulating the task of secondary genetic database curation. The second study addressed the development and initial evaluation of the algorithm, comparing its performance to the conventional schema using the previously established reference standard, again within the task of secondary genetic database curation. The third and fourth studies evaluated the algorithm's performance in accommodating additional points-of-view in a simulated clinical decision support task. The third study explored prevention, while the fourth evaluated performance for prevention and drug treatment, comparing results to a conventional treatment schema's output. Both summarization methods identified data that were salient to their tasks. The conventional genetic disease etiology and treatment schemas located salient information for database curation and decision support, respectively. The Combo algorithm located salient genetic disease etiology, treatment, and prevention data for the associated tasks. Dynamic text summarization could potentially serve additional purposes, such as consumer health information delivery, systematic review creation, and primary research. This technology may benefit many user groups.
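
    The abstract does not name Combo's three metrics, so the sketch below is purely illustrative: it averages three generic relevance signals (overlap with point-of-view vocabulary, sentence position, and sentence length) into one score, just to show the general shape of a metric-combination summarizer.

    def combined_score(sentence, pov_terms, index, total):
        """Toy combination of three relevance signals; the real Combo
        metrics are not described in the abstract above."""
        words = set(sentence.lower().split())
        overlap = len(words & pov_terms) / (len(pov_terms) or 1)   # signal 1
        position = 1.0 - index / max(total - 1, 1)                 # signal 2
        brevity = 1.0 / (1.0 + abs(len(words) - 20) / 20)          # signal 3
        return (overlap + position + brevity) / 3.0

    def summarize(sentences, pov_terms, k=2):
        scored = [(combined_score(s, pov_terms, i, len(sentences)), s)
                  for i, s in enumerate(sentences)]
        return [s for _, s in sorted(scored, reverse=True)[:k]]

    pov = {"mutation", "gene", "etiology"}  # hypothetical point-of-view terms
    doc = ["A mutation in this gene underlies the disease etiology.",
           "The clinic opened in 1985.",
           "Gene variants were sequenced in 40 patients."]
    print(summarize(doc, pov, k=1))  # picks the first, etiology-rich sentence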

    An open source infrastructure for managing knowledge and finding potential collaborators in a domain-specific subset of PubMed, with an example from human genome epidemiology

    Background: Identifying relevant research in an ever-growing body of published literature is becoming increasingly difficult. Establishing domain-specific knowledge bases may be a more effective and efficient way to manage and query information within specific biomedical fields. Adopting a controlled vocabulary is a critical step toward data integration and interoperability in any information system. We present an open source infrastructure that provides a powerful capacity for managing and mining data within a domain-specific knowledge base. As a practical application of our infrastructure, we present two applications – Literature Finder and Investigator Browser – as well as a tool set for automating the data curating process for the human genome published literature database. The design of this infrastructure makes the system potentially extensible to other data sources.

    Results: Information retrieval and usability tests demonstrated that the system had high rates of recall and precision, 90% and 93%, respectively. The system was easy to learn, easy to use, reasonably speedy, and effective.

    Conclusion: The open source system infrastructure presented in this paper provides a novel approach to managing and querying information and knowledge from domain-specific PubMed data. Using the controlled vocabulary UMLS enhanced data integration and interoperability and the extensibility of the system. In addition, by using MVC-based design and Java as a platform-independent programming language, this system provides a potential infrastructure for any domain-specific knowledge base in the biomedical field.
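
    For reference, the recall and precision figures quoted above follow the standard information-retrieval definitions; a minimal computation, with toy document IDs rather than data from this study, looks like this.

    def precision_recall(retrieved, relevant):
        """Standard IR definitions: precision = TP / |retrieved|,
        recall = TP / |relevant|."""
        retrieved, relevant = set(retrieved), set(relevant)
        true_positives = len(retrieved & relevant)
        precision = true_positives / len(retrieved) if retrieved else 0.0
        recall = true_positives / len(relevant) if relevant else 0.0
        return precision, recall

    p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[1, 2, 3, 5, 6])
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60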

    MedTextus: An Ontology-enhanced Medical Portal

    In this paper, we describe MedTextus, an online medical search portal with dynamic search and browse tools. To search for information, MedTextus lets users request synonyms and related terms specifically tailored to their query. A mapping algorithm dynamically builds the query context based on the UMLS ontology and then selects thesaurus terms that fit this context. Users can add these terms to their query and meta-search five medical databases. To facilitate browsing, the search results can be reviewed as a list of documents per database, as a set of folders into which all the documents are automatically categorized based on their content, and as a map that is built on the fly. We designed a user study to compare these dynamic support tools with the static query support of NLM Gateway and report initial results for the search task. The users used NLM Gateway more effectively but used MedTextus more efficiently, and they preferred its query formation tools.
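
    A bare-bones sketch of the kind of ontology-based query expansion described above; the synonym table here is a hand-written stand-in for UMLS Metathesaurus lookups, not MedTextus code, and the phrases are hypothetical.

    # Stand-in synonym table; a real system would query the UMLS Metathesaurus.
    SYNONYMS = {
        "heart attack": ["myocardial infarction", "MI"],
        "high blood pressure": ["hypertension"],
    }

    def expand_query(query):
        """Add ontology synonyms for any known phrase in the query,
        OR-ing the variants together."""
        expanded = query.lower()
        for phrase, alts in SYNONYMS.items():
            if phrase in expanded:
                variants = " OR ".join(f'"{t}"' for t in [phrase] + alts)
                expanded = expanded.replace(phrase, f"({variants})")
        return expanded

    print(expand_query("heart attack treatment"))
    # ("heart attack" OR "myocardial infarction" OR "MI") treatment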

    PadChest: A large chest x-ray image dataset with multi-label annotated reports

    We present a large-scale, high-resolution, labeled chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demographics. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses, and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. Of these reports, 27% were manually annotated by trained physicians, and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The labels generated were then validated on an independent test set, achieving a 0.93 Micro-F1 score. To the best of our knowledge, this is one of the largest public chest x-ray databases suitable for training supervised models on radiographs, and the first to contain radiographic reports in Spanish. The PadChest dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/padchest/
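
    The Micro-F1 score cited above pools true/false positives and false negatives across all labels before computing a single F1; a minimal sketch, with made-up label sets rather than PadChest annotations:

    def micro_f1(gold, predicted):
        """Micro-averaged F1 over multi-label examples: pool TP/FP/FN
        across all labels, then compute one F1 from the pooled counts."""
        tp = fp = fn = 0
        for g, p in zip(gold, predicted):
            g, p = set(g), set(p)
            tp += len(g & p)
            fp += len(p - g)
            fn += len(g - p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        return 2 * precision * recall / denom if denom else 0.0

    # Toy report labels, not PadChest data.
    gold = [{"cardiomegaly"}, {"pneumonia", "infiltrate"}]
    pred = [{"cardiomegaly"}, {"pneumonia"}]
    print(round(micro_f1(gold, pred), 2))  # 0.8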

    Information Systems and Health Care IX: Accessing Tacit Knowledge and Linking It to the Peer-Reviewed Literature

    Clinical decision-making can be improved if healthcare practitioners are able to leverage both the tacit and explicit modalities of healthcare knowledge, yet at present there are no knowledge management systems that support any active and direct mapping between these two knowledge modalities. In this paper, we present a healthcare knowledge-mapping framework that maps (a) the tacit knowledge captured in email-based discussions between pediatric pain practitioners on a Pediatric Pain Mailing List (PPML) to (b) the explicit knowledge represented in the peer-reviewed healthcare literature available through PubMed. We report our knowledge-mapping strategy, which involves methods to establish discussion threads, organize the discussion threads in a topic-specific taxonomy, formulate an optimal search query based on the content of a discussion thread, submit the search query to PubMed, and finally retrieve and present the search results to the user.
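
    To make the "formulate a query and submit it to PubMed" step concrete, here is a rough sketch using NCBI's public E-utilities esearch endpoint; the keyword-selection heuristic and the sample thread are placeholders for the paper's actual query-formulation method.

    from collections import Counter
    from urllib.parse import urlencode
    from urllib.request import urlopen

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def thread_to_query(messages, k=4):
        """Pick the k most frequent content words in a discussion thread,
        a crude stand-in for the paper's query-formulation step."""
        stop = {"the", "and", "for", "with", "that", "this", "was", "are"}
        words = Counter(w for m in messages for w in m.lower().split()
                        if len(w) > 3 and w not in stop)
        return " AND ".join(w for w, _ in words.most_common(k))

    def search_pubmed(query, retmax=5):
        url = ESEARCH + "?" + urlencode(
            {"db": "pubmed", "term": query, "retmax": retmax})
        return urlopen(url).read().decode()  # XML listing matching PMIDs

    thread = ["neuropathic pain dosing in children",
              "gabapentin dosing questions for pediatric neuropathic pain"]
    print(search_pubmed(thread_to_query(thread)))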

    An automatic method to generate domain-specific investigator networks using PubMed abstracts

    Background: Collaboration among investigators has become critical to scientific research. This includes ad hoc collaboration established through personal contacts as well as formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts.

    Results: We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records. We illustrated the results by using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and from 97.0% of HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of HuGE PubMed records (accuracy 87.0%). We demonstrated the application of our approach to the dynamic creation of investigator networks by creating a prototype information system containing a large database of PubMed abstracts relevant to human genome epidemiology (HuGE Pub Lit), indexed using PubMed medical subject headings converted to Unified Medical Language System concepts. Our method was able to identify 70–90% of the investigators/collaborators in three different human genetics fields; it also successfully identified 9 of 10 genetics investigators within the PREBIC network, an existing preterm birth research network.

    Conclusion: We successfully created a web-based prototype capable of creating domain-specific investigator networks, based on an application that accurately generates detailed investigator profiles from PubMed abstracts combined with robust standard vocabularies. This approach could be used in other biomedical fields to efficiently establish domain-specific investigator networks.
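
    A simplified sketch of the affiliation-parsing idea: PubMed affiliation strings are typically comma-separated, with the institution near the front and the country near the end, so a split-based heuristic can recover both fields. The rules and country list below are illustrative and far cruder than the published parser.

    import re

    COUNTRIES = {"USA", "United States", "Spain", "Canada", "China", "Germany"}

    def parse_affiliation(affiliation):
        """Heuristic parse of a PubMed affiliation string into
        (institution, country); returns None for unmatched fields."""
        parts = [p.strip() for p in affiliation.split(",")]
        institution = parts[0] if parts else None
        country = None
        # The country usually sits in the last segment, sometimes before an email.
        tail = re.sub(r"\S+@\S+", "", parts[-1]).strip(" .") if parts else ""
        for name in COUNTRIES:
            if name.lower() in tail.lower():
                country = name
                break
        return institution, country

    aff = ("Department of Epidemiology, University of Utah, "
           "Salt Lake City, UT 84112, USA. jane@example.edu")
    print(parse_affiliation(aff))  # ('Department of Epidemiology', 'USA')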