22 research outputs found

    The CAP cancer protocols – a case study of caCORE based data standards implementation to integrate with the Cancer Biomedical Informatics Grid

    BACKGROUND: The Cancer Biomedical Informatics Grid (caBIG™) is a network of individuals and institutions, creating a world wide web of cancer research. An important aspect of this informatics effort is the development of consistent practices for data standards development, using a multi-tier approach that facilitates semantic interoperability of systems. The semantic tiers include (1) information models, (2) common data elements, and (3) controlled terminologies and ontologies. The College of American Pathologists (CAP) cancer protocols and checklists are an important reporting standard in pathology, for which no complete electronic data standard is currently available. METHODS: In this manuscript, we provide a case study of Cancer Common Ontologic Representation Environment (caCORE) data standard implementation of the CAP cancer protocols and checklists model, an existing and complex paper-based standard. We illustrate the basic principles, goals and methodology for developing caBIG™ models. RESULTS: Using this example, we describe the process required to develop the model, the technologies and data standards on which the process and models are based, and the results of the modeling effort. We address difficulties we encountered and modifications to caCORE that will address these problems. In addition, we describe four ongoing development projects that will use the emerging CAP data standards to achieve integration of tissue banking and laboratory information systems. CONCLUSION: The CAP cancer checklists can be used as the basis for an electronic data standard in pathology using the caBIG™ semantic modeling methodology.

    The equivalence of four extensions of context-free grammars

    There is currently considerable interest among computational linguists in grammatical formalisms with highly restricted generative power. This paper concerns the relationship between the classes of string languages generated by several such formalisms, namely, combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars. Each of these formalisms is known to generate a larger class of languages than context-free grammars. The four formalisms under consideration were developed independently and appear superficially to be quite different from one another. The result presented in this paper is that all four of the formalisms under consideration generate exactly the same class of string languages.

    Semi-automatic conversion of BioProp semantic annotation to PASBio annotation

    BACKGROUND: Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Unfortunately, due to the lack of an annotated PASBio corpus, no publicly available machine-learning (ML) based SRL systems based on PASBio have been developed. In previous work, we constructed a biomedical corpus based on the PropBank standard called BioProp, on which we developed an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system to convert BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE in combination with a BioProp-PASBio rule-based converter, and an additional semi-automatic rule generator. RESULTS: Our first experiment evaluated our rule-based converter's performance independently of BIOSMILE's performance. The converter achieved an F-score of 85.29%. The second experiment evaluated the combined system (BIOSMILE + rule-based converter). The system achieved an F-score of 69.08% for PASBio's 29 verbs. CONCLUSION: Our approach allows PAS conversion between BioProp and PASBio annotation using BIOSMILE alongside our newly developed semi-automatic rule generator and rule-based converter. Our system can match the performance of other state-of-the-art domain-specific ML-based SRL systems and can be easily customized for PASBio application development.

    An integrated model for quantifying accessibility-benefits in developing countries

    The interaction between accessibility and rural development is a subject of current concern. The degree of accessibility determines the ability of individuals to participate in development and other social activities. The paper describes the development of a numerical method for quantifying accessibility-benefits suitable for application in developing countries. The methodology provides an integrated approach to analysing accessibility by considering all constraints faced by individuals, particularly their income. The method can be used to evaluate different accessibility-enhancing strategies, and to quantify the benefits derived by different groups of individuals under various states of socio-economic development in rural areas of developing countries. The application of the accessibility-benefits model is demonstrated using two hypothetical case studies featuring the effects of improving intermediate means of transport and improving temporal strategies. The purpose of the case studies is to demonstrate how different accessibility-enhancing strategies can be related to the key model parameters, and to show the likely magnitude of the benefits, in monetary terms, that can be attained by individuals with different income levels.

    Mining Micro-Blogs: Opportunities and Challenges

    Summary. This chapter investigates whether and how micro-messaging technologies such as Twitter can be harnessed to obtain valuable information. The distinctive characteristics of micro-blogging services, such as being user oriented, give different applications the opportunity to use the content of these sites to their advantage. However, the same characteristics become a weakness when it comes to data modeling and analysis of the messages. These sites contain very large amounts of unstructured, noisy data, with false or missing entries, which makes data mining difficult. This chapter first reviews some of the potential applications of micro-messaging services and then provides some insight into the challenges faced by data mining applications. Later in the chapter, the characteristics of a real data set collected from Twitter are analysed. At the end of the chapter, the application of micro-blogging services is shown through three case studies.

    The way we write

    Country-specific variations of the English language in the biomedical literature