71,598 research outputs found

    Large-scale event extraction from literature with multi-level gene normalization

    Get PDF
    Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons -Attribution - Share Alike (CC BY-SA) license

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Integrating sensor streams in pHealth networks

    Get PDF
    Personal Health (pHealth) sensor networks are generally used to monitor the wellbeing of both athletes and the general public to inform health specialists of future and often serious ailments. The problem facing these domain experts is the scale and quality of data they must search in order to extract meaningful results. By using peer-to-peer sensor architectures and a mechanism for reducing the search space, we can, to some extent, address the scalability issue. However, synchronisation and normalisation of distributed sensor streams remains a problem in many networks. In the case of pHealth sensor networks, it is crucial for experts to align multiple sensor readings before query or data mining activities can take place. This paper presents a system for clustering and synchronising sensor streams in preparation for user queries

    CASP-DM: Context Aware Standard Process for Data Mining

    Get PDF
    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISPDM) which addresses specific challenges of machine learning and data mining for context and model reuse handling. This new general context-aware process model is mapped with CRISP-DM reference model proposing some new or enhanced outputs

    A core ontology for business process analysis

    Get PDF
    Business Process Management (BPM) aims at supporting the whole life-cycle necessary to deploy and maintain business processes in organisations. An important step of the BPM life-cycle is the analysis of the processes deployed in companies. However, the degree of automation currently achieved cannot support the level of adaptation required by businesses. Initial steps have been performed towards including some sort of automated reasoning within Business Process Analysis (BPA) but this is typically limited to using taxonomies. We present a core ontology aimed at enhancing the state of the art in BPA. The ontology builds upon a Time Ontology and is structured around the process, resource, and object perspectives as typically adopted when analysing business processes. The ontology has been extended and validated by means of an Events Ontology and an Events Analysis Ontology aimed at capturing the audit trails generated by Process-Aware Information Systems and deriving additional knowledge

    Pemilihan kerjaya di kalangan pelajar aliran perdagangan sekolah menengah teknik : satu kajian kes

    Get PDF
    This research is a survey to determine the career chosen of form four student in commerce streams. The important aspect of the career chosen has been divided into three, first is information about career, type of career and factor that most influence students in choosing a career. The study was conducted at Sekolah Menengah Teknik Kajang, Selangor Darul Ehsan. Thirty six form four students was chosen by using non-random sampling purpose method as respondent. All information was gather by using questionnaire. Data collected has been analyzed in form of frequency, percentage and mean. Results are performed in table and graph. The finding show that information about career have been improved in students career chosen and mass media is the main factor influencing students in choosing their career
    corecore