49 research outputs found

    Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis

    In recent years, numerous efforts have been made to share Knowledge Bases (KBs) in the Linked Open Data (LOD) cloud. These KBs are being used for various tasks, including performing data analytics or building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause data to suffer from a variety of quality issues, both at a semantic level and at a pragmatic level. This situation negatively affects data stakeholders (consumers, curators, etc.). Data quality is commonly related to the perception of fitness for use for a certain application or use case. Therefore, ensuring the quality of the data of an evolving knowledge base is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, manual data curation is impractical, and at the same time continuous automatic assessment of data quality is very challenging. Ensuring the quality of a KB is a non-trivial task since KBs are based on a combination of structured information supported by models, ontologies, and vocabularies, as well as queryable endpoints, links, and mappings. Thus, in this thesis, we explored two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained “change” detection at a low level or using the “dynamics” of a dataset at a high level. In this thesis, we present a novel knowledge base quality assessment approach using evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. The first step in building the quality assessment approach was to identify the quality characteristics. Using high-level change detection as measurement functions, we present four quality characteristics: Persistency, Historical Persistency, Consistency and Completeness. Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight releases of 3cixty Nice. However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold: first, manual validation from qualitative analysis is used to identify causes of quality issues; then, RDF data profiling information is used to generate integrity constraints. The validation approach relies on the idea of inducing RDF shapes by exploiting SHACL constraint components. In particular, this approach learns which integrity constraints can be applied to a large KB through a process of statistical analysis followed by a learning model. We illustrate the performance of our validation approach by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. The techniques of quality assessment and validation developed during this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations that make the solution both flexible and scalable.
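
The idea of comparing profiling statistics across consecutive releases can be illustrated with a minimal sketch. The function name, the example counts, and the 0/1 rule (an entity type is flagged when its count drops from one release to the next) are all assumptions made for illustration; the abstract does not give the thesis's actual measure definitions.

```python
from typing import Dict, List


def persistency_flags(counts_per_release: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """For each entity type, emit 1 when its count did not drop
    relative to the previous release, and 0 when it did.

    Assumed rule for illustration only, not the thesis's definition.
    """
    flags: Dict[str, List[int]] = {}
    # Walk over pairs of consecutive releases (previous, current).
    for prev, curr in zip(counts_per_release, counts_per_release[1:]):
        for entity_type in sorted(prev.keys() | curr.keys()):
            kept = curr.get(entity_type, 0) >= prev.get(entity_type, 0)
            flags.setdefault(entity_type, []).append(1 if kept else 0)
    return flags


# Hypothetical entity counts per type for three consecutive releases.
releases = [
    {"Place": 100, "Event": 40},
    {"Place": 120, "Event": 35},
    {"Place": 120, "Event": 50},
]
flags = persistency_flags(releases)
print(flags)
```

A 0 in the output only marks a release pair worth inspecting; whether the drop is a real quality issue would still need the kind of manual validation the abstract describes.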

    A systematic literature review of open data quality in practice

    Context: The main objective of open data initiatives is to make information freely available through easily accessible mechanisms and facilitate exploitation. In practice, openness should be accompanied by a certain level of trustworthiness or guarantees about the quality of data. Traditional data quality is a thoroughly researched field with several benchmarks and frameworks to grasp its dimensions. However, quality assessment in open data is a complicated process as it involves stakeholders, evaluation of datasets, as well as the publishing platform. Objective: In this work, we aim to identify and synthesize various features of open data quality approaches in practice. We applied thematic synthesis to identify the most relevant research problems and quality assessment methodologies. Method: We undertook a systematic literature review to summarize the state of the art on open data quality. The review process starts by developing the review protocol in which all steps, research questions, inclusion and exclusion criteria and analysis procedures are included. The search strategy retrieved 9323 publications from four scientific digital libraries. The selected papers were published between 2005 and 2015. Finally, through a discussion between the authors, 63 papers were included in the final set of selected papers. Results: Open data quality, in general, is a broad concept, and it could apply to multiple areas. There are many quality issues concerning open data hindering their actual usage for real-world applications. The main ones are unstructured metadata, heterogeneity of data formats, lack of accuracy, incompleteness and lack of validation techniques. Furthermore, we collected the existing quality methodologies from the selected papers and synthesized them under a unifying classification schema. Also, a list of quality dimensions and metrics from the selected papers is reported. Conclusion: In this research, we provided an overview of the methods related to open data quality, using the instrument of systematic literature reviews. Open data quality methodologies vary depending on the application domain. Moreover, the majority of studies focus on satisfying specific quality criteria. With metrics based on generalized data attributes, a platform can be created to evaluate all possible open datasets. Also, the lack of methodology validation remains a major problem. Studies should focus on validation techniques.

    A case of spotted fever group rickettsiosis imported into the United Kingdom and treated with ciprofloxacin: a case report

    Introduction: Spotted fever group rickettsioses are an interesting group of infections, which are increasing in incidence worldwide. Case presentation: Here we describe a case imported into the United Kingdom, occurring in a patient who had recently visited Kruger National Park in South Africa, a highly endemic area for Rickettsia infections. Initial treatment with doxycycline failed but the patient made a prompt recovery after commencement of ciprofloxacin. Conclusion: This finding raises the possibility that there are resistant strains of Rickettsia present.

    Mood Classification of Bangla Songs Based on Lyrics

    Music can evoke various emotions, and with the advancement of technology, it has become more accessible to people. Bangla music, which portrays different human emotions, lacks sufficient research. The authors of this article aim to analyze Bangla songs and classify their moods based on the lyrics. To achieve this, this research has compiled a dataset of 4000 Bangla song lyrics and genres, and used Natural Language Processing and the BERT algorithm to analyze the data. Among the 4000 songs, 1513 represent the sad mood, 1362 the romantic mood, 886 happiness, and the remaining 239 are classified as relaxation. By embedding the lyrics of the songs, the authors have classified the songs into four moods: Happy, Sad, Romantic, and Relaxed. This research is crucial as it enables a multi-class classification of songs' moods, making the music more relatable to people's emotions. The article presents the automated result of the four moods accurately derived from the song lyrics. Comment: Presented at International Conference on Inventive Communication and Computational Technologies 202

    A Quality Assessment Approach for Evolving Knowledge Bases

    Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation. Despite its advantages, there are challenges in using the RDF data model, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our quality characteristics are based on the KB evolution analysis and we used high-level change detection for measurement functions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs. It is highly effective in the case of KBs with periodic updates such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.

    Aspergillus fumigatus allergen expression is coordinately regulated in response to hydrogen peroxide and cyclic AMP

    Background: A. fumigatus has been associated with a wide spectrum of allergic disorders such as ABPA or SAFS. It is poorly understood what allergens in particular are being expressed during fungal invasion and which are responsible for stimulation of immune responses. Study of the dynamics of allergen production by fungi may lead to insights into how allergens are presented to the immune system. Methods: Expression of 17 A. fumigatus allergen genes was examined in response to various culture conditions and stimuli as well as in the presence of macrophages in order to mimic conditions encountered in the lung. Results: Expression of 14/17 allergen genes was strongly induced by oxidative stress caused by hydrogen peroxide (Asp f 1, -2, -4, -5, -6, -7, -8, -10, -13, -17 and -18, all >10-fold and Asp f 11, -12, and -22, 5-10-fold) and 16/17 allergen genes were repressed in the presence of cAMP. The 4 protease allergen genes (Asp f -5, -10, -13 and -18) were expressed at very low levels compared to the comparator (β-tubulin) under all other conditions examined. Mild heat shock, anoxia, lipid and presence of macrophages did not result in coordinated changes in allergen gene expression. Growth on lipid as sole carbon source contributed to the moderate induction of most of the allergen genes. Heat shock (37°C > 42°C) caused moderate repression in 11/17 genes (Asp f 1, -2, -4, -5, -6, -9, -10, -13, -17, -18 and -23) (2- to 9-fold), which was mostly evident for Asp f 1 and -9 (~9-fold). Anaerobic stress led to moderate induction of 13/17 genes (1.1 to 4-fold) with one, Asp f 8, induced over 10-fold when grown under mineral oil. Complex changes were seen in gene expression during co-culture of A. fumigatus with macrophages. Conclusions: Remarkable coordination of allergen gene expression in response to a specific condition (oxidative stress or the presence of cAMP) has been observed, implying that a single biological stimulus may play a role in allergen gene regulation. Interdiction of a putative allergen expression induction signalling pathway might provide a novel therapy for treatment of fungal allergy.

    Completeness and Consistency Analysis for Evolving Knowledge Bases

    Assessing the quality of an evolving knowledge base is a challenging task as it often requires identifying the correct quality assessment procedures. Since data is often derived from autonomous and increasingly large data sources, it is impractical to manually curate the data, and challenging to continuously and automatically assess their quality. In this paper, we explore two main areas of quality assessment related to evolving knowledge bases: (i) identification of completeness issues using knowledge base evolution analysis, and (ii) identification of consistency issues based on integrity constraints, such as minimum and maximum cardinality, and range constraints. For completeness analysis, we use data profiling information from consecutive knowledge base releases to estimate completeness measures that allow predicting quality issues. Then, we perform consistency checks to validate the results of the completeness analysis using integrity constraints and learning models. The approach has been tested both quantitatively and qualitatively by using a subset of datasets from both DBpedia and 3cixty knowledge bases. The performance of the approach is evaluated using precision, recall, and F1 score. From completeness analysis, we observe a 94% precision for the English DBpedia KB and 95% precision for the 3cixty Nice KB. We also assessed the performance of our consistency analysis by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. We observed that the best performing model in our experimental setup is the Random Forest, reaching an F1 score greater than 90% for minimum and maximum cardinality and 84% for range constraints. Comment: Accepted for Journal of Web Semantics
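
For readers unfamiliar with the evaluation metrics named in this abstract, the following is a small sketch of how precision, recall, and F1 score are computed from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 correctly flagged issues, 10 false alarms, 30 missed issues.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly reported alongside the two.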

    A Multi Constrained Transformer-BiLSTM Guided Network for Automated Sleep Stage Classification from Single-Channel EEG

    Sleep stage classification from electroencephalogram (EEG) signals is significant for the rapid evaluation of sleeping patterns and quality. A novel deep learning architecture, "DenseRTSleep-II", is proposed for automatic sleep scoring from single-channel EEG signals. The architecture combines the advantages of a Convolutional Neural Network (CNN), a transformer network, and a Bidirectional Long Short Term Memory (BiLSTM) network for effective sleep scoring. Moreover, with the addition of a weighted multi-loss scheme, this model is trained more implicitly for vigorous decision-making tasks. Thus, the model generates the most efficient result on the SleepEDFx dataset and outperforms different state-of-the-art techniques (IIT-Net, DeepSleepNet) by a large margin in terms of accuracy, precision, and F1-score.

    Invasive pulmonary aspergillosis 10 years post bone marrow transplantation: a case report

    Introduction: Invasive pulmonary aspergillosis is a leading cause of mortality and morbidity in bone marrow transplant recipients. Establishing the diagnosis remains a challenge for clinicians working in acute care settings. However, prompt diagnosis and treatment can lead to favourable outcomes. Case presentation: We report a case of invasive aspergillosis occurring in a 39-year-old Caucasian female 10 years after an allogeneic haematopoietic bone marrow transplant, and 5 years after stopping all immunosuppression. Possible risk factors include bronchiolitis obliterans and exposure to building dust (for example, handling her husband's dusty overalls). There are no similar case reports in the literature at this time. Conclusion: High clinical suspicion, especially in the setting of failure to respond to broad-spectrum antibiotics, should alert clinicians to the possibility of invasive pulmonary aspergillosis, which, in this case, responded to antifungal therapy.