
    The knowledge graph lifecycle in NTT DATA

    The Semantic Business Unit (SEMBU) in NTT DATA aims to increase the semantic interoperability and accessibility of European institutions' data projects by following Linked Open Data (LOD) principles to build controlled vocabularies and produce Knowledge Graphs (KGs). One of its most notable projects revolves around the CORDIS portal, which publishes information about research and innovation projects funded by the European Commission. SEMBU pursues two main goals: (i) expose semantic data related to CORDIS via a SPARQL endpoint that facilitates access to and reuse of high-quality scientific data, and (ii) design an efficient, incremental, and automated KG lifecycle to be used as a reference in other data projects. To that end, we have adopted state-of-the-art semantic technologies to support the creation and management of the KG, with the goal of centralizing knowledge and providing an overall view of data assets that improves data governance, maintenance, and external interaction by data consumers. We have also identified some of these technologies' limitations, which are tackled via an industrial PhD. This paper reports our experience, the obstacles encountered, and our proposals for generating and maintaining the CORDIS KG.

    This work was partly funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00 (DOGO4ML). Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and a CONACYT scholarship. Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union - NextGenerationEU, under project FJC2020-045809-I/AEI/10.13039/501100011033.
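    To make the first goal concrete, below is a minimal sketch of how such a SPARQL endpoint could be queried from Python with the SPARQLWrapper library. The endpoint URL, the ontology class, and the property choices are illustrative assumptions, not the actual CORDIS/SEMBU schema.

```python
# Minimal sketch: querying a SPARQL endpoint for project data.
# The endpoint URL and vocabulary terms are illustrative assumptions,
# not the actual CORDIS/SEMBU schema.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/cordis/sparql")  # hypothetical endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?project ?title WHERE {
      ?project a <https://example.org/ontology#Project> ;  # hypothetical class
               dct:title ?title .
    }
    LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["project"]["value"], "|", row["title"]["value"])
```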

    Incremental schema integration for data wrangling via knowledge graphs

    Virtual data integration is the current approach to data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema, and the mappings between them. Based on these, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged, semi-automatic, and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.

    This work was partly supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00, and the D3M project, funded by the Spanish Agencia Estatal de Investigación (AEI) under project PDC2021-121195-I00. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union – NextGenerationEU, under project FJC2020-045809-I.
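    To illustrate the four steps, here is a deliberately toy end-to-end sketch in plain Python. NextiaDI operates over knowledge graphs; in this sketch schemata are plain dicts and matching is simple name similarity, and every function name is a hypothetical placeholder rather than NextiaDI's API.

```python
# Toy illustration of the four-step pipeline: bootstrapping, schema matching,
# schema integration, and generation of system-specific constructs.
# All names are hypothetical; this is not NextiaDI's actual API.
from difflib import SequenceMatcher

def bootstrap(records):
    """Bootstrapping: derive a source schema (attribute -> type) from raw rows."""
    return {k: type(v).__name__ for row in records for k, v in row.items()}

def match(global_schema, source_schema, threshold=0.8):
    """Schema matching: align source attributes to global ones by name similarity."""
    alignments = {}
    for s in source_schema:
        best = max(global_schema,
                   key=lambda g: SequenceMatcher(None, s, g).ratio(), default=None)
        if best and SequenceMatcher(None, s, best).ratio() >= threshold:
            alignments[s] = best
    return alignments

def integrate(global_schema, source_schema, alignments):
    """Schema integration: fold unmatched source attributes into the global schema."""
    for attr, typ in source_schema.items():
        if attr not in alignments:
            global_schema[attr] = typ
    return global_schema

def generate_mappings(source_name, alignments):
    """Construct generation: emit source-to-global mappings for query rewriting."""
    return [(source_name, src, dst) for src, dst in alignments.items()]

global_schema = bootstrap([{"title": "KG lifecycle", "year": 2022}])
src = bootstrap([{"name": "NextiaDI", "year": 2022, "venue": "demo"}])
align = match(global_schema, src)
global_schema = integrate(global_schema, src, align)
print(generate_mappings("demo_source", align))  # [('demo_source', 'year', 'year')]
```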

    A Comparative Study of Physical and Chemical Processes for Removal of Biomass in Biofilters

    After 6 months of operation, a long-term biofilter was stopped for two weeks and then started up again for a second experimental period of almost 1.3 years, with high toluene loads, and subjected to several physical and chemical treatments in order to remove excess biomass that could affect the reactor's performance due to clogging, whose main effect is a high pressure drop. Elimination capacity and removal efficiency were determined after each treatment. The methods applied were: filling with water and draining, backwashing, and air sparging. Different flows and temperatures (20, 30, 45 and 60 °C) were applied, either with distilled water or with different chemicals in aqueous solution. Treatments with chemicals caused a decrease in biofilter performance, requiring periods of 1 to 2 weeks to recover previous values. The results indicate that air sparging, with pure distilled water as well as with solutions of NaOH (0.01% w/v) and NaOCl (0.01% w/v), was the treatment that removed the most biomass, working at 20, 30 or 45 °C and at relatively low flow rates (below 320 L h−1), but with high biodegradation inhibition after the treatments. Dry biomass (g VS) content was determined at three different heights of the biofilter in order to carry out each experiment under the same conditions: treatments were applied at the same amount of dry biomass, so the biofilm conditions could be considered identical. Wet biomass was used as a control of the biofilter's water content during treatments. Several batch assays were performed to support and quantify the observed inhibitory effects of the different chemicals and temperatures applied.
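    For reference, elimination capacity (EC) and removal efficiency (RE) are conventionally defined as EC = Q·(Cin − Cout)/V and RE = 100·(Cin − Cout)/Cin. A minimal sketch of that arithmetic follows, with illustrative values rather than the paper's data.

```python
# Conventional biofiltration performance metrics (illustrative numbers,
# not the paper's data):
#   elimination capacity EC = Q * (C_in - C_out) / V    [g m^-3 h^-1]
#   removal efficiency   RE = 100 * (C_in - C_out) / C_in   [%]

def elimination_capacity(q_m3_h, c_in_g_m3, c_out_g_m3, volume_m3):
    """Mass of pollutant degraded per unit bed volume and time."""
    return q_m3_h * (c_in_g_m3 - c_out_g_m3) / volume_m3

def removal_efficiency(c_in_g_m3, c_out_g_m3):
    """Fraction of the inlet pollutant removed, in percent."""
    return 100.0 * (c_in_g_m3 - c_out_g_m3) / c_in_g_m3

# Made-up values for a toluene biofilter:
print(elimination_capacity(q_m3_h=0.3, c_in_g_m3=2.0, c_out_g_m3=0.4,
                           volume_m3=0.01))          # 48.0 g m^-3 h^-1
print(removal_efficiency(c_in_g_m3=2.0, c_out_g_m3=0.4))  # 80.0 %
```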

    Quid: media observatory

    The report is divided into four sections: "Right to information and transparency", "Mexican television", "News companies and journalistic practices", and "Those who left us". The first presents a text that helps to understand the current state of legislative proposals to regulate media and telecommunications in Mexico, along with an evaluation of the first five years of the Instituto de Transparencia e Información Pública de Jalisco. The second section is eclectic, comprising articles on different aspects of television: the structure and offering of television in Mexico (particularly in the city of Guadalajara), cable television (with emphasis on the case of Megacable), an account of how Canal 44 came into being and its prospects for 2011, and the football World Cups. The third part documents some of the most significant situations facing local journalism: these pieces describe systems in crisis (the high vulnerability of Mexican journalists in a climate of violence that, far from diminishing, is on the rise) and the participation, by action or omission, of the Mexican State in the systematic violation of the rights of those who devote their lives to journalistic work. The following articles address the transformations of news companies, particularly in the print press sector: the rapid and inexorable disappearance of cultural supplements, and an overview of how some international sections of Guadalajara's newspapers are produced. The report closes with profiles of José Galindo, Raúl Mora Lomelí, S.J., Tomás Eloy Martínez, and Juan Pablo Rosell. ITESO, A.C.

    Bioinformatics analysis of mutations in SARS-CoV-2 and clinical phenotypes

    Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), initially reported in Wuhan (China), has spread worldwide. Like other viruses, SARS-CoV-2 accumulates mutations with each cycle of replication, continuously evolving viral strains with one or more single nucleotide variants (SNVs). However, the SNVs that cause severe COVID-19 or lead to immune escape or vaccine failure are not well understood. We aim to identify SNVs associated with severe clinical phenotypes.

    Methods: In this study, 27429 whole-genome aligned consensus sequences of SARS-CoV-2 were collected from the genomic epidemiology of SARS-CoV-2 project in Spain (SeqCOVID) [1]. These samples were obtained from patients who required hospitalization and/or intensive care unit (ICU) admission, excluding those registered in the first pandemic wave. In addition, 248 SARS-CoV-2 genomes were isolated from hospitalized COVID-19 patients at Gregorio Marañón General University Hospital (GMH), of whom 142 were fully vaccinated. Bioinformatics tools were developed and implemented in the R and Python programming languages, comparing these genomes to SARS-CoV-2 Wuhan-Hu-1 (reference genome).

    Results: Using a selection threshold of 10% mutational frequency, 27 SNVs were expected to be associated with hospitalization and ICU risk. The reference haplotype differing at the SNV coding for lysine at residue 203 (N:R203K) was found to have a negative association with COVID-19 hospitalization risk (p = 5.37 × 10⁻⁴). Similarly, a negative association was observed when the residue at position 501 is replaced by tyrosine (S:N501Y) (p = 1.33 × 10⁻²). The application of a Chi-square test suggested that SNV-haplotypes coding for mutant residues such as (S:A222V, N:A220V, ORF10:V30L) and (ORF1a:T1001I, ORF1a:I2230T, S:N501Y, S:T716S, S:S982A, ORF8:Q27*, N:R203K, N:S235F) have negative associations with COVID-19 hospitalization risk (p = 6.58 × 10⁻⁷ and p = 2.27 × 10⁻¹⁶, respectively) and COVID-19 ICU risk (p = 1.15 × 10⁻² and p = 2.51 × 10⁻², respectively). In contrast, the SNV-haplotype coding for the mutations (S:A222V, N:A220V, N:D377Y, ORF10:V30L) was observed to increase the risk of COVID-19 hospitalization (p = 2.71 × 10⁻⁴). Analysis of the SARS-CoV-2 genomes from GMH yielded 63 coding SNVs that met the established threshold value. Applying a Chi-square test, the SNV-haplotype carrying coding variants for mutant residues in 5 ORF proteins, the surface and membrane glycoproteins, and the nucleocapsid phosphoprotein was significantly associated with vaccine failure in hospitalized COVID-19 patients (p = 7.91 × 10⁻⁴).

    Conclusions: SNV-haplotypes carrying variants that lead to non-synonymous mutations located along the SARS-CoV-2 whole proteome may influence COVID-19 severity and vaccine failure, suggesting a functional role in the clinical outcome of COVID-19 patients.

    This research work was funded by the European Commission-NextGenerationEU (Regulation EU 2020/2094), through CSIC's Global Health Platform (PTI Salud Global).
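    As a sketch of the kind of association test applied above, here is a Chi-square test on a 2x2 contingency table (haplotype carrier vs. hospitalization) using scipy. The counts are made up for illustration; they are not the study's data.

```python
# Chi-square association test on a made-up 2x2 contingency table
# (illustrative counts only, not the study's data).
from scipy.stats import chi2_contingency

#                hospitalized   not hospitalized
table = [[120,  880],   # carries the SNV-haplotype
         [310, 1690]]   # reference haplotype

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2e}")
```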

    Clonal chromosomal mosaicism and loss of chromosome Y in elderly men increase vulnerability for SARS-CoV-2

    The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, COVID-19) had an estimated overall case fatality ratio of 1.38% (pre-vaccination), being 53% higher in males and increasing exponentially with age. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, we found 133 cases (1.42%) with detectable clonal mosaicism for chromosome alterations (mCA) and 226 males (5.08%) with acquired loss of chromosome Y (LOY). Individuals with clonal mosaic events (mCA and/or LOY) showed a 54% increase in the risk of COVID-19 lethality. LOY is associated with transcriptomic biomarkers of immune dysfunction, pro-coagulation activity, and cardiovascular risk. Interferon-induced genes involved in the initial immune response to SARS-CoV-2 are also down-regulated in LOY. Thus, mCA and LOY underlie at least part of the sex-biased severity and mortality of COVID-19 in aging patients. Given its potential therapeutic and prognostic relevance, evaluation of clonal mosaicism should be implemented as a biomarker of COVID-19 severity in elderly people.

    Impact of COVID-19 on cardiovascular testing in the United States versus the rest of the world

    Objectives: This study sought to quantify and compare the decline in volumes of cardiovascular procedures between U.S. and non-U.S. institutions during the early phase of the coronavirus disease-2019 (COVID-19) pandemic. Background: The COVID-19 pandemic has disrupted the care of many non-COVID-19 illnesses. Reductions in diagnostic cardiovascular testing around the world have led to concerns over the implications of reduced testing for cardiovascular disease (CVD) morbidity and mortality. Methods: Data were submitted to INCAPS-COVID (International Atomic Energy Agency Non-Invasive Cardiology Protocols Study of COVID-19), a multinational registry comprising 909 institutions in 108 countries (including 155 facilities in 40 U.S. states), assessing the impact of the COVID-19 pandemic on volumes of diagnostic cardiovascular procedures. Data were obtained for April 2020 and compared with baseline procedure volumes from March 2019. We compared laboratory characteristics, practices, and procedure volumes between U.S. and non-U.S. facilities and between U.S. geographic regions, and identified factors associated with volume reduction in the United States. Results: Reductions in the volumes of procedures in the United States were similar to those in non-U.S. facilities (68% vs. 63%, respectively; p = 0.237), although U.S. facilities reported greater reductions in invasive coronary angiography (69% vs. 53%, respectively; p < 0.001). Significantly more U.S. facilities than non-U.S. facilities reported increased use of telehealth and patient screening measures, such as temperature checks, symptom screenings, and COVID-19 testing. Reductions in procedure volumes differed between U.S. regions, with larger declines observed in the Northeast (76%) and Midwest (74%) than in the South (62%) and West (44%). In a multivariable analysis, prevalence of COVID-19, staff redeployments, outpatient centers, and urban centers were associated with greater volume reductions in U.S. facilities. Conclusions: We observed marked reductions in U.S. cardiovascular testing in the early phase of the pandemic and significant variability between U.S. regions. The association between volume reductions and COVID-19 prevalence in the United States highlights the need for proactive efforts to maintain access to cardiovascular testing in areas most affected by outbreaks of COVID-19 infection.
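    The figures above are percent reductions of April 2020 procedure volumes against the March 2019 baseline; a one-function sketch of that arithmetic, with illustrative volumes rather than the registry's data:

```python
# Percent reduction from a baseline volume (illustrative numbers,
# not the registry's data).
def percent_reduction(baseline, current):
    """Relative decline from the baseline volume, in percent."""
    return 100.0 * (baseline - current) / baseline

print(percent_reduction(baseline=1000, current=320))  # 68.0, a U.S.-like decline
```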

    Towards scalable data discovery

    We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles: succinct representations that capture the underlying characteristics of the schemata and data values of datasets, and which can be efficiently extracted in a distributed and parallel fashion. Profiles are then compared to predict the quality of a join operation between a pair of attributes from different datasets. In contrast to the state-of-the-art, we define a novel notion of join quality that relies on a metric considering both the containment and the cardinality proportion between join candidate attributes. We implement our approach in a system called NextiaJD, and present experiments to show the predictive performance and computational efficiency of our method. Our experiments show that NextiaJD obtains predictive performance similar to that of hash-based methods, yet is able to scale up to larger volumes of data. NextiaJD also generates considerably fewer false positives, which is a desirable feature at scale.

    This work is partly supported by Barcelona's City Council under grant agreement 20S08704. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico).
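    A minimal sketch of the two ingredients of that join-quality notion follows, assuming the common definitions containment C(A,B) = |A ∩ B| / |A| and cardinality proportion K(A,B) = min(|A|,|B|) / max(|A|,|B|); NextiaJD's exact metric and learned predictor may differ.

```python
# Two profile-level ingredients of a join-quality metric, under the assumed
# definitions above (not necessarily NextiaJD's exact formulation).

def containment(a, b):
    """Fraction of A's distinct values that also appear in B."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0

def cardinality_proportion(a, b):
    """How balanced the two attributes' distinct-value counts are."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))

left = ["ES", "FR", "DE", "IT"]
right = ["ES", "FR", "DE", "IT", "PT", "NL", "BE", "AT"]
print(containment(left, right))             # 1.0: every left value finds a match
print(cardinality_proportion(left, right))  # 0.5: right has twice as many values
```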