227 research outputs found

    Best of Both Worlds – Relational Databases and Statistics

    Statistics software packages and relational database systems overlap considerably in the area of data loading, handling, and transformation. However, only databases are optimized primarily for high performance in this area. In this paper, we present our approach to bringing the best of these two worlds together. We integrate the analytics-optimized database MonetDB and the R environment for statistical computing in a non-obtrusive, transparent and compatible way.

    Column Stores as an IR Prototyping Tool

    We suggest that instead of implementing custom index structures and query evaluation algorithms, IR researchers should simply store document representations in a column-oriented relational database and write ranking models using SQL. For rapid prototyping, this is particularly advantageous since researchers can explore new ranking functions and features by simply issuing SQL queries, without needing to write imperative code. We demonstrate the feasibility of this approach with an implementation of conjunctive BM25 using MonetDB on a part of the ClueWeb12 collection.

    Verification and Validation of Semantic Annotations

    In this paper, we propose a framework to perform verification and validation of semantically annotated data. The annotations, extracted from websites, are verified against the schema.org vocabulary and Domain Specifications to ensure the syntactic correctness and completeness of the annotations. The Domain Specifications allow checking the compliance of annotations against corresponding domain-specific constraints. The validation mechanism will detect errors and inconsistencies between the content of the analyzed schema.org annotations and the content of the web pages where the annotations were found. Comment: Accepted for the A.P. Ershov Informatics Conference 2019 (the PSI Conference Series, 12th edition) proceedings

    Weaving the Web(VTT) of Data

    Video has become a first-class citizen on the Web, with broad support in all common Web browsers. Whereas structured mark-up on webpages has made the vision of the Web of Data a reality, in this paper we propose a new vision that we name the Web(VTT) of Data, along with concrete steps to realize it. It is based on the evolving standards WebVTT, for adding timed text tracks to videos, and JSON-LD, a JSON-based format to serialize Linked Data. Just like the Web of Data, which is based on the relationships among structured data, the Web(VTT) of Data is based on relationships among videos based on WebVTT files, which we use as Web-native spatiotemporal Linked Data containers with JSON-LD payloads. In a first step, we provide necessary background information on the technologies we use. In a second step, we perform a large-scale analysis of the 148-terabyte Common Crawl corpus in order to get a better understanding of the status quo of Web video deployment and address the challenge of integrating the detected videos in the Common Crawl corpus into the Web(VTT) of Data. In a third step, we open-source an online video annotation creation and consumption tool, targeted at videos not contained in the Common Crawl corpus and at integrating future video creations, allowing for weaving the Web(VTT) of Data tighter, video by video.

    BL Lac Contribution to the Extragalactic Gamma-Ray Background

    Very high energy gamma-rays from blazars traversing cosmological distances through the metagalactic radiation field can convert into electron-positron pairs in photon-photon collisions. The converted gamma-rays initiate electromagnetic cascades driven by inverse-Compton scattering off the microwave background photons. Using a model for the time-dependent metagalactic radiation field consistent with all currently available far-infrared-to-optical data, we calculate the cascade contribution from faint, unresolved high- and low-peaked blazars to the extragalactic gamma-ray background as measured by EGRET. For low-peaked blazars, we adopt a spectral index consistent with the mean spectral index of EGRET-detected blazars, and the luminosity function determined by Chiang and Mukherjee (1998). For high-peaked blazars, we adopt template spectra matching prototype sources observed with air-Cherenkov telescopes up to 30 TeV, and a luminosity function based on X-ray measurements. The low number of about 20 for nearby high-peaked blazars with a flux exceeding 10^-11 cm^-2 s^-1 above 300 GeV inferred from the luminosity function is consistent with the results from air-Cherenkov telescope observations. Including the cascade emission from higher redshifts, the total high-peaked blazar contribution to the observed gamma-ray background at GeV energies can account for up to about 30%. Comment: 8 pages, 7 figures, accepted by A&A, final version

    Two New Loci for Body-Weight Regulation Identified in a Joint Analysis of Genome-Wide Association Studies for Early-Onset Extreme Obesity in French and German Study Groups

    Meta-analyses of population-based genome-wide association studies (GWAS) in adults have recently led to the detection of new genetic loci for obesity. Here we aimed to discover additional obesity loci in extremely obese children and adolescents. We also investigated whether these results generalize by estimating the effects of these obesity loci in adults and in population-based samples including both children and adults. We jointly analysed two GWAS of 2,258 individuals and followed up the best 44 single nucleotide polymorphisms (SNPs), ranked by lowest p-values, from 21 genomic regions in 3,141 individuals. After this DISCOVERY step, we explored whether the findings derived from the extremely obese children and adolescents (10 SNPs from 5 genomic regions) generalized (i) to the population level and (ii) to adults by genotyping another 31,182 individuals (GENERALIZATION step). Apart from previously identified FTO, MC4R, and TMEM18, we detected two new loci for obesity: one in SDCCAG8 (serologically defined colon cancer antigen 8 gene; p = 1.85610 x 10^-8 in the DISCOVERY step) and one between TNKS (tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase gene) and MSRA (methionine sulfoxide reductase A gene; p = 4.84 x 10^-7), the latter finding being limited to children and adolescents as demonstrated in the GENERALIZATION step. The odds ratios for early-onset obesity were estimated at approximately 1.10 per risk allele for both loci. Interestingly, the TNKS/MSRA locus has recently been found to be associated with adult waist circumference. In summary, we have completed a meta-analysis of two GWAS which both focus on extremely obese children and adolescents and replicated our findings in a large follow-up data set. We observed that genetic variants in or near FTO, MC4R, TMEM18, SDCCAG8, and TNKS/MSRA were robustly associated with early-onset obesity. We conclude that the currently known major common variants related to obesity overlap to a substantial degree between children and adults.

    Decentralizing the Social Web

    For over a decade, standards bodies like the IETF and W3C have attempted to prevent the centralization of the Web via the use of open standards for 'permission-less innovation.' Yet today, these standards, from OAuth to RSS, seem to have failed to prevent the massive centralization of the Web at the hands of a few major corporations like Google and Facebook. We'll delve deep into the lessons of failed attempts to replace DNS like XRIs, identity systems like OpenID, and metadata formats like the Semantic Web, all of which were recuperated by centralized platforms like Facebook as Facebook Connect and the "Like" Button. Learning from the past, a new generation of blockchain standards and governance mechanisms may be our last, best chance to save the Web.