Best of Both Worlds – Relational Databases and Statistics
Statistics software packages and relational database systems possess
considerable overlap in the area of data loading, handling, and
transformation. However, only databases are optimized primarily
for high performance in this area. In this paper, we present
our approach to bringing the best of these two worlds together.
We integrate the analytics-optimized database MonetDB and the R
environment for statistical computing in a non-obtrusive, transparent,
and compatible way.
Column Stores as an IR Prototyping Tool
We suggest that instead of implementing custom
index structures and query evaluation algorithms, IR researchers should
simply store document representations in a column-oriented relational
database and write ranking models using SQL. For rapid prototyping, this
is particularly advantageous since researchers can explore new ranking
functions and features by simply issuing SQL queries, without needing to
write imperative code. We demonstrate the feasibility of this approach
by an implementation of conjunctive BM25 using MonetDB on a part of
the ClueWeb12 collection.
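To make the idea of writing a ranking model in SQL concrete, here is a minimal sketch of conjunctive BM25 expressed as a single query. It is not the authors' implementation: it uses Python's built-in `sqlite3` instead of MonetDB, a three-document toy corpus instead of ClueWeb12, and a hypothetical schema (`docs`, `postings`, `query`). The parameter values `K1 = 1.2` and `B = 0.75` are common defaults assumed for illustration, and the IDF variant is one of several in use.

```python
import math
import sqlite3

# Toy corpus (hypothetical); the abstract's experiments used MonetDB and ClueWeb12.
DOCS = {
    1: "column stores speed up analytic queries",
    2: "ranking models written as sql queries",
    3: "sql queries over column stores for ranking",
}

K1, B = 1.2, 0.75  # assumed BM25 parameters (common defaults)

# Conjunctive BM25: only documents containing every query term are scored.
BM25_SQL = """
WITH stats AS (SELECT COUNT(*) AS n, AVG(len) AS avgdl FROM docs),
df AS (SELECT term, COUNT(*) AS df FROM postings GROUP BY term),
matches AS (
    SELECT p.doc_id
    FROM postings p JOIN query q ON p.term = q.term
    GROUP BY p.doc_id
    HAVING COUNT(DISTINCT p.term) = (SELECT COUNT(*) FROM query)
)
SELECT p.doc_id,
       SUM(ln(1.0 + (s.n - df.df + 0.5) / (df.df + 0.5))
           * p.tf * (:k1 + 1.0)
           / (p.tf + :k1 * (1.0 - :b + :b * d.len / s.avgdl))) AS score
FROM postings p
JOIN query q ON p.term = q.term
JOIN matches m ON p.doc_id = m.doc_id
JOIN docs d ON d.doc_id = p.doc_id
JOIN df ON df.term = p.term
CROSS JOIN stats s
GROUP BY p.doc_id
ORDER BY score DESC, p.doc_id
"""


def build_index(conn, docs):
    """Create the hypothetical schema and load term frequencies."""
    # Register a natural-log function so the query does not depend on
    # SQLite having been compiled with its optional math functions.
    conn.create_function("ln", 1, math.log)
    conn.executescript("""
        CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, len INTEGER);
        CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER);
        CREATE TABLE query (term TEXT PRIMARY KEY);
    """)
    for doc_id, text in docs.items():
        tokens = text.split()
        conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, len(tokens)))
        for term in set(tokens):
            conn.execute(
                "INSERT INTO postings VALUES (?, ?, ?)",
                (term, doc_id, tokens.count(term)),
            )


def search(conn, query_text):
    """Return (doc_id, score) rows for documents matching all query terms."""
    conn.execute("DELETE FROM query")
    conn.executemany(
        "INSERT INTO query VALUES (?)",
        [(term,) for term in set(query_text.split())],
    )
    return conn.execute(BM25_SQL, {"k1": K1, "b": B}).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, DOCS)
results = search(conn, "sql queries")
```

Changing the ranking function here means editing only the `SELECT` expression, which is the rapid-prototyping benefit the abstract describes.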
Verification and Validation of Semantic Annotations
In this paper, we propose a framework to perform verification and validation
of semantically annotated data. The annotations, extracted from websites, are
verified against the schema.org vocabulary and Domain Specifications to ensure
the syntactic correctness and completeness of the annotations. The Domain
Specifications allow checking the compliance of annotations against
corresponding domain-specific constraints. The validation mechanism will detect
errors and inconsistencies between the content of the analyzed schema.org
annotations and the content of the web pages where the annotations were found.
Comment: Accepted for the A.P. Ershov Informatics Conference 2019 (the PSI
Conference Series, 12th edition) proceedings
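The distinction between verification (checking an annotation against a specification) and validation (checking an annotation against the page it came from) can be illustrated with a small sketch. Everything below is hypothetical: the `HOTEL_SPEC` dictionary is a simplified stand-in for a Domain Specification, not the format used in the paper, and the substring check in `validate` is a deliberately naive proxy for the paper's validation mechanism.

```python
# Hypothetical, simplified domain specification: the expected schema.org type
# and the required properties with the Python type their values must have.
HOTEL_SPEC = {
    "type": "Hotel",
    "required": {"name": str, "address": dict, "telephone": str},
}


def verify(annotation, spec):
    """Verification: list violations of the specification (empty if compliant)."""
    errors = []
    if annotation.get("@type") != spec["type"]:
        errors.append(f"expected @type {spec['type']}")
    for prop, expected_type in spec["required"].items():
        if prop not in annotation:
            errors.append(f"missing required property: {prop}")
        elif not isinstance(annotation[prop], expected_type):
            errors.append(f"wrong value type for property: {prop}")
    return errors


def validate(annotation, page_text, props=("name", "telephone")):
    """Validation: list string-valued properties whose value is absent from the page text."""
    text = page_text.lower()
    return [
        prop
        for prop in props
        if isinstance(annotation.get(prop), str)
        and annotation[prop].lower() not in text
    ]


annotation = {"@type": "Hotel", "name": "Hotel Alpenhof", "telephone": "+43 123"}
page_text = "Welcome to Hotel Alpenhof. Call us at +43 999."

verification_errors = verify(annotation, HOTEL_SPEC)
validation_errors = validate(annotation, page_text)
```

In this toy case verification reports the missing `address`, while validation flags `telephone` because the annotated number does not appear on the page.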
Weaving the Web(VTT) of Data
Video has become a first class citizen on the Web with broad support in all common Web browsers. Where with structured mark-up on webpages we have made the vision of the Web of Data a reality, in this paper, we propose a new vision that we name the Web(VTT) of Data, along with concrete steps to realize this vision. It is based on the evolving standards WebVTT for adding timed text tracks to videos and JSON-LD, a JSON-based format to serialize Linked Data. Just like the Web of Data that is based on the relationships among structured data, the Web(VTT) of Data is based on relationships among videos based on WebVTT files, which we use as Web-native spatiotemporal Linked Data containers with JSON-LD payloads. In a first step, we provide necessary background information on the technologies we use. In a second step, we perform a large-scale analysis of the 148 terabyte size Common Crawl corpus in order to get a better understanding of the status quo of Web video deployment and address the challenge of integrating the detected videos in the Common Crawl corpus into the Web(VTT) of Data. In a third step, we open-source an online video annotation creation and consumption tool, targeted at videos not contained in the Common Crawl corpus and for integrating future video creations, allowing for weaving the Web(VTT) of Data tighter, video by video.
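As a rough illustration of a WebVTT file acting as a container for JSON-LD payloads, the sketch below serializes timed cues whose text is a single-line JSON-LD object. The helper names (`format_timestamp`, `make_cue`, `make_webvtt`) and the example annotation are hypothetical, and this is not the tool described in the abstract. It covers only the temporal dimension, since each cue applies to a time range of the video.

```python
import json


def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def make_cue(cue_id, start, end, jsonld):
    """Build one cue: identifier line, timing line, JSON-LD payload line."""
    # Compact, single-line JSON keeps the payload free of blank lines,
    # which would otherwise terminate the cue.
    payload = json.dumps(jsonld, separators=(",", ":"))
    if "-->" in payload:
        raise ValueError("cue payload must not contain '-->'")
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{cue_id}\n{timing}\n{payload}"


def make_webvtt(cues):
    """Join cues into a WebVTT document; cues are (id, start, end, jsonld) tuples."""
    return "WEBVTT\n\n" + "\n\n".join(make_cue(*cue) for cue in cues) + "\n"


# Hypothetical annotation: a person appears in the first five seconds.
vtt = make_webvtt([
    ("cue1", 0, 5, {"@context": "http://schema.org", "@type": "Person", "name": "Ada"}),
])
```

A file like this would typically be attached to a video through a `<track>` element with `kind="metadata"`, so that the browser exposes the cues to scripts rather than rendering them as captions.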
BL Lac Contribution to the Extragalactic Gamma-Ray Background
Very high energy gamma-rays from blazars traversing cosmological distances
through the metagalactic radiation field can convert into electron-positron
pairs in photon-photon collisions. The converted gamma-rays initiate
electromagnetic cascades driven by inverse-Compton scattering off the microwave
background photons. Using a model for the time-dependent metagalactic radiation
field consistent with all currently available far-infrared-to-optical data, we
calculate the cascade contribution from faint, unresolved high- and low-peaked
blazars to the extragalactic gamma-ray background as measured by EGRET. For
low-peaked blazars, we adopt a spectral index consistent with the mean spectral
index of EGRET detected blazars, and the luminosity function determined by
Chiang and Mukherjee (1998). For high-peaked blazars, we adopt template spectra
matching prototype sources observed with air-Cherenkov telescopes up to 30 TeV,
and a luminosity function based on X-ray measurements. The low number of about
20 for nearby high-peaked blazars with a flux exceeding 10^-11 cm^-2 s^-1 above
300 GeV inferred from the luminosity function is consistent with the results
from air-Cherenkov telescope observations. Including the cascade emission from
higher redshifts, the total high-peaked blazar contribution to the observed
gamma-ray background at GeV energies can account for up to about 30%.
Comment: 8 pages, 7 figures, accepted by A&A, final version
Two New Loci for Body-Weight Regulation Identified in a Joint Analysis of Genome-Wide Association Studies for Early-Onset Extreme Obesity in French and German Study Groups
Meta-analyses of population-based genome-wide association studies (GWAS) in adults have recently led to the detection of new genetic loci for obesity. Here we aimed to discover additional obesity loci in extremely obese children and adolescents. We also investigated if these results generalize by estimating the effects of these obesity loci in adults and in population-based samples including both children and adults. We jointly analysed two GWAS of 2,258 individuals and followed up the best, according to lowest p-values, 44 single nucleotide polymorphisms (SNP) from 21 genomic regions in 3,141 individuals. After this DISCOVERY step, we explored if the findings derived from the extremely obese children and adolescents (10 SNPs from 5 genomic regions) generalized to (i) the population level and (ii) to adults by genotyping another 31,182 individuals (GENERALIZATION step). Apart from previously identified FTO, MC4R, and TMEM18, we detected two new loci for obesity: one in SDCCAG8 (serologically defined colon cancer antigen 8 gene; p = 1.85 x 10^-8 in the DISCOVERY step) and one between TNKS (tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase gene) and MSRA (methionine sulfoxide reductase A gene; p = 4.84 x 10^-7), the latter finding being limited to children and adolescents as demonstrated in the GENERALIZATION step. The odds ratios for early-onset obesity were estimated at approximately 1.10 per risk allele for both loci. Interestingly, the TNKS/MSRA locus has recently been found to be associated with adult waist circumference. In summary, we have completed a meta-analysis of two GWAS which both focus on extremely obese children and adolescents and replicated our findings in a large follow-up data set. We observed that genetic variants in or near FTO, MC4R, TMEM18, SDCCAG8, and TNKS/MSRA were robustly associated with early-onset obesity.
We conclude that the currently known major common variants related to obesity overlap to a substantial degree between children and adults.
Decentralizing the Social Web
For over a decade, standards bodies like the IETF and W3C have attempted to prevent the centralization of the Web via the use of open standards for 'permission-less innovation.' Yet today, these standards, from OAuth to RSS, seem to have failed to prevent the massive centralization of the Web at the hands of a few major corporations like Google and Facebook. We'll delve deep into the lessons of failed attempts to replace DNS like XRIs, identity systems like OpenID, and metadata formats like the Semantic Web, all of which were recuperated by centralized platforms like Facebook as Facebook Connect and the "Like" Button. Learning from the past, a new generation of blockchain standards and governance mechanisms may be our last, best chance to save the Web.