4,868 research outputs found

    ViTOR: Learning to Rank Webpages Based on Visual Features

    The visual appearance of a webpage carries valuable information about its quality and can be used to improve the performance of learning to rank (LTR). We introduce the Visual learning TO Rank (ViTOR) model, which integrates state-of-the-art visual feature extraction methods by (i) transfer learning from a pre-trained image classification model, and (ii) synthetic saliency heat maps generated from webpage snapshots. Since there is currently no public dataset for the task of LTR with visual features, we also introduce and release the ViTOR dataset, containing visually rich and diverse webpages. The ViTOR dataset consists of visual snapshots, non-visual features and relevance judgments for ClueWeb12 webpages and TREC Web Track queries. We experiment with the proposed ViTOR model on the ViTOR dataset and show that it significantly improves the performance of LTR with visual features. Comment: In Proceedings of the 2019 World Wide Web Conference (WWW 2019), May 2019, San Francisco

    On Constructing Persistent Identifiers with Persistent Resolution Targets

    Persistent Identifiers (PIDs) are the foundation for referencing digital assets in scientific publications, books, and digital repositories. In their realization, PIDs contain metadata and resolving targets in the form of URLs that point to data sets located on the network. In contrast to PIDs, the target URLs typically change over time; thus, PIDs need continuous maintenance, an effort that is increasing tremendously with the advancement of e-Science and the advent of the Internet-of-Things (IoT). Nowadays, billions of sensors and data sets are subject to PID assignment. This paper presents a new approach to embedding location-independent targets into PIDs that allows the creation of maintenance-free PIDs using content-centric network technology and overlay networks. To prove the validity of the presented approach, the Handle PID System is used in conjunction with Magnet Link access information encoding, state-of-the-art decentralized data distribution with BitTorrent, and Named Data Networking (NDN) as location-independent data access technology for networks. In contrast to existing approaches, no green-field implementation of PID or major modification of the Handle System is required to enable location-independent data dissemination with maintenance-free PIDs. Comment: Published IEEE paper of the FedCSIS 2016 (SoFAST-WS'16) conference, 11.-14. September 2016, Gdansk, Poland. Also available online: http://ieeexplore.ieee.org/document/7733372

    WSN infrastructure for green campus development

    Campus stakeholders need a system that provides accurate environmental data in order to formulate and evaluate policies for sustainable campus development. This paper presents the design of a WSN infrastructure capable of providing accurate, real-time and reliable environmental data, namely PM2.5, SO2, CO, O3, NO2, temperature, humidity, soil moisture and light intensity, to be analyzed and presented by servers. The infrastructure is composed of fixed sensor nodes, mobile sensor nodes, display nodes and server nodes. Each sensor node sends raw environmental data to the server using an RF transceiver. The server processes, stores and presents environmental information to public users through the Internet and the mobile network. This infrastructure can be used as a platform to provide environmental data to a decision support system for campus stakeholders, so that a recommendation can be made.

    Robust identification of local adaptation from allele frequencies

    Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns, and for correlations between environmental variables and allele frequencies, while accounting for these complications, based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of 'standardized allele frequencies' that allows investigators to apply tests of their choice to multiple populations, while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to calculate powerful tests to detect non-parametric correlations with environmental variables, which are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST but should be more powerful as we account for population history. We also extend the model to next-generation sequencing of population pools, which is a cost-efficient way to estimate population allele frequencies but implies an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by re-analyzing human SNP data from the HGDP populations. An implementation of our method will be available from http://gcbias.org. Comment: 27 pages, 7 figures

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Compressão de imagem médica para arquivos de alto desempenho (Medical image compression for high-performance archives)

    Information systems and medicine are two widespread fields that have become interwoven to make medical care more efficient. This relationship has produced PACS and the international DICOM standard for organizing digital medical information. Image compression is applied to most images throughout the web, but the compression formats used for medical imaging have become outdated. The new formats developed in the past few years are candidates for replacing the old ones in such contexts, possibly enhancing the process. Before they are adopted, an evaluation should be carried out to validate their admissibility. This dissertation reviews the state of the art of medical imaging information systems, namely PACS systems and the DICOM standard. It also covers some topics of image compression, such as the metrics for evaluating the algorithms' performance, concluding with a survey of three modern formats: JPEG XL, AVIF, and WebP. Two software projects were developed. The first carries out an analysis of the formats based on the metrics, using DICOM datasets and producing results that can be used to create recommendations on the formats' use. The second is an application that encodes and decodes medical images with the formats covered in this dissertation. This proof of concept works as a medical imaging archive for the storage, distribution, and visualization of compressed data. Master's in Computer and Telematics Engineering

    Associations of NINJ2 sequence variants with incident ischemic stroke in the Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) consortium

    Background: Stroke, the leading neurologic cause of death and disability, has a substantial genetic component. We previously conducted a genome-wide association study (GWAS) in four prospective studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium and demonstrated that sequence variants near the NINJ2 gene are associated with incident ischemic stroke. Here, we sought to fine-map functional variants in the region and evaluate the contribution of rare variants to ischemic stroke risk. Methods and Results: We sequenced 196 kb around NINJ2 on chromosome 12p13 among 3,986 European ancestry participants, including 475 ischemic stroke cases, from the Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and Framingham Heart Study. Meta-analyses of single-variant tests for 425 common variants (minor allele frequency [MAF] ≥ 1%) confirmed the original GWAS results and identified an independent intronic variant, rs34166160 (MAF = 0.012), most significantly associated with incident ischemic stroke (HR = 1.80, p = 0.0003). Aggregating 278 putatively-functional variants with MAF ≤ 1% using count statistics, we observed a nominally statistically significant association, with the burden of rare NINJ2 variants contributing to decreased ischemic stroke incidence (HR = 0.81; p = 0.026). Conclusion: Common and rare variants in the NINJ2 region were nominally associated with incident ischemic stroke among a subset of CHARGE participants. Allelic heterogeneity at this locus, caused by multiple rare, low frequency, and common variants with disparate effects on risk, may explain the difficulties in replicating the original GWAS results. Additional studies that take into account the complex allelic architecture at this locus are needed to confirm these findings.