84,787 research outputs found

    Efficient filtering of adult content using textual information

    Full text link
    Nowadays adult content represents a non negligible proportion of the Web content. It is of the utmost importance to protect children from this content. Search engines, as an entry point for Web navigation are ideally placed to deal with this issue. In this paper, we propose a method that builds a safe index i.e. adult-content free for search engines. This method is based on a filter that uses only textual information from the web page and the associated URL

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    Get PDF
    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software

    GenomeVIP: A cloud platform for genomic variant discovery and interpretation

    Get PDF
    Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.</jats:p

    Recommender Systems

    Get PDF
    The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research for recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in the future developments. In addition to algorithms, physical aspects are described to illustrate macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has a great scientific depth and combines diverse research fields which makes it of interests for physicists as well as interdisciplinary researchers.Comment: 97 pages, 20 figures (To appear in Physics Reports

    The State-of-the-arts in Focused Search

    Get PDF
    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems

    Improving Reachability and Navigability in Recommender Systems

    Full text link
    In this paper, we investigate recommender systems from a network perspective and investigate recommendation networks, where nodes are items (e.g., movies) and edges are constructed from top-N recommendations (e.g., related movies). In particular, we focus on evaluating the reachability and navigability of recommendation networks and investigate the following questions: (i) How well do recommendation networks support navigation and exploratory search? (ii) What is the influence of parameters, in particular different recommendation algorithms and the number of recommendations shown, on reachability and navigability? and (iii) How can reachability and navigability be improved in these networks? We tackle these questions by first evaluating the reachability of recommendation networks by investigating their structural properties. Second, we evaluate navigability by simulating three different models of information seeking scenarios. We find that with standard algorithms, recommender systems are not well suited to navigation and exploration and propose methods to modify recommendations to improve this. Our work extends from one-click-based evaluations of recommender systems towards multi-click analysis (i.e., sequences of dependent clicks) and presents a general, comprehensive approach to evaluating navigability of arbitrary recommendation networks
    • 

    corecore