84,787 research outputs found
Efficient filtering of adult content using textual information
Nowadays adult content represents a non negligible proportion of the Web
content. It is of the utmost importance to protect children from this content.
Search engines, as an entry point for Web navigation are ideally placed to deal
with this issue.
In this paper, we propose a method that builds a safe index i.e.
adult-content free for search engines. This method is based on a filter that
uses only textual information from the web page and the associated URL
BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology
This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software
GenomeVIP: A cloud platform for genomic variant discovery and interpretation
Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional âdownload and analyzeâ paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.</jats:p
Recommender Systems
The ongoing rapid expansion of the Internet greatly increases the necessity
of effective recommender systems for filtering the abundant information.
Extensive research for recommender systems is conducted by a broad range of
communities including social and computer scientists, physicists, and
interdisciplinary researchers. Despite substantial theoretical and practical
achievements, unification and comparison of different approaches are lacking,
which impedes further advances. In this article, we review recent developments
in recommender systems and discuss the major challenges. We compare and
evaluate available algorithms and examine their roles in the future
developments. In addition to algorithms, physical aspects are described to
illustrate macroscopic behavior of recommender systems. Potential impacts and
future directions are discussed. We emphasize that recommendation has a great
scientific depth and combines diverse research fields which makes it of
interests for physicists as well as interdisciplinary researchers.Comment: 97 pages, 20 figures (To appear in Physics Reports
The State-of-the-arts in Focused Search
The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a userâs topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems
Improving Reachability and Navigability in Recommender Systems
In this paper, we investigate recommender systems from a network perspective
and investigate recommendation networks, where nodes are items (e.g., movies)
and edges are constructed from top-N recommendations (e.g., related movies). In
particular, we focus on evaluating the reachability and navigability of
recommendation networks and investigate the following questions: (i) How well
do recommendation networks support navigation and exploratory search? (ii) What
is the influence of parameters, in particular different recommendation
algorithms and the number of recommendations shown, on reachability and
navigability? and (iii) How can reachability and navigability be improved in
these networks? We tackle these questions by first evaluating the reachability
of recommendation networks by investigating their structural properties.
Second, we evaluate navigability by simulating three different models of
information seeking scenarios. We find that with standard algorithms,
recommender systems are not well suited to navigation and exploration and
propose methods to modify recommendations to improve this. Our work extends
from one-click-based evaluations of recommender systems towards multi-click
analysis (i.e., sequences of dependent clicks) and presents a general,
comprehensive approach to evaluating navigability of arbitrary recommendation
networks
- âŠ