Search CORE

31,912 research outputs found

Blog feed search with a post index

Author: C. Manning
C. Zhai
D. J. C. Mackay
J. He
K. Balog
Krisztian Balog
Maarten de Rijke
Wouter Weerkamp
Publication venue: 'Springer Science and Business Media LLC'

BlogForever D2.4: Weblog spider prototype and associated methodology

Author: Banos V.
Gulliksen M.
Joy M.
Manolopoulos I.
Rynning M.
Stepanyan K.
Tselepidis I.
Publication date: 25/10/2013

The purpose of this document is to present an evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.

ZENODO

Keeping Up To Date with IP News Services and Blogs: Drowning in a Sea Of Sameness?

Author: Cavicchi Jon R.
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/2006

It seems that so many IP-related websites you visit invite you to join their free email list to keep you up to date. Sources span a wide spectrum, including governmental organizations, non-governmental organizations, educational institutions, consulting services, law firms, commercial publishers and more. Pricing ranges from free, to low fee, to premium. With all of this information overload and so many choices, how do you differentiate among news sources and choose between them? The goals of this article are twofold. Goal one is to present a survey of the types and categories of IP news tools available to IP researchers. Since these tools change over time, goal two is to present strategies and approaches to consider when assembling your portfolio of news sources. I use the term researcher to include anyone looking for news, including lawyers, paraprofessionals, academics, students, corporate searchers and more. Some of this material may be yesterday's news for some and breaking news for others. My hope is that you will find value in some of these tools and strategies. Before I present the survey of tools, I want to propose some general strategies that may be helpful to apply as the details of the tools unfold.

UNH Scholars' Repository

Coping with noise in a real-world weblog crawler and retrieval system

Author: Ferguson Paul
Lanagan James
O'Hare Neil
Smeaton Alan F.
Publication date: 01/05/2010

In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and discover that the time interval between crawls is more important to the successful application of noise-removal algorithms within the blog context than any additional improvements to the removal algorithm itself.
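The abstract names the anchor-text to text ratio as a noise signal but does not define it. As a rough illustration only, the following Python sketch computes one plausible version for an HTML fragment, assuming the ratio is the number of text characters inside `<a>` elements divided by all text characters. The names `anchor_text_ratio` and `looks_like_noise`, and the 0.5 threshold, are hypothetical and not taken from the paper.

```python
from html.parser import HTMLParser


class AnchorTextCounter(HTMLParser):
    """Counts text characters inside <a> elements and in the fragment overall."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # how many <a> elements are currently open
        self.anchor_chars = 0   # characters of text seen inside <a> elements
        self.total_chars = 0    # characters of text seen anywhere

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())  # ignore leading/trailing whitespace in each text run
        self.total_chars += n
        if self.anchor_depth > 0:
            self.anchor_chars += n


def anchor_text_ratio(html_fragment: str) -> float:
    """Share of the fragment's text that is anchor text (0.0 for an empty fragment)."""
    counter = AnchorTextCounter()
    counter.feed(html_fragment)
    counter.close()  # flush any buffered trailing text
    if counter.total_chars == 0:
        return 0.0
    return counter.anchor_chars / counter.total_chars


def looks_like_noise(html_fragment: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: treat link-dominated blocks (menus, blogrolls) as noise."""
    return anchor_text_ratio(html_fragment) > threshold
```

A navigation list made only of links scores 1.0, while a post paragraph with a single link scores close to 0, which is the kind of contrast such a ratio is meant to expose.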

Irish Universities

DCU Online Research Access Service

Design Patterns for Fusion-Based Object Retrieval

Author: C Macdonald
H Fang
M Shokouhi
W Weerkamp
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/08/2017

We address the task of ranking objects (such as people, blogs, or verticals) that, unlike documents, do not have direct term-based representations. To be able to match them against keyword queries, evidence needs to be amassed from documents that are associated with the given object. We present two design patterns, i.e., general reusable retrieval strategies, which are able to encompass most existing approaches from the past. One strategy combines evidence on the term level (early fusion), while the other does it on the document level (late fusion). We demonstrate the generality of these patterns by applying them to three different object retrieval tasks: expert finding, blog distillation, and vertical ranking. Comment: Proceedings of the 39th European conference on Advances in Information Retrieval (ECIR '17), 201
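The abstract distinguishes combining evidence at the term level (early fusion) from combining it at the document level (late fusion). To make the distinction concrete, here is a minimal Python sketch that instantiates both patterns with a Jelinek-Mercer smoothed unigram language model. The scoring model, the uniform document-object association weights, the `LAMBDA` value, the toy data and all function names are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from math import prod

LAMBDA = 0.5  # hypothetical Jelinek-Mercer smoothing weight


def term_probs(text):
    """Maximum-likelihood unigram model p(t|text) from whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def smoothed(p_model, p_coll, term):
    """Mix a model with the collection model so unseen terms keep non-zero mass."""
    return (1 - LAMBDA) * p_model.get(term, 0.0) + LAMBDA * p_coll.get(term, 0.0)


def early_fusion_score(query, doc_ids, docs, p_coll):
    """Term-level fusion: build one term distribution for the object, then score it."""
    w = 1 / len(doc_ids)  # assumed uniform association weight p(d|object)
    obj_model = Counter()
    for d in doc_ids:
        for t, p in term_probs(docs[d]).items():
            obj_model[t] += w * p
    return prod(smoothed(obj_model, p_coll, t) for t in query.lower().split())


def late_fusion_score(query, doc_ids, docs, p_coll):
    """Document-level fusion: score each associated document, then aggregate the scores."""
    w = 1 / len(doc_ids)  # same assumed uniform association weight
    total = 0.0
    for d in doc_ids:
        m = term_probs(docs[d])
        total += w * prod(smoothed(m, p_coll, t) for t in query.lower().split())
    return total


def rank(query, objects, docs, scorer):
    """Order objects by descending score under the chosen fusion pattern."""
    p_coll = term_probs(" ".join(docs.values()))
    scores = {o: scorer(query, ids, docs, p_coll) for o, ids in objects.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection and object-document associations.
docs = {
    "d1": "blog search engine",
    "d2": "expert finding with language models",
    "d3": "blog distillation task",
}
objects = {"alice": ["d1", "d3"], "bob": ["d2"]}
```

With a linear mixture and uniform weights, both patterns give identical scores for a single-term query (here, 17/66 for `alice` on `blog`); for multi-term queries they can diverge, because early fusion multiplies mixed term probabilities while late fusion averages per-document query likelihoods.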

arXiv.org e-Print Archive

Crossref

BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

Author: Banos Vangelis
Kasioumis Nikolaos
Kim Yunhyong
Kopidaki Stella
Ross Seamus
Rynning Morten
Stepanyan Karen
Publication venue: BlogForever
Publication date: 25/10/2013

This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and of approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam, focusing on spam that appears in the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.

ZENODO

Enlighten

Realization of Semantic Atom Blog

Author: Khuba Sidheshwar A.
Patel Dhiren R.
Publication date: 01/12/2009

A web blog is used as a collaborative platform to publish and share information. The information accumulated in a blog intrinsically contains knowledge, and the knowledge shared by a community of people has an intangible value. The blog is viewed as a multimedia information resource available on the Internet, in which information in the form of text, images, audio and video builds up exponentially. However, the multimedia information contained in an Atom blog lacks the capability that software processes require in order to access, process and reuse Atom blog content over the Internet. This shortcoming is addressed by exploring OWL knowledge modeling, semantic annotation and semantic categorization techniques in the Atom blogosphere. By adopting these techniques, future Atom blogs can be created and deployed over the Internet.

arXiv.org e-Print Archive

IIT Gandhinagar

Academics' online presence guidelines: A four step guide to taking control of your visibility

Author: EdShare
Publication date: 01/05/2013

OpenUCT published Academics' online presence guidelines: A four step guide to taking control of your visibility in 2012

EdShare

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY