
    Automatic subscriptions in publish-subscribe systems

    In this paper, we describe how to automate the process of subscribing to complex publish-subscribe systems. We present a proof-of-concept prototype in which we analyze Web browsing history to generate zero-click subscriptions to Web feeds and video news stories. Our experience so far indicates that user attention data is a promising source for automating the subscription process.
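    As a rough illustration of the zero-click idea, the following Python sketch (not taken from the paper) counts visits per domain in a browsing-history list and looks for an advertised RSS/Atom feed on frequently visited sites; the history format, the min_visits threshold and the discover_feed helper are assumptions made for the example.

        # Hypothetical sketch: derive zero-click feed subscriptions from browsing history.
        from collections import Counter
        from html.parser import HTMLParser
        from urllib.parse import urlparse, urljoin
        from urllib.request import urlopen

        class FeedLinkParser(HTMLParser):
            # Collects hrefs of <link rel="alternate" type="...xml"> tags advertised by a page.
            def __init__(self):
                super().__init__()
                self.feeds = []

            def handle_starttag(self, tag, attrs):
                a = dict(attrs)
                if tag == "link" and "alternate" in (a.get("rel") or "") \
                        and "xml" in (a.get("type") or ""):
                    self.feeds.append(a.get("href"))

        def discover_feed(site_url):
            # Return the first feed URL advertised on a site's front page, if any.
            with urlopen(site_url, timeout=10) as resp:
                html = resp.read().decode("utf-8", "replace")
            parser = FeedLinkParser()
            parser.feed(html)
            return urljoin(site_url, parser.feeds[0]) if parser.feeds else None

        def auto_subscribe(history_urls, min_visits=20):
            # Attention-data heuristic: propose feeds for domains the user visits often.
            visits = Counter(urlparse(u).netloc for u in history_urls)
            subscriptions = {}
            for domain, count in visits.items():
                if count >= min_visits:
                    feed = discover_feed(f"https://{domain}/")
                    if feed:
                        subscriptions[domain] = feed
            return subscriptions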

    BlogForever D2.4: Weblog spider prototype and associated methodology

    The purpose of this document is to present the evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.

    CLEAR: a credible method to evaluate website archivability

    Web archiving is crucial to ensure that cultural, scientific and social heritage on the web remains accessible and usable over time. A key aspect of the web archiving process is optimal data extraction from target websites. This procedure is difficult for reasons such as website complexity, the plethora of underlying technologies and, ultimately, the open-ended nature of the web. The purpose of this work is to establish the notion of Website Archivability (WA) and to introduce the Credible Live Evaluation of Archive Readiness (CLEAR) method to measure WA for any website. Website Archivability captures the core aspects of a website that are crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. An appreciation of the archivability of a website should provide archivists with a valuable tool when assessing the possibilities of archiving material, and should influence web design professionals to consider the implications of their design decisions on the likelihood that their sites can be archived. A prototype application, archiveready.com, has been established to demonstrate the viability of the proposed method for assessing Website Archivability.
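    The CLEAR evaluation itself is not reproduced here, but a minimal probe in its spirit could check a few crawler-friendliness facets of a site and combine them into a score; the facets chosen and the equal weighting below are assumptions for the sketch, not the published method.

        # Illustrative archivability probe (assumed facets and weights, not CLEAR itself).
        from urllib.parse import urljoin
        from urllib.request import Request, urlopen
        from urllib.error import URLError, HTTPError

        def fetch_ok(url):
            # True if the URL answers with HTTP 200 within the timeout.
            try:
                req = Request(url, headers={"User-Agent": "archivability-probe"})
                with urlopen(req, timeout=10) as resp:
                    return resp.status == 200
            except (HTTPError, URLError, OSError):
                return False

        def archivability_score(site_url):
            # Score a site on a few crawler-friendliness facets, equally weighted.
            checks = {
                "reachable":   fetch_ok(site_url),
                "robots_txt":  fetch_ok(urljoin(site_url, "/robots.txt")),
                "sitemap_xml": fetch_ok(urljoin(site_url, "/sitemap.xml")),
                "favicon":     fetch_ok(urljoin(site_url, "/favicon.ico")),
            }
            score = 100 * sum(checks.values()) / len(checks)
            return score, checks

        if __name__ == "__main__":
            print(archivability_score("https://example.com/"))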

    Distributed Web-Scale Infrastructure For Crawling, Indexing And Search With Semantic Support

    In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture is focused on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of extracted information and, finally, indexing of documents based on the geo-spatial information found in a document. We demonstrate the architecture on two use cases, where the first is search in job offers retrieved from the LinkedIn portal and the second is search in BBC news feeds, and we discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.
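    As a toy illustration of the spatial-indexing step, the map/reduce pass below (run in memory rather than on an actual MapReduce cluster) emits (place, document) postings for gazetteer hits; the document format and the tiny gazetteer are invented for the example.

        # In-memory stand-in for one MapReduce job: a spatial index over crawled documents.
        from collections import defaultdict

        GAZETTEER = {"London": (51.5074, -0.1278), "Bratislava": (48.1486, 17.1077)}

        def map_doc(doc_id, text):
            # Map phase: emit (place, (doc_id, coordinates)) for each gazetteer hit.
            for place, coords in GAZETTEER.items():
                if place in text:
                    yield place, (doc_id, coords)

        def reduce_place(place, postings):
            # Reduce phase: build the posting list for one place.
            return place, sorted(postings)

        def run_job(docs):
            # Shuffle map output by key, then reduce each key; a real cluster distributes this.
            shuffled = defaultdict(list)
            for doc_id, text in docs.items():
                for key, value in map_doc(doc_id, text):
                    shuffled[key].append(value)
            return dict(reduce_place(k, v) for k, v in shuffled.items())

        docs = {"bbc-1": "Protests in London today", "job-7": "Software engineer, Bratislava"}
        print(run_job(docs))  # {'London': [('bbc-1', ...)], 'Bratislava': [('job-7', ...)]}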

    Mining News Content for Popularity Prediction

    The problem of popularity prediction has been studied extensively in previous research. The idea behind popularity prediction is that the attention users give to online items is unequally distributed, as only a small fraction of all the available content receives serious user attention. Researchers have been experimenting with different methods to find a way to predict that fraction. However, to the best of our knowledge, none of the previous work used the content itself for popularity prediction; instead, the research looked at other features, such as early user reactions (numbers of views, shares and comments) in the first hours or days, to predict future popularity. These models are built to generalize easily to all data types, from videos (e.g. YouTube videos) and images to news stories. However, they are not very efficient for the news domain, as our research shows that most stories receive 90% to 100% of the attention they will ever get on the first day. Thus, it would be much more efficient to estimate popularity even before an item is seen by users. In this thesis, we plan to approach the problem in a way that accomplishes that goal. We will narrow our focus to the news domain and concentrate on the content of news stories. We would like to investigate the ability to predict the popularity of news articles by finding the topics that interest users and the estimated audience of each topic. Then, given a new news story, we would infer the topics from the story's content and, based on those topics, predict how popular it may become in the future, even before it is released to the public.
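    One plausible rendering of this content-based idea, with assumed modelling choices (LDA topics and a least-squares per-topic audience estimate) rather than the thesis's actual model, is to learn topic mixtures from past stories, estimate the audience each topic attracts, and score a new story before publication.

        # Hedged sketch: topic-based, pre-publication popularity estimate (assumed model).
        import numpy as np
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        def fit_topic_audience(train_texts, train_views, n_topics=10):
            # Learn topics from past stories and an average audience per topic.
            vec = CountVectorizer(stop_words="english", max_features=5000)
            X = vec.fit_transform(train_texts)
            lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
            theta = lda.fit_transform(X)  # per-story topic mixtures
            # Least-squares estimate of how many views each topic tends to attract.
            audience, *_ = np.linalg.lstsq(theta, np.asarray(train_views, float), rcond=None)
            return vec, lda, audience

        def predict_views(text, vec, lda, audience):
            # Pre-publication prediction: topic mixture dotted with per-topic audiences.
            theta = lda.transform(vec.transform([text]))
            return float(theta[0] @ audience)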

    Towards the cloudification of the social networks analytics

    In recent years, with the increase in available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies seeking to increase their profits. In addition, social computation scientists see such data as a vast ocean of information for studying modern human societies. Nowadays, enterprises and researchers either develop their own mining tools in house or outsource their social media mining needs to specialised companies, at a consequent economic cost. In this paper, we present the first cloud computing service to facilitate the deployment of social media analytics applications, allowing data practitioners to use social mining tools as a service. The main advantage of this service is the possibility to run different queries at the same time and combine their results in real time. Additionally, we also introduce twearch, a prototype for developing Twitter mining algorithms as services in the cloud.
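    The "run different queries at the same time and combine their results" capability could look roughly like the sketch below, where the query functions are stand-ins for real social-media API calls and the merging strategy is an assumption, not the service's actual design.

        # Illustrative only: fan out mining queries concurrently, merge results as they arrive.
        from concurrent.futures import ThreadPoolExecutor, as_completed

        def combined(queries):
            # Run all queries in parallel and yield (name, result) as each completes.
            with ThreadPoolExecutor(max_workers=len(queries)) as pool:
                futures = {pool.submit(fetch): name for name, fetch in queries.items()}
                for fut in as_completed(futures):
                    yield futures[fut], fut.result()

        # Stubbed fetchers standing in for real API calls.
        queries = {"mentions": lambda: ["@acme rocks"], "hashtag": lambda: ["#bigdata"]}
        for name, result in combined(queries):
            print(name, result)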

    Semantic Web Mining Review

    This paper describes Semantic Web Mining. The purpose of this paper is to focus on how Semantic Web technologies can be used to mine the web for relevant information extraction. Semantic Web Mining is about combining the two emerging research areas Semantic Web and Web Mining. Researchers work on improving the results of web mining by using the semantic structure in the web, and make use of Web Mining techniques for building the Semantic Web. In this manner, both technologies play a vital role for each other. The Semantic Web adds structure to the meaningful content of Web pages; hence information is given a well-defined meaning, which is both human readable and machine-processable. This paper gives an overview of where the two areas meet today and sketches ways in which a closer integration could be profitable.