485 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement of an experimental computational environment for social media research and presents, as an illustration, the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics in either their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.

    Integrierte Informationsdienstleistungen für die Afrikaforschung: Neuere Entwicklungen in Deutschland und Europa

    New projects, services and collaborations have recently brought the infrastructural services for African Studies a big step forward. This report gives an account of new subject gateways and digitisation projects. It discusses recent European cooperation ventures in the field of librarianship. Additionally, new developments and services of the Africa Collection at Frankfurt University Library are presented, which help to address the changing needs of researchers and to handle information overload, while keeping up with the latest developments. Nevertheless, the fragmentation and compartmentalisation of the different services still hinder more integrated information services.

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam, focusing on spam that appears in the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.

    BlogForever D2.4: Weblog spider prototype and associated methodology

    The purpose of this document is to present the evaluation of different solutions for capturing blogs, to set out the established methodology, and to describe the developed blog spider prototype.

    Prototype Digital Forensics Repository

    The explosive growth in technology has led to a new league of crime involving identity theft, stealing trade secrets, malicious virus attacks, hacking of DVD players, etc. The law enforcement community, which has been trained to deal with traditional forms of crime, is now being trained in the new realm of Digital Forensics. Forensics investigators have realized that often the most valuable resource available to them is the experience and knowledge of fellow investigators. But there is seldom an explicit mechanism for disseminating this knowledge. Hence the same problems and mistakes continue to resurface and the same solutions are re-invented. In this Thesis we design and create a knowledge base, a Digital Forensics Repository, to support the sharing of experiences about the Forensics Investigation Process. It offers capabilities such as submission of lessons and online search and retrieval, which provide a means of querying an ever-increasing knowledge base.

    Keyword Search in Social Networks

    People often ask their friends when they want information on topics such as events, restaurants, or movies, as the majority of search engines do not yield the results people are seeking [1]. At present, the majority of current open source search engines, such as those based on Nutch, also do not yield the desired or expected results. The popular search engine Google recently added a feature that includes information from your social circle in your search results, but it is limited to Google Plus. On the other hand, the microblogging site Twitter has emerged as a vital source of information, with more than 140 million active users [2] and nearly 250 million new tweets every day [2]. People also like to see more results from the blogs or news websites they follow; they generally subscribe to the Really Simple Syndication (RSS) [3] feed service of those sites and have to use an RSS reader to find the content. A web search engine that provides results from a user’s social network content along with the indexed web results would be a great help to people interested in results from their social circle. This project’s goal is to include results from your social networks (Twitter, RSS feeds) in Yioop! search results by using a feeds database created from your Twitter account and the RSS feeds you follow.

    BlogForever: D3.1 Preservation Strategy Report

    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what exactly we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design.

    Semantic Knowledge Graphs for the News: A Review

    ICT platforms for news production, distribution, and consumption must exploit the ever-growing availability of digital data. These data originate from different sources and in different formats; they arrive at different velocities and in different volumes. Semantic knowledge graphs (KGs) are an established technique for integrating such heterogeneous information. They are therefore well aligned with the needs of news producers and distributors, and they are likely to become increasingly important for the news industry. This article reviews the research on using semantic knowledge graphs for production, distribution, and consumption of news. The purpose is to present an overview of the field; to investigate what it means; and to suggest opportunities and needs for further research and development.