1,170 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains. Comment: Knowledge-based System

    Designing Systems that Support the Blogosphere for Deliberative Discourse

    Web 2.0 has great potential to serve as a public sphere (Habermas, 1974; Habermas, 1989) – a distributed arena of voices where all who want to do so can participate. A well-functioning public sphere is important for pluralistic decision-making at many levels, ranging from small organizations to society at large. In this paper, we analyze the capability of the blogosphere in its current form to support such a role. This analysis leads to the identification of the principal issues that prevent the blogosphere from realizing its full potential as a public sphere. Most significantly, we propose that the sheer volume of content overwhelms blog readers, forcing them to restrict themselves to only a small subset of valuable content. This ultimately reduces their level of informedness. Based on past research on managing discourse, we propose four design artifacts that would alleviate these issues: a communal repository, textual clustering, visual cues, and a participation facility for blog users. We present a prototype system, called FeedWiz, which implements several of these design artifacts. Based on this initial design, we formulate a research agenda for the creation of new tools that effectively harness the potential of the growing body of user-generated content in the blogosphere and beyond

    Web archives: the future

    This report is structured first, to engage in some speculative thought about the possible futures of the web as an exercise in prompting us to think about what we need to do now in order to make sure that we can reliably and fruitfully use archives of the web in the future. Next, we turn to considering the methods and tools being used to research the live web, as a pointer to the types of things that can be developed to help understand the archived web. Then, we turn to a series of topics and questions that researchers want or may want to address using the archived web. In this final section, we identify some of the challenges individuals, organizations, and international bodies can target to increase our ability to explore these topics and answer these questions. We end the report with some conclusions based on what we have learned from this exercise

    The role of Web 2.0 tools in collaborative learning

    Web 2.0 is a contested term that draws much argument. Whatever one's opinion of the term, Web 2.0 tools such as blogs, wikis, podcasts and RSS feeds are widely used in learning environments. The overall purpose of this research was to investigate the potential of using different Web 2.0 tools in collaborative learning, as well as their advantages. Four interviews were conducted with users of Web 2.0 tools, and a number of documents were taken as empirical data, to analyze which Web 2.0 tools are preferred in collaborative learning and what the advantages of using Web 2.0 tools in education are. This research presents a framework for Web 2.0 tools, assembled from literature and empirical data, which describes the course of action in learning and the benefits of these tools

    Farm 2.0 Using Wordpress to Manage Geocontent and Promote Regional Food Products

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. Recent innovations in geospatial technology have dramatically increased the utility and ubiquity of cartographic interfaces and spatially-referenced content on the web. Capitalizing on these developments, the Farm2.0 system demonstrates an approach to manage user-generated geocontent pertaining to European protected designation of origin (PDO) food products. Wordpress, a popular open-source publishing platform, supplies the framework for a geographic content management system, or GeoCMS, to promote PDO products in the Spanish province of Valencia. The Wordpress platform is modified through a suite of plug-ins and customizations to create an extensible application that could be easily deployed in other regions and administrated cooperatively by distributed regulatory councils. Content, either regional recipes or map locations for vendors and farms, is available for syndication as a GeoRSS feed and aggregated with outside feeds in a dynamic web map

    A series of case studies to enhance the social utility of RSS

    RSS (really simple syndication, rich site summary or RDF site summary) is a dialect of XML that provides a method of syndicating on-line content, where postings consist of frequently updated news items, blog entries and multimedia. RSS feeds, produced by organisations or individuals, are often aggregated, and delivered to users for consumption via readers. The semi-structured format of RSS also allows the delivery/exchange of machine-readable content between different platforms and systems. Articles on web pages frequently include icons that represent social media services which facilitate social data. Amongst these, RSS feeds deliver data which is typically presented in the journalistic style of headline, story and snapshot(s). Consequently, applications and academic research have employed RSS on this basis. Therefore, within the context of social media, the question arises: can the social function, i.e. utility, of RSS be enhanced by producing from it data which is actionable and effective? This thesis is based upon the hypothesis that the fluctuations in the keyword frequencies present in RSS can be mined to produce actionable and effective data, to enhance the technology's social utility. To this end, we present a series of laboratory-based case studies which demonstrate two novel and logically consistent RSS-mining paradigms. Our first paradigm allows users to define mining rules to mine data from feeds. The second paradigm employs a semi-automated classification of feeds and correlates this with sentiment. We visualise the outputs produced by the case studies for these paradigms, where they can benefit users in real-world scenarios, varying from statistics and trend analysis to mining financial and sporting data. 
The contributions of this thesis to web engineering and text mining are the demonstration of the proof of concept of our paradigms, through the integration of an array of open-source, third-party products into a coherent and innovative, alpha-version prototype software implemented in a Java JSP/servlet-based web application architecture
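
The hypothesis above, that fluctuations in keyword frequencies in RSS can be mined, can be illustrated with a minimal Python sketch. It is not the thesis's Java JSP/servlet implementation or either of its two paradigms. The sample feed, the stopword list, and the function names `keyword_counts` and `frequency_change` are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-item RSS 2.0 feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Markets rally as bank shares rise</title>
      <description>Bank shares led the rally on Monday.</description>
    </item>
    <item>
      <title>Bank announces new chief executive</title>
      <description>The bank named a new chief executive.</description>
    </item>
  </channel>
</rss>"""

# Tiny illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "as", "the", "on", "new", "led"}


def keyword_counts(rss_xml, stopwords=STOPWORDS):
    """Count lower-cased words in the title and description of every item."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        text = " ".join(
            (item.findtext(tag) or "") for tag in ("title", "description")
        )
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


def frequency_change(before, after):
    """Per-keyword difference between two snapshots (after minus before)."""
    keys = set(before) | set(after)
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


if __name__ == "__main__":
    counts = keyword_counts(SAMPLE_FEED)
    print(counts.most_common(3))
```

Calling `frequency_change` on the counts from two successive fetches of the same feed gives a crude per-keyword fluctuation signal; a real system would also need feed polling, deduplication of items, and a proper stopword list.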

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges

    MusA: Using Indoor Positioning and Navigation to Enhance Cultural Experiences in a museum

    In recent years there has been a growing interest in the use of multimedia mobile guides in museum environments. Mobile devices are able to detect the user context and to provide pieces of information that help visitors discover and follow the logical and emotional connections that develop during the visit. In this scenario, location based services (LBS) currently represent an asset, and the choice of the technology to determine users' position, combined with the definition of methods that can effectively convey information, becomes a key issue in the design process. In this work, we present MusA (Museum Assistant), a general framework for the development of multimedia interactive guides for mobile devices. Its main feature is a vision-based indoor positioning system that allows the provision of several LBS, from way-finding to the contextualized communication of cultural contents, aimed at providing a meaningful exploration of exhibits according to visitors' personal interest and curiosity. Starting from a thorough description of the system architecture, the article presents the implementation of two mobile guides, developed to address adults and children respectively, and discusses the evaluation of the user experience and the visitors' appreciation of these applications