905 research outputs found

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    Provenance : from long-term preservation to query federation and grid reasoning

    Get PDF

    An Empirical Examination of the Associations between Social Tags and Web Queries

    Get PDF
    Introduction. We aim to discover the associations between social tags for a Web page and Web queries that would retrieve the same Webpage in three major search engines. Method. 4,827 query terms were submitted to the three major search engines to acquire search engine results pages. A series of Perl scripts were written to read search engine results pages and to identify, analyse, and extract organic links Analysis. Web pages from the organic links in search engine results pages were examined to see whether and how they had been tagged in Delicious. Only the Webpages tagged by at least 100 taggers were included in this study. The top thirty popular social tags used were harvested. The two sets of data were quantitatively analysed to investigate the research questions. Results. At least 60% of search engines\u27 query terms overlapped with social tags in Delicious; higher ranked social tags were more likely to be used as query terms for the same Web resources; and the co-occurring pattern of query terms and social tags over social ranking resembled a power law distribution. Conclusions. Socially tagged resources are likely to be highly ranked in search engine results pages. The findings can be applicable to the future study of Web resource related tasks such as Web searching and Web indexing

    Interoperability of semantics in news production

    Get PDF

    Expanding the Usage of Web Archives by Recommending Archived Webpages Using Only the URI

    Get PDF
    Web archives are a window to view past versions of webpages. When a user requests a webpage on the live Web, such as http://tripadvisor.com/where_to_t ravel/, the webpage may not be found, which results in an HyperText Transfer Protocol (HTTP) 404 response. The user then may search for the webpage in a Web archive, such as the Internet Archive. Unfortunately, if this page had never been archived, the user will not be able to view the page, nor will the user gain any information on other webpages that have similar content in the archive, such as the archived webpage http://classy-travel.net. Similarly, if the user requests the webpage http://hokiesports.com/football/ from the Internet Archive, the user will only find the requested webpage, and the user will not gain any information on other webpages that have similar content in the archive, such as the archived webpage http://techsideline.com. In this research, we will build a model for selecting and ranking possible recommended webpages at a Web archive. This is to enhance both HTTP 404 responses and HTTP 200 responses by surfacing webpages in the archive that the user may not know existed. First, we detect semantics in the requested Uniform Resource Identifier (URI). Next, we classify the URI using an ontology, such as DMOZ or any website directory. Finally, we filter and rank candidates based on several features, such as archival quality, webpage popularity, temporal similarity, and content similarity. We measure the performance of each step using different techniques, including calculating the F1 to measure of different tokenization methods and the classification. We tested the model using human evaluation to determine if we could classify and find recommendations for a sample of requests from the Internet Archive’s Wayback Machine access log. Overall, when selecting the full categorization, reviewers agreed with 80.3% of the recommendations, which is much higher than “do not agree” and “I do not know”. This indicates the reviewer is more likely to agree on the recommendations when selecting the full categorization. But when selecting the first level only, reviewers only agreed with 25.5% of the recommendations. This indicates that having deep level categorization improves the performance of finding relevant recommendations

    Role of Semantic web in the changing context of Enterprise Collaboration

    Get PDF
    In order to compete with the global giants, enterprises are concentrating on their core competencies and collaborating with organizations that compliment their skills and core activities. The current trend is to develop temporary alliances of independent enterprises, in which companies can come together to share skills, core competencies and resources. However, knowledge sharing and communication among multidiscipline companies is a complex and challenging problem. In a collaborative environment, the meaning of knowledge is drastically affected by the context in which it is viewed and interpreted; thus necessitating the treatment of structure as well as semantics of the data stored in enterprise repositories. Keeping the present market and technological scenario in mind, this research aims to propose tools and techniques that can enable companies to assimilate distributed information resources and achieve their business goals
    • 

    corecore