1,997 research outputs found

    Exploiting multimedia in creating and analysing multimedia Web archives

    No full text
    The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by human-kind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general

    Topic-dependent sentiment analysis of financial blogs

    Get PDF
    While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches

    Cloud service discovery and analysis: a unified framework

    Get PDF
    Over the past few years, cloud computing has been more and more attractive as a new computing paradigm due to high flexibility for provisioning on-demand computing resources that are used as services through the Internet. The issues around cloud service discovery have considered by many researchers in the recent years. However, in cloud computing, with the highly dynamic, distributed, the lack of standardized description languages, diverse services offered at different levels and non-transparent nature of cloud services, this research area has gained a significant attention. Robust cloud service discovery approaches will assist the promotion and growth of cloud service customers and providers, but will also provide a meaningful contribution to the acceptance and development of cloud computing. In this dissertation, we have proposed an automated cloud service discovery approach of cloud services. We have also conducted extensive experiments to validate our proposed approach. The results demonstrate the applicability of our approach and its capability of effectively identifying and categorizing cloud services on the Internet. Firstly, we develop a novel approach to build cloud service ontology. Cloud service ontology initially is built based on the National Institute of Standards and Technology (NIST) cloud computing standard. Then, we add new concepts to ontology by automatically analyzing real cloud services based on cloud service ontology Algorithm. We also propose cloud service categorization that use Term Frequency to weigh cloud service ontology concepts and calculate cosine similarity to measure the similarity between cloud services. The cloud service categorization algorithm is able to categorize cloud services to clusters for effective categorization of cloud services. In addition, we use Machine Learning techniques to identify cloud service in real environment. Our cloud service identifier is built by utilizing cloud service features extracted from the real cloud service providers. We determine several features such as similarity function, semantic ontology, cloud service description and cloud services components, to be used effectively in identifying cloud service on the Web. Also, we build a unified model to expose the cloud service’s features to a cloud service search user to ease the process of searching and comparison among a large amount of cloud services by building cloud service’s profile. Furthermore, we particularly develop a cloud service discovery Engine that has capability to crawl the Web automatically and collect cloud services. The collected datasets include meta-data of nearly 7,500 real-world cloud services providers and nearly 15,000 services (2.45GB). The experimental results show that our approach i) is able to effectively build automatic cloud service ontology, ii) is robust in identifying cloud service in real environment and iii) is more scalable in providing more details about cloud services.Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201

    An Integrated Information Retrieval Framework for Managing the Digital Web Ecosystem

    Get PDF
    The information explosion makes the digital Web ecosystem exploration, as a valid web search tool challenging for retrieving relevant information and knowledge. The existing tools are not integrated, and search results are not well managed. In this article, we describe effective information retrieval services for users and agents in various digital ecosystem scenarios. A novel integrated information retrieval framework (IIRF) is proposed, which employs the Web search technologies and traditional database searching techniques to provide comprehensive, dynamic, personalized, and organization-oriented information retrieval services, ranging from the Internet, intranet, to personal desktop. Experiments are carried out demonstrating the improvements in the search process with an average precision of Web search results to standard 11 recall level, attaining improvement from 41.7% of a comparable system to 65.2% of search. A 23.5% precision improvement is achieved with the framework. The comparison made among search engines presents a similar development with satisfactory search results

    ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology

    Full text link
    The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called ``ACL Anthology Helper''. It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows for efficient management of the local papers using a wide range of operations, including "where," "group," "order," and more. By providing over 20 operations, this tool significantly enhances the retrieval of literature based on specific conditions. Notably, this tool has been successfully utilised in writing a survey paper (Tang et al.,2022a). By introducing the ACL Anthology Helper, we aim to enhance researchers' ability to effectively access and organise literature from the ACL Anthology. This tool offers a convenient solution for researchers seeking to explore the ACL Anthology's vast collection of publications while allowing for more targeted and efficient literature retrieval

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    Information retrieval in the Web: beyond current search engines

    Get PDF
    AbstractIn this paper we briefly explore the challenges to expand information retrieval (IR) on the Web, in particular other types of data, Web mining and issues related to crawling. We also mention the main relations of IR and soft computing and how these techniques address these challenges
    corecore