Exploiting multimedia in creating and analysing multimedia Web archives
The data contained on the web and the social web are inherently multimedia, consisting of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.
Topic-dependent sentiment analysis of financial blogs
While most work in sentiment analysis in the financial domain has focused on content from traditional finance news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show that there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full-document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
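The topic-specific sub-document idea above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the word-window extraction, the tiny Naive Bayes classifier, and the toy training data are all assumptions made for the example.

```python
from collections import Counter
import math

def topic_subdocument(text, topic, window=5):
    """Word-based extraction: for each mention of the topic term,
    keep a window of surrounding words, forming a sub-document."""
    words = text.lower().split()
    keep = set()
    for i, w in enumerate(words):
        if topic.lower() in w:
            keep.update(range(max(0, i - window), min(len(words), i + window + 1)))
    return " ".join(words[i] for i in sorted(keep))

class NaiveBayesSentiment:
    """Minimal multinomial Naive Bayes over bags of words,
    standing in for the paper's sentiment classifier."""
    def fit(self, docs, labels):
        self.counts = {}            # label -> word Counter
        self.priors = Counter(labels)
        self.vocab = set()
        for doc, lab in zip(docs, labels):
            c = self.counts.setdefault(lab, Counter())
            words = doc.split()
            c.update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        best, best_lp = None, float("-inf")
        n = sum(self.priors.values())
        V = len(self.vocab)
        for lab, c in self.counts.items():
            total = sum(c.values())
            lp = math.log(self.priors[lab] / n)
            for w in doc.split():
                # Laplace smoothing so unseen words do not zero out the score
                lp += math.log((c[w] + 1) / (total + V))
            if lp > best_lp:
                best, best_lp = lab, lp
        return best
```

Classifying the extracted sub-document rather than the full post keeps sentences about other companies from contaminating the polarity judgement for the target company.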
Cloud service discovery and analysis: a unified framework
Over the past few years, cloud computing has become more and more attractive as a new computing paradigm, owing to its high flexibility in provisioning on-demand computing resources that are consumed as services over the Internet. The issues around cloud service discovery have been considered by many researchers in recent years. Because cloud services are highly dynamic and distributed, lack standardized description languages, are offered at different levels, and are non-transparent in nature, this research area has gained significant attention. Robust cloud service discovery approaches will not only assist the promotion and growth of cloud service customers and providers, but will also make a meaningful contribution to the acceptance and development of cloud computing. In this dissertation, we propose an automated approach to the discovery of cloud services and conduct extensive experiments to validate it. The results demonstrate the applicability of our approach and its capability to effectively identify and categorize cloud services on the Internet. First, we develop a novel approach to building a cloud service ontology. The ontology is initially built from the National Institute of Standards and Technology (NIST) cloud computing standard; we then add new concepts to it by automatically analyzing real cloud services with a cloud service ontology algorithm. We also propose a cloud service categorization method that uses term frequency to weight cloud service ontology concepts and cosine similarity to measure the similarity between cloud services; the categorization algorithm groups cloud services into clusters for effective categorization. In addition, we use machine learning techniques to identify cloud services in real environments. Our cloud service identifier is built from features extracted from real cloud service providers; we determine several features, such as a similarity function, semantic ontology, cloud service descriptions and cloud service components, that can be used effectively to identify cloud services on the Web. We also build a unified model that exposes a cloud service's features to a search user, easing search and comparison across a large number of cloud services by building cloud service profiles. Furthermore, we develop a cloud service discovery engine capable of automatically crawling the Web and collecting cloud services. The collected datasets include metadata on nearly 7,500 real-world cloud service providers and nearly 15,000 services (2.45 GB). The experimental results show that our approach (i) effectively builds the cloud service ontology automatically, (ii) is robust in identifying cloud services in real environments, and (iii) scales in providing more details about cloud services.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
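The categorization step described above, term-frequency weighting plus cosine similarity against ontology concepts, can be sketched as follows. The category names and concept-term lists are invented for illustration; the dissertation's actual ontology and weighting details are not reproduced here.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector of a service description."""
    words = text.lower().split()
    n = len(words)
    return {w: c / n for w, c in Counter(words).items()}

def cosine(u, v):
    """Cosine similarity between two sparse TF vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def categorize(description, category_concepts):
    """Assign a service to the ontology category whose concept terms
    are most cosine-similar to its TF-weighted description."""
    d = tf_vector(description)
    return max(category_concepts,
               key=lambda cat: cosine(d, tf_vector(category_concepts[cat])))
```

A clustering pass over the pairwise cosine scores would then group services whose descriptions are mutually similar, as the abstract's categorization algorithm does.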
An Integrated Information Retrieval Framework for Managing the Digital Web Ecosystem
The information explosion makes exploring the digital Web ecosystem with existing web search tools challenging when it comes to retrieving relevant information and knowledge: the existing tools are not integrated, and search results are not well managed. In this article, we describe effective information retrieval services for users and agents in various digital ecosystem scenarios. A novel integrated information retrieval framework (IIRF) is proposed, which employs Web search technologies and traditional database searching techniques to provide comprehensive, dynamic, personalized, and organization-oriented information retrieval services, ranging from the Internet and intranet to the personal desktop. Experiments demonstrate the improvements in the search process: average precision of Web search results over the standard 11 recall levels improves from 41.7% for a comparable system to 65.2% for the proposed framework, a 23.5-percentage-point gain. A comparison among search engines shows a similar trend, with satisfactory search results.
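The evaluation quoted in this abstract appears to use the standard TREC-style 11-point interpolated average precision. A sketch of that metric, on hypothetical ranked-result data rather than the paper's own, looks like this:

```python
def interpolated_avg_precision(ranked_relevance, total_relevant):
    """11-point interpolated average precision: interpolated precision
    at recall levels 0.0, 0.1, ..., 1.0, averaged.

    ranked_relevance: list of 0/1 relevance judgements in rank order.
    total_relevant:   number of relevant documents in the collection.
    """
    precisions, recalls = [], []
    hits = 0
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
        precisions.append(hits / i)          # precision at rank i
        recalls.append(hits / total_relevant)  # recall at rank i
    interp = []
    for level in (i / 10 for i in range(11)):
        # interpolated precision: best precision at any recall >= level
        ps = [p for p, r in zip(precisions, recalls) if r >= level]
        interp.append(max(ps) if ps else 0.0)
    return sum(interp) / 11
```

Comparing this average across systems is what yields per-system figures like the 41.7% versus 65.2% reported above.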
ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology
The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called "ACL Anthology Helper". It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows for efficient management of the local papers using a wide range of operations, including "where", "group", "order", and more. By providing over 20 operations, this tool significantly enhances the retrieval of literature based on specific conditions. Notably, this tool has been successfully utilised in writing a survey paper (Tang et al., 2022a). By introducing the ACL Anthology Helper, we aim to enhance researchers' ability to effectively access and organise literature from the ACL Anthology. This tool offers a convenient solution for researchers seeking to explore the ACL Anthology's vast collection of publications while allowing for more targeted and efficient literature retrieval.
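The "where", "group", and "order" operations the abstract mentions map naturally onto SQL over the local paper database. A rough sketch of that idea follows, using Python's stdlib sqlite3 instead of MySQL for self-containment; the schema and sample rows are hypothetical, not the tool's actual tables.

```python
import sqlite3

# Hypothetical schema; the real tool stores richer meta-information in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE papers
                (title TEXT, venue TEXT, year INTEGER, url TEXT)""")
conn.executemany("INSERT INTO papers VALUES (?, ?, ?, ?)", [
    ("A Survey of X", "ACL",   2022, "https://example.org/1"),
    ("Neural Y",      "EMNLP", 2021, "https://example.org/2"),
    ("Neural Z",      "ACL",   2021, "https://example.org/3"),
])

# 'where' + 'order' style retrieval: ACL papers, newest first.
rows = conn.execute("""SELECT title, year FROM papers
                       WHERE venue = 'ACL'
                       ORDER BY year DESC""").fetchall()

# 'group' style retrieval: paper counts per venue.
counts = dict(conn.execute(
    "SELECT venue, COUNT(*) FROM papers GROUP BY venue").fetchall())
```

Filtering conditions like venue, year range, or author can be combined in a single WHERE clause, which is what makes condition-based literature retrieval convenient once the metadata is local.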
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
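Of the three matrix classes this survey organizes the literature around, the term-document matrix is the simplest to sketch: rows are terms, columns are documents, and document similarity falls out of comparing columns. The toy documents below are illustrative.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document matrix: rows are terms, columns are
    documents, cells hold raw term frequencies."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            matrix[index[w]][j] = c
    return vocab, matrix

def doc_similarity(matrix, j, k):
    """Cosine similarity between document columns j and k."""
    u = [row[j] for row in matrix]
    v = [row[k] for row in matrix]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The word-context and pair-pattern classes the survey covers follow the same pattern with different row/column choices; in practice the raw counts are usually reweighted (e.g. tf-idf) before comparing vectors.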
Information retrieval in the Web: beyond current search engines
In this paper we briefly explore the challenges to expand information retrieval (IR) on the Web, in particular other types of data, Web mining and issues related to crawling. We also mention the main relations of IR and soft computing and how these techniques address these challenges.