944 research outputs found
WAQS: a web-based approximate query system
The Web is often viewed as a gigantic database holding vast stores of information and providing ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth in both the number of users and the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, the Structured Query Language (SQL), is not suitable for Web content retrieval.
In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow more precise retrieval and hence reduce the number of matches returned by typical search engines. The main objective of this technique is to allow queries to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notions of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system, called the Web-Based Approximate Query System, which contains an SQL-like query language called the Web-Based Approximate Query Language.
The Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain-specific search engine. It provides EnviroDaemon with more detailed searching capabilities than keyword-based search alone. Implementation details, technical results and future work are presented in this dissertation.
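The abstract does not give the matching algorithm, but a variable-length don't-care pattern can be sketched by translating it into a regular expression, where a wildcard matches any run of characters between literal keyword fragments (the pattern syntax and function name here are illustrative assumptions, not WAQS's actual language):

```python
import re

def vldc_to_regex(pattern: str, wildcard: str = "*") -> re.Pattern:
    """Translate a keyword pattern containing variable-length
    don't-cares (written here as '*') into a regular expression."""
    parts = pattern.split(wildcard)
    # Escape the literal fragments; each '*' becomes a non-greedy
    # "match anything" gap of arbitrary length.
    return re.compile(".*?".join(re.escape(p) for p in parts))

# Match "ozone" followed anywhere later in the text by "monitoring".
matcher = vldc_to_regex("ozone*monitoring")
print(bool(matcher.search("ozone layer monitoring report")))  # True
print(bool(matcher.search("monitoring the ozone layer")))     # False
```

A distance constraint, as mentioned in the abstract, could be layered on by bounding the gap length (e.g. `.{0,50}?` instead of `.*?`).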
Design and Analysis of a Dynamically Configured Log-based Distributed Security Event Detection Methodology
Military and defense organizations rely upon the security of data stored in, and communicated through, their cyber infrastructure to fulfill their mission objectives. It is essential to identify threats to the cyber infrastructure in a timely manner, so that mission risks can be recognized and mitigated. Centralized event logging and correlation is a proven method for identifying threats to cyber resources. However, centralized event logging is inflexible and does not scale well, because it consumes excessive network bandwidth and imposes significant storage and processing requirements on the central event log server. In this paper, we present a flexible, distributed event correlation system designed to overcome these limitations by distributing the event correlation workload across the network of event-producing systems. To demonstrate the utility of the methodology, we model and simulate centralized, decentralized, and hybrid log analysis environments over three accountability levels and compare their performance in terms of detection capability, network bandwidth utilization, database query efficiency, and configurability. The results show that when compared to centralized event correlation, dynamically configured distributed event correlation provides increased flexibility, a significant reduction in network traffic in low- and medium-accountability environments, and a decrease in database query execution time in the high-accountability case.
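The bandwidth saving described above comes from moving the correlation rules to the event-producing nodes, so that only rule matches cross the network. A minimal sketch of that idea, with purely hypothetical event fields and a single illustrative rule (not the paper's actual rule set or wire format):

```python
# Each node applies correlation rules locally and forwards only the
# matching events, instead of shipping every log line to a central
# server. Rule and event shapes below are illustrative assumptions.

RULES = [
    lambda e: e["type"] == "auth_failure" and e["count"] >= 5,
]

def local_correlate(events):
    """Run on each event-producing node: keep only events that
    satisfy at least one correlation rule."""
    return [e for e in events if any(rule(e) for rule in RULES)]

node_log = [
    {"type": "auth_failure", "count": 7, "host": "web01"},
    {"type": "heartbeat",    "count": 1, "host": "web01"},
    {"type": "auth_failure", "count": 2, "host": "web01"},
]

forwarded = local_correlate(node_log)
print(len(node_log), "->", len(forwarded))  # 3 -> 1 events cross the network
```

In the hybrid configurations the paper simulates, the rule set pushed to each node would itself be dynamically reconfigured according to the accountability level.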
Cloud service discovery and analysis: a unified framework
Over the past few years, cloud computing has become increasingly attractive as a new computing paradigm because of its flexibility in provisioning on-demand computing resources delivered as services over the Internet. Cloud service discovery has been considered by many researchers in recent years. Because cloud services are highly dynamic and distributed, lack standardized description languages, are offered at different levels, and are non-transparent in nature, this research area has gained significant attention. Robust cloud service discovery approaches will not only assist the promotion and growth of cloud service customers and providers, but will also make a meaningful contribution to the acceptance and development of cloud computing. In this dissertation, we propose an automated approach to cloud service discovery and conduct extensive experiments to validate it. The results demonstrate the applicability of our approach and its capability to effectively identify and categorize cloud services on the Internet. First, we develop a novel approach to building a cloud service ontology. The ontology is initially built from the National Institute of Standards and Technology (NIST) cloud computing standard; we then add new concepts by automatically analyzing real cloud services with a cloud service ontology algorithm. We also propose a cloud service categorization method that uses term frequency to weigh cloud service ontology concepts and cosine similarity to measure the similarity between cloud services; the categorization algorithm groups cloud services into clusters. In addition, we use machine learning techniques to identify cloud services in real environments. Our cloud service identifier is built from features extracted from real cloud service providers; we determine several features, such as a similarity function, semantic ontology, cloud service descriptions and cloud service components, to be used effectively in identifying cloud services on the Web. We also build a unified model that exposes a cloud service's features to search users through cloud service profiles, easing the process of searching and comparing across a large number of cloud services. Furthermore, we develop a cloud service discovery engine capable of crawling the Web automatically and collecting cloud services. The collected datasets include metadata on nearly 7,500 real-world cloud service providers and nearly 15,000 services (2.45 GB). The experimental results show that our approach i) effectively builds the cloud service ontology automatically, ii) is robust in identifying cloud services in real environments, and iii) scales to provide more details about cloud services.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
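The term-frequency weighting and cosine-similarity step described above can be sketched in a few lines. This is a generic TF/cosine computation over whitespace tokens, not the dissertation's actual ontology-weighted implementation; the example descriptions are made up:

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Term-frequency vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Compare a hypothetical IaaS concept description with a service description.
iaas = tf_vector("virtual machines storage compute on demand")
svc  = tf_vector("on demand compute and storage instances")
print(round(cosine(iaas, svc), 2))  # 0.67
```

Categorization then amounts to assigning each service to the ontology concept (or cluster) whose vector gives the highest similarity score.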
NewsView: A Recommender System for Usenet based on FAST Data Search
This thesis combines aspects of two approaches to information access, information filtering and information retrieval, in an effort to improve the signal-to-noise ratio in interfaces to conversational data. The two ideas are blended into one system by augmenting a search engine that indexes Usenet messages with concepts from recommender systems theory. My aim is to improve overall result relevance by exploiting the qualities of both approaches. Important issues in this context are obtaining ratings, evaluating relevance rankings, and applying useful user profiles.
An architecture called NewsView has been designed as part of the work on this thesis. NewsView describes a framework for interfaces to Usenet with information retrieval and information filtering concepts built in, as well as extensive navigational possibilities within the data. My aim with this framework is to provide a testbed for user interface, information filtering and information retrieval issues, and, most importantly, combinations of the three.
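One simple way to blend the two approaches the thesis combines is a weighted sum of a retrieval relevance score and a community rating, re-ranking search results by the blended value. The weighting scheme and field layout below are illustrative assumptions, not NewsView's actual ranking formula:

```python
def blended_score(search_score: float, avg_rating: float,
                  alpha: float = 0.7) -> float:
    """Blend a search-engine relevance score (0..1) with an average
    community rating (0..1); alpha weights retrieval over filtering."""
    return alpha * search_score + (1 - alpha) * avg_rating

# (message_id, relevance from the search engine, rating from recommender)
results = [("msg1", 0.9, 0.2), ("msg2", 0.7, 0.9)]
ranked = sorted(results, key=lambda r: blended_score(r[1], r[2]),
                reverse=True)
print([m for m, *_ in ranked])  # ['msg2', 'msg1']
```

A highly relevant but poorly rated message can thus be outranked by a slightly less relevant message the community found valuable, which is exactly the signal-to-noise improvement the thesis targets.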
Real-Time Detection System for Suspicious URLs
Twitter is prone to malicious tweets containing URLs for spam, phishing, and malware distribution. Conventional Twitter spam detection schemes utilize account features such as the ratio of tweets containing URLs and the account creation date, or relation features in the Twitter graph. These detection schemes are ineffective against feature fabrications, or consume much time and many resources. Conventional suspicious URL detection schemes utilize several features including lexical features of URLs, URL redirection, HTML content, and dynamic behavior. However, evasion techniques such as time-based evasion and crawler evasion exist. In this paper, we propose WARNINGBIRD, a real-time suspicious URL detection system for Twitter. Our system investigates correlations of URL redirect chains extracted from several tweets. Because attackers have limited resources and usually reuse them, their URL redirect chains frequently share the same URLs. We develop methods to discover correlated URL redirect chains using the frequently shared URLs and to determine their suspiciousness. We collect numerous tweets from the Twitter public timeline and build a statistical classifier using them. Evaluation results show that our classifier accurately and efficiently detects suspicious URLs.
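The core correlation step, finding URLs shared across redirect chains from different tweets, can be sketched with an inverted index from URL to the chains containing it. The URLs below are invented examples, and the real system derives further features from these groups before classification:

```python
from collections import defaultdict

def correlate_chains(chains):
    """Find URLs that occur in more than one redirect chain; attackers
    reusing limited infrastructure make such sharing frequent."""
    by_url = defaultdict(set)
    for i, chain in enumerate(chains):
        for url in chain:
            by_url[url].add(i)
    # URLs appearing in two or more chains are correlation points.
    return {url: ids for url, ids in by_url.items() if len(ids) > 1}

chains = [
    ["t.co/a", "bit.ly/x", "evil.example/land"],
    ["t.co/b", "bit.ly/x", "evil.example/land"],
    ["t.co/c", "news.example/story"],
]
shared = correlate_chains(chains)
print(sorted(shared))  # ['bit.ly/x', 'evil.example/land']
```

Chains linked through shared URLs are grouped, and features of each group (e.g. chain length, sharing frequency) feed the statistical classifier.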
Engineering an Open Web Syndication Interchange with Discovery and Recommender Capabilities
Web syndication has become a popular means of delivering relevant information to people online, but the complexity of standards, algorithms and applications poses considerable challenges to engineers. This paper describes the design and development of a novel Web-based syndication intermediary called InterSynd and a simple Web client as a proof of concept. We developed format-neutral middleware that sits between content sources and the user. Additional objectives were to add feed discovery and recommendation components to the intermediary. A search-based feed discovery module helps users find relevant feed sources. Implicit collaborative recommendations of new feeds are also made to the user. The syndication software uses open standard XML technologies and free open source libraries. Extensibility and re-configurability were explicit goals. The experience shows that a modular architecture can combine open source modules to build state-of-the-art syndication middleware and applications. The data produced by software metrics indicate the high degree of modularity retained.
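Format neutrality of the kind described above typically means mapping RSS and Atom entries onto one internal item model before any discovery or recommendation logic runs. A minimal sketch, with field names that are illustrative assumptions rather than InterSynd's actual schema:

```python
def normalize_item(raw: dict, fmt: str) -> dict:
    """Map an RSS or Atom entry onto a format-neutral item model.
    Field names here are hypothetical, not InterSynd's schema."""
    if fmt == "rss":
        # RSS 2.0: <title>, <link>, <description>
        return {"title": raw["title"], "link": raw["link"],
                "summary": raw.get("description", "")}
    if fmt == "atom":
        # Atom: <title>, <link href=...>, <summary>
        return {"title": raw["title"], "link": raw["link"]["href"],
                "summary": raw.get("summary", "")}
    raise ValueError(f"unknown feed format: {fmt}")

item = normalize_item(
    {"title": "Hello", "link": "http://example.com/1", "description": "Hi"},
    "rss")
print(item["link"])  # http://example.com/1
```

Downstream components (the discovery module, the recommender, the Web client) then only ever see the neutral model, which is what makes the middleware reconfigurable.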
Machining-based coverage path planning for automated structural inspection
The automation of robotically delivered nondestructive evaluation inspection shares many aims with traditional manufacturing machining. This paper presents a new hardware and software system for automated thickness mapping of large-scale areas, with multiple obstacles, by employing computer-aided design (CAD)/computer-aided manufacturing (CAM)-inspired path planning to implement control of a novel mobile robotic thickness mapping inspection vehicle. A custom postprocessor provides the necessary translation from CAM numeric code through robotic kinematic control to combine and automate the overall process. The generalized steps to implement this approach for any mobile robotic platform are presented herein and applied, in this instance, to a novel thickness mapping crawler. The inspection capabilities of the system were evaluated in an indoor mock-inspection scenario, within a motion tracking cell, to provide quantitative performance figures for positional accuracy. Multiple thickness defects simulating corrosion features on a steel sample plate were combined with obstacles to be avoided during the inspection. A minimum thickness mapping error of 0.21 mm and a mean path error of 4.41 mm were observed for a 2 m² carbon steel sample of 10-mm nominal thickness. The potential of this automated approach has benefits in terms of repeatability of area coverage, obstacle avoidance, and reduced path overlap, all of which directly lead to increased task efficiency and reduced inspection time of large structural assets.
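The CAM-inspired coverage planning the paper describes can be illustrated, in highly simplified form, as a boustrophedon (lawnmower) sweep over a grid that skips obstacle cells. This sketch ignores kinematics, probe footprint and the paper's actual postprocessor; grid cells and obstacle coordinates are invented:

```python
def lawnmower_path(width: int, height: int, step: int,
                   obstacles=frozenset()):
    """Generate a boustrophedon coverage path over a width x height
    grid, alternating sweep direction per row and skipping obstacles.
    A toy stand-in for CAD/CAM-derived coverage planning."""
    path = []
    for y in range(0, height, step):
        xs = list(range(0, width, step))
        if (y // step) % 2:       # reverse every other row
            xs.reverse()
        for x in xs:
            if (x, y) not in obstacles:
                path.append((x, y))
    return path

# 3x2 grid with one obstacle cell at (1, 0).
print(lawnmower_path(3, 2, 1, {(1, 0)}))
# [(0, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
```

A real planner would additionally route the crawler around (rather than through) obstacle regions and minimize path overlap, which is where the CAM toolpath machinery earns its keep.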