Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR), January
7-10, 2007, Asilomar, California, USA.
OBDI System for Fuzzy Web Data Table Integration Using an Ontological and Terminological Resource
When developing new product innovations or filing new patents, inventors need to retrieve all the relevant pre-existing know-how and to exploit and enforce patents in the technological area. Since the OTR is at the heart of the semantic annotation system, this work concentrates on ontology construction and evolution. The presented system architecture relies on an Ontological and Terminological Resource (OTR), which is made up of two parts: on the one hand, a generic set of concepts dedicated to the data integration task; on the other hand, a set of concepts and terminology specific to a given application domain. The main objective of the semantic annotation method is to identify which relations of the OTR are represented in a data table; the simple concepts involved in those relations are called simple target concepts. In order to annotate a column with a simple target concept, a score is computed for each simple target concept of the OTR, which is expressed in OWL. The system allows XML data tables extracted from Web documents to be annotated with fuzzy RDF descriptions and to be flexibly queried through an ontology-driven search engine. This search engine retrieves not only exact answers to the selection criteria but also semantically close ones, comparing the selection criteria, expressed as fuzzy sets representing preferences, with the fuzzy annotations of the data.
DOI: 10.17762/ijritcc2321-8169.15072
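The column-scoring step described above might be sketched as follows. This is a minimal illustration, not the system's actual method: the OTR contents and all names are hypothetical, and the score shown is a simple match ratio rather than the system's fuzzy score.

```python
# Hypothetical sketch: each column of an extracted Web data table is
# compared against the term set of every simple target concept in the
# OTR, and the best-scoring concept becomes the column's annotation.
# All identifiers and the toy OTR below are illustrative.

def score_column(cells, concept_terms):
    """Fraction of cells that match a term of the candidate concept."""
    if not cells:
        return 0.0
    hits = sum(1 for cell in cells if cell.strip().lower() in concept_terms)
    return hits / len(cells)

def annotate_column(cells, otr):
    """Return (best_concept, score) over all simple target concepts."""
    scored = {
        concept: score_column(cells, {t.lower() for t in terms})
        for concept, terms in otr.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]

# Toy OTR: two simple target concepts with terminological variants.
otr = {
    "Microorganism": {"listeria", "salmonella", "e. coli"},
    "Food": {"milk", "cheese", "beef"},
}
column = ["Listeria", "Salmonella", "unknown"]
concept, score = annotate_column(column, otr)
print(concept, round(score, 2))  # → Microorganism 0.67
```

In the actual system the crisp match ratio would be replaced by a fuzzy membership degree, yielding the fuzzy RDF annotations the abstract describes.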
Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques
Research on similarity join techniques is becoming one of the growing practical areas of study, especially with the increasing e-availability of vast amounts of digital data from more and more source systems. This research focuses on clustering-based pre-processing techniques to improve existing similarity join approaches.
Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may indeed represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Therefore, discovering efficient and accurate approaches to determining the similarity of data objects or values is of theoretical as well as practical significance.
Semantic problems arise even around the concept of similarity itself, regarding its usage and foundations. Existing similarity join approaches often take a very specific view of similarity measures and use pre-defined predicates, reflecting a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering-related attributes on the join [MSW 72]. Identifying such entities for data integration purposes requires a broader view of similarity; for instance, a number of generic similarity measures may be useful in a given data integration system.
This study focuses on string similarity joins, based on the Levenshtein (edit) distance and on Q-grams. Its focus is on effective and efficient clustering-based pre-processing techniques that identify clustering-related predicates, based on either attribute values or data values, to improve existing similarity join techniques in enterprise data integration scenarios.
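The two string measures named above can be illustrated with a short sketch that combines a classic Levenshtein distance with q-gram-based candidate pruning standing in for the clustering-based pre-processing. This is a generic textbook construction, not the study's implementation; all function names are illustrative.

```python
from collections import defaultdict

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def qgrams(s, q=2):
    s = f"#{s}#"  # pad so word boundaries also form grams
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def similarity_join(strings, max_dist=2, q=2):
    """Edit-distance join with a clustering-style pre-processing step:
    records are bucketed by shared q-grams, so only candidates that
    overlap in at least one gram pay for the expensive distance."""
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        for g in qgrams(s, q):
            buckets[g].add(idx)
    pairs = set()
    for members in buckets.values():
        for i in members:
            for j in members:
                if i < j:
                    pairs.add((i, j))
    return sorted((strings[i], strings[j]) for i, j in pairs
                  if levenshtein(strings[i], strings[j]) <= max_dist)

print(similarity_join(["smith", "smyth", "jones", "johnson"]))
# → [('smith', 'smyth')]
```

The bucketing step plays the role the abstract assigns to clustering-related predicates: it shrinks the candidate pair set before any exact distance is computed.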
Analysis of spatio-social relations in a photographic archive (Flickr)
This thesis aims to study and analyse the complex spatio-social relations among social entities who interact together in a spatially structured social group. This aim is approached in three steps:
1. Collecting and classifying spatio-social data,
2. Disambiguating place names that people use to refer to their homes and
3. Analysing data of this kind (numerically and visually).
The source of spatio-social data used in this work is Flickr, a Yahoo photo-sharing site. Users have a social network of friends and a collection of photos on their profiles. According to available statistics1, the Flickr database contains more than three billion photos, of which a hundred million are geo-tagged. In retrieving data from the Flickr database, two different samples were explored. Initially, a random collection of photos uploaded to Flickr during the examined periods was collected on a daily basis. This was followed by much narrower and more precise criteria for the second data sampling, which resulted in the Flickr GB sample data.
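A daily random sample from a stream of uploads of unknown size could, for instance, be drawn with reservoir sampling. The sketch below is purely illustrative of that general technique and is not the thesis's actual sampling procedure; the stream and identifiers are made up.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: a uniform sample of k items from a stream whose
    length is not known in advance."""
    rng = rng or random.Random(0)
    sample = []
    for n, item in enumerate(stream, 1):
        if n <= k:
            sample.append(item)
        else:
            j = rng.randrange(n)  # uniform in 0 .. n-1
            if j < k:
                sample[j] = item  # replace with decreasing probability
    return sample

# e.g. keep 5 photo IDs from a (hypothetical) day's stream of 10,000 uploads
day_stream = (f"photo_{i}" for i in range(10_000))
print(reservoir_sample(day_stream, 5))
```

Each item in the stream ends up in the sample with probability k/n, which is what makes the daily collection representative regardless of how many photos were uploaded that day.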
The thesis concludes that location plays a dominant role in the online behaviour of social entities who interact via the internet. The core contributions of this thesis are in the areas of:
1. Extracting an indicative sample from very large data sets,
2. Disambiguating place names that people use in their natural language to refer to their home locations and
3. Proposing potential new insights into the behaviours of social entities with spatio-social relations.
Overall, the popularity of social networking sites and the availability of data obtainable from the web (whether provided voluntarily or retrieved as a consequence of online interactions) are likely to keep increasing in future. In addition, the realm of spatio-social data analysis and its visualisation continues to expand, as do the types of maps that are achievable, the visualisation packages the maps can be built with, the number of map users, and the coverage of vague terms in improved gazetteers. Therefore, the methods, algorithms and applications developed in this study can benefit researchers in the social and e-social sciences, those interested in developing and maintaining social networking sites, geographers who work on disambiguating fuzzy vernacular geographic terms, visualisation and spatial data analysts in general, and those looking to develop better business strategies (i.e. localisation and personalisation).
1 http://www.Flickr.com, retrieved 20/07/09.
A knowledge-based machine tool maintenance planning system using case-based reasoning techniques
In advanced manufacturing systems, Computer Numerical Control (CNC) machine tools are important equipment for manufacturing product components of high precision, whilst from an equipment maintenance point of view they are regarded as the 'products' provided by machine tool manufacturers. Therefore, the reliability of CNC machine tools affects not only the quality of the components they manufacture, but also the reputation and profits of equipment suppliers. This paper presents a novel knowledge-based maintenance planning system to facilitate information and knowledge sharing between all stakeholders, including machine tool manufacturers, users (manufacturing systems), maintenance service providers and part suppliers (for machine tools), in the emerging 'Product-Service' business model. Case-Based Reasoning principles have been implemented to improve the efficiency of maintenance planning. Ontologies were adopted to represent field knowledge, using adaptation-guided retrievals based on semantic similarity and correlation. The adaptation algorithm has been developed based on Causal Theory and the dependence relationship to generate solutions for the required maintenance problems. The proposed system was implemented using content management technologies, which proved to have advantages over traditional database systems in managing engineering knowledge, and has been verified using an example CNC machine tool. Industrial collaborators commented that the results were very promising, and further exploitation in industry was recommended.
Data Management for Dynamic Multimedia Analytics and Retrieval
Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora is considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead are cause for a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that puts focus on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built, multimedia search systems.
However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread assumption that data collections are static does not hold for analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries.
Guided by these shortcomings, and motivated by the fact that similar questions have already been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures.
All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest to converge the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.
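The proximity-based query operations that the query model generalises can be illustrated by the most basic instance, a brute-force k-nearest-neighbour scan; index structures accelerate exactly this operator, and the cost model's speed/quality trade-off arises when only part of the collection is scanned. The sketch below is illustrative and independent of Cottontail DB's actual API; all names are made up.

```python
import heapq
import math

def knn(collection, query, k, dist=None):
    """Brute-force k-nearest-neighbour proximity query: scan every
    vector, keep the k closest under the given distance function."""
    dist = dist or (lambda a, b: math.dist(a, b))  # Euclidean by default
    return heapq.nsmallest(k, collection, key=lambda v: dist(v, query))

# Toy 2-d collection standing in for high-dimensional feature vectors.
vectors = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]
print(knn(vectors, (0.0, 0.0), k=2))  # → [(0.0, 0.0), (0.2, 0.1)]
```

Swapping the distance function (cosine, Hamming, learned metrics) is what the generalised query model formalises, and replacing the full scan with an index probe is where the execution-speed-versus-result-quality trade-off enters.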