Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR), January
7-10, 2007, Asilomar, California, USA.
OBDI System for Fuzzy Web Data Table Integration Using an Ontological and Terminological Resource
When developing new product innovations or filing new patents, inventors need to retrieve all the relevant pre-existing know-how and to exploit and enforce patents in the technological area. Since the OTR is at the heart of the semantic annotation system, this work concentrates on ontology construction and evolution. The presented system architecture relies on an Ontological and Terminological Resource (OTR), which is made up of two parts: on the one hand, a generic set of concepts dedicated to the data integration task; on the other hand, a set of concepts and terminology specific to a given application domain. The main objective of the semantic annotation method is to identify which relations of the OTR are represented in a data table; the simple concepts involved in those relations are called simple target concepts. In order to annotate a column with a simple target concept, a score is computed for each simple target concept of the OTR, which is expressed in OWL. The system allows XML data tables extracted from Web documents to be annotated with fuzzy RDF descriptions and to be flexibly queried through an ontology-driven search engine. This search engine retrieves not only exact answers to the selection criteria but also semantically close ones, comparing the selection criteria, expressed as fuzzy sets representing preferences, with the fuzzy annotations of the data.
DOI: 10.17762/ijritcc2321-8169.15072
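The column-scoring step described above might be sketched as follows. This is a minimal illustration, not the system's actual method: the OTR contents and all names are hypothetical, and the score shown is a simple match ratio rather than the system's fuzzy score.

```python
# Hypothetical sketch: each column of an extracted Web data table is
# compared against the term set of every simple target concept in the
# OTR, and the best-scoring concept becomes the column's annotation.
# All identifiers and the toy OTR below are illustrative.

def score_column(cells, concept_terms):
    """Fraction of cells that match a term of the candidate concept."""
    if not cells:
        return 0.0
    hits = sum(1 for cell in cells if cell.strip().lower() in concept_terms)
    return hits / len(cells)

def annotate_column(cells, otr):
    """Return (best_concept, score) over all simple target concepts."""
    scored = {
        concept: score_column(cells, {t.lower() for t in terms})
        for concept, terms in otr.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]

# Toy OTR: two simple target concepts with terminological variants.
otr = {
    "Microorganism": {"listeria", "salmonella", "e. coli"},
    "Food": {"milk", "cheese", "beef"},
}
column = ["Listeria", "Salmonella", "unknown"]
concept, score = annotate_column(column, otr)
print(concept, round(score, 2))  # → Microorganism 0.67
```

In the actual system the crisp match ratio would be replaced by a fuzzy membership degree, yielding the fuzzy RDF annotations the abstract describes.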
Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques
Research on similarity join techniques is becoming one of the growing practical areas of study, especially with the increasing e-availability of vast amounts of digital data from more and more source systems. This research focuses on clustering-based pre-processing techniques to improve existing similarity join approaches.
Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may indeed represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Therefore, discovering efficient and accurate approaches to determining the similarity of data objects or values is of theoretical as well as practical significance.
Semantic problems arise even around the concept of similarity itself, regarding its usage and foundations. Existing similarity join approaches often take a very specific view of similarity measures and use pre-defined predicates, reflecting a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering-related attributes on the join [MSW 72]. Identifying such entities for data integration purposes requires a broader view of similarity; for instance, a number of generic similarity measures may be useful in a given data integration system.
This study focuses on string similarity joins, based on the Levenshtein (edit) distance and on Q-grams. Its focus is on effective and efficient clustering-based pre-processing techniques that identify clustering-related predicates, based on either attribute values or data values, to improve existing similarity join techniques in enterprise data integration scenarios.
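The two string measures named above can be illustrated with a short sketch that combines a classic Levenshtein distance with q-gram-based candidate pruning standing in for the clustering-based pre-processing. This is a generic textbook construction, not the study's implementation; all function names are illustrative.

```python
from collections import defaultdict

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def qgrams(s, q=2):
    s = f"#{s}#"  # pad so word boundaries also form grams
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def similarity_join(strings, max_dist=2, q=2):
    """Edit-distance join with a clustering-style pre-processing step:
    records are bucketed by shared q-grams, so only candidates that
    overlap in at least one gram pay for the expensive distance."""
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        for g in qgrams(s, q):
            buckets[g].add(idx)
    pairs = set()
    for members in buckets.values():
        for i in members:
            for j in members:
                if i < j:
                    pairs.add((i, j))
    return sorted((strings[i], strings[j]) for i, j in pairs
                  if levenshtein(strings[i], strings[j]) <= max_dist)

print(similarity_join(["smith", "smyth", "jones", "johnson"]))
# → [('smith', 'smyth')]
```

The bucketing step plays the role the abstract assigns to clustering-related predicates: it shrinks the candidate pair set before any exact distance is computed.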
Analysis of spatio-social relations in a photographic archive (Flickr)
This thesis aims to study and analyse the complex spatio-social relations among social entities who interact together in a spatially structured social group. This aim is approached in three steps:
1. Collecting and classifying spatio-social data,
2. Disambiguating place names that people use to refer to their homes and
3. Analysing data of this kind (numerically and visually).
The source of spatio-social data used in this work is Flickr, a Yahoo photo-sharing site. Users have a social network of friends and a collection of photos on their profiles. According to available statistics1, the Flickr database contains more than three billion photos, of which a hundred million are geo-tagged. In retrieving data from the Flickr database, two different samples were explored. Initially, a random collection of photos uploaded to Flickr during the examined periods was collected on a daily basis. This was followed by much narrower and more precise criteria for the second data sampling, which resulted in the Flickr GB sample data.
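A daily random sample from a stream of uploads of unknown size could, for instance, be drawn with reservoir sampling. The sketch below is purely illustrative of that general technique and is not the thesis's actual sampling procedure; the stream and identifiers are made up.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: a uniform sample of k items from a stream whose
    length is not known in advance."""
    rng = rng or random.Random(0)
    sample = []
    for n, item in enumerate(stream, 1):
        if n <= k:
            sample.append(item)
        else:
            j = rng.randrange(n)  # uniform in 0 .. n-1
            if j < k:
                sample[j] = item  # replace with decreasing probability
    return sample

# e.g. keep 5 photo IDs from a (hypothetical) day's stream of 10,000 uploads
day_stream = (f"photo_{i}" for i in range(10_000))
print(reservoir_sample(day_stream, 5))
```

Each item in the stream ends up in the sample with probability k/n, which is what makes the daily collection representative regardless of how many photos were uploaded that day.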
The thesis concludes that location plays a dominant role in the online behaviour of social entities who interact via the internet. The core contributions of this thesis are in the areas of:
1. Extracting an indicative sample from very large data sets,
2. Disambiguating place names that people use in their natural language to refer to their home locations and
3. Proposing potential new insights into the behaviours of social entities with spatio-social relations.
Overall, the popularity of social networking sites and the availability of data obtainable from the web (whether provided voluntarily or retrieved as a consequence of online interactions) are likely to keep increasing in future. In addition, the realm of spatio-social data analysis and its visualisation continues to expand, as do the types of maps that are achievable, the visualisation packages the maps can be built with, the number of map users, and the coverage of vague terms in improved gazetteers. Therefore, the methods, algorithms and applications developed in this study can benefit researchers in the social and e-social sciences, those interested in developing and maintaining social networking sites, geographers who work on disambiguating fuzzy vernacular geographic terms, visualisation and spatial data analysts in general, and those looking to develop better business strategies (i.e. localisation and personalisation).
1 http://www.Flickr.com, retrieved 20/07/09.
A knowledge-based machine tool maintenance planning system using case-based reasoning techniques
In advanced manufacturing systems, Computer Numerical Control (CNC) machine tools are important equipment for manufacturing product components of high precision, whilst from an equipment maintenance point of view they are regarded as the 'products' provided by machine tool manufacturers. Therefore, the reliability of CNC machine tools affects not only the quality of the components they manufacture, but also the reputation and profits of equipment suppliers. This paper presents a novel knowledge-based maintenance planning system to facilitate information and knowledge sharing between all stakeholders, including machine tool manufacturers, users (manufacturing systems), maintenance service providers and part suppliers (for machine tools), in the emerging 'Product-Service' business model. Case-Based Reasoning principles have been implemented to improve the efficiency of maintenance planning. Ontologies were adopted to represent field knowledge, using adaptation-guided retrievals based on semantic similarity and correlation. The adaptation algorithm has been developed based on Causal Theory and the dependence relationship to generate solutions for the required maintenance problems. The proposed system was implemented using content management technologies, which proved to have advantages over traditional database systems in managing engineering knowledge, and has been verified using an example CNC machine tool. Industrial collaborators commented that the results were very promising, and further exploitation in industry was recommended.
Data Management for Dynamic Multimedia Analytics and Retrieval
Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora is considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead are cause for a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that puts focus on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built, multimedia search systems.
However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread assumption that data collections are static does not hold for analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries.
Guided by these shortcomings, and motivated by the fact that similar questions have already been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures.
All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest to converge the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.
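The proximity-based query operations that the query model generalises can be illustrated by the most basic instance, a brute-force k-nearest-neighbour scan; index structures accelerate exactly this operator, and the cost model's speed/quality trade-off arises when only part of the collection is scanned. The sketch below is illustrative and independent of Cottontail DB's actual API; all names are made up.

```python
import heapq
import math

def knn(collection, query, k, dist=None):
    """Brute-force k-nearest-neighbour proximity query: scan every
    vector, keep the k closest under the given distance function."""
    dist = dist or (lambda a, b: math.dist(a, b))  # Euclidean by default
    return heapq.nsmallest(k, collection, key=lambda v: dist(v, query))

# Toy 2-d collection standing in for high-dimensional feature vectors.
vectors = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]
print(knn(vectors, (0.0, 0.0), k=2))  # → [(0.0, 0.0), (0.2, 0.1)]
```

Swapping the distance function (cosine, Hamming, learned metrics) is what the generalised query model formalises, and replacing the full scan with an index probe is where the execution-speed-versus-result-quality trade-off enters.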