8 research outputs found
Recommended from our members
A Systematic Performance Study of Object Database Management Systems
Many previous performance benchmarks for Object Database Management Systems (ODBMSs) have typically used arbitrary sets of tests based on what their designers felt were the characteristics of engineering applications. Increasingly, however, ODBMSs are being used in non-engineering domains, such as financial trading, clinical healthcare and telecommunications network management. Part of the reason for this is that the technology has matured over the past few years and has become a less risky choice for organisations looking for better ways to manage complex data. However, the development of suitable application- or industry-specific benchmarks, based on actual performance studies, has not paralleled this growth.
The research reported here approaches performance evaluation of ODBMSs pragmatically. It uses a combination of case studies and benchmark experiments to investigate the performance characteristics of ODBMSs for particular applications, following the successful use of this approach by Youssef [Youss93] for studying the performance of On-Line Transaction Processing (OLTP) applications for Relational Database Management Systems (RDBMSs).
Six case studies at five organisations show that organisations consider a wide range of factors when undertaking their own performance studies or benchmarks. Furthermore, none of the studied organisations considered using any public benchmarks. Six current and derived benchmarks also highlight statistically significant performance differences between three major commercial products: Objectivity/DB, ObjectStore and UniSQL. These benchmarks indicate the suitability of the products tested for particular application domains.
The research could not find any evidence at this time to support the concept of a generic or canonical performance workload for ODBMSs. This is demonstrated by the case studies and supported by the benchmark experiments. However, the research shows that performance benchmarks serve a very useful role in ODBMS evaluations and can help identify architectural and quality problems with products that would not otherwise be observed until significant application or system development was already in progress.
Enhanced Query Processing on Complex Spatial and Temporal Data
Innovative technologies in the areas of multimedia and mechanical engineering, as well as novel methods for data acquisition in different scientific subareas, including geo-science, environmental science, medicine, biology and astronomy, enable a more exact representation of the data and thus a more precise data analysis. The resulting quantitative and qualitative growth of spatial and temporal data in particular leads to new challenges for the management and processing of complex structured objects and requires efficient and effective methods for data analysis.
Spatial data describe objects in space by a well-defined extension, a specific location and their relationships to other objects. Classical representatives of complex structured spatial objects are three-dimensional CAD data from mechanical engineering and two-dimensional bounded regions from geography. For industrial applications, efficient collision and intersection queries are of great importance.
Temporal data describe time-dependent processes, for instance the duration of specific events or time-varying attributes of objects. Time series are among the most popular and complex types of temporal data and are the most important form of description for time-varying processes. An elementary type of query in time series databases is the similarity query, which serves as the basic query for data mining applications.
The main goal of this thesis is to develop effective and efficient algorithms supporting collision queries on spatial data as well as similarity queries on temporal data, in particular time series. The presented concepts are based on the efficient management of interval sequences, which are suitable for both spatial and temporal data. The effective analysis of the underlying objects is efficiently supported by adequate access methods.
First, this thesis deals with collision queries on complex spatial objects, which can be reduced to intersection queries on interval sequences. We introduce statistical methods for the grouping of subsequences. Combined with the concept of multi-step query processing, these methods enable the user to accelerate the query process drastically. Furthermore, we develop a cost model for the multi-step query processing of interval sequences in distributed systems. The proposed approach successfully supports a cost-based query strategy.
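The reduction of collision queries to intersection queries on interval sequences, together with the filter/refinement idea of multi-step query processing, can be illustrated with a minimal sketch. This is not the thesis's actual algorithm; the interval representation and the hull-based filter step are simplifying assumptions.

```python
# Hypothetical sketch: collision testing on interval sequences.
# An object's spatial extent is modelled as a sorted list of disjoint
# (start, end) intervals; two objects collide iff any pair of their
# intervals overlaps.

def intervals_overlap(a, b):
    """Merge-sweep intersection test over two sorted interval sequences."""
    i = j = 0
    while i < len(a) and j < len(b):
        s1, e1 = a[i]
        s2, e2 = b[j]
        if s1 <= e2 and s2 <= e1:      # the two intervals intersect
            return True
        if e1 < e2:                    # advance the sequence that ends first
            i += 1
        else:
            j += 1
    return False

def collides(a, b):
    """Two-step query: cheap bounding-interval filter, then exact test."""
    # Filter step: compare the hulls (first start, last end) only.
    if a[-1][1] < b[0][0] or b[-1][1] < a[0][0]:
        return False                   # hulls disjoint, no collision possible
    # Refinement step: exact merge sweep over the full sequences.
    return intervals_overlap(a, b)
```

For example, `collides([(0, 2), (5, 9)], [(3, 4)])` passes the hull filter but is rejected by the exact sweep, while `collides([(0, 2), (5, 9)], [(8, 12)])` returns `True`. The filter step is what a multi-step strategy would execute first, since it touches only two values per object.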
Second, we introduce a novel similarity measure for time series. It allows the user to focus on specific time series amplitudes for the similarity measurement. The new similarity model defines two time series to be similar iff they show similar temporal behavior with respect to being below or above a specific threshold. This type of query is primarily required in natural science applications. The main goal of this new query method is the detection of anomalies and adaptation to new requirements in the area of data mining in time series databases. In addition, a semi-supervised cluster analysis method is presented which is based on the introduced similarity model for time series.
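The core idea of the threshold-based model, comparing only *when* two series lie above a chosen amplitude rather than their exact values, can be sketched as follows. The Jaccard-style overlap score and the equal-length, aligned-series assumption are illustrative simplifications, not the thesis's actual measure.

```python
# Hypothetical sketch of a threshold-based similarity measure: two time
# series are compared only by when they exceed a threshold tau, not by
# their exact amplitudes.

def above_threshold(series, tau):
    """Boolean profile: True at each time point where the series exceeds tau."""
    return [x > tau for x in series]

def threshold_similarity(s1, s2, tau):
    """Jaccard overlap of the above-threshold time points.

    1.0 means identical temporal behaviour w.r.t. tau; assumes the two
    series are equal-length and aligned on the same time grid.
    """
    p1, p2 = above_threshold(s1, tau), above_threshold(s2, tau)
    both = sum(a and b for a, b in zip(p1, p2))
    either = sum(a or b for a, b in zip(p1, p2))
    return both / either if either else 1.0
```

Under this model, `[1, 5, 6, 1]` and `[0, 4, 7, 2]` are perfectly similar for `tau = 3` (both exceed the threshold at the same two time points) even though their amplitudes differ, which is exactly the behaviour useful for anomaly detection against a critical limit.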
The efficiency and effectiveness of the proposed techniques are extensively discussed, and their advantages over existing methods are experimentally demonstrated by means of datasets derived from real-world applications.
A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases
Data stored in individual autonomous databases often needs to be combined and
interrelated. For example, in the Inland Water (IW) environment monitoring domain,
the spatial and temporal variation of measurements of different water quality indicators
stored in different databases are of interest. Data from multiple data sources is more
complex to combine when there is a lack of metadata in a computational form and when
the syntax and semantics of the stored data models are heterogeneous. The main types
of information retrieval (IR) requirements are query transparency and data
harmonisation for data interoperability and support for multiple user views. A
combined Semantic Web-based and agent-based distributed system framework has
been developed to support the above IR requirements. It has been implemented using
the Jena ontology and JADE agent toolkits. The semantic part supports the
interoperability of autonomous data sources by merging their intensional data, using a
Global-As-View or GAV approach, into a global semantic model, represented in
DAML+OIL and in OWL. This is used to mediate between different local database
views. The agent part provides the semantic services to import, align and parse
semantic metadata instances, to support data mediation and to reason about data
mappings during alignment. The framework has been applied to support information
retrieval, interoperability and multi-lateral viewpoints for four European environmental
agency databases.
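The Global-As-View (GAV) mediation described above can be illustrated with a minimal sketch: each global concept is defined as a view over the local sources, and a query over the global schema is answered by unfolding those view definitions. The field names, water-quality data and mapping function here are invented for illustration; the actual system works over RDF/OWL models via Jena, not Python dictionaries.

```python
# Hypothetical sketch of GAV mediation over two heterogeneous sources.
# Each local database uses its own field names for the same concept.
source_a = [{"station": "R1", "no3_mg_l": 2.4}]
source_b = [{"site": "R2", "nitrate": 3.1}]

# GAV mapping: the global concept "nitrate reading" is *defined as a view*
# over the local sources, renaming their heterogeneous fields.
def nitrate_readings():
    for row in source_a:
        yield {"location": row["station"], "nitrate_mg_l": row["no3_mg_l"]}
    for row in source_b:
        yield {"location": row["site"], "nitrate_mg_l": row["nitrate"]}

# A query posed against the global schema is transparently unfolded over
# both sources -- the user never sees the local field names.
high = [r["location"] for r in nitrate_readings() if r["nitrate_mg_l"] > 3.0]
```

The query-transparency requirement corresponds to the last line: the user filters on `nitrate_mg_l` in the global conceptualisation, and the mapping layer decides which sources contribute answers.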
An extended GAV approach has been developed and applied to handle queries that can
be reformulated over multiple user views of the stored data. This allows users to
retrieve data in a conceptualisation that is better suited to them rather than to have to
understand the entire detailed global view conceptualisation. User viewpoints are
derived from the global ontology or existing viewpoints of it. This has the advantage
that it reduces the number of potential conceptualisations and their associated
mappings to a more computationally manageable level. Whereas an ad hoc framework
based upon a conventional distributed programming language and a rule framework
could be used to support user views and adaptation to user views, a more formal
framework has the benefit that it can support reasoning about the consistency,
equivalence, containment and conflict resolution when traversing data models. A
preliminary formulation of the formal model has been undertaken and is based upon
extending a Datalog type algebra with hierarchical, attribute and instance value
operators. These operators can be applied to support compositional mapping and
consistency checking of data views. The multiple viewpoint system was implemented
as a Java-based application consisting of two sub-systems, one for viewpoint
adaptation and management, the other for query processing and query result
adjustment.
Systems between information and knowledge: In a memory management model of an extended enterprise
The research question of this thesis was how knowledge can be managed with information systems. Information systems can support but not replace knowledge management. Systems can mainly store epistemic organisational knowledge included in content, and process data and information.
Certain value can be achieved by adding communication technology to systems. Communication, however, cannot be managed. A new layer between communication and manageable information was named knowformation.
Knowledge management literature was surveyed, together with information species from philosophy, physics, communication theory, and information system science. Positivism, post-positivism, and critical theory were studied, but knowformation in extended organisational memory seemed to be socially constructed.
A memory management model of an extended enterprise (M3.exe) and the knowformation concept were findings from iterative case studies covering data, information and knowledge management systems. The cases ranged from groups within an organisation to the extended organisation. Systems were investigated, and administrators, users (knowledge workers) and managers were interviewed. Model building required alternative sets of data, information and knowledge, instead of the traditional pyramid. The explicit-tacit dichotomy was also reconsidered. As human knowledge is the final aim of all data and information in the systems, the distinction between management of information and management of people was harmonised.
Information systems were classified as the core of organisational memory. In practice, the content of the systems lies between communication and presentation. Firstly, the epistemic criterion of knowledge is required neither in the knowledge management literature nor of the content of the systems. Secondly, systems deal mostly with containers, while the knowledge management literature deals with applied knowledge. The construction of reality based on system content and communication also supports the knowformation concept.
Knowformation belongs to the memory management model of an extended enterprise (M3.exe), which is divided into horizontal and vertical key dimensions. Vertically, processes deal with content that can be managed, whereas communication can only be supported, mainly by infrastructure. Horizontally, the right-hand side of the model contains systems and the left-hand side content, which should be independent of each other. A strategy based on the model was defined.
The aim of the study was to determine how information systems can be used to manage organisational knowledge. As a conclusion, systems can support but not replace knowledge management. Information systems can mainly be used as a memory for an organisation's epistemic knowledge, for storing processable data.
Substantial added value is gained if communication technology is used to support information systems. Communication, however, cannot be managed, since it is based not on processes but at most on workflow and on even freer forms of communication. Between managed information and communication a layer named knowformation emerges, residing mainly in the short-term memory of organisations.
The new knowformation concept is the result of practical case studies; nothing corresponding has been presented in earlier knowledge management research. As background to the knowledge management literature, classifications from physics, philosophy, communication studies and computer science were examined. The case studies covered several data management, documentation and knowledge management systems in groups within an organisation, across an organisation, and with an organisation's partners. In each case both the properties of the systems and the experiences of different stakeholder groups were studied.
In the study, information systems were classified as the core of organisational memory. The knowformation layer is needed, on the one hand, because the epistemic criterion of philosophical knowledge is not required of the content of the systems (nor in the concept definitions of the knowledge management literature) and, on the other hand, because meaning changes when knowledge is reconstructed.
A new perspective is needed for the design of future systems, because the hierarchy of the data, information and knowledge levels is not distinguishable in the socially constructed reality of the users of different system types. The scale of the philosophy of science from positivist to constructivist was essential in forming the model, and after its validity was verified, the explicit-tacit dichotomy was remodelled with the help of the knowformation concept.
The new data model and the knowformation concept are needed in the main result of the work, the memory management model of the extended enterprise. At its extremes lie communication, which is supported, and at the other end processes, which are managed. The two remaining entities, systems and their content, should be independent of each other. Knowformation lives on the implicit boundaries of these entities, in the grey area between information and knowledge.