5,649 research outputs found

    Optimising Text Quality in Generation From Relational Databases

    This paper outlines a text generation system suited to a large class of information sources, relational databases. We focus on one aspect of the problem: the additional information which needs to be specified to produce reasonable text quality when generating from relational databases. We outline how databases need to be prepared, and then describe various types of domain semantics which can be used to improve text quality.

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics

    Intelligent Data Storage and Retrieval for Design Optimisation – an Overview

    This paper documents the findings of a literature review conducted by the Sir Lawrence Wackett Centre for Aerospace Design Technology at RMIT University. The review investigates aspects of a proposed system for intelligent design optimisation. Such a system would be capable of efficiently storing (and compressing if required) a range of types of design data in an intelligent database. The system would access this database during subsequent design processes and search it for relevant design data to re-use, so that the time needed for later designs falls as the database grows. Extensive research has been performed into both theoretical aspects of the project and practical examples of current similar systems. This research covers the areas of database systems, database queries, representation and compression of design data, geometric representation, and heuristic methods for design applications.

    Intersection schemas as a dataspace integration technique

    This paper introduces the concept of Intersection Schemas in the field of heterogeneous data integration and dataspaces. We introduce a technique for incrementally integrating heterogeneous data sources by specifying semantic overlaps between sets of extensional schemas using bidirectional schema transformations, and automatically combining them into a global schema at each iteration of the integration process. We propose an incremental data integration methodology that uses this technique and that aims to reduce the amount of up-front effort required. Such approaches to data integration are often described as pay-as-you-go. A demonstrator of our technique is described, which utilizes a new graphical user tool implemented using the AutoMed heterogeneous data integration system. A case study is also described, and our technique and integration methodology are compared with a classical data integration strategy

    Interactive tag maps and tag clouds for the multiscale exploration of large spatio-temporal datasets

    'Tag clouds' and 'tag maps' are introduced to represent geographically referenced text. In combination, these aspatial and spatial views are used to explore a large structured spatio-temporal data set by providing overviews and filtering by text and geography. Prototypes are implemented using freely available technologies including Google Earth and Yahoo!'s Tag Map applet. The interactive tag map and tag cloud techniques and the rapid prototyping method used are informally evaluated through successes and limitations encountered. Preliminary evaluation suggests that the techniques may be useful for generating insights when visualizing large data sets containing geo-referenced text strings. The rapid prototyping approach enabled the technique to be developed and evaluated, leading to geovisualization through which a number of ideas were generated. Limitations of this approach are reflected upon. Tag placement, generalisation and prominence at different scales are issues which have come to light in this study that warrant further work.

    IMPROVEMENT OF DATA ANALYSIS BASED ON K-MEANS ALGORITHM AND AKMCA

    Data analysis is improved using the k-means algorithm and AKMCA. Data mining aims to extract information from a large data set and transform it into a functional structure. Exploratory data analysis and data mining applications rely heavily on clustering. Clustering is grouping a set of objects so that those in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). There are various types of cluster models, such as connectivity models, distribution models, centroid models, and density models. Clustering is a technique in data mining in which the set of objects is classified into clusters. Clustering is the most important aspect of data mining. The algorithm makes use of the density number concept. The high-density number point set is extracted from the original data set as a new training set, and a point in the high-density number point set is chosen as the initial cluster centre point. The basic clustering technique and the most widely used algorithm is K-means clustering. K-means, a partition-based clustering algorithm, is widely used in many fields due to its efficiency and simplicity. However, it is well known that the K-means algorithm can produce suboptimal results depending on the initial cluster centres chosen. It is also referred to as looking for the nearest neighbours. It simply divides the data set into a specified number of clusters. Numerous efforts have been made to improve the K-means clustering algorithm’s performance. The advanced k-means clustering algorithm (AKMCA) is used in data analysis to obtain useful knowledge of various optimisation and classification problems that can be used for processing massive amounts of raw and unstructured data. Knowledge discovery provides the tools needed to automate the entire data analysis and error reduction process, where their efficacy is investigated using experimental analysis of various datasets. A detailed experimental analysis and a comparison of the proposed work with existing k-means clustering algorithms are presented. Furthermore, the paper provides a clear and comprehensive understanding of the k-means algorithm and its various research directions.
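
The idea in the abstract above, seeding k-means from a high-density subset of the data, can be sketched as follows. This is a minimal illustration and not the paper's AKMCA: the abstract gives no formula for the "density number", so the radius-based neighbour count, the `keep_fraction` parameter, and the farthest-point rule for spreading the seeds are all assumptions made here, as are the function names and the toy data.

```python
import math

def density_numbers(points, radius):
    """Assumed definition: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def density_seeded_centres(points, k, radius, keep_fraction=0.5):
    """Pick k initial centres from the highest-density fraction of the data."""
    dens = density_numbers(points, radius)
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    n_keep = max(k, int(len(points) * keep_fraction))
    candidates = [points[i] for i in order[:n_keep]]  # the high-density point set
    centres = [candidates[0]]                         # start from the densest point
    while len(centres) < k:
        # Assumed spreading rule: the candidate farthest from its nearest chosen centre.
        centres.append(max(candidates, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda i: math.dist(p, centres[i])) for p in points]

def kmeans(points, centres, max_iter=100):
    """Standard Lloyd iterations starting from the given centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for i, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centre if a cluster happens to be empty.
            new_centres.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            )
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres)

# Toy data (hypothetical): two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
seeds = density_seeded_centres(points, k=2, radius=1.5, keep_fraction=0.9)
centres, labels = kmeans(points, seeds)
```

Because the isolated point has no neighbours within the radius, it is excluded from the candidate set and cannot be picked as a seed. This is the behaviour the density-based initialisation is meant to provide when plain k-means is sensitive to its starting centres.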