44,944 research outputs found

    Field spectroradiometer data : acquisition, organisation, processing and analysis on the example of New Zealand native plants : a thesis presented in fulfilment of the requirements for the degree of Master of Philosophy in Earth Science at Massey University, Palmerston North, New Zealand

    The purpose of this research was to investigate the acquisition, storage, processing and analysis of hyperspectral data for vegetation applications, using New Zealand native plants as an example. Data covering the spectral range 350 nm to 2500 nm were collected with a portable spectroradiometer. Hyperspectral data collection results in large datasets that need pre-processing before any analysis can be carried out. A review of the techniques used since the advent of hyperspectral field data showed that the following general procedure was followed: (1) removal of noisy or uncalibrated bands, (2) data smoothing, (3) reduction of dimensionality, (4) transformation into feature space and (5) analysis techniques. Steps 1 to 4, which are concerned with the pre-processing of data, were found to be repetitive procedures and thus had a high potential for automation. The pre-processing had a major impact on the results gained in the analysis stage, and finding the ideal pre-processing parameters involved repeated processing of the data. Hyperspectral field data should be stored in a structured way, and a relational database seemed a logical approach. A hierarchical data structure that reflected the real world and the setup of sampling campaigns was designed and then transformed into a logical data model. The database also held the information needed for pre-processing and statistical analysis, which enabled the calculation of separability measures such as the JM (Jeffries-Matusita) distance and the application of discriminant analysis. Software was written to provide a graphical user interface to the database and to implement pre-processing and analysis functionality. The acquisition, processing and analysis steps were applied to New Zealand native vegetation. A high degree of separability between species was achieved, and using independent data a classification accuracy of 87.87% was reached. This outcome required smoothing, Hyperion synthesizing and principal components transformation to be applied to the data prior to the classification, which used a generalized squared distance discriminant function. The mixed signature problem was addressed in experiments under controlled laboratory conditions, which revealed that certain combinations of plants could not be unmixed successfully, while mixtures of vegetation and artificial materials resulted in very good abundance estimations. The combination of a relational database with associated software for data processing was found to be highly efficient when dealing with hyperspectral field data.
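
The JM distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single feature (for example one band or one principal component) and normally distributed classes, which is a simplification: the thesis works with multi-band data, where the same formula is written with mean vectors and covariance matrices. The function name and the sample reflectance values are hypothetical.

```python
import math
from statistics import mean, variance


def jm_distance(samples_a, samples_b):
    """Jeffries-Matusita distance between two classes for ONE feature.

    Assumes each class is Gaussian and has non-zero sample variance.
    The result lies in [0, 2]; values near 2 indicate good separability.
    """
    m1, m2 = mean(samples_a), mean(samples_b)
    v1, v2 = variance(samples_a), variance(samples_b)
    # Bhattacharyya distance between two univariate Gaussians.
    bhattacharyya = (m1 - m2) ** 2 / (4 * (v1 + v2)) + 0.5 * math.log(
        (v1 + v2) / (2 * math.sqrt(v1 * v2))
    )
    return 2 * (1 - math.exp(-bhattacharyya))


# Hypothetical reflectance values of two species in a single band.
species_a = [0.10, 0.12, 0.11]
species_b = [0.50, 0.52, 0.51]
separability = jm_distance(species_a, species_b)
```

Identical class statistics give a distance of 0, and the value saturates towards 2 as the class means move apart relative to their spread.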

    OpenADAM: an open source genome-wide association data management system for Affymetrix SNP arrays

    BACKGROUND: Large-scale genome-wide association studies have become popular since the introduction of high-throughput genotyping platforms. Efficient management of the vast array of data generated poses many challenges. DESCRIPTION: We have developed an open source web-based data management system for the large amount of genotype data generated from the Affymetrix GeneChip Mapping Array and Affymetrix Genome-Wide Human SNP Array platforms. The database supports genotype calling using the DM, BRLMM, BRLMM-P or Birdseed algorithms provided by the Affymetrix Power Tools. The genotype and corresponding pedigree data are stored in a relational database for efficient downstream data manipulation and analysis, such as calculation of allele and genotype frequencies, sample identity checking, and export of genotype data in various file formats for analysis using commonly available software. A novel method for genotyping error estimation is implemented using linkage disequilibrium information from the HapMap project. All functionalities are accessible via a web-based user interface. CONCLUSION: OpenADAM provides an open source database system for management of Affymetrix genome-wide association SNP data.
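
As an illustration of the "calculation of allele and genotype frequencies" mentioned above, the following is a minimal sketch for a single biallelic SNP. It is not OpenADAM's implementation: the function name and the encoding of calls as two-letter strings (with `None` for a no-call) are assumptions made for the example, and at least one successful call is assumed.

```python
from collections import Counter


def genotype_and_allele_frequencies(genotypes):
    """Return (genotype_freq, allele_freq) for one SNP.

    `genotypes` is a list of two-character calls such as "AB";
    None marks a no-call and is skipped (hypothetical encoding).
    """
    called = [g for g in genotypes if g is not None]
    # Sort the two alleles so that "BA" and "AB" count as the same genotype.
    genotype_counts = Counter("".join(sorted(g)) for g in called)
    allele_counts = Counter(allele for g in called for allele in g)
    n_genotypes = len(called)
    n_alleles = 2 * n_genotypes  # diploid samples carry two alleles each
    genotype_freq = {g: c / n_genotypes for g, c in genotype_counts.items()}
    allele_freq = {a: c / n_alleles for a, c in allele_counts.items()}
    return genotype_freq, allele_freq


calls = ["AA", "AB", "BA", "BB", None]
genotype_freq, allele_freq = genotype_and_allele_frequencies(calls)
```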

    Parallel Processing of Large Graphs

    More and more large data collections are gathered worldwide in various IT systems. Many of them are networked in nature and need to be processed and analysed as graph structures. Because of their size, they very often require a parallel paradigm for efficient computation. Three parallel techniques are compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly outperforms MapReduce, by up to 10 times, especially for algorithms with many iterations and sparse communication. The MapReduce extension based on map-side join is also usually noticeably more efficient than plain MapReduce, although not by as much as BSP. Nevertheless, MapReduce remains a good alternative for enormous networks whose data structures do not fit in local memories. Comment: Preprint submitted to Future Generation Computer System
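
To make the BSP formulation of SSSP concrete, here is a minimal single-process sketch of the superstep pattern: each vertex reads its incoming messages, updates its distance if a shorter one arrived, and sends new candidate distances to its neighbours, with a barrier between supersteps. It simulates the model sequentially and is not the paper's implementation; the function name and the adjacency-list format are assumptions.

```python
import math
from collections import defaultdict


def bsp_sssp(graph, source):
    """Single source shortest paths in a simulated BSP style.

    `graph` maps every vertex to a list of (neighbour, weight) pairs;
    vertices without outgoing edges must still appear as keys.
    Weights are assumed to be non-negative.
    """
    dist = {v: math.inf for v in graph}
    inbox = {source: [0]}  # messages delivered at the start of a superstep
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                # Only vertices whose distance improved send messages.
                for neighbour, weight in graph[vertex]:
                    outbox[neighbour].append(best + weight)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return dist


example_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("d", 1)],
    "d": [],
}
distances = bsp_sssp(example_graph, "a")
```

The loop ends when a superstep produces no messages, which corresponds to all vertices voting to halt.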

    Context-Free Path Querying by Matrix Multiplication

    Graph data models are widely used in many areas, for example bioinformatics and graph databases. In these areas, it is often necessary to process queries over large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is by specifying paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from the node m to the node n whose labeling is derived from a non-terminal A of the given context-free grammar. Queries of this type are evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics, which requires presenting a single path from the node m to the node n, whose labeling is derived from a non-terminal A, for every triple (A, m, n) evaluated using the relational query semantics. There are a number of algorithms for query evaluation which use these semantics, but all of them perform poorly on large graphs. One of the most common techniques for efficient big data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow this technique to be used efficiently. In this paper, we show how context-free path query evaluation using these query semantics can be reduced to the calculation of the matrix transitive closure. We also propose an algorithm for context-free path query evaluation which uses the relational query semantics and is based on matrix operations that make it possible to speed up computations by using a GPU. Comment: 9 pages, 11 figures, 2 table
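
The reduction to matrix operations can be sketched as follows, under the relational query semantics. This is a plain-Python illustration that keeps one Boolean matrix per non-terminal and assumes a grammar in Chomsky normal form without epsilon rules; it is not the authors' GPU implementation, and the function names and rule encoding are hypothetical.

```python
def bool_matmul(x, y):
    """Boolean product of two square matrices given as lists of lists."""
    n = len(x)
    return [
        [any(x[i][k] and y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def cfpq(n, edges, terminal_rules, binary_rules):
    """Return the set of triples (A, m, n) for a context-free path query.

    edges:          (from_node, label, to_node) triples, nodes in range(n)
    terminal_rules: (A, label) pairs, meaning A -> label
    binary_rules:   (A, B, C) triples, meaning A -> B C
    """
    nonterminals = {a for a, _ in terminal_rules}
    nonterminals |= {symbol for rule in binary_rules for symbol in rule}
    t = {a: [[False] * n for _ in range(n)] for a in nonterminals}
    # Initialise the matrices from single edges and terminal rules.
    for i, label, j in edges:
        for a, terminal in terminal_rules:
            if terminal == label:
                t[a][i][j] = True
    # Repeat T[A] |= T[B] x T[C] until no matrix changes (a fixpoint).
    changed = True
    while changed:
        changed = False
        for a, b, c in binary_rules:
            product = bool_matmul(t[b], t[c])
            for i in range(n):
                for j in range(n):
                    if product[i][j] and not t[a][i][j]:
                        t[a][i][j] = True
                        changed = True
    return {
        (a, i, j)
        for a in nonterminals
        for i in range(n)
        for j in range(n)
        if t[a][i][j]
    }


# Paths labelled a^k b^k (k >= 1) on the chain 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4.
chain = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
triples = cfpq(
    5,
    chain,
    terminal_rules=[("A", "a"), ("B", "b")],
    binary_rules=[("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")],
)
```

Because the inner step is an ordinary Boolean matrix product, it could in principle be delegated to a GPU matrix library, which is the property the abstract highlights.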

    Model-driven performance evaluation for service engineering

    Service engineering, with service-oriented architecture as an integration and platform technology, is a recent approach to software systems integration. Software quality aspects such as performance are of central importance for the integration of heterogeneous, distributed service-based systems. Empirical performance evaluation is the process of measuring and calculating performance metrics of the implemented software. We present an approach for the empirical, model-based performance evaluation of services and service compositions in the context of model-driven service engineering. Temporal database theory is utilised for the empirical performance evaluation of service systems developed in a model-driven way.

    Automatic Unbounded Verification of Alloy Specifications with Prover9

    Alloy is an increasingly popular lightweight specification language based on relational logic. Alloy models can be automatically verified within a bounded scope using off-the-shelf SAT solvers. Since false assertions can usually be disproved using small counter-examples, this approach suffices for most applications. Unfortunately, it can sometimes lead to a false sense of security, and in critical applications a more traditional unbounded proof may be required. The automatic theorem prover Prover9 has been shown to be particularly effective for proving theorems of relation algebras [7], a quantifier-free (or point-free) axiomatization of a fragment of relational logic. In this paper we propose a translation from Alloy specifications to fork algebras (an extension of relation algebras with the same expressive power as relational logic) which enables their unbounded verification in Prover9. This translation covers not only logic assertions but also the structural aspects (namely type declarations), and was successfully implemented and applied to several examples.

    AiiDA: Automated Interactive Infrastructure and Database for Computational Science

    In recent decades, computational science has seen a spectacular rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop, under the guidance of an established practitioner. Great benefits could follow instead from adopting concepts and tools from computer science to manage, preserve, and share these computational efforts. We illustrate here our paradigm for sustaining such a vision, based around the four pillars of Automation, Data, Environment, and Sharing. We then discuss its implementation in the open-source AiiDA platform (http://www.aiida.net), which has been tuned first to the demands of computational materials science. AiiDA's design is based on directed acyclic graphs to track the provenance of data and calculations, and to ensure preservation and searchability. Remote computational resources are managed transparently, and automation is coupled with data storage to ensure reproducibility. Finally, complex sequences of calculations can be encoded into scientific workflows. We believe that AiiDA's design and its sharing capabilities will encourage the creation of social ecosystems to disseminate codes, data, and scientific workflows. Comment: 30 pages, 7 figure

    XQ2P: Efficient XQuery P2P Time Series Processing

    In this demonstration, we propose a model for the management of XML time series (TS), using the new XQuery 1.1 window operator. We argue that centralized computation is slow, and demonstrate XQ2P, our prototype for efficient XQuery P2P TS computation in the context of financial analysis of large data sets (>1M values).