Field spectroradiometer data : acquisition, organisation, processing and analysis on the example of New Zealand native plants : a thesis presented in fulfilment of the requirements for the degree of Master of Philosophy in Earth Science at Massey University, Palmerston North, New Zealand
The purpose of this research was to investigate the acquisition, storage, processing and analysis of hyperspectral data for vegetation applications, using New Zealand native plants as a case study. Data covering the spectral range 350 nm to 2500 nm were collected with a portable spectroradiometer. Hyperspectral data collection results in large datasets that need pre-processing before any analysis can be carried out. A review of the techniques used since the advent of hyperspectral field data showed that the following general procedure was followed:
1. Removal of noisy or uncalibrated bands
2. Data smoothing
3. Reduction of dimensionality
4. Transformation into feature space
5. Analysis
Steps 1 to 4, which are concerned with the pre-processing of data, were found to be repetitive procedures and thus had a high potential for automation. The pre-processing had a major impact on the results obtained in the analysis stage, and finding the ideal pre-processing parameters involved repeated processing of the data. Hyperspectral field data should be stored in a structured way, so the use of a relational database was a logical approach. A hierarchical data structure reflecting the real world and the setup of sampling campaigns was designed and transformed into a logical data model. The database also held information needed for pre-processing and statistical analysis, which enabled the calculation of separability measures such as the Jeffries-Matusita (JM) distance and the application of discriminant analysis. Software was written to provide a graphical user interface to the database and to implement the pre-processing and analysis functionality. The acquisition, processing and analysis steps were applied to New Zealand native vegetation. A high degree of separability between species was achieved, and a classification accuracy of 87.87% was reached on independent data.
This outcome required smoothing, Hyperion band synthesis and a principal components transformation to be applied to the data prior to classification, which used a generalized squared distance discriminant function. The mixed-signature problem was addressed in experiments under controlled laboratory conditions, which revealed that certain combinations of plants could not be unmixed successfully, while mixtures of vegetation and artificial materials yielded very good abundance estimates. The combination of a relational database with associated software for data processing was found to be highly efficient when dealing with hyperspectral field data.
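The JM distance mentioned above can be sketched as follows. This is a generic implementation under the standard Gaussian-class assumption (Bhattacharyya distance mapped into the bounded JM range), not code from the thesis:

```python
import numpy as np

def jeffries_matusita(m1, s1, m2, s2):
    """Jeffries-Matusita distance between two Gaussian class signatures.

    m1, m2: mean spectra, shape (n,); s1, s2: covariance matrices, shape (n, n).
    Returns a value in [0, 2]; values near 2 indicate high separability.
    """
    s = (s1 + s2) / 2.0
    dm = (m1 - m2).reshape(-1, 1)
    # Bhattacharyya distance for two Gaussian distributions
    b = (dm.T @ np.linalg.inv(s) @ dm / 8.0)[0, 0] \
        + 0.5 * np.log(np.linalg.det(s) /
                       np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    # Map the unbounded Bhattacharyya distance into the bounded JM range
    return 2.0 * (1.0 - np.exp(-b))
```

Identical class statistics give a distance of 0; well-separated classes approach the saturation value of 2, which is why JM is a convenient screening measure before discriminant analysis.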
OpenADAM: an open source genome-wide association data management system for Affymetrix SNP arrays
BACKGROUND: Large scale genome-wide association studies have become popular since the introduction of high throughput genotyping platforms. Efficient management of the vast array of data generated poses many challenges. DESCRIPTION: We have developed an open source web-based data management system for the large amount of genotype data generated from the Affymetrix GeneChip Mapping Array and Affymetrix Genome-Wide Human SNP Array platforms. The database supports genotype calling using the DM, BRLMM, BRLMM-P or Birdseed algorithms provided by the Affymetrix Power Tools. The genotype and corresponding pedigree data are stored in a relational database for efficient downstream data manipulation and analysis, such as calculation of allele and genotype frequencies, sample identity checking, and export of genotype data in various file formats for analysis using commonly available software. A novel method for genotyping error estimation is implemented using linkage disequilibrium information from the HapMap project. All functionalities are accessible via a web-based user interface. CONCLUSION: OpenADAM provides an open source database system for management of Affymetrix genome-wide association SNP data.
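The allele-frequency calculation mentioned in the description can be sketched in a few lines. This is a generic illustration rather than OpenADAM code, and the genotype encoding ("AA"/"AB"/"BB" calls with "NoCall" for failures) is an assumption:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Compute allele frequencies from a list of biallelic SNP genotype calls.

    genotypes: calls such as "AA", "AB", "BB"; "NoCall" entries are skipped.
    Returns a dict mapping each allele to its relative frequency.
    """
    counts = Counter()
    for g in genotypes:
        if g == "NoCall":
            continue
        counts.update(g)  # each call contributes its two allele characters
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}
```

In a system like the one described, this aggregation would typically be pushed into SQL over the relational genotype tables rather than done in application code.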
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT systems. Many of them have a networked nature and need to be processed and analysed as graph structures. Due to their size, they very often require a parallel paradigm for efficient computation. Three parallel techniques are compared in the paper: MapReduce, its map-side join extension, and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single-source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblogging. The results reveal that iterative graph processing with the BSP implementation consistently and significantly outperforms MapReduce, by up to a factor of 10, especially for algorithms with many iterations and sparse communication. The map-side join extension of MapReduce also usually offers noticeably better efficiency, although not as much as BSP. Nevertheless, MapReduce remains a good alternative for enormous networks whose data structures do not fit in local memories. Preprint submitted to Future Generation Computer Systems.
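The BSP style of iterative graph processing compared above can be illustrated with a single-machine sketch of SSSP: each loop iteration plays the role of one superstep, with a simulated message exchange between vertices. This is illustrative only; a real BSP system distributes the vertices and messages across workers with a barrier between supersteps:

```python
import math

def bsp_sssp(graph, source):
    """Single-source shortest paths computed in BSP-style supersteps.

    graph: {node: {neighbour: edge_weight}}. In each superstep, every node
    whose distance improved in the previous step sends candidate distances
    to its neighbours; iteration stops once no message improves anything.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    active = {source}
    while active:                      # one superstep per loop iteration
        messages = {}                  # simulated message exchange
        for u in active:
            for v, w in graph[u].items():
                cand = dist[u] + w
                if cand < messages.get(v, math.inf):
                    messages[v] = cand
        active = {v for v, d in messages.items() if d < dist[v]}
        for v in active:
            dist[v] = messages[v]
    return dist
```

The termination condition (an empty active set) is exactly the "vote to halt" mechanism of BSP frameworks, and it explains the paper's finding: algorithms with many short supersteps avoid the per-iteration job-launch overhead that MapReduce pays.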
Context-Free Path Querying by Matrix Multiplication
Graph data models are widely used in many areas, for example bioinformatics and graph databases. In these areas it is often required to process queries on large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is to specify paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from node m to node n whose labelling is derived from a non-terminal A of the given context-free grammar. Queries of this type are evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics, which requires presenting a single path from node m to node n whose labelling is derived from a non-terminal A, for every triple (A, m, n) evaluated under the relational query semantics. There are a number of algorithms for query evaluation under these semantics, but all of them perform poorly on large graphs. One of the most common techniques for efficient big-data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow this technique to be used efficiently. In this paper, we show how context-free path query evaluation under these semantics can be reduced to the calculation of the matrix transitive closure. We also propose an algorithm for context-free path query evaluation which uses the relational query semantics and is based on matrix operations, making it possible to speed up computations by using a GPU.
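The reduction described above can be sketched as a Boolean-matrix fixpoint iteration for a grammar in Chomsky normal form. This follows the general matrix-based CFPQ scheme rather than the paper's exact algorithm, and it runs on the CPU; a GPU version would replace the matrix product with a device kernel:

```python
import numpy as np

def cfpq(n, edges, terminal_rules, binary_rules):
    """Context-free path querying via Boolean matrix multiplication.

    n: number of graph nodes; edges: list of (i, label, j);
    terminal_rules: {nonterminal: {label, ...}} for rules A -> a;
    binary_rules: list of (A, B, C) for rules A -> B C (grammar in CNF).
    Returns {nonterminal: Boolean matrix M} where M[i, j] is True iff some
    path from node i to node j derives from that nonterminal.
    """
    nts = set(terminal_rules)
    for a, b, c in binary_rules:
        nts |= {a, b, c}
    m = {a: np.zeros((n, n), dtype=bool) for a in nts}
    for i, label, j in edges:          # initialise from rules A -> label
        for a, labels in terminal_rules.items():
            if label in labels:
                m[a][i, j] = True
    changed = True
    while changed:                     # iterate to the fixpoint (closure)
        changed = False
        for a, b, c in binary_rules:
            # Boolean product: a path i->k for B followed by k->j for C
            prod = (m[b].astype(np.uint8) @ m[c].astype(np.uint8)) > 0
            new = m[a] | prod
            if (new != m[a]).any():
                m[a] = new
                changed = True
    return m
```

For example, the language a^n b^n over edge labels can be expressed in CNF as S -> A X | A B, X -> S B, A -> a, B -> b; on a labelled path graph the resulting S matrix marks exactly the balanced node pairs.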
Model-driven performance evaluation for service engineering
Service engineering and service-oriented architecture as an integration and platform technology are a recent approach to software systems integration. Software quality aspects such as performance are of central importance for the integration of heterogeneous, distributed service-based systems. Empirical performance evaluation is the process of measuring and calculating performance metrics of the implemented software. We present an approach to the empirical, model-based performance evaluation of services and service compositions in the context of model-driven service engineering. Temporal database theory is utilised for the empirical performance evaluation of model-driven developed service systems.
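The empirical side of such an evaluation boils down to sampling latencies of a deployed service operation and deriving metrics from the timestamped samples. The sketch below is a hypothetical helper, not from the paper; the temporal-database angle is reflected only in that each measurement is kept as a (timestamp, duration) fact:

```python
import statistics
import time

def measure_service(call, n=100):
    """Run a service operation n times and derive latency metrics.

    call: a zero-argument callable wrapping one service invocation.
    Each sample is recorded as a (wall_clock_time, duration) pair, i.e. a
    row one could store in a temporal table for later interval queries.
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.time(), time.perf_counter() - start))
    durations = sorted(d for _, d in samples)
    return {
        "mean": statistics.mean(durations),
        "p95": durations[int(0.95 * (len(durations) - 1))],
        "max": durations[-1],
    }
```

Keeping the raw timestamped samples, rather than only the aggregates, is what allows time-windowed re-analysis later (e.g. performance during a load spike).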
Automatic Unbounded Verification of Alloy Specifications with Prover9
Alloy is an increasingly popular lightweight specification language based on
relational logic. Alloy models can be automatically verified within a bounded
scope using off-the-shelf SAT solvers. Since false assertions can usually be
disproved using small counter-examples, this approach suffices for most
applications. Unfortunately, it can sometimes lead to a false sense of
security, and in critical applications a more traditional unbounded proof may
be required. The automatic theorem prover Prover9 has been shown to be
particularly effective for proving theorems of relation algebras [7], a
quantifier-free (or point-free) axiomatization of a fragment of relational
logic. In this paper we propose a translation from Alloy specifications to fork
algebras (an extension of relation algebras with the same expressive power as
relational logic) which enables their unbounded verification in Prover9. This
translation covers not only logic assertions, but also the structural aspects
(namely type declarations), and was successfully implemented and applied to several examples.
AiiDA: Automated Interactive Infrastructure and Database for Computational Science
Computational science has seen in the last decades a spectacular rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop, under the guidance of an established practitioner. Great benefits could follow instead from adopting concepts and tools from computer science to manage, preserve, and share these computational efforts. We illustrate here the paradigm sustaining this vision, based around the four pillars of Automation, Data, Environment, and Sharing. We then discuss its implementation in the open-source AiiDA platform (http://www.aiida.net), which has been tuned first to the demands of computational materials science. AiiDA's design is based on directed acyclic graphs to track the provenance of data and calculations, and to ensure preservation and searchability. Remote computational resources are managed transparently, and automation is coupled with data storage to ensure reproducibility. Last, complex sequences of calculations can be encoded into scientific workflows. We believe that AiiDA's design and its sharing capabilities will encourage the creation of social ecosystems to disseminate codes, data, and scientific workflows.
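The directed-acyclic-graph provenance model described above can be illustrated with a toy structure. This is a hypothetical sketch in the spirit of that design, not the AiiDA API: edges run from input data to calculations and from calculations to their outputs, so walking the ancestors of any result recovers everything it depends on:

```python
from collections import defaultdict

class ProvenanceGraph:
    """Toy DAG linking data nodes and calculation nodes (illustrative only)."""

    def __init__(self):
        self.parents = defaultdict(set)   # node -> its direct inputs

    def add_calculation(self, calc, inputs, outputs):
        """Record one calculation with its input and output data nodes."""
        self.parents[calc].update(inputs)
        for out in outputs:
            self.parents[out].add(calc)

    def provenance(self, node):
        """All ancestors of a node: every datum and calculation it depends on."""
        seen, stack = set(), [node]
        while stack:
            for p in self.parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen
```

Because the graph is acyclic, the ancestor walk terminates, and re-running the recorded calculations along it in topological order is what makes a result reproducible from its stored inputs.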
XQ2P: Efficient XQuery P2P Time Series Processing
In this demonstration, we propose a model for the management of XML time series (TS) using the new XQuery 1.1 window operator. We argue that centralized computation is slow, and demonstrate XQ2P, our prototype for efficient XQuery P2P TS computation in the context of financial analysis of large data sets (more than one million values).
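Windowed aggregation of the kind the XQuery window clause expresses declaratively can be mirrored with a small procedural sketch; the tumbling-window helper below is hypothetical and only illustrates the concept, not the XQ2P system:

```python
def tumbling_windows(series, size):
    """Partition a time series into fixed-size tumbling windows and average each.

    series: list of (timestamp, value) pairs; size: window length in the
    same units as the timestamps. Returns {window_start: mean_of_values}.
    """
    windows = {}
    for t, v in series:
        start = (t // size) * size        # align to the window boundary
        windows.setdefault(start, []).append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(windows.items())}
```

In a P2P setting such as the one demonstrated, each peer would evaluate windows over its own partition of the series, with only the per-window aggregates exchanged.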