110 research outputs found

    Data mining in bioinformatics using Weka

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, all of which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms, together with facilities for data exploration and for the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.

    Jumble Java Byte Code to Measure the Effectiveness of Unit Tests

    Jumble is a byte code level mutation testing tool for Java which inter-operates with JUnit. It has been designed to operate in an industrial setting with large projects. Heuristics have been included to speed up the checking of mutations, for example, noting which test fails for each mutation and running that test first in subsequent mutation checks. Significant effort has been put into ensuring that it can test code which uses custom class loading and reflection. This requires careful attention to class path handling and coexistence with foreign class loaders. Jumble is currently used on a continuous basis within an agile programming environment with approximately 370,000 lines of Java code under source control. This setup checks out project code every fifteen minutes and runs an incremental set of unit tests and mutation tests for modified classes. Jumble is being made available as open source.
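
    The test-ordering heuristic mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Jumble's implementation (Jumble works on Java byte code with JUnit); the function `check_mutant`, the `last_killer` cache, and the toy tests are all hypothetical names chosen for illustration. The sketch only shows the idea of remembering which test failed for a mutation and running it first next time.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a "function under test" and a test that returns True on pass.
Func = Callable[[int, int], int]
Test = Callable[[Func], bool]


def check_mutant(
    mutant: Func,
    tests: Dict[str, Test],
    mutant_id: str,
    last_killer: Dict[str, str],
) -> Tuple[Optional[str], int]:
    """Run tests against one mutant, trying the previously failing test first.

    Returns (name of the test that failed, or None if the mutant survived,
    number of tests executed).
    """
    names = list(tests)
    preferred = last_killer.get(mutant_id)
    if preferred in tests:
        # Heuristic: the test that caught this mutation last time runs first.
        names.remove(preferred)
        names.insert(0, preferred)
    for count, name in enumerate(names, start=1):
        if not tests[name](mutant):
            last_killer[mutant_id] = name  # remember for later checks
            return name, count
    return None, len(names)


def add(a: int, b: int) -> int:
    return a + b


def mutant_sub(a: int, b: int) -> int:
    return a - b  # illustrative mutation: "+" replaced by "-"


tests: Dict[str, Test] = {
    "test_zero": lambda f: f(0, 0) == 0,
    "test_identity": lambda f: f(5, 0) == 5,
    "test_sum": lambda f: f(2, 3) == 5,
}

cache: Dict[str, str] = {}
first_run = check_mutant(mutant_sub, tests, "add:plus->minus", cache)
second_run = check_mutant(mutant_sub, tests, "add:plus->minus", cache)
```

    In this toy case the first check needs all three tests before the mutation is detected, while the second check detects it with the first test it runs.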

    Development of the Global Width Database for Large Rivers

    River width is a fundamental parameter of river hydrodynamic simulations, but no global-scale river width database based on observed water bodies has yet been developed. Here we present a new algorithm that automatically calculates river width from satellite-based water masks and flow direction maps. The Global Width Database for Large Rivers (GWD-LR) is developed by applying the algorithm to the SRTM Water Body Database and the HydroSHEDS flow direction map. Both bank-to-bank river width and effective river width excluding islands are calculated for river channels between 60°S and 60°N. The effective river width of the GWD-LR is compared with existing river width databases for the Congo and Mississippi Rivers. The effective river width of the GWD-LR is slightly narrower than in the existing databases, but the relative difference is within ±20% for most river channels. As the river width of the GWD-LR is calculated along the river channels of the HydroSHEDS flow direction map, it is relatively straightforward to apply the GWD-LR to global- and continental-scale river modeling.
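
    The distinction between bank-to-bank width and effective width excluding islands can be made concrete with a toy Python sketch. This is not the GWD-LR algorithm, which works along flow direction maps; it assumes a single cross-section of a binary water mask, a hypothetical pixel size of 90 m, and made-up reference values, purely to show the two definitions and the relative-difference check.

```python
from typing import Sequence, Tuple


def cross_section_widths(
    mask_row: Sequence[int], pixel_size_m: float
) -> Tuple[float, float]:
    """Return (bank-to-bank width, effective width) for one mask cross-section.

    mask_row holds 1 for water pixels and 0 for land (including islands).
    """
    water = [i for i, wet in enumerate(mask_row) if wet]
    if not water:
        return 0.0, 0.0
    # Bank-to-bank: span from the first to the last water pixel, islands included.
    bank_to_bank = (water[-1] - water[0] + 1) * pixel_size_m
    # Effective: only the water pixels are counted, so islands are excluded.
    effective = len(water) * pixel_size_m
    return bank_to_bank, effective


def relative_difference(value: float, reference: float) -> float:
    """Signed relative difference of value with respect to reference."""
    return (value - reference) / reference


# Hypothetical cross-section: land, two water pixels, an island, three water pixels, land.
row = [0, 1, 1, 0, 1, 1, 1, 0]
bank_width, effective_width = cross_section_widths(row, pixel_size_m=90.0)

# Hypothetical reference width from another database, for illustration only.
within_20_percent = abs(relative_difference(effective_width, 500.0)) <= 0.20
```

    Here the island makes the effective width (450 m) smaller than the bank-to-bank width (540 m), and the effective width falls within 20% of the assumed 500 m reference.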

    Water availability and agricultural demand: An assessment framework using global datasets in a data scarce catchment, Rokel-Seli River, Sierra Leone

    Study region: The proposed assessment framework is aimed at application in Sub-Saharan Africa, but could also be applied in other hydrologically data scarce regions. The test study site was the Rokel-Seli River catchment, Sierra Leone, West Africa. Study focus: We propose a simple, transferable water assessment framework that allows the use of global climate datasets in the assessment of water availability and crop demand in data scarce catchments. In this study, we apply the assessment framework to the catchment of the Rokel-Seli River in Sierra Leone to investigate the capabilities of global datasets, complemented with limited historical data, in estimating the water resources of a river basin facing rising demands from large scale agricultural water withdrawals. We demonstrate how short term river flow records can be extended using a lumped hydrological model, and then use a crop water demand model to generate irrigation water demands for a large irrigated biofuels scheme abstracting from the river. The results of using several different global datasets to drive the assessment framework are compared and the performance is evaluated against observed rain and flow gauge records. New hydrological insights: We find that the hydrological model simulates both low and high flows satisfactorily, and that all the input datasets produce similar results for water withdrawal scenarios. The proposed framework is successfully applied to assess the variability of flows available for abstraction against agricultural demand. The assessment framework conclusions are robust despite the different input datasets and calibration scenarios tested, and can be extended to include other global input datasets.
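
    The abstract does not name the lumped hydrological model or the crop demand model used. As a rough illustration of the workflow it describes (simulate flows from rainfall, then compare them with a demand), the Python sketch below uses a single linear reservoir as a deliberately simple stand-in. The function names, the recession parameter `k`, and all numbers are assumptions, not values from the study.

```python
from typing import List, Sequence


def linear_reservoir(
    rain_mm: Sequence[float], k: float, storage_mm: float = 0.0
) -> List[float]:
    """Simulate outflow (mm per time step) from a single linear reservoir.

    Each step adds rainfall to storage and releases the fraction k of storage.
    This is a stand-in for a lumped model, not the model used in the study.
    """
    flows = []
    for p in rain_mm:
        storage_mm += p          # rainfall enters the store
        q = k * storage_mm       # outflow proportional to storage
        storage_mm -= q
        flows.append(q)
    return flows


def fraction_demand_met(flows: Sequence[float], demand: float) -> float:
    """Fraction of time steps in which simulated flow covers a constant demand."""
    if not flows:
        return 0.0
    met = sum(1 for q in flows if q >= demand)
    return met / len(flows)


# Hypothetical inputs: four time steps of rainfall and a constant demand.
rain = [10.0, 0.0, 0.0, 20.0]
simulated = linear_reservoir(rain, k=0.5)
reliability = fraction_demand_met(simulated, demand=2.0)
```

    With these assumed inputs the simulated flow recedes during the two dry steps and drops below the demand in one of them, so the demand is met in three of the four steps.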

    Perspectives on open access high resolution digital elevation models to produce global flood hazard layers

    Global flood hazard models have recently become a reality thanks to the release of open access global digital elevation models, the development of simplified and highly efficient flow algorithms, and the steady increase in computational power. In this commentary we argue that although the availability of open access global terrain data has been critical in enabling the development of such models, the relatively poor resolution and precision of these data now significantly limit our ability to estimate flood inundation and risk for the majority of the planet’s surface. The difficulty of deriving an accurate ‘bare-earth’ terrain model, due to the interaction of vegetation and urban structures with the satellite-based remote sensors, means that global terrain data are often poorest in the areas where people and property (and thus vulnerability) are most concentrated. Furthermore, the current generation of open access global terrain models is over a decade old, and many large floodplains, particularly those in developing countries, have undergone significant change in this time. There is therefore a pressing need for a new generation of high resolution and high vertical precision open access global digital elevation models to allow significantly improved global flood hazard models to be developed.