
    On the selection of secondary indices in relational databases

    An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved optimally because of the complexity of the selection process. Heuristics such as the well-known ADD and DROP algorithms are often used instead. In this paper it will be shown that frequently used cost functions can be classified as super- or submodular functions. For these functions several mathematical properties have been derived which reduce the complexity of the index selection problem. These properties will be used to develop a tool for physical database design and also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms.
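As a rough illustration of the ADD heuristic named in this abstract, the sketch below greedily adds whichever candidate index lowers a cost function the most and stops when no addition helps. The function names, the candidate indices and the toy cost function are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable


def add_heuristic(
    candidates: Iterable[str],
    cost: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Greedy ADD: start with no indices, repeatedly add the best one."""
    chosen: FrozenSet[str] = frozenset()
    remaining = sorted(set(candidates))  # sorted for deterministic tie-breaking
    current = cost(chosen)
    while remaining:
        # Pick the candidate whose addition gives the lowest total cost.
        best = min(remaining, key=lambda ix: cost(chosen | {ix}))
        best_cost = cost(chosen | {best})
        if best_cost >= current:
            break  # no single addition improves the cost any further
        chosen = chosen | {best}
        remaining.remove(best)
        current = best_cost
    return chosen


def toy_cost(indices: FrozenSet[str]) -> float:
    """Hypothetical cost: query savings per index minus a flat maintenance fee."""
    query_saving = {"a": 50.0, "b": 30.0, "c": 5.0}  # made-up numbers
    maintenance = 10.0
    return 100.0 - sum(query_saving[i] for i in indices) + maintenance * len(indices)


selected = add_heuristic(["a", "b", "c"], toy_cost)
print(sorted(selected))
```

The toy cost is additive, so it is trivially both sub- and supermodular; the DROP heuristic would run the same loop in reverse, starting from all candidates and removing one index at a time.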

    Modelling Uncertainty in Physical Database Design

    Physical database design can be marked as a crucial step in the overall design process of databases. The outcome of physical database design is a physical schema which describes the storage and access structures of the stored database. The selection of an efficient physical schema is an NP-complete problem. A significant number of efforts have been reported to develop tools that assist in the selection of physical schemas. Most of the efforts implicitly apply a number of heuristics to avoid the evaluation of all schemas. In this paper, we present an approach, based on the Dempster-Shafer theory, that explicitly models a rich set of heuristics, used for the selection of an efficient physical schema, into knowledge rules. These rules may be loaded into a knowledge base, which, in turn, can be embedded in physical database design tools.
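For readers unfamiliar with the Dempster-Shafer theory mentioned in this abstract, the following minimal sketch shows Dempster's rule of combination for two mass functions. The frame of discernment (`btree`, `hash`) and the mass values are hypothetical and only stand in for the kind of evidence two design heuristics might supply; the paper's actual knowledge rules are not reproduced here.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, intersect focal sets, renormalise."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame: which access structure suits a given table.
frame = frozenset({"btree", "hash"})
rule_1 = {frozenset({"btree"}): 0.6, frame: 0.4}  # e.g. a range-query heuristic
rule_2 = {frozenset({"hash"}): 0.5, frame: 0.5}   # e.g. a point-lookup heuristic

result = combine(rule_1, rule_2)
for focal, weight in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(weight, 4))
```

Mass left on the whole frame represents ignorance rather than support for any single option, which is the feature that distinguishes this calculus from a plain probability assignment.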

    Service Offshoring and White-Collar Employment

    I study the effects of service offshoring on white-collar employment, using highly disaggregated occupational data for the U.S. I present a structural model of the firm’s behavior that allows tractable derivation of labor demand elasticities for highly detailed occupations. I estimate the model using Quasi-Maximum Likelihood, to simultaneously account for the high degree of censoring of the employment variable and the small cross-sectional dimension of the panel. I find that service offshoring is skill-biased, because it raises employment among high-skilled occupations and lowers employment among medium- and low-skilled ones. Within each skill group, service offshoring penalizes tradeable occupations and tends to benefit complex non-tradeable jobs. Keywords: service offshoring, white-collar occupations, labor demand elasticities, homothetic weak separability, censored demand system estimation

    An exploration into the sparse representation of spectra

    Includes bibliographical references (leaves 73-76). This thesis describes an exploration in achieving sparse representations of objects, with special focus on spectral data. Given a database of objects one would like to know the actual aspects of each class that distinguish it from any other class in the database. We explore the hypothesis that simple abstractions (descriptions) that humans normally make, especially based on the visual phenomenology or physics of the problem, can be helpful in extracting and formulating useful sparse representations of the observed objects. In this thesis we focus on the discovery of such underlying features, employing a number of recent methods from machine learning. Firstly we find that an approach to automatic feature discovery recently proposed in the literature (Non-negative Matrix Factorization) is not as it seems. We show the limitations of this approach and demonstrate a more efficient method on a synthetic problem. Secondly we explore a more empirical approach to extracting visually attractive features of spectra, from which we formulate a simple re-representation of spectral data and show that the identification and discovery of certain intuitive features at various scales can be sufficient to describe a spectrum profile. Finally we explore a more traditional and principled automatic method of analyzing a spectrum at different resolutions (Wavelets). We find that certain classes of spectra can easily be discriminated by a simple approximation of the spectrum profile, while in other cases only the finer profile details are important. Throughout this thesis we employ a measure called the separability index as our measure of how easy it is to discriminate objects in a database with the proposed representations.
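The abstract does not define its separability index. One common definition, assumed here purely for illustration, is the nearest-neighbour form: the fraction of objects whose nearest neighbour belongs to the same class. The sketch below computes that quantity on made-up points; it may differ from the measure the thesis actually uses.

```python
import math
from typing import Sequence


def separability_index(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Fraction of points whose nearest other point shares their label."""
    hits = 0
    for i, p in enumerate(points):
        # Nearest neighbour by Euclidean distance, excluding the point itself.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(points)


# Two well-separated toy classes in a 2-D feature space (made-up values).
clustered = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(separability_index(clustered, [0, 0, 1, 1]))

# Alternating labels along a line: every nearest neighbour is in the other class.
interleaved = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(separability_index(interleaved, [0, 1, 0, 1]))
```

Under this definition a value near 1 means the chosen representation keeps classes apart, and a value near 0 means neighbouring objects mostly belong to different classes.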

    Scale challenges in inventory of forests aided by remote sensing

    The impact of changing the scale of observation on information derived from forest inventories is the basis of scale-related research in forest inventory and analysis (FIA). Interactions between the scale of observation and observed heterogeneity in studied variables highlight a dependence on scale that affects measurements, estimates, and relationships between inventory data from terrestrial and remote sensing surveys. This doctoral research defines "scale" as the divisions of continuous space over which measurements are made, or hierarchies of discrete units of study/analysis in space. Therefore, the "scale of observation" (also known as support) refers to that integral of space over which statistics are computed and forest inventory variables regionalized. Given the ubiquitous nature of scale issues, a case study approach was undertaken in this research (Articles I-IV) with the goal of providing a fundamental understanding of responses to the scale of observation for specific FIA variables. The studied forest inventory variables are: forest stand structural heterogeneity, forest cover proportion and tree species identities. Forest cover proportion (or simply forest area) and tree species are traditional and fundamental forest inventory variables commonly assessed over large areas using both terrestrial samples and remote sensing data, whereas forest stand structural heterogeneity is a contemporary FIA variable that is increasingly demanded in multi-resource inventories to inform management and conservation efforts, as it is linked to biodiversity, productivity and ecosystem functioning, and is used as auxiliary data in forest inventory. This research has two overall aims: 1. 
To improve the understanding of the association between the scale of observation and observed heterogeneity in inventory of forest stand structural heterogeneity, forest-cover proportions, and identification of tree species from a combination of terrestrial samples and remote sensing data. 2. To contribute knowledge to the estimation of scale-dependence in inventory of forest stand structural heterogeneity, forest-cover proportions, and identification of tree species from a combination of terrestrial samples and remote sensing data. Different scales of observation were considered across the four case studies encompassing individual leaf, crown-part or branch, single-tree crown, forest stand, landscape and global levels of analysis. Terrestrial and remote sensing data sets from a variety of temperate forests in Germany and France were utilized across case studies. In cases where no inventory data were available, synthetic data were simulated at different scales of observation. Heterogeneity in FIA variable estimates was monitored across scales of observation using estimators of variance and associated precision. Because excessive heterogeneity is hard to interpret owing to a low signal-to-noise ratio, object-based image analysis (OBIA) methods were used to manage heterogeneity in high resolution remote sensing data before evaluating scale dependence or scaling across observed scales. Similarly, ensemble classification techniques were applied to address methodological heterogeneity across classifiers in a case study on classification of two physically and spectrally similar Pinus species. Across case studies, a dependence on the scale of observation was determined by linking estimates of heterogeneity to their respective scales of observation using linear regression and a combination of geo-statistics and Monte-Carlo approaches. 
In order to address scale-dependence, thresholds to scale domains were identified so as to enable efficient observation of studied FIA variables, and scaling approaches were proposed to bridge observations across scales. For scaling, this research evaluated the potential of different regression techniques to map forest stand structural heterogeneity and tree species wall-to-wall from remote sensing data. In addition, radiative transfer modelling was evaluated in the transfer between leaf and crown hyperspectra, and a global sampling grid framework was proposed to efficiently link different stages of survey sampling. This research shows that the scale of observation affected all studied FIA variables albeit to varying degrees, conditioned on the spatial structure and aggregation properties of the assessed FIA variable (i.e. whether the variable is extensive, intensive or scale-specific) and the method used in aggregation on support (e.g. mean, variance, quantile etc.). The scale of observation affected measurements or estimates of the studied FIA variables as well as relationships between spatially structured FIA variables. The scale of observation determined observed heterogeneity in FIA variables, affected parameter retrieval from radiative transfer models, and affected variable selection and performance of models linking terrestrial and remote sensing data. On the other hand, this research shows that it is possible to determine domains of scale dependence within which to efficiently observe the studied FIA variables and to bridge between scales of observation using various scaling methods. The findings of this doctoral research are relevant for the general understanding of scale issues in FIA. Research in Article I, for example, informs optimization of plot sizes for efficient inventory and mapping of forest structural heterogeneity, as well as for the design of natural resource inventories. 
Similarly, research in Article II is applicable in large area forest (or general land) cover monitoring from sampling by both visual interpretation of high resolution remote sensing imagery and terrestrial surveys. This research is also useful to determine observation design for efficient inventory of land cover. Research in Article III contributes to many contexts of remote sensing assisted inventory of forests, especially management and conservation planning, pest and disease control, and the estimation of biomass. Lastly, research in Article IV highlights scale-related effects in passive optical remote sensing of forests that are currently understudied and can ultimately contribute to sensor calibration and modelling approaches.

    A Residential Energy Demand System for Spain

    Sharp price fluctuations and increasing environmental and distributional concerns, among other issues, have led to a renewed academic interest in energy demand. In this paper we estimate, for the first time in Spain, an energy demand system with household microdata. In doing so, we tackle several econometric and data problems that are generally recognized to bias parameter estimates. This is obviously relevant, as obtaining correct price and income responses is essential if they are to be used for assessing the economic consequences of hypothetical or real changes. With this objective, we combine data sources for a long time period and choose a demand system with flexible income and price responses. We also estimate the model in different sub-samples to capture varying responses to energy price changes by households living in rural, intermediate and urban areas. This constitutes a first attempt in the literature and it proved to be a very successful choice. Keywords: households, energy, demand, spain, location

    Three-dimensional hydrodynamic models coupled with GIS-based neuro-fuzzy classification for assessing environmental vulnerability of marine cage aquaculture

    There is considerable opportunity to develop new modelling techniques within a Geographic Information Systems (GIS) framework for the development of sustainable marine cage culture. However, the spatial data sets are often uncertain and incomplete; therefore new spatial models employing “soft computing” methods such as fuzzy logic may be more suitable. The aim of this study is to develop a model using Neuro-fuzzy techniques in a 3D GIS (ArcView 3.2) to predict coastal environmental vulnerability for Atlantic salmon cage aquaculture. A 3D hydrodynamic model (3DMOHID) coupled to a particle-tracking model is applied to study the circulation patterns, dispersion processes and residence time in Mulroy Bay, Co. Donegal, Ireland, an Irish fjard (shallow fjordic system), an area of restricted exchange, geometrically complicated with important aquaculture activities. The hydrodynamic model was calibrated and validated by comparison with sea surface and water flow measurements. The model provided spatial and temporal information on circulation and renewal time, helping to determine the influence of winds on circulation patterns and in particular the assessment of the hydrographic conditions with a strong influence on the management of fish cage culture. The particle-tracking model was used to study the transport and flushing processes. Instantaneous massive releases of particles from key boxes are modelled to analyse the ocean-fjord exchange characteristics and, by emulating discharge from finfish cages, to show the behaviour of waste in terms of water circulation and water exchange. In this study the results from the hydrodynamic model have been incorporated into GIS to provide an easy-to-use graphical user interface for 2D (maps), 3D and temporal visualization (animations), for interrogation of results. 
Data on the physical environment and aquaculture suitability were derived from a 3-dimensional hydrodynamic model and GIS for incorporation into the final model framework and included mean and maximum current velocities, current flow quiescence time, water column stratification, sediment granulometry, particulate waste dispersion distance, oxygen depletion, water depth, coastal protection zones, and slope. The Neuro-fuzzy classification model NEFCLASS-J was used to develop learning algorithms to create the structure (rule base) and the parameters (fuzzy sets) of a fuzzy classifier from a set of classified training data. A total of 42 training sites were sampled using stratified random sampling from the GIS raster data layers, and the vulnerability categories for each were manually classified into four categories based on the opinions of experts with field experience and specific knowledge of the environmental problems investigated. The final products, GIS-based Neuro-fuzzy maps, were produced by combining modelled and real environmental parameters relevant to marine finfish aquaculture. Environmental vulnerability models, based on Neuro-fuzzy techniques, showed sensitivity to the membership shapes of the fuzzy sets, the nature of the weightings applied to the model rules, and validation techniques used during the learning and validation process. The accuracy of the final classifier selected was R=85.71% (estimated error value of ±16.5% from Cross Validation, N=10) with a Kappa coefficient of agreement of 81%. Unclassified cells in the whole spatial domain (of 1623 GIS cells) ranged from 0% to 24.18%. A statistical comparison between vulnerability scores and a significant product of aquaculture waste (nitrogen concentrations in sediment under the salmon cages) showed that the final model gave a good correlation between predicted environmental vulnerability and sediment nitrogen levels, highlighting a number of areas with variable sensitivity to aquaculture. 
Further evaluation and analysis of the quality of the classification was carried out, and the applicability of separability indexes was also studied. The inter-class separability estimations were performed on two different training data sets to assess the difficulty of the class separation problem under investigation. The Neuro-fuzzy classifier for a supervised and hard classification of coastal environmental vulnerability has demonstrated an ability to derive an accurate and reliable classification into areas of different levels of environmental vulnerability using a minimal number of training sets. The output will be an environmental spatial model for application in coastal areas intended to facilitate policy decision and to allow input into wider ranging spatial modelling projects, such as coastal zone management systems and effective environmental management of fish cage aquaculture.
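The Kappa coefficient of agreement reported in this abstract is usually Cohen's kappa, which corrects the observed accuracy for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from a confusion matrix; the matrices are invented for illustration and are not the study's data.

```python
from typing import Sequence


def cohens_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: truth, cols: predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example: 70% raw accuracy, 50% expected by chance.
example = [
    [20, 5],
    [10, 15],
]
print(round(cohens_kappa(example), 3))

# Perfect agreement gives kappa = 1.
print(cohens_kappa([[5, 0], [0, 5]]))
```

Because kappa discounts chance agreement, it is normally lower than raw accuracy, which is consistent with the abstract reporting a kappa below its accuracy figure.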
