    Point Information Gain and Multidimensional Data Analysis

    We generalize the Point information gain (PIG) and its derived quantities, the Point information entropy (PIE) and the Point information entropy density (PIED), to the case of Rényi entropy, and simulate the behavior of PIG for typical distributions. We also use these methods to analyze multidimensional datasets. We demonstrate the main properties of PIE/PIED spectra on real data, using several images as examples, and discuss possible further use in other fields of data processing. Comment: 16 pages, 6 figures
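
    For orientation, the Rényi entropy of order α that this abstract builds on has the standard form H_α(p) = log(Σ p_i^α) / (1 − α), with the Shannon entropy as the limit α → 1. The sketch below computes only that quantity for a discrete distribution; the function name `renyi_entropy` is an illustrative assumption, and the PIG/PIE/PIED definitions are not reproduced because the abstract does not state them.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy (in nats) of a discrete distribution.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Zero-probability bins contribute nothing and would break log().
    probs = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

uniform = [0.25, 0.25, 0.25, 0.25]
h2 = renyi_entropy(uniform, 2.0)
```

    For a uniform distribution the result does not depend on α and equals the logarithm of the number of bins, which gives a quick sanity check.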

    Data Aggregation and Information Loss

    Analysts often use a single average or otherwise aggregated price series to represent several geographic or product markets even when disaggregate data are available. We hypothesize that such an approach may not be appropriate under some circumstances, such as when only long-term relationships hold among price series or when homogeneous but relatively perishable products are considered. This question is of particular relevance in agriculture because of seasonality in production and harvest across various production regions, and the effect of changes in demand as substitute crops become available. We analyze this question in the context of fresh strawberry production. We find that in the case of the strawberry market, aggregate series are appropriate for long-term decision analysis, but some information loss occurs when conducting short-term decision analysis. Keywords: strawberry, price, cointegration, Granger causality, average price, Research Methods/Statistical Methods

    Temporal Data Modeling and Reasoning for Information Systems

    Temporal knowledge representation and reasoning is a major research field in Artificial Intelligence, in Database Systems, and in Web and Semantic Web research. The ability to model and process time and calendar data is essential for many applications, such as appointment scheduling, planning, Web services, temporal and active database systems, adaptive Web applications, and mobile computing applications. This article has three complementary goals: first, to provide a general background in temporal data modeling and reasoning approaches; second, to serve as an orientation guide for further specific reading; and third, to point to new application fields and research perspectives on temporal knowledge representation and reasoning in the Web and Semantic Web.

    Bibliography on Optical Information and Data Processing

    Bibliography on optical information and data processing.

    Data curation standards and social science occupational information resources

    Occupational information resources - data about the characteristics of different occupational positions - are widely used in the social sciences, across a range of disciplines and international contexts. They are available in many formats, most often as small electronic files that are freely downloadable from academic web-pages. However, there are several challenges associated with how occupational information resources are distributed to, and exploited by, social researchers. In this paper we describe features of occupational information resources, and indicate the role digital curation can play in exploiting them. We report upon the strategies used in the GEODE research project (Grid Enabled Occupational Data Environment, http://www.geode.stir.ac.uk). This project attempts to develop long-term standards for the distribution of occupational information resources, by providing a standardized framework-based electronic depository for occupational information resources, and by providing a data indexing service, based on e-Science middleware, which collates occupational information resources and makes them readily accessible to non-specialist social scientists.

    Distribution of Mutual Information from Complete and Incomplete Data

    Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used. Comment: 26 pages, LaTeX, 5 figures, 4 tables
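
    As a point of reference for the "descriptive" quantity mentioned above, the following is a minimal sketch of the plug-in mutual information computed from empirical frequencies of paired categorical samples. The function name `mutual_information` and the toy data are assumptions made for illustration; the paper's Bayesian posterior mean and moment approximations are not reproduced here, since the abstract does not give their formulas.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in (descriptive) mutual information in nats.

    `pairs` is a list of (x, y) categorical observations.
    Hypothetical helper for illustration; not the paper's estimator.
    """
    n = len(pairs)
    joint = Counter(pairs)                 # counts n_xy
    px = Counter(x for x, _ in pairs)      # marginal counts n_x
    py = Counter(y for _, y in pairs)      # marginal counts n_y
    mi = 0.0
    for (x, y), c in joint.items():
        # (n_xy / n) * log( (n_xy / n) / ((n_x / n) * (n_y / n)) )
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

dependent = [(0, 0), (1, 1)] * 50                     # y always equals x
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25   # uniform product
```

    On the toy data, the fully dependent sample yields log 2 and the product-form sample yields zero. The plug-in value carries no measure of its own reliability, which is the gap the inferential approach in the abstract is meant to address.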

    Accessing Earth science data from the EOS data and information system

    An overview of the Earth Observing System Data and Information System (EOSDIS) is presented, concentrating on the users' interactions with the system and highlighting those features that are driven by the unique requirements of the Global Change Research Program and the supported science community. However, a basic premise of the EOSDIS is that the system must evolve to meet changes in user needs and to incorporate advances in data system technology. Therefore, the development process that is being used to accommodate these changes, and some of the potential areas of change, are also addressed.

    Could Data Broker Information Threaten Physician Prescribing and Professional Behavior?

    Privacy is threatened by the extent of data collected and sold by consumer data brokers. Physicians, as individual consumers, leave a ‘data trail’ in the offline (e.g. through traditional shopping) and online worlds (e.g. through online purchases and use of social media). Such data could easily and legally be used without a physician’s knowledge or consent to influence prescribing practices or other physician professional behavior. We sought to determine the extent to which such consumer data was available on a sample of more than 3,000 physicians, healthcare faculty and healthcare system staff at one university’s health units. Using just work email addresses for these employees, we cheaply and quickly obtained external data on nearly two-thirds of employees, covering demographic characteristics (e.g. income, top 10% national wealth, children at home, married), purchases (e.g. baby products, cooking, sports), behavior (e.g. charitable donor, discount shopper) and interests (e.g. automotive, health and wellness). Consumer data brokers have valuable, cost-effective and detailed information on many healthcare professionals, including data that could be used to segment, target, detail and generally market to physicians in ways that seem under‐appreciated. We call for greater attention to this potential aspect of physician-industry relationships.