Point Information Gain and Multidimensional Data Analysis
We generalize the Point information gain (PIG) and derived quantities, i.e.
Point information entropy (PIE) and Point information entropy density (PIED),
for the case of Rényi entropy and simulate the behavior of PIG for typical
distributions. We also use these methods for the analysis of multidimensional
datasets. We demonstrate the main properties of PIE/PIED spectra for the real
data on the example of several images, and discuss possible further utilization
in other fields of data processing. Comment: 16 pages, 6 figures
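As a rough illustration of the quantities named in this abstract, the sketch below computes the Rényi entropy of a discrete histogram and the change in that entropy when one data point is removed. The function names and the sign convention (entropy without the point minus entropy with it) are assumptions made for this example; the abstract does not state the authors' exact definition.

```python
import math
from collections import Counter


def renyi_entropy(counts, alpha):
    """Rényi entropy (in nats) of the distribution given by non-negative counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if math.isclose(alpha, 1.0):
        # The alpha -> 1 limit of the Rényi entropy is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def point_information_gain(values, index, alpha):
    """Entropy change caused by removing values[index] from the dataset.

    Assumed convention (hypothetical, for illustration only):
    H_alpha(data without the point) - H_alpha(full data).
    """
    full = Counter(values)
    reduced = Counter(values)
    reduced[values[index]] -= 1  # drop one occurrence of the chosen value
    return (renyi_entropy(list(reduced.values()), alpha)
            - renyi_entropy(list(full.values()), alpha))


# Usage: one rare value among common ones.
data = [0, 0, 0, 1]
gain_common = point_information_gain(data, 0, alpha=2.0)
gain_rare = point_information_gain(data, 3, alpha=2.0)
```

Under this convention, removing the rare value lowers the entropy and removing a common value raises it, so the two points receive gains of opposite sign.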
Data Aggregation and Information Loss
Analysts often use a single average or otherwise aggregated price series to represent several geographic or product markets even when disaggregate data are available. We hypothesize that such an approach may not be appropriate under some circumstances, such as when only long-term relationships hold among price series or when homogeneous but relatively perishable products are considered. This question is of particular relevance in agriculture because of seasonality in production and harvest across various production regions, and the effect of changes in demand as substitute crops become available. We analyze this question in the context of fresh strawberry production. We find that in the case of the strawberry market, aggregate series are appropriate for long-term decision analysis, but some information loss occurs when conducting short-term decision analysis. Keywords: strawberry, price, cointegration, Granger causality, average price, Research Methods/Statistical Methods
Temporal Data Modeling and Reasoning for Information Systems
Temporal knowledge representation and reasoning is a major research field in Artificial
Intelligence, in Database Systems, and in Web and Semantic Web research. The ability to
model and process time and calendar data is essential for many applications like appointment
scheduling, planning, Web services, temporal and active database systems, adaptive
Web applications, and mobile computing applications. This article aims at three complementary
goals. First, to provide a general background in temporal data modeling
and reasoning approaches. Second, to serve as an orientation guide for further specific
reading. Third, to point to new application fields and research perspectives on temporal
knowledge representation and reasoning in the Web and Semantic Web.
Bibliography on Optical Information and Data Processing
Bibliography on optical information and data processing
Data curation standards and social science occupational information resources
Occupational information resources - data about the characteristics of different occupational positions - are widely used in the social sciences, across a range of disciplines and international contexts. They are available in many formats, most often constituting small electronic files that are made freely downloadable from academic web-pages. However, there are several challenges associated with how occupational information resources are distributed to, and exploited by, social researchers. In this paper we describe features of occupational information resources, and indicate the role digital curation can play in exploiting them. We report upon the strategies used in the GEODE research project (Grid Enabled Occupational Data Environment, http://www.geode.stir.ac.uk). This project attempts to develop long-term standards for the distribution of occupational information resources, by providing a standardized framework-based electronic depository for occupational information resources, and by providing a data indexing service, based on e-Science middleware, which collates occupational information resources and makes them readily accessible to non-specialist social scientists.
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information a concrete alternative to descriptive mutual information in
many applications which would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used. Comment: 26 pages, LaTeX, 5 figures, 4 tables
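To make the contrast between descriptive and inductive mutual information concrete, the sketch below computes the plug-in (descriptive) value from a contingency table and then approximates its posterior distribution by Monte Carlo sampling from a Dirichlet posterior. This is a simulation stand-in, not the analytical expressions derived in the paper; the function names, the uniform prior (`prior=1.0`), and the example counts are assumptions made for illustration.

```python
import math
import random


def mutual_information(table):
    """Plug-in mutual information (in nats) of a table of non-negative weights."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, w in enumerate(row):
            if w > 0:  # empty cells contribute nothing
                mi += (w / total) * math.log(w * total / (row_sums[i] * col_sums[j]))
    return mi


def posterior_mi_samples(counts, prior=1.0, draws=2000, seed=0):
    """Monte Carlo samples of mutual information under a Dirichlet posterior.

    Assumption: a symmetric Dirichlet prior with parameter `prior` per cell.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a set of independent Gamma variates, normalised;
        # mutual_information() normalises by the table total itself.
        table = [[rng.gammavariate(c + prior, 1.0) for c in row] for row in counts]
        samples.append(mutual_information(table))
    return samples


# Usage: hypothetical 2x2 contingency table with strong dependence.
counts = [[50, 2], [3, 45]]
descriptive = mutual_information(counts)
samples = posterior_mi_samples(counts)
posterior_mean = sum(samples) / len(samples)
posterior_std = math.sqrt(sum((s - posterior_mean) ** 2 for s in samples) / len(samples))
```

The spread of the samples gives a sense of how reliable the single descriptive value is for a given sample size, which is the question the abstract raises.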
Accessing Earth science data from the EOS data and information system
An overview of the Earth Observing System Data and Information System (EOSDIS) is presented, concentrating on the users' interactions with the system and highlighting those features that are driven by the unique requirements of the Global Change Research Program and the supported science community. However, a basic premise of the EOSDIS is that the system must evolve to meet changes in user needs and to incorporate advances in data system technology. Therefore, the development process which is being used to accommodate these changes and some of the potential areas of change are also addressed.
Could Data Broker Information Threaten Physician Prescribing and Professional Behavior?
Privacy is threatened by the extent of data collected and sold by consumer data brokers. Physicians, as individual consumers, leave a ‘data trail’ in the offline (e.g. through traditional shopping) and online worlds (e.g. through online purchases and use of social media). Such data could easily and legally be used without a physician’s knowledge or consent to influence prescribing practices or other physician professional behavior. We sought to determine the extent to which such consumer data was available on a sample of more than 3,000 physicians, healthcare faculty and healthcare system staff at one university’s health units. Using just work email addresses for these employees, we cheaply and quickly obtained external data on nearly two-thirds of employees on demographic characteristics (e.g. income, top 10% national wealth, children at home, married), purchases (e.g. baby products, cooking, sports), behavior (e.g. charitable donor, discount shopper) and interests (e.g. automotive, health and wellness). Consumer data brokers have valuable, cost-effective and detailed information on many healthcare professionals, including data that could be used to segment, target, detail and generally market to physicians in ways that seem under-appreciated. We call for greater attention to this potential aspect of physician-industry relationships.
