Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
Multi-Level Visual Alphabets
A central debate in visual perception theory is the argument for indirect versus direct perception; i.e., the use of intermediate, abstract, and hierarchical representations versus direct semantic interpretation of images through interaction with the outside world. We present a content-based representation that combines both approaches. The previously developed Visual Alphabet method is extended with a hierarchy of representations, each level feeding into the next one, but based on features that are not abstract but directly relevant to the task at hand. Explorative benchmark experiments are carried out on face images to investigate and explain the impact of key parameters such as pattern size, number of prototypes, and the distance measures used. Results show that adding a middle layer improves results by encoding the spatial co-occurrence of lower-level pattern prototypes.
Impact of the spatial context on human communication activity
Technology development produces terabytes of data generated by human
activity in space and time. This enormous amount of data, often called big data,
becomes crucial for delivering new insights to decision makers. It contains
behavioral information on different types of human activity influenced by many
external factors such as geographic information and weather forecasts. Early
recognition and prediction of those human behaviors are of great importance in
many societal applications such as health care, risk management, and urban
planning. In this paper, we investigate relevant geographical areas
based on their categories of human activities (i.e., working and shopping),
which are identified from geographic information (i.e., OpenStreetMap). We use
spectral clustering followed by the k-means clustering algorithm based on a TF/IDF
cosine similarity metric. We evaluate the quality of the observed clusters
using silhouette coefficients, which are estimated based on the
similarities of the temporal patterns of mobile communication activity. The area
clusters are further used to explain typical or exceptional communication
activities. We demonstrate the study using a real dataset containing 1 million
Call Detailed Records. This type of analysis and its application are important
for analyzing the dependency of human behaviors on external factors and for
uncovering hidden relationships, unknown correlations, and other useful information that
can support decision-making. Comment: 12 pages, 11 figures
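The TF/IDF cosine similarity named in this abstract can be illustrated with a minimal sketch. The area identifiers, category tokens, and the smoothed IDF weighting below are assumptions made for illustration; the abstract does not state the exact weighting the authors used, and the clustering steps themselves are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors.

    docs maps a (hypothetical) area id to the list of activity-category
    tokens observed in that area.
    """
    n_docs = len(docs)
    # Document frequency: in how many areas each category occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    vectors = {}
    for area, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
        vectors[area] = {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical areas described by their activity categories.
areas = {
    "area_a": ["office", "office", "bank", "cafe"],
    "area_b": ["office", "bank", "bank", "cafe"],
    "area_c": ["mall", "shop", "shop", "cafe"],
}
vectors = tfidf_vectors(areas)
sim_ab = cosine_similarity(vectors["area_a"], vectors["area_b"])
sim_ac = cosine_similarity(vectors["area_a"], vectors["area_c"])
print(f"work-like areas: {sim_ab:.3f}, work vs. shopping: {sim_ac:.3f}")
```

A pairwise similarity matrix built this way could serve as the affinity input to a spectral clustering step, with silhouette coefficients computed afterwards on the resulting labels.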
Language in Our Time: An Empirical Analysis of Hashtags
Hashtags in online social networks have gained tremendous popularity during
the past five years. The resulting large quantity of data has provided a new
lens into modern society. Previously, researchers have mainly relied on data collected
from Twitter to study either a certain type of hashtag or a certain property
of hashtags. In this paper, we perform the first large-scale empirical analysis
of hashtags shared on Instagram, the major platform for hashtag-sharing. We
study hashtags from three different dimensions including the temporal-spatial
dimension, the semantic dimension, and the social dimension. Extensive
experiments performed on three large-scale datasets with more than 7 million
hashtags in total provide a series of interesting observations. First, we show
that the temporal patterns of hashtags can be categorized into four different
clusters, and people tend to share fewer hashtags at certain places and more
hashtags at others. Second, we observe that a non-negligible proportion of
hashtags exhibit large semantic displacement. We demonstrate that hashtags that are
more uniformly shared among users, as quantified by the proposed hashtag
entropy, are less prone to semantic displacement. Finally, we propose a
bipartite graph embedding model to summarize users' hashtag profiles, and rely
on these profiles to perform friendship prediction. Evaluation results show
that our approach achieves an effective prediction with AUC (area under the ROC
curve) above 0.8, which demonstrates the strong social signals possessed in
hashtags. Comment: WWW 201
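The idea of a hashtag being "uniformly shared among users" can be made concrete with a small sketch. It assumes that hashtag entropy is the Shannon entropy of the distribution of a hashtag's uses across users; the abstract does not give the paper's exact definition, so the function and the sample user ids below are hypothetical.

```python
import math
from collections import Counter


def hashtag_entropy(user_ids):
    """Shannon entropy (in bits) of how one hashtag's uses spread over users.

    user_ids holds one entry per use of the hashtag, naming the user who
    shared it. Higher values mean the hashtag is shared more uniformly.
    This definition is an assumption, not taken from the paper.
    """
    counts = Counter(user_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Four uses spread evenly over four users versus dominated by one user.
uniform = hashtag_entropy(["u1", "u2", "u3", "u4"])
skewed = hashtag_entropy(["u1", "u1", "u1", "u2"])
print(f"uniform: {uniform:.3f} bits, skewed: {skewed:.3f} bits")
```

Under this assumed definition, a hashtag used by many users in equal measure scores high, while one dominated by a single user scores close to zero.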
Mapping Local and Regional Potentials for Inter-sectoral Technology Flows in Industrial Clusters – Empirical Results for Germany
The paper explores the potential for inter-sectoral technology flows in industrial clusters in Germany. With the help of a product-embodied R&D flow matrix, calculated using data on input–output tables and sectoral R&D employment, we construct industrial cluster based networks of technology provider and user relationships and examine the regional embeddedness of different sectors in the technological diffusion network of industrial clusters. As a result, the paper shows that simple graphical representations of relevant product-embodied R&D flows illustrate substantial differences in potentials for technological relations within industrial clusters.
A Decision Technology System To Advance the Diagnosis and Treatment of Breast Cancer
Geographical variations in cancer rates have been observed for decades. Described spatial patterns and trends have provided clues for generating hypotheses about the etiology of cancer. For breast cancer, investigators have demonstrated that some variation can be explained by differences in the population distribution of known breast cancer risk factors such as menstrual and reproductive variables (Laden, Spiegelman, and Neas, 1997; Robbins, Bescianini, and Kelsey, 1997; Sturgeon, Schairer, and Gail, 1995). However, regional patterns also may reflect the effects of the following (Workshop on Hormones, Hormone Metabolism, Environment, and Breast Cancer, 1995): (a) environmental hazards (such as air and water pollution), (b) demographics and the lifestyle of a mobile population, (c) subgroup susceptibility, (d) changes and advances in medical practice and healthcare management, and (e) other factors. To accurately measure breast cancer risk in individuals and population groups, it is necessary to singly and jointly assess the association between such risk and the hypothesized factors. Various statistical models will be needed to determine the potential relationships between breast cancer development and estimated exposures to environmental contamination. To apply the models, data must be assembled from a variety of sources, converted into the statistical models' parameters, and delivered effectively to researchers and policy makers. A Web-enabled decision technology system can be developed to provide the needed functionality. This chapter will present a conceptual architecture for such a decision technology system. First, there will be a brief overview of a typical geographical analysis. Next, the chapter will present the conceptual Web-based decision technology system and illustrate how the system can assist users in diagnosing and treating breast cancer. The chapter will conclude with an examination of the potential benefits from system use and the implications for breast cancer research and practice.
BCAS: A Web-enabled and GIS-based Decision Support System for the Diagnosis and Treatment of Breast Cancer
For decades, geographical variations in cancer rates have been observed but the precise determinants of such geographic differences in breast cancer development are unclear. Various statistical models have been proposed. Applications of these models, however, require that the data be assembled from a variety of sources, converted into the statistical models' parameters and delivered effectively to researchers and policy makers. A web-enabled and GIS-based system can be developed to provide the needed functionality. This article overviews the conceptual web-enabled and GIS-based system (BCAS), illustrates the system's use in diagnosing and treating breast cancer and examines the potential benefits and implications for breast cancer research and practice.