90,499 research outputs found

    Visualizing and Quantifying Impact and Effect in Twitter Narrative using Geometric Data Analysis

    We use geometric multivariate data analysis, which has been termed a methodology for both the visualization and verbalization of data. The general objectives are data mining and knowledge discovery. In the first case study, we use the narrative surrounding very high-profile tweets, and thus a Twitter event of significance and importance. In the second case study, we use eight carefully planned Twitter campaigns relating to environmental issues. The aim of these campaigns was to increase environmental awareness and behaviour. In contrast to current marketing, political and other communication campaigns using Twitter, we develop an innovative approach to measuring behavioural change. We also show how we can assess the statistical significance of social media behaviour. Comment: 34 pages, 11 figures

    Conceptual framework of a novel hybrid methodology between computational fluid dynamics and data mining techniques for medical dataset application

    This thesis proposes a novel hybrid methodology that couples computational fluid dynamics (CFD) and data mining (DM) techniques and applies it to a multi-dimensional medical dataset in order to study potential disease development statistically. The approach offers an alternative to the tedious and rigorous CFD methodology currently adopted to study the influence of geometric parameters on hemodynamics in the human abdominal aortic aneurysm. It is seen as a “marriage” between the medical and computing domains.

    MINING CONCEPT IN BIG DATA

    To make fruitful use of big data, data mining is necessary. There are two well-known methods: one is based on the apriori principle, and the other on the FP-tree. In this project we explore a new approach based on the simplicial complex, a combinatorial form of polyhedron used in algebraic topology. Our approach, like the FP-tree, is top-down; at the same time, it is based on the apriori principle in geometric form, called the closed condition in a simplicial complex. Our method is almost 300 times faster than FP-growth on a real-world database using an SJSU laptop. The database, provided by the hospital of National Taiwan University, has 65536 transactions and 1257 columns in bit form. Our major work is mining concepts from big text data; this project is the core engine of the concept-based semantic search engine.
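    The apriori principle mentioned in this abstract says that every subset of a frequent itemset is itself frequent, which mirrors the closed condition of a simplicial complex (every face of a simplex belongs to the complex). The following is a minimal sketch of the classical level-wise Apriori search, not the simplicial-complex method of the project; the function name `frequent_itemsets` and the `min_support` parameter are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (apriori principle): every (k-1)-subset, i.e. every
            # "face" of the candidate, must already be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count support only for the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result
```

    Calling `frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 2)` would return the itemsets contained in at least two of the three transactions. The prune step is where the downward-closure property saves work: a candidate is never counted if one of its faces is already known to be infrequent.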

    A Collaborative Kalman Filter for Time-Evolving Dyadic Processes

    We present the collaborative Kalman filter (CKF), a dynamic model for collaborative filtering and related factorization models. Using the matrix factorization approach to collaborative filtering, the CKF accounts for time evolution by modeling each low-dimensional latent embedding as a multidimensional Brownian motion. Each observation is a random variable whose distribution is parameterized by the dot product of the relevant Brownian motions at that moment in time. This is naturally interpreted as a Kalman filter with multiple interacting state space vectors. We also present a method for learning a dynamically evolving drift parameter for each location by modeling it as a geometric Brownian motion. We handle posterior intractability via a mean-field variational approximation, which also preserves tractability for downstream calculations in a manner similar to the Kalman filter. We evaluate the model on several large datasets, providing quantitative evaluation on the 10 million Movielens and 100 million Netflix datasets and qualitative evaluation on a set of 39 million stock returns divided across roughly 6,500 companies from the years 1962-2014. Comment: Appeared at 2014 IEEE International Conference on Data Mining (ICDM)
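    The generative idea described in this abstract (latent embeddings that drift as Brownian motions, with each observation governed by their dot product) can be illustrated with a small simulation. This is only a sketch of the forward model under assumed settings; it does not implement the paper's variational inference or drift learning, and the names `brownian_step` and `simulate_dyad`, along with the parameters `dim`, `dt` and `sigma`, are hypothetical.

```python
import math
import random


def brownian_step(vec, dt, sigma, rng):
    """Add an independent N(0, sigma^2 * dt) increment to each coordinate."""
    scale = sigma * math.sqrt(dt)
    return [x + rng.gauss(0.0, scale) for x in vec]


def simulate_dyad(dim=3, steps=5, dt=1.0, sigma=0.1, seed=0):
    """Return the dot product of two drifting embeddings at each time step."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical user embedding
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical item embedding
    path = []
    for _ in range(steps):
        u = brownian_step(u, dt, sigma, rng)
        v = brownian_step(v, dt, sigma, rng)
        # This dot product would parameterize the observation distribution.
        path.append(sum(a * b for a, b in zip(u, v)))
    return path
```

    With `sigma=0.0` the embeddings stay fixed and the dot product is constant over time, which corresponds to static matrix factorization; a larger `sigma` lets the pairwise affinity wander.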

    Mining Discriminative Triplets of Patches for Fine-Grained Classification

    Fine-grained classification involves distinguishing between similar sub-categories based on subtle differences in highly localized regions; therefore, accurate localization of discriminative regions remains a major challenge. We describe a patch-based framework to address this problem. We introduce triplets of patches with geometric constraints to improve the accuracy of patch localization, and automatically mine discriminative geometrically-constrained triplets for classification. The resulting approach requires only object bounding boxes. Its effectiveness is demonstrated using four publicly available fine-grained datasets, on which it outperforms or achieves performance comparable to the state of the art in classification.

    The risk of collapse in abandoned mine sites: the issue of data uncertainty

    Ground collapses over abandoned underground mines constitute a new environmental risk in the world. The high risk associated with subsurface voids, together with a lack of knowledge of the geometric and geomechanical features of mining areas, makes abandoned underground mines one of the current challenges for countries with a long mining history. In this study, a stability analysis of the Montevecchia marl mine is performed in order to validate a general approach that takes into account the poor local information and the variability of the input data. The collapse risk was evaluated through a numerical approach that, starting with some simplifying assumptions, is able to provide an overview of the collapse probability. The final result is an easily accessible, transparent summary graph that shows the collapse probability. This approach may be useful for public administrators called upon to manage this environmental risk. The approach tries to simplify this complex problem in order to achieve a rough risk assessment, but, since it relies on just a small amount of information, any final user should be aware that a comprehensive and detailed risk scenario can be generated only through more exhaustive investigations.

    Monitoring land use changes using geo-information : possibilities, methods and adapted techniques

    Monitoring land use with geographical databases is widely used in decision-making. This report presents the possibilities, methods and adapted techniques for using geo-information to monitor land use changes. The municipality of Soest was chosen as the study area, and three national land use databases, viz. Top10Vector, CBS land use statistics and LGN, were used. The limitations of geo-information for monitoring land use changes are indicated. New methods and adapted techniques improve the monitoring result considerably. Providers of geo-information, however, should coordinate on update frequencies, semantic content and spatial resolution so that land use can be monitored more effectively by combining data sets.

    Never, once, and repeated illness: a geometric view for insights and interpretations

    Background: Medical/health researchers depend on data evidence for knowledge discovery. At times, the data analysis needed to capture that evidence is overwhelming, and the process becomes so tedious that the attempt is abandoned. A prudent thing to do is to seek out a simpler visual approach to obtain insights. One visual approach is devised in this article to understand what the data are really revealing, either to gain an initial insight or to confirm what the medical concepts intuitively suggest. This visual approach is based on geometric concepts. Specifically, a triangle is employed in this novel approach. Methods: A successful treatment of any illness is a consequence of knowledge build-up arising from data mining about the never, once, or repeated episode of a disease incidence in a patient. This article investigates and illustrates a novel and pioneering geometric approach, based in particular on the properties of a triangle, to extract hidden evidence in the data. New probabilistic expressions are derived utilizing trigonometric relations among the corner points of a triangle. The conceptual contents of this article are versatile enough for different medical/health data analyses. Results: For illustration here, the medical binomial data in Hopper et al. (Genetic Epidemiology, 1990) on the occurrence of asthma or hay fever among four groups, (1) monozygotic females (MZF), (2) monozygotic males (MZM), (3) di-zygotic females (DZF), and (4) di-zygotic males (DZM), are considered and triangularly interpreted. The results indicate that the angle at the vertex representing one episode is larger than the other two angles, at the vertices representing never or repeated episodes of an illness, among a random sample of twins from these four groups with respect to getting asthma or hay fever. This geometric finding implies that the event of never and the event of repeated incidence of the illness are farthest apart in Euclidean distance in a probabilistic sense. In other words, the never and repeated incidences are not in close proximity in terms of probability. Conclusions: The geometric view of this article is versatile enough to be useful in other research studies in drug assessment, clinical trial outcomes, business, marketing, finance, economics, engineering and public health, whether the data are of Poisson or inverse binomial type.