14,180 research outputs found
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions)
Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and locationbased services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing
anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques
Mining health knowledge graph for health risk prediction
Nowadays classification models have been widely adopted in healthcare, aiming at supporting practitioners for disease diagnosis and human error reduction. The challenge is utilising effective methods to mine real-world data in the medical domain, as many different models have been proposed with varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods comprising of homogeneous graphs for knowledge representation and then knowledge discovery. However, such approaches are weak in discovering different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients’ health risk. The proposed model is evaluated by comparison to a baseline model also built on the NHANES data set in an empirical experiment. The performance of proposed model is promising. The paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by accessing the patterns of various observations, the research contributes to the work of practitioners by providing a multifaceted understanding of individual and public health
A network approach for managing and processing big cancer data in clouds
Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. In order to address the issues that data is decentralised, growing and continually being updated, and the content living or archiving on different information sources partially overlaps creating redundancies as well as contradictions and inconsistencies, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach for data process and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demanding needs of various data resources for management and process of big cancer data
Knowing Your Population: Privacy-Sensitive Mining of Massive Data
Location and mobility patterns of individuals are important to environmental
planning, societal resilience, public health, and a host of commercial
applications. Mining telecommunication traffic and transactions data for such
purposes is controversial, in particular raising issues of privacy. However,
our hypothesis is that privacy-sensitive uses are possible and often beneficial
enough to warrant considerable research and development efforts. Our work
contends that peoples behavior can yield patterns of both significant
commercial, and research, value. For such purposes, methods and algorithms for
mining telecommunication data to extract commonly used routes and locations,
articulated through time-geographical constructs, are described in a case study
within the area of transportation planning and analysis. From the outset, these
were designed to balance the privacy of subscribers and the added value of
mobility patterns derived from their mobile communication traffic and
transactions data. Our work directly contrasts the current, commonly held
notion that value can only be added to services by directly monitoring the
behavior of individuals, such as in current attempts at location-based
services. We position our work within relevant legal frameworks for privacy and
data protection, and show that our methods comply with such requirements and
also follow best-practice
- …