7 research outputs found
Big Data and Causality
Causality analysis remains one of the fundamental research questions and the ultimate objective of a large number of scientific studies. With the rapid progress of science and technology, the age of big data has significantly influenced causality analysis across disciplines, especially over the last decade, because the complexity and difficulty of identifying causality in big data have increased dramatically. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial quantity of research to date, presented in chronological order, with an overview table of data mining applications in the causality analysis domain as a reference directory.
Constraint-based clustering in large databases
Constrained clustering, that is, finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constrained-clustering algorithm is developed in this study, which starts by finding an initial solution that satisfies the user-specified constraints and then refines the solution by performing confined object movements under those constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up to large data sets using the heuristic of micro-cluster sharing. Through experiments, we show the effectiveness and efficiency of the heuristics.
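The general idea of starting from a feasible solution and then refining it under constraints can be illustrated with a small Python sketch. This is not the paper's algorithm: it does not implement pivot movement, deadlock resolution, or micro-cluster sharing. It assumes one simple constraint (every cluster keeps at least `min_size` points), and the function name `constrained_refine` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random


def constrained_refine(points, k, min_size, iters=20, seed=0):
    """Toy clustering in which every cluster keeps at least min_size points.

    Illustrative only; the constraint type and the greedy refinement rule
    are assumptions, not the method described in the abstract.
    """
    if min_size < 1 or k * min_size > len(points):
        raise ValueError("size constraint cannot be satisfied")

    # Initial feasible solution: deal shuffled points round-robin, so every
    # cluster receives at least len(points) // k >= min_size points.
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        sizes = [labels.count(c) for c in range(k)]
        # Centroids are fixed for the duration of one pass.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            centers.append(tuple(sum(col) / len(members) for col in zip(*members)))

        moved = False
        for i, p in enumerate(points):
            cur = labels[i]
            best = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            # Confined movement: relocate a point only if its current
            # cluster would still satisfy the size constraint afterwards.
            if best != cur and sizes[cur] > min_size:
                labels[i] = best
                sizes[cur] -= 1
                sizes[best] += 1
                moved = True
        if not moved:
            break
    return labels
```

A call such as `constrained_refine(pts, k=2, min_size=3)` returns one cluster label per point. Because a move is rejected whenever it would shrink a cluster below `min_size`, every intermediate assignment stays feasible, at the cost of possibly stopping at a poorer local solution than an unconstrained k-means pass would reach.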
Abundance and sources of ambient dioxins in Hong Kong: A review of dioxin measurements from 1997 to 2001
Ambient measurements of seventeen 2,3,7,8-polychlorinated dibenzo-p-dioxin/dibenzofuran congeners (2,3,7,8-PCDD/Fs) have been taken in a number of monitoring programs and ad-hoc studies in Hong Kong. The longest monitoring program started at two locations in the territory in July 1997. The other monitoring efforts are ad-hoc studies, varying from a few coordinated sampling events at multiple sites to a year-long monitoring project that targeted suspected local dioxin sources. In this paper, we examined these measurements to understand the ambient levels, temporal and spatial variation, and possible sources of the 2,3,7,8-PCDD/Fs in Hong Kong. The territory-wide annual average concentration of the dioxins was 0.052 pg I-TEQ/m³, measured at the regular monitoring stations in the most recent annual cycle of 2000/2001. This level fell at the lower end of the range of dioxin concentrations measured at other urban locations around the world. The dioxin levels showed a clear seasonality: elevated concentrations were observed in the winter and lower concentrations in the summer at all monitoring sites with one year or more of regular measurements. The measurements indicated that the few known local dioxin sources, including a major chemical waste incinerator facility, landfill sites, and vehicular traffic, are not important contributors to ambient dioxins in Hong Kong. On days of high dioxin concentrations, the 2,3,7,8-PCDD/F congeners were observed to have almost identical compositions, with a north-northwest to south-southeast spatial gradient in concentrations across the sampling locations in Hong Kong. This observation, along with other corroborating evidence, established a strong link between high dioxin concentration days in Hong Kong and regional transport of polluted air masses from the north. © 2005 Elsevier Ltd. All rights reserved.