23 research outputs found

    MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce

    Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it with a 4-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric for merging bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. Results show that our approach achieves efficient speedup and scaleup.

    EaSync: A Transparent File Synchronization Service across Multiple Machines

    Part 8: Web, Communication, and Cloud Computing. People increasingly use multiple machines for their daily work. Because platform switching and file modification are so frequent, a way to keep files synchronized across multiple machines is required. In this paper, we propose EaSync, a transparent file synchronization service across multiple machines. EaSync introduces several key technologies for synchronization-oriented services, including a timestamp-based synchronization protocol and an enhanced deduplication algorithm, DS-Dedup. We implement and evaluate the EaSync prototype system. The results show that EaSync outperforms other synchronization systems in operation latency and other metrics.

    Exploration of Ground Truth from Raw GPS Data

    To enable smart transportation, a large volume of vehicular GPS trajectory data has been collected in the metropolitan-scale Shanghai Grid project. The collected raw GPS data, however, suffers from various errors. Thus, it is inappropriate to use the raw GPS dataset directly for many potential smart transportation applications. Map matching, a process that aligns raw GPS data onto the corresponding road network, is a commonly used technique to calibrate the raw GPS data. In practice, however, there is no ground truth data to validate the calibrated GPS data. It is necessary and desirable to have ground truth data to evaluate the effectiveness of various map matching algorithms, especially in complex environments. In this paper, we propose truthFinder, an interactive map matching system for ground truth data exploration. It incorporates traditional map matching algorithms and human intelligence in a unified manner. The accuracy of truthFinder is guaranteed by the observation that a vehicular trajectory can be correctly identified by human labeling with the help of a period of historical GPS data. To the best of our knowledge, truthFinder is the first interactive map matching system that tries to explore the ground truth from historical GPS trajectory data. To measure the cost of human interactions, we design a cost model that classifies and quantifies user operations. With accuracy guaranteed, truthFinder is evaluated in terms of operation cost. The results show that truthFinder reduces the cost of the map matching process by up to two orders of magnitude compared with the pure human-labeling approach.
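
    As a rough illustration of what map matching means in the abstract above, the sketch below snaps one GPS fix to the nearest road segment by geometric projection. This is not truthFinder's method, which the abstract does not detail; it is only a naive nearest-segment baseline. The function names, the toy road network, and the use of planar (x, y) coordinates instead of latitude and longitude are all assumptions made for the example.

```python
from math import hypot


def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p (all are (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return a
    # Parameter t of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_network(p, segments):
    """Return (snapped_point, distance) for the road segment nearest to p."""
    best, best_dist = None, float("inf")
    for a, b in segments:
        q = project_onto_segment(p, a, b)
        d = hypot(p[0] - q[0], p[1] - q[1])
        if d < best_dist:
            best, best_dist = q, d
    return best, best_dist


# Hypothetical toy network: two road segments forming an L shape.
roads = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
snapped, dist = snap_to_network((4.0, 1.5), roads)
print(snapped, dist)
```

    A purely geometric snap like this ignores heading, speed, and trajectory continuity, which is one reason matched results need ground truth to be validated.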

    MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce

    Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it with a 4-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric for merging bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. Results show that our approach achieves efficient speedup and scaleup. © 2011 IEEE

    Positive Effects of Reforestation on the Diversity and Abundance of Soil Fauna in a Landscape Degraded Red Soil Area in Subtropical China

    Serious soil degradation due to human intervention in subtropical China has resulted in a series of ecological problems. Soil fauna is an important part of forest soil ecosystems, plays a vital role in the maintenance of soil quality, and can sensitively reflect the soil disturbances caused by human activities. This study assessed the long-term effects of reforestation on the soil fauna community and underground food web. Soil fauna was sampled from plots in a 30-year reforestation positioning test site. Six reforestation models (the pure Schima superba (Ss) forest, pure Liquidambar formosana (Lf) forest, pure Pinus massoniana (Pm) forest, mixed forest of Lf & Ss, mixed forest of Pm & Ss, and the mixed forest of Lf & Pm) were chosen in Taihe County, southern China. The results show that the mixed vegetation restoration of Lf & Pm significantly improved the soil fauna abundance and biomass when compared with other reforestation models in the degraded red soil region. Acari and Collembola accounted for 65.8% and 23.3%, respectively, of the total soil fauna abundance in the region. The mixed forest of Lf & Pm had a positive effect on the abundance of secondary decomposers and micro predators in Acari. Moreover, a significant increase in the abundance of Collembola was found in the Lf & Pm stand type. The stand type with the highest soil faunal population also had a higher soil fauna biomass. Therefore, reforestation in a degraded red soil area had positive effects on the soil fauna community.

    Metallic nanocrystals with low angle grain boundary for controllable plastic reversibility

    Improving the reversible plastic deformability and damage tolerance of nanosized metals remains challenging. Here, the authors custom-design low angle grain boundaries in metallic bicrystals to achieve controllable plastic reversibility via fully conservative grain boundary migration.