27 research outputs found

    A Novel Kernel for Text Classification Based on Semantic and Statistical Information

    In text categorization, a document is usually represented by a vector space model. This representation can accomplish the classification task, but it cannot handle synonymy and polysemy in Chinese. This paper presents a novel approach that takes both semantic and statistical information into account to improve the accuracy of text classification. The proposed approach computes semantic information based on HowNet and statistical information based on a kernel function with class-based weighting. According to our experimental results, the proposed approach achieves state-of-the-art or competitive results compared with traditional approaches such as k-Nearest Neighbor (KNN) and Naive Bayes, and with deep learning models such as convolutional networks.

    Aboveground Forest Biomass Estimation with Landsat and LiDAR Data and Uncertainty Analysis of the Estimates

    Landsat Thematic Mapper (TM) imagery has long been the dominant data source for forest biomass estimation, and LiDAR has recently offered an important new stream of structural data. Uncertainty analysis of forest biomass estimates, on the other hand, has only recently received sufficient attention because reference data are difficult to collect. This paper provides a brief overview of current forest biomass estimation methods using both TM and LiDAR data. A case study is then presented that demonstrates the estimation methods and the uncertainty analysis. Results indicate that Landsat TM data can provide adequate biomass estimates for secondary succession but are not suitable for mature forest because of data saturation. LiDAR can overcome TM’s shortcoming and provides better biomass estimation performance, but it has not been extensively applied in practice because of data availability constraints. The uncertainty analysis indicates that various sources affect the performance of forest biomass/carbon estimation. The dominant sources of uncertainty are the variation in the input sample plot data and the data saturation problem associated with optical sensors. A possible way to increase confidence in forest biomass estimates is to integrate the strengths of multisensor data.

    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V. Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that occurs when the training data are distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart.

    CLOTHO: a large-scale Internet of Things based crowd evacuation planning system for disaster management

    In recent years, many kinds of natural hazards and man-made disasters have occurred that were diverse, difficult to control, and caused heavy casualties. In this work, we focus on the rapid and systematic evacuation of large, dense crowds after disasters in order to reduce losses effectively. Optimal evacuation planning is a key challenge and has become a hotspot of research and development. We design our system around an Internet of Things (IoT) scenario that uses a mobile Cloud computing platform to develop the Crowd Lives Oriented Track and Help Optimization system (CLOTHO). CLOTHO is an evacuation planning system for large, dense crowds in disasters. It includes a mobile terminal (IoT side) for data collection and a Cloud backend system for storage and analytics. We build our solution upon a typical IoT/fog disaster management scenario and propose an IoT application based on an evacuation planning algorithm that uses the Artificial Potential Field (APF), which is the core of CLOTHO. APF is conceptualized as an IoT service and determines the direction of evacuation automatically from the gradient direction of the potential field, which makes it suitable for the rapid evacuation of large populations. People in a disaster are usually in a panic, which easily leads to a chaotic evacuation and to secondary disasters. Based on APF, we propose an evacuation planning algorithm named Artificial Potential Field with Relationship Attraction (APF-RA). APF-RA guides evacuees who have a relationship with one another to the same shelter as far as possible, in order to calm evacuees and achieve a more humane evacuation. The experimental results show that CLOTHO (using APF and APF-RA) can effectively improve the convergence rate, shorten the evacuation route length and evacuation time, and keep the remaining capacity of the surrounding shelters well balanced.
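The idea of choosing an evacuation direction from the gradient of a potential field can be illustrated with a minimal sketch. It is not the CLOTHO implementation: the quadratic attractive term, the repulsive term, the gains `k_att` and `k_rep`, the influence radius `d0`, and all function names are assumptions taken from the generic APF formulation, and the relationship attraction of APF-RA is not modelled.

```python
import math

def potential(pos, shelter, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    """Generic APF (assumed form): attraction to the shelter, repulsion near obstacles."""
    x, y = pos
    u = 0.5 * k_att * ((x - shelter[0]) ** 2 + (y - shelter[1]) ** 2)
    for ox, oy in obstacles:
        d = max(math.hypot(x - ox, y - oy), 1e-6)  # avoid division by zero
        if d < d0:  # obstacles only act within their influence radius
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def step_direction(pos, shelter, obstacles, eps=1e-4):
    """Unit vector along the negative gradient, estimated by central differences."""
    x, y = pos
    gx = (potential((x + eps, y), shelter, obstacles)
          - potential((x - eps, y), shelter, obstacles)) / (2 * eps)
    gy = (potential((x, y + eps), shelter, obstacles)
          - potential((x, y - eps), shelter, obstacles)) / (2 * eps)
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return (0.0, 0.0)  # flat spot: no preferred direction
    return (-gx / norm, -gy / norm)

def evacuate(start, shelter, obstacles, step=0.1, max_steps=1000, tol=0.1):
    """Follow the negative gradient in fixed-length steps until near the shelter."""
    pos = start
    path = [pos]
    for _ in range(max_steps):
        if math.hypot(pos[0] - shelter[0], pos[1] - shelter[1]) <= tol:
            break
        dx, dy = step_direction(pos, shelter, obstacles)
        pos = (pos[0] + step * dx, pos[1] + step * dy)
        path.append(pos)
    return path

if __name__ == "__main__":
    route = evacuate((0.0, 0.0), (3.0, 4.0), obstacles=[])
    print(len(route), route[-1])
```

With obstacles present, plain gradient following of this kind can stall in local minima, so a practical planner would need an escape strategy that this sketch omits.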

    Effect of measurement errors on the estimation of tree biomass

    Diameter at breast height (DBH) is commonly used to predict the aboveground biomass (AGB) of forests and to derive biomass models for single trees; however, there is evidence that measurement errors in DBH have not previously been considered. In this study, two types of measurement error were evaluated: errors in national forest inventory data (NFID) and errors in a calibration data set (CDS). Using Monte Carlo simulations, the uncertainties arising from these two measurement errors were quantified. In addition, the effects of measurement errors on estimates under different error assumptions were analyzed to determine how these two uncertainties change with increasing errors. The results show that CDS measurement error contributes more to the total uncertainty, whereas NFID measurement error has a negligible effect on estimating the biomass of regional forests. The uncertainties of both types of measurement error increased as the assumed errors increased; however, the uncertainties caused by CDS measurement error were noticeably larger than those caused by NFID measurement error. Thus, the greatest potential for reducing uncertainties caused by measurement error lies in increasing the accuracy of DBH measurements in the CDS.
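As a rough illustration of how a Monte Carlo simulation can propagate DBH measurement error into a biomass total, the sketch below perturbs each inventory DBH and recomputes the summed AGB. Everything specific here is an assumption rather than the study's method: the power-law model `AGB = a * DBH^b`, the placeholder coefficients `a` and `b`, the Gaussian relative error, and the function names. It covers only inventory-side (NFID-type) error; calibration-side (CDS) error would additionally require refitting the model in each run.

```python
import random
import statistics

def agb(dbh_cm, a=0.1, b=2.4):
    """Hypothetical allometric model AGB = a * DBH^b (placeholder coefficients)."""
    return a * dbh_cm ** b

def simulate_total_agb(dbh_values, rel_error_sd, n_runs=1000, seed=0):
    """Return (mean total AGB, coefficient of variation) over Monte Carlo runs.

    Each run multiplies every DBH by (1 + e), with e drawn from N(0, rel_error_sd).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    totals = []
    for _ in range(n_runs):
        total = 0.0
        for d in dbh_values:
            noisy = max(d * (1.0 + rng.gauss(0.0, rel_error_sd)), 0.0)
            total += agb(noisy)
        totals.append(total)
    mean_total = statistics.mean(totals)
    return mean_total, statistics.stdev(totals) / mean_total

if __name__ == "__main__":
    plot_dbh = [12.0, 18.5, 25.0, 31.2, 40.0]  # made-up DBH values in cm
    for sd in (0.01, 0.05):
        mean_total, cv = simulate_total_agb(plot_dbh, sd)
        print(f"error sd={sd:.2f}: mean total AGB={mean_total:.1f}, CV={cv:.4f}")
```

Comparing the coefficient of variation across assumed error levels mirrors, in a very simplified form, the abstract's analysis of how uncertainty changes with increasing error.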