
    Methodological Issues in Building, Training, and Testing Artificial Neural Networks

    We review the use of artificial neural networks, particularly the feedforward multilayer perceptron with back-propagation for training (MLP), in ecological modelling. Overtraining on data, or giving only vague references to how it was avoided, is the major problem. Various methods can be used to determine when to stop training in artificial neural networks: 1) early stopping based on cross-validation, 2) stopping after an analyst-defined error is reached or after the error levels off, 3) use of a test data set. We do not recommend the third method, as the test data set is then not independent of model development. Many studies used the testing data to optimize the model and training. Although this method may give the best model for that set of data, it does not give generalizability or improve understanding of the study system. The importance of an independent data set cannot be overemphasized, as we found dramatic differences in model accuracy assessed with prediction accuracy on the training data set, as estimated with bootstrapping, and from use of an independent data set. The comparison of the artificial neural network with a general linear model (GLM) as a standard procedure is recommended because a GLM may perform as well as or better than the MLP. MLP models should not be treated as black box models; instead, techniques such as sensitivity analyses, input variable relevances, neural interpretation diagrams, randomization tests, and partial derivatives should be used to make the model more transparent and to further our ecological understanding, which is an important goal of the modelling process. Based on our experience, we discuss how to build an MLP model and how to optimize the parameters and architecture.
    Comment: 22 pages, 2 figures. Presented in ISEI3 (2002). Ecological Modelling in pres
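The first stopping rule in this abstract (early stopping on a held-out validation error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the error values in the usage note are all hypothetical.

```python
def early_stopping_epoch(val_errors, patience=3, min_delta=0.0):
    """Scan per-epoch validation errors and decide where training would stop.

    Returns (best_epoch, stop_epoch): the epoch with the lowest validation
    error seen so far, and the epoch at which training halts because no
    improvement larger than min_delta occurred for `patience` epochs.
    `patience` and `min_delta` are illustrative knobs, not values from the paper.
    """
    best_error = float("inf")
    best_epoch = -1
    for epoch, err in enumerate(val_errors):
        if err < best_error - min_delta:
            # Validation error improved: remember this epoch as the best one.
            best_error = err
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Ran out of epochs without triggering the stopping rule.
    return best_epoch, len(val_errors) - 1
```

With the made-up error curve `[0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.65]` and `patience=3`, the sketch keeps the weights from epoch 3 and stops at epoch 6. The validation errors must come from data that is separate from the final test set, which is the point the abstract makes about the third method.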

    Probabilistic Mapping and Spatial Pattern Analysis of Grazing Lawns in Southern African Savannahs Using WorldView-3 Imagery and Machine Learning Techniques

    Savannah grazing lawns are a key food resource for large herbivores such as blue wildebeest (Connochaetes taurinus), hippopotamus (Hippopotamus amphibius) and white rhino (Ceratotherium simum), and impact herbivore densities, movement and recruitment rates. They also exert a strong influence on fire behaviour including frequency, intensity and spread. Thus, variation in grazing lawn cover can have a profound impact on broader savannah ecosystem dynamics. However, knowledge of their present cover and distribution is limited. Importantly, we lack a robust, broad-scale approach for detecting and monitoring grazing lawns, which is critical to enhancing understanding of the ecology of these vital grassland systems. We selected two sites in the Lower Sabie and Satara regions of Kruger National Park, South Africa with mesic and semiarid conditions, respectively. Using spectral and texture features derived from WorldView-3 imagery, we (i) parameterised and assessed the quality of Random Forest (RF), Support Vector Machines (SVM), Classification and Regression Trees (CART) and Multilayer Perceptron (MLP) models for general discrimination of plant functional types (PFTs) within a sub-area of the Lower Sabie landscape, and (ii) compared model performance for probabilistic mapping of grazing lawns in the broader Lower Sabie and Satara landscapes. Further, we used spatial metrics to analyse spatial patterns in grazing lawn distribution in both landscapes along a gradient of distance from waterbodies. All machine learning models achieved high F-scores (F1) and overall accuracy (OA) scores in general savannah PFTs classification, with RF (F1 = 95.73±0.004%, OA = 94.16±0.004%), SVM (F1 = 95.64±0.002%, OA = 94.02±0.002%) and MLP (F1 = 95.71±0.003%, OA = 94.27±0.003%) forming a cluster of the better performing models and marginally outperforming CART (F1 = 92.74±0.006%, OA = 90.93±0.003%). 
Grazing lawn detection accuracy followed a similar trend within the Lower Sabie landscape, with RF, SVM, MLP and CART achieving F-scores of 0.89, 0.93, 0.94 and 0.81, respectively. Transferring models to the Satara landscape, however, resulted in lower but still high grazing lawn detection accuracies across models (RF = 0.87, SVM = 0.88, MLP = 0.85 and CART = 0.75). Results from spatial pattern analysis revealed a higher proportion of grazing lawn cover under semiarid savannah conditions (Satara) than in the mesic savannah landscape (Lower Sabie). Additionally, the results show a strong negative correlation between grazing lawn spatial structure (fractional cover, patch size and connectivity) and distance from waterbodies, with larger and more contiguous grazing lawn patches occurring in close proximity to waterbodies in both landscapes. The proposed machine learning approach provides a novel and robust workflow for accurate and consistent landscape-scale monitoring of grazing lawns, while our findings and research outputs provide timely information critical for understanding habitat heterogeneity in southern African savannah
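The two metrics reported in this abstract, F-score (F1) and overall accuracy (OA), can be computed from confusion-matrix counts as sketched below. This is only an illustration of the standard definitions: the function names and the counts in the usage note are hypothetical, and the abstract does not say how the authors averaged F1 across plant functional types.

```python
def f1_score(tp, fp, fn):
    """F1 for one class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples over all samples.

    `confusion` is a square list of lists where confusion[i][j] counts
    samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0
```

For example, with made-up counts of 90 true positives, 10 false positives and 10 false negatives, `f1_score(90, 10, 10)` is 0.9, and `overall_accuracy([[50, 5], [10, 35]])` is 0.85.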

    Investigation into the Predictive Power of Artificial Neural Networks and Logistic Regression for Predicting Default in Chit Funds

    This study evaluated the performance of an artificial neural network (ANN) multi-layer perceptron model and a logistic regression logitboost (LR) model to predict default in chit funds. The two types of default investigated were late payment of 30 days and late payment of 90 days. The dataset was split into training and validation datasets using random sampling, and k-fold cross-validation was used on the training dataset to assess the performance of the tuning parameters. The validation dataset was used to compare the performance of both algorithms. Principal component analysis (PCA) was used to reduce the feature set while still explaining 95% of the variance in the data. The classes were highly imbalanced, and Synthetic Minority Oversampling Technique (SMOTE) and down-sampling were used to overcome the class imbalance. Sixteen experiments were run, eight for each of the two defaults. The three key metrics measured for these experiments were balanced accuracy, area under the ROC curve (AUC) and F1 score. After applying a Bonferroni adjustment to the original p-value threshold, the statistical significance level was set to 0.003 when comparing multiple experiments. In these experiments the ANN model had the best results for balanced accuracy, AUC and F1 score. Statistical analysis using a paired t-test showed that there was a statistically significant difference in the results between ANN and LR. The results of these experiments also showed that there was very little difference in the contribution of the top 20 features to the first 30 principal components, which were used to predict default. These features included family id, income and address. Features that had little or no contribution to the principal components included Commission, Auction Amount, and the type of relation the nominee is to the chit fund member. These findings are context specific and in this case the context is chit funds from a digital chit fund operator in Indi
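Two quantities in this abstract lend themselves to a short worked example: the Bonferroni-adjusted significance level and balanced accuracy. The sketch below assumes a family-wise level of 0.05 divided across 16 comparisons, which reproduces the reported 0.003 after rounding; the abstract itself does not state either number, so both are assumptions. The confusion counts in the usage note are made up.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / n_comparisons


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity for a binary classifier.

    Unlike plain accuracy, this is not inflated by a dominant majority class.
    """
    sensitivity = tp / (tp + fn)  # recall on the positive (default) class
    specificity = tn / (tn + fp)  # recall on the negative (no default) class
    return (sensitivity + specificity) / 2


# Assumed inputs: family-wise alpha of 0.05 and 16 comparisons.
adjusted = bonferroni_alpha(0.05, 16)  # 0.003125, i.e. about 0.003
```

With hypothetical counts of 40 true positives, 10 false negatives, 900 true negatives and 50 false positives, plain accuracy is 0.94 while `balanced_accuracy(40, 10, 900, 50)` is about 0.874, which shows why the latter is the more informative metric on highly imbalanced classes.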

    Improving the prediction of air pollution peak episodes generated by urban transport networks

    This paper illustrates the early results of ongoing research developing novel methods to analyse and simulate the relationship between transport-related air pollutant concentrations and easily accessible explanatory variables. The final aim is to integrate the new models into traditional traffic management support systems for sustainable mobility of road vehicles in urban areas. This first stage concerns the relationship between the hourly mean concentration of nitrogen dioxide (NO2) and explanatory factors reflecting the NO2 mean level one hour back, along with traffic and weather conditions. Particular attention is given to the prediction of pollution peaks, defined as exceedances of normative concentration limits. Two model frameworks are explored: the Artificial Neural Network approach and the ARIMAX model. Furthermore, the benefit of a synergic use of both models for air quality forecasting is investigated. The analysis of findings points out that the prediction of extreme concentrations is best performed by integrating the two models into an ensemble. The neural network is outperformed by the ARIMAX model in foreseeing peaks, but gives a more realistic representation of the concentration's dependency upon wind characteristics. So, the Neural Network can be exploited to highlight the involved functional forms and improve the ARIMAX model specification. In the end, the study shows that the ability to forecast exceedances of legal pollution limits can be enhanced by requiring traffic management actions when the predicted concentration exceeds a lower threshold than the normative one
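The closing idea of this abstract (triggering action when the forecast exceeds a threshold set below the legal limit) can be sketched as a simple count of hits, misses and false alarms. This is a hypothetical illustration, not the authors' evaluation: the function name, the limit of 200.0 and all concentration values are assumptions introduced here.

```python
def exceedance_outcomes(predicted, observed, legal_limit, alert_threshold):
    """Count (hits, misses, false_alarms) for an alert rule.

    An exceedance occurs when the observed concentration is above
    `legal_limit`; an alert is raised when the predicted concentration
    is above `alert_threshold`, which may be set below the legal limit.
    """
    hits = misses = false_alarms = 0
    for pred, obs in zip(predicted, observed):
        exceeded = obs > legal_limit
        alerted = pred > alert_threshold
        if exceeded and alerted:
            hits += 1
        elif exceeded:
            misses += 1
        elif alerted:
            false_alarms += 1
    return hits, misses, false_alarms


# Made-up hourly values in which the forecast underestimates the peaks.
predicted = [150.0, 190.0, 210.0, 170.0, 120.0]
observed = [160.0, 205.0, 220.0, 210.0, 110.0]
```

With these invented values, alerting at the limit itself (`alert_threshold=200.0`) catches one of three exceedances, while a lower threshold of 165.0 catches all three without a false alarm; lowering it further to 140.0 introduces one false alarm. The trade-off between missed peaks and unnecessary traffic measures is what the choice of threshold controls.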

    Is the vessel fishing? Discrimination of fishing activity with low-cost intelligent mobile devices through traditional and heuristic approaches

    Knowing the activity of fishing vessels accurately and in real time represents a leap in quality in the management of fishing activity. This paper presents the development of a new fishing activity monitoring integral system (FAMIS) that can complement and overcome the limitations of current fishing vessel monitoring systems (VMS). FAMIS is built on a low-cost mobile device with GPS, accelerometer, gyroscope and magnetic field sensors, and integrates different statistical methods (discriminant functions) and heuristics (artificial neural networks and support vector machines) as techniques to classify the information recorded by the sensors of a mobile device during fishing activity. The results obtained with FAMIS indicate that, in general, heuristics have a high degree of discrimination of each of the phases of a fishing operation and that, in particular, multilayer perceptrons (MLPs) are capable of correctly identifying 96.3% of towing phases using only GPS and gyro sensors

    Deciphering signatures of natural selection via deep learning

    XQ was supported by a PhD scholarship from the China Scholarship Council and is now supported by the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) from the China Postdoc Council. CWKC is supported in part by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (award number R35GM142783). Computation for this work is supported in part by USC’s Center for Advanced Research Computing (https://www.carc.usc.edu/).
    Identifying genomic regions influenced by natural selection provides fundamental insights into the genetic basis of local adaptation. However, it remains challenging to detect loci under complex spatially varying selection. We propose a deep learning-based framework, DeepGenomeScan, which can detect signatures of spatially varying selection. We demonstrate that DeepGenomeScan outperformed principal component analysis- and redundancy analysis-based genome scans in identifying loci underlying quantitative traits subject to complex spatial patterns of selection. Notably, DeepGenomeScan increases statistical power by up to 47.25% under nonlinear environmental selection patterns. We applied DeepGenomeScan to a European human genetic dataset and identified some well-known genes under selection and a substantial number of clinically important genes that were not identified by SPA, iHS, Fst and Bayenv when applied to the same dataset.
    Publisher PDF. Peer reviewe

    Estimating the concentration of physico chemical parameters in hydroelectric power plant reservoir

    The United Nations Educational, Scientific and Cultural Organization (UNESCO) defines the Amazon region and adjacent areas, such as the Pantanal, as world heritage territories, since they possess unique flora and fauna and great biodiversity. Unfortunately, these regions have increasingly been suffering from anthropogenic impacts. One of the main anthropogenic impacts in the last decades has been the construction of hydroelectric power plants. As a result, dramatic alteration of these ecosystems has been observed, including changes in water levels, decreased oxygenation and loss of downstream organic matter, with consequent intense land use and population influxes after the filling and operation of these reservoirs. This, in turn, leads to extreme loss of biodiversity in these areas, due to the large-scale deforestation. The fishing industry in place before construction of dams and reservoirs, for example, has become much more intense, attracting large populations in search of work, employment and income. Environmental monitoring is fundamental for reservoir management, and several studies around the world have been performed in order to evaluate the water quality of these ecosystems. The Brazilian Amazon, in particular, goes through well-defined annual hydrological cycles, which are very important since their study aids in monitoring anthropogenic environmental impacts and can inform policy and decision making with regard to environmental management of this area. The water quality of Amazon reservoirs is greatly influenced by this defined hydrological cycle, which, in turn, causes variations in microbiological, physical and chemical characteristics. Eutrophication, one of the main processes leading to water deterioration in lentic environments, is mostly caused by anthropogenic activities, such as the release of industrial and domestic effluents into water bodies.
Physico-chemical water parameters typically related to eutrophication are, among others, chlorophyll-a levels, transparency and total suspended solids, which can thus be used to assess the eutrophic state of water bodies. Usually, these parameters must be investigated by going out to the field and manually measuring water transparency with a Secchi disk, and by taking water samples to the laboratory in order to obtain chlorophyll-a and total suspended solid concentrations. These processes are time-consuming and require trained personnel. However, we have proposed other techniques for environmental monitoring studies which do not require fieldwork, such as remote sensing and computational intelligence. Simulations in different reservoirs were performed to determine a relationship between these physico-chemical parameters and the spectral response. Based on the in situ measurements, empirical models were established to relate these parameters to the reflectance of the reservoir measured by the satellites. The images were calibrated and atmospherically corrected. Statistical analysis using error estimation was used to evaluate the most accurate methodology. The neural networks were trained by hydrological cycle, and were useful for estimating the physico-chemical parameters of the water from the reflectance of the visible and NIR bands of satellite images, with better results for the period with few clouds in the regions analyzed. The present study shows the application of a wavelet neural network to estimate water quality parameters using concentrations from water samples collected in the Amazon reservoir and Cefni reservoir, UK. Satellite images from Landsat and Sentinel-2 were used to train the ANN by hydrological cycle. The trained ANNs demonstrated good agreement between observed and estimated values after atmospheric correction of the satellite images.
The ANNs presented in the results are useful for estimating these concentrations using remote sensing and wavelet transforms for image processing. Therefore, the techniques proposed and applied in the present study are noteworthy since they can aid in evaluating important physico-chemical parameters, which, in turn, allows for identification of possible anthropogenic impacts and is relevant to environmental management and policy decision-making processes. The test results showed that the predicted values have good accuracy, improving the efficiency of monitoring water quality parameters and confirming the reliability and accuracy of the approaches proposed for monitoring water reservoirs. This thesis contributes to the evaluation of the accuracy of different methods for estimating physico-chemical parameters from satellite images and artificial neural networks. For future work, the accuracy of the results can be improved by adding more satellite images and testing new neural networks with applications in new water reservoirs

    Cyber-Physical System Intrusion: A Case Study of Automobile Identification Vulnerabilities and Automated Approaches for Intrusion Detection

    Today's vehicle manufacturers do not tend to publish proprietary packet formats for the controller area network (CAN), a network protocol regularly used in automobiles and manufacturing. This is a form of security through obscurity (it makes reverse engineering efforts more difficult for would-be intruders), but obfuscating the CAN data in this way does not adequately hide the vehicle's unique signature, even if these data are unprocessed or limited in scope. To prove this, we train two distinct deep learning models on data from 11 different vehicles. Our results clearly indicate that one can determine which vehicle generated a given sample of CAN data. This erodes consumer safety: a sophisticated attacker who establishes a presence on an unknown vehicle can use similar techniques to identify the vehicle and better format attacks. To protect critical cyber-physical systems (CPSs) against attacks like those enabled by this CAN vulnerability, system administrators often develop and employ intrusion detection systems (IDSs). Before developing an IDS, one requires an understanding of the behavior of the CPS and of the causality of its constituent parts. Such an understanding allows one to characterize normal behavior and, in turn, identify and report anomalous behavior. This research explores two different time series analysis techniques, Granger causality and empirical dynamic modeling (EDM), which may contribute to this understanding of a system. Our findings indicate that Granger causality is not a suitable approach to IDS development but that EDM may enable the understanding of a system required of an IDS architect. We thus encourage further research into EDM applications to IDSs for CPSs