
    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided, and the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.

    Landslide Susceptibility Mapping Using the Stacking Ensemble Machine Learning Method in Lushui, Southwest China

    Landslide susceptibility mapping is considered a prerequisite for landslide prevention and mitigation. However, delineating the spatial occurrence pattern of landslides remains a challenge. This study investigates the potential application of the stacking ensemble learning technique for landslide susceptibility assessment. In particular, support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), and naive Bayes (NB) were selected as base learners for the stacking ensemble method. A resampling scheme and Pearson’s correlation analysis were jointly used to evaluate the importance level of these base learners. A total of 388 landslides and 12 conditioning factors in the Lushui area (Southwest China) were used as the dataset to develop the landslide models. The landslides were randomly separated into two parts, with 70% used for model training and 30% used for model validation. The models’ performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) and statistical measures. The results showed that the stacking-based ensemble models achieved improved predictive accuracy compared to the single algorithms, while the SVM-ANN-NB-LR (SANL) model, the SVM-ANN-NB (SAN) model, and the ANN-NB-LR (ANL) model performed equally well, with AUC values of 0.931, 0.940, and 0.932, respectively, in the validation stage. The correlation coefficient between LR and SVM was the highest across all resampling rounds, with an average value of 0.72. This indicates that LR and SVM played an almost equal role when the SANL ensemble was applied for landslide susceptibility analysis. Therefore, it is feasible to use the SAN model or the ANL model for the study area. The findings of this study suggest that the stacking ensemble machine learning method is promising for landslide susceptibility mapping in the Lushui area and is capable of targeting areas prone to landslides.
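    As a minimal sketch of the stacking design described above, the snippet below combines SVM, ANN, NB, and LR base learners under a stacking classifier and reports a validation AUC on a 70/30 split. The data, hyperparameters, and the choice of logistic regression as the meta-learner are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: stacking ensemble for binary landslide susceptibility.
# X and y are synthetic placeholders standing in for the 12 conditioning
# factors and landslide/non-landslide labels used in the study.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(776, 12))        # 12 conditioning factors (placeholder values)
y = rng.integers(0, 2, size=776)      # 1 = landslide, 0 = non-landslide (placeholder)

# 70% training / 30% validation split, mirroring the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("ann", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000))),
    ("nb", GaussianNB()),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Assumed meta-learner: logistic regression combining base-learner outputs.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)

auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"validation AUC: {auc:.3f}")
```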

    A survey on pre-processing techniques: relevant issues in the context of environmental data mining

    One of the important issues in all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when built over incorrect or incomplete data. As a consequence, the quality of decisions made upon these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not yet been properly systematized, and little research is focused on it. In this paper a survey of the most popular pre-processing steps required in environmental data analysis is presented, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to a non-expert user, who, after reading them, can decide which technique is the most suitable for his/her problem.
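    To make the kind of pre-processing steps surveyed here concrete, the sketch below applies three common operations (missing-value imputation, outlier handling, and scaling) to a small hypothetical environmental table. The column names, values, and the specific imputation/outlier rules are illustrative assumptions, not prescriptions from the survey.

```python
# Hedged sketch of typical pre-processing steps on a toy environmental table.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [12.1, 11.8, np.nan, 13.0, 95.0],  # 95.0 is an implausible spike
    "flow_rate":   [3.2, 3.1, 3.4, np.nan, 3.3],
})

# 1. Impute missing values with the column median (one simple choice among many).
df = df.fillna(df.median(numeric_only=True))

# 2. Flag gross outliers with an IQR fence and replace them with the column median.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
for col in df.columns:
    bad = (df[col] < q1[col] - 1.5 * iqr[col]) | (df[col] > q3[col] + 1.5 * iqr[col])
    df.loc[bad, col] = df[col].median()

# 3. Standardise features so downstream models are not dominated by scale.
df_scaled = (df - df.mean()) / df.std()
print(df_scaled)
```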

    Radiomic data mining and machine learning on preoperative pituitary adenoma MRI

    Pituitary adenomas are among the most frequent intracranial tumors, accounting for the majority of sellar/suprasellar masses in adults. MRI is the preferred imaging modality for detecting pituitary adenomas. Radiomics is the conversion of digital medical images into mineable high-dimensional data. This process is motivated by the concept that biomedical images contain information reflecting the underlying pathophysiology and that these relationships can be revealed via quantitative image analyses. The aim of this thesis is to apply machine learning algorithms to parameters obtained by texture analysis of MRI images in order to distinguish functional from non-functional pituitary macroadenomas, to predict their Ki-67 proliferation index class, and to predict pituitary macroadenoma surgical consistency prior to an endoscopic endonasal procedure.
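    The classification stage of such a radiomics workflow might look like the sketch below, which assumes texture features have already been extracted from the MRI volumes into a feature matrix. The feature values, labels, feature-selection step, and SVM classifier are illustrative assumptions; the thesis's actual pipeline is not reproduced here.

```python
# Hedged sketch: classify functional vs. non-functional macroadenomas from
# pre-extracted texture features (random placeholders, not real radiomic data).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 40))      # 40 hypothetical texture features per patient
y = rng.integers(0, 2, size=90)    # 1 = functional, 0 = non-functional (placeholder)

# Scale features, keep the most class-discriminative ones, then classify.
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=10),
                      SVC(kernel="rbf", probability=True))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} ± {auc.std():.2f}")
```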

    An academic review: applications of data mining techniques in finance industry

    With the development of Internet technologies, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data analytics has therefore become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand this hugely accumulated data and leverage it to gain insights into the finance industry. In order to obtain actionable insights into the business, data has become the most valuable asset of financial organisations, since the finance industry has no physical products to manufacture. This is where data mining techniques come to the rescue, by allowing access to the right information at the right time. These techniques are used by the finance industry in areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering detection, marketing and prediction of price movements, to name a few. This work surveys the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of the data, nonlinear mapping techniques have been studied more deeply than linear techniques. The surveyed studies also report that hybrid methods are the most accurate in prediction, closely followed by neural network techniques. This survey provides an overview of applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area. In particular, it could give beginners who want to work in computational finance a good overview of data mining techniques in the field.

    Multi-aspect rule-based AI: Methods, taxonomy, challenges and directions towards automation, intelligence and transparent cybersecurity modeling for critical infrastructures

    Critical infrastructure (CI) typically refers to the essential physical and virtual systems, assets, and services that are vital for the functioning and well-being of a society, economy, or nation. However, the rapid proliferation and dynamism of today's cyber threats in digital environments may disrupt CI functionalities, which would have a debilitating impact on public safety, economic stability, and national security. This has led to much interest in effective cybersecurity solutions regarding automation and intelligent decision-making, where AI-based modeling is potentially significant. In this paper, we consider “Rule-based AI” rather than other, black-box solutions, since model transparency, i.e., human interpretation, explainability, and trustworthiness in decision-making, is an essential factor, particularly in cybersecurity application areas. This article provides an in-depth study of multi-aspect rule-based AI modeling, considering human-interpretable decisions as well as security automation and intelligence for CI. We also provide a taxonomy of rule generation methods that takes into account not only knowledge-driven approaches based on human expertise but also data-driven approaches, i.e., extracting insights or useful knowledge from data, as well as their hybridization. This understanding can help security analysts and professionals comprehend how systems work, identify potential threats and anomalies, and make better decisions in various real-world application areas. We also cover how these techniques can address diverse cybersecurity concerns such as threat detection, mitigation, prediction, and diagnosis for root-cause finding in different CI sectors, such as energy, defence, transport, health, water, and agriculture. We conclude the paper with a list of identified issues and opportunities for future research, together with potential solution directions for how researchers and professionals might tackle next-generation cybersecurity modeling in this emerging area of study.
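    One data-driven rule generation approach of the kind covered by such a taxonomy is extracting IF-THEN rules from a shallow decision tree, which keeps the resulting model human-auditable. The sketch below illustrates this on hypothetical network-flow features; the feature names, data, and threat label are assumptions for illustration only and are not drawn from the paper.

```python
# Hedged sketch: data-driven rule generation for transparent threat detection.
# A depth-limited decision tree is trained on synthetic flow features and its
# branches are printed as human-readable IF-THEN rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
feature_names = ["packet_rate", "failed_logins", "bytes_out", "session_length"]
X = rng.normal(size=(500, 4))
# Synthetic "suspicious" label driven by two of the features (placeholder logic).
y = ((X[:, 1] > 1.0) | (X[:, 0] > 2.0)).astype(int)

# Depth is capped so the extracted rule set stays small enough to audit.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```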