2,367 research outputs found

    Recognition decision-making model using temporal data mining technique

    An accurate and timely decision is crucial in any emergency situation. This paper presents a recognition decision-making model that adopts the temporal data mining approach to decision making. Reservoir water level and rainfall measurements were used as the case study to test the developed computational recognition-primed decision (RPD) model in predicting the amount of water to be dispatched, represented by the number of spillway gates. Experimental results indicated that new events can be predicted from historical events. Patterns were extracted and can be transformed into a readable and descriptive rule-based form
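
    As a sketch of the pattern-matching idea behind the RPD model above, the snippet below segments a water level series with a sliding window and reuses the decision recorded for the nearest historical pattern. The window width, level values, and gate counts are illustrative, not taken from the paper.

```python
# Hypothetical sketch of recognition-primed prediction from temporal patterns.
# All numbers are invented for illustration.

def sliding_windows(series, width):
    """Segment a time series into overlapping windows of fixed width."""
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

def predict_gates(history, gates_opened, new_window):
    """Match a new pattern to the nearest historical window (L1 distance)
    and reuse its recorded decision (number of spillway gates)."""
    best = min(range(len(history)),
               key=lambda i: sum(abs(a - b) for a, b in zip(history[i], new_window)))
    return gates_opened[best]

levels = [28.1, 28.4, 29.0, 29.9, 30.5, 30.2]   # reservoir water level (m)
windows = sliding_windows(levels, 3)
gates = [0, 1, 2, 2]                             # decision recorded per window
print(predict_gates(windows, gates, (29.8, 30.4, 30.1)))   # → 2
```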

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to achieve a better decision-making process. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust
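
    The data-quality concerns discussed above can be illustrated with a minimal profiling pass of the kind a knowledge discovery pipeline would run first. The sentinel value and valid range below are invented for the example.

```python
# Minimal sketch of common data-quality checks: missing values,
# out-of-range readings, and duplicate records.

def quality_report(rows, valid_range):
    """Count basic quality problems in a column of numeric readings."""
    lo, hi = valid_range
    missing = sum(1 for r in rows if r is None)
    present = [r for r in rows if r is not None]
    out_of_range = sum(1 for r in present if not lo <= r <= hi)
    duplicates = len(present) - len(set(present))
    return {"missing": missing, "out_of_range": out_of_range, "duplicates": duplicates}

readings = [12.0, None, 13.5, 13.5, -999.0, 14.1]   # -999 is a sentinel error code
print(quality_report(readings, valid_range=(0.0, 100.0)))
# → {'missing': 1, 'out_of_range': 1, 'duplicates': 1}
```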

    Reservoir water release dynamic decision model based on spatial temporal pattern

    The multi-purpose reservoir water release decision requires an expert to assemble complex decision information that occurs in real time. The decision needs to consider an adequate reservoir water balance in order to maintain the reservoir's multi-purpose function and provide enough space for incoming heavy rainfall and inflow. Crucially, the water release should not exceed the downstream maximum river level, so that it does not cause a flood. Rainfall and water level are fuzzy information; thus, the decision model needs the ability to handle fuzzy information. Moreover, rainfall recorded at different locations takes different amounts of time to reach the reservoir. This situation shows that there is a spatial temporal relationship hidden between each gauging station and the reservoir. Thus, this study proposes a dynamic reservoir water release decision model that utilizes both spatial and temporal information in the input pattern. Based on the patterns, the model suggests when the reservoir water should be released. The model adopts the Adaptive Neuro-Fuzzy Inference System (ANFIS) in order to deal with fuzzy information. The data used in this study were obtained from the Perlis Department of Irrigation and Drainage. A modified Sliding Window algorithm was used to construct the rainfall temporal pattern, while the spatial information was established by simulating the mapped rainfall and reservoir water level pattern. The model performance was measured using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Findings from this study show that ANFIS produces the lowest RMSE and MAE when compared to the Autoregressive Integrated Moving Average (ARIMA) and Backpropagation Neural Network (BPNN) models. The model can be used by the reservoir operator to assist decision making and to support a new reservoir operator in the absence of an experienced one
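
    The two error measures used above to compare ANFIS against ARIMA and BPNN can be stated in a few lines. The level values below are illustrative, not from the study.

```python
import math

# RMSE and MAE for a forecast series against observations.

def rmse(actual, predicted):
    """Root Mean Square Error: penalises large deviations more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual    = [30.1, 30.4, 30.8, 31.0]   # observed water levels (m)
predicted = [30.0, 30.5, 30.6, 31.1]   # model output (m)
print(round(rmse(actual, predicted), 3), round(mae(actual, predicted), 3))
# → 0.132 0.125
```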

    Forecasting model for the change in stage of reservoir water level

    A reservoir is one of the major structural approaches for flood mitigation. During floods, early reservoir water release is one of the actions taken by the reservoir operator to accommodate incoming heavy rainfall. Late water release might damage the reservoir structure and cause floods in the downstream area. However, current rainfall may not directly influence the change of reservoir water level: a delay may occur, as the streamflow that carries the water might take some time to reach the reservoir. This study aims to develop a forecasting model for the change in stage of reservoir water level. The model takes the changes of reservoir water level and its stage as the input, and the future change in stage of reservoir water level as the output. In this study, the Timah Tasoh reservoir operational data were obtained from the Perlis Department of Irrigation and Drainage (DID). The reservoir water level was categorised into stages based on the DID manual. A modified sliding window algorithm was deployed to segment the data into temporal patterns. Based on the patterns, three models were developed: the reservoir water level model; the change of reservoir water level and stage of reservoir water level model; and the combination of the change of reservoir water level and stage of reservoir water level model. All models were simulated using a neural network, and their performances were compared using the mean square error (MSE) and percentage of correctness. The results show that the change of reservoir water level and stage of reservoir water level model produces the lowest MSE and the highest percentage of correctness when compared to the other two models. The findings also show that a delay of two previous days affects the change in stage of reservoir water level. The model can be applied to support early reservoir water release decision making, thus reducing the impact of floods in the downstream area
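
    The segmentation step above, pairing the changes of the two previous days with the next stage of water level, can be sketched as follows. The stage thresholds are invented for the example and are not from the DID manual.

```python
# Hypothetical sketch: categorise water level into stages and build
# two-day-delay input patterns, echoing the reported two-day delay effect.

STAGES = [(29.5, "normal"), (30.5, "alert"), (float("inf"), "danger")]

def stage(level):
    """Map a water level (m) to its stage name using illustrative thresholds."""
    for threshold, name in STAGES:
        if level < threshold:
            return name

def delay_patterns(levels, delay=2):
    """Pair the level changes over the last `delay` days with the next day's stage."""
    out = []
    for t in range(delay, len(levels) - 1):
        changes = [levels[t - i] - levels[t - i - 1] for i in range(delay - 1, -1, -1)]
        out.append((changes, stage(levels[t + 1])))
    return out

levels = [29.0, 29.3, 29.8, 30.6, 30.4]
for pattern, next_stage in delay_patterns(levels):
    print(pattern, next_stage)
```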

    An enhanced resampling technique for imbalanced data sets

    A data set is considered imbalanced if the number of instances in one class (the majority class) outnumbers that of the other class (the minority class). The main problem with binary imbalanced data sets is that classifiers tend to ignore the minority class. Numerous resampling techniques, such as undersampling, oversampling, and combinations of both, have been widely used. However, undersampling and oversampling suffer from the elimination and the addition of relevant data, which may lead to poor classification results. Hence, this study aims to improve classification metrics by enhancing the undersampling technique and combining it with an existing oversampling technique. To achieve this objective, Fuzzy Distance-based Undersampling (FDUS) is proposed. Entropy estimation is used to produce fuzzy thresholds that categorise the instances in the majority and minority classes into membership functions. FDUS is then combined with the Synthetic Minority Oversampling TEchnique (SMOTE), known as FDUS+SMOTE, which is executed in sequence until a balanced data set is achieved. FDUS and FDUS+SMOTE were compared with four techniques based on classification accuracy, F-measure and G-mean. From the results, FDUS achieved better classification accuracy, F-measure and G-mean than the other techniques, with averages of 80.57%, 0.85 and 0.78, respectively. This showed that fuzzy logic, when incorporated with the Distance-based Undersampling technique, was able to reduce the elimination of relevant data. Further, the findings showed that FDUS+SMOTE performed better than the combinations of SMOTE with Tomek Links and of SMOTE with Edited Nearest Neighbour on benchmark data sets. FDUS+SMOTE minimised the removal of relevant data from the majority class and avoided overfitting. On average, FDUS and FDUS+SMOTE were able to balance categorical, integer and real data sets and enhanced the performance of binary classification. Furthermore, the techniques performed well on small data sets with approximately 100 to 800 instances
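
    A simplified sketch of the two-stage resampling idea follows: the fuzzy, entropy-derived thresholds of the actual FDUS are reduced here to a plain distance ranking, and the SMOTE step is a naive pairwise interpolation. All data points are invented.

```python
import random

# Stage 1: distance-based undersampling of the majority class.
# Stage 2: SMOTE-style synthesis of minority instances by interpolation.

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def undersample(majority, minority, keep):
    """Keep the `keep` majority instances closest to the minority class,
    discarding the most redundant (farthest) ones."""
    scored = sorted(majority, key=lambda m: min(dist(m, p) for p in minority))
    return scored[:keep]

def smote(minority, n_new, rng):
    """Create synthetic minority points on the segment between random pairs."""
    new = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        new.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return new

rng = random.Random(0)
majority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0)]
minority = [(0.9, 1.0), (1.1, 1.2)]
kept = undersample(majority, minority, keep=3)
synth = smote(minority, n_new=1, rng=rng)
print(len(kept), len(minority) + len(synth))   # balanced: 3 vs 3
```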

    An improved algorithm for identifying shallow and deep-seated landslides in dense tropical forest from airborne laser scanning data

    © 2018 Landslides are natural disasters that cause environmental and infrastructure damage worldwide. They are difficult to recognize, particularly in the densely vegetated regions of tropical forest areas. Consequently, an accurate inventory map is required to analyze landslide susceptibility, hazard, and risk. Several studies have been done to differentiate between types of landslide (i.e. shallow and deep-seated); however, none of them utilized any feature selection techniques. Thus, in this study, three feature selection techniques were used: correlation-based feature selection (CFS), random forest (RF), and ant colony optimization (ACO). A fuzzy-based segmentation parameter optimizer (FbSP optimizer) was used to optimize the segmentation parameters. Random forest (RF) was used to evaluate the performance of each feature selection algorithm. The overall accuracies of the RF classifier revealed that the CFS algorithm ranked highest in differentiating landslide types. Moreover, the transferability results showed that this method is easy, accurate, and highly suitable for differentiating between types of landslides (shallow and deep-seated). In summary, the study recommends the outlined approaches as significant improvements for distinguishing between shallow and deep-seated landslides in tropical areas such as Malaysia
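
    The ranking idea behind correlation-based feature selection (CFS) can be sketched as below: prefer features that correlate strongly with the class label. Full CFS also penalises inter-feature redundancy, which is omitted here for brevity; the toy features and labels are invented.

```python
# Rank features by absolute Pearson correlation with the class label.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def rank_features(features, labels):
    """Return feature names sorted by |correlation with the class|, best first."""
    return sorted(features, key=lambda f: -abs(pearson(features[f], labels)))

# Toy landslide data: slope tracks the label (1 = landslide), noise does not.
features = {"slope": [10, 12, 30, 35, 33, 11],
            "noise": [5, 1, 4, 2, 3, 6]}
labels = [0, 0, 1, 1, 1, 0]
print(rank_features(features, labels))
```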

    Advanced of Mathematics-Statistics Methods to Radar Calibration for Rainfall Estimation; A Review

    Ground-based radar is known as one of the most important systems for precipitation measurement at high spatial and temporal resolutions. Radar data are recorded digitally and are readily ingested into statistical analyses. These measurements are subjected to specific calibration to eliminate systematic errors and to minimize random errors. Since statistical methods are grounded in mathematics, they offer precise results with relatively little data detail; although their mathematical structure can make them challenging to interpret, the accuracy of the conclusions and the interpretation of the output are appropriate. This article reviews advanced methods for the calibration of ground-based radar for forecasting meteorological events, covering two aspects: statistical techniques and data mining. Statistical techniques refer to empirical analyses such as regression, while data mining includes Artificial Neural Networks (ANN), Kriging, Nearest Neighbour (NN), Decision Trees (DT) and fuzzy logic. The results show that Kriging is more applicable for interpolation, that regression methods are simple to use, and that data mining based on Artificial Intelligence is very precise. Thus, this review explores the characteristics of the statistical parameters in the field of radar applications and shows which parameters give the best results for undefined cases. DOI: 10.17762/ijritcc2321-8169.15012
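
    A regression calibration of the kind reviewed above can be sketched by fitting the Z-R power law, Z = a·R^b, in log space against gauge rainfall. The data below are synthetic, generated from the classic Marshall-Palmer form so the fit is exactly recoverable.

```python
import math

# Fit log10(Z) = log10(a) + b * log10(R) by least squares.

def fit_zr(Z, R):
    """Return (a, b) of the power law Z = a * R**b from paired samples."""
    xs = [math.log10(r) for r in R]
    ys = [math.log10(z) for z in Z]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = 10 ** (my - b * mx)
    return a, b

# Synthetic data from Z = 200 * R^1.6 (the Marshall-Palmer relation).
R = [1.0, 2.0, 5.0, 10.0, 20.0]       # gauge rain rate (mm/h)
Z = [200 * r ** 1.6 for r in R]       # radar reflectivity (mm^6/m^3)
a, b = fit_zr(Z, R)
print(round(a), round(b, 2))          # recovers a ≈ 200, b ≈ 1.6
```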

    Improving landslide detection from airborne laser scanning data using optimized Dempster-Shafer

    © 2018 by the authors. A detailed and state-of-the-art landslide inventory map, including precise landslide locations, is greatly needed for landslide susceptibility, hazard, and risk assessments. Traditional techniques employed for landslide detection in tropical regions include field surveys, synthetic aperture radar techniques, and optical remote sensing. However, these techniques are time consuming and costly. Furthermore, complications arise in generating accurate landslide location maps in these regions due to dense vegetation in tropical forests. Given its ability to penetrate vegetation cover, high-resolution airborne light detection and ranging (LiDAR) is typically employed to generate accurate landslide maps. The object-based technique groups many homogeneous pixels together in a meaningful way through image segmentation. In this paper, in order to address the limitations of this approach, the final decision is executed using a Dempster-Shafer theory (DST) rule combination based on the probabilistic outputs of object-based support vector machine (SVM), random forest (RF), and K-nearest neighbor (KNN) classifiers. This research therefore proposes an efficient framework that combines the three object-based classifiers using the DST method. An existing supervised approach (i.e., a fuzzy-based segmentation parameter optimizer) was adopted to optimize multiresolution segmentation parameters such as scale, shape, and compactness. Subsequently, a correlation-based feature selection (CFS) algorithm was employed to select the relevant features. Two study sites were selected to implement and evaluate the proposed landslide detection method (subset "A" for implementation and subset "B" for transferability testing). The DST method performed well in detecting landslide locations in tropical regions such as Malaysia, with potential applications in other similarly vegetated regions
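
    The combination step above can be sketched with Dempster's rule applied to mass functions over singleton class hypotheses (ignorance and compound subsets are omitted for brevity). The classifier probabilities are illustrative, not from the paper.

```python
# Dempster's rule for fusing per-class probabilities from several classifiers,
# treating each classifier's output as a mass function over singleton classes.

def combine(m1, m2):
    """Combine two singleton mass functions; conflicting mass is renormalised away."""
    classes = m1.keys()
    raw = {c: m1[c] * m2[c] for c in classes}    # agreeing (non-conflicting) mass
    conflict = 1.0 - sum(raw.values())           # mass assigned to clashing pairs
    return {c: raw[c] / (1.0 - conflict) for c in classes}

svm = {"landslide": 0.60, "non-landslide": 0.40}
rf  = {"landslide": 0.70, "non-landslide": 0.30}
knn = {"landslide": 0.55, "non-landslide": 0.45}
fused = combine(combine(svm, rf), knn)
print(max(fused, key=fused.get))   # → landslide
```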

    Visualization approach to effective decision making on hydrological data

    Temporal data is by nature arranged according to the sequence of time, where the order of the data is very significant. Thus, in order to visualize temporal data, the order of the data has to be preserved so that it shows certain trends or temporal patterns. Most visualization techniques, however, use technical visual representations such as bar charts and line graphs. This approach is suitable for, and can be easily comprehended by, technical users only. In order to reduce the learning curve in understanding the prototype developed and to facilitate decision making, a metaphor-based visualization approach was used for representing temporal hydrological data. To evaluate the correctness of decision making, a similarity test was conducted using a data mining approach, specifically case-based reasoning. The test case, or new data, was compared with cases extracted from previous operation data, and the closest case was examined by exploring the detailed data. Results were evaluated through usability testing and similarity testing. The prototype was demonstrated to a group of users, specifically three DID staff involved directly and indirectly with the dam operation. The feedback received from the users was positive: the interface objects used took a short time to learn and understand due to the familiarity of the representation. One look at the map gives them the overall picture of the situation patterns of the dam water level and rainfall around the catchment areas for the chosen time frame. The metaphorical visualization represents temporal and multi-variate data using an icon-based technique and colour coding to enhance interface usability and usefulness. This type of representation can be easily understood by a non-expert in the domain.
    The visualization assists users in the decision-making process by representing the patterns in a form close to the user's mental model through metaphor. This helps speed up data exploration and thus the decision-making process. In critical situations, speed and accuracy are vital in the decision-making process
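
    The case-based similarity test described above can be sketched as a nearest-case retrieval: a new hydrological situation is compared against stored operation cases and the closest one is returned for detailed exploration. The case names and feature values are invented.

```python
# Retrieve the most similar stored case for a new (water level, rainfall) reading.

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retrieve(case_base, new_case):
    """Return the name of the stored case closest to the new one."""
    return min(case_base, key=lambda name: euclidean(case_base[name], new_case))

# (water level in m, rainfall in mm) per historical operation case
case_base = {"2010 flood": (31.2, 120.0),
             "normal ops": (28.5, 10.0),
             "dry spell":  (27.1, 0.0)}
print(retrieve(case_base, (30.8, 110.0)))   # → 2010 flood
```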

    Classification Techniques for Predicting Graduate Employability

    Unemployment is a current issue that happens globally and brings adverse impacts worldwide. Graduate employability is therefore one of the significant elements highlighted in the unemployment issue. Several factors affect graduate employability; traditionally, excellent academic performance (i.e., cumulative grade point average, CGPA) has been the dominant element in determining an individual's employment status. However, research has shown that CGPA alone does not determine graduate employability; other factors may influence a graduate's success in getting a job. In this work, data mining techniques are used to identify the factors that influence graduate employability. Seven years of data (from 2011 to 2017) were collected through the Malaysian Ministry of Education's tracer study. A total of 43,863 data instances were involved in developing this employability class model. Three classification algorithms, Decision Tree, Support Vector Machines and Artificial Neural Networks, were used and compared to find the best model. The results show that the decision tree J48 produces higher accuracy than the other techniques, with a classification accuracy of 66.0651%, which increased to 66.1824% after parameter tuning. In addition, the algorithm is easily interpreted, and the time to build the model is small, at 0.22 seconds. This paper identified seven factors affecting graduate employability, namely age, faculty, field of study, co-curriculum, marital status, industrial internship and English skill. Among these factors, age, industrial internship and faculty contain the most information and most affect the final class, i.e. employability status. Therefore, the results of this study will help higher education institutions in Malaysia prepare their graduates with the necessary skills before entering the job market
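
    The entropy-based splitting that underlies J48/C4.5 can be sketched on a toy employability table; the records and attribute values below are invented and carry no relation to the tracer-study data.

```python
import math
from collections import Counter

# Information gain: how much a categorical attribute reduces class entropy.

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy from splitting the rows on one attribute."""
    total = entropy(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
    return total - remainder

rows = [{"internship": "yes", "english": "good"},
        {"internship": "yes", "english": "weak"},
        {"internship": "no",  "english": "good"},
        {"internship": "no",  "english": "weak"}]
employed = ["yes", "yes", "no", "no"]
gains = {a: info_gain(rows, employed, a) for a in ("internship", "english")}
print(max(gains, key=gains.get))   # → internship (it splits the classes perfectly)
```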