8 research outputs found

    Data Mining Techniques in Cancer Research Area

    In this paper we present an analysis of predicting the survivability rate of breast cancer patients from different attributes using data mining techniques. The data used is real data. The preprocessed data set has all twelve available fields from the database. We have investigated data mining techniques

    EASY ENSEMBLE WITH RANDOM FOREST TO HANDLE IMBALANCED DATA IN CLASSIFICATION

    Imbalanced data can cause issues at the problem-definition level, the algorithm level, and the data level. Several methods have been developed to overcome this issue; one state-of-the-art method is Easy Ensemble. Easy Ensemble is claimed to improve a model's ability to classify the minority class and to overcome the deficiencies of random under-sampling. In this paper we discuss the implementation of Easy Ensemble with Random Forest classifiers to handle the imbalance problem in a credit scoring case. The combined method is applied to two datasets taken from the data science competition websites finhacks.id and kaggle.com, with majority-to-minority class proportions of 70:30 and 94:6. The results show that resampling with Easy Ensemble improves Random Forest classifier performance on the minority class. Recall on the minority class increased significantly after resampling: for the first dataset (finhacks.id) it rose from 0.49 to 0.82, and for the second dataset (kaggle.com) it rose from just 0.14 to 0.73
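
The resampling idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it covers only the under-sampling step (several balanced subsets, each pairing all minority rows with an equally sized random draw from the majority) and the recall metric the abstract reports. The names `balanced_subsets` and `recall` and the toy 94:6 data are hypothetical. The step of training one classifier per subset (Random Forest in the paper) and combining the predictions is left out.

```python
import random

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Yield n_subsets training sets. Each set holds every minority row plus
    a random sample of majority rows of the same size (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(n_subsets):
        sampled = rng.sample(majority, k=len(minority))  # without replacement
        yield sampled + list(minority)

def recall(y_true, y_pred, positive=1):
    """Recall for the `positive` class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data with a 94:6 class proportion; each row is (features, label).
majority = [((i,), 0) for i in range(94)]
minority = [((i,), 1) for i in range(6)]

subsets = list(balanced_subsets(majority, minority, n_subsets=4))
labels_per_subset = [[label for _, label in subset] for subset in subsets]
```

Each subset here contains 12 rows, six per class, so a classifier trained on it sees a balanced problem. Drawing several subsets lets the ensemble use more of the majority class than a single round of under-sampling would.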

    DESIGN INFORMATION SYSTEM OF EMPLOYEES TARGET WITH NEURAL NETWORK BACKPROPAGATION

    The performance assessment of civil servants (PNS) is still considered by some to be insufficiently objective and prone to subjectivity, so a solution is needed to improve its objectivity. The employee work target (SKP) is one such solution. Backpropagation, a neural network method, is implemented in the SKP information system to classify performance data. Data were collected through observation and a literature review. The web-based SKP information system helps employees prepare assessments. Backpropagation can be implemented to classify performance data. Keywords: Neural network; Backpropagation; Classification; SKP. Received: 2 February, 2017; Accepted: 15 March, 201

    SystemC Through the Looking Glass : Non-Intrusive Analysis of Electronic System Level Designs in SystemC

    Due to the ever-increasing complexity of hardware and hardware/software co-designs, developers strive for higher levels of abstraction in the early stages of the design flow. To address these demands, design at the Electronic System Level (ESL) has been introduced. SystemC is currently the de facto standard for ESL design. Extracting data from system designs written in SystemC is therefore crucial, e.g. for properly understanding a given system. However, no satisfactory support for reflection/introspection of SystemC has been provided yet. Previously proposed methods for this purpose either focus on static aspects only, restrict the language means of SystemC, or rely on modifications of the compiler and/or parser. In this thesis, approaches that overcome these limitations are introduced, allowing the extraction of information from a given SystemC design without changing the SystemC library or the compiler. The proposed approaches retrieve both static and dynamic (i.e. run-time) information

    Information exploitation procedures for identifying missing, noisy and inconsistent data

    Information is one of the most important assets a company holds, and the governance of information technology must be guaranteed; database quality is one of the fundamental elements for achieving that governance. A systems auditor uses many techniques, processes and tools to identify missing, noisy and inconsistent data in a database, and data mining is one of the means through which the auditor can analyze the information. Given the enormous amount of information that software systems contain, auditors need procedures that automate the detection of anomalous data. Several data mining algorithms have been used to detect tuples considered anomalous. The problem is that no prior algorithms or procedures have been found that identify which specific field within a tuple contains the anomalous values. This detection is of fundamental importance in large databases, because otherwise the task must be done manually, which takes time and requires specific training on the auditor's part. The aim of the thesis is to establish a taxonomy of the methods, techniques and algorithms for detecting anomalous values in databases, and to design and validate information exploitation procedures that, combined, detect the fields holding outlier values in databases, in order to improve data quality. Three different data mining approaches to detecting anomalous data are identified: the unsupervised approach, the supervised approach and the semi-supervised approach.
This thesis develops four information exploitation procedures that automatically detect which specific field holds values considered anomalous, using a hybrid methodology that combines algorithms from different approaches. The four procedures relate to numeric databases with or without target attributes, alphanumeric databases without a target attribute, and alphanumeric databases with target attributes. Experimental tests were run to validate the results using laboratory databases and real databases, demonstrating the effectiveness of the proposed procedures. Integrating different algorithms not only detects the fields considered missing, noisy and inconsistent, but also minimizes the errors a single algorithm might make across the diverse and uncertain scenarios that an auditor's work must face
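
The distinction the abstract draws, between flagging a whole tuple and flagging the specific field inside it, can be illustrated with a small sketch. This is not one of the thesis's four hybrid procedures. It swaps in a much simpler unsupervised rule, the per-column modified z-score based on the median and the median absolute deviation (MAD), and it handles numeric columns only. The function name, the 3.5 threshold and the sample rows are assumptions made for illustration.

```python
from statistics import median

def flag_anomalous_fields(rows, threshold=3.5):
    """Return (row_index, column_index) pairs whose value is an outlier
    within its own column, using the modified z-score (median and MAD).
    Illustrative stand-in only, not the thesis's hybrid procedures."""
    flagged = []
    n_columns = len(rows[0])
    for col in range(n_columns):
        values = [row[col] for row in rows]
        center = median(values)
        mad = median(abs(v - center) for v in values)
        if mad == 0:
            continue  # no spread around the median; this rule cannot score the column
        for row_index, v in enumerate(values):
            score = 0.6745 * abs(v - center) / mad
            if score > threshold:
                flagged.append((row_index, col))
    return flagged

# Hypothetical numeric table: the second field of row 4 is out of line.
rows = [
    (1.0, 10.0),
    (1.1, 10.2),
    (0.9, 9.8),
    (1.0, 10.1),
    (1.2, 95.0),
    (1.05, 9.9),
]
flagged = flag_anomalous_fields(rows)
```

The result identifies the field, not just the row: only column 1 of row 4 is reported, while that row's first field passes. A per-column rule like this misses anomalies that only show up in combinations of fields, which is one reason to combine several algorithms as the abstract describes.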

    Evaluating productive efficiency: comparative study of commercial banks in Gulf countries

    Financial institutions are an integral part of any modern economy. In the 1970s and 1980s, Gulf Cooperation Council (GCC) countries made significant progress in financial deepening and in building a modern financial infrastructure. This study aims to evaluate the performance (efficiency) of financial institutions (the banking sector) in GCC countries. Because the selected variables are negative for some banks and positive for others, and the available evaluation methods do not handle this case, we developed a Semi-Oriented Radial Model (SORM) to perform the evaluation. Furthermore, since the SORM result alone provides limited information to decision makers (bankers, investors, etc.), we propose a second-stage analysis using the classification and regression (C&R) method, combining SORM results with other environmental data (financial, economic and political) to derive rules for efficient banks. The results should help bankers improve their banks' performance and help investors maximize their returns. There are two main approaches to evaluating the performance of Decision Making Units (DMUs), each covering different methods with different assumptions. The parametric approach is based on econometric regression theory, and the nonparametric approach is based on mathematical linear programming. The nonparametric approach has two methods: Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH). The parametric approach has three: Stochastic Frontier Analysis (SFA), Thick Frontier Analysis (TFA) and Distribution-Free Analysis (DFA). The results show that DEA and SFA are the most applicable methods in the banking sector, with DEA seemingly the more popular among researchers.
However, DEA, like SFA, still faces many challenges. One of them is how to deal with negative data: DEA assumes that all input and output values are non-negative, while in many applications negative outputs can appear, e.g. losses as opposed to profits. Although a few DEA models have been developed to deal with negative data, we believe each has its own limitations, so we developed a Semi-Oriented Radial Model (SORM) that can handle negative data in DEA. Applying SORM shows that the overall performance of GCC banking is relatively high (85.6%). Although the efficiency score fluctuated over the study period (1998-2007) due to the second Gulf War and the international financial crisis, it remained higher than that of counterparts in other countries. Banks operating in Saudi Arabia appear to be the most efficient, followed by UAE, Omani and Bahraini banks, while banks operating in Qatar and Kuwait appear to be the least efficient, because these two countries were the most affected by the second Gulf War. The results also show no statistical relationship between operating style (Islamic or Conventional) and bank efficiency. Even though the difference by operating style is not statistically significant, Islamic banks seem to be more efficient than Conventional banks, with an average efficiency score of 86.33% compared to 85.38%. Furthermore, Islamic banks seem to be more affected by the political crisis (second Gulf War), whereas Conventional banks seem to be more affected by the financial crisis

    Evaluating productive efficiency : comparative study of commercial banks in Gulf countries

    Financial institutions are an integral part of any modern economy. In the 1970s and 1980s, Gulf Cooperation Council (GCC) countries made significant progress in financial deepening and in building a modern financial infrastructure. This study aims to evaluate the performance (efficiency) of financial institutions (the banking sector) in GCC countries. Because the selected variables are negative for some banks and positive for others, and the available evaluation methods do not handle this case, we developed a Semi-Oriented Radial Model (SORM) to perform the evaluation. Furthermore, since the SORM result alone provides limited information to decision makers (bankers, investors, etc.), we propose a second-stage analysis using the classification and regression (C&R) method, combining SORM results with other environmental data (financial, economic and political) to derive rules for efficient banks. The results should help bankers improve their banks' performance and help investors maximize their returns. There are two main approaches to evaluating the performance of Decision Making Units (DMUs), each covering different methods with different assumptions. The parametric approach is based on econometric regression theory, and the nonparametric approach is based on mathematical linear programming. The nonparametric approach has two methods: Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH). The parametric approach has three: Stochastic Frontier Analysis (SFA), Thick Frontier Analysis (TFA) and Distribution-Free Analysis (DFA). The results show that DEA and SFA are the most applicable methods in the banking sector, with DEA seemingly the more popular among researchers.
However, DEA, like SFA, still faces many challenges. One of them is how to deal with negative data: DEA assumes that all input and output values are non-negative, while in many applications negative outputs can appear, e.g. losses as opposed to profits. Although a few DEA models have been developed to deal with negative data, we believe each has its own limitations, so we developed a Semi-Oriented Radial Model (SORM) that can handle negative data in DEA. Applying SORM shows that the overall performance of GCC banking is relatively high (85.6%). Although the efficiency score fluctuated over the study period (1998-2007) due to the second Gulf War and the international financial crisis, it remained higher than that of counterparts in other countries. Banks operating in Saudi Arabia appear to be the most efficient, followed by UAE, Omani and Bahraini banks, while banks operating in Qatar and Kuwait appear to be the least efficient, because these two countries were the most affected by the second Gulf War. The results also show no statistical relationship between operating style (Islamic or Conventional) and bank efficiency. Even though the difference by operating style is not statistically significant, Islamic banks seem to be more efficient than Conventional banks, with an average efficiency score of 86.33% compared to 85.38%. Furthermore, Islamic banks seem to be more affected by the political crisis (second Gulf War), whereas Conventional banks seem to be more affected by the financial crisis.
    EThOS - Electronic Theses Online Service GB United Kingdo