
    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we describe some of the most widely used data mining algorithms, which have had considerable utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms covered in this research relate to clustering, classification, association analysis, statistical learning, and link mining. After a brief description of each algorithm, we analyze its application potential and the research issues concerning the optimization of the data mining process. We then describe the most important data mining algorithms included in Microsoft and Oracle software products, offer suggestions and criteria for choosing the most suitable algorithm for a given task, and outline the advantages offered by these software products.
    Keywords: data mining optimization, data mining algorithms, software solutions

    Training of Crisis Mappers and Map Production from Multi-sensor Data: Vernazza Case Study (Cinque Terre National Park, Italy)

    The aim of this paper is to present the development of a multidisciplinary project carried out in cooperation between Politecnico di Torino and ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action). The goal of the project was to train students attending Architecture and Engineering courses in geospatial data acquisition and processing, in order to start up a team of "volunteer mappers". The project aims to document environmental and built heritage subject to disaster; the purpose is to improve the capabilities of the actors involved in geospatial data collection, integration and sharing. The area proposed for testing the training activities is the Cinque Terre National Park, registered in the World Heritage List since 1997. The area was affected by a flood on 25 October 2011. In line with other international experiences, the group is expected to be active after emergencies in order to update maps, using data acquired by typical geomatic methods and techniques such as terrestrial and aerial Lidar, close-range and aerial photogrammetry, topographic and GNSS instruments etc., or by non-conventional systems and instruments such as UAV, mobile mapping etc. The ultimate goal is to implement a WebGIS platform to share all the data collected with local authorities and the Civil Protectio

    A Type-2 Fuzzy Logic Based System for Malaria Epidemic Prediction in Ethiopia

    Malaria is the most prevalent mosquito-borne disease throughout tropical and subtropical regions of the world, with severe medical, economic, and social impact. Malaria has been a serious public health problem in Ethiopia since 1959, even though its morbidity and mortality have been declining since 2001. Various studies have attempted to predict malaria epidemics using mathematical and statistical approaches; nevertheless, these approaches had no learning capabilities. In this paper, we present a Type-2 Fuzzy Logic Based System for malaria epidemic prediction in Ethiopia, which was trained using real data collected throughout Ethiopia from 2013 to 2017. Fuzzy Logic Based Systems provide a transparent model that employs IF-THEN rules for prediction, which can be easily analyzed and interpreted by decision-makers. This is important for fighting the sources of malaria and taking the needed preventive measures: the rules generated by our system were able to explain the situations and the intensity of the input factors that contributed to malaria epidemic incidence up to three months ahead. The presented Type-2 Fuzzy Logic System (T2FLS) learns its rules and fuzzy set parameters from data, and it outperformed its counterparts, T1FLS by 2% and ANFIS by 0.33%, in the accuracy of predicting malaria epidemics in Ethiopia. In addition, the proposed system shed light on the main causes behind such outbreaks in Ethiopia because of its high level of interpretabilit
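
As a rough illustration of how an interval type-2 IF-THEN rule can be evaluated, the following minimal sketch computes lower and upper membership grades and a rule firing interval. The abstract does not state which membership functions or input factors the system uses, so the Gaussian sets with uncertain standard deviation, the input names (rainfall, temperature) and all numbers here are assumptions made for illustration only.

```python
import math


def gaussian(x, mean, sigma):
    """Ordinary (type-1) Gaussian membership grade."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_membership(x, mean, sigma_lo, sigma_hi):
    """Interval type-2 grade for a Gaussian set with uncertain std.

    With a fixed mean, the narrower Gaussian (sigma_lo) bounds the
    membership from below and the wider one (sigma_hi) from above.
    """
    return gaussian(x, mean, sigma_lo), gaussian(x, mean, sigma_hi)


def firing_interval(memberships):
    """Firing interval of a rule whose antecedents are joined by AND,
    using the minimum t-norm on the lower and upper grades separately."""
    lowers = [lo for lo, _ in memberships]
    uppers = [hi for _, hi in memberships]
    return min(lowers), min(uppers)


# Hypothetical rule: IF rainfall is HIGH AND temperature is WARM THEN risk is HIGH
rain = it2_membership(180.0, mean=200.0, sigma_lo=30.0, sigma_hi=50.0)
temp = it2_membership(24.0, mean=25.0, sigma_lo=2.0, sigma_hi=4.0)
lower, upper = firing_interval([rain, temp])
print(f"rule fires with strength in [{lower:.3f}, {upper:.3f}]")
```

The width of the firing interval reflects the uncertainty in the fuzzy sets themselves; a type-1 system would collapse it to a single number.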

    Ontological approach to development of computing with words based systems

    Computing with words, introduced by Zadeh, has become a very important concept in the processing of knowledge represented in the form of propositions. Two aspects of this concept (approximation and personalization) are essential to the process of building intelligent systems for human-centric computing. For the last several years, the Artificial Intelligence community has used ontology as a means for representing knowledge. Recently, the development of a new Internet paradigm, the Semantic Web, has led to the introduction of another form of ontology. It allows for defining concepts, identifying relationships among these concepts, and representing concrete information. In other words, an ontology has become a very powerful way of representing not only information but also its semantics. The paper proposes an application of ontology, in the sense of the Semantic Web, for the development of computing with words based systems capable of performing operations on propositions including their semantics. The ontology-based approach is very flexible and provides a rich environment for expressing different types of information, including perceptions. It also provides a simple way of personalizing propositions. An architecture of a computing with words based system is proposed. A prototype of such a system is described

    Diabetes Diagnosis by Case-Based Reasoning and Fuzzy Logic

    In the medical field, experts’ knowledge is based on experience, theoretical knowledge and rules. Case-based reasoning (CBR) is a problem-solving paradigm based on past experiences, and a large number of decision support applications based on CBR have been developed. Case retrieval is often considered the most important step of case-based reasoning. In this article, we integrate fuzzy logic and data mining to improve the response time and the accuracy of the retrieval of similar cases. The proposed Fuzzy CBR is composed of two complementary parts: classification by a fuzzy decision tree, realized with Fispro, and case-based reasoning, realized with the JColibri platform. The use of fuzzy logic aims to reduce the complexity of calculating the degree of similarity between diabetic patients who require different monitoring plans. The results of the proposed approach are compared with earlier methods using accuracy as the metric. The experimental results indicate that the fuzzy decision tree is very effective in improving the accuracy of diabetes classification and hence in improving the retrieval step of CBR
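
To make the retrieval idea concrete, here is a minimal sketch of fuzzy similarity between cases. It does not use the Fispro or JColibri APIs and is not the authors' method: the single glucose attribute, the triangular fuzzy sets, the case base and the similarity formula are all hypothetical, and the numbers are illustrative rather than clinical thresholds.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy partition of a glucose reading (illustrative values only).
GLUCOSE_SETS = {
    "low": (40.0, 70.0, 100.0),
    "medium": (70.0, 110.0, 150.0),
    "high": (110.0, 160.0, 210.0),
}


def fuzzify(value):
    """Map a crisp reading to its vector of membership grades."""
    return [triangular(value, *abc) for abc in GLUCOSE_SETS.values()]


def similarity(x, y):
    """One minus the mean absolute difference of the membership vectors."""
    fx, fy = fuzzify(x), fuzzify(y)
    return 1.0 - sum(abs(p - q) for p, q in zip(fx, fy)) / len(fx)


def retrieve(query, case_base):
    """Return the stored (reading, plan) case most similar to the query."""
    return max(case_base, key=lambda case: similarity(query, case[0]))


# Hypothetical case base: (glucose reading, monitoring plan).
cases = [(75.0, "plan A"), (115.0, "plan B"), (165.0, "plan C")]
best = retrieve(150.0, cases)
print("most similar case:", best)
```

Comparing short membership vectors instead of raw attribute values is one way fuzzification can simplify the similarity computation mentioned in the abstract.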

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Malware analysis and detection techniques have evolved over the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The fast growth in the variety and number of malware species has made it very difficult for forensic investigators to provide a timely response. Therefore, Machine Learning (ML) aided malware analysis has become a necessity for automating different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threat Intelligence (CTI), rather than resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classifying static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in the extraction and analysis of a variety of static characteristics of PE binaries, and we evaluate the accuracy and practical generalization of these techniques. Finally, the results of an experimental study of all the methods using common data are given to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics.

    Smart hierarchical WiFi localization system for indoors

    Extraordinary Doctorate Award of the UAH for the 2013-2014 academic year. In recent years, the number of applications for smartphones and tablets has grown rapidly. Many of these applications make use of the localization capabilities of these devices. To provide location information, the user's position must be identified robustly and in real time. Traditionally, localization has relied on GPS, which provides accurate positioning outdoors. Unfortunately, its low accuracy indoors makes it unusable there. Various technologies are used to provide indoor localization. Among them, WiFi is one of the most widely used because of important advantages such as the availability of WiFi access points in most buildings and the fact that measuring the WiFi signal is free, even on private networks. Unfortunately, it also has some drawbacks: indoors, the signal depends heavily on the structure of the building, which gives rise to unwanted effects such as multipath propagation and small-scale variations. In addition, WiFi networks are deployed to maximize connectivity without regard to their possible use for localization, so environments tend to be densely populated with access points, increasing co-channel interference, which causes variations in the received signal level. The objective of this thesis is the indoor localization of mobile devices using as the only information the received signal level from the access points present in the environment. The final goal is to develop a WiFi localization system for mobile devices that can be used in any environment and by any device, in real time. 
To achieve this objective, a hierarchical localization system based on fuzzy classifiers is proposed, which performs localization in topologically described environments. This system provides robust localization in different scenarios, paying special attention to large environments. To this end, the designed system creates a hierarchical partition of the environment using K-Means. The localization system is then trained using different supervised classification algorithms to localize new WiFi measurements. Finally, a probabilistic system has been designed to track the position of the moving device using a Bayesian filter. The system has been tested in a real multi-floor environment, obtaining a total mean error below 3 meters
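
The tracking step described above can be sketched as a discrete Bayes filter over topological zones. This is a minimal illustration under stated assumptions, not the thesis implementation: the three zones, the transition probabilities and the per-zone likelihoods (standing in for classifier outputs on WiFi measurements) are all hypothetical values.

```python
def predict(belief, transition):
    """Motion step: push the belief through the zone transition matrix.

    transition[i][j] is the assumed probability of moving from zone i to zone j.
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(prior, likelihood):
    """Measurement step: weight each zone by how well it explains the
    new measurement, then renormalize so the belief sums to one."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        return list(prior)  # measurement carries no information; keep the prior
    return [u / total for u in unnormalized]


# Hypothetical corridor with three adjacent zones.
TRANSITION = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

belief = [1.0 / 3.0] * 3  # start with no knowledge of the device position

# Hypothetical per-zone likelihoods for two consecutive WiFi measurements.
for likelihood in ([0.7, 0.2, 0.1], [0.2, 0.6, 0.2]):
    belief = update(predict(belief, TRANSITION), likelihood)

most_likely_zone = max(range(len(belief)), key=belief.__getitem__)
print("belief:", [round(b, 3) for b in belief], "zone:", most_likely_zone)
```

Because the transition matrix only allows moves between adjacent zones, a single noisy measurement cannot make the estimate jump across the building, which is the main reason to filter rather than classify each measurement independently.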