5,040 research outputs found

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Advances in Data Mining Knowledge Discovery and Applications

    Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. Some topics may appear to recur across chapters, but the same approaches and techniques can be useful in different fields and areas of expertise. Data mining draws on statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas, and most of these are covered here through different data mining applications. The eighteen chapters are classified into two parts: Knowledge Discovery and Data Mining Applications.

    Training of Crisis Mappers and Map Production from Multi-sensor Data: Vernazza Case Study (Cinque Terre National Park, Italy)

    This paper presents the development of a multidisciplinary project carried out in cooperation between Politecnico di Torino and ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action). The goal of the project was to train students attending Architecture and Engineering courses in geospatial data acquisition and processing, in order to start up a team of "volunteer mappers". The project aims to document environmental and built heritage exposed to disasters, and to improve the capabilities of the actors involved in geospatial data collection, integration and sharing. The area proposed for testing the training activities is the Cinque Terre National Park, on the World Heritage List since 1997, which was affected by a flood on 25 October 2011. In line with other international experiences, the group is expected to be active after emergencies in order to update maps, using data acquired by typical geomatic methods and techniques such as terrestrial and aerial Lidar, close-range and aerial photogrammetry, and topographic and GNSS instruments, or by non-conventional systems and instruments such as UAVs and mobile mapping. The ultimate goal is to implement a WebGIS platform to share all the collected data with local authorities and the Civil Protection

    A Correlation Framework for Continuous User Authentication Using Data Mining

    Merged with duplicate records: 10026.1/572, 10026.1/334 and 10026.1/724 on 01.02.2017 by CS (TIS). The increasing number of security breaches revealed in recent surveys and the security threats reported in the media reaffirm the shortcomings of current security measures in IT systems. While most reported work in this area has focussed on enhancing the initial login stage in order to counteract unauthorised access, there is still a problem detecting when an intruder has compromised the front line controls. This could pose a serious threat, since any subsequent indicator of an intrusion in progress could be quite subtle and may remain hidden to the casual observer. Having passed the frontline controls and having the appropriate access privileges, the intruder may be in a position to do virtually anything without further challenge. This has prompted interest in the concept of continuous authentication, which inevitably involves the analysis of vast amounts of data. The primary objective of the research is to develop and evaluate a suitable correlation engine in order to automate the processes involved in authenticating and monitoring users in a networked system environment. The aim is to further develop the Anomaly Detection module previously illustrated in a PhD thesis [1] as part of the conceptual architecture of an Intrusion Monitoring System (IMS) framework.

    Predictive modelling: flight delays and associated factors, Hartsfield–Jackson Atlanta International Airport

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Nowadays, a downside of air travel is the delays that are constantly announced to passengers, resulting in a decrease in customer satisfaction. These delays, together with other factors, impose high costs, both quantitative and qualitative, on airlines. Consequently, there is a need to anticipate and mitigate flight delays, which can help airlines and airports improve their performance and take consumer-oriented measures that attenuate or even cancel the effect of these delays on passengers. The primary objective of this study is to predict the occurrence of arrival delays at Hartsfield-Jackson international airport by building a predictive model with several Data Mining techniques. Applying these techniques made it possible to identify the variables that contributed most to the delays. The work followed the Knowledge Discovery in Databases (KDD) methodology. The phases applied were data collection; sampling techniques (SMOTE and Undersampling); partitioning of the data into training and test sets; pre-processing (missing data and outliers); data transformation (normalization and attribute selection); definition of the models to be trained (Decision Trees, Random Forests, and Multilayer Perceptron); and evaluation of model performance through varied metrics. After testing different approaches, it was concluded that the best model is achieved with the variables related to departure, using the Multilayer Perceptron algorithm, applying SMOTE to deal with unbalanced data, removing outliers and selecting ten variables using GainRatio. When the variables with departure information are excluded, the best-performing algorithm is again the Multilayer Perceptron with SMOTE but, this time, including the outliers and with fifteen variables, again selected by GainRatio. Under both hypotheses, the explanatory variables that contributed most to arrival delays were related to the weather, the airplane characteristics and the propagation of delay. The results for the Random Forests algorithm showed better performance, regarding accuracy, than those of other authors (Belcastro, Marozzo, Talia, & Trunfio, 2016; Choi, Kim, Briceno, & Mavris, 2016). In contrast, the Multilayer Perceptron algorithm showed lower accuracy than another equivalent study (Y. J. Kim, Choi, Briceno, & Mavris, 2016).
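The abstract above names GainRatio as the attribute selection criterion. As a rough illustration of that criterion only, and not the code or tooling used in the project, the following minimal sketch computes the gain ratio (information gain divided by split information) for categorical attributes and ranks them. The attribute names `weather` and `carrier`, the toy rows, and the helper `top_k_attributes` are hypothetical and chosen purely for the example.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    counts = Counter(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_attributes(rows, labels, k):
    """Hypothetical helper: names of the k attributes with the highest gain ratio."""
    scores = {
        name: gain_ratio([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (invented for illustration, not taken from the study).
rows = [
    {"weather": "storm", "carrier": "A"},
    {"weather": "storm", "carrier": "B"},
    {"weather": "clear", "carrier": "A"},
    {"weather": "clear", "carrier": "B"},
]
labels = ["delayed", "delayed", "on_time", "on_time"]

print(top_k_attributes(rows, labels, 1))
```

In this toy data `weather` separates the two classes completely while `carrier` carries no information about them, so `weather` is ranked first. Numeric attributes would need to be discretised before a computation of this kind applies.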

    Predicting parking space availability based on heterogeneous data using Machine Learning techniques

    These days, smart cities are focused on improving their services and bringing quality to everyday life by leveraging modern ICT technologies. To this end, data from connected IoT devices, environmental sensors, economic platforms, social networking sites, governance systems, and other sources can be gathered. The rapid increase in the number of vehicles in major cities of the world has made mobility in urban areas difficult, due to traffic congestion and parking availability issues. Finding a suitable parking space is often influenced by various factors such as weather conditions, traffic flows, and geographical information (markets, hospitals, parks, and others). In this study, a predictive analysis has been performed to estimate the availability of parking spaces using heterogeneous data from Cork County, Ireland. However, accumulating, processing, and analysing data from heterogeneous sources is itself a challenge, due to their diverse nature and different acquisition frequencies. Therefore, a data lake has been proposed in this study to collect, process, analyse, and visualize data from disparate sources. In addition, the proposed platform is used for predicting the available parking spaces using the collected data. The study includes the proposed design and implementation details of the data lake, as well as the parking space availability prediction model developed using machine learning techniques.