
    Intelligent Prediction of Metabolic Changes in Rectal Cancer Based on Machine and Deep Learning Models

    Machine learning, broadly speaking, applies statistical methods to training data to automatically adjust the parameters of a model, rather than requiring a programmer to set them manually. Deep learning is a sub-area of machine learning that studies how to solve complex and intuitive problems. The proposed methodologies allow machines, using computational means, to learn about and understand the world in specific contexts from previous experience and, building on a hierarchy of concepts, to grasp more complex concepts so as to efficiently solve a wide range of problems. The main objective of this work is to study several classification algorithms in the area of machine learning and to assess to what extent they can provide a solution for choosing more precise methods of patient selection and new strategies to improve the therapeutic response. The data used to train the classification algorithms refer to all patients with metabolic diseases treated between 2003 and 2021, in the retrospective part of the study. The best-performing classification algorithms will be used in a decision support system that helps choose, more effectively, the appropriate therapy for each future patient, with new patients expected at an approximate rate of 20 per year.
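    As a hedged illustration of the classifier-comparison step this abstract describes, the sketch below ranks a few standard scikit-learn classifiers by cross-validated accuracy; the synthetic data and the choice of candidate models are assumptions for illustration, not the study's actual clinical variables or algorithms.

    ```python
    # Hypothetical sketch: comparing candidate classifiers with cross-validation.
    # Synthetic data stands in for the retrospective patient dataset (2003-2021).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, n_features=20, random_state=0)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "svm_rbf": SVC(kernel="rbf"),
    }

    # Rank candidates by mean cross-validated accuracy.
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
    ```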

    Enhancing Intrusion Detection Systems with a Hybrid Deep Learning Model and Optimized Feature Composition

    Intrusion detection systems (IDS) are essential for protecting network infrastructures from hostile activity. Advanced methods are required, since traditional IDS techniques frequently fail to properly identify sophisticated and evolving attacks. In this article, we propose a novel method for improving IDS performance through the use of a hybrid deep learning model and feature composition optimization. The proposed hybrid deep learning model leverages the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to efficiently capture both spatial and temporal correlations in network traffic data. The model can extract useful features from unprocessed network packets using CNNs and RNNs, giving a thorough picture of network behaviour. To increase the IDS's discriminative ability, we also propose feature optimization strategies: through a methodical feature selection and engineering process, we uncover the most pertinent and informative features that support precise intrusion detection. To reduce the computational load and improve the model's efficiency without compromising detection accuracy, we also use dimensionality reduction approaches. We carried out extensive experiments to assess the proposed approach, using a benchmark dataset that is frequently utilized in intrusion detection research. The results show that the hybrid deep learning model performs better than conventional IDS methods, obtaining noticeably higher detection rates and lower false positive rates. The performance of the model is further improved by the optimized feature composition, which offers a more accurate depiction of network traffic patterns.
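    A minimal sketch of the kind of hybrid CNN+RNN classifier described above is given below; the layer sizes, sequence length, and feature count are assumptions for illustration, not the architecture reported in the article.

    ```python
    # Illustrative hybrid CNN+RNN intrusion-detection classifier (assumed shapes).
    from tensorflow.keras import layers, models

    SEQ_LEN, N_FEATURES = 100, 41  # e.g., packets per flow x features per packet

    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        # CNN stage: local/spatial patterns in the traffic features.
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(2),
        # RNN stage: temporal correlations across the sequence.
        layers.LSTM(64),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # normal vs. intrusion
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()
    ```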

    Application of Predicted Models in Debt Management: Developing a Machine Learning Algorithm to Predict Customer Risk at EDP Comercial

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. This report is the result of a nine-month internship at EDP Comercial, where the main research project was the application of artificial intelligence tools in the field of debt management. Debt management involves a set of strategies and processes aimed at reducing or eliminating debt, and the use of artificial intelligence has shown great potential to optimize these processes and minimize the risk of debt for individuals and organizations. In terms of monitoring and controlling the creditworthiness and quality of clients, debt management has mainly been responsive and reactive, attempting to recover losses after a client has become delinquent. There is a gap in the knowledge of how to proactively identify at-risk accounts before they fall behind on payments. To move beyond this constant reactive response, a machine learning algorithm was developed that predicts the risk of a client falling into debt by analyzing their scorecard, which measures the quality of a client based on their infringement history. After preprocessing the data, XGBoost was applied to a dataset of 3 million customers with at least one active EDP contract for electricity or gas. Hyperparameter tuning was performed on the model to reach an F1 score of 0.7850 on the training set and 0.7835 on the test set. The results were discussed and, based on them, recommendations and improvements were identified.
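    The following sketch mirrors the described setup (XGBoost tuned for F1) under stated assumptions: synthetic data stands in for the customer scorecard features, and the hyperparameter grid is illustrative rather than the one used in the internship project.

    ```python
    # Minimal sketch: XGBoost with grid search tuned for F1 on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.metrics import f1_score
    from sklearn.model_selection import GridSearchCV, train_test_split
    from xgboost import XGBClassifier

    # Imbalanced stand-in for the debt-risk problem (placeholder for the 3M-customer data).
    X, y = make_classification(n_samples=5000, n_features=15, weights=[0.8, 0.2],
                               random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                        random_state=42)

    param_grid = {"max_depth": [3, 5], "learning_rate": [0.05, 0.1],
                  "n_estimators": [200, 400]}
    search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                          param_grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)

    print("train F1:", f1_score(y_train, search.predict(X_train)))
    print("test  F1:", f1_score(y_test, search.predict(X_test)))
    ```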

    Development of an innovative method for measuring thermal comfort through the monitoring of physiological and environmental parameters in indoor environments

    Measuring human thermal comfort in indoor environments is a topic of interest in the scientific community, since thermal comfort deeply affects the well-being of occupants and, furthermore, buildings must face high energy costs to guarantee optimal comfort conditions. Even though there are standards in the field of the ergonomics of the thermal environment that provide guidelines for thermal comfort assessment, in real-world settings it can be very difficult to obtain an accurate measurement. Therefore, to improve the measurement of the thermal comfort of occupants in buildings, research is focusing on the assessment of personal and physiological parameters related to thermal comfort, to create environments carefully tailored to the occupant. This thesis presents several contributions to this topic. In this research work, a set of studies was carried out to develop and test measurement procedures capable of quantitatively assessing human thermal comfort by means of environmental and physiological parameters, capturing the peculiarities that exist among different occupants. First, a study was conducted in a controlled climatic chamber with an invasive set of sensors used for measuring physiological parameters. The outcome of this research was a first accuracy in the measurement of thermal comfort of 82%, obtained by training machine learning (ML) algorithms that provide the thermal sensation vote (TSV) by means of environmental quantities and heart rate variability (HRV), a parameter that the literature has often reported to be related both to users' thermal comfort and to environmental quantities. This research gave rise to a subsequent study in which the thermal comfort assessment was made using a minimally invasive smartwatch to collect HRV. This second study consisted in varying the environmental conditions of a semi-controlled test room while participants carried out light office activities in a limited way, i.e. avoiding as much as possible movements of the hand on which the smartwatch was worn. With this experimental setup, it was possible to establish that the use of artificial intelligence (AI) algorithms (such as random forests or convolutional neural networks) on the heterogeneous dataset created by aggregating environmental and physiological parameters can provide a measure of TSV with a mean absolute error (MAE) of 1.2 and a mean absolute percentage error (MAPE) of 20%. In addition, by using the Monte Carlo Method (MCM), it was possible to compute the impact of the uncertainty of the input quantities on the computation of the TSV: the largest contributions came from the uncertainties of air temperature (U = 14%) and relative humidity (U = 10.5%). The last relevant contribution of this research work concerns the measurement of thermal comfort in a real-life, semi-controlled environment, in which the participant was not forced to limit their movements. Skin temperature was included in the experimental set-up to improve the measurement of TSV.
    The results showed that including skin temperature in personalized models, built using data from the single participant, brings satisfactory results (MAE = 0.001±0.0003 and MAPE = 0.02%±0.09%). On the other hand, the more generalized approach, which consists in training the algorithms on all participants except one and using the one left out for testing, provides slightly lower performance (MAE = 1±0.2 and MAPE = 25%±6%). This result highlights how, in semi-controlled conditions, the prediction of TSV using skin temperature and HRV can be performed with acceptable accuracy.
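    A hedged sketch of the personalized-model idea follows: a regressor predicts TSV from environmental and HRV features and is scored with MAE and MAPE. The model choice (random forest), feature set, and data are placeholders, not the thesis's experimental setup.

    ```python
    # Sketch: predicting TSV from environmental and physiological features.
    # Data and the relation between features and TSV are synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.normal(24, 2, n),    # air temperature [°C]
        rng.normal(50, 10, n),   # relative humidity [%]
        rng.normal(40, 15, n),   # HRV feature, e.g. RMSSD [ms]
    ])
    tsv = 0.5 * (X[:, 0] - 24) - 0.01 * (X[:, 2] - 40) + rng.normal(0, 0.3, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, tsv, random_state=0)
    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print("MAE :", mean_absolute_error(y_te, pred))
    print("MAPE:", mean_absolute_percentage_error(y_te, pred))
    ```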

    Forest cover monitoring in Southwestern Ghana with remote sensing and GIS

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. Obuasi is one of the major municipalities in southwestern Ghana. Forest resources play a significant role in the day-to-day activities of the locals due to their high dependency on them. Despite this contribution, the current annual rate of deforestation in Obuasi is about 50 hectares; at this rate, the municipality may lose its substantial forest cover completely within the next 25 years. GIS and remote sensing techniques have proven to be efficient ways to monitor forest cover, especially at large scale using satellite imagery. In this study, a post-classification comparison change detection algorithm was used to determine the change in forest cover over the 1991-2021 period. The methodology includes a statistical analysis of rainfall and temperature variability over a period of 30 years, as well as an analysis of the perceptions and knowledge of locals regarding forest modifications. The MOLUSCE plugin in QGIS was used to model and generate maps of forest cover and to predict future changes in land use/land cover. The land-use/land-cover maps showed that between 1991 and 2000, forest areas declined at a rate of 17.1%, while agricultural, built-up, and mining areas increased significantly by 14%, 4%, and 2%, respectively. Between 2000 and 2021, forest areas and agricultural lands decreased from 67% to 60% and from 26% to 20%, respectively, while built-up and mining areas increased from 4% to 12% and from 3% to 7%, though forest remained the dominant land-cover class in the area. During the same study period, there were fluctuations in climatic conditions: rainfall between 1991 and 2021 decreased by 24 mm, while temperature increased by 0.037°C per annum. The majority of the locals believe that cultivated-land expansion and mining are the driving forces of forest cover change in the area, and that the solutions to these issues are enrichment planting and strengthening forest protection laws and mining regulations. The predicted forest cover map for 2030 shows that forest areas will be the major contributor of land to other land use/land cover classes, and will therefore continue to decline if no intervention is made. These findings can be used to inform conservation and management strategies to mitigate the impact of forest cover change and protect the ecological integrity of forests in the municipality.
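    As a hedged illustration of post-classification comparison, the sketch below cross-tabulates two classified rasters into a change matrix; the class codes and random arrays are placeholders for the actual classified land-cover maps.

    ```python
    # Hypothetical post-classification comparison: change matrix between two
    # classified land-cover rasters (e.g., 1991 vs. 2021). Placeholder data.
    import numpy as np

    classes = {0: "forest", 1: "agriculture", 2: "built-up", 3: "mining"}

    lc_1991 = np.random.default_rng(1).integers(0, 4, size=(100, 100))
    lc_2021 = np.random.default_rng(2).integers(0, 4, size=(100, 100))

    # change_matrix[i, j] = number of pixels that moved from class i to class j.
    n = len(classes)
    change_matrix = np.zeros((n, n), dtype=int)
    np.add.at(change_matrix, (lc_1991.ravel(), lc_2021.ravel()), 1)

    for i, name in classes.items():
        lost = change_matrix[i].sum() - change_matrix[i, i]
        print(f"{name}: {lost} pixels converted to other classes")
    ```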

    Big Data Analytics for Complex Systems

    The evolution of technology in all fields has led to the generation of vast amounts of data by modern systems. Using data to extract information, make predictions, and make decisions is the current trend in artificial intelligence. The advancement of big data analytics tools has made accessing and storing data easier and faster than ever, and machine learning algorithms help to identify patterns in, and extract information from, data. Current tools and machines in health, computer technologies, and manufacturing can generate massive amounts of raw data about their products or samples. The author of this work proposes a modern integrative system that can utilize big data analytics, machine learning, supercomputer resources, and measurements from industrial and health machines to build a smart system that can mimic the human intelligence skills of observation, detection, prediction, and decision-making. The applications of the proposed smart systems are included as case studies to highlight the contributions of each system. The first contribution is the ability to utilize big data and deep learning technologies on production lines to diagnose incidents and take proper action. In the current digital transformational industrial era, Industry 4.0 has been receiving researcher attention because it can be used to automate production-line decisions. Reconfigurable manufacturing systems (RMS) have been widely used to reduce the setup cost of restructuring production lines. However, the current RMS modules are not linked to the cloud for online decision-making; to take the proper decision, these modules must connect to an online server (supercomputer) that has big data analytics and machine learning capabilities. Online here means that data is centralized in the cloud (supercomputer) and accessible in real time. In this study, deep neural networks are utilized to detect the decisive features of a product and build a prediction model with which the iFactory will make the necessary decision for defective products. The Spark ecosystem is used to manage the access, processing, and storage of the streaming big data. This contribution is implemented as a closed cycle which, to the best of our knowledge, is the first in the literature to apply big data analysis with deep learning to real-time applications in a manufacturing system. The model achieves a high accuracy of 97% in classifying normal versus defective items. The second contribution, in bioinformatics, is the ability to build supervised machine learning approaches based on patients' gene expression to predict the proper treatment for breast cancer. In the trial, to personalize treatment, the machine learns the genes that are active in the patient cohort with a five-year survival period. The initial condition here is that each group must undergo only one specific treatment. After learning about each group (or class), the machine can personalize the treatment of a new patient by diagnosing the patient's gene expression. The proposed model will help in the diagnosis and treatment of the patient. Future work in this area involves building a protein-protein interaction network with the selected genes for each treatment, to first analyze the motifs of the genes and target them with the proper drug molecules. In the learning phase, a couple of feature-selection techniques and standard supervised classifiers are used to build the prediction model.
    Most of the nodes show high performance measurements, with accuracy, sensitivity, specificity, and F-measure ranging around 100%. The third contribution is the ability to build semi-supervised learning for breast cancer survival treatment, which advances the second contribution. By understanding the relations between the classes, we can design the machine learning phase based on the similarities between classes. In the proposed research, the researcher used the Euclidean distance matrix among the survival treatment classes to build the hierarchical learning model. The distance information, learned through an unsupervised approach, can help the prediction model select the classes that are far from each other, maximizing the distance between classes and obtaining wider class groups. The performance measurements of this approach show a slight improvement over the second model. Moreover, this model reduced the number of discriminative genes from 47 to 37. The model in the second contribution studies each class individually, while this model focuses on the relationships between the classes and uses this information in the learning phase. Hierarchical clustering is performed to draw the borders between groups of classes before building the classification models. Several distance measurements are tested to identify the best linkages between classes. Most of the nodes show high performance measurements, with accuracy, sensitivity, specificity, and F-measure ranging from 90% to 100%. All the case study models showed high performance measurements in the prediction phase. These modern models can be replicated for different problems within different domains. The comprehensive models of the newer technologies are reconfigurable and modular; any newer learning phase can be plugged in at either end of the learning phase. Therefore, the output of the system can be an input for another learning system, and a newer feature can be added to the input to be considered in the learning phase.
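    One possible reading of the class-similarity step in the third contribution is sketched below: Euclidean distances between per-class centroids feed a hierarchical linkage that defines the order in which classifiers are built. The synthetic gene-expression data, class names, and linkage choice are assumptions, not the dissertation's actual cohorts or pipeline.

    ```python
    # Sketch: hierarchical grouping of treatment classes by Euclidean distance
    # between per-class centroids of (synthetic) gene-expression profiles.
    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    n_genes = 37
    # Five hypothetical treatment classes, 30 samples each.
    class_profiles = {f"treatment_{k}": rng.normal(k, 1.0, size=(30, n_genes))
                      for k in range(5)}

    # Per-class centroid in gene-expression space.
    centroids = np.vstack([profile.mean(axis=0)
                           for profile in class_profiles.values()])

    # Pairwise Euclidean distances between classes, then hierarchical linkage.
    dists = pdist(centroids, metric="euclidean")
    tree = linkage(dists, method="average")
    print(tree)  # merge order suggests how classes could be grouped for learning
    ```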

    Development of machine learning models for short-term water level forecasting

    The impact of precise river flood forecasting and warnings in preventing potential casualties, promoting awareness, and easing evacuation is realized in the reduction of flood damage and the avoidance of loss of life. Machine learning models have been used widely in flood forecasting through discharge. However, the use of discharge can be inconvenient for issuing a warning, since discharge is not the direct measure used by the early warning system. This paper focuses on water level prediction on the Storå River, Denmark, utilizing several machine learning models. The study revealed that transforming the features to follow a Gaussian-like distribution did not further improve prediction accuracy. Additional data through different feature sets resulted in increased prediction performance of the machine learning models, and using a hybrid method for feature selection improved the prediction performance as well. The Feed-Forward Neural Network gave the lowest mean absolute error and the highest coefficient of determination. The results indicated that the difference in prediction performance, in terms of mean absolute error, between the Feed-Forward Neural Network and the Multiple Linear Regression model was 0.003 cm. It was concluded that the Multiple Linear Regression model would be a good alternative when time, resources, or expert knowledge is limited.
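    A minimal sketch of the model comparison follows, contrasting a Multiple Linear Regression baseline with a Feed-Forward Neural Network; the lagged features, rainfall input, and hyperparameters are assumptions, not the Storå River dataset or the paper's configuration.

    ```python
    # Sketch: MLR vs. feed-forward network for short-term water-level prediction.
    # Synthetic lagged water levels and rainfall stand in for the real features.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 2000
    lags = np.column_stack([rng.normal(120, 10, n) for _ in range(3)])  # cm
    rain = rng.gamma(2.0, 2.0, n)                                       # mm
    y = 0.6 * lags[:, 0] + 0.3 * lags[:, 1] + 1.5 * rain + rng.normal(0, 1, n)
    X = np.column_stack([lags, rain])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    models = {
        "MLR": LinearRegression(),
        "FFNN": make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(32, 16),
                                           max_iter=2000, random_state=0)),
    }
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        print(name, "MAE:", mean_absolute_error(y_te, pred),
              "R2:", r2_score(y_te, pred))
    ```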

    Monitoring Cloud-prone Complex Landscapes At Multiple Spatial Scales Using Medium And High Resolution Optical Data: A Case Study In Central Africa

    Tracking land surface dynamics over cloud-prone areas with complex mountainous terrain and a landscape that is heterogeneous at a scale of approximately 10 m is an important challenge in the remote sensing of tropical regions in developing nations, due to the small plot sizes. Persistent monitoring of natural resources in these regions at multiple spatial scales requires the development of tools to identify emerging land cover transformation due to anthropogenic causes, such as agricultural expansion, and to climate change. Along with cloud cover and obstruction by topographic distortions due to steep terrain, there are limitations to the accuracy of monitoring change using available historical satellite imagery, largely due to sparse data access and the lack of high quality ground truth for classifier training. One such complex region is the Lake Kivu region in Central Africa, and this work addressed these problems to create an effective process for monitoring it. The Lake Kivu region is a biodiversity hotspot with a complex and heterogeneous landscape and intensive agricultural development, where individual plot sizes are often at the scale of 10 m. Procedures were developed that use optical data from satellite and aerial observations at multiple scales to tackle the monitoring challenges. First, a novel processing chain was developed to systematically monitor the spatio-temporal land cover dynamics of this region over the years 1988, 2001, and 2011 using Landsat data, complemented by ancillary data. Topographic compensation was performed on Landsat reflectances to avoid the strong impact of illumination angle, and image compositing was used to compensate for frequent cloud cover and the resulting incomplete annual data availability in the archive. A systematic supervised classification, using the state-of-the-art machine learning classifier Random Forest, was applied to the composite Landsat imagery to obtain land cover thematic maps with overall accuracies of 90% and higher. Subsequent change analysis between these years found extensive conversion of the natural environment as a result of human-related activities. The gross forest cover loss for the 1988-2001 and 2001-2011 periods was 216.4 and 130.5 thousand hectares, respectively, indicating significant deforestation during the period of civil war and a relatively stable and lower deforestation rate later, possibly due to conservation and reforestation efforts in the region. The other dominant land cover changes in the region were aggressive subsistence farming and urban expansion displacing natural vegetation and arable lands. Despite limited data availability, this study fills the gap of much-needed detailed and updated land cover change information for this biologically important region of Central Africa. While useful on a regional scale, Landsat data can be inadequate for more detailed studies of land cover change. Based on the increasing availability of high resolution imagery and light detection and ranging (LiDAR) data from manned and unmanned aerial platforms (<1 m resolution), a study was performed leading to a novel generic framework for land cover monitoring at fine spatial scales. The approach fuses high spatial resolution aerial imagery and LiDAR data to produce land cover maps with high spatial detail using object-based image analysis techniques.
The classification framework was tested for a scene with both natural and cultural features and was found to be more than 90 percent accurate, sufficient for detailed land cover change studies
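    A hedged sketch of the supervised classification step follows: a Random Forest trained on labelled pixel spectra, with overall accuracy reported on a hold-out set. The band values, class labels, and hyperparameters are synthetic placeholders, not the study's training data or settings.

    ```python
    # Sketch: Random Forest land-cover classification of labelled pixel samples.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_pixels, n_bands = 3000, 6  # e.g., reflectance bands from composite imagery
    X = rng.normal(size=(n_pixels, n_bands))
    y = rng.integers(0, 5, size=n_pixels)  # 5 hypothetical land-cover classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    print("overall accuracy:", accuracy_score(y_te, rf.predict(X_te)))
    ```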

    Dimensionality reduction by kernel CCA in reproducing kernel Hilbert spaces

    Master's thesis (Master of Science)