132 research outputs found

    Monitoring the waste to energy plant using the latest AI methods and tools

    Solid wastes, for instance municipal and industrial wastes, present great environmental concerns and challenges all over the world. This has led to the development of innovative waste-to-energy process technologies capable of handling different waste materials in a more sustainable and energy-efficient manner. However, as in many other complex industrial process operations, waste-to-energy plants require sophisticated process monitoring systems in order to achieve very high overall plant efficiencies. Conventional data-driven statistical methods, including principal component analysis, partial least squares and multivariable linear regression, are normally applied in process monitoring. Recently, however, the latest artificial intelligence (AI) methods, in particular deep learning algorithms, have demonstrated remarkable performance in several important areas such as machine vision, natural language processing and pattern recognition. These new AI algorithms have gained increasing attention in industrial process applications, for instance in predictive product quality control and machine health monitoring. Moreover, the availability of big-data processing tools and cloud computing technologies further supports the use of deep-learning-based algorithms for process monitoring. In this work, a process monitoring scheme based on state-of-the-art artificial intelligence methods and cloud computing platforms is proposed for a waste-to-energy industrial use case. The monitoring scheme supports the use of the latest AI methods, leveraging big-data processing tools and taking advantage of available cloud computing platforms. Deep learning algorithms can describe non-linear, dynamic and high-dimensional systems better than most conventional data-based process monitoring methods. Moreover, deep-learning-based methods are well suited to big-data analytics, unlike traditional statistical machine learning methods, which are less efficient at that scale.
Furthermore, the proposed monitoring scheme emphasizes real-time process monitoring in addition to offline data analysis. To achieve this, the scheme proposes the use of big-data analytics software frameworks and tools such as Microsoft Azure Stream Analytics, Apache Storm, Apache Spark and Hadoop. The availability of open-source as well as proprietary cloud computing platforms, AI and big-data software tools all supports the realization of the proposed monitoring scheme.
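The real-time layer of such a scheme can be illustrated with a deliberately small sketch: a sliding-window z-score detector standing in for the streaming analytics job. The window length, warm-up size and alarm threshold below are illustrative choices, not values from the proposed scheme.

```python
from collections import deque
from statistics import mean, stdev

class StreamMonitor:
    """Minimal sliding-window z-score monitor for one sensor stream.

    A stand-in for the streaming analytics layer (e.g. a Spark or Storm
    job); window size and alarm threshold are illustrative choices.
    """
    def __init__(self, window=50, threshold=3.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        """Return True if x deviates from the recent window by more
        than `threshold` standard deviations (after a short warm-up)."""
        alarm = False
        if len(self.buf) >= 10:
            mu, sigma = mean(self.buf), stdev(self.buf)
            if sigma > 0 and abs(x - mu) / sigma > self.threshold:
                alarm = True
        self.buf.append(x)
        return alarm

monitor = StreamMonitor()
readings = [10.0, 10.1, 9.9, 10.05, 10.0, 9.95, 10.1, 10.0, 9.9, 10.05,
            10.0, 50.0]          # last reading simulates a process upset
alarms = [monitor.update(x) for x in readings]
```

In a deployed scheme this logic would run inside the stream-processing framework, with the deep learning model replacing the simple window statistic.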

    Development of pattern-based models for time series forecasting in Big Data environments

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111. This Doctoral Thesis is presented as a compendium of publications and gathers several scientific contributions to international conferences and to journals with a high impact factor in the Journal Citation Reports (JCR). Over five years of part-time research, the work has focused on the study, analysis and forecasting of large sets of time series, mainly of an energy-related nature. To this end, the latest technological trends in distributed computing were followed: the experiments were developed entirely in Scala, the native language of the Apache Spark framework, and were run in real environments such as Amazon Web Services and Open Telekom Cloud. The first phase of the thesis focuses on the development and application of a methodology for efficiently analysing datasets of electricity-consumption time series generated by the network of smart meters installed at Universidad Pablo de Olavide. The proposed methodology centres on the correct application, in distributed environments, of the K-means clustering algorithm to large datasets, segmenting sets of n observations into k distinct groups with similar characteristics. This task is carried out with a parallelised version of the algorithm, K-means++, included in the Machine Learning Library of Apache Spark. To choose the optimal number of clusters, several cluster validity indices are evaluated, namely the Within Set Sum of Squared Errors, Davies-Bouldin, Dunn and Silhouette, all of them implemented for distributed environments.
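The seeding idea behind K-means++ can be sketched in plain Python. Note that this deterministic farthest-point variant is a simplification for illustration: the actual algorithm samples new centres with probability proportional to squared distance, and Spark's MLlib implements the parallel K-means|| variant.

```python
def kmeanspp_seeds(points, k):
    """Deterministic sketch of K-means++ seeding: start from the first
    point, then repeatedly pick the point farthest from its nearest
    already-chosen centre. (Real K-means++ samples proportionally to
    squared distance; Spark parallelises this as K-means||.)"""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centres = [points[0]]
    while len(centres) < k:
        # pick the point whose nearest chosen centre is farthest away
        nxt = max(points, key=lambda p: min(d2(p, c) for c in centres))
        centres.append(nxt)
    return centres

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
seeds = kmeanspp_seeds(pts, 3)
```

Spreading the initial centres this way is what makes K-means++ far less sensitive to initialisation than plain random seeding.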
The results of these experiments were presented at the 13th International Conference on Distributed Computing and Artificial Intelligence. The experimentation and the methodology were later extended, resulting in an article published in the journal Energies, indexed in the JCR with category Q3. The second part of the work consists of designing a methodology and developing an algorithm capable of effectively forecasting time series in Big Data environments. To this end, the well-known Pattern Sequence-based Forecasting (PSF) algorithm was analysed with two main goals: on the one hand, adapting it to scalable, distributed environments and, on the other, improving the predictions it produces, targeting the efficient exploitation of large datasets. An algorithm called bigPSF was developed in Scala and integrated into a complete methodology designed to forecast the energy consumption of a Smart City. Finally, a variant of bigPSF called MV-bigPSF was developed, capable of forecasting multivariate time series. This experimentation resulted in two scientific articles published in the journals Information Sciences (for the article on the bigPSF algorithm) and Applied Energy (for the study of its multivariate version), both with a JCR impact factor in category Q1. Universidad Pablo de Olavide de Sevilla. Escuela de Doctorado.
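The core matching step of PSF can be sketched as follows, assuming the clustering phase has already assigned a label to each historical day; the toy labels and consumption values are illustrative, not real data.

```python
def psf_forecast(labels, days, w):
    """Core Pattern Sequence-based Forecasting step (sketch).

    labels: cluster label of each historical day (e.g. from k-means)
    days:   the daily value for each day (one number per day for brevity;
            the real algorithm forecasts a full daily curve)
    w:      length of the label pattern to match

    Find every past occurrence of the most recent w labels and average
    the day that followed each occurrence. Falls back to the last day
    if the pattern never occurred before.
    """
    pattern = labels[-w:]
    matches = [i + w for i in range(len(labels) - w)
               if labels[i:i + w] == pattern]
    if not matches:
        return days[-1]
    return sum(days[i] for i in matches) / len(matches)

# toy history: weekday (0) / weekend (1) labels and daily consumption
labels = [0, 0, 1, 0, 0, 1, 0, 0]
days   = [100, 102, 60, 101, 99, 62, 100, 103]
pred = psf_forecast(labels, days, w=2)   # two weekdays precede a weekend
```

bigPSF distributes exactly this kind of pattern search across a Spark cluster so that very long histories can be scanned in parallel.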

    Data Science in Healthcare

    Data science is an interdisciplinary field that applies numerous techniques, such as machine learning, neural networks, and deep learning, to create value by extracting knowledge and insights from available data. Advances in data science have a significant impact on healthcare. While advances in the sharing of medical information result in better and earlier diagnoses as well as more patient-tailored treatments, information management is also affected by trends such as increased patient centricity (with shared decision making), self-care (e.g., using wearables), and integrated care delivery. The delivery of health services is being revolutionized through the sharing and integration of health data across organizational boundaries. Via data science, researchers can deliver new approaches to merge, analyze, and process complex data and gain more actionable insights, understanding, and knowledge at the individual and population levels. This Special Issue focuses on how data science is used in healthcare (e.g., through predictive modeling) and on related topics, such as data sharing and data management.

    ADDRESSING GEOGRAPHICAL CHALLENGES IN THE BIG DATA ERA UTILIZING CLOUD COMPUTING

    Processing, mining and analyzing big data adds significant value towards solving previously unanswered research questions and improving our ability to understand problems in the geographical sciences. This dissertation contributes a solution that supports researchers who may not otherwise have access to traditional high-performance computing resources, so that they can benefit from the "big data" era and carry out large-scale geographical research in ways that were not previously possible. Using approaches from geographic information science, remote sensing and computer science, this dissertation addresses three major challenges in big geographical research: 1) how to exploit cloud computing to implement a universal, scalable solution for classifying multi-source remotely sensed imagery datasets with high efficiency; 2) how to overcome the missing-data issue in land use/land cover studies with a high-performance framework on the cloud, through the use of available auxiliary datasets; and 3) the design considerations underlying a universal massive-scale voxel geographical simulation model for simulating complex geographical systems from a three-dimensional spatial perspective. The dissertation implements an in-memory distributed remotely sensed imagery classification framework on the cloud using both unsupervised and supervised classifiers, and classifies remotely sensed imagery datasets of the Suez Canal area, Egypt, and Inner Mongolia, China, under different cloud environments. It also implements and tests a cloud-based gap-filling model with eleven auxiliary biophysical and socio-economic datasets in Inner Mongolia, China. Finally, the research extends a voxel-based cellular automata model using graph theory and develops it into a massive-scale voxel geographical simulation framework to simulate dynamic processes, such as the dispersal of air pollution particles, on the cloud.
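The voxel-based cellular automata idea can be illustrated with a minimal, mass-conserving diffusion step on a small 3-D lattice. The neighbourhood rule and transfer rate below are illustrative, not those of the dissertation's model.

```python
def ca_step(grid, rate=0.1):
    """One update of a toy voxel cellular automaton for pollutant
    dispersal: each voxel sends `rate` of its concentration to each of
    its six face-adjacent neighbours (mass-conserving). `grid` maps
    (x, y, z) -> concentration on a small bounded lattice."""
    nbrs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    out = {v: c for v, c in grid.items()}
    for (x, y, z), c in grid.items():
        share = c * rate
        for dx, dy, dz in nbrs:
            n = (x + dx, y + dy, z + dz)
            if n in grid:                     # transfer only inside the lattice
                out[n] += share
                out[(x, y, z)] -= share
    return out

# a 3x3x3 block with all pollutant concentrated in the centre voxel
grid = {(x, y, z): 0.0 for x in range(3) for y in range(3) for z in range(3)}
grid[(1, 1, 1)] = 1.0
after = ca_step(grid)
```

Because every transfer subtracts from the source exactly what it adds to the neighbour, each call conserves total mass; a massive-scale framework partitions this voxel grid across cloud nodes and exchanges only the boundary layers.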

    Short Papers of the 11th Conference on Cloud Computing, Big Data & Emerging Topics (JCC-BD&ET 2023)

    Compilation of the short papers presented at the 11th Conference on Cloud Computing, Big Data & Emerging Topics (JCC-BD&ET 2023), held in hybrid mode in June 2023 and organised by the Instituto de Investigación en Informática LIDI (III-LIDI) and the Postgraduate Office of the Facultad de Informática of the UNLP, in collaboration with universities from Argentina and abroad. Facultad de Informática.

    The Convergence of Human and Artificial Intelligence on Clinical Care - Part I

    This edited book contains twelve studies, both large-scale and pilot, in five main categories: (i) adaptive imputation to increase the density of clinical data for improving downstream modeling; (ii) machine-learning-empowered diagnosis models; (iii) machine learning models for outcome prediction; (iv) innovative use of AI to improve our understanding of the public view; and (v) understanding of the attitude of providers in trusting insights from AI for complex cases. This collection is an excellent example of how technology can add value in healthcare settings and hints at some of the pressing challenges in the field. Artificial intelligence is gradually becoming a go-to technology in clinical care; therefore, it is important to work collaboratively and to shift from performance-driven outcomes to risk-sensitive model optimization, improved transparency, and better patient representation, to ensure more equitable healthcare for all.

    Deep-learning-based predictive models for massive time series data

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111. Advances in hardware have revolutionised the field of artificial intelligence, opening up fronts and areas that until recently were out of reach. Deep learning is perhaps one of the areas most affected by this progress, since its models require great computational capacity owing to the number and complexity of their operations, which is why they had fallen into disuse until recent years. This Doctoral Thesis is presented as a compendium of publications, with a total of ten scientific contributions to international conferences and to journals with a high impact factor in the Journal Citation Reports (JCR). It gathers research oriented towards the study, analysis and development of the deep learning architectures most widespread in the literature for time series forecasting, mainly of energy-related series such as electricity demand and solar power generation. A large part of the research also focuses on the optimisation of these models, an essential task for obtaining a reliable predictive model. In a first phase, the thesis centres on developing deep-learning-based predictive models for time series forecasting applied to two real data sources. First, a methodology was designed for multi-step forecasting with a feed-forward model, whose results were published in the International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC). The same methodology was then applied and compared with other classical models, implemented in a distributed manner, with the results published in the 14th International Work-Conference on Artificial Neural Networks (IWANN).
Given the difference in computation time and scalability between the deep learning method and the other models compared, a distributed version was designed, whose results were published in two Q1-indexed journals, Integrated Computer-Aided Engineering and Information Sciences. All these contributions were tested on an electricity-demand dataset from Spain. In parallel, and to verify the generality of the methodology, the same approach was applied to a dataset of solar power generation in Australia, in two versions: univariate, whose results were published in the International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO), and multivariate, published in the journal Expert Systems, indexed with category Q2. Despite the good results obtained, the model-optimisation strategy was not suitable for big data environments because of its exhaustive nature and computational cost. Motivated by this, the second phase of the thesis focused on the optimisation of the deep learning models. A random-search strategy applied to the methodology proposed in the first phase was designed, with results published at IWANN. Attention then turned to heuristic-based optimisation, and a genetic algorithm was developed to optimise the feed-forward model. The results of this research were presented in the journal Applied Sciences, indexed with category Q2. In addition, influenced by the 2020 pandemic, a heuristic based on the propagation model of COVID-19 was designed and implemented. This optimisation strategy was integrated with a Long Short-Term Memory network, offering highly competitive results that were published in the journal Big Data, indexed in the JCR with category Q1.
To conclude the thesis, all the information and knowledge acquired were compiled into a survey article, published in Big Data, a journal indexed in the JCR with category Q1. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
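The genetic-algorithm style of hyperparameter optimisation described above can be sketched generically. The fitness function below is a stand-in for the validation error of a trained feed-forward network, and all operator choices (truncation selection, uniform crossover, point mutation) are illustrative.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=42):
    """Minimal genetic algorithm for hyperparameter tuning (sketch).

    bounds:  list of (low, high) integer ranges, one per hyperparameter
             (e.g. number of layers, units per layer)
    fitness: lower is better; stands in for the validation error of the
             trained network
    """
    rng = random.Random(seed)
    def individual():
        return [rng.randint(lo, hi) for lo, hi in bounds]
    pop = [individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        parents = scored[:pop_size // 2]            # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # crossover
            if rng.random() < 0.3:                            # mutation
                i = rng.randrange(len(bounds))
                child[i] = rng.randint(*bounds[i])
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# stand-in fitness: pretend the best network has 3 layers of 64 units
best = genetic_search(lambda h: abs(h[0] - 3) + abs(h[1] - 64),
                      bounds=[(1, 6), (8, 128)])
```

Because the fitter half of each generation is carried over unchanged, the best configuration found so far is never lost, which is what makes this far cheaper than the exhaustive grid search the thesis moved away from.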

    Big Data Analytics for Complex Systems

    The evolution of technology in all fields has led to the generation of vast amounts of data by modern systems. Using data to extract information, make predictions and support decisions is the current trend in artificial intelligence. Advances in big-data analytics tools have made accessing and storing data easier and faster than ever, and machine learning algorithms help to identify patterns in, and extract information from, data. Current tools and machines in healthcare, computer technologies and manufacturing can generate massive amounts of raw data about their products or samples. The author of this work proposes a modern integrative system that utilizes big-data analytics, machine learning, supercomputer resources and industrial health machines' measurements to build a smart system that can mimic the human intelligence skills of observation, detection, prediction and decision-making. Applications of the proposed smart system are included as case studies to highlight the contributions of each system. The first contribution is the ability to use big-data and deep learning technologies on production lines to diagnose incidents and take proper action. In the current era of digital industrial transformation, Industry 4.0 has been receiving researchers' attention because it can be used to automate production-line decisions. Reconfigurable manufacturing systems (RMS) have been widely used to reduce the setup cost of restructuring production lines. However, current RMS modules are not linked to the cloud for online decision-making; these modules must connect to an online server (supercomputer) with big-data analytics and machine learning capabilities. Here, online means that data is centralized in the cloud (supercomputer) and accessible in real time.
In this study, deep neural networks are used to detect the decisive features of a product and to build a prediction model with which the iFactory makes the necessary decision for defective products. The Spark ecosystem manages the access, processing and storage of the streaming big data. This contribution is implemented as a closed cycle which, to the best of our knowledge, is the first in the literature to apply big-data analysis with deep learning to a real-time manufacturing application. The model achieves a high accuracy of 97% in classifying normal versus defective items. The second contribution, in bioinformatics, is the ability to build supervised machine learning approaches based on patients' gene expression to predict the proper treatment for breast cancer. In this trial, to personalize treatment, the machine learns the genes that are active in the patient cohort with a five-year survival period. The initial condition is that each group must undergo only one specific treatment. After learning about each group (or class), the machine can personalize the treatment of a new patient by analysing the patient's gene expression. The proposed model will help in the diagnosis and treatment of patients. Future work in this area involves building a protein-protein interaction network with the selected genes for each treatment, first to analyse the motifs of the genes and then to target them with the proper drug molecules. In the learning phase, several feature-selection techniques and standard supervised classifiers are used to build the prediction model. Most of the nodes show high performance, with accuracy, sensitivity, specificity and F-measure close to 100%. The third contribution is the ability to build semi-supervised learning for breast cancer survival treatment, advancing the second contribution.
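A filter-style feature-selection step of the kind used in such pipelines can be sketched as follows; the scoring rule (absolute difference of per-class means) and the toy cohort are illustrative, not the thesis's actual method.

```python
def rank_genes(samples, labels):
    """Simple filter feature selection (sketch): score each gene by the
    absolute difference between its mean expression in the two classes,
    a stand-in for the feature-selection stage of the pipeline.

    samples: list of per-patient expression vectors
    labels:  0/1 treatment class per patient
    Returns gene indices sorted from most to least discriminative."""
    n_genes = len(samples[0])
    def mean_of(cls, g):
        vals = [s[g] for s, l in zip(samples, labels) if l == cls]
        return sum(vals) / len(vals)
    scores = {g: abs(mean_of(1, g) - mean_of(0, g)) for g in range(n_genes)}
    return sorted(scores, key=scores.get, reverse=True)

# toy cohort: gene 1 separates the classes, gene 0 does not
samples = [(1.0, 0.1), (1.1, 0.2), (0.9, 0.9), (1.0, 1.1)]
labels  = [0, 0, 1, 1]
ranking = rank_genes(samples, labels)
```

The top-ranked genes would then be fed to the standard supervised classifiers mentioned above, keeping the model small and interpretable.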
By understanding the relations between the classes, the machine learning phase can be designed around the similarities between them. In the proposed research, the Euclidean distance matrix among the survival-treatment classes is used to build the hierarchical learning model. The distance information, learned through an unsupervised approach, helps the prediction model select classes that are far from each other, maximizing inter-class distance and producing wider class groups. The performance of this approach shows a slight improvement over the second model, and it reduces the number of discriminative genes from 47 to 37. The model in the second contribution studies each class individually, while this model focuses on the relationships between the classes and uses this information in the learning phase. Hierarchical clustering is performed to draw the borders between groups of classes before building the classification models, and several distance measures are tested to identify the best linkages between classes. Most of the nodes show high performance, with accuracy, sensitivity, specificity and F-measure ranging from 90% to 100%. All the case-study models showed high performance in the prediction phase. These models can be replicated for different problems in different domains. The comprehensive models built on these newer technologies are reconfigurable and modular; a new learning phase can be plugged in at either end of the learning phase. Therefore, the output of the system can serve as the input for another learning system, and new features can be added to the input for consideration in the learning phase.
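The use of inter-class Euclidean distances can be illustrated with a small sketch that computes the distance matrix between class centroids and picks the most separable pair first; the centroids below are toy values, not real gene-expression data.

```python
from math import dist

def class_distance_matrix(centroids):
    """Pairwise Euclidean distances between class centroids: a sketch of
    the distance information used to order the hierarchical learning."""
    names = sorted(centroids)
    return {(a, b): dist(centroids[a], centroids[b])
            for a in names for b in names if a < b}

def farthest_pair(dm):
    """The two classes to separate first: the most distant pair, since
    well-separated classes are the easiest for a classifier to split."""
    return max(dm, key=dm.get)

# toy gene-expression centroids for three treatment classes
centroids = {"chemo": (0.0, 1.0), "hormone": (0.2, 1.1), "radio": (5.0, 5.0)}
dm = class_distance_matrix(centroids)
top = farthest_pair(dm)
```

Hierarchical clustering over this matrix is what draws the borders between class groups before the per-group classifiers are trained.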