
    Clustering: Methodology, hybrid systems, visualization, validation and implementation

    Unsupervised learning is one of the most important steps in machine learning applications. Besides providing insight into the data distribution, unsupervised learning is used as a preprocessing step for other machine learning algorithms. This dissertation investigates the application of unsupervised learning to various types of data for machine learning tasks such as clustering, regression and classification. The dissertation is organized into three papers. In the first paper, unsupervised learning is applied to data with mixed categorical and numerical features to transform the data objects from the mixed-type feature domain into a new, sparser numerical domain. By making use of the data fusion capacity of adaptive resonance theory clustering, the approach is able to reduce the distinction between the numerical and categorical features. The second paper presents a novel method to improve wind forecasting performance: the time series of the surrounding wind mills are clustered into similar groups using hidden Markov model clustering, and the clustering information is used to enhance the forecast. A fast forecast method is also introduced, using an extreme learning machine that can be trained in analytic form, to choose the optimal number of past samples for prediction and an appropriate size for the neural network. In the third paper, unsupervised learning is used to learn features automatically from the dataset itself, without human design of sophisticated feature extractors. The paper shows that unsupervised feature learning with a multi-quadric radial basis function extreme learning machine yields a classifier that performs better than several other supervised learning methods. The paper further improves the speed of training the neural network by presenting an algorithm that runs in parallel on a GPU. --Abstract, page iv
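
    The "analytic form" training mentioned for the extreme learning machine can be written out as a worked equation. This is the standard textbook formulation, not something taken from the dissertation itself; the symbols (X for the inputs, W and b for the randomly drawn hidden weights and biases, g for the hidden activation, T for the targets, beta for the output weights) are assumptions introduced here for illustration.

```latex
H = g\!\left(XW + \mathbf{1}\,b^{\top}\right), \qquad
\hat{\beta} = H^{\dagger} T, \qquad
H^{\dagger} = \left(H^{\top} H\right)^{-1} H^{\top} \ \text{ if } H \text{ has full column rank}
```

    Here H^\dagger is the Moore-Penrose pseudoinverse, so \hat{\beta} is a least-squares solution of H\beta = T. Only the output weights are fitted, which is why no iterative gradient training is required.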

    Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms

    The Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis, but these are currently considered "dark data", i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT applications that benefit society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for performing exploratory data analysis to identify a set (a.k.a. cluster) of similar objects. Clustering algorithms are computationally heavy workloads and need to be executed on high-performance computing (HPC) clusters, especially when dealing with large datasets. This execution on HPC infrastructures is an energy-hungry procedure with additional issues, such as high-latency communications or privacy. Edge computing is a paradigm that enables lightweight computations at the edge of the network and has been proposed recently to solve these issues. In this paper, we provide an in-depth analysis of emergent edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed up these workloads. Our analysis includes performance and power consumption figures for Nvidia's latest AGX Xavier to compare the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart version. 
Three different clustering algorithms (i.e., k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed to be optimally executed on edge and cloud platforms, showing a speed-up factor of up to 11x for the GPU code compared to sequential counterpart versions in the edge platforms and energy savings of up to 150% between the edge computing and HPC platforms.
    This work has been partially supported by the Spanish Ministry of Science and Innovation, under the Ramon y Cajal Program (Grant No. RYC2018-025580-I) and under grants RTI2018-096384-B-I00, RTC-2017-6389-5 and RTC2019-007159-5, and by the Fundacion Seneca del Centro de Coordinacion de la Investigacion de la Region de Murcia under Project 20813/PI/18.
    Cecilia-Canales, JM.; Cano, J.; Morales-García, J.; Llanes, A.; Imbernón, B. (2020). Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms. Sensors. 20(21):1-19. https://doi.org/10.3390/s20216335
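
    For readers unfamiliar with Fuzzy C-Means, one of the three algorithms evaluated above, the alternating update can be sketched as follows. This is a minimal sequential sketch in pure Python for one-dimensional data, not the paper's GPU implementation; the function name, the explicit initial centres, the fuzzifier default m=2 and the small eps guard against zero distances are all assumptions made for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-12):
    """Illustrative 1-D fuzzy c-means; `centers` is the initial guess."""
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership of every point in every cluster (each row sums to 1).
        memberships = []
        for x in points:
            dists = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                   for d_i in dists]
            memberships.append(row)
        # Each centre becomes the mean of the points weighted by membership**m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships


data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers, memberships = fuzzy_c_means(data, centers=[0.0, 10.2])
```

    The membership step is independent per point, which is the part that data-parallel hardware such as a GPU can spread across threads.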

    Recent Developments in Document Clustering

    This report gives a brief overview of the current state of document clustering research and presents recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed, along with a table summarizing important properties, and open problems as well as directions for future research are discussed.

    Dynamic non-linear system modelling using wavelet-based soft computing techniques

    The enormous number of complex systems makes high-level, cost-efficient modelling structures a necessity for operators and system designers. Model-based approaches offer a very challenging way to integrate a priori knowledge into the procedure. Soft-computing-based models, in particular, can be applied successfully to highly nonlinear problems. A further reason for dealing with so-called soft computational model-based techniques is that in real-world cases, often only partial, uncertain and/or inaccurate data are available. Wavelet-based soft computing techniques are considered one of the latest trends in system identification and modelling. This thesis provides a comprehensive synopsis of the main wavelet-based approaches to modelling non-linear dynamical systems in real-world problems, in conjunction with possible twists and novelties aiming for a more accurate and less complex modelling structure. Initially, an on-line structure and parameter design has been considered in an adaptive Neuro-Fuzzy (NF) scheme. The problem of redundant membership functions, and consequently redundant fuzzy rules, is circumvented by applying an adaptive structure. The growth of a particular fungus (Monascus ruber van Tieghem) is modelled and compared against several other approaches to further justify the proposed methodology. By extending this line of research, two Morlet Wavelet Neural Network (WNN) structures have been introduced. Increasing the accuracy and decreasing the computational cost are the primary targets of the proposed novelties. Modifying the synaptic weights by replacing them with Linear Combination Weights (LCW), and imposing a Hybrid Learning Algorithm (HLA) comprising Gradient Descent (GD) and Recursive Least Square (RLS), are the tools utilised to meet these challenges. The two models differ in structure while sharing the same HLA scheme. 
The second approach contains an additional multiplication layer, and its hidden layer contains several sub-WNNs, one for each input dimension. The practical superiority of these extensions is demonstrated by simulation and experimental results on a real non-linear dynamic system, namely Listeria monocytogenes survival curves in Ultra-High Temperature (UHT) whole milk, and consolidated by a comprehensive comparison with other suggested schemes. At the next stage, the extended clustering-based fuzzy version of the proposed WNN schemes is presented as the ultimate structure in this thesis. The proposed Fuzzy Wavelet Neural Network (FWNN) benefits from the clustering feature of Gaussian Mixture Models (GMMs), updated by a modified Expectation-Maximization (EM) algorithm. One of the main aims of this thesis is to illustrate how the GMM-EM scheme can be used not only for detecting useful knowledge from the data by building an accurate regression, but also for the identification of complex systems. The structure of the FWNN is built on fuzzy rules that include wavelet functions in their consequent parts. In order to improve the function approximation accuracy and general capability of the FWNN system, an efficient hybrid learning approach is used to adjust the parameters of dilation, translation, weights, and membership. An Extended Kalman Filter (EKF) is employed to adjust the wavelet parameters, together with Weighted Least Square (WLS), which is dedicated to fine-tuning the Linear Combination Weights. The results of a real-world application of Short Time Load Forecasting (STLF) further reinforce the plausibility of the above technique.
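
    To make the roles of the dilation and translation parameters concrete, the following is a small sketch of a Morlet wavelet unit and a weighted sum of such units. It assumes the commonly used real-valued Morlet form cos(5t)·exp(-t²/2) and plain scalar output weights; the thesis's Linear Combination Weights, multiplication layer and hybrid learning scheme are not reproduced, and all function names are hypothetical.

```python
import math


def morlet(t):
    """Real-valued Morlet mother wavelet (assumed form): cos(5t) * exp(-t^2 / 2)."""
    return math.cos(5.0 * t) * math.exp(-0.5 * t * t)


def wavelet_neuron(x, translation, dilation):
    """Response of one wavelet unit: the mother wavelet shifted and scaled."""
    return morlet((x - translation) / dilation)


def wnn_output(x, units, weights):
    """Weighted sum of wavelet units; `units` holds (translation, dilation) pairs."""
    return sum(w * wavelet_neuron(x, b, a) for w, (b, a) in zip(weights, units))


# Two units centred at 1.0 with different dilations, evaluated at a nearby input.
y = wnn_output(1.2, units=[(1.0, 1.0), (1.0, 2.0)], weights=[0.5, 0.25])
```

    A larger dilation stretches the wavelet so the unit responds over a wider input range, while the translation moves its centre.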

    Development of Neurofuzzy Architectures for Electricity Price Forecasting

    In the 20th century, many countries liberalized their electricity markets. This liberalization has led generation companies as well as wholesale buyers to take on greater risk exposure than under the old centralized framework. In this setting, electricity price prediction has become crucial for any market player in their decision-making process as well as in strategic planning. In this study, a prototype asymmetric-based neuro-fuzzy network (AGFINN) architecture has been implemented for short-term electricity price forecasting for the ISO New England market. The AGFINN framework has been designed with two different defuzzification schemes. Fuzzy clustering has been explored as an initial step for defining the fuzzy rules, while an asymmetric Gaussian membership function has been utilized in the fuzzification part of the model. Results for the minimum and maximum electricity prices for ISO New England emphasize the superiority of the proposed model over well-established learning-based models.
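
    An asymmetric Gaussian membership function can be illustrated with a short sketch. The abstract does not give AGFINN's exact parameterization, so the form below (one centre with separate left and right widths) is an assumption, and the function and parameter names are hypothetical.

```python
import math


def asymmetric_gaussian(x, center, sigma_left, sigma_right):
    """Gaussian membership with a different width on each side of the centre."""
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Membership is 1 at the centre and falls off faster on the narrower side.
left = asymmetric_gaussian(4.0, center=5.0, sigma_left=1.0, sigma_right=2.0)
right = asymmetric_gaussian(6.0, center=5.0, sigma_left=1.0, sigma_right=2.0)
```

    With these widths, a point one unit below the centre receives a lower membership than a point one unit above it, which lets a single fuzzy set model skewed data.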

    Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition

    Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performance and providing a higher degree of invariance to object transformations. Large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint. The online nature of this solution allows for the development of a method for adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. Alternatives to batch-based decompositions for a whitening preprocessing stage, which take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain, are also explored. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only step forward to higher classification accuracies. 
Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
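
    The memory advantage of an incremental covariance estimator can be illustrated with a one-pass update that never stores the samples. The sketch below uses a Welford-style running update, which is an assumption: the dissertation's own estimator is not described in the abstract, and the class and method names are hypothetical.

```python
class OnlineCovariance:
    """One-pass sample covariance; stores only the mean and a dim x dim accumulator."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.co = [[0.0] * dim for _ in range(dim)]  # running sums of co-deviations

    def update(self, x):
        self.n += 1
        # Deviations from the mean before and after including the new sample.
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self.co[i][j] += di * dj

    def covariance(self):
        """Unbiased sample covariance; requires at least two samples."""
        return [[c / (self.n - 1) for c in row] for row in self.co]


est = OnlineCovariance(dim=2)
for sample in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]:
    est.update(sample)
cov = est.covariance()  # approximately [[5/3, 10/3], [10/3, 20/3]]
```

    Memory use depends only on the feature dimension, not on the number of patches seen, which is the property that matters in memory-constrained settings.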