31 research outputs found

    Evaluation of Clustering Algorithms on HPC Platforms

    Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally very expensive, as they often involve costly fitness functions that must be evaluated for all points in the dataset. This cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for fuzzy clustering algorithms commonly used in the state of the art, such as Fuzzy C-means (FCM), Gustafson-Kessel FCM (GK-FCM) and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation and the amount of data to be processed, each algorithm performs better on a different platform.
    Funding: This work has been partially supported by the Spanish Ministry of Science and Innovation, under the Ramon y Cajal Program (Grant No. RYC2018-025580-I) and by the Spanish "Agencia Estatal de Investigacion" under grant PID2020-112827GB-I00 /AEI/ 10.13039/501100011033, and under grants RTI2018-096384-B-I00, RTC-2017-6389-5 and RTC2019-007159-5, by the Fundacion Seneca del Centro de Coordinacion de la Investigacion de la Region de Murcia under Project 20813/PI/18, and by the "Conselleria de Educacion, Investigacion, Cultura y Deporte, Direccio General de Ciencia i Investigacio, Proyectos AICO/2020", Spain, under Grant AICO/2020/302.
    Citation: Cebrian, JM.; Imbernón, B.; Soto, J.; Cecilia-Canales, JM. (2021). Evaluation of Clustering Algorithms on HPC Platforms. Mathematics. 9(17):1-20. https://doi.org/10.3390/math9172156
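    As an illustration of the fuzzy membership idea in the abstract above (each point may belong to more than one cluster), the following is a minimal sketch of the standard Fuzzy C-means membership update for fixed cluster centres. It is not the authors' parallel implementation: the function name `fcm_memberships`, the default fuzzifier `m = 2` and the toy data are assumptions made here for illustration only.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the Euclidean distance from point i to centre j.
    The fuzzifier m must be greater than 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centres: split the
            # membership evenly among those centres to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


# Hypothetical toy data: three 2-D points and two cluster centres.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

    Each row of `u` sums to 1, so a point lying between two centres receives partial membership in both. Evaluating every point against every centre in this way is the kind of all-points computation that the abstract describes as expensive.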

    High-throughput fuzzy clustering on heterogeneous architectures

    The Internet of Things (IoT) is driving the next economic revolution, in which the main players are data and immediacy. IoT is producing ever larger amounts of data that are now classified as "dark data" because most are created but never analyzed. The efficient analysis of this data deluge is becoming mandatory in order to transform it into meaningful information. Among the techniques available for this purpose, clustering techniques, which classify different patterns into groups, have proven very useful for obtaining knowledge from the data. However, clustering algorithms are computationally hard, especially for large data sets, and therefore require the most powerful computing platforms on the market. In this paper, we investigate coarse- and fine-grain parallelization strategies on Intel and Nvidia architectures for the fuzzy minimals (FM) algorithm, a fuzzy clustering technique that has shown very good results in the literature. We provide an in-depth performance analysis of FM's main bottlenecks, reporting a speed-up factor of up to 40x compared to the sequential version.
    Funding: This work was partially supported by the Fundacion Seneca del Centro de Coordinacion de la Investigacion de la Region de Murcia under Project 20813/PI/18, and by the Spanish Ministry of Science, Innovation and Universities under grants TIN2016-78799-P (AEI/FEDER, UE), RTI2018-096384-B-I00, RTI2018-098156-B-C53 and RTC-2017-6389-5.
    Citation: Cebrian, JM.; Imbernón, B.; Soto, J.; García, JM.; Cecilia-Canales, JM. (2020). High-throughput fuzzy clustering on heterogeneous architectures. Future Generation Computer Systems. 106:401-411. https://doi.org/10.1016/j.future.2020.01.022

    Slanted Stixels: A way to represent steep streets

    This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a Fully Convolutional Network (FCN), which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
    Comment: Journal preprint (published in IJCV 2019: https://link.springer.com/article/10.1007/s11263-019-01226-9). arXiv admin note: text overlap with arXiv:1707.0539

    Severe cardiac and abdominal manifestations without lung involvement in a child With COVID-19

    Coronavirus disease 2019 (COVID-19) has become a worldwide pandemic, affecting humans of all ages. Clinical features of the pediatric population have been published, but there is not yet enough information to make a definitive description. Fever is typical, as are respiratory symptoms. The infection and its complications are rarely severe and, when they are, it is almost always in a patient with another underlying disease. However, some otherwise healthy children with COVID-19 do suffer critical organ injury, such as acute myocarditis, heart failure and gastrointestinal inflammation. The mechanism of this organ damage remains unclear. An otherwise healthy 13-year-old male was admitted to the pediatric intensive care unit with acute abdominal pain, possible myocarditis and a suspected diagnosis of COVID-19. Noteworthy baseline findings were ventricular extrasystoles in the electrocardiogram (EKG) and moderate left ventricular systolic dysfunction. Chest X-ray was normal. Blood tests revealed altered levels of inflammation markers (C-reactive protein (CRP), D-dimer, fibrinogen, interleukin 6 (IL-6)), lymphopenia and elevated cardiac enzymes. The first polymerase chain reaction (PCR) test for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was negative. The patient's condition worsened, and he entered cardiogenic shock (hypotension, tachycardia and oliguria). He was vomiting continuously, which made pain control difficult; imaging of his abdomen was undertaken. There was no response to fluid resuscitation, and so milrinone and epinephrine were administered. Empiric treatment began with azithromycin, foscarnet, carnitine and immunoglobulins. Hydroxychloroquine was given before the results of repeated SARS-CoV-2 and serology tests were available. Tocilizumab was administered once COVID-19 had been confirmed and massive inflammation had been observed. The clinical situation and the levels of the parameters studied improved progressively. The patient was discharged 8 days after admission. Most children with SARS-CoV-2 infection are asymptomatic or present only mild symptoms. However, physicians should be aware of atypical and severe manifestations that may occur in the hyperinflammatory phase of the illness.

    Quantifying Social Influence in an Online Cultural Market

    We revisit experimental data from an online cultural market in which 14,000 users interact to download songs, and develop a simple model that can explain seemingly complex outcomes. Our results suggest that individual behavior is characterized by a two-step process: the decision to sample a song and the decision to download it. Contrary to conventional wisdom, social influence is material to the first step only. The model also identifies the role of placement in mediating social signals, and suggests that in this market with anonymous feedback cues, social influence serves an informational rather than a normative role.

    Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences

    Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many different study designs that vary in their risk of bias to evaluate the true effect of interventions or impacts. Here, we empirically quantify, on a large scale, the prevalence of different study designs and the magnitude of bias in their estimates. Randomised designs and controlled observational designs with pre-intervention sampling were used by just 23% of intervention studies in biodiversity conservation, and 36% of intervention studies in social science. We demonstrate, through pairwise within-study comparisons across 49 environmental datasets, that these types of designs usually give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.
    Author affiliations:
    Christie, Alec P.: University of Cambridge (United Kingdom)
    Abecasis, David: Universidad de Algarve, Centro de Ciencias del Mar (Portugal)
    Adjeroud, Mehdi: Université de Perpignan (France); Institut de Recherche Pour Le Developpement (France)
    Alonso, Juan Carlos: Consejo Superior de Investigaciones Científicas, Museo Nacional de Ciencias Naturales (Spain)
    Amano, Tatsuya: University of Queensland (Australia)
    Anton, Alvaro: Universidad del País Vasco, Facultad de Educación de Bilbao (Spain)
    Baldigo, Barry P.: United States Geological Survey (United States)
    Barrientos, Rafael: Universidad Complutense de Madrid (Spain)
    Bicknell, Jake E.: University of Kent (United Kingdom)
    Buhl, Deborah A.: United States Geological Survey (United States)
    Cebrian, Just: Mississippi State University (United States)
    Ceia, Ricardo S.: Universidad de Coimbra (Portugal)
    Cibils Martina, Luciana: Universidad Nacional de Río Cuarto, Facultad de Ciencias Exactas, Fisicoquímicas y Naturales, Departamento de Ciencias Naturales (Argentina); Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Córdoba (Argentina)
    Clarke, Sarah: Marine Institute (Ireland)
    Claudet, Joachim: Universite de Paris (France); Centre National de la Recherche Scientifique (France)
    Craig, Michael D.: University of Western Australia (Australia); Murdoch University (Australia)
    Davoult, Dominique: Sorbonne University (France)
    De Backer, Annelies: Flanders Research Institute for Agriculture, Fisheries and Food (Belgium)
    Donovan, Mary K.: University of California (United States); University of Hawaii at Manoa (United States)
    Eddy, Tyler D.: University of South Carolina (United States); Memorial University of Newfoundland (Canada); Victoria University of Wellington (New Zealand)
    França, Filipe M.: Lancaster University (United Kingdom)
    Gardner, Jonathan P. A.: Victoria University of Wellington (New Zealand)
    Harris, Bradley P.: Alaska Pacific University (United States)
    Huusko, Ari: Natural Resources Institute Finland (Finland)
    Jones, Ian L.: Memorial University of Newfoundland (Canada)
    Kelaher, Brendan P.: Southern Cross University (Australia)
    Kotiaho, Janne S.: Universidad de Jyvaskyla (Finland)
    López Baucells, Adrià: Universidad de Lisboa (Portugal); Smithsonian Tropical Research Institute (Panama); Universidad Nacional de Colombia, Instituto de Investigaciones Amazonicas (Colombia); Museo de Ciencias Naturales de Granollers (Spain)
    Major, Heather L.: University of New Brunswick (Canada)
    Mäki Petäys, Aki: Voimalohi Oy (Finland); University of Oulu (Finland)

    Cut-offs and response criteria for the Hospital Universitario la Princesa Index (HUPI) and their comparison to widely-used indices of disease activity in rheumatoid arthritis

    Objective: To estimate cut-off points and to establish response criteria for the Hospital Universitario La Princesa Index (HUPI) in patients with chronic polyarthritis. Methods: Two cohorts, one of early arthritis (Princesa Early Arthritis Register Longitudinal [PEARL] study) and another of long-term rheumatoid arthritis (Estudio de la Morbilidad y Expresión Clínica de la Artritis Reumatoide [EMECAR]), including 1200 patients altogether, were used to determine cut-off values for remission and for low, moderate and high activity through receiver operating characteristic (ROC) curve analysis. The areas under the ROC curve (AUC) were compared to those of validated indices (SDAI, CDAI, DAS28). ROC analysis was also applied to establish minimal and relevant clinical improvement for HUPI. Results: The best cut-off points for HUPI are 2, 5 and 9, classifying RA activity as remission if ≤2, low disease activity if >2 and ≤5, moderate if >5 and <9, and high if ≥9. HUPI's AUC to discriminate between low and moderate activity was 0.909, and between moderate and high activity 0.887. DAS28's AUCs were 0.887 and 0.846, respectively; both indices had higher accuracy than SDAI (AUCs: 0.832 and 0.756) and CDAI (AUCs: 0.789 and 0.728). HUPI discriminates remission better than DAS28-ESR in early arthritis, but similarly to SDAI. The HUPI cut-off for minimal clinical improvement was established at 2 and for relevant clinical improvement at 4. Response criteria were established based on these cut-off values. Conclusions: The cut-offs proposed for HUPI perform adequately in patients with either early or long-term arthritis.
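    The activity categories reported in the abstract above can be read as a simple threshold rule. The sketch below encodes that reading, assuming the boundaries are inclusive at 2 and 5 and that a score of exactly 9 counts as high activity. The function name `hupi_activity` and the category labels are hypothetical, chosen here for illustration; this is not a clinical tool and does not compute the HUPI score itself.

```python
def hupi_activity(score):
    """Map a HUPI score to a disease-activity category.

    Assumed reading of the reported cut-offs (2, 5 and 9):
      score <= 2      -> remission
      2 < score <= 5  -> low
      5 < score < 9   -> moderate
      score >= 9      -> high
    """
    if score < 0:
        raise ValueError("HUPI score is assumed to be non-negative")
    if score <= 2:
        return "remission"
    if score <= 5:
        return "low"
    if score < 9:
        return "moderate"
    return "high"


# Hypothetical example scores, one per category plus the boundary at 9.
categories = [hupi_activity(s) for s in (1.5, 4, 7, 9, 12)]
```

    With these example scores, `categories` holds one label per input in the same order, which makes it easy to check how the boundary values at 2, 5 and 9 are assigned.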

    Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences

    Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many different study designs that vary in their risk of bias to evaluate the true effect of interventions or impacts. Here, we empirically quantify, on a large scale, the prevalence of different study designs and the magnitude of bias in their estimates. Randomised designs and controlled observational designs with pre-intervention sampling were used by just 23% of intervention studies in biodiversity conservation, and 36% of intervention studies in social science. We demonstrate, through pairwise within-study comparisons across 49 environmental datasets, that these types of designs usually give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.