4 research outputs found
Parallel-FST: acceleration of feature selection algorithms through parallel computing
[Abstract]
Currently, there is a boom in the production and consumption of large amounts of information (big data), which must be processed and prepared for later use. Machine learning, itself a field of research that has gained importance in recent years, provides many of the tools used to analyze these data. Despite their good results, machine learning techniques have a high computational cost, which increases significantly as the amount of data to be processed grows. To reduce the dimensionality of the data, feature selection algorithms use mathematical models to remove redundant and unnecessary information. Feature selection is itself an expensive process, but it can be accelerated by adapting existing algorithms and techniques to run on high performance computing (HPC) systems.
In recent years, many research projects have focused on the development of different feature selection methods, each applying specific criteria to the selection. In general, these criteria should maximize the relevance of the selected features and minimize the redundancy between them, so that the chosen subset represents the original dataset as faithfully as possible. There are also studies that compare several of these methods to find the degree of agreement between them, to look for similarities at the structural level, or to determine which one performs best in terms of precision, stability, and flexibility when applied to datasets with certain properties. This kind of research often requires libraries that gather the feature selection methods under study, so that their results can be compared. This is the case of FEAST, a library that provides eight feature selection methods based on mutual information.
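The "maximize relevance, minimize redundancy" idea can be made concrete with a small sketch. The following Python code is purely illustrative (it is not FEAST's actual API): it scores a candidate feature by its mutual information with the target minus its mean mutual information with already-selected features, in the style of the mRMR criterion.

```python
# Illustrative sketch of MI-based feature scoring (not FEAST's actual code):
# relevance to the target minus mean redundancy with selected features.
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_score(candidate, selected, target):
    """mRMR-style score: higher means more relevant and less redundant."""
    relevance = mutual_information(candidate, target)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s)
                     for s in selected) / len(selected)
    return relevance - redundancy
```

At each greedy step, the candidate with the highest score is added to the selected subset; a feature identical to the target scores 1 bit of relevance here, while a feature independent of it scores 0.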
In this work a parallelization of the FEAST library has been developed, adapting its methods so that they can run on, and take advantage of, HPC systems. The implemented parallelizations distribute the workload among processing elements. Since HPC systems are usually multinode systems with multicore nodes, this new version exploits both levels with a hybrid approach based on MPI and multithreading technologies. The strategy applied at both levels is domain decomposition, i.e., the data the program works on are distributed so that each processing element performs its calculations on a different chunk. This made it possible, on the one hand, to reduce execution times and, on the other, to analyze large datasets that exceed the memory limitations of common systems.
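The block domain decomposition described above can be sketched in a few lines. This is a generic balanced block distribution, assuming nothing about Parallel-FST's internals: each of `size` processing elements (MPI ranks or threads) owns a contiguous chunk of the `n` data elements, with chunk sizes differing by at most one.

```python
# Minimal sketch of a balanced block domain decomposition:
# worker `rank` (of `size` workers) owns the half-open range [start, end)
# of the n data elements; the first `n % size` workers get one extra element.
def block_range(n, rank, size):
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end
```

The ranges tile `[0, n)` exactly, with no gaps or overlaps, so each worker can compute on its own chunk independently before results are combined.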
Performance tests were carried out on a 16-node cluster with 64 GB of memory and 16 cores per node (256 cores in total). The results were very satisfactory: speedups of up to 229x were achieved for four representative datasets. In addition, every algorithm was able to analyze a 512 GB dataset, which would not have been possible on a single node.
Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2019/202
Parallel-FST: A feature selection library for multicore clusters
Funded for open access publication: Universidade da Coruña/CISUG
[Abstract]: Feature selection is a subfield of machine learning focused on reducing the dimensionality of datasets by performing a computationally intensive process. This work presents Parallel-FST, a publicly available parallel library for feature selection that includes seven methods following a hybrid MPI/multithreaded approach to reduce their runtime on high performance computing systems. Performance tests were carried out on a 256-core cluster, where Parallel-FST obtained speedups of up to 229x for representative datasets and was able to analyze a 512 GB dataset, which was not previously possible with the sequential counterpart library due to memory constraints.
This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), by the Ministry of Universities of Spain under grant FPU20/00997, and by Xunta de Galicia and FEDER funds of the EU (CITIC, Centro de Investigación de Galicia accreditation 2019-2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ED431C 2021/30).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2021/3
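As a quick sanity check on the reported figures (not a calculation from the paper itself): a 229x speedup on 256 cores corresponds to a parallel efficiency of roughly 89%, which is high for a hybrid distributed-memory code.

```python
# Parallel efficiency = speedup / number of cores.
# 229x on 256 cores gives about 0.894, i.e. ~89% efficiency.
def parallel_efficiency(speedup, cores):
    return speedup / cores
```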
Acceleration of a Feature Selection Algorithm Using High Performance Computing
[Abstract]
Feature selection is a subfield of data analysis focused on reducing the dimensionality of datasets, so that subsequent analyses over them can be performed in affordable execution times while keeping the same results. Joint Mutual Information (JMI) is a widely used feature selection method that removes irrelevant and redundant characteristics. Nevertheless, it has a high computational complexity. In this work, we present a multithreaded MPI parallel implementation of JMI to accelerate its execution on distributed memory systems, reaching speedups of up to 198.60x when running on 256 cores, and allowing for the analysis of very large datasets that do not fit in the main memory of a single node.
This research was funded by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), and by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ED431C 2017/04).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2017/0
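The JMI criterion can be illustrated with a small sequential sketch (this is illustrative Python, not the MPI implementation presented in the work): a candidate feature X_k is scored by summing, over the already-selected features X_j, the joint mutual information I((X_k, X_j); Y) with the class label Y.

```python
# Illustrative sketch of the JMI scoring criterion (sequential, discrete data):
# score(X_k) = sum over selected X_j of I((X_k, X_j); Y).
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits; x may contain tuples (joint variables)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def jmi_score(candidate, selected, target):
    """Higher is better; pairs the candidate with each selected feature."""
    return sum(mutual_information(list(zip(candidate, s)), target)
               for s in selected)
```

The pairwise joint terms are independent of one another, which is what makes the criterion amenable to the workload distribution used in the parallel version.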
CUDA acceleration of MI-based feature selection methods
[Abstract]: Feature selection algorithms are necessary nowadays for machine learning as they are capable of removing irrelevant and redundant information to reduce the dimensionality of the data and improve the quality of subsequent analyses. The problem with current feature selection approaches is that they are computationally expensive when processing large datasets. This work presents parallel implementations for Nvidia GPUs of three highly used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI and DISR. The publicly available code includes not only CUDA implementations of the general methods, but also an adaptation of them to work with low-precision fixed point in order to further increase their performance on GPUs. The experimental evaluation was carried out on two modern Nvidia GPUs (Turing T4 and Ampere A100) with highly satisfactory results, achieving speedups of up to 283x when compared to state-of-the-art C implementations.
This work was supported by grants PID2019-104184RB-I00, PID2019-109238GB-C22, TED2021-130599A-I00 and PID2022-136435NB-I00, funded by MCIN/AEI/10.13039/501100011033 (TED2021 also funded by “NextGenerationEU”/PRTR and PID2022 by “ERDF A way of making Europe”, EU). Grant TSI-100925-2023-1, funded by the Ministry for Digital Transformation and Civil Service. FPU predoctoral grant of Bieito Beceiro, ref. FPU20/00997, funded by the Ministry of Science, Innovation and Universities. We gratefully thank the Galician Supercomputing Center (CESGA) for access to its supercomputing resources. Funding for open access charge: Universidade da Coruña/CISUG.
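The low-precision fixed-point idea mentioned in the abstract can be illustrated with a hypothetical sketch (the released CUDA code is not reproduced here, and the Q0.16 format below is an assumption for illustration): probabilities in [0, 1) are stored as 16-bit-fraction integers, trading a bounded quantization error for faster integer arithmetic on the GPU.

```python
# Hypothetical Q0.16 fixed-point quantization of probabilities,
# illustrating the low-precision trade-off: the round-trip error
# is bounded by one least-significant bit (1/65536).
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # 65536

def to_fixed(x):
    """Quantize a float in [0, 1) to a Q0.16 fixed-point integer."""
    return int(round(x * SCALE))

def from_fixed(q):
    """Recover the approximate float value from a Q0.16 integer."""
    return q / SCALE
```

Since MI sums many small probability-weighted terms, such bounded per-term errors stay small in practice, which is what makes low-precision variants attractive on GPUs.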