Centre for Evaluation in Education and Science (CEON/CEES)
Abstract
Introduction/purpose: Machine learning methods have become indispensable
for analyzing large-scale, complex data in today's data-driven
environments, with applications ranging from optimizing business
operations to advancing scientific research. Although such large datasets
offer considerable potential for insight and innovation, they also pose
significant challenges, for example in data quality and structure, which
call for effective data management strategies. Machine learning techniques
have become essential tools for identifying and mitigating these
challenges. A prominent, widely used dataset in this field is MNIST, a
large collection of handwritten digits that is frequently used for
classification and analysis tasks, as in the present study.
Methods: This study used the MNIST dataset to investigate several
statistical techniques, including Principal Component Analysis (PCA),
implemented in the Python programming language. In addition, linear and
non-linear Support Vector Machine (SVM) models were applied to the
classification problem, and their classification accuracy was assessed.
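A minimal sketch of the kind of pipeline described in Methods, assuming scikit-learn (the abstract names Python but no library). It is not the authors' code: scikit-learn's bundled 8x8 `load_digits` set stands in for MNIST's 28x28 (784-pixel) images so the example runs offline, and the helper `evaluate`, `n_components=40`, `C=1.0` and the 75/25 split are illustrative choices rather than values from the study.

```python
# Sketch: PCA for dimensionality reduction, followed by a linear or an RBF-kernel SVM.
# Assumption: scikit-learn's 8x8 `load_digits` stands in for MNIST so this runs offline;
# `evaluate`, n_components=40, C=1.0 and the split are hypothetical, illustrative settings.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(kernel: str, n_components: int = 40, seed: int = 0) -> float:
    """Return held-out accuracy of a scale -> PCA -> SVC pipeline with the given kernel."""
    X, y = load_digits(return_X_y=True)  # X: (1797, 64) pixel intensities in 0..16
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),  # centre and scale each pixel column
        PCA(n_components=n_components, random_state=seed),  # keep the top components
        SVC(kernel=kernel, C=1.0),  # "linear" or the non-linear "rbf" kernel
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # mean accuracy on the test split


if __name__ == "__main__":
    for kernel in ("linear", "rbf"):
        print(f"{kernel:>6} SVM accuracy: {evaluate(kernel):.3f}")
```

The kernel argument is the only difference between the linear and non-linear models here. Running on the actual MNIST data would require loading it separately, for example with `fetch_openml("mnist_784")`, which needs network access.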
Results: The results indicate that while PCA is effective for
dimensionality reduction, it may be less effective for visualization.
Moreover, both the linear and the non-linear SVM models classified the
dataset effectively.
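One way to see why PCA can compress the data well yet give a weak two-dimensional picture is to compare how much variance the first two components keep with how many components are needed to keep most of it. The sketch computes PCA directly from a singular value decomposition in NumPy; `load_digits` is again an assumed stand-in for MNIST, the function name `explained_variance_ratio` is hypothetical, and the 95% threshold is an arbitrary illustration.

```python
# Sketch: PCA via SVD, comparing a 2-D projection with the components needed for 95% variance.
# Assumption: scikit-learn's 8x8 `load_digits` stands in for MNIST; names are hypothetical.
import numpy as np
from sklearn.datasets import load_digits


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Return each principal component's share of total variance, in descending order."""
    Xc = X - X.mean(axis=0)                  # centre every pixel column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s**2                               # proportional to the component variances
    return var / var.sum()


ratios = explained_variance_ratio(load_digits().data.astype(float))
cumulative = np.cumsum(ratios)
two_d_share = cumulative[1]                            # variance kept by a 2-D scatter plot
n_for_95 = int(np.searchsorted(cumulative, 0.95) + 1)  # smallest k with >= 95% kept
print(f"2 components keep {two_d_share:.1%} of the variance")
print(f"{n_for_95} of {ratios.size} components keep 95% of the variance")
```

A low two-component share means a 2-D scatter plot discards most of the variation, so digit classes tend to overlap, while a few dozen components can still retain most of it for a downstream classifier.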
Conclusion: The findings indicate that SVM is an effective technique for
classification problems such as handwritten digit recognition.