With the explosive growth of digital technologies, image collections have grown rapidly in size. Supervised image classification has been widely applied in many domains to organize, search, and retrieve images. However, traditional feature extraction approaches yield poor classification accuracy. The Bag-of-Visual-Words (BoVW) model, inspired by the Bag-of-Words model used in document classification, represents images with local descriptors for image classification and performs well in some fields. This research provides empirical evidence that the BoVW model outperforms traditional feature extraction approaches for both binary and multi-class image classification. Furthermore, it shows that the size of the visual vocabulary chosen when building the BoVW model affects classification accuracy.
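
To make the BoVW idea concrete, the following is a minimal sketch rather than the pipeline used in this research. It assumes local descriptors have already been extracted as numeric tuples, swaps in a plain k-means for vocabulary construction, and uses hypothetical names throughout (`build_vocabulary`, `bovw_histogram`, `nearest_word`, `vocab_size`). Each image becomes a normalized histogram over visual words, so the vocabulary size directly sets the length of the feature vector passed to a classifier.

```python
import math
import random


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor."""
    distances = [math.dist(descriptor, word) for word in vocabulary]
    return distances.index(min(distances))


def build_vocabulary(descriptors, vocab_size, iterations=10, seed=0):
    """Cluster local descriptors with plain k-means (an assumption of this
    sketch); the resulting centroids act as the visual words."""
    rng = random.Random(seed)
    vocabulary = rng.sample(descriptors, vocab_size)
    for _ in range(iterations):
        clusters = [[] for _ in range(vocab_size)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                vocabulary[i] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return vocabulary


def bovw_histogram(image_descriptors, vocabulary):
    """Encode one image as a normalized histogram of visual-word counts."""
    counts = [0] * len(vocabulary)
    for d in image_descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts) or 1  # avoid division by zero for empty images
    return [c / total for c in counts]


# Toy 2-D "descriptors" for two hypothetical images; real local
# descriptors are typically much higher-dimensional.
image_a = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1)]
image_b = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.0), (0.1, 0.1)]

vocabulary = build_vocabulary(image_a + image_b, vocab_size=2)
hist_a = bovw_histogram(image_a, vocabulary)
hist_b = bovw_histogram(image_b, vocabulary)
```

The histograms `hist_a` and `hist_b` would then serve as fixed-length feature vectors for any supervised classifier. Changing `vocab_size` changes both the granularity of the visual words and the dimensionality of those vectors.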