
    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. The construction of a codebook is therefore an important step, usually carried out by cluster analysis. However, clustering retains regions of high density in a distribution, so the resulting codebook need not have discriminant properties. Codebook construction is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook that constructs a discriminant codebook in a one-pass design procedure, slightly outperforming more traditional approaches at drastically reduced computing times. In this review we survey several approaches proposed over the last decade, covering their feature detectors, descriptors, codebook construction schemes, choice of classifiers for recognising objects, and the datasets used to evaluate the proposed methods.
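    A minimal sketch of the histogram-mapping idea described above, assuming k-means as the clustering step and randomly generated stand-in descriptors (this is illustrative only, not the resource-allocating codebook proposed in the paper): local patch descriptors are clustered into codewords, and an image is then represented as a normalised histogram of codeword assignments that a standard classifier can consume.

```python
# Bag-of-visual-words sketch: build a codebook by clustering local
# descriptors, then map one image's descriptors to a fixed-length histogram.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder data: pretend training images yield 1000 local descriptors of
# dimension 128 (SIFT-like); a real system extracts these with a feature
# detector/descriptor.
train_descriptors = rng.normal(size=(1000, 128))
image_descriptors = rng.normal(size=(200, 128))

codebook_size = 64  # codebook size controls model complexity, as noted above
codebook = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
codebook.fit(train_descriptors)

# Histogram over codewords: count how often each codeword is the nearest
# cluster centre for the image's descriptors, then normalise.
assignments = codebook.predict(image_descriptors)
histogram = np.bincount(assignments, minlength=codebook_size).astype(float)
histogram /= histogram.sum()

# `histogram` is the fixed-length vector a standard classifier (e.g. an SVM)
# would be applied to.
print(histogram.shape)  # (64,)
```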

    Web Page Multiclass Classification

    As the internet age evolves, the volume of content hosted on the Web is rapidly expanding. With this ever-expanding content, accurately categorizing web pages is an ongoing challenge across many use cases. This paper proposes a variation of the text preprocessing pipeline in which noun phrase extraction is performed first, followed by lemmatization, contraction expansion, removal of special characters, removal of extra white space, lower-casing, and removal of stop words. The initial noun phrase extraction step aims to reduce the set of terms to those that best describe what the web pages are about, improving the categorization capability of the model. Separately, a text preprocessing approach using keyword extraction is evaluated. In addition to these text preprocessing techniques, feature reduction techniques are applied to optimize model performance. Several modeling techniques are examined using these two approaches and compared to a baseline model. The baseline model is a Support Vector Machine with a linear kernel, built on text preprocessing and feature reduction techniques that include neither noun phrase extraction nor keyword extraction and use stemming rather than lemmatization. The recommended SVM One-Versus-One model, based on noun phrase extraction and lemmatization during text preprocessing, shows an accuracy improvement of nearly 1% over the baseline model and a 5-fold reduction in misclassification of web pages into undesirable categories.
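    A hedged sketch of the classification stage described above, assuming a scikit-learn one-versus-one linear SVM over TF-IDF features and toy, already-preprocessed page text (the noun phrase extraction and lemmatization steps are assumed to have run upstream; the example pages and category names are hypothetical, not the paper's dataset):

```python
# Multiclass web page classification sketch: TF-IDF features feeding a
# linear SVM trained one-versus-one (one binary SVM per pair of classes).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy "preprocessed" pages: in practice these would be the noun phrases /
# lemmas retained by the preprocessing pipeline in the abstract.
pages = [
    "online banking account login security",
    "football match score league table",
    "casino betting jackpot odds",
    "recipe ingredients oven baking time",
]
labels = ["finance", "sports", "gambling", "cooking"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsOneClassifier(LinearSVC()),
)
model.fit(pages, labels)

# Likely predicts 'gambling' for this toy input given the term overlap.
print(model.predict(["poker tournament betting odds"]))
```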