2 research outputs found

    Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework

    Applications of matrices are found in most scientific fields, such as physics, computer graphics, and numerical analysis. The high applicability of matrix algorithms and representations makes them an important component in any parallel programming language; therefore, matrix frameworks are a continuous research effort in high performance computing. This work focuses on a generic matrix framework in the STAPL library. First, we extend the STAPL library by adding a sparse matrix container. Second, we implement SUMMA, the parallel matrix-multiplication algorithm, for fine-grained computations. Then, we implement parallel matrix-matrix algorithms for the sparse matrix container. Finally, we conduct experimental studies for each of the components we have implemented and discuss the findings. Experiments are conducted on a Cray XE6m cluster. Experimental studies consist of multiple matrix and data inputs that showcase and stress the matrix models implemented. We find that the sparse matrix container outperforms its dense counterpart on sparse inputs, and vice versa. Both containers and the matrix SUMMA implementation show scalability up to 512 cores.

    Classification of HTML Documents

    Text Classification is the task of mapping a document into one or more classes based on the presence or absence of words (or features) in the document. It is being studied intensively, and different classification techniques and algorithms have been developed. This thesis focuses on the classification of online documents, which has become more critical with the development of the World Wide Web. The WWW vastly increases the availability of online documents in digital format and has highlighted the need to classify them. From this background, we have noted the emergence of "automatic Web Classification". This field mainly concentrates on classifying HTML-like documents into classes or categories, not only by using the methods inherited from the traditional Text Classification process but also by utilizing the extra information provided only by Web pages. Our work is based on the fact that Web documents contain not only ordinary features (words) but also extra information, such as meta-data and hyperlinks, that can be used to benefit the classification process. The aim of this research is to study various ways of using the extra information, in particular the hyperlink information provided by HTML documents (Web pages). The merit of the approach developed in this thesis is its simplicity compared with existing approaches. We present different approaches to using hyperlink information to improve the effectiveness of web classification. Unlike other work in this area, we use only the mappings between linked documents and their own class or classes. In this case, we only need to add a few features, called linked-class features, to the datasets and then apply classifiers to them for classification. In the numerical experiments we adopted two well-known Text Classification algorithms, Support Vector Machines and BoosTexter. The results obtained show that classification accuracy can be improved by using mixtures of ordinary and linked-class features. Moreover, out-links usually work better than in-links in classification. We also analyse and discuss the reasons behind this improvement.
    Master of Computin