7,741 research outputs found

    Searching for Ground Truth: a stepping stone in automating genre classification

    This paper examines genre classification of documents and its role in enabling the effective automated management of digital documents by digital libraries and other repositories. We have previously presented genre classification as a valuable step toward achieving automated extraction of descriptive metadata for digital material. Here, we present results from experiments using human labellers, conducted to assist in genre characterisation and the prediction of obstacles which need to be overcome by an automated system, and to contribute to the process of creating a solid testbed corpus for extending automated genre classification and testing metadata extraction tools across genres. We also describe the performance of two classifiers based on image and stylistic modeling features in labelling the data resulting from the agreement of three human labellers across fifteen genre classes.

    Feature Type Analysis in Automated Genre Classification

    In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.

    Building a document genre corpus: a profile of the KRYS I corpus

    This paper describes the KRYS I corpus, consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised and there is no genre classification schema that has been consolidated to have applicable value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on the information gathering and seeking behaviour and the role of genre in these activities, as well as a way forward for creating a better corpus for testing automated genre classification tasks and the application of these tasks to other domains.

    VALIDATING EFFECTIVE RESUME BASED ON EMPLOYER’S INTEREST WITH RECOMMENDATION SYSTEM

    In the current technological world, the corporate recruitment process has evolved to a great extent. Both candidates and recruiters prefer resumes to be submitted as e-documents. Validating those resumes manually is neither flexible, effective, nor time-saving, and the team requires more manpower to scrutinize the candidates' resumes. The aim of our work is to help recruiters find the most appropriate resumes that match all their requirements. The system allows the recruiter to post his/her requirements as a query, and it recommends the relevant resumes by calculating the similarity between the query and each resume using the Vector Space Model (VSM).
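
The similarity step named in this abstract can be illustrated with a minimal sketch. The abstract does not say how terms are weighted or tokenized, so the sketch assumes raw term-frequency vectors and cosine similarity, a common choice for a Vector Space Model; the function names `term_vector`, `cosine_similarity`, and `rank_resumes` are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map a text to a bag-of-words term-frequency vector (assumed weighting)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def rank_resumes(query, resumes):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    q = term_vector(query)
    scored = [(name, cosine_similarity(q, term_vector(text)))
              for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_resumes("python developer", {...})` with a dictionary of resume texts yields the resumes ordered by score; a real system would likely add TF-IDF weighting and stop-word removal, which the abstract neither confirms nor rules out.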

    Building a Document Genre Corpus: a Profile of the KRYS I Corpus

    This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised and there is no genre classification schema that has been consolidated to have applicable value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on the information gathering and seeking behaviour and the role of genre in these activities, as well as a way forward for creating a better corpus for testing automated genre classification tasks and the application of these tasks to other domains.

    A comparison of forensic evidence recovery techniques for a windows mobile smart phone

    Acquisition, decoding and presentation of information from mobile devices is complex and challenging. Device memory is usually integrated into the device, making isolation prior to recovery difficult. In addition, manufacturers have adopted a variety of file systems and formats complicating decoding and presentation. A variety of tools and methods have been developed (both commercially and in the open source community) to assist mobile forensics investigators. However, it is unclear to what extent these tools can present a complete view of the information held on a mobile device, or the extent the results produced by different tools are consistent. This paper investigates what information held on a Windows Mobile smart phone can be recovered using several different approaches to acquisition and decoding. The paper demonstrates that no one technique recovers all information of potential forensic interest from a Windows Mobile device; and that in some cases the information recovered is conflicting.