31 research outputs found

    A coprocessor design for the architectural support of non-numeric operations

    Computer Science is concerned with the electronic manipulation of information. Continually increasing amounts of computer time are being expended on information that is not numeric. This is reflected in part by modern computing requirements such as the block moves associated with context switching and virtual memory management, peripheral device communication, compilers, editors, word processors, databases, and text retrieval. This dissertation examines the traditional support of non-numeric information from a software, firmware, and hardware perspective and presents a coprocessor design to improve the performance of a set of non-numeric operations. Simple micro-coding of operations can provide a degree of performance improvement through parallel execution of instructions and faster control store access. New special-purpose parallel hardware algorithms can yield complexity improvements. This dissertation presents a parallel hardware regular expression searching algorithm which requires linear time and quadratic space, compared to software uniprocessor algorithms which require exponential time and space. A very large scale integration (VLSI) implementation of a version of this algorithm was designed, fabricated, and tested. The hardware searching algorithm is then combined with other special-purpose hardware to implement a set of operations. Simulation is then used to quantify the performance improvement of the operations when compared to software solutions. A coprocessor approach allows the optional addition of hardware to accelerate a set of operations. This is appropriate from a complex instruction set computer (CISC) perspective, since hardware acceleration is being utilized. It is also appropriate from a reduced instruction set computer (RISC) perspective, since the operations are distributed away from the central processing unit (CPU).
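    The complexity contrast is easiest to see in software: the parallel idea is to keep every NFA state of the pattern active simultaneously and update all of them on each input character, so search time is linear in the text length while state storage grows with the pattern size. Below is a minimal Python sketch of that parallel-state scheme for a toy pattern language (literals, '.', and a trailing '*'); it illustrates the principle only and is not the thesis's hardware algorithm.

```python
# Parallel-state regex search: all NFA states advance at once per input
# character, giving time linear in the text length. Toy pattern language
# (literals, '.', trailing '*') chosen for illustration only.

def search(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches anywhere in `text`."""
    # Tokenize: state i means "i pattern elements consumed".
    tokens, i = [], 0
    while i < len(pattern):
        if i + 1 < len(pattern) and pattern[i + 1] == "*":
            tokens.append((pattern[i], True))   # starred element
            i += 2
        else:
            tokens.append((pattern[i], False))  # single element
            i += 1
    n = len(tokens)  # accepting state

    def close(states):
        # Epsilon closure: a starred element may be skipped entirely.
        out, changed = set(states), True
        while changed:
            changed = False
            for s in list(out):
                if s < n and tokens[s][1] and s + 1 not in out:
                    out.add(s + 1)
                    changed = True
        return out

    active = close({0})
    if n in active:
        return True
    for ch in text:
        nxt = set()
        for s in active:
            if s < n:
                sym, star = tokens[s]
                if sym == "." or sym == ch:
                    nxt.add(s if star else s + 1)  # stay on star, else advance
        active = close(nxt) | close({0})  # unanchored: a match may start anywhere
        if n in active:
            return True
    return False

print(search("ab*c", "xxabbbcyy"))  # True
print(search("a.c",  "xyz"))        # False
```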

    Stylistic structures: a computational approach to text classification

    The problem of authorship attribution has received attention both in the academic world (e.g. did Shakespeare or Marlowe write Edward III?) and outside (e.g. is this confession really the words of the accused, or was it made up by someone else?). Previous studies by statisticians and literary scholars have sought "verbal habits" that characterize particular authors consistently. By and large, this has meant looking for distinctive rates of usage of specific marker words -- as in the classic study by Mosteller and Wallace of the Federalist Papers. The present study is based on the premise that authorship attribution is just one type of text classification and that advances in this area can be made by applying and adapting techniques from the field of machine learning. Five different trainable text-classification systems are described, which differ from current stylometric practice in a number of ways, in particular by using a wider variety of marker patterns than is customary and by seeking such markers automatically, without being told what to look for. A comparison of the strengths and weaknesses of these systems, when tested on a representative range of text-classification problems, confirms the importance of paying more attention than usual to alternative methods of representing distinctive differences between types of text. The thesis concludes with suggestions on how to make further progress towards the goal of a fully automatic, trainable text-classification system.
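    To make the marker-word idea concrete, here is a minimal Python sketch of a trainable two-class setup: it selects the words whose usage rates best separate two training corpora, then classifies new text by comparing its rates against each class. The selection and scoring rules are generic illustrations, not any of the five systems described in the thesis.

```python
# Marker-word authorship attribution, minimal sketch: find words whose
# relative frequencies best separate two training corpora, then assign
# new text to whichever author's rates it sits closer to.

from collections import Counter

def rates(text: str) -> Counter:
    """Relative frequency of each word in the text."""
    words = text.lower().split()
    total = max(len(words), 1)
    return Counter({w: c / total for w, c in Counter(words).items()})

def train(corpus_a: str, corpus_b: str, k: int = 5):
    ra, rb = rates(corpus_a), rates(corpus_b)
    vocab = set(ra) | set(rb)
    # Select the k words with the largest rate gap as candidate markers.
    markers = sorted(vocab, key=lambda w: abs(ra[w] - rb[w]), reverse=True)[:k]
    return markers, ra, rb

def classify(text: str, markers, ra, rb) -> str:
    rt = rates(text)
    # Total distance from each author's marker rates; smaller wins.
    da = sum(abs(rt[w] - ra[w]) for w in markers)
    db = sum(abs(rt[w] - rb[w]) for w in markers)
    return "A" if da < db else "B"

a = "upon the whole it is upon this that we rely upon"
b = "while we consider while the matter while it stands"
markers, ra, rb = train(a, b)
print(markers)                                              # e.g. ['while', 'upon', ...]
print(classify("upon my word, upon it", markers, ra, rb))   # 'A'
```

    Real systems of this kind would use far larger marker sets and corpora; the point of the sketch is only that the markers are found automatically from the training texts rather than supplied by hand.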

    Colloquium capita datastructuren [Colloquium on selected topics in data structures]


    Development of a Web-Based Service to Transcribe Between Multiple Orthographies of the Iu Mien Language

    Thesis (M.S.)--Indiana University South Bend, 2011. The goal of this study was to explore the use of machine learning techniques in the development of a web-based application that transcribes between multiple orthographies of the same language. To this end, source text files used in the publishing of the Iu Mien Bible translation in four scripts were merged into a single textbase that served as a text corpus for this study. All syllables in the corpus were combined into a list of parallel renderings, which were subjected to ID3 and backpropagation neural networks in an attempt to achieve machine learning of transcription between the different Iu Mien orthographies. The most effective set of neural net transcription rules was captured and incorporated into a web-based service where visitors could submit text in one writing system and receive a webpage containing the corresponding text rendered in the other writing systems of this language. Transcription accuracy in excess of 90% was achieved between two Roman scripts or between two non-Roman scripts; transcriptions between a Roman script and a non-Roman script yielded output that was only 50% correct. This system is still being tested and improved by linguists and volunteers from various organizations associated with the target community within Thailand, Laos, Vietnam, and the USA. This study demonstrates the potential of this approach for developing written materials in languages with multiple scripts. It also provides useful insights into how this technology might be improved.
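    As a concrete illustration of the parallel-syllable setup, the Python sketch below pairs two renderings of the same syllables into a transcription table and converts new text syllable by syllable, with a longest-prefix fallback for unseen syllables. The lookup-with-fallback stands in for the ID3 and backpropagation rule learners actually used in the thesis, and the sample syllables are invented for illustration.

```python
# Parallel-syllable transcription, minimal sketch: align two renderings
# of the same corpus into a mapping, then transcribe new text syllable
# by syllable. Dictionary lookup with prefix fallback substitutes for
# the learned ID3 / neural-net rules; the sample data is invented.

def build_pairs(src_syllables, tgt_syllables):
    """Zip two parallel renderings of the same corpus into a mapping."""
    assert len(src_syllables) == len(tgt_syllables)
    table = {}
    for s, t in zip(src_syllables, tgt_syllables):
        table.setdefault(s, t)  # keep the first rendering seen
    return table

def transcribe(text, table):
    out = []
    for syl in text.split():
        # Exact match first; otherwise fall back to the longest known
        # prefix, mirroring how a learned rule set generalizes partially.
        if syl in table:
            out.append(table[syl])
            continue
        for cut in range(len(syl), 0, -1):
            if syl[:cut] in table:
                out.append(table[syl[:cut]] + syl[cut:])
                break
        else:
            out.append(syl)  # no rule applies: pass through unchanged
    return " ".join(out)

# Invented parallel renderings of the same three syllables.
roman_old = ["mien", "juv", "nzung"]
roman_new = ["mienh", "juov", "nzungh"]
table = build_pairs(roman_old, roman_new)
print(transcribe("mien juv nzung", table))  # -> "mienh juov nzungh"
print(transcribe("mienq", table))           # prefix fallback: "mienhq"
```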