    Data Mining Revision Controlled Document History Metadata for Automatic Classification

    Version-controlled documents provide a complete history of the changes made to them, recording what was changed, who made each change, and other metadata. Using cluster analysis on several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist automatic page classification software. Applying this cluster analysis to two sample data sets, we found no conclusive evidence that such patterns exist. However, the software we developed provides a foundation for further research into such patterns, using additional types of data manipulation and more refined clustering algorithms.
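
    The abstract does not specify its features or clustering algorithm, so the following is only a minimal sketch of the general idea, assuming each page is summarized by numeric features taken solely from its revision metadata (no article text, hence language-independent) and clustered with plain k-means (Lloyd's algorithm). The feature choice (log revision count, log distinct editors, share of minor edits, log mean size change), the function names `features` and `kmeans`, and the toy data are all hypothetical.

```python
"""Sketch: k-means over language-independent revision-metadata features.

Hypothetical assumptions: each revision is a dict with "user", "minor" and
"size_delta" keys; the feature set and the toy data are illustrative only.
"""
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def features(revisions: Sequence[Dict]) -> Vector:
    """Summarize one page's revision list as a numeric feature vector."""
    n = len(revisions)
    editors = {r["user"] for r in revisions}
    minor = sum(1 for r in revisions if r["minor"])
    mean_delta = sum(abs(r["size_delta"]) for r in revisions) / n
    # log1p compresses heavy-tailed counts so no single feature dominates.
    return (math.log1p(n), math.log1p(len(editors)), minor / n, math.log1p(mean_delta))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid in place
        if new == centroids:
            break  # converged: assignments can no longer change
        centroids = new
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: two heavily edited pages and two rarely edited stubs.
    busy = [{"user": f"u{i}", "minor": i % 3 == 0, "size_delta": 800} for i in range(40)]
    stub = [{"user": "bot", "minor": True, "size_delta": 12}] * 2
    pages = {"Page A": busy, "Page B": busy[:30], "Stub C": stub, "Stub D": stub[:1]}
    labels = kmeans([features(revs) for revs in pages.values()], k=2)
    for title, label in zip(pages, labels):
        print(title, "-> cluster", label)
```

    Testing the study's hypothesis would then mean checking whether the resulting clusters line up with known page categories; on real Wikipedia data the abstract reports that no conclusive pattern emerged.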

    Towards the automatic evaluation of stylistic quality of natural texts: constructing a special-purpose corpus of stylistic edits from the Wikipedia revision history

    This thesis proposes an approach to automatically evaluating the stylistic quality of natural texts using data-driven methods of Natural Language Processing. The advantages of data-driven methods and their dependence on the size of the training data are discussed, as are the advantages of using Wikipedia as a source for textual data mining. The method crucially relies on a program that quickly and automatically extracts sentences edited by users from the Wikipedia revision history. The resulting edits have been compiled into a large-scale corpus of examples of stylistic editing. The modular structure of the extraction program is described in full and its performance is analyzed. Furthermore, the need to separate stylistic edits from factual ones is discussed, and a number of machine learning classification algorithms for this task are proposed and tested. The program developed in this project processed approximately 10% of the entire Russian Wikipedia revision history (200 gigabytes of textual data) in one month, extracting more than two million user edits. The best algorithm for classifying edits as factual or stylistic achieved 86.2% cross-validation accuracy, comparable to the state-of-the-art performance of similar models reported in published papers.

    Master's thesis in Computational Linguistics and Language Technology (Master i Datalingvistikk og språkteknologi), MAHF-DASP, DASP35
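
    The abstract describes extracting user-edited sentences from consecutive revisions but not how the extraction program works internally. The following is only a minimal sketch of that step under stated assumptions, not the thesis's program: revisions are plain text, sentences are split with a naive regex, sentences are aligned with Python's standard `difflib.SequenceMatcher`, and a replaced sentence counts as an edit only if it stays similar to its predecessor (the hypothetical `min_ratio` threshold), which filters out wholesale rewrites. The function names are illustrative.

```python
"""Sketch: pull sentence-level edits out of two consecutive revisions.

Assumptions (not the thesis's actual program): plain-text revisions, a naive
regex sentence splitter, and a hypothetical similarity threshold min_ratio.
"""
import difflib
import re
from typing import List, Tuple

# Split after ., ! or ? followed by whitespace (naive; ignores abbreviations).
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences of a revision's text."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def sentence_edits(old: str, new: str, min_ratio: float = 0.6) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for sentences modified in place."""
    a, b = split_sentences(old), split_sentences(new)
    pairs = []
    # Align the two revisions at the sentence level.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # skip unchanged, purely inserted, or purely deleted sentences
        # Pair replaced sentences one-to-one; extras in the longer block stay unpaired.
        for before, after in zip(a[i1:i2], b[j1:j2]):
            ratio = difflib.SequenceMatcher(a=before, b=after, autojunk=False).ratio()
            if ratio >= min_ratio:
                pairs.append((before, after))
    return pairs


if __name__ == "__main__":
    old = "The cat sat on the mat. It was happy. Dogs bark."
    new = "The cat sat on a mat. It was happy. Birds sing loudly today in the park."
    for before, after in sentence_edits(old, new):
        print(repr(before), "->", repr(after))
```

    Pairs extracted this way still mix stylistic and factual changes, which is why the abstract adds a separate classification step to tell the two apart.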

    The Future of Information Sciences: INFuture2009: Digital Resources and Knowledge Sharing

    Autopoietic-extended architecture: can buildings think?

    To incorporate bioremedial functions into the performance of buildings, and to balance generative architecture's dominant focus on computational programming and digital fabrication, this thesis first hybridizes theories of autopoiesis with extended cognition in order to research biological domains that include synthetic biology and biocomputation. Under the rubric of living technology, I survey multidisciplinary fields to gather perspective for student design of bioremedial and/or metabolic components in generative architecture, where "generative" denotes not only the use of computation but also biochemical, biomechanical, and metabolic functions. I trace computation and digital simulation back to Alan Turing's early-1950s morphogenetic drawings, reaction-diffusion algorithms, and pioneering artificial intelligence (AI) in order to establish generative architecture's point of origin. I ask provocatively, "Can buildings think?", echoing Turing's own question, "Can machines think?" Thereafter, I anticipate not only future bioperformative materials but also theories capable of underpinning strains of metabolic intelligences made possible via AI, synthetic biology, and living technology. I do not imply that metabolic architectural intelligence will be like human cognition. I suggest, rather, that new research and pedagogies involving the intelligence of bacteria, plants, synthetic biology, and algorithms define the approaches generative architecture should take in order to source new forms of autonomous life deployable as corrective environmental interfaces. I call the research protocol autopoietic-extended design and theorize it as an operating system (OS), a research methodology, and an app schematic for design studios and distance learning that makes use of in-field, e-, and m-learning technologies. A quest of this complexity requires scaffolding to coordinate theory-driven teaching with practice-oriented learning.
    Accordingly, I fuse Maturana and Varela's biological autopoiesis and its definitions of minimal biological life with Andy Clark's hypothesis of extended cognition and its cognition-to-environment linkages. I articulate a generative design strategy and student research method, explained through architectural history as interpreted from Louis Sullivan's 1924 pedagogical drawing system, Le Corbusier's Modernist pronouncements, and Greg Lynn's Animate Form. Autopoietic-extended design thus organizes thinking about the generation of design ideas prior to computational production and fabrication, necessitating a fresh relationship between nature/science/technology and design cognition. Systematizing such a program requires avoiding simple binaries (mind/body, mind/nature) and situating tool making, technology, and architecture within the realm of nature. Hence, I argue, in relation to extended phenotypes, plant neurobiology, and recent genetic research, that autopoietic-extended design advances design protocols grounded in morphology, anatomy, cognition, biology, and technology in order to appropriate metabolic and intelligent properties for sensory/response duty in buildings. At the m-learning level, smartphones, social media, and design apps source data from nature, allowing students to mediate on-site research and extending 3D pedagogical reach into new university design programs. I intend to create a dialectical investigation of animal/human architecture and computational history, augmented by theory relevant to current algorithmic design and fablab production. The autopoietic-extended design dialectic sets out ways to articulate oppositions and differences outside the Cartesian either/or philosophy in order to prototype metabolic architecture, while dialectically maintaining: buildings can think.