

Data Mining Revision Controlled Document History Metadata for Automatic Classification

Author: Maass Dustin
Publication venue: UWM Digital Commons
Publication date: 01/12/2013

Version-controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist in automatic page classification software. Using two sample data sets and applying this cluster analysis, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for further types of data manipulation and refined clustering algorithms to be used in further research into finding such patterns.

University of Wisconsin-Milwaukee

Towards the automatic evaluation of stylistic quality of natural texts: constructing a special-purpose corpus of stylistic edits from the Wikipedia revision history

Author: Kotlyarov Alexandr
Publication venue: The University of Bergen
Publication date: 01/01/2016

This thesis proposes an approach to automatic evaluation of the stylistic quality of natural texts through data-driven methods of Natural Language Processing. Advantages of data-driven methods and their dependency on the size of training data are discussed. The advantages of using Wikipedia as a source for textual data mining are also presented. The method in this project crucially involves a program for quick automatic extraction of sentences edited by users from the Wikipedia revision history. The resulting edits have been compiled in a large-scale corpus of examples of stylistic editing. The complete modular structure of the extraction program is described and its performance is analyzed. Furthermore, the need to separate stylistic edits from factual ones is discussed, and a number of Machine Learning classification algorithms for this task are proposed and tested. The program developed in this project was able to process approximately 10% of the whole Russian Wikipedia revision history (200 gigabytes of textual data) in one month, resulting in the extraction of more than two million user edits. The best algorithm for the classification of edits into factual and stylistic ones achieved 86.2% cross-validation accuracy, which is comparable with state-of-the-art performance of similar models described in published papers.

Degree: Master i Datalingvistikk og språkteknologi (Master's in Computational Linguistics and Language Technology)

University of Bergen

NORA - Norwegian Open Research Archives

The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology

Author: Mayor Charlie

Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology. Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented. Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology. Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. 
Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve. Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations to the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled ‘open ontologies’ for gene products that can work alongside more traditional, ‘top-down’ developed vocabularies.

City Research Online

The Future of Information Sciences : INFuture2009 : Digital Resources and Knowledge Sharing

Publication venue: Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb
Publication date: 01/11/2009

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2023), 14–15 September 2023, University of Mannheim, Germany

Publication venue: Institut für Deutsche Sprache (IDS)
Publication date: 01/01/2023

MAnnheim DOCument Server

Autopoietic-extended architecture: can buildings think?

Author: Dollens Dennis Lindsey
Publication venue: The University of Edinburgh
Publication date: 27/06/2015

To incorporate bioremedial functions into the performance of buildings and to balance generative architecture's dominant focus on computational programming and digital fabrication, this thesis first hybridizes theories of autopoiesis into extended cognition in order to research biological domains that include synthetic biology and biocomputation. Under the rubric of living technology I survey multidisciplinary fields to gather perspective for student design of bioremedial and/or metabolic components in generative architecture where generative not only denotes the use of computation but also includes biochemical, biomechanical, and metabolic functions. I trace computation and digital simulations back to Alan Turing's early 1950s Morphogenetic drawings, reaction-diffusion algorithms, and pioneering artificial intelligence (AI) in order to establish generative architecture's point of origin. I ask provocatively: Can buildings think? as a question echoing Turing's own "Can machines think?" Thereafter, I anticipate not only future bioperformative materials but also theories capable of underpinning strains of metabolic intelligences made possible via AI, synthetic biology, and living technology. I do not imply that metabolic architectural intelligence will be like human cognition. I suggest, rather, that new research and pedagogies involving the intelligence of bacteria, plants, synthetic biology, and algorithms define approaches that generative architecture should take in order to source new forms of autonomous life that will be deployable as corrective environmental interfaces. I call the research protocol autopoietic-extended design, theorizing it as an operating system (OS), a research methodology, and an app schematic for design studios and distance learning that makes use of in-field, e-, and m-learning technologies. A quest of this complexity requires scaffolding for coordinating theory-driven teaching with practice-oriented learning. 
Accordingly, I fuse Maturana and Varela's biological autopoiesis and its definitions of minimal biological life with Andy Clark's hypothesis of extended cognition and its cognition-to-environment linkages. I articulate a generative design strategy and student research method explained via architectural history interpreted from Louis Sullivan's 1924 pedagogical drawing system, Le Corbusier's Modernist pronouncements, and Greg Lynn's Animate Form. Thus, autopoietic-extended design organizes thinking about the generation of ideas for design prior to computational production and fabrication, necessitating a fresh relationship between nature/science/technology and design cognition. To systematize such a program requires the avoidance of simple binaries (mind/body, mind/nature) as well as the stationing of tool making, technology, and architecture within the realm of nature. Hence, I argue, in relation to extended phenotypes, plant-neurobiology, and recent genetic research: Consequently, autopoietic-extended design advances design protocols grounded in morphology, anatomy, cognition, biology, and technology in order to appropriate metabolic and intelligent properties for sensory/response duty in buildings. At m-learning levels smartphones, social media, and design apps source data from nature for students to mediate on-site research by extending 3D pedagogical reach into new university design programs. I intend the creation of a dialectical investigation of animal/human architecture and computational history augmented by theory relevant to current algorithmic design and fablab production. The autopoietic-extended design dialectic sets out ways to articulate opposition/differences outside the Cartesian either/or philosophy in order to prototype metabolic architecture, while dialectically maintaining: Buildings can think.

Edinburgh Research Archive