    Network theory and CAD collections

    Graph and network theory have become commonplace in modern life, so widespread in fact that most people not only understand the basics of what a network is but use networks adeptly every day. This was not the case until recently, however, and the rapid growth and uptake of network technology has attracted the interest of many scientists and researchers. A science of networks has emerged, showing how networks can connect molecules and particles, computers and web pages, as well as people. Despite proving effective in many areas, network theory has yet to be applied to mechanical engineering design. This work draws on advances in network science and explores how they can be applied to Computer Aided Design (CAD) data, widely considered the most valuable design data in mechanical engineering; two settings in which large CAD collections are found are educational institutions and industry. The work begins by exploring five novel networks built from CAD collections of different sizes, assessing their metrics and development. Collections from educational and industrial settings are then explored in depth, and novel methods and visualisations are presented. The results show that network science provides valuable analysis of CAD collections, and two key findings are presented: network metrics and visualisations are effective at highlighting plagiarism in collections of students' CAD submissions, and, when applied to real-world company data, network theory provides unique metrics for analysing and characterising collections of CAD and associated data.
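
    As a hedged illustration of the kind of analysis described (not the thesis's actual network construction): if each student submission is modelled as the set of CAD parts it contains, submissions become nodes of a similarity network whose dense, high-weight neighbourhoods flag candidates for plagiarism inspection. The part names and the similarity threshold below are assumptions made for the sketch.

```python
# Illustrative sketch: a similarity network over a toy CAD collection.
# Node/edge semantics (submissions as nodes, Jaccard-similarity edges)
# are assumptions, not the thesis's exact method.
import itertools
import networkx as nx

# Each submission is modelled as the set of part names it contains.
submissions = {
    "student_a": {"bracket", "bolt_m6", "plate", "gear_12t"},
    "student_b": {"bracket", "bolt_m6", "plate", "gear_12t"},  # identical copy
    "student_c": {"housing", "shaft", "bearing_608"},
}

G = nx.Graph()
G.add_nodes_from(submissions)

# Connect submissions whose part sets overlap strongly (Jaccard similarity).
for (a, pa), (b, pb) in itertools.combinations(submissions.items(), 2):
    jaccard = len(pa & pb) / len(pa | pb)
    if jaccard > 0.5:
        G.add_edge(a, b, weight=jaccard)

# Standard network metrics; tight clusters of high-weight edges are
# candidates for closer plagiarism inspection.
print(nx.degree_centrality(G))
print(list(nx.connected_components(G)))
```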

    Statistical Methods in Metabolomics

    Metabolomics lies at the fulcrum of the systems biology 'omics'. Metabolic profiling offers researchers new insight into genetic and environmental interactions, responses to pathophysiological stimuli, and novel biomarker discovery. Metabolomics lacks the simplicity of a single data-capturing technique; instead, increasingly sophisticated multivariate statistical techniques are required to tease useful metabolic features out of various complex datasets. In this work, two major metabolomics methods are examined: Nuclear Magnetic Resonance (NMR) Spectroscopy and Liquid Chromatography-Mass Spectrometry (LC-MS). MetAssimulo, a 1H-NMR metabolic-profile simulator, was developed in part by this author and is described in Chapter 2. Peak positional variation is a phenomenon in NMR spectra that complicates metabolomic analysis, so Chapter 3 focuses on modelling the effect of pH on peak position. Analysis of LC-MS data is somewhat more complex given its 2-D structure, so I review existing pre-processing and feature detection techniques in Chapter 4 and then attempt to tackle the issue from a Bayesian viewpoint. A Bayesian Partition Model is developed to distinguish chromatographic peaks representing useful features from chemical and instrumental interference and noise. Another of the LC-MS pre-processing problems, data binning, is also explored as part of H-MS: a pre-processing algorithm incorporating wavelet smoothing and novel Gaussian and Exponentially Modified Gaussian peak detection. The performance of H-MS is compared against two existing pre-processing packages: apLC-MS and XCMS.
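
    The abstract names Exponentially Modified Gaussian (EMG) peak detection as part of H-MS but gives no formula. Below is a minimal sketch of the standard chromatographic EMG peak shape and a curve fit, assuming the common literature parametrisation (height h, Gaussian centre mu, width sigma, exponential tail tau); it is an illustration, not H-MS's actual implementation.

```python
# Sketch of the standard Exponentially Modified Gaussian (EMG) peak shape
# commonly used for tailing chromatographic peaks. The parametrisation is
# the usual literature form, assumed here; H-MS's own model may differ.
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfc

def emg(t, h, mu, sigma, tau):
    # Gaussian of height h, centre mu, width sigma, convolved with an
    # exponential decay of time constant tau.
    z = (sigma / tau - (t - mu) / sigma) / np.sqrt(2.0)
    return (h * sigma / tau) * np.sqrt(np.pi / 2.0) * \
        np.exp(0.5 * (sigma / tau) ** 2 - (t - mu) / tau) * erfc(z)

# Fit the model to a noisy synthetic peak.
t = np.linspace(0.0, 60.0, 300)
observed = emg(t, 1.0, 20.0, 1.5, 4.0) + np.random.normal(0.0, 0.01, t.size)
popt, _ = curve_fit(emg, t, observed, p0=[1.0, 18.0, 1.0, 3.0])
print(dict(zip(["h", "mu", "sigma", "tau"], popt)))
```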

    Proceedings of the 21st Colloquium on Programming Languages and Foundations of Programming (KPS 2021)

    The 21st Colloquium on Programming Languages and Foundations of Programming (KPS 2021) continues a traditional series of workshops founded in 1980 by the research groups of Professors Friedrich L. Bauer (TU München), Klaus Indermark (RWTH Aachen), and Hans Langmaack (CAU Kiel). The event is an open forum for all interested German-speaking researchers for the informal exchange of new ideas and results from the research areas of programming language design and implementation as well as the foundations and methodology of programming. These proceedings contain the scientific contributions presented at the 21st colloquium of this series, which took place from 27 to 29 September 2021 in Kiel and was organised by the Programming Languages and Compiler Construction group of Christian-Albrechts-Universität zu Kiel.

    A modular architecture for systematic text categorisation

    This work examines and attempts to overcome issues caused by the lack of formal standardisation in defining text categorisation techniques and in detailing how they might appropriately be integrated with each other. Despite text categorisation's long history, the concept of automation is relatively new, coinciding with the evolution of computing technology and the subsequent increase in the quantity and availability of electronic textual data. Nevertheless, insufficient descriptions of the diverse algorithms in the literature have led to an acknowledged ambiguity when trying to replicate methods accurately, which has made reliable comparative evaluations impossible. Existing interpretations of general data mining and text categorisation methodologies are analysed in the first half of the thesis, and common elements are extracted to create a distinct set of significant stages. Their possible interactions are logically determined, and a unique universal architecture is generated that encapsulates all complexities and highlights the critical components. A variety of text-related algorithms are also comprehensively surveyed and grouped according to the stage they belong to, in order to demonstrate how they can be mapped. The second part reviews several open-source data mining applications, placing an emphasis on their ability to handle the proposed architecture, their potential for expansion, and their text processing capabilities. Finding these inflexible and too elaborate to be readily adapted, designs for a novel framework are introduced that focus on rapid prototyping through lightweight customisations and reusable atomic components. As a consequence of the inadequacies of existing options, a rudimentary implementation is realised along with a selection of text categorisation modules. Finally, a series of experiments is conducted that validates the feasibility of the outlined methodology and the importance of its composition, whilst also establishing the practicality of the framework for research purposes. The simplicity of the experiments and the results gathered clearly indicate the potential benefits to be gained when a formalised approach is utilised.
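
    As a purely illustrative sketch of the stage-based composition the thesis argues for, here is a text categorisation workflow built from atomic, swappable stage components using scikit-learn, which is not the thesis's own framework; the stages, data, and labels are invented for the example.

```python
# Illustrative sketch only: composing a text categorisation workflow from
# small, reusable stage components. Swapping the feature extractor or
# classifier leaves the rest of the pipeline untouched.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("features", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("classifier", MultinomialNB()),
])

train_docs = ["the match ended in a draw", "parliament passed the bill"]
train_labels = ["sport", "politics"]
pipeline.fit(train_docs, train_labels)
print(pipeline.predict(["a new bill on stadium funding"]))
```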

    Data quality issues in electronic health records for large-scale databases

    Data Quality (DQ) in Electronic Health Records (EHRs) is one of the core factors that play a decisive role in improving the quality of healthcare services. DQ issues in EHRs motivate the introduction of an adaptive framework for interoperability and standards in Large-Scale Database (LSDB) management systems. Traditional approaches struggle to satisfy consumers' needs for large-scale data communication, as data is often not captured directly into the Database Management System (DBMS) quickly enough to enable its subsequent uses. Moreover, such large datasets hold a wealth of valuable information across all fields of the DBMS. EHR technology provides portfolio management systems that allow HealthCare Organisations (HCOs) to deliver a higher quality of care to their patients than is possible with paper-based records. EHRs are in high demand among HCOs for running their daily services, as huge new datasets arrive every day. Efficient EHR systems reduce data redundancy and system application failure and increase the possibility of drawing all necessary reports. However, one of the main challenges in developing efficient EHR systems is the inherent difficulty of coherently managing data from diverse heterogeneous sources: it is practically challenging to integrate diverse data into a global schema that satisfies users' needs, and managing EHR systems with an existing DBMS presents challenges because of incompatibility and sometimes inconsistency of data structures. As a result, no common methodological approach currently exists that effectively solves every data integration problem. These DQ challenges raise the need for an efficient way to integrate large EHRs from diverse heterogeneous sources. To handle and align large datasets efficiently, a hybrid method logically combining a Fuzzy-Ontology with a large-scale EHR analysis platform has shown improved accuracy. This study investigated the DQ issues raised and the interventions needed to overcome these barriers and challenges, including the provision of EHRs as they pertain to DQ, and combined features to search, extract, filter, clean, and integrate data so that users can coherently create new, consistent datasets. The study designed a hybrid method based on a Fuzzy-Ontology, with mathematical simulations based on a Markov Chain probability model; similarity measurement based on a dynamic Hungarian algorithm, developed following the Design Science Research (DSR) methodology, increases the quality of service across HCOs in adaptive frameworks.
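
    One component named in the abstract, similarity measurement with the Hungarian algorithm, can be sketched as an assignment problem over schema fields. The field names and the character-level similarity function below are stand-ins; the thesis couples this step with a Fuzzy-Ontology rather than plain string similarity.

```python
# Hedged sketch: aligning fields from two heterogeneous EHR schemas by
# solving an optimal one-to-one assignment (Hungarian algorithm). The
# similarity measure here is a simple stand-in for illustration.
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment

source_fields = ["patient_dob", "surname", "blood_pressure"]
target_fields = ["date_of_birth", "family_name", "bp_systolic"]

# Build a cost matrix: low cost corresponds to high string similarity.
cost = np.array([
    [1.0 - SequenceMatcher(None, s, t).ratio() for t in target_fields]
    for s in source_fields
])

rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
for r, c in zip(rows, cols):
    print(f"{source_fields[r]} -> {target_fields[c]} (cost {cost[r, c]:.2f})")
```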

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM), and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews conducted with selected companies.

    The expert system developed in this thesis focuses on two distinct areas of research: text/object detection and text extraction. For text/object detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model is a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of various processes including XML processing (where an existing OCR engine is used), bounding box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching, and finally a learning mechanism for automating future data extraction. The fields the system successfully extracts are provided in key-value format.

    The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is later used to extract the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and therefore sought new, optimised solutions. To confirm the results, the engines were used to return XML-based files with the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses the existing OCR engine alongside a novel, self-adaptive, learning-based OCR engine based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company, based in London, that holds expertise in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation, which helped the company reduce its reliance on human effort and achieve greater efficiency compared with performing similar tasks manually using Excel spreadsheets and Business Intelligence (BI) tools.

    The intention behind the development of this novel methodology was twofold: first, to develop and test a novel solution that does not depend on any specific OCR technology; second, to increase information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
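
    A minimal sketch of the pattern-based matching step in the extraction framework described above: pulling fields out of OCR'd invoice text with regular expressions and returning whatever matches in key-value form. The field names, patterns, and sample text are illustrative assumptions, not the thesis's actual rules, and real OCR output would first pass through the error correction steps listed above.

```python
# Illustrative sketch of pattern-based key-value extraction from OCR text.
# Patterns and field names are invented for the example.
import re

OCR_TEXT = """
Invoice No: INV-2024-0117
Date: 03/09/2023
Total Due: 1,249.50 GBP
"""

PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*No[.:]?\s*(\S+)", re.I),
    "invoice_date":   re.compile(r"Date[.:]?\s*(\d{2}/\d{2}/\d{4})", re.I),
    "total":          re.compile(r"Total\s*Due[.:]?\s*([\d,]+\.\d{2})", re.I),
}

def extract_fields(text):
    """Return whichever fields matched, in key-value format."""
    out = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            out[key] = match.group(1).strip()
    return out

print(extract_fields(OCR_TEXT))
```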

    Encoding Through Procedure: Unexpected Meaning in Serious Game Design

    Encoding Through Procedure explores the creation and transmission of ideas through game design. My primary argument is that rules have an agency that complicates current models of procedural rhetoric. In Chapter 1, drawing on Stuart Hall's Encoding/Decoding model, I prepare a methodological foundation to demonstrate the unique possibilities and difficulties that game rules offer as a communicative medium. Using Kim Sawchuk and Owen Chapman's work on research-creation, I deploy a game-design-based method of research. The last methodological step explores Bruno Latour's Actor-Network Theory both as a method of design and as a method of critique. In Chapter 2, I present a literature review of serious games and gamification. Here the field's inconsistent positions on serious games open avenues for exploration; in addressing these, I argue for the benefits of distinguishing gamification from serious games. Chapter 3 explores an additional body of literature concerned with emergence and algorithmic representation. The argument here focuses on a lacuna in the field's conception of procedural rhetoric. I agree with the pre-existing literature that emergent results can lead to convincing arguments. That said, there is no method to date for how designers might produce a work that reliably creates emergent results; instead, I argue, the field focuses on post-hoc readings of games that successfully communicate authorial ideas. In Chapter 4, to address these concerns, I present my own design practices. I offer three examples of serious games I completed during my doctoral work. These demonstrate the various forces that alter the process of communicating through games. Each provides distinctly different moments of my own practice conflicting with the agency of my games' rules.

    The building and application of a semantic platform for an e-research society

    This thesis reviews the area of e-Research (the use of electronic infrastructure to support research) and considers how the insight gained from the development of social networking sites in the early 21st century might assist researchers in using this infrastructure. In particular it examines the myExperiment project, a website for e-Research that allows users to upload, share and annotate workflows and associated files within a social networking framework. This Virtual Organisation (VO) supports many of the attributes required for a community of users to come together and build an e-Research society. The main focus of the thesis is how the emerging society developing out of myExperiment could use Semantic Web technologies to give users a significantly richer representation of their research and research processes, to better support reproducible research. One of the initial major contributions was building an ontology for myExperiment. Through this it became possible to build an API for generating and delivering this richer representation and an interface for querying it. With this richer representation in place, it has been possible to follow Linked Data principles and link up with other projects that offer the same type of representation. Doing so has made additional data available to the user and has begun to set the data produced by myExperiment in context. The way the myExperiment project has gone about this task, and its consideration of how changes may affect existing users, is another major contribution of this thesis. Adding a semantic representation to an emergent e-Research society like myExperiment has given it the potential to support additional applications, in particular Research Objects: encapsulations of a scientist's research or research process that support reproducibility. The insight gained by adding a semantic representation to myExperiment has allowed this thesis to contribute to the design of an architecture for these Research Objects, which uses similar Semantic Web technologies. The myExperiment ontology has been designed so that it can be aligned with other ontologies. Scientific Discourse, the collaborative argumentation of different claims and hypotheses, supported by evidence from experiments, to construct, confirm or disprove theories, requires the capability to represent experiments carried out in silico. This thesis discusses how, as part of the HCLS Scientific Discourse subtask group, the myExperiment ontology has begun to be aligned with other scientific discourse ontologies to provide this capability, and it compares this alignment of ontologies with the architecture for Research Objects. The thesis also examines how myExperiment's Linked Data, and that of other projects, can be used in the design of novel interfaces. As a theoretical exercise, it considers how this Linked Data might support a Question-Answering system that would allow users to query myExperiment's data in a more efficient and user-friendly way. It concludes by reviewing all the steps undertaken to provide a semantic platform for an emergent e-Research society, facilitating the sharing of research and its processes to support reproducible research, and assesses their contribution to enhancing the features provided by myExperiment as well as e-Research as a whole. Finally, it considers how the contributions of this thesis could be extended to produce additional tools that allow researchers to make greater use of the rich data now available, in a way that enhances their research process rather than significantly changing it or adding extra workload.
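
    As a purely illustrative sketch of the Linked Data querying described above, here is a toy RDF graph of workflows queried with SPARQL via rdflib. The namespace, class, and property names are invented for the example and are not the actual terms of the myExperiment ontology.

```python
# Illustrative sketch: a semantic representation of workflows as RDF
# triples, queried with SPARQL. Vocabulary is invented for the example.
from rdflib import RDF, Graph, Literal, Namespace

EX = Namespace("http://example.org/eresearch#")
g = Graph()

# A workflow, its title, and the user who uploaded it.
g.add((EX.wf1, RDF.type, EX.Workflow))
g.add((EX.wf1, EX.title, Literal("Protein sequence alignment")))
g.add((EX.wf1, EX.uploadedBy, EX.alice))

# Ask for every workflow together with its uploader.
results = g.query("""
    PREFIX ex: <http://example.org/eresearch#>
    SELECT ?title ?user WHERE {
        ?wf a ex:Workflow ;
            ex:title ?title ;
            ex:uploadedBy ?user .
    }
""")
for title, user in results:
    print(title, user)
```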

    ENGLISH WORD-MAKING

    English Word-Making presents the content and methods of modern research in morphology in the form of a textbook for secondary school English students. The opening section offers a rationale for the uses of morphology at the secondary level. The emergence of English as a subject in the curriculum is traced historically, and the study of morphology is related specifically to humanistic goals and to the enhancement of skills in language analysis, speaking, reading, vocabulary growth, grammar and usage study, spelling, composition, and literary interpretation. The main body of the text consists of ten chapters, each exploring, diachronically and synchronically, a primary category of English word-formation: compounding, reduplication, derivation, conversion, clipping, back formation, acronyming, blending, and eponyming. Each chapter includes exercises that require students to apply what they have learned about the English language. At the end of each chapter are extensive notes that reinforce and expand the concepts presented in the main text. Appendix 1 is an exposition of English spelling through a cataloguing of various phoneme-grapheme correspondences. Appendix 2 is an attempt to apply the principles of morphological analysis treated throughout the manuscript to the slang lexicon of St. Paul's School (vintage 1978).