
    The construction of a linguistic linked data framework for bilingual lexicographic resources

    Little-known lexicographic resources can be of tremendous value to users once digitised. By extending the digitisation of a lexicographic resource, converting the human-readable digital object to a state that is also machine-readable, structured data can be created that is semantically interoperable, thereby enabling the lexicographic resource to access, and be accessed by, other semantically interoperable resources. The purpose of this study is to formulate a process for converting a lexicographic resource in print form into a machine-readable bilingual lexicographic resource by applying linguistic linked data principles, using the English-Xhosa Dictionary for Nurses as a case study. This is accomplished by creating a linked data framework in which data are expressed in the form of RDF triples and URIs, in a manner that allows for extensibility to a multilingual resource. Click languages with characters not typically represented by the Roman alphabet are also considered. The purpose of this linked data framework is to define each lexical entry as “historically dynamic”, instead of “ontologically static” (Rafferty, 2016:5). Because the framework's instances are in constant evolution, focus is given to the management of provenance and to the generation of linked data from it. The output is an implementation framework which provides methodological guidelines for similar language resources in the interdisciplinary field of Library and Information Science.
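
    As an illustration of the kind of structured data such a framework produces, the sketch below expresses one bilingual entry as RDF triples with URIs using Python's rdflib. The namespace, the entry identifiers, the use of the OntoLex-Lemon vocabulary and the Xhosa gloss are illustrative assumptions, not the actual model or data of the dictionary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces; the URIs actually minted by the framework may differ.
ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
EXD = Namespace("http://example.org/english-xhosa-dictionary/")

g = Graph()
g.bind("ontolex", ONTOLEX)
g.bind("exd", EXD)

# One lexical entry: an English lemma linked to an illustrative Xhosa equivalent.
entry = EXD["entry/nurse"]
g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
g.add((entry, RDFS.label, Literal("nurse", lang="en")))
g.add((entry, ONTOLEX.sense, EXD["sense/nurse_1"]))
g.add((EXD["sense/nurse_1"], RDFS.label, Literal("umongikazi", lang="xh")))

# Serialise the triples as Turtle so they can be published as linked data.
print(g.serialize(format="turtle"))
```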

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews with selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model comprises a generator network implemented with the help of the Faster R-CNN model and a discriminator based on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of various processes including XML processing (where an existing OCR engine is used), bounding box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching and, finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is then used to extract relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and sought new, optimised solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing its clients' procurement costs. This data was fed into our system to obtain a deeper level of spend classification and categorisation. This helped the company to reduce its reliance on human effort and allowed for greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to test and develop a novel solution that does not depend on any specific OCR technology; second, to increase the information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
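
    As a rough illustration of the pattern-based matching and key-value output stage, the sketch below assumes the OCR engine has already returned plain text for a bounding box; the field names and regular expressions are hypothetical placeholders rather than the patterns used in the thesis.

```python
import re

# Illustrative patterns for a few common invoice fields; a real deployment
# would extend these per supplier template and add type and spell checks.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number)[:\s]*([A-Z0-9-]+)", re.I),
    "invoice_date": re.compile(r"date[:\s]*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})", re.I),
    "total_amount": re.compile(r"total[:\s]*([£$€]?\s?\d[\d,]*\.\d{2})", re.I),
}

def extract_fields(ocr_text: str) -> dict:
    """Return whichever fields match, as a key-value dictionary."""
    results = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            results[field] = match.group(1).strip()
    return results

sample = "INVOICE NO: INV-2041\nDate: 12/03/2021\nTotal: £1,254.00"
print(extract_fields(sample))  # e.g. {'invoice_number': 'INV-2041', ...}
```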

    The mixed experience of achieving business benefit from the internet: a multi-disciplinary study

    From 1995 the Internet attracted commercial investment, but financially measurable benefits and competitive advantage proved elusive. Usage for personal communication and business information only slowly translated into commercial transactions. This reflects a unique feature of Internet development. Unlike other media of the 19th and 20th centuries, widespread Internet use preceded commercial investment. The early military and research use led to an architecture that poorly supported the certainty and security requirements of commercial transactions. Subsequent attempts to align this architecture with commercial transactional requirements were expensive and mostly unsuccessful. This multi-disciplinary thesis describes these commercial factors from historical, usage, technical, regulatory and commercial perspectives. It provides a new and balanced understanding in a subject area dominated by poor communication between separate perspectives.

    Enterprise Pharo: a Web Perspective

    Enterprise Pharo is the third volume of the series, following Pharo by Example and Deep into Pharo. It covers enterprise libraries and frameworks, and in particular those useful for doing web development. The book is structured in five parts.
    The first part talks about simple web applications, starting with a minimal web application in chapter 1 on Teapot and then a tutorial on building a more complete web application in chapter 2.
    Part two of the book deals with HTTP support in Pharo, talking about character encoding in chapter 3, about using Pharo as an HTTP client (in chapter 4) and server (in chapter 5), and about using WebSockets (in chapter 6).
    In the third part we discuss the handling of data for the application. Firstly we treat data that is in the form of comma-separated values (CSV) in chapter 7. Secondly and thirdly, we treat JSON (in chapter 8) and its Smalltalk counterpart STON (in chapter 9). Fourthly, serialization and deserialization of object graphs with Fuel is treated in chapter 10. Lastly, we discuss the Voyage persistence framework and persisting to MongoDB databases in chapter 11.
    Part four of the book deals with the presentation layer. Chapter 12 shows how to use Mustache templates in Pharo, and chapter 13 talks about programmatic generation of CSS files. The documentation of applications could be written in Pillar, which is presented in chapter 14. How to generate .pdf files from the application with Artefact is shown in chapter 15.
    The fifth part of the book deals with deploying the web application. This is explained in chapter 16, which talks not only about how to build and run the application, but also about other important topics like monitoring.
    This book is a collective work. The editors have curated and reformatted the following chapters from blog posts and tutorials written by many people. Here is the complete list of contributors to the book, in alphabetical order: Olivier Auverlot, Sven Van Caekenberghe, Damien Cassou, Gabriel Cotelli, Christophe Demarey, Martín Dias, Stéphane Ducasse, Luc Fabresse, Johan Fabry, Cyril Ferlicot Delbecque, Norbert Hartl, Guillaume Larchevêque, Max Leske, Esteban Lorenzano, Attila Magyar, Mariano Martinez-Peck, Damien Polle.

    Securing mobile code.


    Genome-scale transcriptomic and epigenomic analysis of stem cells

    Embryonic stem cells (ESCs) are a special type of cell marked by two key properties: the capacity to create an unlimited number of identical copies of themselves (self-renewal) and the ability to give rise to differentiated progeny that can contribute to all tissues of the adult body (pluripotency). Decades of past research have identified many of the genetic determinants of the state of these cells, such as the transcription factors Pou5f1, Sox2 and Nanog. Many other transcription factors and, more recently, epigenetic determinants like histone modifications have been implicated in the establishment, maintenance and loss of pluripotent stem cell identity. The study of these regulators has been boosted by technological advances in the field of high-throughput sequencing (HTS) that have made it possible to investigate the binding and modification of many proteins on a genome-wide level, resulting in an explosion of the amount of genomic data available to researchers. The challenge is now to effectively use these data and to integrate the manifold measurements into coherent and intelligible models that will actually help to better understand the way in which gene expression in stem cells is regulated to maintain their precarious identity. In this thesis, I first explore the potential of HTS by describing two pilot studies using the technology to investigate global differences in the transcriptional profiles of different cell populations. In both cases, I was able to identify a number of promising candidates that mark and, possibly, explain the phenotypic and functional differences between the cells studied. The pilot studies highlighted a strong requirement for specialised software to deal with the analysis of HTS data. I have developed GeneProf, a powerful computational framework for the integrated analysis of functional genomics experiments. This software platform solves many recurring data analysis challenges and streamlines, simplifies and standardises data analysis workflows, promoting transparent and reproducible methodologies. The software offers a graphical, user-friendly interface and integrates expert knowledge to guide researchers through the analysis process. All primary analysis results are supplemented with a range of informative plots and summaries that ease the interpretation of the results. Behind the scenes, computationally demanding tasks are handled remotely on a distributed network of high-performance computers, removing rate-limiting requirements on local hardware set-up. A flexible and modular software design lays the foundations for a scalable and extensible framework that will be expanded to address an even wider range of data analysis tasks in future. Using GeneProf, billions of data points from over a hundred published studies have been re-analysed. The results of these analyses are stored in a web-accessible database as part of the GeneProf system, building up an accessible resource for all life scientists. All results, together with details about the analysis procedures used, can be browsed and examined in detail, and all final and intermediate results are available and can instantly be reused and compared with new findings. In an attempt to elucidate the regulatory mechanisms of ESCs, I use this knowledge base to identify high-confidence candidate genes relevant to stem cell characteristics by comparing the transcriptional profiles of ESCs with those of other cell types. Doing so, I describe 229 genes with highly ESC-specific transcription.
I then integrate the expression data for these ESC-specific genes with genome-wide transcription factor binding and histone modification data. After investigating the global characteristics of these "regulatory inputs", I employ machine learning methods, first to cluster subgroups of genes with ESC-specific expression patterns and then to define a "regulatory code" that marks one of the subgroups based on their regulatory signatures. The tightly co-regulated core cluster of genes identified in this analysis contains many known members of the transcriptional circuitry of ESCs and a number of novel candidates that I deem worthy of further investigation thanks to their similarity to their better-known counterparts. Integrating these candidates, and the regulatory code that drives them, into our models of the workings of ESCs might eventually help to refine the ways in which we derive, culture and manipulate these cells, with all the prospective benefits this holds for research and medicine.
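
    As a rough illustration of the clustering step, the sketch below assumes a gene-by-regulator matrix in which each row holds the binding and histone-modification signals for one ESC-specific gene; the use of k-means, the random data and the choice of four clusters are illustrative assumptions, not the methods or data of the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy gene-by-regulator matrix: rows are ESC-specific genes, columns are
# regulatory inputs (e.g. transcription factor binding, histone marks).
rng = np.random.default_rng(0)
signatures = rng.random((229, 12))  # 229 ESC-specific genes, 12 regulators

# Standardise each regulatory input before clustering.
scaled = StandardScaler().fit_transform(signatures)

# Partition genes into subgroups with similar regulatory signatures.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)

# Genes in the same cluster share a candidate "regulatory code".
for cluster_id in range(kmeans.n_clusters):
    members = np.where(kmeans.labels_ == cluster_id)[0]
    print(f"cluster {cluster_id}: {len(members)} genes")
```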

    The Treatment of Advanced Persistent Threats on Windows Based Systems

    Advanced Persistent Threat (APT) is the name given to individuals or groups who write malicious software (malware) and who have the intent to perform actions detrimental to the victim or the victim's organisation. This thesis investigates ways in which it is possible to treat APTs before, during and after the malware has been laid down on the victim's computer. The scope of the thesis is restricted to desktop and laptop computers with hard disk drives. APTs have different motivations for their work, and this thesis is agnostic towards their origin and intent. Anti-malware companies freely present the work of APTs in many ways, but mainly summarise it in the form of white papers. Individually, pieces of these works give an incomplete picture of an APT, but in aggregate it is possible to construct a view of APT families and pan-APT commonalities by comparing and contrasting the work of many anti-malware companies; it is as if there are a lot of the pieces of a jigsaw puzzle but no box lid with the complete picture. In addition, academic papers provide proof-of-concept attacks and observations, some of which may come to be used by malware writers. Gaps in, and extensions to, the public knowledge may be filled through inference, implication, interpolation and extrapolation, and this forms the basis for this thesis. The thesis presents a view of where APTs lie on Windows-based systems. It uses this view to build generic views of where APTs lie on hard disk drives in Windows-based systems, using the Lockheed Martin Cyber Kill Chain. This is then used to treat APTs on Windows-based IT systems using purpose-built software, in such a way that the malware is negated. The thesis does not claim to find all malware, but it demonstrates how to increase the cost of doing business for APTs, for example by overwriting unused disc space so APTs cannot place malware there. The software developed was able to find Indicators of Compromise on all eight hard disk drives provided for analysis. Separately, from a corpus of 228 files known to be associated with malware, it identified approximately two thirds as Indicators of Compromise.
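
    As a rough illustration of how purpose-built software might flag Indicators of Compromise, the sketch below matches file hashes against a known-bad list; the hash set, the directory scanned and the detection logic are hypothetical and far simpler than the treatment described in the thesis.

```python
import hashlib
from pathlib import Path

# Illustrative placeholder for SHA-256 hashes of files associated with malware;
# a real run would load a curated Indicator of Compromise list instead.
KNOWN_BAD_HASHES = {
    "0" * 64,  # placeholder hash, not a real indicator
}

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan(root: str) -> list[Path]:
    """Return files under root whose hashes match known Indicators of Compromise."""
    hits = []
    for path in Path(root).rglob("*"):
        try:
            if path.is_file() and sha256_of(path) in KNOWN_BAD_HASHES:
                hits.append(path)
        except OSError:
            continue  # unreadable files are simply skipped in this sketch
    return hits

if __name__ == "__main__":
    for hit in scan(r"C:\Users"):
        print(f"Indicator of Compromise: {hit}")
```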

    The Seeker

    This book is dedicated to acknowledging and honouring the work of Prof John P Keeves. A seeker of knowledge, John is exemplary in highlighting the nexus between instruction, learning and research. John’s diversity of learning experiences and contributions to students, colleagues and the broader community are highlighted through the broad range of articles in the book.
    PART 1 FROM SCHOOL TO UNIVERSITY
    Chapter 1 Observations from a Family Perspective by John S. Keeves & Wendy Keech
    Chapter 2 Student Days at PAC by Ren Potts
    Chapter 3 Prince Alfred College 1934-1977 by Murray Thompson & Alan Dennis
    Chapter 4 John’s Reflection of PAC and beyond by Ron Gibbs & Murray Thompson
    Chapter 5 Teaching Days at PAC 1947-49, 52-56, 58-61 by David Prest
    Chapter 6 Wesley College Council by David Prest
    Chapter 7 Port Willunga by David Prest
    Chapter 8 Teacher and Scout Leader by John Willoughby
    PART 2 CONTRIBUTIONS AND COLLABORATIONS BEYOND AUSTRALIA
    Chapter 9 Ten Questions by which to Judge the Soundness of Educational Achievement Surveys by T. Neville Postlethwaite
    Chapter 10 Exploring the Effects of Language Proficiency upon Secondary Students’ Performance in Mathematics in a Developing Context by Sarah J Howie & Tjeerd Plomp
    Chapter 11 The Subversive Influence of Formative Assessment by Paul Black
    Chapter 12 Diversity of Research on Teaching by Toh Kok Aun
    PART 3 FLINDERS UNIVERSITY INSTITUTE OF INTERNATIONAL EDUCATION AND BEYOND
    Chapter 13 Investigating Good Quality Knowledge about Learning and Teaching by Michael J. Lawson & Helen Askell-Williams
    Chapter 14 Future Directions for the Reform of Education in Oceania by G R (Bob) Teasdale
    Chapter 15 Students’ Knowledge of Normal Swallowing: Tracking Growth and Determining Variables by Ingrid Scholten
    Chapter 16 Rasch Scaling and the Judging of Produce by Murray Thompson
    Chapter 17 Modelling and Experiments by Tony Gibbons
    Chapter 18 Theological Education and the Identity of the Uniting Church in Australia by Andrew Dutney
    Chapter 19 Teaching Out of the Unconscious: The Role of Shadow and Archetype by Robert Matthews
    Chapter 20 Collaboration over the Net: HTML & Java, the Necessary Tools by Sivakumar Alagumalai & Jury Mohyla
    Chapter 21 Factors Influencing Reading Achievement in Germany and Finland: Evidence from PISA 2000 by Dieter Kotte & Petra Lietz
    Epilogue Lifelong Learning and the Place for ICT: Learning and Research for the Twenty-first Century by John P. Keeves