
    Android Game for Typing Skill Evaluation

    As social beings, humans need communication to express their wishes. As technology progresses, developers compete to create applications that facilitate communication, both personal and in groups. In practice, a classic problem called the ‘typo’ is often encountered and leads to misunderstandings when socializing. The Android-based game developed here generates data on the feasibility and development of typing ability before, after and during use, by counting the number of letters completed and measuring the user's speed in words per minute. The aim is to train typing speed and accuracy, which may in turn improve the user's typing on social media.
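    As a rough illustration of the evaluation described above, the sketch below computes a words-per-minute score and an accuracy ratio for one typing session. The five-characters-per-word convention and all names and numbers are assumptions for the example, not details taken from the paper.

```python
# Minimal sketch of typing-skill metrics, assuming the common
# convention that one "word" equals five typed characters.

def typing_metrics(typed_chars: int, correct_chars: int, seconds: float):
    """Return (words per minute, accuracy) for one typing session."""
    minutes = seconds / 60.0
    wpm = (correct_chars / 5.0) / minutes if minutes > 0 else 0.0
    accuracy = correct_chars / typed_chars if typed_chars > 0 else 0.0
    return wpm, accuracy

# Example: 300 characters typed in 60 seconds, 285 of them correct.
wpm, acc = typing_metrics(typed_chars=300, correct_chars=285, seconds=60)
print(f"{wpm:.1f} WPM at {acc:.0%} accuracy")  # 57.0 WPM at 95% accuracy
```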

    A qualitative assessment of machine learning support for detecting data completeness and accuracy issues to improve data analytics in big data for the healthcare industry

    Tackling data quality issues as part of Big Data can be challenging. For data cleansing activities, manual methods are not efficient due to the potentially very large amount of data. This paper aims to qualitatively assess the possibilities of using machine learning to detect data incompleteness and inaccuracy, since these two data quality dimensions were found to be the most significant in a previous study conducted by the authors. A review of existing literature concludes that there is no single machine learning algorithm best suited to dealing with both incompleteness and inaccuracy of data. Various algorithms were selected from existing studies and applied to a representative big (healthcare) dataset. The experiments also showed that implementing machine learning algorithms in this context encounters several challenges for Big Data quality activities, related to the amount of data particular algorithms can scale to and to the data type restrictions some algorithms impose. The study concludes that 1) data imputation works better with linear regression models, and 2) clustering models are more efficient at detecting outliers, but fully automated systems may not be realistic in this context; a certain level of human judgement is still needed.
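    To make the second conclusion concrete, the sketch below flags suspected inaccurate records with a clustering model: DBSCAN labels points that fit no dense cluster as noise (label -1), which can then be queued for human review. The toy vital-signs columns and the DBSCAN parameters are assumptions for the example, not the authors' actual setup.

```python
# Sketch: clustering-based outlier detection for data-accuracy checks.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy "patient vitals": heart rate (bpm) and body temperature (Celsius).
normal = rng.normal(loc=[72.0, 36.8], scale=[8.0, 0.3], size=(200, 2))
dirty = np.array([[250.0, 36.9], [70.0, 45.0]])  # implausible entries
data = np.vstack([normal, dirty])

# Noise points (label -1) are candidates for manual verification.
scaled = StandardScaler().fit_transform(data)
labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(scaled)
print(f"{(labels == -1).sum()} records flagged for human review")
```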

    Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts

    Large amounts of ancient documents regarding Argentinian history have become available in recent years, making it possible to extract interesting and useful aggregated information. This work proposes applying Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books about General Güemes. Exploratory analyses reveal manifold spelling errors introduced by the OCR acquisition process. We propose smart automatic ways of overcoming this issue during normalization. In addition, a first topic landscape of a subset of the volumes is obtained and analysed via Topic Modelling tools. Sociedad Argentina de Informática e Investigación Operativa.
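    As a small sketch of the topic-landscape step, the snippet below fits a two-topic LDA model over a few invented Spanish fragments; the documents and parameters are placeholders standing in for the normalized Güemes volumes, not the actual corpus.

```python
# Sketch: a first "topic landscape" via LDA over normalized OCR text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # placeholder fragments standing in for normalized pages
    "guemes gaucho milicia salta frontera",
    "batalla ejercito salta milicia guemes",
    "comercio aduana carta gobierno buenos aires",
    "gobierno carta aduana comercio virrey",
]
vec = CountVectorizer()
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Print the three strongest terms per topic.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in weights.argsort()[-3:]])
```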

    Conference Proceedings: 2019 TDE Postgraduate Research Student Conference

    The papers presented in this publication are drawn from the Faculty of Technology, Design and Environment's Annual Research Student Conference, held in May 2019. The contributions highlight the excellent and varied research being carried out by our students across a range of disciplines, including Architecture, Art, Built Environment, Computing and Engineering. In addition, the conference and this publication were brought together by an enthusiastic and talented group of research students.

    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    The value derivable from the use of data has been increasing continuously for some years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all the data at their disposal could be analysed and form the basis of decision-making. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to generate value from huge amounts of data, often in non-structured format and produced extremely frequently. However, the potential value derivable depends on the general level of data governance, more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy in the Big Data context. This dissertation focused on investigating effective methods to enhance data quality for Big Data. The principal deliverable of this research is a methodological approach which can be used to optimise the level of data quality in the Big Data context. Since data quality is contextual (that is, a non-generalisable field), this research study focuses on applying the methodological approach to one use case: Electronic Health Records (EHR).
    The first main contribution to knowledge of this study is a systematic investigation of which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions ascertained by the research methods applied in this study are accuracy and completeness. These are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data. The second important contribution is an investigation into whether Artificial Intelligence, with a special focus on machine learning, could improve the detection of dirty data with respect to these two dimensions. Based on the experiments carried out, regression and clustering algorithms proved more adequate for accuracy- and completeness-related issues respectively. However, the limits of implementing and using machine learning algorithms for detecting data quality issues in Big Data were also revealed and discussed. It can safely be deduced from this part of the study that the use of machine learning to enhance the detection of data quality issues is a promising area, but not yet a panacea that automates the entire process. The third important contribution is a proposed guideline for undertaking data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of existing algorithms are highlighted and are considered areas of practice which efficient data reparation algorithms must focus upon.
    Those three contributions form the nucleus of a new data quality methodological approach which can be used to optimise Big Data quality, as applied in the context of EHR. Some of the activities and techniques discussed in the proposed methodological approach can, to a large extent, be transposed to other industries and use cases. The approach can be used by practitioners of Big Data quality who follow a data-driven strategy. As opposed to existing Big Data quality frameworks, it has the advantage of being more precise and specific: it gives clear and proven methods to undertake the main identified stages of a Big Data quality lifecycle and can therefore be applied by practitioners in the area. This research study provides promising results and deliverables, and it paves the way for further research. Technologies around Big Data are evolving rapidly, and future research should focus on new representations of Big Data, the real-time streaming aspect, and replicating the same research methods on new technologies to validate the current results.
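    To make the imputation finding concrete, the sketch below fills a missing numeric field with a linear-regression prediction from a correlated field. The columns and values are invented for illustration and are not drawn from the EHR dataset used in the study.

```python
# Sketch: regression-based imputation of a missing numeric field.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy records: [age, weight_kg]; one weight is missing (NaN).
records = np.array([
    [25, 70.0], [40, 82.0], [33, 76.0], [58, 88.0], [47, np.nan],
])
known = records[~np.isnan(records[:, 1])]
model = LinearRegression().fit(known[:, :1], known[:, 1])

# Replace each missing weight with the model's prediction.
missing = np.isnan(records[:, 1])
records[missing, 1] = model.predict(records[missing, :1])
print(records[missing])  # the imputed record
```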

    Diagnosing Errors inside Computer Networks Based on the Typo Errors

    The goal of this diploma thesis is to create a system for network data diagnostics based on detecting and correcting spelling errors. The system is intended to serve network administrators as an additional diagnostic tool. As opposed to the primary use of spelling-error detection and correction in ordinary text, these methods are applied to network data supplied by the user. The system works with NetFlow data, pcap files or log files. Context is modelled with different data categories, and dictionaries are used to verify the correctness of words, each category using its own. Finding a correction by edit distance alone yields many results, so a heuristic for scoring candidates was proposed to select the right one. The system was tested in terms of functionality and performance.
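    A minimal sketch of the correction step described above: Levenshtein distance against a per-category dictionary, with a simple frequency heuristic to break ties between equally distant candidates. The dictionary entries and frequency counts are invented examples, not data from the thesis.

```python
# Sketch: dictionary-based typo correction with a tie-breaking heuristic.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Per-category dictionary with observed frequencies (invented numbers).
protocols = {"https": 120, "http": 95, "ssh": 40, "dns": 60}

def correct(word: str, dictionary: dict) -> str:
    """Pick the candidate with the smallest edit distance,
    preferring more frequent words when distances tie."""
    return min(dictionary, key=lambda w: (levenshtein(word, w), -dictionary[w]))

print(correct("htps", protocols))  # -> "https"
```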

    License to Supervise: Influence of Driving Automation on Driver Licensing

    Using highly automated vehicles while the driver remains responsible for safe driving places new and demanding requirements on the human operator, because the automation creates a gap between drivers’ responsibility and the human capability to take that responsibility, especially during unexpected or time-critical transitions of control. This gap is not addressed by current driver licensing practices. Based on a literature review, this research collects the driver requirements for enabling safe transitions of control attuned to human capabilities. This knowledge is intended to help system developers and authorities identify the requirements on human operators to (re)take responsibility for safe driving after automation.