768 research outputs found

    A qualitative assessment of machine learning support for detecting data completeness and accuracy issues to improve data analytics in big data for the healthcare industry

    Tackling data quality issues in Big Data can be challenging. Manual data cleansing is not efficient given the potentially very large amount of data. This paper qualitatively assesses the possibilities for using machine learning to detect data incompleteness and inaccuracy, the two data quality dimensions found to be the most significant in a previous study by the authors. A review of existing literature concludes that no single machine learning algorithm is best suited to deal with both incompleteness and inaccuracy of data. Various algorithms are selected from existing studies and applied to a representative big (healthcare) dataset. The experiments also show that implementing machine learning algorithms in this context faces several challenges for Big Data quality activities. These challenges relate to the amount of data that particular machine learning algorithms can scale to and to data type restrictions imposed by some algorithms. The study concludes that 1) data imputation works better with linear regression models, and 2) clustering models are more efficient at detecting outliers, but fully automated systems may not be realistic in this context. Therefore, a certain level of human judgement is still needed.
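As a rough illustration of the regression-based imputation mentioned in this abstract, the sketch below fits a one-variable least-squares line on complete records and uses it to fill a missing value. It is not the authors' pipeline: the function names `fit_line` and `impute` and the sample records are hypothetical, and a real healthcare dataset would need multiple predictors and validation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance; cannot fit a line")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def impute(rows):
    """Replace missing y values (None) with the fitted line's prediction."""
    complete = [(x, y) for x, y in rows if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y if y is not None else a + b * x) for x, y in rows]


# Hypothetical records: (age, systolic blood pressure), one reading missing.
records = [(30, 110.0), (40, 120.0), (50, None), (60, 140.0)]
filled = impute(records)
```

Only the missing entry is changed; observed values pass through untouched, which keeps the imputation step easy to audit by a human reviewer.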

    Generative Input: Towards Next-Generation Input Methods Paradigm

    Since the release of ChatGPT, generative models have achieved tremendous success and become the de facto approach for various NLP tasks. However, their application in the field of input methods remains under-explored. Many neural network approaches have been applied to the construction of Chinese input method engines (IMEs). Previous research often assumed that the input pinyin was correct and focused on the Pinyin-to-character (P2C) task, which falls significantly short of meeting users' demands. Moreover, previous research could not leverage user feedback to optimize the model and provide personalized results. In this study, we propose a novel Generative Input paradigm named GeneInput. It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results. The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task. We propose a novel reward model training method that eliminates the need for additional manual annotations, and its performance surpasses GPT-4 in tasks involving intelligent association and conversational assistance. Compared to traditional paradigms, GeneInput not only demonstrates superior performance but also exhibits enhanced robustness, scalability, and online learning capabilities.

    Assessing the language of chat for teamwork dialogue

    In technology-enhanced language learning, many pedagogical activities involve students in online discussion, such as synchronous chat, to help them practice their language skills. Besides developing students' language competency, it is also crucial to nurture their teamwork competencies for today's global and complex environment. Language communication is an important glue of teamwork. Several text mining methods are possible for assessing the language of chat along teamwork dimensions. However, difficulties arise: pre-processing is often a black box, and classification approaches and algorithms depend on the context. To address these issues, the study evaluates and explains the preprocessing and classification methods used to analyze teamwork dialogue in a dataset of chat data. The analytics methods evaluated in this study provide a direction for assessing the language of chat for teamwork dialogue and can help extend the work of technology-enhanced language learning to focus not only on academic competency but also on the communication aspect.

    Android Game for Typing Skill Evaluation

    As social beings, humans need to communicate to express their wishes. As technology progresses, developers compete to create applications that facilitate communication both between individuals and within groups. In use, a classic problem known as the ‘typo’ often leads to misunderstandings in social interaction. This Android-based game application generates data on the feasibility and development of the user's typing ability before, after, and during use, by counting the number of letters the user completes and measuring typing speed in words per minute. It aims to train typing speed and accuracy, which may affect the user's ability to type on social media.
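The two measurements this abstract names, typing speed in words per minute and letter-level accuracy, can be sketched as follows. This is an assumption-laden illustration rather than the game's actual scoring: it uses the common convention that one "word" equals five typed characters, and the function names are hypothetical.

```python
def words_per_minute(typed: str, seconds: float) -> float:
    """Typing speed, counting five typed characters as one word (a common convention)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (len(typed) / 5) / (seconds / 60)


def accuracy(target: str, typed: str) -> float:
    """Fraction of target letters matched at the same position in the typed text."""
    if not target:
        raise ValueError("target text must not be empty")
    correct = sum(t == u for t, u in zip(target, typed))
    return correct / len(target)


speed = words_per_minute("a" * 50, 30)  # 50 characters typed in 30 seconds
score = accuracy("hello", "hallo")      # one substituted letter out of five
```

Position-by-position comparison penalizes an inserted or dropped letter heavily, since every later letter shifts; an alignment-based measure would be more forgiving.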

    International Conference on Computer Science and Communication Engineering

    The UBT Annual International Conference is the 8th international interdisciplinary peer-reviewed conference, publishing the work of scientists and practitioners in the areas where UBT is active in education, research and development. UBT aims to implement an integrated strategy to establish itself as an internationally competitive, research-intensive university, committed to the transfer of knowledge and the provision of a world-class education to the most talented students from all backgrounds. The main aim of the conference is to bring together scientists and practitioners from different disciplines, make them aware of recent advances in different research fields, and provide them with a unique forum to share their experiences. It is also a place to support new academic staff in doing research and publishing their work at an international standard. The conference consists of sub-conferences in different fields: Computer Science and Communication Engineering; Management, Business and Economics; Mechatronics, System Engineering and Robotics; Energy Efficiency Engineering; Information Systems and Security; Architecture – Spatial Planning; Civil Engineering, Infrastructure and Environment; Law; Political Science; Journalism, Media and Communication; Food Science and Technology; Pharmaceutical and Natural Sciences; Design; Psychology; Education and Development; Fashion; Music; Art and Digital Media; Dentistry; Applied Medicine; Nursing. This conference is the major scientific event of UBT. It is organized annually, always in cooperation with partner universities from the region and Europe. We thank all authors, partners, sponsors and the conference organizing team for making this event a truly international scientific event. Edmond Hajrizi, President of UBT. UBT – Higher Education Institutio

    A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

    The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
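For context on what a string edit distance computes, here is a minimal sketch of the classic unit-cost (Levenshtein) distance via dynamic programming. It is only a fixed-cost baseline, not the CRF model this abstract describes, which learns its edit parameters from positive and negative string pairs; the function name `edit_distance` is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # prev[j] is the distance between the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

Every operation costs 1 here regardless of which characters are involved; a learned model replaces these constants with weights fitted to the matching task.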