5 research outputs found

    Semi-supervised incremental learning with few examples for discovering medical association rules

    Background: Association rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge numbers of possible rules, which must be filtered in order to select those most likely to be true. Most of the techniques proposed for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. Methods: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F-measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. Results: The new semi-supervised ML algorithm improves on the results of supervised algorithms, measured by F-measure, in the task of mining medical association rules, while training with an affordable amount of manually annotated data. Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. 
The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge. This work has been partially supported by projects DOTT-HEALTH (PID2019-106942RB-C32, MCI/AEI/FEDER, UE) (Design of the study. Analysis and interpretation of data) and EXTRAE II (IMIENS 2019) (Design of the study. Analysis and interpretation of data. HUF corpus manual tagging. Writing of the manuscript), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) – SmartPITeS” (Data collection and HUF corpus construction), and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) – CAMAMA 4” (Data collection and HUF corpus construction) from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i.

    I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset

    A simile is a figure of speech that compares two different things (called the tenor and the vehicle) via shared properties. The tenor and the vehicle are usually connected with comparator words such as "like" or "as". Simile phenomena are unique and complex in real-life dialogue, where the tenor and the vehicle can be verbal phrases or sentences, be mentioned by different speakers, appear in different sentences, or occur in reversed order. However, current simile research usually focuses on similes in a triplet (tenor, property, vehicle) or in a single sentence where the tenor and vehicle are usually entities or noun phrases, which cannot reflect complex simile phenomena in real scenarios. In this paper, we propose a novel and high-quality multilingual simile dialogue (MSD) dataset to facilitate the study of complex simile phenomena. The MSD is the largest manually annotated simile dataset (~20K) and it contains both English and Chinese data. The MSD data can also be used on dialogue tasks to test the ability of dialogue systems when using similes. We design 3 simile tasks (recognition, interpretation, and generation) and 2 dialogue tasks (retrieval and generation) with MSD. For each task, we provide experimental results from strong pre-trained or state-of-the-art models. The experiments demonstrate the challenge of MSD, and we have released the data/code on GitHub. Comment: 13 Pages, 1 Figure, 12 Tables, ACL 2023 Findings

    The Concept of Simile in Relevance Theory: An Analysis of the Degree of Relevance of the Simile “Houris” in the Holy Qur'an

    This study deals with simile as a figure of speech in the field of theoretical linguistics and relevance theory. It presents a model for analyzing the degree of relevance in comparing “Houris” in the Qur'anic discourse, based on the inductive approach in reviewing related literature and the descriptive approach in selecting the material. In other words, this study further illustrates some of its pragmatic functions via Qur'anic discourse by utilizing the selected simile of Houris, the nymphs of paradise, in the Qur'anic verses: and wide-eyed houris, as the likeness of nestled pearls (Al-Waqi'ah: 22-23). The use of relevance theory here reaffirms the analytical benefits of cognitive linguistic accounts. This paper is divided into two main sections. The first section provides conceptual clarity on the notion of simile as a single basic phenomenon. The second section provides the practical application of the study’s theoretical premise by scrutinizing the realization of the degree of relevance of the simile “Houris” in the Holy Qur'an. This study finds that simile is more related to metaphor than to literal comparison. Unlike literal comparison, both metaphor and simile figuratively involve ad hoc concepts, even though the concepts work, and are constructed and perceived, differently. It further reveals that Qur'anic simile is used as a cognitive tool that facilitates inferential and interpretative processes by communicating abstract, unseen and nuanced themes of God’s message to its audience. The houris' similes essentially provide their recipients with strong ostensive stimuli with strong contextual effects, while the contents enable them to exert the least cognitive effort to grasp the intangible and immeasurable, and to infer the utterance's intended meaning.

    Applications of Mass Spectrometry to the Analysis of Adulterated Food

    Food quality and safety are major issues in the food industry around the world. With the abundance of processed food with long supply chains in the market, food fraud is always a concern. Food fraud is defined as modification of an actual labeling of food chemicals in which expensive, less accessible original ingredients are replaced by lower-cost and more accessible alternatives, which is also known as food adulteration. Some of these food adulterations might only affect the general public financially, but some might affect others more seriously. Various food authentication techniques can be utilized to ensure that the safety and quality of food products adhere to the standards, such as DNA-based techniques with polymerase chain reaction, vibrational spectroscopy, electronic nose, and mass spectrophotometry, which has been used widely to estimate pharmaceutical and biological samples. However, most of these techniques still require substantial sample preparation, and some are highly sensitive to adulterants and prone to giving undefined results. Complex mixtures of food adulterants can be identified using very high resolution mass spectrometry. The chemical compounds and structure of natural and mixtures of the adulterants are examined in this chapter using an advanced mass spectrometry technique and gas chromatography time-of-flight mass spectrometry to identify the lard biomarker.

    Automated Realistic Test Input Generation and Cost Reduction in Service-centric System Testing

    Service-centric System Testing (ScST) is more challenging than testing traditional software due to the complexity of service technologies and the limitations that are imposed by the SOA environment. One of the most important problems in ScST is the problem of realistic test data generation. Realistic test data is often generated manually or using an existing source, thus it is hard to automate and laborious to generate. One of the limitations that makes ScST challenging is the cost associated with invoking services during the testing process. This thesis aims to provide solutions to the aforementioned problems: automated realistic input generation and cost reduction in ScST. To address automation in realistic test data generation, the concept of Service-centric Test Data Generation (ScTDG) is presented, in which existing services are used as realistic data sources. ScTDG minimises the need for tester input and dependence on existing data sources by automatically generating service compositions that can generate the required test data. In experimental analysis, our approach achieved between 93% and 100% success rates in generating realistic data, while state-of-the-art automated test data generation achieved only between 2% and 34%. The thesis addresses cost concerns at the test data generation level by enabling data source selection in ScTDG. Source selection in ScTDG has many dimensions, such as cost, reliability and availability. This thesis formulates this problem as an optimisation problem and presents a multi-objective characterisation of service selection in ScTDG, aiming to reduce the cost of test data generation. A cost-aware Pareto-optimal test suite minimisation approach addressing testing cost concerns during test execution is also presented. The approach adapts traditional multi-objective minimisation approaches to the ScST domain by formulating ScST concerns, such as invocation cost and test case reliability. 
In experimental analysis, the approach achieved reductions between 69% and 98.6% in the monetary cost of service invocations during testing.