8,595 research outputs found

    Validating Multimedia Content Moderation Software via Semantic Fusion

    The exponential growth of social media platforms, such as Facebook and TikTok, has revolutionized communication and content publication in human society. Users on these platforms can publish multimedia content that delivers information via a combination of text, audio, images, and video. Meanwhile, this multimedia publishing facility has been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisements, and pornography. To this end, content moderation software has been widely deployed on these platforms to detect and block toxic content. However, due to the complexity of content moderation models and the difficulty of understanding information across multiple modalities, existing content moderation software can fail to detect toxic content, which often leads to extremely negative impacts. We introduce Semantic Fusion, a general, effective methodology for validating multimedia content moderation software. Our key idea is to fuse two or more existing single-modal inputs (e.g., a textual sentence and an image) into a new input that combines the semantics of its ancestors in a novel manner and is toxic by construction. This fused input is then used for validating multimedia content moderation software. We realized Semantic Fusion as DUO, a practical content moderation software testing tool. In our evaluation, we employ DUO to test five commercial content moderation software products and two state-of-the-art models against three kinds of toxic content. The results show that DUO achieves an error finding rate (EFR) of up to 100% when testing moderation software. In addition, we leverage the test cases generated by DUO to retrain the two models we explored, which largely improves model robustness while maintaining accuracy on the original test set. Comment: Accepted by ISSTA 202

    Getting Past the Language Gap: Innovations in Machine Translation

    In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge, though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT

    Image Quality Assessment for Population Cardiac MRI: From Detection to Synthesis

    Cardiac magnetic resonance (CMR) images play a growing role in diagnostic imaging of cardiovascular diseases. Left ventricular (LV) cardiac anatomy and function are widely used for diagnosis and monitoring disease progression in cardiology and to assess the patient's response to cardiac surgery and interventional procedures. CMR is arguably the most comprehensive imaging modality for non-invasive and non-ionising imaging of the heart and great vessels and, hence, the most suited for population imaging cohorts. Due to a radiographer's insufficient experience in planning a scan, natural cardiac muscle contraction, breathing motion, and imperfect triggering, CMR can display incomplete LV coverage, which hampers quantitative LV characterization and diagnostic accuracy. To tackle this limitation and enhance the accuracy and robustness of automated cardiac volume and functional assessment, this thesis focuses on the development and application of state-of-the-art deep learning (DL) techniques in cardiac imaging. Specifically, we propose new image feature representation types that are learnt with DL models and aimed at highlighting CMR image quality across datasets. These representations are also intended to estimate CMR image quality for better interpretation and analysis. Moreover, we investigate how quantitative analysis can benefit when these learnt image representations are used in image synthesis. Specifically, a 3D Fisher discriminative representation is introduced to identify CMR image quality in the UK Biobank cardiac data. Additionally, a novel adversarial learning (AL) framework is introduced for cross-dataset CMR image quality assessment, and we show that the common representations learnt by AL can be useful and informative for cross-dataset CMR image analysis. Moreover, we utilize the dataset invariance (DI) representations for CMR volume interpolation by introducing a novel image synthesis framework based on generative adversarial nets (GANs), which enhances CMR image quality across datasets

    Data-driven machine translation for sign languages

    This thesis explores the application of data-driven machine translation (MT) to sign languages (SLs). The provision of an SL MT system can facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. We begin with an introduction to SLs, focussing on Irish Sign Language - the native language of the Deaf in Ireland. We describe their linguistics and mechanics including similarities and differences with spoken languages. Given the lack of a formalised written form of these languages, an outline of annotation formats is discussed as well as the issue of data collection. We summarise previous approaches to SL MT, highlighting the pros and cons of each approach. Initial experiments in the novel area of example-based MT for SLs are discussed and an overview of the problems that arise when automatically translating these manual-visual languages is given. Following this we detail our data-driven approach, examining the MT system used and modifications made for the treatment of SLs and their annotation. Through sets of automatically evaluated experiments in both language directions, we consider the merits of data-driven MT for SLs and outline the mainstream evaluation metrics used. To complete the translation into SLs, we discuss the addition and manual evaluation of a signing avatar for real SL output

    A Survey on Interpretable Cross-modal Reasoning

    In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. As the deployment of AI systems becomes more ubiquitous, the demand for transparency and comprehensibility in these systems' decision-making processes has intensified. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the existing CMR datasets with annotations for explanations. Finally, this survey summarizes the challenges for I-CMR and discusses potential future directions. In conclusion, this survey aims to catalyze the progress of this emerging research area by providing researchers with a panoramic and comprehensive perspective, illuminating the state of the art and discerning the opportunities

    Learning Explicit and Implicit Arabic Discourse Relations.

    In this paper we propose a supervised learning approach to identify discourse relations in Arabic texts. To our knowledge, this work represents the first attempt to focus on both explicit and implicit relations that link adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within the Segmented Discourse Representation Theory (SDRT). We use the Discourse Arabic Treebank corpus (D-ATB), which is composed of newspaper documents extracted from the syntactically annotated Arabic Treebank v3.2 part3, where each document is associated with a complete discourse graph according to the cognitive principles of SDRT. Our list of discourse relations is composed of a three-level hierarchy of 24 relations grouped into 4 top-level classes. To automatically learn them, we use state-of-the-art features whose efficiency has been empirically proved. We investigate how each feature contributes to the learning process. We report our experiments on identifying fine-grained discourse relations, mid-level classes and also top-level classes. We compare our approach with three baselines that are based on the most frequent relation, discourse connectives and the features used by Al-Saif and Markert (2011). Our results are very encouraging and outperform all the baselines with an F-score of 78.1% and an accuracy of 80.6%

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions. This work has been partially supported by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231). AE was supported by BAGEP 2021 Award of the Science Academy. EE was supported in part by TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 838188. EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project “HOLOTRAIN” (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project “AWAKEN: content-Aware and netWork-Aware faKE News mitigation” (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project “Deep-Learning Anomaly Detection for Human and Automated Users Behavior” (grant no. 91809358)

    The reliability of cephalometric tracing using AI

    Introduction: The objective of this study is to compare the difference between manual cephalometric analysis and automatic analysis by artificial intelligence to confirm the reliability of the latter. Our research hypothesis is that the manual technique is the most reliable of the methods and is still considered the gold standard. Method: A total of 99 lateral cephalometric radiographs were collected in this study. Manual technique (MT) and automatic localization by artificial intelligence (AI) tracings were performed for all radiographs. The localization of 29 commonly used landmarks was compared between both groups. Mean radial error (MRE) and a successful detection rate (SDR) of 2 mm were used to compare both groups. AudaxCeph software version 6.2.57.4225 (Audax d.o.o., Ljubljana, Slovenia) was used for both manual and AI analysis. Results: The MRE and SDR for the inter-examiner reliability test were 0.87 ± 0.61 mm and 95% respectively. For the comparison between the manual technique (MT) and landmarking with artificial intelligence (AI), the MRE and SDR for all landmarks were 1.48 ± 1.42 mm and 78% respectively. When dental landmarks are excluded, the MRE decreases to 1.33 ± 1.39 mm and the SDR increases to 84%. When only hard tissue landmarks are included (excluding soft tissue and dental points), the MRE decreases further to 1.25 ± 1.09 mm and the SDR increases to 85%. When only soft tissue landmarks are included, the MRE increases to 1.68 ± 1.89 mm and the SDR decreases to 78%. Conclusion: The software performed similarly to what was previously reported in the literature for software that uses an analogous modeling framework. Comparing the software's landmarking to manual landmarking, our results reveal that manual landmarking resulted in higher accuracy. The software performed very well for hard tissue points, but its accuracy decreased for soft tissue and dental points. We conclude that this technology shows great promise for application in clinical settings under the doctor's supervision
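
The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: the radial error of a landmark is the Euclidean distance between its reference (manual) position and its predicted (AI) position, MRE is the mean of those distances, and SDR at 2 mm is the fraction of landmarks whose radial error does not exceed 2 mm. The function names and the coordinates are hypothetical and are not taken from the study or from AudaxCeph.

```python
import math

def radial_errors(reference, predicted):
    """Euclidean distance (in mm) between each reference/predicted landmark pair."""
    return [math.dist(r, p) for r, p in zip(reference, predicted)]

def mean_radial_error(errors):
    """Mean radial error (MRE): the average of the per-landmark distances."""
    return sum(errors) / len(errors)

def successful_detection_rate(errors, threshold_mm=2.0):
    """SDR: fraction of landmarks whose radial error is within the threshold."""
    return sum(e <= threshold_mm for e in errors) / len(errors)

# Hypothetical landmark coordinates in millimetres (illustrative only).
manual = [(0.0, 0.0), (10.0, 10.0), (20.0, 5.0)]
automatic = [(3.0, 4.0), (10.0, 11.0), (20.0, 5.0)]

errors = radial_errors(manual, automatic)
print(f"MRE = {mean_radial_error(errors):.2f} mm")
print(f"SDR@2mm = {successful_detection_rate(errors):.0%}")
```

Under these definitions a lower MRE and a higher SDR both indicate closer agreement between the automatic and the manual landmark positions.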