16 research outputs found

    AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

    Full text link
    [EN] This paper describes our participation in the DEtection of TOXicity in comments In Spanish (DETOXIS) shared task 2021 at the 3rd Workshop on Iberian Languages Evaluation Forum. The shared task is divided into two related classification tasks: (i) Task 1: toxicity detection; and (ii) Task 2: toxicity level detection. Both focus on the problem of xenophobia, exacerbated by the spread of toxic comments posted on online news articles related to immigration. One of the necessary efforts towards mitigating this problem is to detect toxicity in such comments. Our main objective was to implement an accurate model to detect xenophobia in comments on web news articles within the DETOXIS 2021 shared task, based on the competition's official metrics: the F1-score for Task 1 and the Closeness Evaluation Metric (CEM) for Task 2. To solve the tasks, we worked with two types of machine learning models: (i) statistical models and (ii) Deep Bidirectional Transformers for Language Understanding (BERT) models. We obtained our best results in both tasks using BETO, a BERT model trained on a large Spanish corpus. We achieved 3rd place in the official Task 1 ranking with an F1-score of 0.5996 and 6th place in the official Task 2 ranking with a CEM of 0.7142. Our results suggest that: (i) BERT models obtain better results than statistical models for toxicity detection in text comments; and (ii) monolingual BERT models have an advantage over multilingual BERT models for toxicity detection in their pre-training language. Magnossao De Paula, AF.; Baris Schlicht, I. (2021). AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models. CEUR Workshop. 547-566. http://hdl.handle.net/10251/19055354756
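
    As a concrete illustration of the approach described above, the following is a minimal sketch of running a BETO-style Spanish BERT checkpoint as a binary toxicity classifier with the Hugging Face transformers library. The checkpoint name, label mapping, and example comments are assumptions for illustration, not the authors' exact setup.

```python
# Minimal sketch: binary toxicity classification with a BETO-style Spanish BERT
# checkpoint via Hugging Face transformers. The checkpoint name, label mapping,
# and example comments are illustrative assumptions, not the authors' exact setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "dccuchile/bert-base-spanish-wwm-uncased"  # assumed BETO checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

comments = [
    "Los inmigrantes aportan mucho a la economía.",
    "Comentario de ejemplo con lenguaje claramente tóxico.",
]
batch = tokenizer(comments, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits   # shape: (batch_size, 2)
    preds = logits.argmax(dim=-1)    # assumed labels: 0 = non-toxic, 1 = toxic

print(preds.tolist())
```

    Fine-tuning on the DETOXIS training data (and evaluating with the F1-score for Task 1) would precede any meaningful prediction, since the freshly loaded classification head above is randomly initialized.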

    A Strategic Roadmap for Maximizing Big Data Return

    Get PDF
    Big Data has become one of the most popular terms in IT over the last couple of years. In the current digital era, owing to the enormous advances in web and online technologies, we are facing a gigantic volume of information. The size of data has expanded significantly with the emergence of today's technologies in numerous sectors, for example, manufacturing, business, and science. Information types have shifted from structured, database-driven records to documents, images, audio, video, and social media content, referred to as unstructured data or Big Data. Consequently, most organizations try to invest in big data technology, aiming to get value from their investment. However, organizations find it challenging to determine their requirements and then the technology that suits their business. Different technologies are offered by a variety of vendors, each of which can be used, and there is no methodology to help organizations choose and make the right decision. Therefore, the objective of this paper is to construct a roadmap that helps organizations determine their needs and select a suitable technology, and to apply the proposed roadmap practically to two companies.

    Big Data Analytics: A Survey

    Get PDF
    Internet-based applications and communication techniques have recently become widely used in the IT industry. Internet applications and communications are a persistent source of "big data": data that is enormous in volume, diverse in type, and has a complicated multidimensional structure. Today, in an era of automatic, large-scale data collection, many measurements are routinely performed with no assurance that any of them will be helpful in understanding the phenomenon of interest. Online transactions that involve buying, selling, or even investing are all examples of e-commerce, and they generate data with a complex structure and high dimensionality. Conventional data storage techniques cannot handle such enormous volumes of data. A great deal of work is being done to find ways to reduce the dimensionality of big data in order to produce more accurate analytics reports and more interesting data visualizations. The purpose of this survey is therefore to give an overview of big data analytics along with related problems and issues that go beyond technology.

    Robust Variational Physics-Informed Neural Networks

    Full text link
    We introduce a Robust version of the Variational Physics-Informed Neural Networks (RVPINNs) to approximate the solution of Partial Differential Equations (PDEs). We start from a weak Petrov-Galerkin formulation of the problem, select a discrete test space, and define a quadratic loss functional as in VPINNs. Whereas in VPINNs the loss depends upon the selected basis functions of a given test space, herein we minimize a loss based on the residual in the discrete dual norm, which is independent of the choice of basis functions for the test space. We demonstrate that this loss is a reliable and efficient estimator of the true error in the energy norm. The proposed loss function requires computation of the Gram matrix inverse, similar to what occurs in traditional residual minimization methods. To validate our theoretical findings, we test the performance and robustness of our algorithm on several advection-dominated diffusion problems in one spatial dimension. We conclude that RVPINNs is a robust method.
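
    In symbols, a hedged sketch of the loss described above (notation chosen here for illustration, not taken verbatim from the paper): let R(u_theta) denote the residual of the network approximation u_theta, let phi_1, ..., phi_M span the discrete test space V_h, and let G be the Gram matrix of the test-space inner product. The loss is then the squared discrete dual norm of the residual:

```latex
% Sketch of the dual-norm residual loss described above; notation is illustrative.
\[
  r_i(u_\theta) = \langle R(u_\theta), \varphi_i \rangle, \qquad
  G_{ij} = (\varphi_i, \varphi_j)_V, \qquad
  \mathcal{L}(u_\theta)
    = \mathbf{r}(u_\theta)^{\top} G^{-1} \mathbf{r}(u_\theta)
    = \| R(u_\theta) \|_{V_h'}^{2}.
\]
```

    Because the inverse Gram matrix re-weights the tested residuals, the value of this loss is unchanged if the test basis is replaced by another basis of the same space, which is the basis-independence property stated in the abstract.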

    Robust Variational Physics-Informed Neural Networks

    Get PDF
    We introduce a Robust version of the Variational Physics-Informed Neural Networks method (RVPINNs). As in VPINNs, we define the quadratic loss functional in terms of a Petrov-Galerkin-type variational formulation of the PDE problem: the trial space is a (Deep) Neural Network (DNN) manifold, while the test space is a finite-dimensional vector space. Whereas the VPINN's loss depends upon the selected basis functions of a given test space, herein we minimize a loss based on the discrete dual norm of the residual. The main advantage of such a loss definition is that it provides a reliable and efficient estimator of the true error in the energy norm under the assumption of the existence of a local Fortin operator. We test the performance and robustness of our algorithm on several advection-diffusion problems. These numerical results perfectly align with our theoretical findings, showing that our estimates are sharp.
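
    As a hedged sketch of what "reliable and efficient estimator" means here (generic constants, not the paper's exact statement): under the assumptions mentioned above, the discrete dual-norm residual used as the loss is equivalent, up to constants, to the true error measured in the energy norm.

```latex
% Hedged sketch of reliability and efficiency; constants are generic placeholders.
% u is the exact solution, u_theta the DNN approximation, and c, C > 0 depend on
% the stability of the variational formulation and on the local Fortin operator
% assumed above.
\[
  c \, \| u - u_\theta \|_{U}^{2}
  \;\le\; \mathcal{L}(u_\theta) = \| R(u_\theta) \|_{V_h'}^{2}
  \;\le\; C \, \| u - u_\theta \|_{U}^{2}.
\]
```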

    A Deep Learning Approach for Molecular Classification Based on AFM Images

    Get PDF
    In spite of the unprecedented resolution provided by non-contact atomic force microscopy (AFM) with CO-functionalized tips and advances in the interpretation of the observed contrast, the unambiguous identification of molecular systems solely from AFM images, without any prior information, remains an open problem. This work presents a first step towards the automatic classification of experimental AFM images by a deep learning model trained essentially on a theoretically generated dataset. We analyze the limitations of two standard pattern-recognition models when applied to AFM image classification and develop a model with the optimal depth to provide accurate results while retaining the ability to generalize. We show that a variational autoencoder (VAE) provides a very efficient way to incorporate, from very few experimental images, characteristic features into the training set that ensure high accuracy in the classification of both theoretical and experimental images.
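
    The VAE-based augmentation idea described above can be sketched as follows. This is a minimal, self-contained example in PyTorch; the architecture, image size, and latent dimension are illustrative assumptions, not the authors' model. A small VAE is fitted to a handful of experimental images, and samples decoded from its latent space can then enrich the mostly simulated training set.

```python
# Minimal VAE sketch (PyTorch) in the spirit of the augmentation idea above.
# Architecture, image size, and latent dimension are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, img_size=64, latent_dim=16):
        super().__init__()
        self.img_size = img_size
        flat = img_size * img_size
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(flat, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, flat), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z).view(-1, 1, self.img_size, self.img_size)
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Usage sketch: a handful of experimental images scaled to [0, 1], shape (N, 1, 64, 64).
images = torch.rand(8, 1, 64, 64)            # placeholder for real data
vae = TinyVAE()
recon, mu, logvar = vae(images)
loss = vae_loss(recon, images, mu, logvar)
loss.backward()                              # an optimizer step would follow in training

# After training, decoded latent samples can augment the simulated dataset
# (latent dimension 16 matches the TinyVAE default above).
augmented = vae.dec(torch.randn(32, 16)).view(-1, 1, 64, 64)
```

    In practice the decoded samples would be mixed into the simulated images used to train the classifier, which is the mechanism the abstract credits for the high accuracy on experimental data.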

    Deep Learning for Network Traffic Monitoring and Analysis (NTMA): A Survey

    Get PDF
    Modern communication systems and networks, e.g., the Internet of Things (IoT) and cellular networks, generate a massive and heterogeneous amount of traffic data. In such networks, traditional network management techniques for monitoring and data analytics face challenges such as accuracy and the effective processing of big data in real time. Moreover, the pattern of network traffic, especially in cellular networks, shows very complex behavior because of various factors, such as device mobility and network heterogeneity. Deep learning has been efficiently employed to facilitate analytics and knowledge discovery in big data systems by recognizing hidden and complex patterns. Motivated by these successes, researchers in the field of networking apply deep learning models to Network Traffic Monitoring and Analysis (NTMA) applications, e.g., traffic classification and prediction. This paper provides a comprehensive review of applications of deep learning in NTMA. We first provide fundamental background relevant to our review. Then, we give an insight into the confluence of deep learning and NTMA and review deep learning techniques proposed for NTMA applications. Finally, we discuss key challenges, open issues, and future research directions for using deep learning in NTMA applications.