1,884 research outputs found
A survey on technique for solving web page classification problem
Nowadays, the number of web pages on the World Wide Web keeps increasing with the growing popularity of the Internet. Web page classification is needed to organize this increasing number of web pages. Many web page classification techniques have been proposed by researchers, but there is no comprehensive survey of their performance. In this paper, we survey the different web page classification techniques together with the results they achieve, and review the existing work on web page classification. Based on the survey, we find that a neural network technique, namely the Convolutional Neural Network (CNN), produces a high F-measure and meets the real-time requirement for classification compared to the other machine learning techniques
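The core operation of the CNN classifiers the survey highlights is a 1-D convolution over token embeddings followed by max-over-time pooling. Below is a minimal NumPy sketch of that feature extractor only (not any surveyed system's implementation; the function name, shapes, and ReLU choice are assumptions):

```python
import numpy as np

def cnn_text_features(embeddings, filters):
    """1-D convolution over a token-embedding sequence followed by
    max-over-time pooling -- the core of a CNN text classifier.
    embeddings: (seq_len, dim); each filter W: (width, dim)."""
    seq_len, _ = embeddings.shape
    pooled = []
    for W in filters:
        width = W.shape[0]
        # Slide the filter over the sequence, applying ReLU to each window.
        acts = [np.maximum(0.0, np.sum(embeddings[i:i + width] * W))
                for i in range(seq_len - width + 1)]
        pooled.append(max(acts))  # max-over-time pooling
    return np.array(pooled)
```

A classifier would feed the pooled feature vector into a small fully connected layer; that part is omitted here.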
DEBACER: a method for slicing moderated debates
Subjects change frequently in moderated debates with several participants, such as in parliamentary sessions, electoral debates, and trials. Partitioning a debate into blocks with the same subject is essential for understanding. Often a moderator is responsible for defining when a new block begins, so the task of automatically partitioning a moderated debate can focus solely on the moderator's behavior. In this paper, we (i) propose a new algorithm, DEBACER, which partitions moderated debates; (ii) carry out a comparative study between conventional and BERTimbau pipelines; and (iii) validate DEBACER by applying it to the minutes of the Assembly of the Republic of Portugal. Our results show the effectiveness of DEBACER.
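The idea that block boundaries follow the moderator's interventions can be sketched as a simple transcript splitter (a hypothetical simplification, not the DEBACER algorithm itself, which also compares conventional and BERTimbau pipelines):

```python
def split_by_moderator(turns, moderator="Moderator"):
    """Split a list of (speaker, text) turns into blocks, starting a new
    block each time the moderator takes the floor."""
    blocks, current = [], []
    for speaker, text in turns:
        # A moderator turn closes the current block and opens a new one.
        if speaker == moderator and current:
            blocks.append(current)
            current = []
        current.append((speaker, text))
    if current:
        blocks.append(current)
    return blocks
```

In practice a classifier would decide whether a given moderator turn really introduces a new subject, rather than treating every intervention as a boundary.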
Factors Influencing the Surprising Instability of Word Embeddings
Despite the recent popularity of word embedding methods, there is only a
small body of work exploring the limitations of these representations. In this
paper, we consider one aspect of embedding spaces, namely their stability. We
show that even relatively high frequency words (100-200 occurrences) are often
unstable. We provide empirical evidence for how various factors contribute to
the stability of word embeddings, and we analyze the effects of stability on
downstream tasks. Comment: NAACL HLT 201
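A common way to quantify the stability this abstract studies is the overlap between a word's nearest neighbours in two independently trained embedding spaces. The sketch below assumes that metric (the function name, dict-of-vectors representation, and cosine similarity are illustrative choices, not necessarily the paper's exact setup):

```python
import numpy as np

def stability(word, emb_a, emb_b, k=10):
    """Stability of `word` as the fraction of its k nearest neighbours
    (by cosine similarity) shared between two embedding spaces."""
    def neighbours(emb):
        vocab = [w for w in emb if w != word]
        M = np.array([emb[w] for w in vocab])
        v = np.array(emb[word])
        sims = M @ v / (np.linalg.norm(M, axis=1) * np.linalg.norm(v))
        order = np.argsort(-sims)[:k]  # indices of the k most similar words
        return {vocab[i] for i in order}
    na, nb = neighbours(emb_a), neighbours(emb_b)
    return len(na & nb) / k
```

A stability of 1.0 means the word's neighbourhood is identical across runs; values near 0 indicate the instability the paper reports even for moderately frequent words.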
Blending Sentence Optimization Weights of Unsupervised Approaches for Extractive Speech Summarization
This paper evaluates the performance of two unsupervised approaches, Maximum Marginal Relevance (MMR) and a concept-based global optimization framework, for speech summarization. Automatic summarization is a very useful technique that helps users browse large amounts of data. This study focuses on automatic extractive summarization of a multi-dialogue speech corpus. We propose improved methods that blend the unsupervised approaches at the sentence level. Sentence-level information is leveraged to improve the linguistic quality of the selected summaries. First, sentence scores are used to filter sentences for concept extraction and concept-weight computation. Second, we pre-select a subset of candidate summary sentences according to their sentence weights. Last, we extend the optimization function to a joint optimization of concept and sentence weights to cover both important concepts and sentences. Our experimental results show that these methods improve system performance compared to the concept-based optimization baseline for both human transcripts and ASR output. The best scores are achieved by combining all three approaches, which are significantly better than the baseline system
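The MMR approach named above greedily selects sentences that are relevant to the document but non-redundant with sentences already chosen. A minimal sketch of that selection loop follows (similarity inputs and λ weighting are the standard MMR formulation; the function signature is an assumption, not this paper's code):

```python
def mmr_select(sim_to_doc, sim_matrix, n, lam=0.7):
    """Greedy Maximal Marginal Relevance.
    sim_to_doc[i]: relevance of sentence i to the whole document.
    sim_matrix[i][j]: similarity between sentences i and j.
    Returns indices of up to n selected sentences."""
    selected, candidates = [], list(range(len(sim_to_doc)))
    while candidates and len(selected) < n:
        def score(i):
            # Penalise redundancy with already-selected sentences.
            redundancy = max((sim_matrix[i][j] for j in selected), default=0.0)
            return lam * sim_to_doc[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The concept-based framework instead solves a global optimization over concept weights; blending the two, as the paper proposes, adds the sentence weights into that joint objective.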
Sinkhorn-Flow: Predicting Probability Mass Flow in Dynamical Systems Using Optimal Transport
Predicting how distributions over discrete variables vary over time is a
common task in time series forecasting. But whereas most approaches focus on
merely predicting the distribution at subsequent time steps, a crucial piece of
information in many settings is to determine how this probability mass flows
between the different elements over time. We propose a new approach to
predicting such mass flow over time using optimal transport. Specifically, we
propose a generic approach to predicting transport matrices in end-to-end deep
learning systems, replacing the standard softmax operation with Sinkhorn
iterations. We apply our approach to the task of predicting how communities
will evolve over time in social network settings, and show that the approach
improves substantially over alternative prediction methods. We specifically
highlight results on the task of predicting faction evolution in Ukrainian
parliamentary voting. Comment: A prior version of the work appeared in the Optimal Transport Workshop at NeurIPS 201
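The Sinkhorn iterations that replace the softmax can be sketched directly: exponentiate the logits, then alternately normalise rows and columns so the matrix approaches a doubly-stochastic transport matrix (a minimal NumPy illustration of the operation, not the paper's end-to-end system):

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Sinkhorn normalisation: alternately normalise rows and columns of
    exp(logits) so the result approaches a doubly-stochastic matrix.
    Each step is differentiable, so it can replace a row-wise softmax
    inside an end-to-end model."""
    M = np.exp(logits)
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M
```

A plain softmax would only make each row sum to 1; the column constraint is what lets the output be read as mass flow between elements, with no mass created or destroyed.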
PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models
Neural ranking models (NRMs) have shown remarkable success in recent years,
especially with pre-trained language models. However, deep neural models are
notorious for their vulnerability to adversarial examples. Adversarial attacks
may become a new type of web spamming technique given our increased reliance on
neural information retrieval models. Therefore, it is important to study
potential adversarial attacks to identify vulnerabilities of NRMs before they
are deployed.
In this paper, we introduce the Adversarial Document Ranking Attack (ADRA)
task against NRMs, which aims to promote a target document in rankings by
adding adversarial perturbations to its text. We focus on the decision-based
black-box attack setting, where the attackers have no access to the model
parameters and gradients, but can only acquire the rank positions of the
partial retrieved list by querying the target model. This attack setting is
realistic in real-world search engines. We propose a novel Pseudo
Relevance-based ADversarial ranking Attack method (PRADA) that learns a
surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients
for finding the adversarial perturbations.
Experiments on two web search benchmark datasets show that PRADA can
outperform existing attack strategies and successfully fool the NRM with small,
indiscernible perturbations of text.
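The decision-based black-box setting described above — the attacker sees only rank positions, never gradients — can be illustrated with a greedy token-substitution loop. This is a deliberately simplified stand-in: PRADA itself trains a PRF-based surrogate model to obtain gradients, whereas the sketch below just queries ranks directly; all names and the candidate-substitution interface are assumptions:

```python
def greedy_rank_attack(doc_tokens, candidates, rank_of, max_edits=3):
    """Greedy decision-based attack sketch.
    candidates: {position: [replacement tokens]} to try.
    rank_of: black-box callable returning the document's rank position
    (lower is better) -- the only signal available to the attacker."""
    best, best_rank = list(doc_tokens), rank_of(doc_tokens)
    for _ in range(max_edits):
        improved = False
        for pos, subs in candidates.items():
            for sub in subs:
                trial = list(best)
                trial[pos] = sub
                r = rank_of(trial)
                if r < best_rank:  # keep any edit that promotes the document
                    best, best_rank, improved = trial, r, True
        if not improved:
            break
    return best, best_rank
```

The expensive part in practice is choosing good candidate substitutions while keeping the perturbation imperceptible; that is where PRADA's surrogate-model gradients come in.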