4 research outputs found
Real-World Image Restoration Using Degradation Adaptive Transformer-Based Adversarial Network
Most existing learning-based image restoration methods rely heavily on paired degraded/non-degraded training datasets built from simplistic handcrafted degradation assumptions. These assumptions typically involve a limited set of degradations, such as Gaussian blur, noise, and bicubic downsampling. However, when these methods are applied to real-world images, performance drops significantly due to the discrepancy between synthetic and realistic degradation. Additionally, they lack the flexibility to adapt to unknown degradations in practical scenarios, which limits their generalizability to complex and unconstrained scenes.
To address the absence of image pairs, recent studies have proposed Generative Adversarial Network (GAN)-based unpaired methods. Nevertheless, unpaired learning models based on convolution operations encounter challenges in capturing long-range pixel dependencies in real-world images. This limitation stems from their reliance on convolution operations, which offer local connectivity and translation equivariance but struggle to capture global dependencies due to their limited receptive field.
To address these challenges, this dissertation proposes an innovative unpaired image restoration basic model along with an advanced model. The basic model is DA-CycleGAN, which is based on the CycleGAN [1] neural network and specifically designed for blind real-world Single Image Super-Resolution (SISR). DA-CycleGAN incorporates a degradation adaptive (DA) module that learns various real-world degradations (such as noise and blur patterns) in an unpaired manner, enabling flexible adaptation. Additionally, an advanced model called Trans-CycleGAN integrates the Transformer architecture into CycleGAN to leverage its global connectivity. This combination allows image-to-image translation with CycleGAN [1] while enabling the Transformer to model global dependencies across long-range pixels. Extensive experiments on realistic images demonstrate the superior performance of the proposed method on real-world image restoration problems, producing clearer and finer details.
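The unpaired training described above rests on CycleGAN's cycle-consistency objective, which the abstract only names. A minimal sketch of that loss is given below; the toy mappings `G` and `F` and the flat "images" are invented for illustration and are not the dissertation's networks.

```python
# Minimal sketch of the CycleGAN-style cycle-consistency objective that
# DA-CycleGAN builds on. In the real model, G and F are deep generators
# mapping between the degraded and clean image domains; here they are
# toy functions on flat pixel lists.

def l1_loss(a, b):
    """Mean absolute error between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, G, F):
    """L_cyc = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1."""
    return l1_loss(F(G(x)), x) + l1_loss(G(F(y)), y)

# Toy example: exact inverse mappings give zero cycle loss, which is
# what unpaired training pushes the two generators toward.
G = lambda img: [p * 2 for p in img]   # degraded -> clean (toy)
F = lambda img: [p / 2 for p in img]   # clean -> degraded (toy inverse)
x = [0.1, 0.4, 0.9]                    # "degraded" image
y = [0.2, 0.8, 1.8]                    # "clean" image
print(cycle_consistency_loss(x, y, G, F))
```

Because the two toy mappings are exact inverses, the printed loss is zero; imperfect generators would incur a positive penalty, which is the training signal in the unpaired setting.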
Overall, this dissertation presents a novel basic model and an advanced unpaired image restoration model that effectively address the limitations of existing approaches. The proposed approach achieves significant advances in handling real-world degradations and modeling long-range pixel dependencies, thereby offering substantial improvements in image restoration tasks.
Index Terms: Cross-domain translation, generative adversarial network, image restoration, super-resolution, transformer, unpaired training
Detecting Social Spamming on Facebook Platform
Online Social Networks (OSNs) dominate human interaction nowadays, easing communication and the spread of news on one hand, and providing fertile global soil for all kinds of social spamming on the other. The Facebook platform, with its two billion active users, is currently among the spammers' top targets. Its users face different kinds of social threats every day, including malicious links, profanity, hate speech, revenge porn, and others. Although many researchers have presented techniques to defeat spam on social media, especially on the Twitter platform, very few have targeted Facebook. To fight continuously evolving spam techniques, we have to constantly develop and enhance spam detection methods. This research digs deeper into the Facebook platform, through ten implemented honeypots, to identify the challenges that slow the spam detection process and ways to overcome them. Using all the available inputs, including techniques previously tested on other social media along with observations drawn from the honeypots, the final product is a classifier that distinguishes spammer profiles from legitimate ones using data mining and machine learning techniques. To achieve this, the research first reviews the main challenges and limitations that obstruct spam detection and presents related research with its results. It then outlines the implementation steps, from honeypot construction through data collection and preparation to building the classifier itself. Finally, it presents the observations drawn from the honeypots and the results of the classifier, and validates them against results from previous research on other social platforms.
The main contribution of this thesis is the final classifier, which is able to distinguish legitimate Facebook profiles from spammer ones. The originality of the research lies in its aim to detect all kinds of social spammers: not only malware-spreading spammers, but spamming in its general context, e.g. those spreading profanity, bulk messages, and unapproved content.
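The profile-level classification described above can be sketched as a two-step pipeline: extract behavioural features from a profile, then apply a trained binary classifier. The feature names and thresholds below are invented for illustration; the thesis derives its actual features from the honeypot observations and learns the decision with machine learning rather than fixed rules.

```python
# Hypothetical sketch of profile-level spam classification: map a raw
# profile to numeric features, then decide "spammer" vs "legitimate".
# The threshold rule stands in for the thesis' learned classifier.

def extract_features(profile):
    """Map a raw profile dict to a numeric feature vector."""
    posts = profile["posts"]
    n_links = sum(1 for p in posts if "http" in p)
    return {
        "link_ratio": n_links / max(len(posts), 1),          # share of posts carrying URLs
        "posts_per_friend": len(posts) / max(profile["friends"], 1),
    }

def classify(profile, link_thresh=0.8, rate_thresh=5.0):
    """Toy stand-in for the learned binary classifier."""
    f = extract_features(profile)
    if f["link_ratio"] > link_thresh or f["posts_per_friend"] > rate_thresh:
        return "spammer"
    return "legitimate"

bot = {"friends": 2, "posts": ["buy now http://x", "win http://y"] * 10}
human = {"friends": 300, "posts": ["nice day", "holiday pics http://z"]}
print(classify(bot), classify(human))  # spammer legitimate
```

In practice the decision boundary would come from a model trained on labeled honeypot data rather than hand-set thresholds, but the feature-then-classify structure is the same.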
A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering
Online social networks (OSNs) have become an important source of information for a tremendous range of applications and research areas, such as search engines and summarization systems. However, the high usability and accessibility of OSNs have exposed many information quality (IQ) problems, which consequently decrease the performance of OSN-dependent applications. Social spammers are a particular kind of ill-intentioned users who degrade the quality of OSN information by misusing every service OSNs provide. Social spammers spread many intensive posts/tweets to lure legitimate users to malicious or commercial sites containing malware downloads, phishing, and drug sales. Given that Twitter is not immune to the social spam problem, researchers have designed various detection methods that inspect individual tweets or accounts for spam content. However, despite their high detection rates, account-based spam detection methods are not suitable for real-time tweet filtering because they need information from Twitter's servers. At the tweet level, many light features have been proposed for real-time filtering; however, existing classification models classify each tweet separately, without considering the state of previously handled tweets associated with a topic. These models also require periodic retraining on ground-truth data to stay up to date. Hence, in this paper, we formalize a Hidden Markov Model (HMM) as a time-dependent model for real-time topical spam tweet filtering. More precisely, our method leverages only the available and accessible meta-data in the tweet object to detect spam tweets existing in a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets associated with the same topic.
Compared to classical time-independent classification methods such as Random Forest, the experimental evaluation demonstrates that our method increases the quality of topics in terms of the precision, recall, and F-measure performance metrics.
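The time-dependent idea above can be sketched with the standard HMM forward algorithm: each tweet in a topic stream is labeled using both its own light meta-data observation and the belief carried over from previously handled tweets. All probabilities below are illustrative, not the paper's learned parameters, and the single observation (URL present or not) is a stand-in for the paper's tweet-object meta-data features.

```python
# Sketch: filter a topical tweet stream with a two-state HMM (ham/spam).
# The forward recursion makes each decision depend on the state of the
# previously handled tweets, unlike per-tweet classifiers.

STATES = ("ham", "spam")
START = {"ham": 0.7, "spam": 0.3}
TRANS = {"ham": {"ham": 0.8, "spam": 0.2},
         "spam": {"ham": 0.3, "spam": 0.7}}
EMIT = {"ham": {"url": 0.2, "no_url": 0.8},    # P(observation | state)
        "spam": {"url": 0.9, "no_url": 0.1}}

def filter_stream(observations):
    """Forward algorithm: label each tweet with its most probable state
    given all observations seen so far in the topic stream."""
    labels, alpha = [], None
    for obs in observations:
        if alpha is None:
            alpha = {s: START[s] * EMIT[s][obs] for s in STATES}
        else:
            alpha = {s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                     for s in STATES}
        total = sum(alpha.values())
        alpha = {s: v / total for s, v in alpha.items()}  # normalise against underflow
        labels.append(max(alpha, key=alpha.get))
    return labels

# A burst of URL-bearing tweets in a topic stream flips the filter to "spam",
# and it recovers once URL-free tweets resume.
print(filter_stream(["no_url", "url", "url", "url", "no_url"]))
```

The persistence encoded in the transition matrix is what distinguishes this from a time-independent classifier: an isolated URL tweet after a long ham run is judged differently from one inside a spam burst.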
Information quality in online social media and big data collection: an example of Twitter spam detection
The popularity of OSM is mainly conditioned by the integrity and the quality of UGC as well as the protection of users' privacy. Based on the definition of information quality
as fitness for use, the high usability and accessibility of OSM have exposed many information quality (IQ) problems, which consequently decrease the performance of OSM-dependent applications. Such problems are caused by ill-intentioned individuals who misuse OSM services to spread different kinds of noisy information, including fake information, illegal commercial content, drug sales, malware downloads, and phishing links. The propagation and spreading of noisy information cause enormous drawbacks: wasted resources, decreased quality of service of OSM-based applications, and wasted human effort.
The majority of popular social networks (e.g., Facebook, Twitter, etc.) on the Web 2.0 are attacked daily by an enormous number of ill-intentioned users. However, these popular social networks are ineffective at handling noisy information, requiring several weeks or months to detect it. Moreover, several challenges stand in the way of building a complete OSM-based noisy-information filtering method that can overcome the shortcomings of OSM information filters. These challenges can be summarized as: (i) big data; (ii) privacy and security; (iii) structure heterogeneity; (iv) UGC format diversity; (v) subjectivity and objectivity; and (vi) service limitations.
In this thesis, we focus on increasing the quality of social UGC that is published and publicly accessible in the form of posts and profiles over OSNs by addressing the stated challenges in depth. As social spam is the most common IQ problem appearing on OSM, we introduce two generic approaches for detecting and filtering out spam content. The first approach detects spam posts (e.g., spam tweets) in a real-time stream, while the other is dedicated to handling a big data collection of social profiles (e.g., Twitter accounts). For filtering spam content in real time, we introduce an unsupervised collective-based framework that automatically adapts a supervised spam tweet classification function in order to maintain an up-to-date real-time classifier without requiring manually annotated datasets. In the second approach, we treat big data collections by minimizing the search space of profiles that need advanced analysis, instead of processing every user profile in the collection. Each profile falling in the reduced search space is then analyzed in an advanced way to produce an accurate decision using a binary classification model.
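The second approach above can be sketched as retrieve-then-classify: first retrieve a reduced candidate set by matching spam-associated meta-data values, then run the binary classifier only on those candidates. The meta-data values and the stand-in classifier below are hypothetical; the thesis extracts the spammy values from known spammer profiles in a preprocessing step.

```python
# Hedged sketch of search-space reduction for big profile collections:
# only profiles sharing spam-associated meta-data values are passed to
# the (expensive) binary classification model; the rest are skipped.

SPAMMY_METADATA = {"default_avatar", "no_bio"}   # hypothetical values from known spammers

def reduce_search_space(profiles):
    """Keep only profiles sharing at least one spam-associated meta-data value."""
    return [p for p in profiles if SPAMMY_METADATA & set(p["metadata"])]

def classify(profile):
    """Stand-in for the advanced binary model applied to each candidate."""
    return "spammer" if profile["followers"] < 10 else "legitimate"

profiles = [
    {"id": 1, "metadata": ["default_avatar"], "followers": 3},
    {"id": 2, "metadata": ["verified"], "followers": 5000},   # never reaches the classifier
    {"id": 3, "metadata": ["no_bio"], "followers": 800},
]
candidates = reduce_search_space(profiles)
print([(p["id"], classify(p)) for p in candidates])
```

The saving is in the first step: on a large collection, only the retrieved fraction pays the cost of advanced analysis, which is what makes the approach feasible at big-data scale.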
The experiments conducted on the Twitter online social network have shown that the unsupervised collective-based framework is able to produce an updated and effective real-time binary tweet classification function that adapts to the rapid evolution of social spammers' strategies on Twitter, outperforming two existing real-time spam detection methods. On the other hand, the results of the second approach have demonstrated that performing a preprocessing step to extract spammy meta-data values and leveraging them in the retrieval process is a feasible solution for handling large collections of Twitter profiles, as an alternative to processing all profiles in the input data collection.
The introduced approaches open different opportunities for information science researchers to leverage our solutions in other information filtering problems and applications. Our long-term perspective consists of (i) developing a generic platform covering the most common OSM for instantly checking the quality of a given piece of information, where the input could be profiles, website links, posts, or plain text; and (ii) transforming and adapting our methods to handle additional IQ problems such as rumors and information overload.