4 research outputs found

    Real-World Image Restoration Using Degradation Adaptive Transformer-Based Adversarial Network

    Most existing learning-based image restoration methods rely heavily on paired degraded/non-degraded training datasets built from simplistic handcrafted degradation assumptions. These assumptions usually cover only a limited set of degradations, such as Gaussian blur, noise, and bicubic downsampling. When such methods are applied to real-world images, performance drops significantly because of the discrepancy between synthetic and realistic degradation. They also lack the flexibility to adapt to unknown degradations in practical scenarios, which limits their generalizability to complex and unconstrained scenes. To address the absence of image pairs, recent studies have proposed Generative Adversarial Network (GAN)-based unpaired methods. Nevertheless, unpaired learning models built on convolution operations struggle to capture long-range pixel dependencies in real-world images: convolution offers local connectivity and translation equivariance but, because of its limited receptive field, cannot model global dependencies. To address these challenges, this dissertation proposes a basic unpaired image restoration model along with an advanced model. The basic model, DA-CycleGAN, builds on the CycleGAN [1] neural network and is specifically designed for blind real-world Single Image Super-Resolution (SISR). DA-CycleGAN incorporates a degradation adaptive (DA) module that learns various real-world degradations (such as noise and blur patterns) in an unpaired manner, enabling flexible adaptation. The advanced model, Trans-CycleGAN, integrates the Transformer architecture into CycleGAN to exploit its global connectivity: CycleGAN [1] performs the image-to-image translation while the Transformer models global dependencies across long-range pixels. Extensive experiments on realistic images demonstrate the superior performance of the proposed methods in solving real-world image restoration problems, producing clearer and finer details. Overall, this dissertation presents a basic and an advanced unpaired image restoration model that effectively address the limitations of existing approaches, achieving significant advances in handling real-world degradations and modeling long-range pixel dependencies.
    Index Terms — Cross-domain translation, generative adversarial network, image restoration, super-resolution, transformer, unpaired training
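    The cycle-consistency idea underlying both CycleGAN-based models can be sketched in a few lines. The toy generators, flat-list "images", and weight λ = 10 below are illustrative assumptions for the demo, not details from the dissertation.

```python
# Minimal sketch of CycleGAN-style cycle-consistency loss (pure Python,
# "images" as flat lists of pixel values). G maps domain X -> Y and F maps
# Y -> X; a good generator pair should reconstruct an input after the
# round trip X -> Y -> X (and symmetrically for Y).

def l1(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    forward = l1(F(G(x)), x)   # x -> G(x) -> F(G(x)) should give x back
    backward = l1(G(F(y)), y)  # y -> F(y) -> G(F(y)) should give y back
    return lam * (forward + backward)

# Toy generators: identity mappings reconstruct perfectly (loss 0);
# a generator that adds a constant brightness shift does not.
G_id = lambda img: img
F_id = lambda img: img
G_shift = lambda img: [p + 0.5 for p in img]

x = [0.1, 0.4, 0.9, 0.3]
y = [0.7, 0.2, 0.5, 0.8]
print(cycle_consistency_loss(G_id, F_id, x, y))     # 0.0
print(cycle_consistency_loss(G_shift, F_id, x, y))  # > 0
```

    In the actual models this term is combined with adversarial losses and trained on image tensors; the sketch only shows why an unpaired round trip constrains the generators.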

    Detecting Social Spamming on Facebook Platform

    Online Social Networks (OSNs) dominate human interaction nowadays, easing communication and the spreading of news on the one hand while providing global fertile soil for all kinds of social spamming on the other. The Facebook platform, with its more than two billion active users, is currently among the spammers' prime targets. Its users face various social threats every day, including malicious links, profanity, hate speech, revenge porn, and others. Although many researchers have presented techniques to defeat spam on social media, especially on the Twitter platform, very few have targeted Facebook. To fight continuously evolving spam techniques, we have to constantly develop and enhance spam detection methods. This thesis digs deeper into the Facebook platform, through ten implemented honeypots, to identify the challenges that slow the spam detection process and ways to overcome them. Using all the given inputs, including techniques previously tested on other social media platforms along with observations drawn from the honeypots, the final product is a classifier that distinguishes spammer profiles from legitimate ones through data mining and machine learning techniques. To achieve this, the thesis first overviews the main challenges and limitations that obstruct the spam detection process and presents related research with its results. It then outlines the implementation steps, from honeypot construction through data collection and preparation to building the classifier itself. Finally, it presents the observations drawn from the honeypots and the results from the classifier, and validates them against the results of previous research on other social platforms.
    The main contribution of this thesis is the final classifier, which is able to distinguish legitimate Facebook profiles from spammer ones. The originality of the research lies in its aim to detect all kinds of social spam: not only malware-spreading spammers, but spamming in its general context, e.g. those spreading profanity, bulk messages, and unapproved content.
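    As a hypothetical illustration of how such a profile classifier might look (the thesis's actual features and model come from its honeypot data and are not specified in the abstract), here is a minimal feature-scoring sketch; every field name, weight, and threshold below is an invented assumption.

```python
# Hypothetical sketch of a feature-based spammer classifier. A real system
# would learn its weights from labelled honeypot data; the linear score and
# threshold here are stand-ins for illustration only.

def extract_features(profile):
    """Map a Facebook-like profile dict to numeric features."""
    posts = profile.get("posts", [])
    n = max(len(posts), 1)
    link_ratio = sum("http" in p for p in posts) / n  # posts carrying links
    dup_ratio = 1 - len(set(posts)) / n               # bulk/duplicate messaging
    req_rate = profile.get("friend_requests_sent", 0) / max(
        profile.get("account_age_days", 1), 1)        # aggressive friending
    return [link_ratio, dup_ratio, req_rate]

def is_spammer(profile, weights=(2.0, 3.0, 1.5), threshold=1.0):
    """Simple linear score standing in for a trained binary classifier."""
    score = sum(w * f for w, f in zip(weights, extract_features(profile)))
    return score >= threshold

legit = {"posts": ["holiday pics", "great game last night"],
         "friend_requests_sent": 5, "account_age_days": 800}
spammer = {"posts": ["buy now http://x", "buy now http://x", "buy now http://x"],
           "friend_requests_sent": 400, "account_age_days": 3}
print(is_spammer(legit), is_spammer(spammer))  # False True
```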

    A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering

    Online social networks (OSNs) have become an important source of information for a tremendous range of applications and research areas, such as search engines and summarization systems. However, the high usability and accessibility of OSNs have exposed many information quality (IQ) problems, which consequently decrease the performance of OSN-dependent applications. Social spammers are a particular kind of ill-intentioned user who degrade the quality of OSN information by misusing every service OSNs provide. They spread many intensive posts/tweets to lure legitimate users to malicious or commercial sites containing malware downloads, phishing, and drug sales. Given that Twitter is not immune to the social spam problem, researchers have designed various detection methods that inspect individual tweets or accounts for spam content. However, despite their high detection rates, account-based spam detection methods are not suitable for real-time tweet filtering because they require information from Twitter's servers. At the tweet level, many lightweight features have been proposed for real-time filtering; however, existing classification models classify each tweet separately, without considering the state of previously handled tweets associated with a topic. These models also require periodic retraining on ground-truth data to stay up to date. Hence, in this paper, we formalize a Hidden Markov Model (HMM) as a time-dependent model for real-time topical spam tweet filtering. More precisely, our method leverages only the available and accessible metadata in the tweet object to detect spam tweets in a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets associated with the same topic.
    Compared to classical time-independent classification methods such as Random Forest, the experimental evaluation demonstrates the method's effectiveness at increasing the quality of topics in terms of the precision, recall, and F-measure performance metrics.
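    A minimal sketch of the HMM idea: two hidden states (legit/spam), a coarse per-tweet observation symbol derived from tweet-object metadata, and Viterbi decoding over the topical stream so each label depends on previously handled tweets. All probabilities and symbols below are invented for the demo; the paper's actual features and parameters are not reproduced here.

```python
# Toy two-state HMM over a topical tweet stream. Each tweet is reduced to a
# metadata-derived symbol ("normal" / "suspicious"); Viterbi decoding picks
# the most likely spam/legit state sequence, so a tweet's label depends on
# the tweets handled before it, unlike per-tweet classifiers.

states = ("legit", "spam")
start_p = {"legit": 0.8, "spam": 0.2}
trans_p = {"legit": {"legit": 0.9, "spam": 0.1},   # spam tends to arrive in bursts
           "spam":  {"legit": 0.3, "spam": 0.7}}
emit_p = {"legit": {"normal": 0.85, "suspicious": 0.15},
          "spam":  {"normal": 0.20, "suspicious": 0.80}}

def viterbi(obs):
    """Most likely legit/spam labelling for a sequence of observation symbols."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max((V[-2][p] * trans_p[p][s] * emit_p[s][o], p)
                             for p in states)
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

stream = ["normal", "suspicious", "suspicious", "suspicious"]
print(viterbi(stream))  # ['legit', 'spam', 'spam', 'spam']
```

    The burst of suspicious observations flips the decoded state to spam and keeps it there, which is exactly the time-dependence a per-tweet classifier cannot express.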

    Information quality in online social media and big data collection: an example of Twitter spam detection

    The popularity of Online Social Media (OSM) is mainly conditioned by the integrity and quality of User Generated Content (UGC) as well as the protection of users' privacy. Based on the definition of information quality as fitness for use, the high usability and accessibility of OSM have exposed many information quality (IQ) problems, which consequently decrease the performance of OSM-dependent applications. Such problems are caused by ill-intentioned individuals who misuse OSM services to spread different kinds of noisy information, including fake information, illegal commercial content, drug sales, malware downloads, and phishing links. The propagation of noisy information causes enormous drawbacks: it consumes resources, decreases the quality of service of OSM-based applications, and wastes human effort. The majority of popular social networks (e.g., Facebook, Twitter) are attacked daily by an enormous number of ill-intentioned users, yet these networks are ineffective at handling the noisy information, requiring several weeks or months to detect it. Moreover, several challenges stand in the way of building complete OSM-based noisy-information filtering methods that can overcome the shortcomings of existing OSM information filters. These challenges can be summarized as: (i) big data; (ii) privacy and security; (iii) structure heterogeneity; (iv) UGC format diversity; (v) subjectivity and objectivity; and (vi) service limitations. In this thesis, we focus on increasing the quality of social UGC that is published and publicly accessible in the form of posts and profiles over OSNs, by addressing the stated challenges in depth. As social spam is the most common IQ problem appearing on OSM, we introduce two generic approaches for detecting and filtering out spam content.
    The first approach detects spam posts (e.g., spam tweets) in a real-time stream, while the second is dedicated to handling a big data collection of social profiles (e.g., Twitter accounts). For filtering spam content in real time, we introduce an unsupervised collective-based framework that automatically adapts a supervised spam tweet classification function, maintaining an up-to-date real-time classifier without requiring manually annotated datasets. In the second approach, we treat big data collections by minimizing the search space of profiles that need advanced analysis, instead of processing every user profile in the collection. Each profile falling in the reduced search space is then analyzed in an advanced way to produce an accurate decision using a binary classification model. Experiments conducted on the Twitter online social network have shown that the unsupervised collective-based framework is able to produce an updated and effective real-time binary tweet classification function that adapts to the rapid evolution of social spammers' strategies on Twitter, outperforming two existing real-time spam detection methods. The results of the second approach have demonstrated that extracting spammy metadata values and leveraging them in the retrieval process is a feasible way to handle large collections of Twitter profiles, as an alternative to processing all profiles in the input collection. The introduced approaches open different opportunities for information science researchers to leverage our solutions in other information filtering problems and applications.
    Our long-term perspective consists of (i) developing a generic platform covering the most common OSM for instantly checking the quality of a given piece of information, where the input could be profiles, website links, posts, or plain text; and (ii) transforming and adapting our methods to handle additional IQ problems such as rumors and information overload.
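    The search-space-reduction idea of the second approach can be sketched as an inverted index over metadata values observed in known spam: only profiles sharing such a value are retrieved for full analysis. The field names and the trivial retrieval step below are assumptions for illustration, not the thesis's implementation.

```python
# Sketch of search-space reduction for big profile collections: index profiles
# by metadata values, retrieve only those matching values seen in known spam,
# and hand just that reduced candidate set to an (assumed) binary classifier.
from collections import defaultdict

def build_index(profiles, keys=("signup_app", "url_domain")):
    """Inverted index: (field, value) -> set of profile ids."""
    index = defaultdict(set)
    for pid, meta in profiles.items():
        for k in keys:
            if k in meta:
                index[(k, meta[k])].add(pid)
    return index

def candidates(index, spammy_values):
    """Union of profiles sharing any metadata value observed in known spam."""
    found = set()
    for kv in spammy_values:
        found |= index.get(kv, set())
    return found

profiles = {
    "u1": {"signup_app": "web", "url_domain": "news.example"},
    "u2": {"signup_app": "bulk-poster", "url_domain": "pills.example"},
    "u3": {"signup_app": "web"},
    "u4": {"signup_app": "bulk-poster"},
}
idx = build_index(profiles)
spammy = [("signup_app", "bulk-poster"), ("url_domain", "pills.example")]
print(sorted(candidates(idx, spammy)))  # ['u2', 'u4'] need full analysis
```

    Only two of the four profiles reach the expensive classification stage, which is the point of retrieving by spammy metadata instead of scanning every profile.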