4 research outputs found
Real-World Image Restoration Using Degradation Adaptive Transformer-Based Adversarial Network
Most existing learning-based image restoration methods rely heavily on paired degraded/non-degraded training datasets built from simplistic handcrafted degradation assumptions. These assumptions typically involve a limited set of degradations, such as Gaussian blur, noise, and bicubic downsampling. However, when these methods are applied to real-world images, performance drops significantly due to the discrepancy between synthetic and realistic degradation. Additionally, they lack the flexibility to adapt to unknown degradations in practical scenarios, which limits their generalizability to complex and unconstrained scenes.
To address the absence of image pairs, recent studies have proposed Generative Adversarial Network (GAN)-based unpaired methods. Nevertheless, unpaired learning models based on convolution operations encounter challenges in capturing long-range pixel dependencies in real-world images. This limitation stems from their reliance on convolution operations, which offer local connectivity and translation equivariance but struggle to capture global dependencies due to their limited receptive field.
To address these challenges, this dissertation proposes an innovative unpaired image restoration basic model along with an advanced model. The basic model is DA-CycleGAN, which is based on the CycleGAN [1] neural network and specifically designed for blind real-world Single Image Super-Resolution (SISR). DA-CycleGAN incorporates a degradation adaptive (DA) module that learns various real-world degradations (such as noise and blur patterns) in an unpaired manner, enabling flexible adaptation. Additionally, an advanced model called Trans-CycleGAN integrates the Transformer architecture into CycleGAN to leverage its global connectivity. This combination allows image-to-image translation with CycleGAN [1] while enabling the Transformer to model global dependencies across long-range pixels. Extensive experiments on realistic images demonstrate the superior performance of the proposed method on real-world image restoration problems, producing clearer and finer details.
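The unpaired training described above rests on CycleGAN's cycle-consistency objective, which the abstract only names. A minimal sketch of that loss is given below; the toy mappings `G` and `F` and the flat "images" are invented for illustration and are not the dissertation's networks.

```python
# Minimal sketch of the CycleGAN-style cycle-consistency objective that
# DA-CycleGAN builds on. In the real model, G and F are deep generators
# mapping between the degraded and clean image domains; here they are
# toy functions on flat pixel lists.

def l1_loss(a, b):
    """Mean absolute error between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, G, F):
    """L_cyc = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1."""
    return l1_loss(F(G(x)), x) + l1_loss(G(F(y)), y)

# Toy example: exact inverse mappings give zero cycle loss, which is
# what unpaired training pushes the two generators toward.
G = lambda img: [p * 2 for p in img]   # degraded -> clean (toy)
F = lambda img: [p / 2 for p in img]   # clean -> degraded (toy inverse)
x = [0.1, 0.4, 0.9]                    # "degraded" image
y = [0.2, 0.8, 1.8]                    # "clean" image
print(cycle_consistency_loss(x, y, G, F))
```

Because the two toy mappings are exact inverses, the printed loss is zero; imperfect generators would incur a positive penalty, which is the training signal in the unpaired setting.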
Overall, this dissertation presents a novel basic model and an advanced unpaired image restoration model that effectively address the limitations of existing approaches. The proposed approach achieves significant advances in handling real-world degradations and modeling long-range pixel dependencies, thereby offering substantial improvements in image restoration tasks.
Index Terms: Cross-domain translation, generative adversarial network, image restoration, super-resolution, transformer, unpaired training
Detecting Social Spamming on Facebook Platform
Online Social Networks (OSNs) dominate human interaction nowadays, easing communication and the spread of news on one hand, and providing fertile global soil for all kinds of social spamming on the other. The Facebook platform, with its two billion active users, is currently among the spammers' top targets. Its users face different kinds of social threats every day, including malicious links, profanity, hate speech, revenge porn, and others. Although many researchers have presented techniques to defeat spam on social media, especially on the Twitter platform, very few have targeted Facebook. To fight continuously evolving spam techniques, we have to constantly develop and enhance spam detection methods. This research digs deeper into the Facebook platform, through ten implemented honeypots, to identify the challenges that slow the spam detection process and ways to overcome them. Using all the available inputs, including techniques previously tested on other social media along with observations drawn from the honeypots, the final product is a classifier that distinguishes spammer profiles from legitimate ones using data mining and machine learning techniques. To achieve this, the research first reviews the main challenges and limitations that obstruct spam detection and presents related research with its results. It then outlines the implementation steps, from honeypot construction through data collection and preparation to building the classifier itself. Finally, it presents the observations drawn from the honeypots and the results of the classifier, and validates them against results from previous research on other social platforms.
The main contribution of this thesis is the final classifier, which is able to distinguish legitimate Facebook profiles from spammer ones. The originality of the research lies in its aim to detect all kinds of social spammers: not only malware-spreading spammers, but spamming in its general context, e.g. those spreading profanity, bulk messages, and unapproved content.
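The profile-level classification described above can be sketched as a two-step pipeline: extract behavioural features from a profile, then apply a trained binary classifier. The feature names and thresholds below are invented for illustration; the thesis derives its actual features from the honeypot observations and learns the decision with machine learning rather than fixed rules.

```python
# Hypothetical sketch of profile-level spam classification: map a raw
# profile to numeric features, then decide "spammer" vs "legitimate".
# The threshold rule stands in for the thesis' learned classifier.

def extract_features(profile):
    """Map a raw profile dict to a numeric feature vector."""
    posts = profile["posts"]
    n_links = sum(1 for p in posts if "http" in p)
    return {
        "link_ratio": n_links / max(len(posts), 1),          # share of posts carrying URLs
        "posts_per_friend": len(posts) / max(profile["friends"], 1),
    }

def classify(profile, link_thresh=0.8, rate_thresh=5.0):
    """Toy stand-in for the learned binary classifier."""
    f = extract_features(profile)
    if f["link_ratio"] > link_thresh or f["posts_per_friend"] > rate_thresh:
        return "spammer"
    return "legitimate"

bot = {"friends": 2, "posts": ["buy now http://x", "win http://y"] * 10}
human = {"friends": 300, "posts": ["nice day", "holiday pics http://z"]}
print(classify(bot), classify(human))  # spammer legitimate
```

In practice the decision boundary would come from a model trained on labeled honeypot data rather than hand-set thresholds, but the feature-then-classify structure is the same.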
A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering
Online social networks (OSNs) have become an important source of information for a tremendous range of applications and research areas, such as search engines and summarization systems. However, the high usability and accessibility of OSNs have exposed many information quality (IQ) problems, which consequently decrease the performance of OSN-dependent applications. Social spammers are a particular kind of ill-intentioned users who degrade the quality of OSN information by misusing every service OSNs provide. Social spammers spread many intensive posts/tweets to lure legitimate users to malicious or commercial sites containing malware downloads, phishing, and drug sales. Given that Twitter is not immune to the social spam problem, researchers have designed various detection methods that inspect individual tweets or accounts for spam content. However, despite their high detection rates, account-based spam detection methods are not suitable for real-time tweet filtering because they need information from Twitter's servers. At the tweet level, many light features have been proposed for real-time filtering; however, existing classification models classify each tweet separately, without considering the state of previously handled tweets associated with a topic. These models also require periodic retraining on ground-truth data to stay up to date. Hence, in this paper, we formalize a Hidden Markov Model (HMM) as a time-dependent model for real-time topical spam tweet filtering. More precisely, our method leverages only the available and accessible meta-data in the tweet object to detect spam tweets existing in a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets associated with the same topic.
Compared to classical time-independent classification methods such as Random Forest, the experimental evaluation demonstrates that our method increases the quality of topics in terms of the precision, recall, and F-measure performance metrics.
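The time-dependent idea above can be sketched with the standard HMM forward algorithm: each tweet in a topic stream is labeled using both its own light meta-data observation and the belief carried over from previously handled tweets. All probabilities below are illustrative, not the paper's learned parameters, and the single observation (URL present or not) is a stand-in for the paper's tweet-object meta-data features.

```python
# Sketch: filter a topical tweet stream with a two-state HMM (ham/spam).
# The forward recursion makes each decision depend on the state of the
# previously handled tweets, unlike per-tweet classifiers.

STATES = ("ham", "spam")
START = {"ham": 0.7, "spam": 0.3}
TRANS = {"ham": {"ham": 0.8, "spam": 0.2},
         "spam": {"ham": 0.3, "spam": 0.7}}
EMIT = {"ham": {"url": 0.2, "no_url": 0.8},    # P(observation | state)
        "spam": {"url": 0.9, "no_url": 0.1}}

def filter_stream(observations):
    """Forward algorithm: label each tweet with its most probable state
    given all observations seen so far in the topic stream."""
    labels, alpha = [], None
    for obs in observations:
        if alpha is None:
            alpha = {s: START[s] * EMIT[s][obs] for s in STATES}
        else:
            alpha = {s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                     for s in STATES}
        total = sum(alpha.values())
        alpha = {s: v / total for s, v in alpha.items()}  # normalise against underflow
        labels.append(max(alpha, key=alpha.get))
    return labels

# A burst of URL-bearing tweets in a topic stream flips the filter to "spam",
# and it recovers once URL-free tweets resume.
print(filter_stream(["no_url", "url", "url", "url", "no_url"]))
```

The persistence encoded in the transition matrix is what distinguishes this from a time-independent classifier: an isolated URL tweet after a long ham run is judged differently from one inside a spam burst.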
Information quality in online social media and big data collection: an example of Twitter spam detection
The popularity of OSM is mainly conditioned by the integrity and the quality of UGC as well as the protection of users' privacy. Based on the definition of information quality
as fitness for use, the high usability and accessibility of OSM have exposed many information quality (IQ) problems, which consequently decrease the performance of OSM-dependent applications. Such problems are caused by ill-intentioned individuals who misuse OSM services to spread different kinds of noisy information, including fake information, illegal commercial content, drug sales, malware downloads, and phishing links. The propagation and spreading of noisy information cause enormous drawbacks: wasted resources, decreased quality of service of OSM-based applications, and wasted human effort.
The majority of popular social networks (e.g., Facebook, Twitter, etc.) on the Web 2.0 are attacked daily by an enormous number of ill-intentioned users. However, these popular social networks are ineffective at handling noisy information, requiring several weeks or months to detect it. Moreover, several challenges stand in the way of building a complete OSM-based noisy-information filtering method that can overcome the shortcomings of OSM information filters. These challenges can be summarized as: (i) big data; (ii) privacy and security; (iii) structure heterogeneity; (iv) UGC format diversity; (v) subjectivity and objectivity; and (vi) service limitations.
In this thesis, we focus on increasing the quality of social UGC that is published and publicly accessible in the form of posts and profiles over OSNs by addressing the stated challenges in depth. As social spam is the most common IQ problem appearing on OSM, we introduce two generic approaches for detecting and filtering out spam content. The first approach detects spam posts (e.g., spam tweets) in a real-time stream, while the other is dedicated to handling a big data collection of social profiles (e.g., Twitter accounts). For filtering spam content in real time, we introduce an unsupervised collective-based framework that automatically adapts a supervised spam tweet classification function in order to maintain an up-to-date real-time classifier without requiring manually annotated datasets. In the second approach, we treat big data collections by minimizing the search space of profiles that need advanced analysis, instead of processing every user profile in the collection. Each profile falling in the reduced search space is then analyzed in an advanced way to produce an accurate decision using a binary classification model.
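The second approach above can be sketched as retrieve-then-classify: first retrieve a reduced candidate set by matching spam-associated meta-data values, then run the binary classifier only on those candidates. The meta-data values and the stand-in classifier below are hypothetical; the thesis extracts the spammy values from known spammer profiles in a preprocessing step.

```python
# Hedged sketch of search-space reduction for big profile collections:
# only profiles sharing spam-associated meta-data values are passed to
# the (expensive) binary classification model; the rest are skipped.

SPAMMY_METADATA = {"default_avatar", "no_bio"}   # hypothetical values from known spammers

def reduce_search_space(profiles):
    """Keep only profiles sharing at least one spam-associated meta-data value."""
    return [p for p in profiles if SPAMMY_METADATA & set(p["metadata"])]

def classify(profile):
    """Stand-in for the advanced binary model applied to each candidate."""
    return "spammer" if profile["followers"] < 10 else "legitimate"

profiles = [
    {"id": 1, "metadata": ["default_avatar"], "followers": 3},
    {"id": 2, "metadata": ["verified"], "followers": 5000},   # never reaches the classifier
    {"id": 3, "metadata": ["no_bio"], "followers": 800},
]
candidates = reduce_search_space(profiles)
print([(p["id"], classify(p)) for p in candidates])
```

The saving is in the first step: on a large collection, only the retrieved fraction pays the cost of advanced analysis, which is what makes the approach feasible at big-data scale.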
The experiments conducted on the Twitter online social network have shown that the unsupervised collective-based framework is able to produce an updated and effective real-time binary tweet classification function that adapts to the rapid evolution of social spammers' strategies on Twitter, outperforming two existing real-time spam detection methods. On the other hand, the results of the second approach have demonstrated that performing a preprocessing step to extract spammy meta-data values and leveraging them in the retrieval process is a feasible solution for handling large collections of Twitter profiles, as an alternative to processing all profiles in the input data collection.
The introduced approaches open different opportunities for information science researchers to leverage our solutions in other information filtering problems and applications. Our long-term perspective consists of (i) developing a generic platform covering the most common OSM for instantly checking the quality of a given piece of information, where the input could be profiles, website links, posts, or plain text; and (ii) transforming and adapting our methods to handle additional IQ problems such as rumors and information overload.