Scaling a Plagiarism Search Service on the BonFIRE Testbed
The KOPI Online Plagiarism Search Portal, a nationwide plagiarism service in Hungary, is a unique, open service that enables web users to check for identical or similar content between their own documents and files uploaded by other authors. As a recent result, we can also detect cross-language plagiarism, though at a much higher computational cost. The paper describes our experiment with the BonFIRE testbed to find a suitable scaling mechanism for translational plagiarism detection in a cloud federation.
Content-based trust and bias classification via biclustering
In this paper we improve trust, bias and factuality classification over Web data at the domain level. Unlike most of the literature in this area, which aims at extracting opinions and handling short text at the micro level, we aim to help a researcher or an archivist obtain a large collection that, at a high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from the centers of a host-term biclustering. On top of these distance features, we apply kernel methods and also combine them with baseline text classifiers. We test our method on the ECML/PKDD Discovery Challenge 2010 data set (DC2010). Our method improves on the best text classification NDCG results achieved so far by over 3% to 10% for neutrality, bias and trustworthiness. The fact that the participants of the ECML/PKDD Discovery Challenge 2010 reached an AUC only slightly above 0.5 indicates the difficulty of the task.
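The distance-feature step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each host is represented by a vector of term counts and each bicluster center by a term distribution over the same vocabulary, and it uses base-2 logarithms so every distance falls in [0, 1]. The biclustering itself and the downstream kernel classifiers are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn nonnegative term counts into a probability distribution."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing.

    Callers pass a mixture q = (p + r) / 2, so q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


def distance_features(host_counts: Sequence[float],
                      centers: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical feature extractor: one JS distance per bicluster center."""
    p = normalize(host_counts)
    return [js_distance(p, normalize(c)) for c in centers]


if __name__ == "__main__":
    # Toy vocabulary of two terms and three assumed bicluster centers.
    print(distance_features([2, 2], [[1, 1], [1, 0], [0, 1]]))
```

For a host whose term distribution matches a center exactly, the corresponding feature is 0; for a center with disjoint support it is 1. The resulting vector, with one entry per center, is what a kernel classifier would then consume.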
Smokers’ Engagement Behavior on Facebook: Verbalizing and Visual Expressing the Smoking Cessation Process
The “processes of change” and “motivational language” are common in the comments that smokers on Facebook write under smoking cessation support content. Smokers can combine this verbalization of the smoking cessation process with visual expression when they use comments and Facebook reactions at the same time. The aim of this study was to understand the relationship between processes of change, motivational language, and the Facebook reaction buttons. A total of 821 comments by smokers (n = 821), posted in response to image-based smoking cessation support content, were analyzed. The processes of change and the motivational language used in these comments were identified, and these linguistic categories were compared with the use of reaction buttons. Facebook users who used the “Haha” reaction button wrote a significantly higher proportion of sustain talk than those who used the “Like” or “Love” reaction buttons. Facebook users who combined a comment with the “Love” reaction wrote significantly more change talk than those who did not use these buttons. We suggest that, in terms of the smoking cessation process during Facebook-based interventions, the “Haha” reaction may be a negative engagement indicator, the “Like” reaction a neutral one, and the “Love” reaction a positive one. These results may help in evaluating Facebook reactions to smoking cessation support content.
How to Avoid Lower Priority for Smoking Cessation Support Content on Facebook: An Analysis of Engagement Bait
Facebook demotes “engagement bait”, content that goads people into interacting. As a result of this sanctioning, public health content can reach fewer Facebook users. This study aims to determine the negative effect of engagement bait and to find alternative techniques. Over a three-year period, 791 smoking cessation support posts were included (n = 791). The Facebook posts were classified into “engagement bait”, “alternative techniques” and control groups, and Facebook metrics were compared between the study and control groups. Reach among Facebook page fans was significantly lower in the engagement bait group than in the control group. In contrast, the alternative techniques had a significantly lower rate of negative Facebook interactions and significantly higher click rates than the control group. This is the first study to reveal the sanctioning of engagement bait in smoking cessation support Facebook posts. “Engagement bait” content ranks lower in the News Feed of the page's fans. Nevertheless, alternative techniques can circumvent the restrictions on engagement bait, while also increasing the click rate and reducing the rate of negative interactions.
Alkalmazott algoritmusok nagyméretű feladatokra = Applied algorithms for large-scale problems
Our results cover a wide range of areas in theory and application:
- Formal mathematical methods in data mining and optimization;
- Analysis and modeling of very large-scale data, with applications in network-related business intelligence;
- Matching users and content, search, and recommendation.
The project team covers the full innovation chain, from education (courses in algorithms, data mining and Web information retrieval at the Technical University and Eötvös University) through pure and applied research to innovation. Our industrial partners include the Hungarian Telecom Group and AEGON Hungary, for whom we developed custom search engines and carried out log mining and business intelligence projects. Building on the reported results, we participated in several Digital Libraries and Security ICT projects. The international recognition of our work is shown by our role as main organizer of the major European data mining contest, the ECML/PKDD Discovery Challenge 2010, by the invitation to serve as Workshop Chair at the World Wide Web conference, the most prestigious in the field, and by our service as senior program committee member at the Web Search and Data Mining (WSDM) conference and as PC member of other related conferences and workshops (ICALP, AIRWeb, ESA, etc.). Our most important research results include:
- Advances in algorithms for factorization of polynomials over finite fields;
- A prize-winning solution at KDD Cup 2009, in a telco classification task;
- New methodologies in Web spam filtering;
- Content-based multimedia indexing methods.