Prescription Based Recommender System for Diabetic Patients Using Efficient Map Reduce
The healthcare sector has long been unable to leverage the knowledge latent in its data, owing to manual processes and legacy record-keeping methods. Such outdated methods of maintaining healthcare records have proven insufficient for treating chronic diseases like diabetes. Data-analysis methods such as a Recommender System (RS) can serve as a boon for treating diabetes: an RS applies predictive analysis to provide clinicians with the information needed to determine treatments for patients. This paper proposes a prescription-based Health Recommender System (HRS) that recommends treatments by learning from the treatments prescribed to other patients diagnosed with diabetes. An Advanced Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is also proposed to cluster the data from which recommendations are derived, using the winnowing algorithm as a similarity measure. The data is processed in parallel using MapReduce to increase the efficiency and scalability of the clustering process. The paper thus illustrates how MapReduce can improve the efficiency and scalability of a clustering-based HRS.
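The clustering step can be pictured with a minimal sketch of classic DBSCAN over a pluggable distance function. This is not the paper's Advanced DBSCAN: the winnowing similarity measure and the MapReduce parallelization are omitted, and the data points and parameter values below are illustrative assumptions.

```python
def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: points within `eps` of a core point join its cluster."""
    labels = {}          # point index -> cluster id, or -1 for noise
    cluster = 0

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:      # not a core point: mark as noise for now
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels.get(j) == -1:  # noise reachable from a core point joins
                labels[j] = cluster
            if j in labels:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:   # j is itself a core point: keep expanding
                seeds.extend(jn)
    return labels

# Illustrative 1-D data: two dense groups and one outlier.
labels = dbscan([1, 2, 3, 10, 11, 12, 50], eps=1.5, min_pts=2,
                dist=lambda a, b: abs(a - b))
```

In the paper's setting, `points` would be patient prescription records and `dist` would be derived from the winnowing similarity rather than a numeric difference.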
AntiPlag: Plagiarism Detection on Electronic Submissions of Text Based Assignments
Plagiarism is a growing issue in academia and a constant concern for universities and other academic institutions. The situation is becoming even worse with the availability of ample resources on the web. This paper focuses on creating an effective and fast plagiarism detection tool for text-based electronic assignments. Our tool, named AntiPlag, is developed using the tri-gram sequence matching technique. Three sets of text-based assignments were tested with AntiPlag and the results were compared against an existing commercial plagiarism detection tool. AntiPlag produced fewer false positives than the commercial tool, owing to the pre-processing steps it performs. In addition, to improve detection latency, AntiPlag applies a data clustering technique that makes it four times faster than the commercial tool considered. AntiPlag can easily separate plagiarised text-based assignments from non-plagiarised ones. We therefore present AntiPlag as a fast and effective tool for plagiarism detection on text-based electronic assignments.
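As a rough illustration of tri-gram sequence matching, the sketch below compares two texts by the Jaccard overlap of their word tri-gram sets. This is an assumption-laden simplification, not AntiPlag's implementation: the whitespace tokenization, lower-casing, and Jaccard score are all placeholders for whatever pre-processing and scoring the tool actually uses.

```python
def trigrams(text):
    """Word-level tri-grams of a lower-cased, whitespace-tokenized text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def similarity(a, b):
    """Jaccard overlap of the two texts' tri-gram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A pair of assignments scoring near 1.0 would be flagged for closer manual inspection.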
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the causes of popularity on Twitter, one thing is considered the main driver: many tweets. There is debate about the kind of tweet one should publish, but little about anything beyond tweets. Of particular interest is the information provided on each Twitter user's profile page; one such feature is the given name. Studies in psychology and economics have identified correlations between a person's first name and, e.g., their school marks or chances of getting a job interview in the US. We are therefore interested in the influence of this profile information on follower count. We addressed this question by analyzing the profiles of about six million Twitter users. All profiles are separated into three groups: users that have a first name, English words, or neither in their name field. The assumption is that names and words influence the discoverability of a user and subsequently his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic curve and achieve a score above 0.800.
Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy, NY, US
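The three-way grouping described above might be sketched as follows. The tiny word lists here are placeholder assumptions (a real system would use full first-name and dictionary lists), the rule that a first name takes precedence over an English word is also an assumption, and the per-group prediction models are omitted entirely.

```python
# Placeholder lists -- stand-ins for full name and dictionary data.
FIRST_NAMES = {"alice", "bob", "carol", "dave"}
ENGLISH_WORDS = {"sunshine", "coffee", "morning", "dream"}

def user_group(name_field):
    """Assign a profile's name field to one of the three groups.

    Assumption: a recognized first name takes precedence over an
    English word when the field contains both.
    """
    tokens = name_field.lower().split()
    if any(t in FIRST_NAMES for t in tokens):
        return "first_name"
    if any(t in ENGLISH_WORDS for t in tokens):
        return "english_word"
    return "neither"
```

Each group would then feed its own model, whose predictions are scored by the area under the ROC curve.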
Automated Crowdturfing Attacks and Defenses in Online Review Systems
Malicious crowdsourcing forums are gaining traction as sources of spreading
misinformation online, but are limited by the costs of hiring and managing
human workers. In this paper, we identify a new class of attacks that leverage
deep learning language models (Recurrent Neural Networks or RNNs) to automate
the generation of fake online reviews for products and services. Not only are
these attacks cheap and therefore more scalable, but they can also control the rate of
content output to eliminate the signature burstiness that makes crowdsourced
campaigns easy to detect.
Using Yelp reviews as an example platform, we show how a two-phase review
generation and customization attack can produce reviews that are
indistinguishable by state-of-the-art statistical detectors. We conduct a
survey-based user study to show these reviews not only evade human detection,
but also score high on "usefulness" metrics by users. Finally, we develop novel
automated defenses against these attacks, by leveraging the lossy
transformation introduced by the RNN training and generation cycle. We consider
countermeasures against our mechanisms, show that they produce unattractive
cost-benefit tradeoffs for attackers, and that they can be further curtailed by
simple constraints imposed by online service providers.
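To make the generation idea concrete without a trained RNN, here is a character-level Markov chain: a far simpler stand-in for the language model the paper describes, not the paper's method. The corpus, chain order, and seed below are illustrative assumptions.

```python
import random
from collections import defaultdict

def train(corpus, order=2):
    """Map each `order`-character context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(corpus) - order):
        model[corpus[i:i + order]].append(corpus[i + order])
    return model

def generate(model, seed, length=40, order=2, rng=None):
    """Extend `seed` one character at a time by sampling from the model."""
    rng = rng or random.Random(0)   # fixed seed keeps the sketch repeatable
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:             # unseen context: stop generating
            break
        out += rng.choice(choices)
    return out

# Toy "training set" of review-like text; seed must be a context in the corpus.
corpus = "great food and great service, would come again. great food, great value."
review = generate(train(corpus), seed="gr")
```

An RNN plays the same role with learned, longer-range context, which is what makes its output fluent enough to evade both statistical detectors and human readers.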
Evaluation and Implementation of n-Gram-Based Algorithm for Fast Text Comparison
This paper presents a study of an n-gram-based document comparison method intended for building a large-scale plagiarism detection system. The work focuses not only on the quality of text similarity extraction but also on the execution performance of the implemented algorithms, taking note of detection performance, storage requirements, and execution time. The obtained results show the trade-offs between detection quality and computational requirements. GPGPU and multi-CPU platforms were considered for implementing the algorithms and achieving good execution speed. The method consists of two main algorithms: document feature extraction and fast text comparison. The winnowing algorithm is used to generate a compressed representation of the analyzed documents. The authors designed and implemented a dedicated test framework for the algorithm, which allowed for tuning, evaluation, and optimization of its parameters. Well-known metrics (e.g. precision, recall) were used to evaluate detection performance. The authors conducted tests to determine the performance of the winnowing algorithm on obfuscated and unobfuscated texts for different window and n-gram sizes. A simplified version of the text comparison algorithm was also proposed and evaluated to reduce the computational complexity of the comparison process. The paper further presents GPGPU and multi-CPU implementations of the algorithms for different data structures; implementation speed was tested for different algorithm parameters and data sizes, and the scalability of the algorithm on multi-CPU platforms was verified. The authors provide a repository of the software tools and programs used to perform the experiments, which together form a fast document comparison system whose performance is given in the paper.
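The winnowing fingerprinting at the heart of the method can be sketched as follows. Python's built-in `hash` stands in for the stable rolling (Karp-Rabin) hash a real implementation would use, fingerprints are stored as bare hash values rather than (hash, position) pairs, and the `k`/`w` defaults are illustrative assumptions.

```python
def winnow(text, k=5, w=4):
    """Keep the minimum hash from each window of w consecutive k-gram hashes."""
    # NOTE: str hashing is randomized per process; a real system needs a
    # stable rolling hash so fingerprints can be stored and compared later.
    hashes = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    fingerprints = set()
    for i in range(len(hashes) - w + 1):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints

def resemblance(a, b, k=5, w=4):
    """Jaccard overlap of two documents' winnowing fingerprint sets."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0
```

Because each document is reduced to a small fingerprint set, pairwise comparison becomes a cheap set intersection, which is what makes the GPGPU and multi-CPU scaling described above feasible.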