AntiPlag: Plagiarism Detection on Electronic Submissions of Text Based Assignments
Plagiarism is one of the growing issues in academia and is always a concern
in Universities and other academic institutions. The situation is becoming even
worse with the availability of ample resources on the web. This paper focuses
on creating an effective and fast tool for plagiarism detection for text based
electronic assignments. Our plagiarism detection tool named AntiPlag is
developed using the tri-gram sequence matching technique. Three sets of text
based assignments were tested by AntiPlag and the results were compared against
an existing commercial plagiarism detection tool. AntiPlag showed better
results in terms of false positives compared to the commercial tool due to the
pre-processing steps performed in AntiPlag. In addition, to improve the
detection latency, AntiPlag applies a data clustering technique making it four
times faster than the commercial tool considered. AntiPlag can be used to
easily separate plagiarized text-based assignments from non-plagiarized ones.
We therefore present AntiPlag, a fast and effective tool for
plagiarism detection on text-based electronic assignments.
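As a rough illustration of the tri-gram sequence matching named in the abstract above, the sketch below compares two texts by the overlap of their word tri-grams. It is not AntiPlag's implementation: the abstract does not describe the pre-processing, scoring, or clustering steps, so the tokenisation rule, the Jaccard score, and the function names `word_trigrams` and `trigram_similarity` are all assumptions made for this example.

```python
import re


def word_trigrams(text):
    """Return the set of word tri-grams in `text`.

    Assumed pre-processing: lowercase the text and keep only
    alphanumeric tokens. The abstract does not say which steps
    AntiPlag actually performs.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_similarity(doc_a, doc_b):
    """Jaccard overlap of the two tri-gram sets, from 0.0 to 1.0.

    Jaccard is an assumed scoring choice for illustration only.
    """
    a, b = word_trigrams(doc_a), word_trigrams(doc_b)
    if not a or not b:
        # Texts shorter than three words have no tri-grams to compare.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    submitted = "Plagiarism is a growing issue in academia."
    source = "Plagiarism is a growing issue in many universities."
    print(trigram_similarity(submitted, source))
```

A pair of submissions whose score exceeds some chosen threshold would then be flagged for manual review; the threshold itself would have to be tuned on real assignments.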
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, addresses the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve classification accuracy.
Mapping the Bid Behavior of Conference Referees
The peer-review process, in its present form, has been repeatedly criticized.
Of the many critiques ranging from publication delays to referee bias, this
paper will focus specifically on the issue of how submitted manuscripts are
distributed to qualified referees. Unqualified referees, without the proper
knowledge of a manuscript's domain, may reject a perfectly valid study or
potentially more damaging, unknowingly accept a faulty or fraudulent result. In
this paper, referee competence is analyzed with respect to referee bid data
collected from the 2005 Joint Conference on Digital Libraries (JCDL). The
analysis of the referee bid behavior provides a validation of the intuition
that referees are bidding on conference submissions with regard to the subject
domain of the submission. Unfortunately, this relationship is not strong and
therefore suggests that there exist other factors beyond subject domain that
may be influencing referees to bid for particular submissions.
Topic Modelling of Everyday Sexism Project Entries
The Everyday Sexism Project documents everyday examples of sexism reported by
volunteer contributors from all around the world. It collected 100,000 entries
in 13+ languages within the first 3 years of its existence. The content of
reports in various languages submitted to Everyday Sexism is a valuable source
of crowdsourced information with great potential for feminist and gender
studies. In this paper, we take a computational approach to analyze the content
of reports. We use topic-modelling techniques to extract emerging topics and
concepts from the reports, and to map the semantic relations between those
topics. The resulting picture closely resembles and adds to that arrived at
through qualitative analysis, showing that this form of topic modeling could be
useful for sifting through datasets that had not previously been subject to any
analysis. More precisely, we come up with a map of topics for two different
resolutions of our topic model and discuss the connection between the
identified topics. In the low resolution picture, for instance, we found Public
space/Street, Online, Work related/Office, Transport, School, Media harassment,
and Domestic abuse. Among these, the strongest connection is between Public
space/Street harassment and Domestic abuse and sexism in personal
relationships. The strength of the relationships between topics illustrates the
fluid and ubiquitous nature of sexism, with no single experience being
unrelated to another. Comment: preprint, under review
Investigation of the use of navigation tools in web-based learning: A data mining approach
Web-based learning is widespread in educational settings. Its popularity is due in large part to its flexibility, some of which comes from the multiple navigation tools provided. Different navigation tools offer different functions. Therefore, it is important to understand how the navigation tools are used by learners with different backgrounds, knowledge, and skills. This article presents two empirical studies in which data-mining approaches were used to analyze learners' navigation behavior. The results indicate that prior knowledge and subject content are two potential factors influencing the use of navigation tools. In addition, the lack of appropriate use of navigation tools may adversely influence learning performance. The results have been integrated into a model that can help designers develop Web-based learning programs and other Web-based applications that can be tailored to learners' needs.
Green OFDMA Resource Allocation in Cache-Enabled CRAN
Cloud radio access network (CRAN), in which remote radio heads (RRHs) are
deployed to serve users in a target area, and connected to a central processor
(CP) via limited-capacity links termed the fronthaul, is a promising candidate
for the next-generation wireless communication systems. Due to the
content-centric nature of future wireless communications, it is desirable to
cache popular contents beforehand at the RRHs, to reduce the burden on the
fronthaul and achieve energy saving through cooperative transmission. This
motivates our study in this paper on the energy efficient transmission in an
orthogonal frequency division multiple access (OFDMA)-based CRAN with multiple
RRHs and users, where the RRHs can prefetch popular contents. We consider a
joint optimization of the user-subcarrier (SC) assignment, RRH selection and transmit power
allocation over all the SCs to minimize the total transmit power of the RRHs,
subject to the RRHs' individual fronthaul capacity constraints and the users'
minimum rate constraints, while taking into account the caching status at the
RRHs. Although the problem is non-convex, we propose a Lagrange duality based
solution, which can be efficiently computed with good accuracy. We compare the
minimum transmit power required by the proposed algorithm with different
caching strategies against the case without caching by simulations, which show
the significant energy saving with caching. Comment: Presented in IEEE Online Conference on Green Communications (Online
GreenComm), Nov. 2016 (Invited Paper)
Reinforcement machine learning for predictive analytics in smart cities
The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices and the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all aspects of activity in modern societies. In most cases, the critical issue for public authorities (usually local ones, such as municipalities) is the efficient management of data to support novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is to separate huge volumes of data into a number of partitions and manage them in parallel to limit the time required to deliver analytics. Afterwards, analytics requests in the form of queries can be executed to derive the knowledge needed to support intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignment that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a deterministic framework.