Testing of Support Tools for Plagiarism Detection
There is a general belief that software must be able to do easily what
humans find difficult. Since finding the sources of plagiarism in a text is
not an easy task, there is a widespread expectation that it must be simple for
software to determine whether a text is plagiarized. Software cannot
determine plagiarism, but it can work as a support tool for identifying some
text similarity that may constitute plagiarism. But how well do the various
systems work? This paper reports on a collaborative test of 15 web-based
text-matching systems that can be used when plagiarism is suspected. It was
conducted by researchers from seven countries using test material in eight
different languages, evaluating the effectiveness of the systems on
single-source and multi-source documents. A usability examination was also
performed. The sobering results show that although some systems can indeed help
identify some plagiarized content, they clearly do not find all plagiarism and
at times also flag non-plagiarized material as problematic.
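The text-matching systems tested are, at their core, similarity detectors. As a minimal sketch of the idea (not any tested system's actual algorithm; the function names and the trigram/Jaccard choices are illustrative), a support tool can compare the word n-grams of a suspect passage against a candidate source and report an overlap score, leaving the judgement of plagiarism to a human:

```python
# Minimal sketch of n-gram text matching, the kind of similarity check
# that plagiarism support tools build on. Names and parameters here are
# illustrative assumptions, not taken from any system in the study.

def ngrams(text: str, n: int = 3) -> set:
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(suspect: str, source: str, n: int = 3) -> float:
    """Jaccard similarity of word n-gram sets: 0.0 = no overlap, 1.0 = identical."""
    a, b = ngrams(suspect, n), ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

suspect = "finding sources for plagiarism in a text is not an easy task"
source = "finding sources for plagiarism in a text can be quite hard"
print(f"similarity: {similarity(suspect, source):.2f}")
```

A high score here flags similarity worth reviewing; as the paper stresses, it is a support signal, not a verdict, and paraphrased plagiarism with little n-gram overlap will slip past exactly this kind of check.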
Transforming Message Detection
Most existing spam filtering techniques suffer from serious disadvantages.
Some produce many false positives; others are suitable only for email
filtering and cannot be used in instant messaging and social networks.
Content-based methods therefore seem more efficient. One of them is based on
signature retrieval; however, it is not resistant to message transformation.
Enhancements exist (e.g. checksums), but they are extremely time- and
resource-consuming. The main objective of this research is therefore to
develop a transforming-message detection method. To this end we compared spam
in several languages, namely English, French, Russian and Italian. For each
language, about 1000 messages, including both spam and non-spam, were
examined, and 135 quantitative features were extracted. Almost all of these
features are language-independent. They underlie the first step of the
algorithm, which is based on a support vector machine. The next stage is to
test the obtained results using an N-gram approach, with special attention
paid to word distortion and text alteration. The results obtained indicate
the efficiency of the suggested approach.
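The first step of the method rests on quantitative, language-independent features of a message. As a hedged sketch of that extraction step (the paper uses 135 features; the four below are illustrative assumptions, not the paper's actual feature set), each message is mapped to a numeric vector that an SVM could then classify:

```python
# Sketch of language-independent quantitative feature extraction for
# spam detection. The specific features are illustrative; the paper's
# method extracts 135 such features before SVM classification.

import string

def message_features(msg: str) -> list:
    """Map a message to a small numeric feature vector."""
    n = len(msg) or 1
    words = msg.split() or [""]
    return [
        sum(c.isupper() for c in msg) / n,              # uppercase ratio
        sum(c.isdigit() for c in msg) / n,              # digit ratio
        sum(c in string.punctuation for c in msg) / n,  # punctuation density
        sum(len(w) for w in words) / len(words),        # mean word length
    ]

# Vectors like these would be fed to an SVM classifier; because no feature
# inspects specific words, the same extraction applies unchanged to
# English, French, Russian or Italian text.
print(message_features("FREE $$$ WIN NOW!!! Call 555-0100"))
```

Character-level statistics like these are also what makes the approach resistant to word distortion: replacing letters in "FREE" still leaves the uppercase ratio and punctuation density largely intact, unlike a signature or checksum match.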
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
Because of their large size and pretraining on large volumes of text data,
LLMs exhibit special abilities that allow them to achieve remarkable
performance, without any task-specific training, on many natural language
processing tasks. The era of LLMs started with the OpenAI GPT-3 model, and
the popularity of LLMs has increased rapidly since the introduction of models
like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models,
including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs).
With the ever-rising popularity of GLLMs, especially in the research
community, there is a strong need for a comprehensive survey that summarizes
recent research progress in multiple dimensions and can guide the research
community with insightful future research directions. We start the survey
with foundation concepts such as transformers, transfer learning,
self-supervised learning, pretrained language models and large language
models. We then present a brief overview of GLLMs and discuss their
performance in various downstream tasks, specific domains and multiple
languages. We also discuss the data labelling and data augmentation abilities
of GLLMs, their robustness, their effectiveness as evaluators, and finally
conclude with multiple insightful future research directions. To summarize,
this comprehensive survey will serve as a good resource for both academic and
industry researchers to stay up to date with the latest research on GPT-3
family large language models.
Comment: Preprint under review, 58 pages.