SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented that builds upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Use of colour for hand-filled form analysis and recognition
Colour information in form analysis is currently underutilised. As technology has advanced and computing costs have fallen, processing forms in colour has become practicable. This paper describes a novel colour-based approach to the extraction of filled data from colour form images. Images are first quantised to reduce the colour complexity, and data is then extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing its processing time, recognition rate, extraction precision and recall rate with those of an equivalent black and white system.
A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
Wide usage of social media platforms has increased the risk of aggression,
which causes mental stress and negatively affects people's lives through
psychological agony, fighting behavior, and disrespect to others. The majority
of such conversations contain code-mixed languages [28]. Additionally, the
communication style used to express thoughts changes from one social media
platform to another (e.g., communication styles differ between Twitter and
Facebook). All of this has increased the complexity of the problem. To address
it, we introduce a unified and robust multi-modal deep learning architecture
that works for both an English code-mixed dataset and a uni-lingual English
dataset. The devised system uses psycho-linguistic features and very basic
linguistic features. Our multi-modal deep learning architecture contains Deep
Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText
embeddings). Finally, the system makes its decision based on model averaging.
We evaluated our system on the English Code-Mixed TRAC 2018 dataset and a
uni-lingual English dataset obtained from Kaggle. Experimental results show
that our proposed system outperforms all previous approaches on both the
English code-mixed dataset and the uni-lingual English dataset.
Comment: 10 pages, 5 Figures, 6 Tables, accepted at CoDS-COMAD 202
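The model-averaging step named in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the averaging is done, so this assumes the simplest form: an unweighted mean of each model's per-class probabilities, followed by picking the highest-scoring class. The function names, class labels and probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_predictions(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Return the unweighted mean of per-class probabilities across models.

    Assumption: every model scores the same classes in the same order.
    """
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same number of classes")
    n_models = len(model_probs)
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def decide(model_probs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Pick the label whose averaged probability is highest."""
    avg = average_predictions(model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical outputs of four models for one text over three placeholder classes.
labels = ["class_a", "class_b", "class_c"]
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
]
print(decide(probs, labels))
```

Two of the four hypothetical models individually prefer the second class, yet the averaged scores favour the first. This shows how averaging probabilities can differ from a majority vote over each model's top choice.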
Detecting Abusive Language on Online Platforms: A Critical Analysis
Abusive language on online platforms is a major societal problem, often
leading to important societal problems such as the marginalisation of
underrepresented minorities. There are many different forms of abusive language
such as hate speech, profanity, and cyber-bullying, and online platforms seek
to moderate it in order to limit societal harm, to comply with legislation, and
to create a more inclusive environment for their users. Within the field of
Natural Language Processing, researchers have developed different methods for
automatically detecting abusive language, often focusing on specific
subproblems or on narrow communities, as what is considered abusive language
very much differs by context. We argue that there is currently a dichotomy
between what types of abusive language online platforms seek to curb, and what
research efforts there are to automatically detect abusive language. We thus
survey existing methods as well as content moderation policies by online
platforms in this light, and we suggest directions for future work.