Grooming Detection using Fuzzy-Rough Feature Selection and Text Classification
Online child grooming detection has recently attracted intensive research interest from both the machine learning and digital forensics communities due to its great social impact. Existing data-driven approaches usually face two challenges: a lack of training data and uncertainty about class membership near the classification or decision boundary. This paper proposes a grooming detection approach that addresses this uncertainty, using a data set derived from a publicly available profiling data set. In particular, the approach first applies a conventional text feature extraction approach to identify the most significant words in the data set. It then applies a fuzzy-rough feature selection approach to reduce the high dimensionality of the selected words for fast processing, while at the same time addressing the uncertainty of class boundaries. The experimental results demonstrate the efficiency and efficacy
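The abstract does not state which fuzzy similarity relation, t-norm, or implicator the paper uses. As a hedged sketch only, the following assumes one common formulation of fuzzy-rough QuickReduct: per-feature similarity 1 - |a(x) - a(y)| / range(a), combined with the min t-norm, and the Lukasiewicz implicator for the lower approximation. The dependency degree gamma is the average positive-region membership, and features are added greedily until gamma matches that of the full feature set. Rows stand for hypothetical documents and columns for hypothetical word weights from an earlier text feature extraction step.

```python
"""Minimal fuzzy-rough QuickReduct sketch (one common formulation, assumed)."""


def similarity(values):
    """Fuzzy similarity matrix for one feature: 1 - |a(x) - a(y)| / range(a)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant feature: every pair is fully similar
    return [[1.0 - abs(u - v) / span for v in values] for u in values]


def dependency(data, labels, subset):
    """Fuzzy-rough dependency degree gamma_P(D) of the labels on a feature subset."""
    n = len(data)
    if not subset:
        return 0.0
    # One similarity matrix per selected feature, combined with the min t-norm.
    mats = [similarity([row[f] for row in data]) for f in subset]
    total = 0.0
    for x in range(n):
        # Lower-approximation membership of x in its own class, using the
        # Lukasiewicz implicator I(a, b) = min(1, 1 - a + b). For crisp classes
        # only objects y from a different class constrain the infimum.
        pos = 1.0
        for y in range(n):
            if labels[y] != labels[x]:
                r = min(m[x][y] for m in mats)
                pos = min(pos, 1.0 - r)
        total += pos
    return total / n


def fuzzy_rough_quickreduct(data, labels):
    """Greedily add the feature that most increases gamma until it matches the full set."""
    n_features = len(data[0])
    full = dependency(data, labels, list(range(n_features)))
    reduct, best = [], 0.0
    while best < full - 1e-12:
        candidate, candidate_gamma = None, best
        for f in range(n_features):
            if f in reduct:
                continue
            g = dependency(data, labels, reduct + [f])
            if g > candidate_gamma:
                candidate, candidate_gamma = f, g
        if candidate is None:  # no single feature improves gamma; stop early
            break
        reduct.append(candidate)
        best = candidate_gamma
    return reduct
```

For example, with four hypothetical documents whose first column separates the two classes well, the second adds some information, and the third is constant, the greedy search keeps columns 0 and 1 and drops the constant column. The pairwise loop is O(n^2) per evaluation, which is fine for illustration but would need vectorisation for real corpora.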
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of the path signature to translate online pen-tip trajectories into informative signature feature maps using a sliding window-based method, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MCFCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a given language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.10% and 97.15%, respectively, which are significantly better than the best results reported thus far in the literature.
Comment: 14 pages, 9 figures
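The abstract does not give the signature truncation level, window size, or feature-map layout. To illustrate what a path signature computes, the following hedged sketch truncates the signature at level 2 (an assumption) for a piecewise-linear 2-D pen trajectory. Each straight segment with increment d has signature (1, d, d ⊗ d / 2), and segments are concatenated with Chen's identity. The window function and its `half_width` parameter are hypothetical and do not reproduce the paper's sliding-window scheme.

```python
"""Level-2 path signature of a 2-D pen trajectory, computed per sliding window."""


def segment_signature(dx, dy):
    """Signature of one straight segment: level 1 = increment, level 2 = outer(d, d) / 2."""
    d = (dx, dy)
    level2 = [[d[i] * d[j] / 2.0 for j in range(2)] for i in range(2)]
    return list(d), level2


def chen_concat(sig_a, sig_b):
    """Chen's identity truncated at level 2: S(a*b)_2 = A_2 + B_2 + A_1 (outer) B_1."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    level1 = [a1[i] + b1[i] for i in range(2)]
    level2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(2)] for i in range(2)]
    return level1, level2


def path_signature(points):
    """Level-1 and level-2 signature terms of the polyline through points."""
    sig = ([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])  # signature of a constant path
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        sig = chen_concat(sig, segment_signature(x1 - x0, y1 - y0))
    return sig


def windowed_signatures(points, half_width):
    """Flattened signature features for a window centred on each trajectory point."""
    feats = []
    for k in range(len(points)):
        window = points[max(0, k - half_width): k + half_width + 1]
        level1, level2 = path_signature(window)
        feats.append(level1 + [v for row in level2 for v in row])
    return feats
```

For the L-shaped stroke (0, 0) → (1, 0) → (1, 1), `path_signature` returns level 1 `[1.0, 1.0]` and level 2 `[[0.5, 1.0], [0.0, 0.5]]`. The antisymmetric part (S12 - S21) / 2 = 0.5 is the signed (Lévy) area swept by the stroke, one of the geometric properties that the signature encodes and that plain coordinate differences miss.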
FEATURE SELECTION APPLIED TO THE TIME-FREQUENCY REPRESENTATION OF MUSCLE NEAR-INFRARED SPECTROSCOPY (NIRS) SIGNALS: CHARACTERIZATION OF DIABETIC OXYGENATION PATTERNS
Diabetic patients may present peripheral microcirculation impairment and may benefit from physical training. Thirty-nine diabetic patients underwent monitoring of tibialis anterior muscle oxygenation by near-infrared spectroscopy (NIRS) during a series of voluntary ankle flexion-extensions. NIRS signals were acquired before and after training protocols. Sixteen control subjects were tested with the same protocol. Time-frequency distributions of Cohen's class were used to process the NIRS signals corresponding to concentration changes of oxygenated and reduced hemoglobin. A total of 24 variables were measured for each subject, and the most discriminative were selected using four feature selection algorithms: QuickReduct, Genetic Rough-Set Attribute Reduction, Ant Rough-Set Attribute Reduction, and traditional ANOVA. Artificial neural networks were used to validate the discriminative power of the selected features. Results showed that different algorithms extracted different sets of variables, but all the combinations were discriminative. The best classification accuracy was about 70%. Oxygenation variables were selected both when comparing controls with diabetic patients and when comparing diabetic patients before and after training. This preliminary study showed the importance of feature selection techniques in NIRS assessment of diabetic peripheral vascular impairment.
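The abstract names QuickReduct but does not say how the 24 time-frequency variables were discretised for rough-set analysis. The following is a minimal sketch of the textbook crisp rough-set QuickReduct, assuming the variables have already been discretised into symbolic values; the toy tables and labels are hypothetical. The dependency degree gamma_P(D) = |POS_P(D)| / |U| is the fraction of objects whose equivalence class under the attribute subset P contains a single decision label.

```python
"""Classical rough-set QuickReduct on a discretised decision table (illustrative)."""
from collections import defaultdict


def dependency(table, labels, attrs):
    """gamma_P(D) = |POS_P(D)| / |U| for the attribute subset attrs."""
    # Group objects that are indiscernible on attrs and record their labels.
    blocks = defaultdict(set)
    for row, label in zip(table, labels):
        blocks[tuple(row[a] for a in attrs)].add(label)
    # An object is in the positive region if its whole block shares one label.
    pos = sum(1 for row in table if len(blocks[tuple(row[a] for a in attrs)]) == 1)
    return pos / len(table)


def quickreduct(table, labels):
    """Greedily add the attribute with the largest dependency until it equals that of all attributes."""
    all_attrs = list(range(len(table[0])))
    target = dependency(table, labels, all_attrs)
    reduct = []
    current = dependency(table, labels, reduct)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a not in reduct:
                g = dependency(table, labels, reduct + [a])
                if g > best_gamma:
                    best_attr, best_gamma = a, g
        if best_attr is None:  # no single attribute helps (e.g. XOR-like data); stop
            break
        reduct.append(best_attr)
        current = best_gamma
    return reduct
```

On a hypothetical table where one discretised oxygenation variable alone separates controls from patients, `quickreduct` returns just that attribute; on an AND-like table it needs both attributes. Because the search is greedy, it can return a superset of a minimal reduct, which is one reason variants such as the genetic and ant-colony rough-set reducers mentioned above exist.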
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial-of-service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using only a limited, fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis of Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods.
Comment: 13 single column pages, 5 figures, submitted to KDD 201
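The convolutional-kernel and dependency-parse expansion described above is too involved to reproduce from the abstract alone. The following sketch is deliberately a much simpler stand-in and is NOT the paper's method: it only illustrates the general workflow of starting from a fixed set of seed triggers and expanding the query with terms that frequently co-occur with them (a basic pseudo-relevance-feedback style expansion). The stopword list, `top_k`, `min_count`, and the sample tweets are all hypothetical.

```python
"""Seed-trigger filtering plus co-occurrence query expansion (simplified stand-in)."""
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "was", "by", "on", "and", "for"}


def tokenize(text):
    """Lowercase whitespace tokenisation with surrounding punctuation stripped."""
    return [t.strip(".,!?:;\"'()") for t in text.lower().split()]


def expand_query(tweets, seeds, top_k=3, min_count=2):
    """Add up to top_k terms that co-occur at least min_count times with a seed trigger."""
    seed_set = {s.lower() for s in seeds}
    counts = Counter()
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        if tokens & seed_set:
            counts.update(t for t in tokens if t and t not in seed_set and t not in STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [t for t, c in ranked if c >= min_count][:top_k]
    return sorted(seed_set) + expansion


def match(tweets, query_terms):
    """Return the tweets containing at least one query term."""
    terms = set(query_terms)
    return [tw for tw in tweets if terms & set(tokenize(tw))]
```

With the seed `"DDoS"` and a handful of hypothetical tweets about a bank website going offline, the expanded query picks up `bank`, `offline`, and `website`, so a report such as "Bank website offline for hours" is matched even though it never mentions the seed. Plain co-occurrence counting also drifts easily toward unrelated terms, which is the kind of weakness that structure-aware expansion, such as the dependency-parse approach in the abstract, is meant to address.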
Thesaurus-based index term extraction for agricultural documents
This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction
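The abstract does not describe the algorithm's matching or ranking steps. As a hedged sketch of the general idea of using a thesaurus as a controlled vocabulary, the following maps word n-grams to preferred descriptors via non-preferred (synonym) entries, ranks descriptors by frequency, and scores the result against a manually indexed term set with precision and recall. The mini-thesaurus below is hypothetical and does not reproduce real Agrovoc entries or relations. The paper's semantic matching presumably exploits richer thesaurus structure than this.

```python
"""Thesaurus-based index term assignment with a toy controlled vocabulary (hypothetical)."""
from collections import Counter

# Hypothetical mini-thesaurus: phrase (preferred or non-preferred) -> preferred descriptor.
# Real Agrovoc entries and relations are NOT reproduced here.
THESAURUS = {
    "maize": "maize",
    "corn": "maize",
    "soil erosion": "soil erosion",
    "erosion of soil": "soil erosion",
    "irrigation": "irrigation",
}
MAX_NGRAM = 3


def candidate_phrases(text, max_n=MAX_NGRAM):
    """Yield all word n-grams up to max_n after lowercasing and stripping punctuation."""
    words = [w.strip(".,;:()") for w in text.lower().split()]
    words = [w for w in words if w]
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def index_terms(text, thesaurus=THESAURUS, top_k=5):
    """Map n-grams to descriptors via the thesaurus and rank descriptors by frequency."""
    counts = Counter(thesaurus[p] for p in candidate_phrases(text) if p in thesaurus)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def precision_recall(assigned, manual):
    """Compare automatically assigned terms with a manually indexed set."""
    assigned, manual = set(assigned), set(manual)
    hits = len(assigned & manual)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall
```

Because "corn" and "erosion of soil" are folded into their preferred descriptors, the sentence "Corn yields fell as soil erosion increased; maize growers adopted irrigation to limit erosion of soil." yields `["maize", "soil erosion", "irrigation"]`. Counting overlapping n-grams independently is a simplification: if both a phrase and a longer phrase containing it were thesaurus entries, both would be counted.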