    Innovative Label Embedding for Food Safety Comment Classification: Fusion of Self-Semantic and Self-Knowledge Features

    Food safety comment classification is a specialized text classification task. The objective is to efficiently classify a large volume of food safety comments, helping the relevant authorities carry out timely food analysis and issue safety alerts. Traditional methods typically encode labels as one-hot vectors. In real-world settings, however, class labels often carry valuable semantic information and guidance. This paper introduces an approach that improves the classification of food safety comments by embedding label information. First, we extract generic sentiment pivot words from the various classification labels to serve as label description information. We then use a joint embedding approach to integrate this label description information into the text: the representations of the pivot words are averaged and pooled into the corresponding sentiment labels in the known domains to obtain the embedded representation. The aim is to obtain fine-grained self-semantic and self-knowledge feature vectors that incorporate the label description information. Next, we feed the semantic representation of the comments and the word-embedded representation of the label description information into a time-step-based multilayer Bi-LSTM and a step-based multilayer CNN, respectively. Finally, we concatenate the two feature vectors to facilitate matching, fusing the self-semantic and self-knowledge features of the label description information to train a classification model for food safety comments. Experimental results on the food safety comment dataset show improvements of 1.74% and 1.27% in Macro_Precision and Macro_F1, respectively, compared to BERT, BERT-RNN, and BERT-CNN. Extensive ablation experiments and additional studies show that our method embeds label information effectively and has a clear advantage over traditional methods in classifying food safety comments. DOI: 10.28991/HIJ-2024-05-01-013
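The pooling and fusion steps described in this abstract can be illustrated with a minimal sketch. It assumes that a label representation is the plain mean of its pivot-word vectors and that fusion is simple concatenation; the function names, the toy vectors, and the pivot words are hypothetical and are not taken from the paper, and the Bi-LSTM and CNN encoders are left out entirely.

```python
from statistics import fmean

def average_embedding(words, vectors):
    """Mean-pool the vectors of the given words, one dimension at a time."""
    rows = [vectors[w] for w in words if w in vectors]
    if not rows:
        raise ValueError("none of the pivot words has a vector")
    return [fmean(column) for column in zip(*rows)]

def label_embeddings(pivot_words, vectors):
    """Build one embedding per label from that label's pivot words."""
    return {label: average_embedding(words, vectors)
            for label, words in pivot_words.items()}

def fuse(semantic_features, knowledge_features):
    """Concatenate the two feature vectors into a single input vector."""
    return list(semantic_features) + list(knowledge_features)

# Toy 2-dimensional word vectors (hypothetical values).
vectors = {
    "fresh": [1.0, 0.0],
    "tasty": [0.0, 1.0],
    "rotten": [-1.0, 0.0],
    "moldy": [-1.0, -1.0],
}
# Hypothetical pivot words for two sentiment labels.
pivot_words = {"positive": ["fresh", "tasty"], "negative": ["rotten", "moldy"]}

labels = label_embeddings(pivot_words, vectors)
print(labels["positive"])  # [0.5, 0.5]
print(fuse([0.1, 0.2], labels["negative"]))
```

In a full model the two inputs to `fuse` would be the outputs of the two encoders rather than raw embeddings.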

    PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

    Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low-dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low-dimensional space through a principled and efficient algorithm. This low-dimensional embedding not only preserves the semantic closeness of words and documents, but also has strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or more effective, much more efficient, and has fewer parameters to tune. Comment: KDD 201
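As a rough illustration of how labeled information and word co-occurrence can be placed in one heterogeneous text network, the sketch below counts three kinds of weighted edges: word-word (within a context window), word-document, and word-label. The split into exactly these three edge types, the window size, and all names are assumptions made for illustration; the embedding step itself is not shown.

```python
from collections import Counter

def build_text_networks(docs, labels, window=2):
    """Count edge weights for three networks over a small corpus.

    docs   -- list of token lists
    labels -- list of the same length; None marks an unlabeled document
    window -- how many following tokens count as context (an assumption)
    """
    word_word = Counter()   # (word, context word) -> co-occurrence count
    word_doc = Counter()    # (word, document id)  -> term frequency
    word_label = Counter()  # (word, label)        -> count over labeled docs
    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for i, word in enumerate(tokens):
            word_doc[(word, doc_id)] += 1
            if label is not None:
                word_label[(word, label)] += 1
            # Symmetric co-occurrence with the next `window` tokens.
            for context in tokens[i + 1 : i + 1 + window]:
                word_word[(word, context)] += 1
                word_word[(context, word)] += 1
    return word_word, word_doc, word_label

# Toy corpus: two labeled documents and one unlabeled document.
docs = [["good", "food"], ["bad", "food"], ["good", "service"]]
labels = ["pos", "neg", None]

word_word, word_doc, word_label = build_text_networks(docs, labels)
print(word_label[("food", "neg")])  # 1
print(word_word[("good", "food")])  # 1
```

Unlabeled documents still contribute to the word-word and word-document counts, which is how a semi-supervised method can draw on both kinds of data.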

    Integrated Node Encoder for Labelled Textual Networks

    Many works have explored content-enhanced network embedding models, but with little focus on the labelled information of nodes. Although TriDNR leverages node labels by treating them as node attributes, it fails to enrich unlabelled node vectors with the labelled information, which leads to weaker classification results on the test set than those of existing unsupervised textual network embedding models. In this study, we design an integrated node encoder (INE) for textual networks that is jointly trained on structure-based and label-based objectives. As a result, the node encoder preserves the integrated knowledge of not only the network text and structure, but also the labelled information. Furthermore, INE allows the creation of label-enhanced vectors for unlabelled nodes from their node contents. Our node embedding achieves state-of-the-art performance in the classification task on two public citation networks, namely Cora and DBLP, pushing benchmarks up by 10.0% and 12.1%, respectively, at a 70% training ratio. Additionally, a feasible solution that generalizes our model from textual networks to a broader range of networks is proposed. Comment: 7 page

    Weakly-Supervised Neural Text Classification

    Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and lower requirement for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly. Comment: CIKM 2018 Full Pape
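The second module, bootstrapping on real unlabeled data, can be pictured with a generic self-training loop. The sketch below is not the procedure from the paper: it swaps in a nearest-centroid classifier on toy 2-D points in place of a neural model, and the confidence score, threshold, and stopping rule are all assumptions chosen to keep the example short.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]

def predict(x, centroids):
    """Return (label, confidence) using a softmax over negative distances."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly absorb confidently predicted points into the training set."""
    labeled = {label: list(points) for label, points in labeled.items()}
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = {label: centroid(points) for label, points in labeled.items()}
        remaining = []
        for x in pool:
            label, confidence = predict(x, centroids)
            if confidence >= threshold:
                labeled[label].append(x)   # accept the pseudo-label
            else:
                remaining.append(x)        # keep for a later round
        if len(remaining) == len(pool):    # nothing new was absorbed
            break
        pool = remaining
    final = {label: centroid(points) for label, points in labeled.items()}
    return final, pool

# One seed point per class, plus three unlabeled points (toy data).
seeds = {"a": [[0.0, 0.0]], "b": [[10.0, 10.0]]}
unlabeled = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]]

centroids, leftover = self_train(seeds, unlabeled)
print(centroids["a"], leftover)
```

The point equidistant from both classes never reaches the confidence threshold, so it stays in the leftover pool instead of receiving an arbitrary pseudo-label.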

    Evaluation of Output Embeddings for Fine-Grained Image Classification

    Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state-of-the-art. By combining different output embeddings, we further improve results. Comment: @inproceedings {ARWLS15, title = {Evaluation of Output Embeddings for Fine-Grained Image Classification}, booktitle = {IEEE Computer Vision and Pattern Recognition}, year = {2015}, author = {Zeynep Akata and Scott Reed and Daniel Walter and Honglak Lee and Bernt Schiele}
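The zero-shot prediction rule in this abstract, picking the label with the highest compatibility score, can be sketched as follows. The bilinear form of the score (image vector times a matrix times class vector) is an assumption here, since the abstract does not state the functional form; the matrix `W` is given by hand rather than learned, and the class names and vectors are hypothetical.

```python
def compatibility(image_vec, W, class_vec):
    """Bilinear score: image_vec^T . W . class_vec (W is a list of rows)."""
    # Project the image vector into the class-embedding space.
    projected = [sum(x * w for x, w in zip(image_vec, column))
                 for column in zip(*W)]
    return sum(p * c for p, c in zip(projected, class_vec))

def zero_shot_predict(image_vec, W, class_embeddings):
    """Return the label whose embedding is most compatible with the image."""
    return max(class_embeddings,
               key=lambda label: compatibility(image_vec, W, class_embeddings[label]))

# Hypothetical 3-d image features, a 3x2 map, and 2-d class embeddings.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
class_embeddings = {"zebra": [1.0, 0.0], "bird": [0.0, 1.0]}
image_vec = [0.2, 0.9, 0.5]

prediction = zero_shot_predict(image_vec, W, class_embeddings)
print(prediction)  # bird
```

Because only `class_embeddings` depends on the label set, classes unseen during training can be scored by supplying their embeddings at prediction time.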