Personalized Text Categorization Using a MultiAgent Architecture

Author: ADDIS A
ARMANO G
CHERCHI G
VARGIU E
Publication date: 01/01/2006

This paper presents a system that retrieves content deemed relevant to users through a text categorization process. The system is built on a generic multiagent architecture that supports the implementation of applications aimed at (i) retrieving heterogeneous data spread among different sources (e.g., generic HTML pages, news, blogs, forums, and databases); (ii) filtering and organizing them according to personal interests explicitly stated by each user; (iii) providing adaptation techniques to improve and refine the profile of each selected user over time. In particular, the implemented multiagent system creates personalized press reviews from online newspapers. Preliminary results are encouraging and highlight the effectiveness of the approach.

Archivio istituzionale della ricerca - Università di Cagliari

Attention-Based Recurrent Neural Networks (RNNs) for Short Text Classification: An Application in Public Health Monitoring

Author: De La Iglesia Beatriz
Edo-Osagie Osagioduwa
Publication date: 16/05/2019

In this paper, we propose an attention-based approach to short text classification, which we have created for the practical application of Twitter mining for public health monitoring. Our goal is to automatically filter Tweets that are relevant to the syndrome of asthma/difficulty breathing. We describe a bi-directional Recurrent Neural Network architecture with an attention layer (termed ABRNN) which allows the network to weigh words in a Tweet differently based on their perceived importance. We further distinguish between two variants of the ABRNN based on the Long Short Term Memory and Gated Recurrent Unit architectures respectively, termed the ABLSTM and ABGRU. We apply the ABLSTM and ABGRU, along with popular deep learning text classification models, to a Tweet relevance classification problem and compare their performances. We find that the ABLSTM outperforms the other models, achieving an accuracy of 0.906 and an F1-score of 0.710. The attention vectors computed as a by-product of our models were also found to be meaningful representations of the input Tweets. As such, the described models have the added utility of computing document embeddings which could be used for other tasks besides classification. To further validate the approach, we demonstrate the ABLSTM’s performance in the real-world application of public health surveillance and compare the results with real-world syndromic surveillance data provided by Public Health England (PHE). A strong positive correlation was observed between the ABLSTM surveillance signal and the real-world asthma/difficulty breathing syndromic surveillance data. The ABLSTM is a useful tool for the task of public health surveillance.

University of East Anglia digital repository
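
The attention layer mentioned in the ABRNN abstract above can be illustrated with a minimal sketch. The abstract does not state how word scores are computed, so the dot product against a context vector used here is an assumption, and the names `attention_pool`, `hidden_states`, and `context` are hypothetical. The sketch only shows the general idea: per-word scores are normalized with a softmax, and the resulting weights produce a weighted sum of the recurrent hidden states that can serve as a document embedding.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weigh each word's hidden state by its score against `context`.

    hidden_states: one vector per word (e.g. RNN outputs).
    context: a query vector; in a trained model this would be learned.
    The dot-product scoring is an assumption, not the paper's method.
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled


# Toy usage: two "words" with 2-dimensional hidden states.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In this toy call the first word aligns with the context vector, so it receives the larger weight and dominates the pooled vector.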

An automatic approach to weighted subject indexing – An empirical study in the biomedical domain

Author: An
Blair
Blei
Chung
Cooper
Foskett
Furnas
Hjørland
Hutchins
Jalali
Kent
Kleineberg
Klingbiel
Lavrenko
Lu
Mai
Manning
Maron
Medelyan
Meij
Mu
Plaunt
Ruch
Salton
Shin
Stokes
Taylor
Travis
Willis
Wilson
Wolfram
Zhang
Publication venue: 'Wiley'
Publication date: 01/01/2015

Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.

Crossref

SHAREOK repository

Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval

Author: M.R. Lyu
Rong Jin
S.C.H. Hoi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Machine Learning in Automated Text Categorization

Author: ANDROUTSOPOULOS I.
ATTARDI G.
BAKER L.D.
BIEBRICHER P.
CAROPRESO M.F.
CAVNAR W.B.
CHAKRABARTI S.
CLACK C.
CLEVERDON C.
COHEN W. W.
DAGAN I.
DEERWESTER S.
DENOYER L.
DIAZ ESTEBAN A.
DRUCKER H.
DUMAIS S.T.
ESCUDERO G.
Fabrizio Sebastiani
FIELD B.
FORSYTH R. S.
FUHR N.
FURNKRANZ J.
GALAVOTTI L.
GALE W. A.
GOVERT N.
GRAY W.A.
GUTHRIE L.
HAYES P.J.
HEAPS H.
HERSH W.
HULL D. A.
ITTNER D.J.
IWAYAMA M.
IYER R.D.
JOACHIMS T.
JOHN G. H.
JUNKER M.
KESSLER B.
KIM Y.-H.
KLINKENBERG R.
KNORZ G.
KOLLER D.
LAM S.L.
LAM W.
LANG K.
LARKEY L. S.
LEWIS D. D.
LI H.
LI Y.H.
LIERE R.
LIM J. H.
MASAND B.
MCCALLUM A. K.
MLADENIC D.
MOULINIER I.
MYERS K.
NG H.T.
OH H.-J.
PAZIENZA M. T.
RILOFF E.
ROBERTSON S.E.
ROTH D.
RUIZ M.E.
SABLE C.L.
SARACEVIC T.
SCHAPIRE R. E.
SCHUTZE H.
SCOTT S.
SEBASTIANI F.
SINGHAL A.
SLONIM N.
TAIRA H.
TUMER K.
TZERAS K.
VAN RIJSBERGEN C. J.
WIENER E.D.
YANG Y.
YU K.L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2001

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys

arXiv.org e-Print Archive

CiteSeerX

Crossref
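
The inductive process described in the survey abstract above, in which a classifier is learned from a set of preclassified documents, can be sketched with a small multinomial Naive Bayes learner. This is only one of many possible learners and is chosen here for brevity. The whitespace tokenization, the add-one smoothing, and the function names `train` and `classify` are assumptions made for illustration and are not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Learn per-category word statistics from (text, label) pairs."""
    class_counts = Counter()            # number of documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()   # naive whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words unseen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage with a hypothetical four-document training set.
model = train([
    ("cheap pills buy now", "spam"),
    ("meeting agenda attached", "ham"),
    ("buy cheap watches", "spam"),
    ("project meeting tomorrow", "ham"),
])
label = classify(model, "buy cheap pills")
```

The three concerns named in the abstract map onto the sketch loosely: the token counts are the document representation, `train` is classifier construction, and evaluation would compare `classify` output against held-out labeled documents.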

Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval

Author: HOI Steven C. H.
JIN Rong
LYU Michael R.
Publication venue: IEEE
Publication date: 01/01/1999

Singapore Ministry of Education Academic Research Fund Tier

Crossref

Institutional Knowledge at Singapore Management University

Publikationsserver der RWTH Aachen University

Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval

Author: HOI Steven C. H.
JIN Rong
LYU Michael R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2009

Singapore Ministry of Education Academic Research Fund Tier

CiteSeerX

Institutional Knowledge at Singapore Management University

Identification of anonymous users in Twitter

Author: Arin Inanc
Arın İnanç
Publication date: 01/01/2012

Users may have multiple profiles when writing comments, blogs, and tweets on the web. While some of these profiles reveal the user's true identity, others are created under pseudonyms. Pseudonymity is especially important in countries with oppressive governments, where activists write pseudonymous tweets or Facebook messages. In these countries, government officials discovering that a person is among the activists may have serious consequences: the activist may be imprisoned, or his or her life may even be put in jeopardy. Pseudonyms may provide a sense of anonymity; however, the writing patterns of an author can provide clues that can be used to link the pseudonymous account to the public account. More specifically, one can look at some features within a text whose author is known, and build a model using these features to predict whether a given (supposedly) anonymous text belongs to that author or not. In this work, we first demonstrate that a person can be identified as being part of a group by using his/her tweets. We used Twitter since it is a popular platform, but the problem is not specific to Twitter. We show that through tweets, an adversary can build a classifier from public tweets of known users to match them with pseudonymous Twitter accounts. We use a simple vector-space model with tf-idf weights to represent documents and a Naive-Bayes classifier with cosine similarity measure. We show that the problem of matching public and pseudonymous accounts exists in Twitter through experiments with real data. We also provide a formalism to describe the problem and, based on the formalism, we provide a solution to protect the privacy of individuals who would like to stay anonymous when writing tweets.

Sabanci University Research Database
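
The vector-space representation mentioned in the abstract above can be sketched as follows: each account's text is turned into a tf-idf vector, and a pseudonymous text is matched to the known author whose vector has the highest cosine similarity. This is a simplified illustration, not the thesis's method. It omits the Naive-Bayes component entirely, and the smoothed idf formula, whitespace tokenization, and names `tfidf_vectors`, `cosine`, and `best_match` are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption); rarer terms receive larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(known_texts, anonymous_text):
    """Return the known author whose text is most similar to the query."""
    names = list(known_texts)
    vectors = tfidf_vectors([known_texts[n] for n in names] + [anonymous_text])
    query = vectors[-1]
    scored = [(cosine(vec, query), name) for vec, name in zip(vectors, names)]
    return max(scored)[1]


# Toy usage with hypothetical accounts.
known = {
    "alice": "coffee coffee morning run",
    "bob": "football match tonight goal",
}
match = best_match(known, "morning coffee before my run")
```

With realistic data, each known author would be represented by many tweets concatenated or averaged, and stylistic features beyond word counts would likely matter.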