3 research outputs found
Similarity Assessment through blocking and affordance assignment in Textual CBR
It has been conceived that children learn new objects through their
affordances, that is, the actions that can be taken on them. We suggest that
web pages also have affordances defined in terms of the users' information need
they meet. An assumption of the proposed approach is that different parts of a
text may not be equally important or relevant to a given query. Judging the
relevance of a web document therefore requires a thorough look into its
parts, rather than treating it as monolithic content. We propose a method to
extract and assign affordances to texts and then use these affordances to
retrieve the corresponding web pages. The overall approach presented in the
paper relies on case-based representations that bridge the queries to the
affordances of web documents. We tested our method on the tourism domain and
the results are promising.
Comment: 10 pages, 3 figures, WebCBR 2010, Alessandria, Italy
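The idea that relevance should be judged per part of a document rather than for the whole text can be illustrated with a minimal sketch. It is not the paper's affordance-assignment method: the blank-line block splitting, the term-overlap score, and the names `tokenize`, `block_scores`, and `best_block` are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def block_scores(document, query):
    """Score each blank-line-separated block by the fraction of query terms it contains.

    Assumption: blocks are paragraphs; the paper's own blocking may differ.
    """
    query_terms = tokenize(query)
    blocks = [b.strip() for b in document.split("\n\n") if b.strip()]
    scores = []
    for block in blocks:
        overlap = len(query_terms & tokenize(block))
        score = overlap / len(query_terms) if query_terms else 0.0
        scores.append((block, score))
    return scores


def best_block(document, query):
    """Return the (block, score) pair with the highest score, or None if there are no blocks."""
    scores = block_scores(document, query)
    if not scores:
        return None
    return max(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    page = (
        "Hotel rooms and prices in Rome.\n\n"
        "Opening hours of the museum and ticket booking."
    )
    print(best_block(page, "museum opening hours"))
```

Scoring blocks separately lets a page match a query through a single relevant part, even when the rest of the page is about something else.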
An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
This paper studies the effectiveness of text representation schemes
on two tasks: user aggression detection and fact detection in social media
content. In user aggression detection, the aim is to identify the level of
aggression in social media content written in
English, Devanagari Hindi, and Romanized Hindi. Aggression levels are
categorized into three predefined classes: `Non-aggressive`, `Overtly
Aggressive`, and `Covertly Aggressive`. During disaster-related incidents,
social media platforms such as Twitter are flooded with millions of posts. In such emergency
situations, identifying factual posts is important for organizations
involved in relief operations. We framed this problem as a combination
of classification and ranking. This paper presents a comparison of
various text representation schemes based on BoW techniques, distributed
word/sentence representations, and transfer learning on classifiers. The weighted
F1-score is used as the primary evaluation metric. Results show that text
representation using BoW performs better than word embeddings on machine
learning classifiers, while pre-trained word embedding techniques perform
better on classifiers based on deep neural networks. Recent transfer learning models
such as ELMo and ULMFiT are fine-tuned for the aggression classification task.
However, their results are not on par with those of pre-trained word embedding models. Overall,
word embeddings using fastText produce a better weighted F1-score than Word2Vec
and GloVe. Results are further improved using pre-trained vector models.
Statistical significance tests are employed to ensure the significance of the
classification results. On a test dataset that is lexically different from the
training dataset, deep neural models are more robust and perform
substantially better than machine learning classifiers.
Comment: 21 pages, 2 figures
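The abstract names a weighted score as its primary evaluation metric. Assuming this refers to the support-weighted F1 score, a minimal sketch of that computation follows. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper's experiments.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its share of true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (count / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["Non-aggressive", "Non-aggressive", "Overtly Aggressive", "Covertly Aggressive"]
    pred = ["Non-aggressive", "Overtly Aggressive", "Overtly Aggressive", "Covertly Aggressive"]
    print(round(weighted_f1(gold, pred), 4))
```

Weighting by class support keeps a frequent class from being treated the same as a rare one, which matters when the three aggression classes are imbalanced.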
Decision Support for e-Governance: A Text Mining Approach
Information and communication technology has the capability to improve the
process by which governments involve citizens in formulating public policy and
public projects. Even though many government regulations may now be in
digital form (and often available online), due to their complexity and
diversity, identifying the ones relevant to a particular context is a
non-trivial task. Similarly, with the advent of a number of electronic online
forums, social networking sites and blogs, the opportunity of gathering
citizens' petitions and stakeholders' views on government policy and proposals
has increased greatly, but the volume of unstructured data and the complexity of
analyzing it make this difficult. On the other hand, text mining has come
a long way from simple keyword search, and matured into a discipline capable of
dealing with much more complex tasks. In this paper we discuss how text-mining
techniques can help in retrieval of information and relationships from textual
data sources, thereby assisting policy makers in discovering associations
between policies and citizens' opinions expressed in electronic public forums,
blogs, etc. We also present an integrated text-mining-based
architecture for e-governance decision support along with a discussion on the
Indian scenario.
Comment: 19 pages, 7 figures