172 research outputs found

Dataless text classification with descriptive LDA

Authors: Carroll John, Chen Xingyuan, Jin Peng, Xia Yunqing
Publication venue: Association for the Advancement of Artificial Intelligence Press
Publication date: 19/02/2015

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to that of state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.

Sussex Research Online

Association for the Advancement of Artificial Intelligence: AAAI Publications

Detecting Online Hate Speech Using Both Supervised and Weakly-Supervised Approaches

Author: Gao Lei
Publication date: 17/01/2019

In the wake of a polarizing election, social media is laden with hateful content. The context accompanying a hate speech text is useful for identifying hate speech, yet it has been largely overlooked in existing datasets and hate speech detection models. We provide an annotated corpus of hate speech that preserves context information. We then propose two types of supervised hate speech detection models that incorporate context information: a logistic regression model with context features and a neural network model with learning components for context. Further, to address various limitations of supervised hate speech classification methods, including corpus bias and the high cost of annotation, we propose a weakly supervised two-path bootstrapping approach for online hate speech detection that leverages large-scale unlabeled data. This system significantly outperforms hate speech detection systems that are trained in a supervised manner using manually annotated data. Applying this model to a large quantity of tweets collected before, after, and on election day reveals motivations and patterns of inflammatory language.

Texas A&M Repository

MANIPULATION ACTION UNDERSTANDING FOR OBSERVATION AND EXECUTION

Author: Yang Yezhou
Publication date: 01/01/2015

Modern intelligent agents will need to learn the actions that humans perform. They will need to recognize these actions when they see them and they will need to perform these actions themselves. We propose a cognitive system that interprets human manipulation actions from perceptual information (image and depth data) and consists of perceptual modules and reasoning modules that interact with each other. The contributions of this work address two core problems at the heart of action understanding: (a) the grounding of relevant information about actions in perception (the perception-action integration problem), and (b) the organization of perceptual and high-level symbolic information for interpreting the actions (the sequencing problem). At the high level, actions are represented with the Manipulation Action Context-free Grammar (MACFG), a syntactic grammar with associated parsing algorithms, which organizes actions as a sequence of sub-events. Each sub-event is described by the hand (as well as grasp type), movements (actions) and the objects and tools involved, and the relevant information about these quantities is obtained from biologically inspired perception modules. These modules track the hands and objects and recognize the hand grasp, actions, segmentation, and action consequences. Furthermore, a probabilistic semantic parsing framework based on CCG (Combinatory Categorial Grammar) theory is adopted to model the semantic meaning of human manipulation actions. Additionally, the lesson from the findings on mirror neurons is that the two processes of interpreting visually observed actions and generating actions should share the same underlying cognitive process. Recent studies have shown that grammatical structures underlie the representation of manipulation actions, which are used both to understand and to execute these actions.
By analogy, understanding manipulation actions is like understanding language, while executing them is like generating language. Experiments on two tasks, (1) a robot observing people performing manipulation actions, and (2) a robot then executing manipulation actions accordingly, are presented to validate the formalism. The technical parts of this thesis are devoted to the experimental setting of task (1), while task (2) is given as a live demonstration.

Digital Repository at the University of Maryland

Detection, monitoring and management of small water bodies: A case study of Shahjadpur Thana, Sirajgonj district, Bangladesh

Author: Huda Khondaker Mohammod Shariful
Publication date: 01/01/2004

Bangladesh is a low-lying, flood-prone deltaic plain. Excavations are needed to create raised land for safe flood-free homesteads and water bodies for irrigation, and these result in the creation of doba, pukur, dighi and jola. All of these types of small water bodies are almost equally distributed all over the country, except for the beel, which is a natural, saucer-shaped depression. For every eight people there is approximately an acre of small water bodies, which range in size from 25-400 sq.m. (doba), 150-1000 sq.m. (pukur), >750 sq.m. (dighi), >2000 sq.m. (jola) and >1000 sq.m. (beel). These small water bodies are commonly used for drinking, bathing and washing, fisheries and aquaculture, duck raising, irrigation, cattle feeding and washing. Despite the importance of small water bodies to the local economy, there is no up-to-date inventory. For this purpose, in my research I have employed integrated participatory remote sensing, GIS and socio-cultural approaches. Although these have not been used before in Bangladesh, I argue that they are ideal for effective resource management and sustainable development planning. This research investigated the historical development of the present spatial distribution and use patterns of small water bodies using remote sensing and GIS, at a regional scale in four mouzas of Shahjadpur Thana. The data sources were topographical maps, aerial photographs, satellite images, agricultural census data, in-depth questionnaires, focus group meetings and interviews with key informants. An integrated RS-GIS and social sciences methodology was employed to produce maps of change and overlays of the socio-cultural factors involved. Results show that the doba, pukur and dighi, when they are not obstructed by surrounding vegetation, can be detected easily in high-resolution panchromatic CORONA satellite photography, IRS-1D panchromatic imagery and aerial photography.
Comparatively large pukurs, dighis and all jolas and beels are detected by all other optical sensors and in the SIR-C radar imagery. Multi-temporal images are helpful for identifying the different types of small water bodies as well as separating them from other seasonal large water bodies and flooded areas. It is hoped that the proposed computer-assisted participatory management system, including some locally specific guidelines, may be applicable to the planning of other thanas (490 in total) in Bangladesh. The proposed management system will facilitate the integration of local planning with the national-level planning process, which has not been possible hitherto.

Durham e-Theses

OpenGrey Repository