133,512 research outputs found

Text Categorization based on Associative Classification

Author: Ansari Uzma
Shrivastava Padmavati
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 14/08/2020

Text mining is an emerging technology that can be used to augment existing data in corporate databases by making unstructured text data available for analysis. The rapid increase in online documents, driven mostly by the expanding internet, has renewed interest in automated document classification and data mining. The demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Text classification is the process of assigning documents to predefined categories based on their content. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Both classification and association rule mining are indispensable to practical applications. For association rule mining, the target of discovery is not predetermined, while for classification rule mining there is one and only one predetermined target. Thus, great savings and conveniences to the user could result if the two mining techniques can be integrated. In this paper, such an integrated framework, called associative classification, is used for text categorization. The algorithm presented here for text classification uses words as features to derive a feature set from preclassified text documents. A Naïve Bayes classifier is then applied to the derived features for final classification.
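The two-stage idea in this abstract (derive word features from preclassified documents, then classify with Naïve Bayes) can be illustrated with a minimal sketch. This is not the authors' algorithm: the rule form (a single word implying a class), the `min_support` and `min_confidence` thresholds, the function names and the toy documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def derive_features(docs, min_support=2, min_confidence=0.6):
    """Keep words for which some rule 'word -> class' is frequent and confident."""
    word_class = Counter()  # (word, label) -> documents of that class containing word
    word_total = Counter()  # word -> documents containing word
    for text, label in docs:
        for word in set(text.lower().split()):
            word_class[(word, label)] += 1
            word_total[word] += 1
    return {
        word
        for (word, label), count in word_class.items()
        if count >= min_support and count / word_total[word] >= min_confidence
    }


def train_naive_bayes(docs, features):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            if word in features:
                word_counts[label][word] += 1
    return class_docs, word_counts


def classify(text, features, class_docs, word_counts):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    total_docs = sum(class_docs.values())
    words = [w for w in text.lower().split() if w in features]
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        denom = sum(word_counts[label].values()) + len(features)
        score = math.log(n_docs / total_docs)
        for word in words:
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, invented for this sketch.
TRAINING_DOCS = [
    ("the striker scored a late goal", "sport"),
    ("the goal came from the penalty spot", "sport"),
    ("the market fell as shares dropped", "finance"),
    ("shares rose on the stock market", "finance"),
]
FEATURES = derive_features(TRAINING_DOCS)
CLASS_DOCS, WORD_COUNTS = train_naive_bayes(TRAINING_DOCS, FEATURES)
```

With this toy corpus, a word such as "the" is dropped because it appears equally often in both classes, so its rule confidence falls below the threshold; only class-indicative words survive as features.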

Interscience Research Network

MAINTENANCE OF DATA RICHNESS IN BUSINESS COMMUNICATION DATA

Author: Kaya Muhammed-Fatih
Schoop Mareike
Publication venue: AIS Electronic Library (AISeL)
Publication date: 15/06/2020

Business negotiations – be they face-to-face or electronic – are conducted through communication, which enables the declaration of negotiation objectives, the active implementation of negotiation strategies to achieve pre-defined goals, and the declaration of a successful or unsuccessful end of the negotiation. The processing of exchanged textual communication enables the automatic transformation of unstructured data into processable structured datasets and subsequently the analysis of textual content without losing the data richness of exchanged communication messages. For this purpose, the paper presents Text Mining-based pre-processing approaches and dimensionality reduction algorithms from Feature Extraction and Feature Selection in a research framework and evaluates them to counteract common dimensionality problems in text processing. In doing so, the maintenance of data richness in communication data is considered the overall goal, in order to determine the dataset with minimal information loss. To this end, various pre-processed and transformed communication datasets derived from dimensionality reduction are used as input data for selected classification models to measure the prediction performance regarding the final negotiation outcome with ROC analysis. The central results of the ROC analysis show that quantified business communication generated by Optimized Selection delivers the best data based on Lovins’ stemming algorithm compared to stemming variations of Forward Selection and SVD.
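For readers unfamiliar with ROC analysis, the area under the ROC curve (AUC) for a binary outcome can be computed directly from classifier scores, as in the minimal sketch below. The function name `roc_auc`, the label encoding (1 for a successful negotiation, 0 for an unsuccessful one) and the example scores are assumptions for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a randomly chosen positive example receives a
    higher score than a randomly chosen negative one (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = successful, 0 = unsuccessful) and model scores.
EXAMPLE_AUC = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

An AUC of 1.0 means every successful negotiation is scored above every unsuccessful one, while 0.5 corresponds to random ranking, which is why the measure is convenient for comparing datasets produced by different dimensionality reduction methods.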

AIS Electronic Library (AISeL)

Using a neural network-based feature extraction method to facilitate citation screening for systematic reviews

Author: Kontonatsios Georgios
Korkontzelos Yannis
Matthew Peter
Spencer Sally
Publication venue: Elsevier BV
Publication date: 01/07/2020

Edge Hill University Research Information Repository

Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media

Author: Alani Harith
Burel Gregoire
Publication date: 18/05/2018

Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricanes, floods) and information categories (e.g., reports on affected individuals, donations and volunteering) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories, and is easy to deploy and to integrate into multiple platforms and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with specific crises, with the benefits associated with the use of word embeddings.

Open Research Online (The Open University)

Bimodal network architectures for automatic generation of image annotation from text

Author: Guo Yufan
Gur Yaniv
Madani Ali
Moradi Mehdi
Syeda-Mahmood Tanveer
Publication date: 05/09/2018

Medical image analysis practitioners have embraced big data methodologies. This has created a need for large annotated datasets. The source of big data is typically large image collections and clinical reports recorded for these images. In many cases, however, building algorithms aimed at segmentation and detection of disease requires a training dataset with markings of the areas of interest on the image that match with the described anomalies. This process of annotation is expensive and needs the involvement of clinicians. In this work we propose two separate deep neural network architectures for automatic marking of a region of interest (ROI) on the image best representing a finding location, given a textual report or a set of keywords. One architecture consists of LSTM and CNN components and is trained end to end with images, matching text, and markings of ROIs for those images. The output layer estimates the coordinates of the vertices of a polygonal region. The second architecture uses a network pre-trained on a large dataset of the same image types for learning feature representations of the findings of interest. We show that for a variety of findings from chest X-ray images, both proposed architectures learn to estimate the ROI, as validated by clinical annotations. There is a clear advantage obtained from the architecture with pre-trained imaging network. The centroids of the ROIs marked by this network were on average at a distance equivalent to 5.1% of the image width from the centroids of the ground truth ROIs.

Comment: Accepted to MICCAI 2018, LNCS 1107
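The evaluation measure quoted in this abstract, the distance between predicted and ground-truth ROI centroids expressed as a fraction of image width, can be sketched as follows. The abstract does not say how the centroid is computed; this sketch assumes the area-weighted centroid of a simple polygon (shoelace formula), and the function names and coordinates are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def centroid_error(predicted, truth, image_width):
    """Centroid distance between two polygons as a fraction of the image width."""
    px, py = polygon_centroid(predicted)
    tx, ty = polygon_centroid(truth)
    return math.hypot(px - tx, py - ty) / image_width


# Hypothetical ROIs on a 100-pixel-wide image: the prediction is offset by (3, 4).
TRUTH = [(0, 0), (10, 0), (10, 10), (0, 10)]
PREDICTED = [(3, 4), (13, 4), (13, 14), (3, 14)]
ERROR = centroid_error(PREDICTED, TRUTH, image_width=100)
```

Normalizing by image width makes the error comparable across images of different resolutions, which is presumably why the abstract reports it as a percentage.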

arXiv.org e-Print Archive

Crossref

Event based text mining for integrated network construction

Author: Saeys Yvan
Van de Peer Yves
Van Landeghem Sofie
Publication venue: Microtome Publishing
Publication date: 01/01/2010

The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view on the biological system as a whole, which can then be used in further investigations to analyse specific properties of the network.
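The combination step described in this abstract, where one graph per interaction type is merged into an integrated network, can be illustrated with a minimal sketch. The data layout (directed edge lists keyed by interaction type), the function name and the entity and interaction names are assumptions for illustration, not the representation used by the authors.

```python
from collections import defaultdict


def integrate_networks(networks):
    """Merge per-interaction-type edge lists into one edge-labelled network.

    `networks` maps an interaction type to a list of (source, target) edges.
    The result maps each edge to the set of interaction types that support it.
    """
    integrated = defaultdict(set)
    for interaction_type, edges in networks.items():
        for source, target in edges:
            integrated[(source, target)].add(interaction_type)
    return dict(integrated)


# Hypothetical extracted interactions between placeholder entities A, B and C.
NETWORKS = {
    "binding": [("A", "B"), ("B", "C")],
    "regulation": [("A", "B")],
    "phosphorylation": [("C", "A")],
}
INTEGRATED = integrate_networks(NETWORKS)
```

Keeping the interaction types as edge labels preserves the separate networks inside the integrated one, so an analysis can still be restricted to a single type or to edges supported by several types.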

Ghent University Academic Bibliography