133,512 research outputs found
Text Categorization based on Associative Classification
Text mining is an emerging technology that can be used to augment existing data in corporate databases by making unstructured text data available for analysis. The incredible increase in online documents, which has been mostly due to the expanding internet, has renewed the interest in automated document classification and data mining. The demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Text classification is the process of classifying documents into predefined categories based on their content. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Both classification and association rule mining are indispensable to practical applications. For association rule mining, the target of discovery is not pre-determined, while for classification rule mining there is one and only one predetermined target. Thus, great savings and conveniences to the user could result if the two mining techniques can somehow be integrated. In this paper, such an integrated framework, called associative classification is used for text categorization The algorithm presented here for text classification uses words as features , to derive feature set from preclassified text documents. The concept of Naïve Bayes classifier is then used on derived features for final classification
MAINTENANCE OF DATA RICHNESS IN BUSINESS COMMUNICATION DATA
Business negotiations – be they face-to-face or electronic – are conducted through communication enabling the declaration of negotiation objectives and active implementation of negotiation strategies to achieve pre-defined goals and the declaration of a successful or unsuccessful end of the negotiation. The processing of exchanged textual communication enables the automatic transformation of unstructured data into processable structured datasets and subsequently the analysis of textual content without losing the data richness of exchanged communication messages. For this purpose, the paper presents Text Mining-based pre-processing approaches and dimensionality reduction algorithms from Feature Extraction and Feature Selection in a research framework and evaluates those to counteract common dimensionality problems with textual processing. In doing so, the maintenance of data richness in communication data is considered as the overall goal to determine the dataset with minimal information loss. In this sense, various pre-processed and transformed communication datasets derived from dimensionality reduction are integrated as input data into selected classification models to measure the prediction performance regarding the final negotiation outcome with ROC analysis. The central results of the ROC show that quantified business communication generated by Optimized Selection delivers the best data based on Lovins’ stemming algorithm compared to stemming variations of Forward Selection and SVD
Recommended from our members
Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media
Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricane, floods, etc.) and information categories (e.g., reports on affected individuals, donations and volunteering, etc.) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories through an easily deployable and accessible web API that can be integrated into multiple platform and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with specific crises with the benefits associated with the usage word embeddings
Bimodal network architectures for automatic generation of image annotation from text
Medical image analysis practitioners have embraced big data methodologies.
This has created a need for large annotated datasets. The source of big data is
typically large image collections and clinical reports recorded for these
images. In many cases, however, building algorithms aimed at segmentation and
detection of disease requires a training dataset with markings of the areas of
interest on the image that match with the described anomalies. This process of
annotation is expensive and needs the involvement of clinicians. In this work
we propose two separate deep neural network architectures for automatic marking
of a region of interest (ROI) on the image best representing a finding
location, given a textual report or a set of keywords. One architecture
consists of LSTM and CNN components and is trained end to end with images,
matching text, and markings of ROIs for those images. The output layer
estimates the coordinates of the vertices of a polygonal region. The second
architecture uses a network pre-trained on a large dataset of the same image
types for learning feature representations of the findings of interest. We show
that for a variety of findings from chest X-ray images, both proposed
architectures learn to estimate the ROI, as validated by clinical annotations.
There is a clear advantage obtained from the architecture with pre-trained
imaging network. The centroids of the ROIs marked by this network were on
average at a distance equivalent to 5.1% of the image width from the centroids
of the ground truth ROIs.Comment: Accepted to MICCAI 2018, LNCS 1107
Event based text mining for integrated network construction
The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view on the biological system as a whole, which can then be used in further investigations to analyse specific properties of the networ
- …