
    Text Categorization based on Associative Classification

    Text mining is an emerging technology that can be used to augment existing data in corporate databases by making unstructured text data available for analysis. The rapid increase in online documents, driven mostly by the expanding internet, has renewed interest in automated document classification and data mining. The demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Text classification is the process of classifying documents into predefined categories based on their content. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Both classification and association rule mining are indispensable to practical applications. For association rule mining, the target of discovery is not predetermined, while for classification rule mining there is one and only one predetermined target. Thus, great savings and conveniences to the user could result if the two mining techniques can be integrated. In this paper, such an integrated framework, called associative classification, is used for text categorization. The algorithm presented here for text classification uses words as features to derive the feature set from preclassified text documents. The concept of the Naïve Bayes classifier is then applied to the derived features for final classification.
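The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "deriving the feature set" means keeping words whose single-word rule `{word} -> class` meets a support and a confidence threshold, and that the final step is a multinomial Naive Bayes with add-one smoothing. The function names, thresholds, and toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def mine_word_rules(docs, min_support=0.2, min_confidence=0.6):
    """Keep each word w for which some rule {w} -> label meets both thresholds.

    Assumption: only single-word antecedents are mined (a simplification).
    """
    n = len(docs)
    word_label = Counter()  # (word, label) -> docs with this label containing word
    word_total = Counter()  # word -> docs containing word
    for text, label in docs:
        for w in set(text.lower().split()):
            word_label[(w, label)] += 1
            word_total[w] += 1
    features = set()
    for (w, _label), count in word_label.items():
        support = count / n
        confidence = count / word_total[w]
        if support >= min_support and confidence >= min_confidence:
            features.add(w)
    return features


def train_naive_bayes(docs, features):
    """Count documents per label and feature-word occurrences per label."""
    label_docs = Counter()
    word_counts = defaultdict(Counter)
    for text, label in docs:
        label_docs[label] += 1
        for w in text.lower().split():
            if w in features:
                word_counts[label][w] += 1
    return label_docs, word_counts


def classify(text, model, features):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_docs, word_counts = model
    total_docs = sum(label_docs.values())
    vocab = len(features)
    best_label, best_score = None, -math.inf
    for label in sorted(label_docs):
        score = math.log(label_docs[label] / total_docs)
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            if w in features:  # words outside the mined feature set are ignored
                score += math.log((word_counts[label][w] + 1) / (total + vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical pre-classified training documents.
TRAIN = [
    ("stock market shares rise", "finance"),
    ("market shares fall on bank news", "finance"),
    ("team wins football match", "sport"),
    ("football team loses match on penalties", "sport"),
]

features = mine_word_rules(TRAIN, min_support=0.5, min_confidence=1.0)
model = train_naive_bayes(TRAIN, features)
print(sorted(features))
print(classify("bank shares and market news", model, features))
```

In this sketch the rule-mining step acts purely as a feature filter: a word such as "on", which occurs in both classes, fails the confidence threshold and never reaches the classifier.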

    MAINTENANCE OF DATA RICHNESS IN BUSINESS COMMUNICATION DATA

    Business negotiations – be they face-to-face or electronic – are conducted through communication that enables the declaration of negotiation objectives, the active implementation of negotiation strategies to achieve pre-defined goals, and the declaration of a successful or unsuccessful end of the negotiation. Processing the exchanged textual communication enables the automatic transformation of unstructured data into processable structured datasets and, subsequently, the analysis of textual content without losing the data richness of the exchanged messages. For this purpose, the paper presents Text Mining-based pre-processing approaches and dimensionality reduction algorithms from Feature Extraction and Feature Selection in a research framework and evaluates them to counteract common dimensionality problems in textual processing. In doing so, the maintenance of data richness in communication data is considered the overall goal, in order to determine the dataset with minimal information loss. To this end, various pre-processed and transformed communication datasets derived from dimensionality reduction are used as input data for selected classification models, and the prediction performance regarding the final negotiation outcome is measured with ROC analysis. The central results of the ROC analysis show that quantified business communication generated by Optimized Selection delivers the best data based on Lovins’ stemming algorithm compared to stemming variations of Forward Selection and SVD.
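As a rough illustration of how ROC analysis can compare classifiers fed with differently reduced datasets, the sketch below computes the area under the ROC curve as the fraction of positive/negative pairs ranked correctly. The labels and scores are invented for illustration and the variant names are hypothetical; nothing here reproduces the paper's data or results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcome labels (1 = successful negotiation) and classifier
# scores obtained from two differently reduced feature sets.
outcomes = [1, 1, 0, 0]
scores_by_variant = {
    "variant_a": [0.9, 0.4, 0.6, 0.1],
    "variant_b": [0.9, 0.8, 0.2, 0.1],
}

for name, scores in scores_by_variant.items():
    print(name, roc_auc(outcomes, scores))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for larger datasets.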

    Bimodal network architectures for automatic generation of image annotation from text

    Medical image analysis practitioners have embraced big data methodologies. This has created a need for large annotated datasets. The source of big data is typically large image collections and the clinical reports recorded for these images. In many cases, however, building algorithms aimed at segmentation and detection of disease requires a training dataset with markings of the areas of interest on the image that match the described anomalies. This annotation process is expensive and requires the involvement of clinicians. In this work we propose two separate deep neural network architectures for automatic marking of a region of interest (ROI) on the image best representing a finding location, given a textual report or a set of keywords. One architecture consists of LSTM and CNN components and is trained end to end with images, matching text, and markings of ROIs for those images. The output layer estimates the coordinates of the vertices of a polygonal region. The second architecture uses a network pre-trained on a large dataset of the same image types for learning feature representations of the findings of interest. We show that for a variety of findings from chest X-ray images, both proposed architectures learn to estimate the ROI, as validated by clinical annotations. There is a clear advantage obtained from the architecture with the pre-trained imaging network. The centroids of the ROIs marked by this network were on average at a distance equivalent to 5.1% of the image width from the centroids of the ground truth ROIs. Comment: Accepted to MICCAI 2018, LNCS 1107
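The reported metric, centroid distance expressed as a percentage of image width, can be sketched as follows. This assumes the predicted and ground-truth ROIs are simple polygons given as lists of (x, y) pixel vertices and uses the standard shoelace formula for the centroid. The function names and example coordinates are hypothetical, and the paper may compute centroids differently.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def centroid_error_percent(predicted, truth, image_width):
    """Distance between the two ROI centroids as a percentage of image width."""
    px, py = polygon_centroid(predicted)
    tx, ty = polygon_centroid(truth)
    return 100.0 * math.hypot(px - tx, py - ty) / image_width


# Hypothetical ROIs on a 1000-pixel-wide image: the ground truth is the
# predicted square shifted by (30, 40) pixels.
predicted_roi = [(0, 0), (100, 0), (100, 100), (0, 100)]
truth_roi = [(30, 40), (130, 40), (130, 140), (30, 140)]
print(centroid_error_percent(predicted_roi, truth_roi, image_width=1000))
```

Normalizing by image width makes the error comparable across images of different resolution, which is presumably why the abstract reports it that way.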

    Event based text mining for integrated network construction

    The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view on the biological system as a whole, which can then be used in further investigations to analyse specific properties of the networ
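The combination step described here, one graph per interaction type merged into an integrated network, can be sketched with plain dictionaries. This is an illustrative assumption about the data structure, not the authors' implementation: each per-type network is taken to be a list of directed (source, target) edges, and the entity and interaction-type names are placeholders.

```python
from collections import defaultdict


def integrate_networks(networks):
    """Merge per-interaction-type edge lists into one labeled network.

    networks: dict mapping interaction type -> iterable of (source, target).
    Returns a dict mapping each (source, target) edge to the set of
    interaction types that support it.
    """
    integrated = defaultdict(set)
    for interaction_type, edges in networks.items():
        for source, target in edges:
            integrated[(source, target)].add(interaction_type)
    return dict(integrated)


def neighbors(integrated, entity):
    """All entities that `entity` points to, across every interaction type."""
    return {target for (source, target) in integrated if source == entity}


# Placeholder entities and interaction types, for illustration only.
per_type_networks = {
    "binding": [("A", "B"), ("B", "C")],
    "regulation": [("A", "B"), ("A", "D")],
}

integrated = integrate_networks(per_type_networks)
print(integrated[("A", "B")])
print(sorted(neighbors(integrated, "A")))
```

Keeping the interaction types as edge labels preserves the per-type networks inside the integrated one, so each can still be recovered by filtering on its label.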