133 research outputs found

    Mining Mid-level Features for Image Classification

    No full text
    International audienceMid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In par- ticular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Get PDF
    Text mining, also known as Intelligent Text Analysis is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature Extraction is one of the important techniques in data reduction to discover the most important features. Proce- ssing massive amount of data stored in a unstructured form is a challenging task. Several pre-processing methods and algo- rithms are needed to extract useful features from huge amount of data. The survey covers different text summarization, classi- fication, clustering methods to discover useful features and also discovering query facets which are multiple groups of words or phrases that explain and summarize the content covered by a query thereby reducing time taken by the user. Dealing with collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (remove and replace duplicates).The survey provides existing text mining techniques to extract relevant features, detect duplicates and to replace the duplicate data to get fine grained knowledge to the user

    Hyperspectral Remote Sensing Data Analysis and Future Challenges

    Full text link

    Segmentation and classification of lung nodules from Thoracic CT scans : methods based on dictionary learning and deep convolutional neural networks.

    Get PDF
    Lung cancer is a leading cause of cancer death in the world. Key to survival of patients is early diagnosis. Studies have demonstrated that screening high risk patients with Low-dose Computed Tomography (CT) is invaluable for reducing morbidity and mortality. Computer Aided Diagnosis (CADx) systems can assist radiologists and care providers in reading and analyzing lung CT images to segment, classify, and keep track of nodules for signs of cancer. In this thesis, we propose a CADx system for this purpose. To predict lung nodule malignancy, we propose a new deep learning framework that combines Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to learn best in-plane and inter-slice visual features for diagnostic nodule classification. Since a nodule\u27s volumetric growth and shape variation over a period of time may reveal information regarding the malignancy of nodule, separately, a dictionary learning based approach is proposed to segment the nodule\u27s shape at two time points from two scans, one year apart. The output of a CNN classifier trained to learn visual appearance of malignant nodules is then combined with the derived measures of shape change and volumetric growth in assigning a probability of malignancy to the nodule. Due to the limited number of available CT scans of benign and malignant nodules in the image database from the National Lung Screening Trial (NLST), we chose to initially train a deep neural network on the larger LUNA16 Challenge database which was built for the purpose of eliminating false positives from detected nodules in thoracic CT scans. Discriminative features that were learned in this application were transferred to predict malignancy. The algorithm for segmenting nodule shapes in serial CT scans utilizes a sparse combination of training shapes (SCoTS). This algorithm captures a sparse representation of a shape in input data through a linear span of previously delineated shapes in a training repository. The model updates shape prior over level set iterations and captures variabilities in shapes by a sparse combination of the training data. The level set evolution is therefore driven by a data term as well as a term capturing valid prior shapes. During evolution, the shape prior influence is adjusted based on shape reconstruction, with the assigned weight determined from the degree of sparsity of the representation. The discriminative nature of sparse representation, affords us the opportunity to compare nodules\u27 variations in consecutive time points and to predict malignancy. Experimental validations of the proposed segmentation algorithm have been demonstrated on 542 3-D lung nodule data from the LIDC-IDRI database which includes radiologist delineated nodule boundaries. The effectiveness of the proposed deep learning and dictionary learning architectures for malignancy prediction have been demonstrated on CT data from 370 biopsied subjects collected from the NLST database. Each subject in this database had at least two serial CT scans at two separate time points one year apart. The proposed RNN CAD system achieved an ROC Area Under the Curve (AUC) of 0.87, when validated on CT data from nodules at second sequential time point and 0.83 based on dictionary learning method; however, when nodule shape change and appearance were combined, the classifier performance improved to AUC=0.89
    corecore