1,489 research outputs found

    Exploring the similarity of medical imaging classification problems

    Full text link
    Supervised learning is ubiquitous in medical image analysis. In this paper we consider the problem of meta-learning -- predicting which methods will perform well in an unseen classification problem, given previous experience with other classification problems. We investigate the first step of such an approach: how to quantify the similarity of different classification problems. We characterize datasets sampled from six classification problems by performance ranks of simple classifiers, and define the similarity by the inverse of Euclidean distance in this meta-feature space. We visualize the similarities in a 2D space, where meaningful clusters start to emerge, and show that the proposed representation can be used to classify datasets according to their origin with 89.3\% accuracy. These findings, together with the observations of recent trends in machine learning, suggest that meta-learning could be a valuable tool for the medical imaging community

    Probing the topological properties of complex networks modeling short written texts

    Get PDF
    In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has been proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treat texts as networks has simply considered either large pieces of texts or entire books. This approach has certainly worked well -- many informative discoveries have been made this way -- but it raises an uncomfortable question: could there be important topological patterns in small pieces of texts? To address this problem, the topological properties of subtexts sampled from entire books was probed. Statistical analyzes performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve its global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks

    Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

    Full text link
    Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambots detection can be achieved via an in-depth analysis of their collective behaviors exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users behaviors and to efficiently rely on a limited number of lightweight account characteristics

    Machine learning methods for discriminating natural targets in seabed imagery

    Get PDF
    The research in this thesis concerns feature-based machine learning processes and methods for discriminating qualitative natural targets in seabed imagery. The applications considered, typically involve time-consuming manual processing stages in an industrial setting. An aim of the research is to facilitate a means of assisting human analysts by expediting the tedious interpretative tasks, using machine methods. Some novel approaches are devised and investigated for solving the application problems. These investigations are compartmentalised in four coherent case studies linked by common underlying technical themes and methods. The first study addresses pockmark discrimination in a digital bathymetry model. Manual identification and mapping of even a relatively small number of these landform objects is an expensive process. A novel, supervised machine learning approach to automating the task is presented. The process maps the boundaries of ≈ 2000 pockmarks in seconds - a task that would take days for a human analyst to complete. The second case study investigates different feature creation methods for automatically discriminating sidescan sonar image textures characteristic of Sabellaria spinulosa colonisation. Results from a comparison of several textural feature creation methods on sonar waterfall imagery show that Gabor filter banks yield some of the best results. A further empirical investigation into the filter bank features created on sonar mosaic imagery leads to the identification of a useful configuration and filter parameter ranges for discriminating the target textures in the imagery. Feature saliency estimation is a vital stage in the machine process. Case study three concerns distance measures for the evaluation and ranking of features on sonar imagery. Two novel consensus methods for creating a more robust ranking are proposed. Experimental results show that the consensus methods can improve robustness over a range of feature parameterisations and various seabed texture classification tasks. The final case study is more qualitative in nature and brings together a number of ideas, applied to the classification of target regions in real-world sonar mosaic imagery. A number of technical challenges arose and these were surmounted by devising a novel, hybrid unsupervised method. This fully automated machine approach was compared with a supervised approach in an application to the problem of image-based sediment type discrimination. The hybrid unsupervised method produces a plausible class map in a few minutes of processing time. It is concluded that the versatile, novel process should be generalisable to the discrimination of other subjective natural targets in real-world seabed imagery, such as Sabellaria textures and pockmarks (with appropriate features and feature tuning.) Further, the full automation of pockmark and Sabellaria discrimination is feasible within this framework

    A fully automatic CAD-CTC system based on curvature analysis for standard and low-dose CT data

    Get PDF
    Computed tomography colonography (CTC) is a rapidly evolving noninvasive medical investigation that is viewed by radiologists as a potential screening technique for the detection of colorectal polyps. Due to the technical advances in CT system design, the volume of data required to be processed by radiologists has increased significantly, and as a consequence the manual analysis of this information has become an increasingly time consuming process whose results can be affected by inter- and intrauser variability. The aim of this paper is to detail the implementation of a fully integrated CAD-CTC system that is able to robustly identify the clinically significant polyps in the CT data. The CAD-CTC system described in this paper is a multistage implementation whose main system components are: 1) automatic colon segmentation; 2) candidate surface extraction; 3) feature extraction; and 4) classification. Our CAD-CTC system performs at 100% sensitivity for polyps larger than 10 mm, 92% sensitivity for polyps in the range 5 to 10 mm, and 57.14% sensitivity for polyps smaller than 5 mm with an average of 3.38 false positives per dataset. The developed system has been evaluated on synthetic and real patient CT data acquired with standard and low-dose radiation levels
    corecore