1,422 research outputs found

    Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

    Get PDF
    The success of machine learning algorithms generally depends on intermediate data representation, called features that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to be generalized, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful in taking advantage of large amount of unlabeled data, which is available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep models, and Alzheimer\u27s disease classification. Nonnegative Matrix Factorization, Autoencoder and 3D Convolutional Autoencoder are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity and part-based representation constraints for generalized and transferable feature extraction

    Context based multimedia information retrieval

    Get PDF

    Manifold-driven Grouping of Skeletal Muscle Fibers

    Get PDF
    In this report, we present a manifold clustering method for the classification of fibers obtained from diffusion tensor images (DTI) of the human skeletal muscle. To this end, we propose the use of angular Hilbertian metrics between multivariate normal distributions to define a family of distances between tensors that we generalize to fibers. The obtained metrics between fiber tracts encompasses both diffusion and localization information. As far as clustering is concerned, we use two methods. The first approach is based on diffusion maps and k-means clustering in the spectral embedding space. The second approach uses a linear programming formulation of prototype-based clustering. This formulation allows for classification over manifolds without the necessity to embed the data in low dimensional spaces and determines automatically the number of clusters. The experimental validation of the proposed framework is done using a manually annotated significant dataset of DTI of the calf muscle for healthy and diseased subjects

    MODELING AND QUANTITATIVE ANALYSIS OF WHITE MATTER FIBER TRACTS IN DIFFUSION TENSOR IMAGING

    Get PDF
    Diffusion tensor imaging (DTI) is a structural magnetic resonance imaging (MRI) technique to record incoherent motion of water molecules and has been used to detect micro structural white matter alterations in clinical studies to explore certain brain disorders. A variety of DTI based techniques for detecting brain disorders and facilitating clinical group analysis have been developed in the past few years. However, there are two crucial issues that have great impacts on the performance of those algorithms. One is that brain neural pathways appear in complicated 3D structures which are inappropriate and inaccurate to be approximated by simple 2D structures, while the other involves the computational efficiency in classifying white matter tracts. The first key area that this dissertation focuses on is to implement a novel computing scheme for estimating regional white matter alterations along neural pathways in 3D space. The mechanism of the proposed method relies on white matter tractography and geodesic distance mapping. We propose a mask scheme to overcome the difficulty to reconstruct thin tract bundles. Real DTI data are employed to demonstrate the performance of the pro- posed technique. Experimental results show that the proposed method bears great potential to provide a sensitive approach for determining the white matter integrity in human brain. Another core objective of this work is to develop a class of new modeling and clustering techniques with improved performance and noise resistance for separating reconstructed white matter tracts to facilitate clinical group analysis. Different strategies are presented to handle different scenarios. For whole brain tractography reconstructed white matter tracts, a Fourier descriptor model and a clustering algorithm based on multivariate Gaussian mixture model and expectation maximization are proposed. Outliers are easily handled in this framework. Real DTI data experimental results show that the proposed algorithm is relatively effective and may offer an alternative for existing white matter fiber clustering methods. For a small amount of white matter fibers, a modeling and clustering algorithm with the capability of handling white matter fibers with unequal length and sharing no common starting region is also proposed and evaluated with real DTI data

    Software expert discovery via knowledge domain embeddings in a collaborative network

    Full text link
    © 2018 Elsevier B.V. Community Question Answering (CQA) websites can be claimed as the most major venues for knowledge sharing, and the most effective way of exchanging knowledge at present. Considering that massive amount of users are participating online and generating huge amount data, management of knowledge here systematically can be challenging. Expert recommendation is one of the major challenges, as it highlights users in CQA with potential expertise, which may help match unresolved questions with existing high quality answers while at the same time may help external services like human resource systems as another reference to evaluate their candidates. In this paper, we in this work we propose to exploring experts in CQA websites. We take advantage of recent distributed word representation technology to help summarize text chunks, and in a semantic view exploiting the relationships between natural language phrases to extract latent knowledge domains. By domains, the users’ expertise is determined on their historical performance, and a rank can be compute to given recommendation accordingly. In particular, Stack Overflow is chosen as our dataset to test and evaluate our work, where inclusive experiment shows our competence

    Constructing and modeling text-rich information networks: a phrase mining-based approach

    Get PDF
    A lot of digital ink has been spilled on "big data" over the past few years, which is often characterized by an explosion of information. Most of this surge owes its origin to the unstructured data in the wild like words, images and video as comparing to the structured information stored in fielded form in databases. The proliferation of text-heavy data is particularly overwhelming, reflected in everyone's daily life in forms of web documents, business reviews, news, social posts, etc. In the mean time, textual data and structured entities often come in intertwined, such as authors/posters, document categories and tags, and document-associated geo locations. With this background, a core research challenge presents itself as how to turn massive, (semi-)unstructured data into structured knowledge. One promising paradigm studied in this dissertation is to integrate structured and unstructured data, constructing an organized heterogeneous information network, and developing powerful modeling mechanisms on such organized network. We name it text-rich information network, since it is an integrated representation of both structured and unstructured textual data. To thoroughly develop the construction and modeling paradigm, this dissertation will focus on forming a scalable data-driven framework and propose a new line of techniques relying on the idea of phrase mining to bridge textual documents and structured entities. We will first introduce the phrase mining method named SegPhrase+ to globally discover semantically meaningful phrases from massive textual data, providing a high quality dictionary for text structuralization. Clearly distinct from previous works that mostly focused on raw statistics of string matching, SegPhrase+ looks into the phrase context and effectively rectifies raw statistics to significantly boost the performance. Next, a novel algorithm based on latent keyphrases is developed and adopted to largely eliminate irregularities in massive text via providing an consistent and interpretable document representation. As a critical process in constructing the network, it uses the quality phrases generated in the previous step as candidates. From them a set of keyphrases are extracted to represent a particular document with inferred strength through a statistical model. After this step, documents become more structured and are consistently represented in the form of a bipartite network connecting documents with quality keyphrases. A more heterogeneous text-rich information network can be constructed by incorporating different types of document-associated entities as additional nodes. Lastly, a general and scalable framework, Tensor2vec, are to be added to trational data minining machanism, as the latter cannot readily solve the problem when the organized heterogeneous network has nodes with different types. Tensor2vec is expected to elegantly handle relevance search, entity classification, summarization and recommendation problems, by making use of higher-order link information and projecting multi-typed nodes into a shared low-dimensional vectorial space such that node proximity can be easily computed and accurately predicted

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Full text link
    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research
    • …
    corecore