Search CORE


Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

Author: Hosseini-Asl Ehsan
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/05/2016

The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models must generalize, in order to reduce their specificity or bias toward the training dataset. Unsupervised feature learning is useful for taking advantage of the large amounts of unlabeled data that are available to capture these variations. However, the learned features are required to capture variational patterns in the data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction, with applications to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, Autoencoders, and 3D Convolutional Autoencoders are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity, and part-based representation constraints for generalized and transferable feature extraction.

University of Louisville
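The abstract above names Nonnegative Matrix Factorization with nonnegativity and sparsity constraints as one of the feature learners. As a rough illustration only, the sketch below uses Lee and Seung's multiplicative updates with an added L1 penalty on the coefficient matrix, a common sparse NMF variant; it is not the dissertation's formulation, and the function names (`sparse_nmf`, `matmul`, `frobenius_error`) and the penalty weight `l1` are assumptions made for this example. Plain Python lists keep it dependency-free.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - r) ** 2
                         for vrow, rrow in zip(V, WH)
                         for v, r in zip(vrow, rrow)))


def sparse_nmf(V, k, l1=0.1, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative V (m x n) as W (m x k) times H (k x n), W, H >= 0.

    Multiplicative updates for ||V - WH||_F^2 / 2 + l1 * sum(H); the l1 term
    pushes the coefficients in H toward zero (sparser encodings).
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H + l1): stays nonnegative by construction.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + l1 + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T).
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # Rescale each basis column of W to unit length and compensate in H,
        # so WH is unchanged but the l1 penalty cannot be dodged by scaling.
        for j in range(k):
            norm = math.sqrt(sum(W[i][j] ** 2 for i in range(m))) or 1.0
            for i in range(m):
                W[i][j] /= norm
            for c in range(n):
                H[j][c] *= norm
    return W, H
```

Columns of `W` act as part-based basis vectors, and each column of `H` is the sparse, nonnegative encoding of the matching column of `V`. For real data, a vectorized library implementation such as scikit-learn's `NMF` would be the practical choice.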

Context based multimedia information retrieval

Author: Mølgaard Lasse Lohilahti
Publication venue: Technical University of Denmark
Publication date: 01/12/2009

Manifold-driven Grouping of Skeletal Muscle Fibers

Author: Bassez Guillaume
Besbes Ahmed
Deux Jean-Francois
Fleury Gilles
Komodakis Nikos
Langs Georg
Maatouk Mezri
Neji Radhouene
Paragios Nikolaos
Rahmouni Alain
Publication venue: HAL CCSD
Publication date: 01/01/2009

In this report, we present a manifold clustering method for the classification of fibers obtained from diffusion tensor images (DTI) of the human skeletal muscle. To this end, we propose the use of angular Hilbertian metrics between multivariate normal distributions to define a family of distances between tensors that we generalize to fibers. The resulting metrics between fiber tracts encompass both diffusion and localization information. For clustering, we use two methods. The first approach is based on diffusion maps and k-means clustering in the spectral embedding space. The second approach uses a linear programming formulation of prototype-based clustering. This formulation allows classification over manifolds without the need to embed the data in low-dimensional spaces, and it automatically determines the number of clusters. The proposed framework is validated experimentally on a significant, manually annotated dataset of DTI of the calf muscle from healthy and diseased subjects.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL - UPEC / UPEM

HAL-Rennes 1

MODELING AND QUANTITATIVE ANALYSIS OF WHITE MATTER FIBER TRACTS IN DIFFUSION TENSOR IMAGING

Author: Liang Xuwei
Publication venue: UKnowledge
Publication date: 01/01/2011

Diffusion tensor imaging (DTI) is a structural magnetic resonance imaging (MRI) technique that records the incoherent motion of water molecules; it has been used in clinical studies to detect microstructural white matter alterations and to explore certain brain disorders. A variety of DTI-based techniques for detecting brain disorders and facilitating clinical group analysis have been developed in the past few years. However, two crucial issues greatly affect the performance of those algorithms. One is that brain neural pathways form complicated 3D structures that cannot be appropriately or accurately approximated by simple 2D structures; the other concerns the computational efficiency of classifying white matter tracts. The first key area that this dissertation focuses on is a novel computing scheme for estimating regional white matter alterations along neural pathways in 3D space. The proposed method relies on white matter tractography and geodesic distance mapping. We propose a mask scheme to overcome the difficulty of reconstructing thin tract bundles. Real DTI data are employed to demonstrate the performance of the proposed technique. Experimental results show that the proposed method has great potential to provide a sensitive approach for determining white matter integrity in the human brain. Another core objective of this work is to develop a class of new modeling and clustering techniques with improved performance and noise resistance for separating reconstructed white matter tracts to facilitate clinical group analysis. Different strategies are presented to handle different scenarios. For white matter tracts reconstructed by whole-brain tractography, a Fourier descriptor model and a clustering algorithm based on a multivariate Gaussian mixture model and expectation maximization are proposed. Outliers are easily handled in this framework. Experimental results on real DTI data show that the proposed algorithm is relatively effective and may offer an alternative to existing white matter fiber clustering methods. For a small number of white matter fibers, a modeling and clustering algorithm capable of handling fibers of unequal length that share no common starting region is also proposed and evaluated with real DTI data.
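For whole-brain tractography, the abstract above pairs a Fourier descriptor model with a Gaussian mixture model fitted by expectation maximization. The sketch below shows only the EM clustering step, and only in one dimension: it assumes each fiber has already been reduced to a single scalar feature, whereas the dissertation uses multivariate Fourier descriptors. The names `gmm_em` and `gaussian_pdf`, the initial means, and the variance floor `min_var` are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em(xs, init_means, iters=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by expectation maximization.

    Returns mixture weights, means, variances and a hard label per point
    (the component with the highest responsibility).
    """
    k, n = len(init_means), len(xs)
    weights, mus, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - mus[j]) ** 2
                                            for r, x in zip(resp, xs)) / nj)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, mus, variances, labels
```

Moving to descriptor vectors replaces each variance with a covariance matrix and the scalar density with a multivariate normal density; the E-step and M-step keep the same structure.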

Software expert discovery via knowledge domain embeddings in a collaborative network

Author: Benatallah B
Huang C
Wang X
Yao L
Zhang X
Publication venue: Elsevier BV
Publication date: 01/01/2020

© 2018 Elsevier B.V. Community Question Answering (CQA) websites are arguably the major venues for knowledge sharing and currently the most effective way of exchanging knowledge. Because massive numbers of users participate online and generate huge amounts of data, managing this knowledge systematically can be challenging. Expert recommendation is one of the major challenges: it highlights CQA users with potential expertise, which may help match unresolved questions with existing high-quality answers, and it may also give external services, such as human resource systems, an additional reference for evaluating their candidates. In this paper, we propose to explore experts in CQA websites. We take advantage of recent distributed word representation technology to help summarize text chunks and, from a semantic view, exploit the relationships between natural language phrases to extract latent knowledge domains. Based on these domains, users' expertise is determined from their historical performance, and a ranking can be computed to give recommendations accordingly. In particular, Stack Overflow is chosen as our dataset to test and evaluate our work, where extensive experiments show the competence of our approach.

OPUS - University of Technology Sydney

Constructing and modeling text-rich information networks: a phrase mining-based approach

Author: Liu Jialu
Publication venue
Publication date: 01/08/2016

A lot of digital ink has been spilled on "big data" over the past few years, which is often characterized by an explosion of information. Most of this surge originates from unstructured data in the wild, such as words, images, and video, as compared to the structured information stored in fielded form in databases. The proliferation of text-heavy data is particularly overwhelming, reflected in everyone's daily life in the form of web documents, business reviews, news, social posts, etc. Meanwhile, textual data and structured entities often come intertwined, such as authors/posters, document categories and tags, and document-associated geo-locations. Against this background, a core research challenge is how to turn massive, (semi-)unstructured data into structured knowledge. One promising paradigm studied in this dissertation is to integrate structured and unstructured data, constructing an organized heterogeneous information network and developing powerful modeling mechanisms on such an organized network. We call it a text-rich information network, since it is an integrated representation of both structured and unstructured textual data. To thoroughly develop the construction and modeling paradigm, this dissertation focuses on forming a scalable data-driven framework and proposes a new line of techniques that rely on the idea of phrase mining to bridge textual documents and structured entities. We first introduce a phrase mining method named SegPhrase+ to globally discover semantically meaningful phrases from massive textual data, providing a high-quality dictionary for text structuralization. Clearly distinct from previous works that mostly focused on raw statistics of string matching, SegPhrase+ looks into the phrase context and effectively rectifies raw statistics to significantly boost performance.
Next, a novel algorithm based on latent keyphrases is developed and adopted to largely eliminate irregularities in massive text by providing a consistent and interpretable document representation. As a critical step in constructing the network, it uses the quality phrases generated in the previous step as candidates. From these, a set of keyphrases is extracted to represent a particular document, with strengths inferred through a statistical model. After this step, documents become more structured and are consistently represented in the form of a bipartite network connecting documents with quality keyphrases. A more heterogeneous text-rich information network can be constructed by incorporating different types of document-associated entities as additional nodes. Lastly, a general and scalable framework, Tensor2vec, is added to traditional data mining mechanisms, as the latter cannot readily solve the problem when the organized heterogeneous network has nodes of different types. Tensor2vec is expected to elegantly handle relevance search, entity classification, summarization, and recommendation problems by making use of higher-order link information and projecting multi-typed nodes into a shared low-dimensional vector space, such that node proximity can be easily computed and accurately predicted.

Multiple Instance Learning: A Survey of Problem Characteristics and Applications

Author: Carbonneau Marc-André
Cheplygina Veronika
Gagnon Ghyslain
Granger Eric
Publication venue: Elsevier BV
Publication date: 10/12/2016

Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics that define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.

arXiv.org e-Print Archive
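The bag-level labeling described in the MIL abstract above can be made concrete by contrasting two assumptions about how instance labels relate to the bag label. A minimal sketch, assuming a hypothetical instance scorer `score` and a threshold of 0.5: under the standard MI assumption a bag is positive when at least one instance is positive (max-pooling), while under a collective assumption, one of the alternatives discussed in the MIL literature, every instance contributes (here, mean-pooling). The function names are illustrative, not taken from the survey.

```python
from statistics import mean
from typing import Callable, Sequence

Instance = float  # hypothetical: each instance is a single scalar feature


def predict_bag_standard(bag: Sequence[Instance],
                         score: Callable[[Instance], float],
                         threshold: float = 0.5) -> bool:
    """Standard MI assumption: the bag is positive if any instance is positive."""
    return max(score(x) for x in bag) >= threshold


def predict_bag_collective(bag: Sequence[Instance],
                           score: Callable[[Instance], float],
                           threshold: float = 0.5) -> bool:
    """Collective assumption: all instances contribute (here, via the mean score)."""
    return mean([score(x) for x in bag]) >= threshold
```

On the same bag the two rules can disagree: a single strongly positive instance is enough under max-pooling but can be outvoted under mean-pooling, which is one reason MIL method performance varies with how bags are composed.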