71 research outputs found
Robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization
Non-negative matrix factorization (NMF) has been widely used in machine learning and data mining. As an extension of NMF, non-negative matrix tri-factorization (NMTF) provides more degrees of freedom than NMF. However, the standard NMTF algorithm uses the Frobenius norm to compute the residual error, which can be dramatically affected by noise and outliers. Moreover, the hidden geometric information in the feature manifold and the sample manifold is rarely learned. Hence, a novel robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization (RCHNMTF) is proposed. First, a robust capped norm is adopted to handle extreme outliers. Second, dual hyper-graph regularization is used to exploit the intrinsic geometric information in the feature manifold and the sample manifold. Third, orthogonality constraints are added to learn a unique data representation and improve clustering performance. Experiments on seven datasets demonstrate the robustness and superiority of RCHNMTF.
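To make the contrast between the Frobenius norm and a capped norm concrete, here is a minimal sketch. The abstract does not define the capped norm, so the form used here (the sum over rows of min(row L2 norm, theta), with each row treated as one sample) is an assumption based on a common definition, and the function names and toy residual are hypothetical.

```python
import math

def frobenius_sq(residual):
    """Squared Frobenius norm: sum of all squared entries."""
    return sum(v * v for row in residual for v in row)

def capped_l21(residual, theta):
    """Assumed capped norm: each row's L2 norm is clipped at theta, then summed."""
    return sum(min(math.sqrt(sum(v * v for v in row)), theta) for row in residual)

# Toy residual (hypothetical): two small rows plus one extreme outlier row.
residual = [[0.1, 0.0], [0.0, 0.2], [30.0, 40.0]]

print(frobenius_sq(residual))           # dominated by the outlier row
print(capped_l21(residual, theta=1.0))  # outlier contributes at most theta
```

In this toy case the outlier row accounts for almost all of the squared Frobenius error, while under the capped form it contributes no more than theta, which is the robustness property the abstract refers to.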
Non-negative Matrix Factorization: A Survey
CAUL read and publish agreement 2022Publishe
Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing
Hyperspectral imaging, also known as image spectrometry, is a landmark
technique in geoscience and remote sensing (RS). In the past decade, enormous
efforts have been made to process and analyze these hyperspectral (HS) products
mainly by means of seasoned experts. However, with the ever-growing volume of
data, the bulk of costs in manpower and material resources poses new challenges
in reducing the burden of manual labor and improving efficiency. It is
therefore urgent to develop more intelligent and automatic
approaches for various HS RS applications. Machine learning (ML) tools with
convex optimization have successfully undertaken the tasks of numerous
artificial intelligence (AI)-related applications. However, their ability in
handling complex practical problems remains limited, particularly for HS data,
due to the effects of various spectral variabilities in the process of HS
imaging and the complexity and redundancy of higher dimensional HS signals.
Compared to the convex models, non-convex modeling, which is capable of
characterizing more complex real scenes and providing the model
interpretability technically and theoretically, has been proven to be a
feasible solution to reduce the gap between challenging HS vision tasks and
currently advanced intelligent data processing models
Mixed Membership Distribution-Free Model
We consider the problem of detecting latent community information of mixed
membership weighted networks in which nodes have mixed memberships and edge
weights between nodes can be any finite real numbers. We propose a
general mixed membership distribution-free model for this problem. The model
places no distributional constraint on the elements of the adjacency matrix,
only on their expected values, and can be viewed as a generalization of several
previous models, including the well-known mixed membership stochastic
blockmodel. In particular, signed networks in which nodes can belong to
multiple communities can be
generated from our model. We use an efficient spectral algorithm to estimate
community memberships under the model. We derive the convergence rate of the
proposed algorithm under the model using spectral analysis. We demonstrate the
advantages of the mixed membership distribution-free model and the algorithm
with applications to small-scale simulated networks in which the adjacency
matrix's elements follow different distributions. We have also applied the
algorithm to five real-world weighted network data sets with encouraging
results. Comment: 23 pages, 14 figures, 3 tables, comments are welcome
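The idea of constraining only the expected adjacency matrix can be sketched as follows. The abstract does not state the exact form of the expectation, so the blockmodel-style form Omega = Pi * P * Pi^T (membership matrix Pi, community connectivity matrix P) is an assumption, as are all names, the toy values, and the Gaussian noise choice; any zero-mean noise would serve the same purpose in this sketch.

```python
import random

def expected_adjacency(Pi, P):
    """Omega = Pi * P * Pi^T for n x K memberships Pi and a K x K matrix P."""
    n, K = len(Pi), len(P)
    return [[sum(Pi[i][k] * P[k][l] * Pi[j][l]
                 for k in range(K) for l in range(K))
             for j in range(n)] for i in range(n)]

def sample_symmetric(Omega, noise, rng):
    """Draw a symmetric A with E[A[i][j]] = Omega[i][j]; noise must have mean zero."""
    n = len(Omega)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # diagonal entries included for simplicity
            A[i][j] = A[j][i] = Omega[i][j] + noise(rng)
    return A

# Hypothetical toy setup: two pure nodes and one node split evenly across
# two communities; P has negative entries, so edge weights can be signed.
Pi = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
P = [[1.0, -0.5], [-0.5, 1.0]]

Omega = expected_adjacency(Pi, P)
rng = random.Random(0)
A = sample_symmetric(Omega, lambda r: r.gauss(0.0, 0.1), rng)
```

Swapping the lambda for another zero-mean noise source changes the distribution of the edge weights without changing their expectation.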
Bipartite Mixed Membership Distribution-Free Model. A novel model for community detection in overlapping bipartite weighted networks
Modeling and estimating mixed memberships for undirected unweighted
networks in which nodes can belong to multiple communities has been well
studied in recent years. However, to our knowledge, there is no model for the
more general case of bipartite weighted networks, in which nodes can belong to
multiple communities, row nodes can differ from column nodes, and the elements
of the adjacency matrix can be any finite real values.
To close this gap, this paper introduces a novel
model, the Bipartite Mixed Membership Distribution-Free (BiMMDF) model. As a
special case, bipartite signed networks with mixed memberships can also be
generated from BiMMDF. The advantage of our model is that all elements
of an adjacency matrix can be generated from any distribution, as long as the
expected adjacency matrix has a block structure related to node memberships
under BiMMDF. The proposed model can be viewed as an extension of many previous
models, including the popular mixed membership stochastic blockmodels. An
efficient algorithm with a theoretical guarantee of consistent estimation is
applied to fit BiMMDF. In particular, for a standard bipartite weighted network
with two row (and column) communities, to make the algorithm's error rates
small with high probability, separation conditions are obtained when adjacency
matrices are generated from different distributions under BiMMDF. The
differences in separation conditions across distributions are verified on
extensive synthetic bipartite weighted networks generated under BiMMDF.
Experiments on real-world directed weighted networks illustrate the advantage
of the algorithm in studying highly mixed nodes and asymmetry between row and
column communities. Comment: 33 pages, 12 figures, 4 tables
COMMUNITY DETECTION IN GRAPHS
Thesis (Ph.D.) - Indiana University, Luddy School of Informatics, Computing, and Engineering/University Graduate School, 2020
Community detection has always been one of the fundamental research topics in graph mining. As a type of unsupervised or semi-supervised approach, community detection aims to explore high-order closeness between nodes by leveraging graph topological structure. By grouping similar nodes or edges into the same community while separating dissimilar ones into different communities, graph structure can be revealed at a coarser resolution. This can benefit numerous applications, such as shopping recommendation and advertising in e-commerce, protein-protein interaction prediction in bioinformatics, and literature recommendation or scholar collaboration in citation analysis. However, identifying communities is an ill-defined problem. Due to the No Free Lunch theorem [1], there is neither a gold standard that represents a perfect community partition nor a universal method that can detect satisfactory communities for all tasks on all types of graphs. To give a global view of this research topic, I summarize state-of-the-art community detection methods by categorizing them based on graph types, research tasks, and methodology frameworks. As academic exploration of community detection has grown rapidly in recent years, I focus particularly on state-of-the-art works published in the last decade, which may leave out some classic models published decades ago. In addition, three subtle community detection tasks are proposed and assessed in this dissertation. First, unlike general models that consider only graph structure, personalized community detection uses user need as auxiliary information to guide community detection. The result is fine-grained communities for nodes that better match user needs and coarser-resolution communities for the remaining, less relevant nodes. Second, graphs often suffer from sparse connectivity. Applying conventional models directly to such graphs may severely distort the quality of the generated communities. To tackle this problem, cross-graph techniques are used to propagate external graph information in support of community detection on the target graph. Third, graph community structure supports a natural language processing (NLP) task that depicts intrinsic node characteristics by generating node summaries via a text generative model. The contribution of this dissertation is threefold. First, a substantial body of research is reviewed and summarized under a well-defined taxonomy. Existing works on methods, evaluation, and applications are all addressed in the literature review. Second, three novel community detection tasks are presented, and associated models are proposed and evaluated against state-of-the-art baselines on various datasets. Third, the limitations of current works are pointed out and promising future research directions are discussed.