Multi-Source Multi-View Clustering via Discrepancy Penalty
With the advance of technology, entities can be observed in multiple views.
Multiple views containing different types of features can be used for
clustering. Although multi-view clustering has been successfully applied in
many applications, previous methods usually assume a complete instance
mapping between different views. In many real-world applications, information
can be gathered from multiple sources, while each source can contain multiple
views, which are more cohesive for learning. The views under the same source
are usually fully mapped, but they can be very heterogeneous. Moreover, the
mappings between different sources are usually incomplete and partially
observed, which makes it more difficult to integrate all the views across
different sources. In this paper, we propose MMC (Multi-source Multi-view
Clustering), which is a framework based on collective spectral clustering with
a discrepancy penalty across sources, to tackle these challenges. MMC has
several advantages compared with other existing methods. First, MMC can deal
with incomplete mapping between sources. Second, it considers the disagreements
between sources while treating views in the same source as a cohesive set.
Third, MMC also tries to infer the instance similarities across sources to
enhance the clustering performance. Extensive experiments conducted on
real-world data demonstrate the effectiveness of the proposed approach.
Online Unsupervised Multi-view Feature Selection
In the era of big data, it is becoming common to have data with multiple
modalities or coming from multiple sources, known as "multi-view data".
Because multi-view data are usually unlabeled and come from high-dimensional spaces
(such as language vocabularies), unsupervised multi-view feature selection is
crucial to many applications. However, it is nontrivial due to the following
challenges. First, there are too many instances or the feature dimensionality
is too large. Thus, the data may not fit in memory. How to select useful
features with limited memory space? Second, how to select features from
streaming data and handle concept drift? Third, how to leverage the
consistent and complementary information from different views to improve the
feature selection in the situation when the data are too big or come in as
streams? To the best of our knowledge, none of the previous works can solve all
the challenges simultaneously. In this paper, we propose an Online unsupervised
Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming
multi-view data in an online fashion. OMVFS embeds unsupervised feature
selection into a clustering algorithm via NMF with sparse learning. It further
incorporates graph regularization to preserve the local structure
information and help select discriminative features. Instead of storing all the
historical data, OMVFS processes the multi-view data chunk by chunk and
aggregates all the necessary information into several small matrices. By using
the buffering technique, the proposed OMVFS can reduce the computational and
storage cost while taking advantage of the structure information. Furthermore,
OMVFS can capture the concept drifts in the data streams. Extensive experiments
on four real-world datasets show the effectiveness and efficiency of the
proposed OMVFS method. More importantly, OMVFS is about 100 times faster than
the off-line methods.
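The abstract does not say which small matrices OMVFS keeps. The following is only a minimal sketch of the chunk-by-chunk idea, assuming a plain NMF model X ≈ U Vᵀ for a single view, in which the update of the feature matrix V depends on the data only through the products XᵀU (d x k) and UᵀU (k x k). The class name `ChunkStatistics`, the helper `transpose_times`, and the toy values are hypothetical and are not taken from the paper.

```python
def transpose_times(a, b):
    """Return a^T b for two row-major matrices with the same number of rows."""
    return [
        [sum(row_a[i] * row_b[j] for row_a, row_b in zip(a, b))
         for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


class ChunkStatistics:
    """Running sums for one view; their size depends on d and k, not on n."""

    def __init__(self, n_features, n_clusters):
        self.xtu = [[0.0] * n_clusters for _ in range(n_features)]  # d x k
        self.utu = [[0.0] * n_clusters for _ in range(n_clusters)]  # k x k

    def update(self, x_chunk, u_chunk):
        """Fold one chunk into the sums; the chunk can then be discarded."""
        for acc, new in ((self.xtu, transpose_times(x_chunk, u_chunk)),
                         (self.utu, transpose_times(u_chunk, u_chunk))):
            for i, row in enumerate(new):
                for j, value in enumerate(row):
                    acc[i][j] += value


# Toy view: 4 instances, 3 features, 2 clusters (values are made up).
X = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2]]
U = [[1, 0], [1, 0], [0, 1], [0, 1]]  # cluster indicators, assumed already estimated

stats = ChunkStatistics(n_features=3, n_clusters=2)
for start in range(0, len(X), 2):  # stream the data in chunks of two instances
    stats.update(X[start:start + 2], U[start:start + 2])

# The streamed sums match the products computed on the full data.
assert stats.xtu == transpose_times(X, U)
assert stats.utu == transpose_times(U, U)
```

Under this assumption, memory grows with the number of features and clusters rather than with the number of instances seen, which is consistent with the storage saving the abstract describes. A multi-view version would presumably keep one such object per view.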
Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial
Objective
It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence.
Methods
We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression.
Results
Article pairs from the same trial were identified with high accuracy (F1 score = 0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial.
Discussion
Although painstaking examination of the full text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
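The pipeline described above (pairwise similarity scores, a logistic model, then article clusters) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the Aggregator implementation: the metadata fields, the two Jaccard features, the weights and bias, the 0.5 threshold, and the use of connected components for clustering are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(x, y):
    """Similarity scores for one article pair (hypothetical metadata fields)."""
    return [
        jaccard(x["authors"], y["authors"]),
        jaccard(x["title"].lower().split(), y["title"].lower().split()),
    ]


def same_trial_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the similarity scores."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def cluster(articles, weights, bias, threshold=0.5):
    """Link pairs scored at or above the threshold; return connected components."""
    parent = {a["id"]: a["id"] for a in articles}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for x, y in combinations(articles, 2):
        p = same_trial_probability(pair_features(x, y), weights, bias)
        if p >= threshold:
            parent[find(x["id"])] = find(y["id"])

    groups = {}
    for a in articles:
        groups.setdefault(find(a["id"]), []).append(a["id"])
    return sorted(sorted(g) for g in groups.values())


# Made-up records and made-up (not fitted) model parameters.
ARTICLES = [
    {"id": "A", "authors": ["Smith J", "Lee K"],
     "title": "Drug X for asthma: a randomized trial"},
    {"id": "B", "authors": ["Smith J", "Lee K", "Wu H"],
     "title": "Drug X for asthma: two-year follow-up"},
    {"id": "C", "authors": ["Garcia M"],
     "title": "Drug X for asthma in children"},
]
WEIGHTS, BIAS = [4.0, 2.0], -3.0

print(cluster(ARTICLES, WEIGHTS, BIAS))  # [['A', 'B'], ['C']]
```

In practice the weights would be fitted on labeled positive and negative pairs, as in the Methods paragraph, rather than set by hand.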
Unsupervised Learning from Multi-view Data
With the advance of technology, data often have multiple modalities or come from multiple sources. Such data are called multi-view data. Usually, multiple views provide complementary information about semantically the same data. Learning from multi-view data can achieve better performance than relying on a single view. Moreover, as data volumes explode, most multi-view data are unlabeled, and labeling them is expensive. Thus, unsupervised learning from multi-view data is very important in many real-world applications. However, in real-world applications, multi-view data are usually heterogeneous (different feature spaces for different views), incomplete, large-scale, and high-dimensional. These challenges prevent us from applying existing unsupervised learning methods to real-world multi-view data.
This dissertation presents my Ph.D. research on unsupervised learning from multi-view data. First, we present the first algorithm to solve the multiple incomplete views clustering problem by collectively learning the kernel matrices for different views. Second, we propose a more general tensor-based multi-incomplete-view clustering method, which uses a tensor to model the multiple incomplete views and learns the latent features by sparse tensor factorization. Third, we present a faster multi-incomplete-view clustering algorithm based on weighted nonnegative matrix factorization. Lastly, we propose an online multi-view unsupervised feature selection algorithm to address the scalability and high-dimensionality challenges.
Nuggets: findings shared in multiple clinical case reports
OBJECTIVE: The researchers assessed prevalence in the clinical case report literature of multiple reports independently reporting the same (or nearly the same) main finding. METHODS: Results from forty-five PubMed queries were examined for incidence and features of main findings (“nuggets”) shared in at least four case reports. RESULTS: The authors found that nuggets are surprisingly prevalent and large in the case report literature; the largest found so far was reported in seventeen articles. In most cases, the main findings of case reports were evident from examining titles alone. CONCLUSIONS: Our curated examples should serve as gold standards for developing specific automated methods for finding nuggets. Nuggets potentially enable finding-based (instead of topic-based) information retrieval.
Improving Soil Enzyme Activities and Related Quality Properties of Reclaimed Soil by Applying Weathered Coal in Opencast-Mining Areas of the Chinese Loess Plateau
Reclaimed soil in opencast-mining areas of the Loess Plateau of China has many problems, such as poor soil structure and severe nutrient deficiency. To find a better way to improve soil quality, this study applied weathered coal to repair the soil medium and investigated the physicochemical properties of the reclaimed soil and the changes in enzyme activities after planting Robinia pseudoacacia. The results showed that applying weathered coal significantly improved the quality of soil aggregates, increased the content of water-stable aggregates, and significantly improved the organic matter, humus, and cation exchange capacity of the topsoil, but it did not have a significant effect on soil pH. Planting R. pseudoacacia significantly enhanced the activities of soil catalase, urease, and invertase, but the application of weathered coal inhibited the activity of catalase. Although applying an appropriate amount of weathered coal significantly increased urease activity, the activities of catalase, urease, and invertase were closely linked to the soil profile levels and time. This study suggests that applying weathered coal could improve the physicochemical properties and enzyme activities of reclaimed soil in opencast-mining areas of the Loess Plateau of China, and that the optimum application rate for reclaimed soil remediation is about 27 000 kg hm⁻².
lncRNA BREA2 promotes metastasis by disrupting the WWP2-mediated ubiquitination of Notch1.
Notch has been implicated in human cancers and is a putative therapeutic target. However, the regulation of Notch activation in the nucleus remains largely uncharacterized. Therefore, characterizing the detailed mechanisms governing Notch degradation will identify attractive strategies for treating Notch-activated cancers. Here, we report that the long noncoding RNA (lncRNA) BREA2 drives breast cancer metastasis by stabilizing the Notch1 intracellular domain (NICD1). Moreover, we reveal WW domain containing E3 ubiquitin protein ligase 2 (WWP2) as an E3 ligase for NICD1 at K1821 and a suppressor of breast cancer metastasis. Mechanistically, BREA2 impairs WWP2-NICD1 complex formation and in turn stabilizes NICD1, leading to Notch signaling activation and lung metastasis. BREA2 loss sensitizes breast cancer cells to inhibition of Notch signaling and suppresses the growth of breast cancer patient-derived xenograft tumors, highlighting its therapeutic potential in breast cancer. Taken together, these results reveal the lncRNA BREA2 as a putative regulator of Notch signaling and an oncogenic player driving breast cancer metastasis.