Discovering core terms for effective short text clustering
This thesis aims to address the current limitations in short text clustering and provides a systematic framework that includes three novel methods to effectively measure the similarity of two short texts, efficiently group short texts, and dynamically cluster short text streams.
Hybrid Variational Autoencoder for Time Series Forecasting
Variational autoencoders (VAEs) are powerful generative models that learn the
latent representations of input data as random variables. Recent studies show
that VAEs can flexibly learn the complex temporal dynamics of time series and
achieve more promising forecasting results than deterministic models. However,
a major limitation of existing works is that they fail to jointly learn the
local patterns (e.g., seasonality and trend) and temporal dynamics of time
series for forecasting. Accordingly, we propose a novel hybrid variational
autoencoder (HyVAE) to integrate the learning of local patterns and temporal
dynamics by variational inference for time series forecasting. Experimental
results on four real-world datasets show that the proposed HyVAE achieves
better forecasting results than various counterpart methods, as well as two
HyVAE variants that only learn the local patterns or temporal dynamics of time
series, respectively.
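The abstract does not spell out HyVAE's training objective. The sketch below is a generic illustration of the variational-inference machinery that any VAE-based forecaster relies on; it is not HyVAE's actual loss. It shows the reparameterization trick and the closed-form KL term for a diagonal Gaussian posterior measured against a standard normal prior. The sketch assumes a unit-variance Gaussian likelihood for the reconstruction term, and the function names are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick),
    so z is a deterministic, differentiable function of (mu, log_var) given eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(x, x_hat, mu, log_var):
    """Negative ELBO assuming a unit-variance Gaussian likelihood:
    half the squared reconstruction error (constants dropped) plus the KL term."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + gaussian_kl(mu, log_var)


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)  # one latent sample
loss = negative_elbo([1.0, 2.0], [0.9, 2.2], [0.0, 1.0], [0.0, -2.0])
```

In a real model, `mu` and `log_var` would come from an encoder network and `x_hat` from a decoder. Minimizing the negative ELBO trades reconstruction accuracy against keeping the posterior close to the prior.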
SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets
Shapelets that discriminate time series using local features (subsequences)
are promising for time series clustering. Existing time series clustering
methods may fail to capture representative shapelets because they discover
shapelets from a large pool of uninformative subsequences, which results in
low clustering accuracy. This paper proposes a Semi-supervised Clustering of
Time Series Using Representative Shapelets (SE-Shapelets) method, which
utilizes a small number of labeled and propagated pseudo-labeled time series to
help discover representative shapelets, thereby improving the clustering
accuracy. In SE-Shapelets, we propose two techniques to discover representative
shapelets for the effective clustering of time series: 1) a salient
subsequence chain that extracts salient subsequences (as candidate
shapelets) from a labeled/pseudo-labeled time series, which helps remove a
large number of uninformative subsequences from the pool; and 2) a linear
discriminant selection algorithm that identifies shapelets capturing
representative local features of time series in different classes, for
convenient clustering. Experiments on UCR time series datasets demonstrate that
SE-Shapelets discovers representative shapelets and achieves higher clustering
accuracy than counterpart semi-supervised time series clustering methods.
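Shapelet-based clustering usually represents each time series by its distances to a set of shapelets. The sketch below shows only that underlying distance: the minimum Euclidean distance between a shapelet and every equal-length subsequence of a series. It is not SE-Shapelets' salient subsequence chain or linear discriminant selection, which the abstract does not detail. The sketch assumes raw values; many shapelet methods z-normalize subsequences first. The function names are hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and every
    equal-length subsequence (sliding window) of a time series."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + j] - shapelet[j]) ** 2 for j in range(m)))
        best = min(best, d)
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector of its distances to the given shapelets;
    a standard clusterer (e.g., k-means) can then run on these vectors."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]


series = [[0, 1, 2, 3, 2, 1], [5, 5, 5, 5, 5, 5]]
features = shapelet_transform(series, [[1, 2, 3]])
```

A series that contains the shapelet pattern exactly gets distance 0. The quality of the resulting features therefore depends on how representative the chosen shapelets are, and that is the selection problem the abstract targets.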
Altered brain spontaneous activity in patients with cerebral small vessel disease using the amplitude of low-frequency fluctuation of different frequency bands
Background: Previous studies showed that cerebral small vessel disease (cSVD) is a leading cause of cognitive decline in elderly people and of the development of Alzheimer’s disease. Although the brain structural changes of cSVD have been well documented, the properties of intrinsic spontaneous brain activity in patients with cSVD remain unclear.
Methods: We collected resting-state fMRI (rs-fMRI) and T1-weighted 3D high-resolution brain structural images from 41 cSVD patients and 32 healthy controls (HC). By estimating the amplitude of low-frequency fluctuation (ALFF) in the whole brain under three different frequency bands (typical band: 0.01–0.1 Hz; slow-4: 0.027–0.073 Hz; and slow-5: 0.01–0.027 Hz), we analyzed band-specific ALFF differences between the cSVD patients and controls.
Results: The cSVD patients showed uniformly lower ALFF than the healthy controls in the typical and slow-4 bands (pFWE < 0.05). In the typical band, cSVD patients showed lower ALFF than the controls in voxels of the fusiform, hippocampus, inferior occipital cortex, middle occipital cortex, insula, inferior frontal cortex, rolandic operculum, and cerebellum. In the slow-4 band, cSVD patients showed lower ALFF than the controls in voxels of the cerebellum, hippocampus, occipital, and fusiform. However, there was no significant between-group difference in ALFF in the slow-5 band. Moreover, we found significant “group × frequency” interactions in the left precuneus.
Conclusion: Our results suggest that intrinsic spontaneous brain activity in cSVD patients is abnormal and frequency-specific. ALFF in the slow-4 band may be more sensitive for detecting malfunction in cSVD patients.
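The abstract does not describe its preprocessing pipeline. The sketch below is a minimal illustration of the commonly used ALFF definition: the average square root of the power spectrum (that is, the spectral amplitude) of a voxel's time series within a frequency band, computed separately for each of the three bands above. It rests on several assumptions:

- a single voxel's series
- mean removal only (no detrending, nuisance regression, or mALFF/zALFF normalization)
- repetition time `tr` given in seconds
- single-sided amplitude scaling of 2|X_k|/N
- a naive DFT, used for clarity rather than speed

The names `alff` and `BANDS` are hypothetical.

```python
import cmath
import math


def alff(signal, tr, f_low, f_high):
    """Mean single-sided spectral amplitude of `signal` within [f_low, f_high] Hz.

    signal: one voxel's BOLD time series; tr: repetition time in seconds."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]          # remove the DC component
    amps = []
    for k in range(1, n // 2 + 1):          # positive frequencies only
        freq = k / (n * tr)
        if f_low <= freq <= f_high:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            amps.append(2.0 * abs(coeff) / n)  # sqrt(power) = |X_k|, scaled to amplitude
    return sum(amps) / len(amps) if amps else 0.0


BANDS = {
    "typical": (0.01, 0.1),
    "slow-4": (0.027, 0.073),
    "slow-5": (0.01, 0.027),
}

tr = 2.0  # hypothetical repetition time in seconds
signal = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(200)]  # 0.05 Hz oscillation
values = {name: alff(signal, tr, lo, hi) for name, (lo, hi) in BANDS.items()}
```

In this synthetic example, the 0.05 Hz oscillation falls inside slow-4 but outside slow-5. Only the slow-4 (and typical) values are therefore non-negligible, which is the kind of band-specific contrast the study examines.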
The Vehicle Determines the Destination: The Significance of Seminal Plasma Factors for Male Fertility
Of all human infertility cases, up to 50% involve contributing factors that lead to defects in male reproductive physiology. Seminal plasma (SP) is the biological fluid derived from the male accessory sex glands; it carries spermatozoa through the male and female reproductive tracts during ejaculation. It contains a complex set of heterogeneous molecular structures, including proteins, cell-free nucleic acids (DNA, microRNA, and lncRNA), small-molecule metabolites, and inorganic chemicals (ions). For a long time, the significance of seminal plasma factors was underestimated, and their function was thought to be restricted to spermatozoa transport and protection. In recent years, however, significant advances have been made in dissecting seminal plasma components, revealing new insights into multiple aspects of sperm function, as well as fertilization and pregnancy outcomes. In this review, we summarize state-of-the-art discoveries regarding SP composition and its implications for male fertility, particularly describing the novel understanding of seminal plasma components and related modifications gained through “omics” approaches, with a main focus on proteome and RNA-seq data from the latest decade. We also highlight the proposed mechanisms of SP-mediated immunomodulation in the female reproductive tract. Finally, we discuss the proteins investigated as non-invasive diagnostic biomarkers for male infertility in the clinic.
Modeling user preferences on spatiotemporal topics for point-of-interest recommendation
With the development of location-based social networks (LBSNs) and the popularity of mobile devices, enough user check-in data has accumulated to enable personalized point-of-interest (POI) recommendation services. In this paper, we propose a scheme for modeling users' preferences on spatiotemporal topics (the UPOST scheme) for accurate individualized POI recommendation. In the UPOST scheme, we discover temporal topics from semantic locations (i.e., the words people use to describe a location) to learn users' preferences. UPOST infers each user's preferences for different types of places during different periods by learning spatiotemporal topics from the user's historical semantic locations. We have developed two algorithms under the UPOST scheme: the time division LDA algorithm (TDLDA) and the time adaptive topic discovery algorithm (TATD). In TDLDA, we divide the check-in dataset into different time segments and train a separate LDA model for each segment. We then improve on TDLDA by developing the TATD algorithm to discover spatiotemporal topics. The experimental results demonstrate the effectiveness of the UPOST scheme: both TDLDA and TATD outperform the counterpart methods that do not consider semantic locations.
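The TDLDA step described above, which splits check-ins into time segments and fits one LDA model per segment, can be sketched as follows. The sketch makes these assumptions, none of which come from the paper:

- each check-in is a `(user, hour, words)` tuple
- segments are defined by hypothetical start hours
- each user's words within a segment form one document
- LDA is fit with a minimal collapsed Gibbs sampler instead of whatever implementation the authors used

The function names are hypothetical. TATD's adaptive segmentation is not shown.

```python
import random
from collections import defaultdict


def split_by_time(checkins, boundaries):
    """Group check-ins into time segments, building one word document per user per segment.

    checkins: iterable of (user, hour, words), where words describe the location.
    boundaries: sorted segment start hours beginning at 0, e.g. [0, 6, 12, 18]."""
    segments = defaultdict(lambda: defaultdict(list))
    for user, hour, words in checkins:
        seg = max(i for i, start in enumerate(boundaries) if start <= hour)
        segments[seg][user].extend(words)
    return {seg: list(docs.values()) for seg, docs in segments.items()}


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA; returns one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][idx[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], idx[w]
                # Remove the current token before resampling its topic.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [{w: (nkw[t][idx[w]] + beta) / (nk[t] + n_words * beta) for w in vocab}
            for t in range(n_topics)]


def tdlda(checkins, boundaries, n_topics, iters=200, seed=0):
    """TDLDA-style sketch: an independent LDA model for each time segment."""
    return {seg: lda_gibbs(docs, n_topics, iters=iters, seed=seed)
            for seg, docs in split_by_time(checkins, boundaries).items()}


checkins = [("u1", 8, ["coffee", "cafe"]), ("u2", 9, ["coffee", "bakery"]),
            ("u1", 20, ["bar", "beer"]), ("u2", 21, ["beer", "pub"])]
models = tdlda(checkins, boundaries=[0, 12], n_topics=2, iters=50)
```

Because each segment gets its own model, morning and evening topics are learned from disjoint vocabularies. In practice, a library implementation such as scikit-learn's `LatentDirichletAllocation` could replace the hand-written sampler.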
Fake News Quick Detection on Dynamic Heterogeneous Information Networks
The spread of fake news has caused great harm to society in recent years, so
the quick detection of fake news has become an important task. Some current
detection methods model news articles and other related components as a
static heterogeneous information network (HIN) and use expensive
message-passing algorithms. However, in the real world, quickly identifying
fake news is of great significance, and the network may change over time as
nodes and edges appear and disappear. Therefore, in this paper, we propose a novel
Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick
detection. More specifically, we first use BERT and fine-tuned BERT to
obtain semantic representations of news article contents and author profiles
and convert them into graph data. Then, we construct a heterogeneous
news-author graph to reflect contextual information and relationships.
Additionally, we adapt ideas from personalized PageRank propagation and dynamic
propagation to heterogeneous networks in order to reduce the time complexity of
back-propagating through many nodes during training. Experiments on three
real-world fake news datasets show that DHGNN can outperform other GNN-based
models in terms of both effectiveness and efficiency.
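The abstract says DHGNN adapts personalized PageRank propagation to reduce the cost of back-propagating through many nodes. The sketch below shows the plain, homogeneous form of that idea in the style of APPNP: node features are smoothed by repeated propagation with teleport probability `alpha`, and this step has no learned parameters. It does not show DHGNN's heterogeneous or dynamic extensions. The sketch assumes undirected edges, added self-loops, and row normalization (APPNP itself uses symmetric normalization). The function name and the toy graph are hypothetical.

```python
def ppr_propagate(features, edges, alpha=0.1, k_steps=10):
    """APPNP-style propagation: Z <- (1 - alpha) * A_hat @ Z + alpha * H.

    features: list of per-node feature vectors H (e.g., BERT-derived embeddings).
    edges: undirected (u, v) pairs; self-loops are added and A_hat is row-normalized."""
    n = len(features)
    neighbors = [{i} for i in range(n)]  # start each node's neighborhood with a self-loop
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    z = [list(h) for h in features]
    for _ in range(k_steps):
        new_z = []
        for i in range(n):
            deg = len(neighbors[i])
            # Mean over the neighborhood = one row of the row-normalized A_hat times Z.
            agg = [sum(z[j][d] for j in neighbors[i]) / deg for d in range(len(z[i]))]
            # Teleport back to the node's own features with probability alpha.
            new_z.append([(1 - alpha) * a + alpha * h for a, h in zip(agg, features[i])])
        z = new_z
    return z


# Hypothetical news-author graph: nodes 0-1 are articles, node 2 is their author,
# node 3 is an isolated article.
h = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
z = ppr_propagate(h, [(0, 2), (1, 2)], alpha=0.1, k_steps=10)
```

Because propagation is a fixed, parameter-free operation, a model can apply a neural transform to `features` and then propagate. Gradients then flow only through the transform, which is the efficiency argument behind this family of methods.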
Reconnecting the Estranged Relationships: Optimizing the Influence Propagation in Evolving Networks
Influence Maximization (IM), which aims to select a set of users from a
social network to maximize the expected number of influenced users, has
recently received significant attention for mass communication and commercial
marketing. Existing research efforts dedicated to the IM problem depend on a
strong assumption: the selected seed users are willing to spread the
information after receiving benefits from a company or organization. In
reality, however, some seed users may be reluctant to spread the information,
or may need to be paid more to be motivated. Furthermore, existing IM works
pay little attention to capturing users' influence propagation in future
periods. In this paper, we target a new research problem, the
Reconnecting Top-l Relationships (RTlR) query, which aims to find l
previously existing relationships that later became estranged, such that
reconnecting these relationships will maximize the expected benefit of the
users influenced by the given group in a future period. We prove that the RTlR
problem is NP-hard. An efficient greedy algorithm is proposed to answer the
RTlR queries using an influence estimation technique and a well-chosen link
prediction method that predicts the near-future network structure. We also design
a pruning method to reduce unnecessary probing of candidate edges. Further, a
carefully designed order-based algorithm is proposed to accelerate the RTlR
queries. Finally, we conduct extensive experiments on real-world datasets to
demonstrate the effectiveness and efficiency of our proposed methods.
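As a rough illustration of the greedy strategy described above, the sketch below picks l candidate edges one at a time. Each round, it adds the edge with the largest estimated marginal gain in the spread from a given seed group. Spread is estimated by Monte Carlo simulation of the independent cascade model. The sketch rests on several assumptions:

- directed edges
- a uniform propagation probability `p`
- benefit equal to the number of influenced users

It omits the paper's link prediction, pruning, and order-based acceleration. All names are hypothetical.

```python
import random


def estimate_spread(adj, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of nodes activated by `seeds`
    under the independent cascade model with uniform edge probability p."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs


def greedy_reconnect(adj, seeds, candidates, l, p=0.1, runs=200, seed=0):
    """Greedily choose up to l candidate (u, v) edges to reconnect, each round
    adding the edge with the largest estimated marginal gain in spread."""
    rng = random.Random(seed)
    graph = {u: set(vs) for u, vs in adj.items()}  # copy so the input is not mutated
    chosen = []
    remaining = list(candidates)
    for _ in range(min(l, len(remaining))):
        base = estimate_spread(graph, seeds, p, runs, rng)
        best_edge, best_gain = None, float("-inf")
        for u, v in remaining:
            out = graph.setdefault(u, set())
            existed = v in out
            out.add(v)  # tentatively reconnect the edge
            gain = estimate_spread(graph, seeds, p, runs, rng) - base
            if not existed:
                out.discard(v)  # undo the tentative edge
            if gain > best_gain:
                best_edge, best_gain = (u, v), gain
        chosen.append(best_edge)
        remaining.remove(best_edge)
        graph.setdefault(best_edge[0], set()).add(best_edge[1])
    return chosen


adj = {0: {1}, 2: {4}, 4: {5}}
picked = greedy_reconnect(adj, seeds=[0], candidates=[(0, 3), (1, 2)], l=1, p=1.0, runs=5)
```

In this toy graph with `p=1.0`, reconnecting (1, 2) makes nodes 2, 4, and 5 reachable from the seed, while (0, 3) adds only node 3. The greedy step therefore picks (1, 2). Each round costs one spread estimate per remaining candidate, which is why pruning candidate edges matters at scale.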