18 research outputs found

    Discovering core terms for effective short text clustering

    This thesis aims to address current limitations in short text clustering and provides a systematic framework comprising three novel methods: one to effectively measure the similarity of two short texts, one to efficiently group short texts, and one to dynamically cluster short text streams.

    Hybrid Variational Autoencoder for Time Series Forecasting

    Variational autoencoders (VAEs) are powerful generative models that learn the latent representations of input data as random variables. Recent studies show that VAEs can flexibly learn the complex temporal dynamics of time series and achieve more promising forecasting results than deterministic models. However, a major limitation of existing works is that they fail to jointly learn the local patterns (e.g., seasonality and trend) and the temporal dynamics of time series for forecasting. Accordingly, we propose a novel hybrid variational autoencoder (HyVAE) that integrates the learning of local patterns and temporal dynamics by variational inference for time series forecasting. Experimental results on four real-world datasets show that the proposed HyVAE achieves better forecasting results than various counterpart methods, as well as two HyVAE variants that learn only the local patterns or only the temporal dynamics of time series, respectively.
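
    The abstract does not describe HyVAE's architecture, but the variational inference it relies on follows the standard VAE recipe. The sketch below is a minimal, framework-free illustration of two generic VAE building blocks, assuming a diagonal Gaussian posterior and a standard normal prior: the reparameterized latent sample and the closed-form KL regularizer. The function names and example inputs are hypothetical, and nothing here models seasonality, trend, or temporal dynamics.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    # z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).
    # Isolating the noise in eps is what lets autodiff frameworks
    # backpropagate through mu and logvar (plain Python floats here).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), the regularizer
    # that appears in the VAE evidence lower bound.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


rng = random.Random(42)
mu, logvar = [0.5, -1.0], [0.0, -2.0]  # hypothetical encoder outputs
z = reparameterize(mu, logvar, rng)
print(z, kl_to_standard_normal(mu, logvar))
```

    In a real model, `mu` and `logvar` would be produced by an encoder network and the KL term would be added to a reconstruction or forecasting loss.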

    SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets

    Shapelets, which discriminate time series using local features (subsequences), are promising for time series clustering. Existing time series clustering methods may fail to capture representative shapelets because they discover shapelets from a large pool of uninformative subsequences, which results in low clustering accuracy. This paper proposes Semi-supervised Clustering of Time Series Using Representative Shapelets (SE-Shapelets), a method that uses a small number of labeled and propagated pseudo-labeled time series to help discover representative shapelets, thereby improving clustering accuracy. SE-Shapelets introduces two techniques for discovering representative shapelets for effective time series clustering: 1) a salient subsequence chain (SSC), which extracts salient subsequences (as candidate shapelets) from a labeled or pseudo-labeled time series and thereby removes large numbers of uninformative subsequences from the pool; and 2) a linear discriminant selection (LDS) algorithm, which identifies shapelets that capture representative local features of time series in different classes, facilitating clustering. Experiments on UCR time series datasets demonstrate that SE-Shapelets discovers representative shapelets and achieves higher clustering accuracy than counterpart semi-supervised time series clustering methods.
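
    Shapelet-based clustering commonly maps each time series to its distances from a set of shapelets, where the distance to one shapelet is the minimum Euclidean distance over all equal-length subsequences of the series. The sketch below shows that generic shapelet transform, assuming raw (not z-normalized) series; it does not implement the SSC or LDS steps of SE-Shapelets, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and any equal-length
    subsequence of `series` (sliding window, no normalization)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        # Squared distance between the shapelet and the window at `start`.
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)


def shapelet_transform(dataset: List[Sequence[float]],
                       shapelets: List[Sequence[float]]) -> List[List[float]]:
    """Map each time series to a feature vector of shapelet distances."""
    return [[shapelet_distance(ts, s) for s in shapelets] for ts in dataset]


series = [0.0, 1.0, 3.0, 1.0, 0.0]
print(shapelet_distance(series, [1.0, 3.0, 1.0]))  # 0.0: exact match at index 1
```

    The resulting feature vectors could then be passed to any standard clustering algorithm, such as k-means.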

    Altered brain spontaneous activity in patients with cerebral small vessel disease using the amplitude of low-frequency fluctuation of different frequency bands

    Background: Previous studies have shown that cerebral small vessel disease (cSVD) is a leading cause of cognitive decline in elderly people and of the development of Alzheimer’s disease. Although structural brain changes in cSVD are well documented, the properties of intrinsic spontaneous brain activity in patients with cSVD remain unclear.
    Methods: We collected resting-state fMRI (rs-fMRI) and T1-weighted 3D high-resolution structural brain images from 41 cSVD patients and 32 healthy controls (HC). By estimating the amplitude of low-frequency fluctuation (ALFF) across the whole brain in three frequency bands (typical band: 0.01–0.1 Hz; slow-4: 0.027–0.073 Hz; slow-5: 0.01–0.027 Hz), we analyzed band-specific ALFF differences between cSVD patients and controls.
    Results: cSVD patients showed uniformly lower ALFF than healthy controls in the typical and slow-4 bands (FWE-corrected p < 0.05). In the typical band, cSVD patients showed lower ALFF than controls in voxels of the fusiform gyrus, hippocampus, inferior occipital cortex, middle occipital cortex, insula, inferior frontal cortex, rolandic operculum, and cerebellum. In the slow-4 band, cSVD patients showed lower ALFF than controls in voxels of the cerebellum, hippocampus, occipital cortex, and fusiform gyrus. However, there was no significant between-group difference in ALFF in the slow-5 band. Moreover, we found a significant "group × frequency" interaction in the left precuneus.
    Conclusion: Our results suggest that intrinsic spontaneous brain activity in cSVD patients is abnormal and frequency-specific. ALFF in the slow-4 band may be more sensitive for detecting dysfunction in cSVD patients.
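
    ALFF is commonly computed by transforming a voxel's BOLD time series to the frequency domain and averaging the amplitude spectrum over a frequency band. The sketch below illustrates this for a single time series, assuming a plain DFT, mean removal as the only preprocessing, and a hypothetical repetition time (TR) of 2 s; it is not the preprocessing or statistical pipeline used in the study. The band limits are the ones stated in the abstract, and the function name `alff` is illustrative.

```python
import cmath
import math
from typing import Sequence, Tuple

# Frequency bands (Hz) as stated in the abstract.
BANDS = {
    "typical": (0.01, 0.1),
    "slow-4": (0.027, 0.073),
    "slow-5": (0.01, 0.027),
}


def alff(signal: Sequence[float], tr: float, band: Tuple[float, float]) -> float:
    """Mean single-sided amplitude of `signal` over the DFT bins whose
    frequency lies in `band`; `tr` is the sampling interval in seconds."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC component
    low, high = band
    amplitudes = []
    for k in range(1, n // 2 + 1):
        freq = k / (n * tr)
        if low <= freq <= high:
            # Direct DFT coefficient for bin k (O(n) per bin; fine for a sketch).
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            # Single-sided amplitude; the factor 2 is not exact at the Nyquist
            # bin, which lies well above these bands for a 2 s TR.
            amplitudes.append(2 * abs(coeff) / n)
    if not amplitudes:
        raise ValueError("no frequency bins fall inside the band")
    return sum(amplitudes) / len(amplitudes)


tr = 2.0  # hypothetical repetition time in seconds
ts = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(100)]  # 0.05 Hz test signal
for name, band in BANDS.items():
    print(name, round(alff(ts, tr, band), 3))
```

    With a 0.05 Hz test signal, all of the spectral amplitude falls in one bin shared by the typical and slow-4 bands, so the slow-5 value is essentially zero. This mirrors, in a toy setting, why band-specific ALFF can separate effects that the typical band averages together.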

    The Vehicle Determines the Destination: The Significance of Seminal Plasma Factors for Male Fertility

    In up to 50% of all human infertility cases, contributing factors lead to defects in male reproductive physiology. Seminal plasma (SP) is the biological fluid derived from the male accessory sex glands; it carries spermatozoa through the male and female reproductive tracts during ejaculation. It contains a complex set of heterogeneous molecular structures, including proteins, cell-free nucleic acids (DNA, microRNA, and lncRNA), small-molecule metabolites, and inorganic chemicals (ions). For a long time, the functions of seminal plasma factors were underestimated and considered to be restricted to sperm transport and protection. Notably, significant advances in dissecting seminal plasma components in recent years have revealed new insights into multiple aspects of sperm function, as well as fertilization and pregnancy outcomes. In this review, we summarize state-of-the-art discoveries regarding SP composition and its implications for male fertility, in particular describing the new understanding of seminal plasma components and related modifications gained from "omics" approaches, with a main focus on proteome and RNA-seq data from the last decade. We also highlight the proposed mechanisms by which SP molecules regulate immunomodulation in the female reproductive tract. Finally, we discuss proteins that have been investigated as non-invasive diagnostic biomarkers for male infertility in the clinic.

    Modeling user preferences on spatiotemporal topics for point-of-interest recommendation

    With the development of location-based social networks (LBSNs) and the popularity of mobile devices, enough user check-in data has accumulated to enable personalized point-of-interest (POI) recommendation services. In this paper, we propose a scheme for modeling users' preferences on spatiotemporal topics (the UPOST scheme) for accurate individualized POI recommendation. In the UPOST scheme, we discover temporal topics from semantic locations (i.e., the words people use to describe a location) to learn users' preferences. UPOST infers a user's preference for different types of places during different periods by learning spatiotemporal topics from users' historical semantic locations. We have developed two algorithms under the UPOST scheme: the time division LDA algorithm (TDLDA) and the time adaptive topic discovery algorithm (TATD). In TDLDA, we divide the check-in dataset into different time segments and use one LDA model per segment. We then improve on TDLDA by developing the TATD algorithm to discover spatiotemporal topics. The experimental results demonstrate the effectiveness of the UPOST scheme: both TDLDA and TATD outperform counterpart methods that do not consider semantic locations.
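
    The time-division step of TDLDA can be pictured as grouping check-ins into time segments and building one corpus per segment, on which a separate LDA model would then be trained. The sketch below assumes hypothetical fixed 6-hour segments of the day and one pseudo-document per user per segment; `CheckIn`, `SEGMENT_HOURS`, and `build_segment_corpora` are illustrative names, and neither the LDA fitting itself nor the TATD algorithm is shown.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple


class CheckIn(NamedTuple):
    user: str
    time: datetime
    words: List[str]  # description words of the semantic location


# Hypothetical segmentation: four 6-hour slots of the day.
SEGMENT_HOURS = 6


def segment_of(ts: datetime) -> int:
    """Index of the time segment that a timestamp falls into."""
    return ts.hour // SEGMENT_HOURS


def build_segment_corpora(checkins: List[CheckIn]) -> Dict[int, Dict[str, List[str]]]:
    """Group check-ins by time segment; within each segment, concatenate each
    user's description words into one pseudo-document."""
    corpora: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for c in checkins:
        corpora[segment_of(c.time)][c.user].extend(c.words)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {seg: dict(docs) for seg, docs in corpora.items()}


data = [
    CheckIn("u1", datetime(2020, 1, 1, 8), ["coffee", "cafe"]),
    CheckIn("u1", datetime(2020, 1, 1, 20), ["bar"]),
    CheckIn("u2", datetime(2020, 1, 2, 9), ["gym"]),
]
print(build_segment_corpora(data))
```

    Each per-segment corpus would then be fed to its own topic model, which is the "one LDA per segment" idea; the fixed segment boundaries are exactly what an adaptive approach such as TATD would relax.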

    Fake News Quick Detection on Dynamic Heterogeneous Information Networks

    The spread of fake news has caused great harm to society in recent years, so quick detection of fake news has become an important task. Some current detection methods model news articles and other related components as a static heterogeneous information network (HIN) and use expensive message-passing algorithms. In the real world, however, identifying fake news quickly is of great significance, and the network may change over time as nodes and edges are added or removed. Therefore, in this paper, we propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for quick fake news detection. Specifically, we first use BERT and fine-tuned BERT to obtain semantic representations of news article contents and author profiles and convert them into graph data. Then, we construct a heterogeneous news-author graph to reflect contextual information and relationships. Additionally, we adapt ideas from personalized PageRank propagation and dynamic propagation to heterogeneous networks in order to reduce the time complexity of back-propagating through many nodes during training. Experiments on three real-world fake news datasets show that DHGNN outperforms other GNN-based models in terms of both effectiveness and efficiency.
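
    Personalized PageRank propagation is commonly implemented APPNP-style: starting from node embeddings H, repeatedly apply Z <- (1 - alpha) * A_hat * Z + alpha * H, where A_hat is the symmetrically normalized adjacency matrix with self-loops and alpha is the restart (teleport) probability. The sketch below runs this on a homogeneous, undirected graph stored as adjacency lists. It ignores node and edge types and dynamic updates, so it illustrates the propagation idea rather than DHGNN itself; `ppr_propagate`, `alpha`, and `steps` are illustrative names.

```python
import math
from typing import Dict, List


def ppr_propagate(adj: Dict[int, List[int]], features: Dict[int, List[float]],
                  alpha: float = 0.1, steps: int = 10) -> Dict[int, List[float]]:
    """APPNP-style approximate personalized PageRank propagation.

    `adj` must be undirected (each edge listed in both directions) and every
    node must have an entry in `features`.
    """
    # Neighborhoods including self-loops, and the resulting degrees.
    neigh = {v: set(adj.get(v, [])) | {v} for v in features}
    deg = {v: len(nbrs) for v, nbrs in neigh.items()}
    z = {v: list(h) for v, h in features.items()}
    for _ in range(steps):
        new_z = {}
        for v, h in features.items():
            # Aggregate neighbors with symmetric normalization 1 / sqrt(d_v * d_u).
            agg = [0.0] * len(h)
            for u in neigh[v]:
                w = 1.0 / math.sqrt(deg[v] * deg[u])
                for i, val in enumerate(z[u]):
                    agg[i] += w * val
            # Mix the propagated state with the original features (restart term).
            new_z[v] = [(1 - alpha) * a + alpha * h_i for a, h_i in zip(agg, h)]
        z = new_z
    return z


graph = {0: [1], 1: [0, 2], 2: [1]}
emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
print(ppr_propagate(graph, emb, alpha=0.2, steps=5))
```

    Because propagation has no trainable weights, a model can apply it after a cheap per-node transformation instead of stacking many message-passing layers; this decoupling is the usual motivation for using personalized PageRank to cut training cost.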

    Reconnecting the Estranged Relationships: Optimizing the Influence Propagation in Evolving Networks

    Influence Maximization (IM), which aims to select a set of users from a social network to maximize the expected number of influenced users, has recently received significant attention for mass communication and commercial marketing. Existing research on the IM problem depends on a strong assumption: that the selected seed users are willing to spread the information after receiving benefits from a company or organization. In reality, however, some seed users may be reluctant to spread the information or may need higher payment to be motivated. Furthermore, existing IM works pay little attention to capturing users' influence propagation in future periods. In this paper, we target a new research problem, the Reconnecting Top-l Relationships (RTlR) query, which aims to find l relationships that previously existed but later became estranged, such that reconnecting these relationships maximizes the expected benefit of the users influenced by the given group in a future period. We prove that the RTlR problem is NP-hard. We propose an efficient greedy algorithm that answers RTlR queries using an influence estimation technique and a carefully chosen link prediction method to predict the near-future network structure. We also design a pruning method to reduce unnecessary probing of candidate edges. Further, we propose a carefully designed order-based algorithm to accelerate RTlR queries. Finally, we conduct extensive experiments on real-world datasets to demonstrate the effectiveness and efficiency of the proposed methods.
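
    The greedy strategy can be illustrated with a generic sketch. At each step, each remaining candidate edge is tentatively reconnected, the expected spread from the seed group is estimated by Monte Carlo simulation of the independent cascade model, and the best edge is kept. Everything here is an assumption for illustration: a uniform propagation probability `p`, an undirected graph, spread measured as the number of activated users, and the current graph standing in for the predicted future network. The paper's influence estimation technique, link prediction, pruning, and order-based acceleration are not reproduced, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]
Graph = Dict[int, Set[int]]


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """One independent-cascade run: each newly activated node gets a single
    chance to activate each inactive neighbor with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph: Graph, seeds: Set[int], p: float, runs: int,
                    rng: random.Random) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def with_edge(graph: Graph, edge: Edge) -> Graph:
    """Copy of `graph` with the undirected edge added."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    u, v = edge
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)
    return g


def greedy_reconnect(graph: Graph, seeds: Set[int], candidates: List[Edge], l: int,
                     p: float = 0.1, runs: int = 200, seed: int = 0) -> List[Edge]:
    """Greedily pick up to l candidate edges whose reconnection most increases
    the estimated spread from `seeds`."""
    rng = random.Random(seed)
    chosen: List[Edge] = []
    remaining = list(candidates)
    while remaining and len(chosen) < l:
        best_edge: Optional[Edge] = None
        best_val = float("-inf")
        for edge in remaining:
            val = expected_spread(with_edge(graph, edge), seeds, p, runs, rng)
            if val > best_val:
                best_edge, best_val = edge, val
        chosen.append(best_edge)
        remaining.remove(best_edge)
        graph = with_edge(graph, best_edge)  # commit the chosen reconnection
    return chosen


g = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: set()}
print(greedy_reconnect(g, {0}, [(1, 4), (1, 2)], l=1, p=0.5))
```

    Each greedy step costs one full Monte Carlo estimate per remaining candidate, which is why pruning candidate edges and ordering the evaluations, as the abstract describes, matter in practice.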

    An effective authentication for client application using ARM TrustZone
