684 research outputs found

Modeling Emotion Influence from Images in Social Networks

Authors: Cai Lianhong, Jia Jia, Tang Jie, Wang Xiaohui
Publication date: 17/01/2014

Images have become an important and prevalent way to express users' activities, opinions and emotions. In a social network, individual emotions may be influenced by others, in particular by close friends. We focus on understanding how users embed emotions into the images they upload to social websites and how social influence plays a role in changing users' emotions. We first verify the existence of emotion influence in image networks, and then propose a probabilistic factor graph based emotion influence model to answer the question of "who influences whom". Employing a real network from Flickr as experimental data, we study the effectiveness of factors in the proposed model with in-depth data analysis. Our experiments also show that our model, by incorporating emotion influence, can significantly improve the accuracy (+5%) of predicting emotions from images. Finally, a case study is used as anecdotal evidence to further demonstrate the effectiveness of the proposed model.

arXiv.org e-Print Archive

CiteSeerX

Probing the Metal Enrichment of the Intergalactic Medium at $z=5-6$ Using the Hubble Space Telescope

Authors: Cai Zheng, Dave Romeel, Fan Xiaohui, Finlator Kristian, Oppenheimer Ben
Publication venue: American Astronomical Society
Publication date: 28/09/2017

We test the galactic outflow model by probing associated galaxies of four strong intergalactic CIV absorbers at $z=5$--6 using the Hubble Space Telescope (HST) ACS ramp narrowband filters. The four strong CIV absorbers reside at $z=5.74$, 5.52, 4.95, and 4.87, with column densities ranging from $N_{\rm{CIV}}=10^{13.8}$ cm$^{-2}$ to $10^{14.8}$ cm$^{-2}$. At $z=5.74$, we detect an i-dropout Ly$\alpha$ emitter (LAE) candidate with a projected impact parameter of 42 physical kpc from the CIV absorber. This LAE candidate has a Ly$\alpha$-based star formation rate (SFR$_{\rm{Ly\alpha}}$) of 2 $M_\odot$ yr$^{-1}$ and a UV-based SFR of 4 $M_\odot$ yr$^{-1}$. Although we cannot completely rule out that this $i$-dropout emitter may be an [OII] interloper, its measured properties are consistent with the CIV powering galaxy at $z=5.74$. For CIV absorbers at $z=4.95$ and $z=4.87$, although we detect two LAE candidates with impact parameters of 160 kpc and 200 kpc, such distances are larger than that predicted from the simulations. Therefore we treat them as non-detections. For the system at $z=5.52$, we do not detect LAE candidates, placing a 3-$\sigma$ upper limit of SFR$_{\rm{Ly\alpha}}\approx 1.5\ M_\odot$ yr$^{-1}$. In summary, in these four cases, we only detect one plausible CIV source at $z=5.74$. Combining the modest SFR of the one detection and the three non-detections, our HST observations strongly support that smaller galaxies (SFR$_{\rm{Ly\alpha}} \lesssim 2\ M_\odot$ yr$^{-1}$) are main sources of intergalactic CIV absorbers, and such small galaxies play a major role in the metal enrichment of the intergalactic medium at $z\gtrsim5$.

Comment: Accepted for Publications in ApJ

arXiv.org e-Print Archive

Crossref

The University of Arizona

Mining heterogeneous information graph for health status classification

Authors: Cai Yi, Pham Thuan, Tao Xiaohui, Yong Jianming, Zhang Ji, Zhang Wenping
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/11/2018

In the medical domain, there exists a large volume of data from multiple sources such as electronic health records, general health examination results, and surveys. The data contain useful information reflecting people’s health and provide great opportunities for studies to improve the quality of healthcare. However, how to mine these data effectively and efficiently remains a critical challenge. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. Based on analytics of massive data in the National Health and Nutrition Examination Survey, the study builds a classification model to classify patients’ health status and reveal the specific disease potentially suffered by the patient. This paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. Moreover, this research contributes to the healthcare community by providing a deep understanding of people’s health with accessibility to the patterns in various observations.

Crossref

University of Southern Queensland ePrints

A hybrid representation based simile component extraction

Authors: Cai Yi, Chen Junying, Li Qing, Ren Da, Tao Xiaohui, Zhang Pengfei
Publication venue: Springer Science and Business Media LLC
Publication date: 05/03/2020

Simile, a special type of metaphor, can help people to express their ideas more clearly. Simile component extraction is the task of extracting tenors and vehicles from sentences. This task has practical significance since it is useful for building a cognitive knowledge base. With the development of deep neural networks, researchers have begun to apply neural models to component extraction. Simile components should be cross-domain. According to our observations, cross-domain words always have different concepts. Thus, concept is important when identifying whether two words are simile components or not. However, existing models do not integrate concept into their models. It is difficult for these models to identify the concept of a word. Moreover, the corpus for simile component extraction is limited. There are a number of rare or unseen words, and the representations of these words are often inadequate. Existing models can hardly extract simile components accurately when there are low-frequency words in sentences. To solve these problems, we propose a hybrid representation-based component extraction (HRCE) model. Each word in HRCE is represented at three different levels: word level, concept level and character level. Concept representations (representations at the concept level) can help HRCE to identify cross-domain words more accurately. Moreover, with the help of character representations (representations at the character level), HRCE can represent the meaning of a word more properly since words consist of characters and these characters can partly represent the meaning of words. We conduct experiments to compare the performance of HRCE and existing models. The experimental results show that HRCE significantly outperforms current models.

University of Southern Queensland ePrints

A new measurement of sequence conservation

Authors: Cai Xiaohui, Hu Haiyan, Li Xiaoman
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates for functional regions and may not always be functional. Moreover, conserved regions may contain long and divergent segments.
Results: To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions.
Conclusions: It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes [1].

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Mining health knowledge graph for health risk prediction

Authors: Cai Yi, Goh Wee Pheng, Pham Thuan, Tao Xiaohui, Yong Jianming, Zhang Ji, Zhang Wenping
Publication venue: Springer Science and Business Media LLC
Publication date: 20/03/2020

Nowadays classification models have been widely adopted in healthcare, aiming at supporting practitioners in disease diagnosis and human error reduction. The challenge is utilising effective methods to mine real-world data in the medical domain, as many different models have been proposed with varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods comprising homogeneous graphs for knowledge representation and then knowledge discovery. However, such approaches are weak in discovering different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients’ health risk. The proposed model is evaluated by comparison to a baseline model also built on the NHANES data set in an empirical experiment. The performance of the proposed model is promising. The paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by accessing the patterns of various observations, the research contributes to the work of practitioners by providing a multifaceted understanding of individual and public health.

University of Southern Queensland ePrints

Split Bregman Method for Sparse Inverse Covariance Estimation with Matrix Iteration Acceleration

Authors: Cai Jian-Feng, Xie Xiaohui, Ye Gui-Bo
Publication date: 23/12/2010

We consider the problem of estimating the inverse covariance matrix by maximizing the likelihood function with a penalty added to encourage the sparsity of the resulting matrix. We propose a new approach based on the split Bregman method to solve the regularized maximum likelihood estimation problem. We show that our method is significantly faster than the widely used graphical lasso method, which is based on blockwise coordinate descent, on both artificial and real-world data. More importantly, different from the graphical lasso, the split Bregman based method is much more general, and can be applied to a class of regularization terms other than the $\ell_1$ nor

arXiv.org e-Print Archive

CiteSeerX