
143 research outputs found

Concept learning of text documents

Author: An Jiyuan
Chen Yi-Ping Phoebe
Publication venue: IEEE Xplore
Publication date: 01/01/2004

Concept learning of text documents can be viewed as the problem of acquiring the definition of a general category of documents. To define the category of a text document, a conjunction of keywords is usually used. These keywords should be few and comprehensible. A naïve method enumerates all combinations of keywords to extract suitable ones. However, because the number of keyword combinations is enormous, it is impossible to extract the most relevant keywords for describing document categories by enumerating all possible combinations. Many heuristic methods have been proposed, such as GA-based and immune-based algorithms. In this work, we introduce a pruning power technique and propose a robust enumeration-based concept learning algorithm. Experimental results show that the rules produced by our approach are more comprehensible and simpler than those produced by other methods.

Deakin Research Online
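A minimal sketch of enumeration with pruning, assuming each document is a set of keywords and a concept is a single conjunction of keywords that must match no negative document. The pruning rule (a conjunction that already covers no more positives than the best rule found so far cannot lead anywhere better, because its supersets cover even fewer) is our illustration of the general idea, not the authors' "pruning power" technique, and the name `learn_concept` is hypothetical.

```python
def learn_concept(positives, negatives, max_len=3):
    """Hypothetical helper: return (conjunction, positives_covered) for the
    keyword conjunction of at most max_len keywords that covers the most
    positive documents and no negative document."""
    vocab = sorted(set().union(*positives))
    best, best_cov = frozenset(), 0
    frontier = [frozenset([w]) for w in vocab]
    for _ in range(max_len):
        next_frontier = set()
        for conj in frontier:
            pos_cov = sum(conj <= doc for doc in positives)
            # Prune: any superset covers at most pos_cov positives,
            # so it can never beat the current best rule.
            if pos_cov <= best_cov:
                continue
            if not any(conj <= doc for doc in negatives):
                # Consistent rule: record it; longer supersets cannot cover more.
                best, best_cov = conj, pos_cov
                continue
            # Still matches a negative: extend with keywords after its largest one,
            # so each combination is generated only once.
            last = max(conj)
            next_frontier.update(conj | {w} for w in vocab if w > last)
        frontier = sorted(next_frontier, key=sorted)
    return best, best_cov


# Toy data (invented for illustration).
positives = [{"ball", "goal", "team"}, {"goal", "team", "coach"}, {"team", "goal", "score"}]
negatives = [{"team", "election", "vote"}, {"goal", "budget"}]
rule, covered = learn_concept(positives, negatives)
```

On this toy data neither `goal` nor `team` alone excludes the negatives, but the two-keyword conjunction `{goal, team}` covers all three positive documents and no negative one.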

Finding short patterns to classify text documents

Author: An Jiyuan
Chen Yi-Ping Phoebe
Publication venue: IEEE Xplore
Publication date: 01/01/2006

Many classification methods have been proposed to find patterns in text documents. However, according to Occam's razor principle, "the explanation of any phenomenon should make as few assumptions as possible", so short patterns are usually more explainable and meaningful for classifying text documents. In this paper, we propose a depth-first pattern generation algorithm, which finds short patterns in text documents more effectively than a breadth-first algorithm.

Deakin Research Online
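The following is a rough sketch of depth-first pattern generation, assuming documents are keyword sets, a "pattern" is a keyword conjunction, and a pattern is accepted once it covers at least `min_support` positive documents and no negative one. The thresholds and the function name `dfs_patterns` are hypothetical; the abstract does not give the authors' exact search order or stopping rules.

```python
def dfs_patterns(positives, negatives, min_support=2, max_len=3):
    """Hypothetical helper: depth-first search for keyword patterns that cover
    at least min_support positive documents and no negative document."""
    vocab = sorted(set().union(*positives))
    found = []

    def extend(pattern, start, pos_docs, neg_docs):
        for i in range(start, len(vocab)):
            word = vocab[i]
            pos = [d for d in pos_docs if word in d]
            # Support can only shrink as keywords are added, so skip this branch.
            if len(pos) < min_support:
                continue
            neg = [d for d in neg_docs if word in d]
            new = pattern + (word,)
            if not neg:
                found.append(new)  # discriminative: keep it short, don't extend
            elif len(new) < max_len:
                extend(new, i + 1, pos, neg)  # go deeper before trying siblings

    extend((), 0, positives, negatives)
    return found
```

Unlike a breadth-first search, which materializes every candidate of length k before moving to length k + 1, this recursion keeps only the current branch (and the documents it still matches) in memory.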

Finding coverage using incremental attribute combinations

Author: An Jiyuan
Chen Yi-Ping Phoebe
Publication venue: ICIC International
Publication date: 01/05/2009

Coverage is the range that covers only positive samples in attribute (or feature) space. Finding coverage is the core problem in induction algorithms because coverage can be used as rules to describe positive samples. To reflect the characteristics of the training samples, large coverage that covers more positive samples is desirable. However, large coverage is difficult to find because the attribute space is usually very high-dimensional. Many heuristic methods, such as ID3, AQ and CN2, have been proposed to find large coverage. A robust algorithm has also been proposed to find the largest coverage, but its time and space complexity become costly as the dimensionality grows. To overcome this drawback, this paper proposes an algorithm that adopts incremental feature combinations to effectively find the largest coverage. In this algorithm, irrelevant coverage can be pruned away at early stages because potentially large coverage is found earlier. Experiments show that the space and time needed to find the largest coverage are significantly reduced.

Deakin Research Online
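To make the definition of coverage concrete, here is a small sketch that assumes numeric attributes and represents a candidate coverage as an axis-aligned box (one closed interval per constrained attribute). This representation, the names `inside` and `coverage_size`, and the sample values are our assumptions; the paper's own representation and search procedure are not reproduced here.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical representation: attribute index -> closed interval [lo, hi].
Box = Dict[int, Tuple[float, float]]


def inside(sample: Sequence[float], box: Box) -> bool:
    # Attributes absent from the box are unconstrained and accept any value.
    return all(lo <= sample[a] <= hi for a, (lo, hi) in box.items())


def coverage_size(box: Box, positives, negatives) -> Optional[int]:
    """Number of positives inside the box, or None if the box admits a
    negative sample (i.e. it is not a coverage)."""
    if any(inside(s, box) for s in negatives):
        return None
    return sum(inside(s, box) for s in positives)


# Invented two-attribute samples.
positives = [(1.0, 5.0), (2.0, 6.0), (3.0, 5.5)]
negatives = [(2.0, 1.0), (8.0, 5.0)]
one_attr = coverage_size({0: (1.0, 3.0)}, positives, negatives)
two_attr = coverage_size({0: (1.0, 3.0), 1: (5.0, 6.0)}, positives, negatives)
```

Constraining only attribute 0 still admits the negative sample `(2.0, 1.0)`, so `one_attr` is `None`; adding a constraint on attribute 1 excludes it while keeping all three positives, so `two_attr` is 3. Adding attributes one at a time in this way loosely illustrates why incremental attribute combinations can reach a valid coverage before every attribute is considered.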

Finding rule groups to classify high dimensional gene expression datasets

Author: An Jiyuan
Chen Yi-Ping Phoebe
Publication venue: IEEE Xplore
Publication date: 01/01/2006

Microarray data provides quantitative information about the transcription profile of cells. Machine learning methodology has increasingly attracted bioinformatics researchers for analyzing microarray datasets, and some machine learning approaches are widely used to classify and mine biological datasets. However, because many gene expression datasets are extremely high-dimensional, traditional machine learning methods cannot be applied effectively and efficiently. This paper proposes a robust algorithm to find rule groups to classify gene expression datasets. Unlike most classification algorithms, which select dimensions (genes) heuristically to form rule groups that identify classes such as cancerous and normal tissues, our algorithm is guaranteed to find the best k dimensions (genes), those most discriminative between samples in different classes, to form rule groups for classifying expression datasets. Our experiments show that the rule groups obtained by our algorithm have higher accuracy than those of other classification approaches.

Deakin Research Online
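As a point of comparison only, the sketch below shows a common per-gene heuristic for picking k discriminative genes: ranking by a signal-to-noise score, |mean₁ − mean₀| / (sd₁ + sd₀). This is a swapped-in, well-known heuristic, not the guaranteed best-k search described in the abstract; the function names and the tiny expression matrix are hypothetical.

```python
import statistics


def snr(values_a, values_b):
    # Signal-to-noise score; the tiny constant avoids division by zero.
    ma, mb = statistics.mean(values_a), statistics.mean(values_b)
    sa, sb = statistics.pstdev(values_a), statistics.pstdev(values_b)
    return abs(ma - mb) / (sa + sb + 1e-12)


def top_k_genes(expr, labels, k):
    """Hypothetical helper. expr: list of samples, each a list of gene values;
    labels: 1 or 0 per sample. Returns indices of the k highest-scoring genes."""
    scores = []
    for g in range(len(expr[0])):
        a = [s[g] for s, y in zip(expr, labels) if y == 1]
        b = [s[g] for s, y in zip(expr, labels) if y == 0]
        scores.append((snr(a, b), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


# Invented data: 4 samples x 3 genes; gene 0 separates the classes clearly.
expr = [[5.0, 1.0, 3.0], [6.0, 1.2, 2.9], [1.0, 1.1, 3.1], [2.0, 0.9, 3.0]]
labels = [1, 1, 0, 0]
chosen = top_k_genes(expr, labels, 2)
```

Scoring genes independently like this is fast but heuristic: it can miss gene combinations that are only discriminative together, which is the gap an exhaustive best-k search is meant to close.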

Keyword extraction for text categorization

Author: An Jiyuan
Chen Yi-Ping Phoebe
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

Text categorization (TC) is one of the main applications of machine learning. Many methods have been proposed, such as the Rocchio method, Naive Bayes based methods, and SVM based text classification methods. These methods learn from labeled text documents and construct a classifier that predicts the category of a newly arriving document. However, they do not give a description of each category. In the machine learning field, there are many concept learning algorithms, such as ID3 and CN2. This paper proposes a more robust algorithm to induce concepts from training examples, based on enumerating all possible keyword combinations. Experimental results show that the rules produced by our approach have higher precision and greater simplicity than those of other methods.

Deakin Research Online
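Unlike a black-box classifier, induced concepts can be read and applied directly. A minimal sketch, assuming each category is described by a list of keyword conjunctions (a document belongs to the category if it contains every keyword of at least one conjunction) and that categories are tried in a fixed order; the rule set and the `classify` helper are hypothetical examples, not rules reported in the paper.

```python
def classify(doc, rules, default="unknown"):
    """Hypothetical helper: return the first category whose description
    (a list of keyword conjunctions) matches the document's keyword set."""
    for category, conjunctions in rules.items():
        if any(conj <= doc for conj in conjunctions):
            return category
    return default


# Invented, human-readable category descriptions.
rules = {
    "sport": [frozenset({"goal", "team"})],
    "politics": [frozenset({"election"}), frozenset({"vote", "party"})],
}
label = classify({"team", "goal", "win"}, rules)
```

Because each category is just a short list of keyword sets, the same structure that drives prediction also serves as the human-readable description of the category that the abstract says conventional classifiers lack.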

Concept Learning of Text Documents

Author: Jiyuan An
Yi-Ping Phoebe Chen
Publication date: 05/03/2020


CiteSeerX

Evaluating the role of alcohol consumption in breast and ovarian cancer susceptibility using population-based cohort studies and two-sample Mendelian randomization analyses.

Author: An Jiyuan
Berchuck Andrew
Bojesen Stig E
Chenevix-Trench Georgia
Derks Eske M
Easton Douglas F
Eriksson Mikael
Hall Per
Hwang Liang-Dar
Kelemen Linda E
MacGregor Stuart
Matsuo Keitaro
Ong Jue-Sheng
Pharoah Paul P
Webb Penelope M
Publication venue: Int J Cancer
Publication date: 01/01/2020

Alcohol consumption is positively correlated with risk of breast cancer in observational studies, but observational studies are subject to reverse causation and confounding. The association with epithelial ovarian cancer (EOC) is unclear. We performed both observational Cox regression and two-sample Mendelian randomization (MR) analyses using data from various European cohort studies (observational) and publicly available cancer consortia (MR). These estimates were compared with World Cancer Research Fund (WCRF) findings. In our observational analyses, the multivariable-adjusted hazard ratio (HR) for a one-standard-drink/day increase was 1.06 (95% confidence interval [CI] 1.04, 1.08) for breast cancer and 1.00 (0.92, 1.08) for EOC, both consistent with previous WCRF findings. MR odds ratios (ORs) per genetically predicted one-standard-drink/day increase, estimated via 34 SNPs using MR-PRESSO, were 1.00 (0.93, 1.08) for breast cancer and 0.95 (0.85, 1.06) for EOC. Stratification by EOC subtype or by estrogen receptor status in breast cancers made no meaningful difference to the results. For breast cancer, the CIs for the genetically derived estimates include the point estimate from observational studies, so they are not inconsistent with a small increase in risk. Our data provide additional evidence that alcohol intake is unlikely to have anything other than a very small effect on risk of EOC.

Copenhagen University Research Information System

Apollo (Cambridge)

University of Queensland eSpace
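For readers unfamiliar with two-sample MR, the sketch below shows the fixed-effect inverse-variance weighted (IVW) estimator, which combines per-SNP ratio estimates (SNP-outcome effect divided by SNP-exposure effect) weighted by precision. The study itself reports MR-PRESSO estimates; IVW is a different, standard MR estimator shown here only to illustrate how summary statistics become a causal estimate. The input numbers are invented, not the study's data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.
    Returns (beta, se) on the outcome's scale (e.g. log-odds per unit exposure)."""
    # Weight of each SNP's ratio estimate: bx^2 / se_y^2.
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    return num / den, 1 / math.sqrt(den)


# Invented summary statistics for three SNPs (outcome effects = 0.1 x exposure effects).
bx = [0.02, 0.05, 0.03]
by = [0.002, 0.005, 0.003]
se_y = [0.01, 0.02, 0.01]
beta, se = ivw_estimate(bx, by, se_y)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

When the outcome is binary and the effects are on the log-odds scale, exponentiating the estimate and its limits gives an OR and 95% CI in the same form as those quoted in the abstract.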

The causal relationship between gastro-oesophageal reflux disease and idiopathic pulmonary fibrosis: a bidirectional two-sample Mendelian randomisation study

Author: Allen Richard J
An Jiyuan
Cullinan Paul
Del Greco M Fabiola
Flores Carlos
Jenkins R Gisli
MacGregor Stuart
Maher Toby M
Minelli Cosetta
Molyneaux Philip L
Noth Imre
Oldham Justin M
Ong Jue-Sheng
Reynolds Carl J
Wain Louise V
Yates Tom A
Publication venue: European Respiratory Society (ERS)
Publication date: 01/05/2023

Background: Gastro-oesophageal reflux disease (GORD) is associated with idiopathic pulmonary fibrosis (IPF) in observational studies. It is not known whether this association arises because GORD causes IPF, because IPF causes GORD, or because of confounding by factors, such as smoking, associated with both GORD and IPF. We used bidirectional Mendelian randomisation (MR), where genetic variants are used as instrumental variables to address issues of confounding and reverse causation, to examine how, if at all, GORD and IPF are causally related. Methods: A bidirectional two-sample MR was performed to estimate the causal effect of GORD on IPF risk and of IPF on GORD risk, using genetic data from the largest GORD (78 707 cases and 288 734 controls) and IPF (4125 cases and 20 464 controls) genome-wide association meta-analyses currently available. Results: GORD increased the risk of IPF, with an OR of 1.6 (95% CI 1.04–2.49; p=0.032). There was no evidence of a causal effect of IPF on the risk of GORD, with an OR of 0.999 (95% CI 0.997–1.000; p=0.245). Conclusions: We found that GORD increases the risk of IPF, but found no evidence that IPF increases the risk of GORD. GORD should be considered in future studies of IPF risk, and interest in it as a potential therapeutic target should be renewed. The mechanisms underlying the effect of GORD on IPF should also be investigated.

UCL Discovery
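A reported OR, 95% CI and p-value can be roughly cross-checked against each other. Assuming the interval was computed symmetrically on the log-odds scale with a normal approximation (an assumption; the abstract does not state the method), the standard error is the log-width of the CI divided by 2 × 1.96, and a two-sided p-value follows from the z-statistic. The helper name is hypothetical.

```python
import math


def p_from_or_ci(or_point, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an odds ratio and its 95% CI,
    assuming a Wald-type interval on the log-odds scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_point) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# GORD -> IPF figures as reported in the abstract: OR 1.6, 95% CI 1.04-2.49.
p = p_from_or_ci(1.6, 1.04, 2.49)
```

This gives roughly 0.035, close to but not exactly the reported p = 0.032; a small gap is expected when working back from a rounded OR and rounded confidence limits.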

Transcriptomic analysis of mRNA expression and alternative splicing during mouse sex determination

Author: An Jiyuan
He Mingyu
Koopman Peter
Lehman Melanie L.
Nelson Colleen C.
Ng Ee Ting
Spiller Cassy M.
Svingen Terje
Wang Chenwei
Zhao Liang
Publication venue: Elsevier BV
Publication date: 01/01/2018

Mammalian sex determination hinges on sexually dimorphic transcriptional programs in developing fetal gonads. A comprehensive view of these programs is crucial for understanding the normal development of fetal testes and ovaries and the etiology of human disorders of sex development (DSDs), many of which remain unexplained. Using strand-specific RNA-sequencing, we characterized the mouse fetal gonadal transcriptome from 10.5 to 13.5 days post coitum, a key time window in sex determination and gonad development. Our dataset benefits from greater sensitivity, accuracy and dynamic range compared to microarray studies, allows global dynamics and sex-specificity of gene expression to be assessed, and provides a window to non-transcriptional events such as alternative splicing. Spliceomic analysis uncovered female-specific regulation of Lef1 splicing, which may contribute to the enhanced WNT signaling activity in XX gonads. We provide a user-friendly visualization tool for the complete transcriptomic and spliceomic dataset as a resource for the field.

Queensland University of Technology ePrints Archive

Online Research Database In Technology

University of Queensland eSpace

miRPlant: an integrated tool for identification of plant miRNA from RNA sequencing data

Author: Atul Sajjanhar
B Langmead
BC Meyers
CC Pritchard
Colleen C Nelson
IL Hofacker
J An
Jiyuan An
John Lai
Melanie L Lehman
MR Friedlander
QH Zhu
X Yang
Publication venue: Springer Science and Business Media LLC

Crossref