
134 research outputs found

The SVM With Uneven Margins and Chinese Document Categorization

Author: Li Yaoyong
Shawe-Taylor John
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003

Advanced learning algorithms for cross-language patent retrieval and classification

Author: John Shawe-Taylor
Yaoyong Li
Publication date: 01/01/2007

Abstract: We study several machine learning algorithms for cross-language patent retrieval and classification. In comparison with most other studies involving machine learning for cross-language information retrieval, which basically used learning techniques for monolingual sub-tasks, our learning algorithms exploit the bilingual training documents and learn a semantic representation from them. We study Japanese-English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel-defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification. The learning algorithms are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one particular combination, called SVM_2k, achieved better results than other learning algorithms for either bilingual or monolingual test documents.

CiteSeerX

eRNA profiling uncovers the enhancer landscape of oesophageal adenocarcinoma and reveals new deregulated pathways

Author: Ahmed Ibrahim
Li Yaoyong
Ogden Samuel
Sharrocks Andrew D
Yang Shen-hsi
Zhang Wei
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 09/03/2023

The University of Manchester - Institutional Repository

A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling

Author: Bradford James R
Hey Yvonne
Li Yaoyong
Miller Crispin J
Pepper Stuart D
Yates Tim
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: RNA-Seq exploits the rapid generation of gigabases of sequence data by Massively Parallel Nucleotide Sequencing, allowing for the mapping and digital quantification of whole transcriptomes. Whilst previous comparisons between RNA-Seq and microarrays have been performed at the level of gene expression, in this study we adopt a more fine-grained approach. Using RNA samples from a normal human breast epithelial cell line (MCF-10a) and a breast cancer cell line (MCF-7), we present a comprehensive comparison between RNA-Seq data generated on the Applied Biosystems SOLiD platform and data from Affymetrix Exon 1.0ST arrays. The use of Exon arrays makes it possible to assess the performance of RNA-Seq in two key areas: detection of expression at the granularity of individual exons, and discovery of transcription outside annotated loci. Results: We found a high degree of correspondence between the two platforms in terms of exon-level fold changes and detection. For example, over 80% of exons detected as expressed in RNA-Seq were also detected on the Exon array, and 91% of exons flagged as changing from Absent to Present on at least one platform had fold-changes in the same direction. The greatest detection correspondence was seen when the read count threshold at which to flag exons Absent in the SOLiD data was set to t < 1, suggesting that the background error rate is extremely low in RNA-Seq. We also found RNA-Seq more sensitive to detecting differentially expressed exons than the Exon array, reflecting the wider dynamic range achievable on the SOLiD platform. In addition, we find significant evidence of novel protein coding regions outside known exons, 93% of which map to Exon array probesets, and are able to infer the presence of thousands of novel transcripts through the detection of previously unreported exon-exon junctions.
Conclusions: By focusing on exon-level expression, we present the most fine-grained comparison between RNA-Seq and microarrays to date. Overall, our study demonstrates that data from a SOLiD RNA-Seq experiment are sufficient to generate results comparable to those produced from Affymetrix Exon arrays, even using only a single replicate from each platform, and when presented with a large genome.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

White Rose Research Online

Distribution of Transmissible Amyloid Proteins in the Liver with Apolipoprotein A-II Amyloidosis

Author: DING Xin
HIGUCHI Keiichi
LI Lin
LIU Yingye
MIYAHARA Hiroki
SAWASHITA Jinko
WANG Yaoyong
YANG Mu
Publication venue: 信州医学会
Publication date: 10/08/2016

Article. 信州医学雑誌 64(4): 183-194 (2016). Journal article

Shinshu University Institutional Repository

Open chromatin profiling identifies AP1 as a transcriptional regulator in oesophageal adenocarcinoma.

Author: Ang Yeng S
Britton Edward
Fitzgerald Rebecca C
Li Xiaodun
Li Yaoyong
Mehta Shaveta
OCCAMS consortium
Rogerson Connor
Sharrocks Andrew D
Publication venue: PLoS Genet
Publication date: 01/08/2017

Oesophageal adenocarcinoma (OAC) is one of the ten most prevalent forms of cancer and is showing a rapid increase in incidence and yet exhibits poor survival rates. Compared to many other common cancers, the molecular changes that occur in this disease are relatively poorly understood. However, genes encoding chromatin remodeling enzymes are frequently mutated in OAC. This is consistent with the emerging concept that cancer cells exhibit reprogramming of their chromatin environment which leads to subsequent changes in their transcriptional profile. Here, we have used ATAC-seq to interrogate the chromatin changes that occur in OAC using both cell lines and patient-derived material. We demonstrate that there are substantial changes in the regulatory chromatin environment in the cancer cells and using this data we have uncovered an important role for ETS and AP1 transcription factors in driving the changes in gene expression found in OAC cells. Our work received funding from the Wellcome Trust (https://wellcome.ac.uk/), the National Institute for Health Research (https://www.nihr.ac.uk/) and Cancer Research UK (http://www.cancerresearchuk.org/).

Crossref

Directory of Open Access Journals

The University of Manchester - Institutional Repository

Apollo (Cambridge)

FigShare

GFI1 proteins regulate stem cell formation in the AGM

Author: A Bigas
AC Zovein
AJ Medvinsky
AM Müller
Berthold Göttgens
BK Hadland
C Lancrin
Catherine Robin
CF Pereira
Christophe Lancrin
Crispin Miller
CT Foster
Elli Marinopoulou
EYN Lam
G Costa
G Lacaud
G Swiers
Georges Lacaud
HJ Fehling
HM Eilken
J Guiu
J Nichols
J Palis
J Wang
J-C Boisset
JM Frame
JY Bertrand
K Batta
K Fiolka
K Kissa
K Kumano
KE McGrath
L Vassen
L Wang
M Kyba
M Lichtinger
M Lie-A-Ling
M Stefanska
MFTR de Bruijn
MH Ledran
Milena Mazan
MJ Chen
MJ Vogel
Monika Stefanska
NK Wilson
P Kumaravelu
P Sroczynska
Q Wang
R Yücel
Rahima Patel
RL Clarke
Roshana Thambyrajah
S Nishikawa
S Pearson
S Rybtsov
S Saleque
S Taoudi
Shaun Cowley
T Jaffredo
T Möröy
T North
T Okuda
T Yokomizo
Tarik Möröy
TE North
Thomas Clapes
V Moignard
Valerie Kouskoff
Victoria Moignard
VM Sandler
W Kim
WA Whyte
WJ Harris
Yaoyong Li
Publication venue: Nat Cell Biol
Publication date: 30/11/2015

In vertebrates, the first haematopoietic stem cells (HSCs) with multi-lineage and long-term repopulating potential arise in the AGM (aorta-gonad-mesonephros) region. These HSCs are generated from a rare and transient subset of endothelial cells, called haemogenic endothelium (HE), through an endothelial-to-haematopoietic transition (EHT). Here, we establish the absolute requirement of the transcriptional repressors GFI1 and GFI1B (growth factor independence 1 and 1B) in this unique trans-differentiation process. We first demonstrate that Gfi1 expression specifically defines the rare population of HE that generates emerging HSCs. We further establish that in the absence of GFI1 proteins, HSCs and haematopoietic progenitor cells are not produced in the AGM, revealing the critical requirement for GFI1 proteins in intra-embryonic EHT. Finally, we demonstrate that GFI1 proteins recruit the chromatin-modifying protein LSD1, a member of the CoREST repressive complex, to epigenetically silence the endothelial program in HE and allow the emergence of blood cells. We thank the staff at the Advanced Imaging, animal facility, Molecular Biology Core facilities and Flow Cytometry of CRUK Manchester Institute for technical support and Michael Lie-A-Ling and Elli Marinopoulou for initiating the DamID-PIP bioinformatics project. We thank members of the Stem Cell Biology group, the Stem Cell Haematopoiesis groups and Martin Gering for valuable advice and critical reading of the manuscript. Work in our laboratory is supported by the Leukaemia and Lymphoma Research Foundation (LLR), Cancer Research UK (CRUK) and the Biotechnology and Biological Sciences Research Council (BBSRC). SC is the recipient of an MRC senior fellowship (MR/J009202/1). This is the author accepted manuscript. The final version is available from NPG via http://dx.doi.org/10.1038/ncb327

Crossref

The University of Manchester - Institutional Repository

Enlighten

Apollo (Cambridge)

Utrecht University Repository

Leicester Research Archive

Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

Author: Ana Menezes
Angus Roberts
Antonio Agudo
AR Aronson
Ariana Znaor
CC Spencer
CI Amos
Claire M. Healy
Cristina Canova
D Chen
D Falush
D Thomas
Dan Chen
David I. Conway
David Zaridze
DI Conway
Diana Zelenika
DL Nicolae
EH Lips
Eleonóra Fabiánová
G Scelo
Graham Byrnes
H Cunningham
Hamish Cunningham
Ioan Nicolae Mates
Ivana Holcátová
J Wakefield
James D. Mckay
JD McKay
Jolanta Lissowska
Jon Wakefield
Jose Eluf-Neto
JP Ioannidis
K Yu
Kristina Kjaerheim
LA Hindorff
Lars Vatten
Lenka Foretova
Leticia Fernandez Garrote
Lorenzo Richiardi
Luigi Barzan
M Hashibe
Manon Delahaye-Sourdeix
Maria Paula Curado
Mark A. Greenwood
Mark Lathrop
Mattias Johansson
Nalin S. Thakker
Neonilia Szeszenia-Dabrowska
Niraj Aswani
Olga Y. Gorlova
P Brennan
P Lagiou
Pagona Lagiou
Paolo Boffetta
Paul Brennan
Peter Thomson
Pilar Galan
R Herrero
Renato Talamini
RJ Hung
Rolando Herrero
S Purcell
S Raychaudhuri
Sergio Koifman
Silvia Franceschi
Simone Benhamou
Stefania Boccia
T Truong
Tatiana V. Macfarlane
Victor Wünsch-Filho
Vladimir Bencko
Vladimir Janout
W Garavello
Wolfgang Ahrens
Xavier Castellsagué
Yaoyong Li
Publication venue: Public Library of Science
Publication date: 01/01/2012

Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS. Methods: We developed a method that searches through PubMed abstracts for pre-assigned keywords and key concepts, and uses this information to assign prior probabilities of association for each single nucleotide polymorphism (SNP) with the phenotype of interest - the Adjusting Association Priors with Text (AdAPT) method. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer. Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log additive p-value [p(trend)] = 2.5 x 10^-3). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs that are associated with overall UADT cancer in this gene region. We also investigated if rs991316 was associated with other cancers of the upper aerodigestive tract (UADT), but no additional association signal was found.
Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).

Using KCCA for Japanese-English cross-language information retrieval and classification

Author: Li Yaoyong
Shawe-Taylor John
Publication date: 01/01/2004

Kernel Canonical Correlation Analysis (KCCA) is a method of correlating linear relationships between two multidimensional variables in feature space. We applied KCCA to Japanese-English cross-language information retrieval and classification. The results were encouraging.

CiteSeerX

Southampton (e-Prints Soton)

Perceptron-like learning for ontology based information extraction

Author: Yaoyong Li

Recent work on ontology-based Information Extraction (IE) has tried to make use of knowledge from the target ontology in order to improve semantic annotation results. However, very few approaches exploit the ontology structure itself, and those that do have some limitations. This paper introduces a hierarchical learning approach for IE, which uses the target ontology as an essential part of the extraction process, by taking into account the relations between concepts. The approach is evaluated on the largest available semantically annotated corpus. The results demonstrate clearly the benefits of using knowledge from the ontology as input to the information extraction process. We also demonstrate the advantages of our approach over other state-of-the-art learning systems on a commonly used benchmark dataset.

CiteSeerX