
436 research outputs found

Building Morphological Chains for Agglutinative Languages

Author: B Can
H Ishwaran
J Goldsmith
J Hankamer
K Narasimhan
Publication date: 23/04/2017

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas in the original model candidates are generated by considering only binary segmentation of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, with an F-measure of 72%, and we improve the English scores by 3%, with an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear model that is learned using contrastive estimation with negative samples.
Comment: 10 pages, accepted and presented at CICLing 2017 (18th International Conference on Intelligent Text Processing and Computational Linguistics)
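As a rough illustration of the contrast this abstract draws between binary and recursive candidate generation, the following Python sketch enumerates both kinds of split for a single word. It is not the MorphoChains implementation: the function names, the `max_parts` cap, and the Turkish example word are assumptions made for illustration, and the log-linear scoring of candidates is omitted entirely.

```python
from typing import List, Tuple

Segmentation = Tuple[str, ...]


def binary_candidates(word: str) -> List[Segmentation]:
    """Every way to cut `word` into exactly two non-empty parts."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def recursive_candidates(word: str, max_parts: int = 4) -> List[Segmentation]:
    """Every way to cut `word` into 1..max_parts non-empty contiguous parts.

    `max_parts` is a hypothetical cap that keeps the candidate space
    manageable; it is an assumption of this sketch.
    """
    if max_parts <= 1 or len(word) <= 1:
        return [(word,)]
    results: List[Segmentation] = [(word,)]
    for i in range(1, len(word)):
        head = word[:i]
        # Recurse on the remainder so it can be split further.
        for tail in recursive_candidates(word[i:], max_parts - 1):
            results.append((head,) + tail)
    return results


if __name__ == "__main__":
    word = "evlerde"  # Turkish: ev + ler + de ("in the houses")
    target = ("ev", "ler", "de")
    print("binary candidates:", len(binary_candidates(word)))
    print("recursive candidates:", len(recursive_candidates(word, max_parts=3)))
    print("target in binary:", target in binary_candidates(word))
    print("target in recursive:", target in recursive_candidates(word, max_parts=3))
```

A three-morpheme analysis such as ev + ler + de cannot appear among the binary candidates of the full word, but it does appear once the remainder of each split is itself split again.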

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)

A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer

Author: A Lacourt
AE Gelfand
B Pesch
C Tarnaud
CE Antoniak
D Consonni
D Dahl
D Luce
David I Hastie
DI Ohlssen
H Ishwaran
H Zhang
Isabelle Stücker
J Molitor
J Peto
JH Lubin
JS Liu
L Breiman
L Kaufman
Lamiae Azizi
M Abrahamowicz
M Kalli
M Papathomas
MD Ritchie
P Papaspiliopoulos
PJ Green
R Goel
R Peto
RF MacLehose
SC Lemon
SG Walker
Silvia Liverani
SW Thurston
Sylvia Richardson
W Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 23/10/2013

A common characteristic of environmental epidemiology is the multi-dimensional aspect of exposure patterns, frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population-based case-control study.

Crossref

Springer - Publisher Connector

HAL-Inserm

PubMed Central

Queen Mary Research Online

Brunel University Research Archive

HAL UVSQ

Random survival forests

Author: Blackstone Eugene H.
Ishwaran Hemant
Kogalur Udaya B.
Lauer Michael S.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2008

We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R software package randomSurvivalForest.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS169 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Miami: Scholarship Miami

A Recurrent Neural Network Survival Model: Predicting Web User Return Time

Author: A Graves
AG Hawkes
B Efron
DR Cox
FE Harrell
H Ishwaran
JD Kalbfleisch
JP Klein
M Han
N Breslow
R Chandra
S Hochreiter
X Cai
Y Bengio
Publication venue: Springer Science and Business Media LLC
Publication date: 11/07/2018

The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation.
Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table

arXiv.org e-Print Archive

Crossref

Siamese Survival Analysis with Competing Risks

Author: A Mayr
A Tsiatis
DR Cox
H Ishwaran
J Satagopan
J Wang
JANE BROMLEY
JP Fine
L Antolini
M Schmid
M Wolbers
MJ Crowder
P Lambert
PJ Heagerty
RJ Glynn
TA Gooley
Y Chen
Publication date: 16/08/2018

Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.
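The pairwise notion of concordance mentioned in this abstract (a subject with a longer event time should receive a lower risk) can be illustrated with a minimal Python sketch of a plain Harrell-style concordance index for a single event type with right censoring. This is a simplification and not the SSPN method: the time-dependent, cause-specific index and its differentiable approximation are not reproduced here, and the function name and argument layout are assumptions made for illustration.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable pairs in which the earlier event has the higher risk.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is True) strictly before time j. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [True, True, True]
    # Risks decrease as event times increase, so every pair is concordant.
    print(concordance_index(times, events, [0.9, 0.5, 0.1]))
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering.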

arXiv.org e-Print Archive

Crossref

Location Dependent Dirichlet Processes

Author: A Oliva
A Rodríguez
C Bishop
C Williams
CE Rasmussen
D Blei
D Dunson
E Sudderth
F Zhu
H Ishwaran
J Duan
J Griffin
J Paisley
J Sethuraman
J Shi
L Ren
N Foti
P Orbanz
R Unnikrishnan
S Kumar
T Ferguson
X Sun
YW Teh
Publication date: 02/07/2017

Dirichlet processes (DP) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDP), which incorporate nonparametric Gaussian processes in the DP modeling framework to model such dependencies. We develop the LDDP in the context of mixture modeling, and develop a mean field variational inference algorithm for this mixture model. The effectiveness of the proposed modeling framework is shown on an image segmentation task.

arXiv.org e-Print Archive

Crossref

The dynamics of E1A in regulating networks and canonical pathways in quiescent cells

Author: A Cerezo
A Mal
A Roulston
AJ Berk
B Ren
C Genovese
CC Tsao
Chien Nguyen
D Branzei
D Chattopadhyay
DL Miller
DW Stacey
H Cam
H Ishwaran
J Sha
JB Rayman
Jean-Eudes Dazard
Jennifer Bongorno
JH Bielas
Jingfeng Sha
Keman Zhang
KR Spindler
Linda Cai
MA Hutchens
Marian L Harter
MK Ghosh
Mrinal Ghosh
MV Frolov
Omar Yasin
P Du
P Hublitz
P Khatri
R Ferrari
RC Gentleman
SM Lin
SY Rhee
T Nouspikel
WM Liu
WS Wold
X Xu
Y Benjamini
Y Takahashi
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Adenoviruses force quiescent cells to re-enter the cell cycle to replicate their DNA, and for the most part, this is accomplished after they express the E1A protein immediately after infection. In this context, E1A is believed to inactivate cellular proteins (e.g., p130) that are known to be involved in the silencing of E2F-dependent genes that are required for cell cycle entry. However, the potential perturbation of these types of genes by E1A relative to their functions in regulatory networks and canonical pathways remains poorly understood.
Findings: We have used DNA microarrays analyzed with Bayesian ANOVA for microarray (BAM) to assess changes in gene expression after E1A alone was introduced into quiescent cells from a regulated promoter. Approximately 2,401 genes were significantly modulated by E1A, and of these, 385 and 1,033 met the criteria for generating networks and for functional and canonical pathway analysis, respectively, as determined by using Ingenuity Pathway Analysis software. After focusing on the highest-ranking cellular processes and regulatory networks that were responsive to E1A in quiescent cells, we observed that many of the up-regulated genes were associated with DNA replication, the cell cycle and cellular compromise. We also identified a cadre of up-regulated genes with no previous connection to E1A, including genes that encode components of global DNA repair systems and DNA damage checkpoints. Among the down-regulated genes, we found that many were involved in cell signalling, cell movement, and cellular proliferation. Remarkably, a subset of these was also associated with p53-independent apoptosis, and the putative suppression of this pathway may be necessary in the viral life cycle until sufficient progeny have been produced.
Conclusions: These studies have identified for the first time a large number of genes that are relevant to E1A's activities in promoting quiescent cells to re-enter the cell cycle in order to create an optimum environment for adenoviral replication.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

Author: Agarwal A.
Autism
Blei D.
Bollen J.
Chang J.
Danial J. T.
Harrington J. W.
Harshavardhan A.
Higashida N.
Himelboim I.
Hutchings C.
Hviid A.
Ishwaran H.
Jacobson J. W.
Jashinsky J.
Jiang L.
Paul M. J.
Robinson B.
Russell M. A.
Scanfeld D.
Teh Y. W.
Trembath D.
Verma S.
Warren Z.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Statistical Relational Learning with Formal Ontologies

Author: C. Kiefer
H. Ishwaran
I. Davidson
L.D. Raedt
M. Richardson
N. Fanizzi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Crossref

Intervention in prediction measure: a new approach to assessing variable importance for random forests

Author: A Hapfelmeier
A Liaw
A Pierola
AL Boulesteix
C Strobl
E Steyerberg
H Ishwaran
Irene Epifanio
JR Bayjanov
L Breiman
M Segal
MG Kendall
MP Cummings
P Wei
PGP Martin
R Development Core Team
S Janitza
S Wuchty
T Hastie
T Hothorn
WG Touw
X Chen
Y Lin
Y Xiao
Publication venue: Springer Science and Business Media LLC

Crossref