12,725 research outputs found

Replica analysis of overfitting in regression models for time-to-event data

Author: Barrett JE
Coolen ACC
Paga P
Perez-Vicente CJ
Publication venue: 'IOP Publishing'
Publication date: 20/07/2017

Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox's proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model. Comment: 37 pages, 9 figures

arXiv.org e-Print Archive

Crossref

King's Research Portal

Family farming in the agricultural census of 2006: the legal mark and the options for their identification.

Author: Grossi Mauro Eduardo Del
Marques Vicente P. M. de Azevedo
Publication date: 01/01/2010

(Agricultura familiar no censo agropecuário 2006: o marco legal e as opções para sua identificação). To delimit family farming in the 2006 Agricultural Census, the Ministry of Agrarian Development (Ministério do Desenvolvimento Agrário, MDA) and the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística, IBGE) developed a methodology for constructing a variable that identifies the surveyed agricultural establishments that fit the concept set out in Law No. 11.326 of 24 July 2006. The text presents the methodological steps used and some results.

Organic Eprints

Inspection and diagnosis tests for structural safety evaluation: A case study

Author: Cunha P.
Gesta C.
Rodrigues F.
Varum H.
Vicente R.
Publication date: 01/01/2005

Diagnosis and assessment of existing structures is a developing area due to the appearance of a high number of building defects, structural and non-structural deterioration and premature loss of quality, and, consequently, lower expected durability. With the aim of verifying the viability of rehabilitating, or the need to demolish, an existing fifteen-year-old parking building, several inspections and diagnostic tests (visual inspection as well as non-destructive and destructive testing) were carried out to evaluate the structural safety conditions.

Repositório Institucional da Universidade de Aveiro

The underpotential deposition that should not be : Cu(1x1) on Au(111)

Author: Cuesta Angel
Leiva Ezequiel P M
Macagno Vicente A
Velez Patricio
Publication venue: 'Elsevier BV'
Publication date: 01/11/2012

Peer reviewed. Postprint

Aberdeen University Research

Crossref

Directory of Open Access Journals

Digital.CSIC

Mutual optical injection in coupled DBR laser pairs

Author: Avila
Buldu
Diez
Erzgraber
Fujiwara
Hegarty
Heil
Hohl
I. F. Lealman
I. Henning
Jiang
Kelly
L. J. Rivers
Li
Li
M. J. Adams
M. P. Vaughan
Mirasso
Mulet
Mulet
P. Cannard
Perez
Revuelta
Rogister
Tang
Tatham
Vicente
Vicente
Vicente
Vicente
Wieczorek
Wille
Zhang
Zhang
Zhang
Publication venue: 'The Optical Society'
Publication date: 02/02/2009

We report an experimental study of nonlinear effects, characteristic of mutual optical coupling, in an ultra-short coupling regime observed in a distributed Bragg reflector laser pair fabricated on the same chip. Optical feedback is amplified via a double pass through a common on-chip optical amplifier, which introduces further nonlinear phenomena. Optical coupling has been introduced via back reflection from a cleave-ended fibre. The coupling may be varied in strength by varying the distance of the fibre from the output of the chip, without significantly affecting the coupling time. © 2008 Optical Society of America

University of Essex Research Repository

Crossref

Explore Bristol Research

Sampling Twitter users for social science research: Evidence from a systematic review of the literature

Author: Vicente P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023

All social media platforms can be used to conduct social science research, but Twitter is the most popular as it provides its data via several Application Programming Interfaces, which allows qualitative and quantitative research to be conducted with its members. As Twitter is a huge universe, both in number of users and amount of data, sampling is generally required when using it for research purposes. Researchers only recently began to question whether tweet-level sampling, in which the tweet is the sampling unit, should be replaced by user-level sampling, in which the user is the sampling unit. The major rationale for this shift is that tweet-level sampling does not account for the fact that some core discussants on Twitter tweet far more often than other users, which biases the sample towards the more active users. The knowledge on how to select representative samples of users in the Twitterverse is still insufficient despite its relevance for reliable and valid research outcomes. This paper contributes to this topic by presenting a systematic quantitative literature review of sampling plans designed and executed in the context of social science research on Twitter, including: (1) the definition of the target populations, (2) the sampling frames used to support sample selection, (3) the sampling methods used to obtain samples of Twitter users, (4) how data are collected from Twitter users, (5) the size of the samples, and (6) how research validity is addressed. This review can be a methodological guide for professionals and academics who want to conduct social science research involving Twitter users and the Twitterverse. info:eu-repo/semantics/publishedVersion

Repositório Institucional do ISCTE-IUL
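
The tweet-level versus user-level distinction described in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the user names, tweet counts, and function names are hypothetical, chosen only to show why drawing tweets uniformly over-represents heavy tweeters while drawing users uniformly does not.

```python
def tweet_level_inclusion(tweet_counts):
    """Chance that one uniformly drawn tweet was written by each user.

    Under tweet-level sampling the tweet is the sampling unit, so a user's
    chance of being represented is proportional to how much they tweet.
    """
    total_tweets = sum(tweet_counts.values())
    return {user: n / total_tweets for user, n in tweet_counts.items()}


def user_level_inclusion(tweet_counts):
    """Chance that one uniformly drawn user is each user.

    Under user-level sampling the user is the sampling unit, so activity
    level plays no role in selection.
    """
    n_users = len(tweet_counts)
    return {user: 1 / n_users for user in tweet_counts}


# Hypothetical population: one very active account and two occasional ones.
tweet_counts = {"heavy_tweeter": 90, "casual_a": 5, "casual_b": 5}

tweet_probs = tweet_level_inclusion(tweet_counts)
user_probs = user_level_inclusion(tweet_counts)

for user in tweet_counts:
    print(f"{user}: tweet-level {tweet_probs[user]:.2f}, "
          f"user-level {user_probs[user]:.2f}")
```

In this toy population the active account supplies most of the tweets and therefore dominates a tweet-level sample, whereas each account is equally likely to be drawn under user-level sampling.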