441 research outputs found
Principal Component Analysis and Radiative Transfer modelling of Spitzer IRS Spectra of Ultra Luminous Infrared Galaxies
The mid-infrared spectra of ultraluminous infrared galaxies (ULIRGs) contain
a variety of spectral features that can be used as diagnostics to characterise
the spectra. However, such diagnostics are biased by our prior prejudices on
the origin of the features. Moreover, by using only part of the spectrum they
do not utilise the full information content of the spectra. Blind statistical
techniques such as principal component analysis (PCA) consider the whole
spectrum, find correlated features and separate them out into distinct
components.
We further investigate the principal components (PCs) of ULIRGs derived in
Wang et al. (2011). We quantitatively show that five PCs are optimal for
describing the IRS spectra. These five components (PC1-PC5) and the mean
spectrum provide a template basis set that reproduces spectra of all z<0.35
ULIRGs within the noise. For comparison, the spectra are also modelled with a
combination of radiative transfer models of both starbursts and the dusty torus
surrounding active galactic nuclei. The five PCs typically provide better fits
than the models. We argue that the radiative transfer models require a colder
dust component and have difficulty in modelling strong PAH features.
Aided by the models we also interpret the physical processes that the
principal components represent. The third principal component is shown to
indicate the nature of the dominant power source, while PC1 is related to the
inclination of the AGN torus.
Finally, we use the five PCs to define a new classification scheme based on
five-dimensional Gaussian mixture modelling, trained on widely used optical
classifications. The five PCs, average spectra for the four classifications,
and the code to classify objects are made available at:
http://www.phys.susx.ac.uk/~pdh21/PCA/
Comment: 11 pages, 12 figures, accepted for publication in MNRAS
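The reconstruction described above (mean spectrum plus five weighted PCs) can be sketched with scikit-learn on stand-in data; the array shapes, noise level, and random mixtures below are illustrative assumptions, not the paper's actual IRS spectra.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 100 "spectra" of 200 wavelength bins,
# built from 5 underlying components plus noise (the real inputs
# would be Spitzer IRS spectra of ULIRGs).
rng = np.random.default_rng(0)
true_components = rng.normal(size=(5, 200))
mixing = rng.normal(size=(100, 5))
spectra = mixing @ true_components + 0.01 * rng.normal(size=(100, 200))

# Five principal components, the number the paper finds is optimal.
pca = PCA(n_components=5)
scores = pca.fit_transform(spectra)

# Any spectrum is reconstructed as mean spectrum + weighted sum of PCs.
reconstructed = pca.mean_ + scores @ pca.components_

# With five PCs the reconstruction matches to within the noise.
rms_error = np.sqrt(np.mean((spectra - reconstructed) ** 2))
print(rms_error)
```

The template-basis idea is that `pca.mean_` and `pca.components_` can then be fixed and reused to fit new spectra.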
Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy
Patient registry data are commonly collected as annual snapshots that need to be amalgamated to understand the longitudinal progress of each patient. However, patient identifiers can change or may not be available for legal reasons when longitudinal data are collated from patients living in different countries. Here, we apply astronomical statistical matching techniques to link individual patient records that can be used where identifiers are absent or to validate uncertain identifiers. We adopt a Bayesian model framework used for probabilistically linking records in astronomy. We adapt this and validate it across blinded, annually collected data: a high-quality (Danish) subset of data held in the European Cystic Fibrosis Society Patient Registry (ECFSPR). Our initial experiments achieved a precision of 0.990 at a recall value of 0.987. However, detailed investigation of the discrepancies uncovered typing errors in 27 of the identifiers in the original Danish subset. After fixing these errors to create a new gold standard, our algorithm correctly linked individual records across years, achieving a precision of 0.997 at a recall value of 0.987 without recourse to identifiers. Our Bayesian framework provides the probability that a pair of records belongs to the same patient. Unlike other record linkage approaches, our algorithm can also use physical models, such as body mass index curves, as prior information for record linkage. We have shown our framework can create longitudinal samples where none existed and validate pre-existing patient identifiers. We have demonstrated that, in this specific case, this automated approach is better than the existing identifiers.
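The core of such a framework, turning a prior match rate and a per-field likelihood into a posterior match probability, can be sketched as a toy Bayes calculation; the BMI likelihood widths and prior rate below are invented for illustration and are not the ECFSPR model.

```python
import math

def normal_pdf(x, sigma):
    """Zero-mean Gaussian density, used for both hypotheses below."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def match_probability(bmi_a, bmi_b, prior_match=0.01,
                      sigma_same=1.0, sigma_diff=6.0):
    """Posterior probability that two annual records belong to the same
    patient, based only on how plausible the BMI change is under each
    hypothesis (all parameters here are illustrative guesses)."""
    diff = bmi_b - bmi_a
    like_same = normal_pdf(diff, sigma_same)  # same patient: small drift
    like_diff = normal_pdf(diff, sigma_diff)  # different patients: wide
    evidence = prior_match * like_same + (1 - prior_match) * like_diff
    return prior_match * like_same / evidence

p_same = match_probability(22.3, 22.5)  # plausible year-on-year change
p_diff = match_probability(22.3, 31.0)  # implausibly large change
print(p_same > p_diff)
```

The paper's framework extends this idea to multiple fields and to physical priors such as BMI growth curves.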
Intersensory integration and reading: a theory / IREC Papers Vol. 1, No. 2
Includes bibliographic references (pp. 34–37)
De-blending Deep Herschel Surveys: A Multi-wavelength Approach
Cosmological surveys in the far infrared are known to suffer from confusion.
The Bayesian de-blending tool, XID+, currently provides one of the best ways to
de-confuse deep Herschel SPIRE images, using a flat flux density prior. This
work is to demonstrate that existing multi-wavelength data sets can be
exploited to improve XID+ by providing an informed prior, resulting in more
accurate and precise extracted flux densities. Photometric data for galaxies in
the COSMOS field were used to constrain spectral energy distributions (SEDs)
using the fitting tool CIGALE. These SEDs were used to create Gaussian prior
estimates in the SPIRE bands for XID+. The multi-wavelength photometry and the
extracted SPIRE flux densities were run through CIGALE again to allow us to
compare the performance of the two priors. ALMA flux densities at 870 μm and
1250 μm, inferred from the best-fitting SEDs from the second CIGALE run, were
compared with the measured ALMA flux densities as an independent performance
validation. Similar validations were conducted with the
SED modelling and fitting tool MAGPHYS and modified black body functions to
test for model dependency. We demonstrate a clear improvement in agreement
between the flux densities extracted with XID+ and existing data at other
wavelengths when using the new informed Gaussian prior over the original
uninformed prior. The residuals between the inferred and measured ALMA flux
densities were calculated. For the Gaussian prior, these residuals, expressed
as a multiple of the ALMA error (σ), have a smaller standard deviation (7.95σ
compared to 12.21σ for the flat prior), a reduced mean (1.83σ compared to
3.44σ), and a reduced positive skew (7.97 compared to 11.50). These results
were determined not to be significantly model dependent. The informed prior
therefore yields statistically more reliable SPIRE flux densities.
Comment: 8 pages, 7 figures, 3 tables. Accepted for publication in A&A
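The benefit of an informed Gaussian prior over a flat one can be illustrated with a one-dimensional conjugate update, a deliberately simplified stand-in for XID+'s full probabilistic map model; the flux values and uncertainties below are invented.

```python
import math

def gaussian_update(mu_like, sigma_like, mu_prior, sigma_prior):
    """Combine a Gaussian likelihood with a Gaussian prior (standard
    conjugate update); returns (mu_post, sigma_post)."""
    w_like = 1.0 / sigma_like ** 2
    w_prior = 1.0 / sigma_prior ** 2
    sigma_post = math.sqrt(1.0 / (w_like + w_prior))
    mu_post = (w_like * mu_like + w_prior * mu_prior) / (w_like + w_prior)
    return mu_post, sigma_post

# Hypothetical source: the confusion-limited map alone gives 20 +/- 6 mJy,
# while a CIGALE-style SED fit to ancillary photometry predicts 15 +/- 3 mJy.
mu_post, sigma_post = gaussian_update(20.0, 6.0, 15.0, 3.0)
print(mu_post, sigma_post)  # posterior is tighter than either input alone
```

A flat prior corresponds to `sigma_prior` going to infinity, leaving the map likelihood unchanged; any finite, well-placed Gaussian prior shrinks the posterior uncertainty.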
Extreme star formation events in quasar hosts over
We explore the relationship between active galactic nuclei and star formation
in a sample of 513 optically luminous type 1 quasars up to redshifts of 4
hosting extremely high star formation rates (SFRs). The quasars are selected to
be individually detected by the Herschel SPIRE instrument at >3σ at 250 μm,
leading to typical SFRs of order 1000 M☉ yr⁻¹. We find the average SFRs
increase by almost a factor of 10 from the lowest to the highest redshifts in
the sample, mirroring the rise in the comoving SFR density over the same
epoch. However, we find that the SFRs remain approximately constant with
accretion luminosity at the highest accretion luminosities probed. We also
find that the SFRs do not correlate with black
hole mass. Both of these results are most plausibly explained by the existence
of a self-regulation process by the starburst at high SFRs, which controls SFRs
on time-scales comparable to or shorter than the AGN or starburst duty cycles.
We additionally find that SFRs do not depend on Eddington ratio at any
redshift, consistent with no relation between SFR and black hole growth rate
per unit black hole mass. Finally, we find that high-ionisation broad
absorption line (HiBAL) quasars have far-infrared properties indistinguishable
from those of classical quasars, consistent with HiBAL quasars being normal
quasars observed along a particular line of sight, with the outflows in HiBAL
quasars having no measurable effect on the star formation in their hosts.
Comment: 12 pages, 6 figures
Can the use of Bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? Development of a novel method using simulated and real-life clinical data
Background
Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and biases results towards the null. Bayesian approaches allow the specification of prior information in the model, such as the likely rates of missingness in the data. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterised by under-diagnosis.
Methods
Study 1: We created synthetic data, produced to mimic structured EHR data where diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data. We examined the LR parameters estimated by models with and without priors.
Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We examined LR parameters estimated by models with and without the priors, and estimated classification accuracy using Area Under the Receiver Operating Characteristic.
Results
Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis.
Study 2: The Bayesian approach ran well on real life clinic data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR.
Conclusions
The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.
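The attenuation that motivates this approach can be reproduced in a few lines: simulate a risk factor, hide a fraction of true cases (the missing-diagnosis mechanism), and compare the fitted logistic coefficients. All numbers are illustrative; this sketch shows the problem the paper addresses, not its Bayesian fix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=(n, 1))                      # one risk factor
true_beta = 1.0
p = 1.0 / (1.0 + np.exp(-true_beta * x[:, 0]))   # true disease probability
y_true = rng.random(n) < p                       # true disease status

# Under-diagnosis: only 60% of true cases appear in the EHR;
# missed cases are wrongly treated as controls.
recorded = y_true & (rng.random(n) < 0.6)

beta_full = LogisticRegression().fit(x, y_true).coef_[0, 0]
beta_ehr = LogisticRegression().fit(x, recorded).coef_[0, 0]
print(beta_full, beta_ehr)  # the EHR coefficient is biased towards zero
```

A Bayesian model with an informative prior on the misclassification rate is one way to recover coefficients closer to `true_beta`.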
Identifying undetected dementia in UK primary care patients: a retrospective case-control study comparing machine-learning and standard epidemiological approaches
Background
Identifying dementia early, using real-world data, is a public health challenge. As only two-thirds of people with dementia currently receive a formal diagnosis in United Kingdom health systems, and many receive it late in the disease process, there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or who are developing dementia, but are currently undetected as having the condition by the GP.
Methods
We used electronic patient records from Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged >65y with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination.
Results
The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest performed very similarly with an AUROC of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing.
Conclusions
Our model could aid GPs or health service planners with the early detection of dementia. Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
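A minimal version of the case-control pipeline (binary pre-diagnosis features, a logistic regression classifier, AUROC on held-out data) looks roughly like this; the ten synthetic features and their enrichment rates are stand-ins for the paper's 70 clinical entities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, n_features = 4000, 10
# Per-feature enrichment in cases, standing in for clinical entities
# such as disorientation or behaviour change recorded pre-diagnosis.
signal = rng.random(n_features)
y = np.repeat([1, 0], n // 2)                 # 1:1 matched cases/controls
base = 0.1                                    # background feature rate
rates = base + 0.25 * signal * y[:, None]     # cases have elevated rates
X = (rng.random((n, n_features)) < rates).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(auroc)
```

Inspecting `model.coef_` then gives the feature-importance ranking analogous to the paper's top retained features.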
Learning the fundamental mid-infrared spectral components of galaxies with non-negative matrix factorization
The mid-infrared (MIR) spectra observed with the Spitzer Infrared Spectrograph (IRS) provide a valuable data set for untangling the physical processes and conditions within galaxies. This paper presents the first attempt to blindly learn fundamental spectral components of MIR galaxy spectra, using non-negative matrix factorization (NMF). NMF is a recently developed multivariate technique shown to be successful in blind source separation problems. Unlike the more popular multivariate analysis technique, principal component analysis, NMF imposes the condition that weights and spectral components are non-negative. This more closely resembles the physical process of emission in the MIR, resulting in physically intuitive components. By applying NMF to galaxy spectra in the Cornell Atlas of Spitzer/IRS sources, we find similar components amongst different NMF sets. These similar components include two for active galactic nucleus (AGN) emission and one for star formation. The first AGN component is dominated by fine structure emission lines and hot dust, the second by broad silicate emission at 10 and 18 μm. The star formation component contains all the polycyclic aromatic hydrocarbon features and molecular hydrogen lines. Other components include rising continua at longer wavelengths, indicative of colder grey-body dust emission. We show an NMF set with seven components can reconstruct the general spectral shape of a wide variety of objects, though it struggles to fit the varying strength of emission lines. We also show that the seven components can be used to separate out different types of objects. We model this separation with Gaussian mixture modelling and use the result to provide a classification tool.
We also show that the NMF components can be used to separate out the emission from AGN and star formation regions and define a new star formation/AGN diagnostic which is consistent with all MIR diagnostics already in use but has the advantage that it can be applied to MIR spectra with low signal-to-noise ratio or with limited spectral range. The seven NMF components and code for classification are available at https://github.com/pdh21/NMF_software/
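The non-negativity constraint that distinguishes this approach from PCA can be demonstrated with scikit-learn's NMF on synthetic data; the component count, basis shapes, and noise level below are illustrative stand-ins for the paper's seven-component IRS decomposition.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
# Hypothetical non-negative "spectra": positive mixtures of positive
# basis shapes, loosely mimicking MIR emission (not real IRS data).
n_spectra, n_bins, n_comp = 60, 150, 3
basis = rng.random((n_comp, n_bins)) ** 2
mixing = rng.random((n_spectra, n_comp))
spectra = mixing @ basis + 0.01 * rng.random((n_spectra, n_bins))

# Unlike PCA, NMF constrains weights AND components to be non-negative,
# matching the physics of emission (flux cannot be negative).
nmf = NMF(n_components=n_comp, init="nndsvda", max_iter=2000, random_state=0)
weights = nmf.fit_transform(spectra)
components = nmf.components_

rel_residual = (np.linalg.norm(spectra - weights @ components)
                / np.linalg.norm(spectra))
print(rel_residual)  # small: a few components reconstruct the set well
```

The recovered `weights` play the role the paper's component contributions play in its star formation/AGN diagnostic.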
An empirical, Bayesian approach to modelling crop yield: Maize in USA
We apply an empirical, data-driven approach for describing crop yield as a function of monthly temperature and precipitation by employing generative probabilistic models with parameters determined through Bayesian inference. Our approach is applied to state-scale maize yield and meteorological data for the US Corn Belt from 1981 to 2014 as an exemplar, but would be readily transferable to other crops, locations and spatial scales. Experimentation with a number of models shows that maize growth rates can be characterised by a two-dimensional Gaussian function of temperature and precipitation with monthly contributions accumulated over the growing period. This approach accounts for non-linear growth responses to the individual meteorological variables, and allows for interactions between them. Our models correctly identify that temperature and precipitation have the largest impact on yield in the six months prior to the harvest, in agreement with the typical growing season for US maize (April to September). Maximal growth rates occur for monthly mean temperatures of 18 °C–19 °C, corresponding to daily maximum temperatures of 24 °C–25 °C (in broad agreement with previous work), and monthly total precipitation of 115 mm. Our approach also provides a self-consistent way of investigating climate change impacts on current US maize varieties in the absence of adaptation measures. Keeping precipitation and growing area fixed, a temperature increase of 2 °C, relative to 1981–2014, results in the mean yield decreasing by 8%, while the yield variance increases by a factor of around 3. We thus provide a flexible, data-driven framework for exploring the impacts of natural climate variability and climate change on globally significant crops based on their observed behaviour. In concert with other approaches, this can help inform the development of adaptation strategies that will ensure food security under a changing climate.
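The growth model described here, a two-dimensional Gaussian response to monthly temperature and precipitation accumulated over the season, can be sketched directly. The optima match the values quoted in the abstract, but the response widths and the example season are invented for illustration (the paper infers its parameters via Bayesian methods).

```python
import math

def monthly_growth(temp_c, precip_mm,
                   t_opt=18.5, p_opt=115.0, t_width=5.0, p_width=60.0):
    """2D Gaussian growth response to monthly mean temperature (°C)
    and total precipitation (mm); widths are illustrative guesses."""
    return math.exp(-0.5 * ((temp_c - t_opt) / t_width) ** 2
                    - 0.5 * ((precip_mm - p_opt) / p_width) ** 2)

# Yield proxy: growth accumulated over a hypothetical Apr-Sep season,
# one (temperature, precipitation) pair per month.
season = [(12, 90), (17, 110), (21, 120), (23, 100), (19, 80), (15, 70)]
yield_proxy = sum(monthly_growth(t, p) for t, p in season)

# A uniform +2 °C warming, precipitation fixed, lowers accumulated growth.
warmer = sum(monthly_growth(t + 2, p) for t, p in season)
print(yield_proxy, warmer)
```

Because the response is peaked, warming pushes the hottest months further past the optimum than it helps the coolest ones, which is the mechanism behind the quoted yield decline.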
Main sequence of star forming galaxies beyond the Herschel confusion limit
Context. Deep far-infrared (FIR) cosmological surveys are known to be affected by confusion, causing issues when examining the main sequence (MS) of star forming galaxies. In the past this has typically been partially tackled by the use of stacking. However, stacking only provides the average properties of the objects in the stack. Aims. This work aims to trace the MS over 0.2 ≤ z < 6.0 using the latest de-blended Herschel photometry, which reaches ≈ 10 times deeper than the 5σ confusion limit in SPIRE. This provides more reliable star formation rates (SFRs), especially for the fainter galaxies, and hence a more reliable MS. Methods. We built a pipeline that uses the spectral energy distribution (SED) modelling and fitting tool CIGALE to generate flux density priors in the Herschel SPIRE bands. These priors were then fed into the de-blending tool XID+ to extract flux densities from the SPIRE maps. In the final step, multi-wavelength data were combined with the extracted SPIRE flux densities to constrain SEDs and provide stellar masses (M*) and SFRs. These M* and SFRs were then used to populate the SFR–M* plane over 0.2 ≤ z < 6.0. Results. No significant evidence of a high-mass turn-over was found, resulting in the best fit being a simple two-parameter power law of the form log(SFR) = α[log(M*) − 10.5] + β. The normalisation of the power law increased with redshift, rapidly at z ≲ 1.8, from 0.58 ± 0.09 at z ≈ 0.37 to 1.31 ± 0.08 at z ≈ 1.8. The slope was also found to increase with redshift, perhaps with an excess around 1.8 ≤ z < 2.9. Conclusions. The increasing slope indicates that galaxies become more self-similar as redshift increases. This implies that the specific SFR of high-mass galaxies increases with redshift, from z = 0.2 to z = 6.0, becoming closer to that of low-mass galaxies. The excess in the slope at 1.8 ≤ z < 2.9, if present, coincides with the peak of the cosmic star formation history.
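The quoted two-parameter power law can be recovered on synthetic data with an ordinary least-squares fit in log space. The normalisation below is set to the abstract's z ≈ 0.37 value of 0.58, while the slope and scatter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic main-sequence galaxies: log(SFR) = alpha*(log M* - 10.5) + beta
# plus Gaussian scatter (alpha = 0.6 and 0.3 dex scatter are assumptions;
# beta = 0.58 matches the paper's z ~ 0.37 normalisation).
alpha_true, beta_true = 0.6, 0.58
log_mass = rng.uniform(9.0, 11.5, size=300)
log_sfr = alpha_true * (log_mass - 10.5) + beta_true + rng.normal(0, 0.3, 300)

# Fit the same two-parameter form; polyfit returns (slope, intercept).
alpha_fit, beta_fit = np.polyfit(log_mass - 10.5, log_sfr, 1)
print(alpha_fit, beta_fit)
```

Pivoting the fit at log M* = 10.5 decorrelates slope and normalisation, which is why the paper quotes β at that mass.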