The quest for a donor: probability-based methods offer help
When a patient in need of a stem cell transplant has no compatible donor within his or her closest family, and no matched unrelated donor can be found, a remaining option is to search within the patient's extended family. This situation often arises when the patient belongs to an ethnic minority, originates from a country that lacks a well-developed stem cell donor program, and has HLA haplotypes that are rare in his or her country of residence. Searching within the extended family can be time-consuming and expensive, so tools that calculate the probability of a match within groups of untested relatives would facilitate the search. We present a general approach to calculating the probability of a match in a given relative, or group of relatives, based on the pedigree and on knowledge of the genotypes of some of the individuals. The method extends previous approaches by allowing pedigrees to be consanguineous and arbitrarily complex, with deviations from Hardy-Weinberg equilibrium. We show that this extension has a considerable effect on results, in particular for rare haplotypes. The methods are exemplified using freeware programs to solve a case of practical importance.
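As a concrete, hedged illustration of the quantity involved (not the authors' exact method, which propagates probabilities through the pedigree analytically), the Python sketch below estimates the match probability by gene-dropping Monte Carlo with rejection on the observed typings. The haplotype labels, frequencies, and two-sibling pedigree are hypothetical, and, unlike the paper's method, founders here are drawn under Hardy-Weinberg equilibrium.

```python
import random

HAPLOS = ["A", "B", "C"]          # toy haplotype labels (hypothetical)
FREQS  = [0.6, 0.3, 0.1]          # their population frequencies (hypothetical)

PEDIGREE = {                      # child -> (father, mother); founders omitted
    "patient": ("father", "mother"),
    "sibling": ("father", "mother"),
}

KNOWN = {"patient": ("A", "B")}   # observed unordered haplotype pairs

def drop_genes():
    """One gene-dropping pass: founders draw two haplotypes from population
    frequencies; children inherit one haplotype from each parent at random."""
    geno = {}
    def get(person):
        if person in geno:
            return geno[person]
        if person in PEDIGREE:
            f, m = PEDIGREE[person]
            geno[person] = (random.choice(get(f)), random.choice(get(m)))
        else:  # founder: Hardy-Weinberg draw (the paper relaxes this)
            geno[person] = (random.choices(HAPLOS, FREQS)[0],
                            random.choices(HAPLOS, FREQS)[0])
        return geno[person]
    for p in set(PEDIGREE) | {q for pr in PEDIGREE.values() for q in pr}:
        get(p)
    return geno

def match_probability(target, n=100_000):
    """P(target's unordered haplotype pair equals the patient's),
    conditioned on KNOWN typings via rejection sampling."""
    hits = kept = 0
    for _ in range(n):
        g = drop_genes()
        if any(tuple(sorted(g[p])) != tuple(sorted(h)) for p, h in KNOWN.items()):
            continue  # reject draws inconsistent with the observed typings
        kept += 1
        hits += tuple(sorted(g[target])) == tuple(sorted(g["patient"]))
    return hits / kept if kept else float("nan")

if __name__ == "__main__":
    print(f"P(sibling matches patient) ~ {match_probability('sibling'):.3f}")
```

For a heterozygous patient with untyped parents, the estimate should land near the classical sibling figure of 1/4 plus a small excess from haplotypes shared by chance through the population; richer pedigrees are handled by editing PEDIGREE and KNOWN.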
Bayesian inference from photometric redshift surveys
We show how to enhance the redshift accuracy of surveys consisting of tracers with highly uncertain positions along the line of sight. Photometric surveys with redshift uncertainty delta_z ~ 0.03 can yield final redshift uncertainties of delta_z_f ~ 0.003 in high-density regions. This increased redshift precision is achieved by imposing an isotropy and 2-point correlation prior in a Bayesian analysis and is completely independent of the process that estimates the photometric redshifts. As a byproduct, the method also infers the three-dimensional density field, essentially super-resolving high-density regions in redshift space. Our method fully takes into account the survey mask and selection function. It uses a simplified Poissonian picture of galaxy formation, relating the preferred locations of galaxies to regions of higher density in the matter field. The method quantifies the remaining uncertainties in the three-dimensional density field and in the true radial locations of galaxies by generating samples that are constrained by the survey data. The exploration of this high-dimensional, non-Gaussian joint posterior is made feasible by multiple-block Metropolis-Hastings sampling. We demonstrate the performance of our implementation on a simulation containing 2.0 x 10^7 galaxies. These results bear out the promise of Bayesian analysis for upcoming photometric large-scale structure surveys with tens of millions of galaxies.
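As an illustration of the blockwise Metropolis-Hastings exploration described above, here is a 1-D toy sketch: a log-density field drawn from a Gaussian correlation prior (a stand-in for the isotropy and 2-point correlation prior), observed through Poisson galaxy counts, and sampled block by block. Everything here is hypothetical and heavily simplified; in particular the survey mask, selection function, and photometric redshift errors are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp_cov(n, ell=6.0, var=0.5):
    """Squared-exponential prior covariance (toy 2-point correlation prior)."""
    x = np.arange(n, dtype=float)
    return var * np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2) \
        + 1e-8 * np.eye(n)

N, NBAR = 64, 20.0                                # cells, mean counts per cell
C = sq_exp_cov(N)
Cinv = np.linalg.inv(C)
delta_true = rng.multivariate_normal(np.zeros(N), C)
counts = rng.poisson(NBAR * np.exp(delta_true))   # observed galaxy counts

def log_post(d):
    """Log posterior: Poisson likelihood of counts + Gaussian field prior."""
    lam = NBAR * np.exp(d)
    return np.sum(counts * np.log(lam) - lam) - 0.5 * d @ Cinv @ d

def block_mh(n_steps=2000, block=8, step=0.05):
    """Cycle over contiguous blocks, proposing Gaussian moves per block."""
    d = np.zeros(N)
    lp = log_post(d)
    samples = []
    for t in range(n_steps):
        for lo in range(0, N, block):
            prop = d.copy()
            prop[lo:lo + block] += step * rng.standard_normal(min(block, N - lo))
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance
                d, lp = prop, lp_prop
        if t > n_steps // 2:                          # keep post-burn-in samples
            samples.append(d.copy())
    return np.array(samples)

samples = block_mh()
print("posterior mean vs truth, first 5 cells:")
print(np.round(samples.mean(0)[:5], 2), np.round(delta_true[:5], 2))
```

The retained samples quantify the remaining per-cell uncertainty, the same role the data-constrained samples play in the paper, with high-count (high-density) cells pinned down far more tightly than empty ones.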
A Deep Embedding Model for Co-occurrence Learning
Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data; such data contain rich correlation and clustering information about the items. In this paper, we study co-occurrence data using a general energy-based probabilistic model, and we analyze three categories of energy-based models that capture different levels of dependency in the co-occurrence data. We also discuss how several typical existing models relate to these three types of energy models, including the Fully Visible Boltzmann Machine (FVBM), Matrix Factorization, Log-BiLinear (LBL) models, and the Restricted Boltzmann Machine (RBM). We then propose a Deep Embedding Model (DEM) derived from the energy model in a principled manner. Furthermore, motivated by the observation that the partition function of the energy model is intractable, and by the fact that the main objective of modeling co-occurrence data is prediction via conditional probabilities, we apply the maximum pseudo-likelihood method to learn the DEM. The resulting model and learning method naturally avoid these difficulties, and the conditional probability needed for prediction is easy to compute. Interestingly, our method is equivalent to training a specially structured deep neural network with back-propagation and a special sampling strategy, which makes it scalable to large datasets. Finally, our experiments show that the DEM achieves results comparable to or better than state-of-the-art methods on datasets from several application domains.
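To make the maximum pseudo-likelihood idea concrete, the sketch below applies it to the simplest model the abstract mentions, a Fully Visible Boltzmann Machine, rather than to the DEM itself. The pseudo-likelihood replaces the intractable joint likelihood with the product of conditionals p(v_i | v_{-i}), so the partition function never appears, and the learned conditionals are exactly what prediction needs. The toy data and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_fvbm_pseudolikelihood(V, n_epochs=200, lr=0.05):
    """Fit a fully visible Boltzmann machine E(v) = -0.5 v'Wv - b'v on binary
    data V (n_samples x n_items) by gradient ascent on the pseudo-likelihood
    sum_i log p(v_i | v_{-i}); no partition function is ever evaluated."""
    n, d = V.shape
    W = np.zeros((d, d))                  # symmetric couplings, zero diagonal
    b = np.zeros(d)
    for _ in range(n_epochs):
        A = V @ W + b                     # conditional logits per unit
        R = V - sigmoid(A)                # residuals v_i - p(v_i=1 | v_-i)
        gW = (R.T @ V + V.T @ R) / n      # symmetrized gradient wrt W
        np.fill_diagonal(gW, 0.0)
        W += lr * gW
        b += lr * R.mean(axis=0)
    return W, b

# Toy co-occurrence data (hypothetical): item 1 mostly co-occurs with item 0.
V = (rng.random((500, 4)) < [[0.5, 0.5, 0.2, 0.2]]).astype(float)
V[:, 1] = np.where(rng.random(500) < 0.9, V[:, 0], V[:, 1])

W, b = fit_fvbm_pseudolikelihood(V)
print("learned coupling W[0,1] =", round(W[0, 1], 3))  # expected positive

# Prediction uses the learned conditionals directly, e.g.
# p(item 1 present | other items) = sigmoid((W @ v)[1] + b[1]).
```

The same objective carries over to deeper energy models: each conditional becomes the output of a structured network, which is why learning reduces to back-propagation as the abstract notes.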
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or simply different communities than is optimal, and evaluation methods that use a metadata partition as ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely, given the same input, both in the number of communities they find and in their composition; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; and (iii) these differences induce wide variation in accuracy on link-prediction and link-description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum-description-length approach to regularization represent the best general learning approach, but they can be outperformed in specific circumstances. These results provide both a theoretically principled approach to evaluating over- and underfitting in models of network community structure and a realistic benchmark against which new methods may be evaluated and compared.
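The link-prediction protocol used for evaluation can be sketched as follows: hold out a fraction of edges, run a community detection algorithm on the remainder, and measure how well "same community" separates held-out edges from non-edges. In this illustrative sketch, greedy modularity maximization stands in for the paper's 16 algorithms, the AUC is estimated by pairwise comparisons, and all parameters are hypothetical.

```python
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

random.seed(0)

def link_prediction_auc(G, holdout_frac=0.1, n_pairs=2000):
    """Hold out edges, detect communities on the rest, then estimate the AUC:
    how often a held-out (true) edge outscores a random non-edge when a pair
    scores 1 if its endpoints share a community and 0 otherwise. AUC ~ 0.5
    means the partition carries no link information."""
    edges = list(G.edges())
    random.shuffle(edges)
    held_out = edges[: int(holdout_frac * len(edges))]
    G_train = G.copy()
    G_train.remove_edges_from(held_out)

    # Community detection on the training graph (modularity maximization is
    # a stand-in here; the paper compares 16 different algorithms).
    comms = greedy_modularity_communities(G_train)
    label = {v: i for i, c in enumerate(comms) for v in c}
    score = lambda u, v: 1.0 if label.get(u, -1) == label.get(v, -2) else 0.0

    nodes = list(G)
    non_edges = []
    while len(non_edges) < n_pairs:          # sample true non-edges of G
        u, v = random.sample(nodes, 2)
        if not G.has_edge(u, v):
            non_edges.append((u, v))

    wins = ties = 0
    k = min(50, len(non_edges))
    for (u, v) in held_out:
        for (x, y) in random.sample(non_edges, k):
            s_true, s_false = score(u, v), score(x, y)
            wins += s_true > s_false
            ties += s_true == s_false
    return (wins + 0.5 * ties) / (len(held_out) * k)

G = nx.karate_club_graph()
print(f"link-prediction AUC: {link_prediction_auc(G):.2f}")
```

An overfitting method (too many tiny communities) drives the true-edge scores toward 0, while an underfitting one (too few large communities) drives the non-edge scores toward 1; either failure pulls the AUC back toward 0.5, which is what makes this a usable diagnostic.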