Parameter estimators of random intersection graphs with thinned communities
This paper studies a statistical network model generated by a large number of
randomly sized overlapping communities, where any pair of nodes sharing a
community is linked via that community with a certain probability. In a special
case the model reduces to a random intersection graph, which is known to
generate high levels of transitivity even in the sparse context. The additional
parameter adds a degree of freedom and leads to a parsimonious and analytically
tractable network model with tunable density, transitivity, and degree
fluctuations. We prove that the parameters of this model can be consistently
estimated in the large and sparse limiting regime using moment estimators based
on partially observed densities of links, 2-stars, and triangles. (Comment: 15 pages.)
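Moment estimators of this kind take as input the observed densities of links, 2-stars, and triangles. A minimal sketch of computing those densities from a small graph (the function name and normalisations are our own, not the paper's):

```python
from itertools import combinations

def subgraph_densities(adj):
    """Empirical densities of links, 2-stars, and triangles in a
    simple undirected graph given as a set-valued adjacency dict."""
    nodes = list(adj)
    n = len(nodes)
    links = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    # A 2-star at a vertex of degree d is any pair of its neighbours.
    two_stars = sum(len(adj[u]) * (len(adj[u]) - 1) // 2 for u in nodes)
    triangles = sum(
        1 for u, v, w in combinations(nodes, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    # Normalise by the number of possible positions for each motif.
    return (
        links / (n * (n - 1) / 2),
        two_stars / (n * (n - 1) * (n - 2) / 2),
        triangles / (n * (n - 1) * (n - 2) / 6),
    )

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}  # triangle plus a pendant edge
link_d, star_d, tri_d = subgraph_densities(adj)
```

In the sparse regime the paper considers, these densities would be computed from a partially observed subsample rather than the full graph.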
A Fast Counting Method for 6-motifs with Low Connectivity
A k-motif (or graphlet) is a subgraph on k nodes in a graph or network.
Counting motifs in complex networks has been a well-studied problem in the
network analysis of various real-world graphs arising from the study of social
networks and bioinformatics. In particular, the triangle counting problem has
received much attention due to its significance in understanding the behavior
of social networks. Subgraphs with more than 3 nodes have likewise attracted
growing interest. While successful methods have been developed for this
problem, most existing algorithms do not scale to large networks with millions
of nodes and edges.
The main contribution of this paper is a preliminary study that generalizes
the exact counting algorithm of Pinar, Seshadhri and Vishal to a collection of
6-motifs. The method uses the counts of smaller motifs to obtain the counts of
6-motifs with low connectivity, that is, those containing a cut-vertex or a
cut-edge. It thereby circumvents the combinatorial explosion that naturally
arises when counting subgraphs in large networks.
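The cut-vertex idea can be illustrated on a smaller motif (a sketch in the spirit of, but not taken from, the paper's algorithm): a "tailed triangle" on 4 nodes decomposes at its cut-vertex into a triangle and a pendant edge, so its count follows from per-vertex triangle counts and degrees alone:

```python
from itertools import combinations

def tailed_triangles(adj):
    """Count (non-induced) tailed triangles: a triangle plus one extra
    edge hanging off one of its vertices (the cut-vertex)."""
    # Per-vertex triangle counts: a smaller-motif quantity.
    tri = {v: 0 for v in adj}
    for u, v, w in combinations(adj, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            for x in (u, v, w):
                tri[x] += 1
    # Each tail attaches at a triangle vertex via one of its
    # deg - 2 neighbours outside that triangle.
    return sum(tri[v] * (len(adj[v]) - 2) for v in adj)

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
count = tailed_triangles(k4)  # 4 triangles × 3 tail vertices × 1 outside neighbour = 12
```

The same pattern of multiplying smaller-motif counts at a cut-vertex or cut-edge extends, with more bookkeeping, to the 6-node motifs the paper targets.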
The detection of the imprint of filaments on cosmic microwave background lensing
Galaxy redshift surveys, such as 2dF, SDSS, 6df, GAMA and VIPERS, have shown
that the spatial distribution of matter forms a rich web, known as the cosmic
web. The majority of galaxy survey analyses measure the amplitude of galaxy
clustering as a function of scale, ignoring information beyond a small number
of summary statistics. Since the matter density field becomes highly
non-Gaussian as structure evolves under gravity, we expect other statistical
descriptions of the field to provide us with additional information. One way to
study the non-Gaussianity is to study filaments, which evolve non-linearly from
the initial density fluctuations produced in the primordial Universe. In our
study, we report the first detection of CMB (Cosmic Microwave Background)
lensing by filaments and we apply a null test to confirm our detection.
Furthermore, we propose a phenomenological model to interpret the detected
signal and we measure how filaments trace the matter distribution on large
scales through filament bias, which we measure to be around 1.5. Our study
provides a new avenue for understanding the environmental dependence of galaxy
formation. In the future, the joint analysis of lensing and Sunyaev-Zel'dovich
observations might reveal the properties of the `missing baryons', the vast
majority of the gas which resides in the intergalactic medium and has so far
evaded most observations.
Construction and Random Generation of Hypergraphs with Prescribed Degree and Dimension Sequences
We propose algorithms for construction and random generation of hypergraphs
without loops and with prescribed degree and dimension sequences. The objective
is to provide a starting point for as well as an alternative to Markov chain
Monte Carlo approaches. Our algorithms leverage the transposition of properties
and algorithms devised for matrices constituted of zeros and ones with
prescribed row- and column-sums to hypergraphs. The construction algorithm
extends the applicability of Markov chain Monte Carlo approaches when the
initial hypergraph is not provided. The random generation algorithm allows the
development of a self-normalised importance sampling estimator for hypergraph
properties such as the average clustering coefficient. We prove the correctness
of the proposed algorithms. We also prove that the random generation algorithm
generates any hypergraph following the prescribed degree and dimension
sequences with a non-zero probability. We empirically and comparatively
evaluate the effectiveness and efficiency of the random generation algorithm.
Experiments show that the random generation algorithm provides stable and
accurate estimates of average clustering coefficient, and also demonstrates a
better effective sample size in comparison with the Markov chain Monte Carlo
approaches. (Comment: 21 pages, 3 figures.)
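A self-normalised importance sampling estimator of the kind mentioned above can be sketched in generic form (the function and the toy target are illustrative, not the paper's hypergraph-specific estimator):

```python
import random

def snis_estimate(samples, weight, f):
    """Self-normalised importance sampling estimate of E_p[f(X)],
    given samples from a proposal q and unnormalised weights
    w(x) proportional to p(x)/q(x)."""
    w = [weight(x) for x in samples]
    return sum(wi * f(x) for wi, x in zip(w, samples)) / sum(w)

# Toy target p(x) ∝ x on [0, 1] with proposal q = Uniform(0, 1):
# the unnormalised weight is w(x) = x, and E_p[X] = 2/3.
random.seed(0)
xs = [random.random() for _ in range(100_000)]
est = snis_estimate(xs, weight=lambda x: x, f=lambda x: x)  # ≈ 2/3
```

In the paper's setting, the samples would be hypergraphs from the random generation algorithm, the weights their (unnormalised) sampling probabilities, and f a statistic such as the average clustering coefficient.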
Opportunities and priorities for breast surgical research
The Breast Cancer Campaign Gap analysis (2013) established breast cancer research priorities without a specific focus on surgical research or the role of surgeons. The majority of breast cancer patients encounter a surgeon at diagnosis or during treatment, so surgical involvement in the design and delivery of high-quality research to improve patient care is critical. This review aims to identify opportunities and priorities for breast surgical research to complement the previous gap analysis.
The diagnosis of colorectal cancer in patients with symptoms: finding a needle in a haystack
Patients often see primary care physicians with symptoms that might signal colorectal cancer but are also common in adults without cancer. Physicians and patients must then make a difficult decision about whether, and how aggressively, to evaluate the symptom. Favoring referral is the fact that missed diagnoses lead to unnecessary testing, prolonged uncertainty, and continuing symptoms; the physician will also suffer chagrin. It is not clear, however, that diagnostic delay leads to progression to a more advanced stage. Against referral is that proper evaluation includes colonoscopy, with attendant inconvenience, discomfort, cost, and risk. The article by Hamilton et al., published this month in BMC Medicine, provides strong estimates of the predictive value of the various symptoms and signs of colorectal cancer and shows how much higher the predictive values are with increasing age and male sex. Unfortunately, their results also make clear that most colorectal cancers present with symptoms with low predictive values, < 1.2%. Models that include a set of predictive variables (risk factors, age, sex, screening history, and symptoms) have been developed to guide primary prevention and clinical decision-making, and are more powerful than individual symptoms and signs alone. Although screening for colorectal cancer is increasing in many countries, cancers will still be found outside screening programs, so primary care physicians will remain at the front line in the difficult task of distinguishing everyday symptoms from life-threatening cancer.
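The reasoning behind such low predictive values is Bayes' theorem applied at low prevalence. A minimal sketch with purely illustrative numbers (not figures from the article):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive symptom/sign (Bayes)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative only: even a fairly specific symptom yields a PPV
# near 1% when the underlying prevalence is 0.1%.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.6, specificity=0.95)
```

Because the false positives come from the large disease-free majority, PPV stays low until prevalence rises, which is why age and sex shift the predictive values so strongly.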
Effectiveness and cost of recruitment strategies for a community-based randomised controlled trial among rainwater drinkers
Background: Community-based recruitment is challenging, particularly if the sampling frame is not easily defined, as in the case of people who drink rainwater. Strategies for contacting participants must be carefully considered to maximise generalisability and minimise bias of the results. This paper assesses the recruitment strategies for a 1-year double-blinded randomised trial on drinking untreated rainwater. The effectiveness of the recruitment strategies and associated costs are described.
Methods: Community recruitment of households from Adelaide, Australia occurred from February to July 2007 using four methods: electoral roll mail-out, approaches to schools and community groups, newspaper advertising, and other media involvement. Word of mouth communication was also assessed.
Results: A total of 810 callers were screened, with 53.5% eligible. Of those who were eligible and sent further information, 76.7% were willing to participate in the study and 75.1% were enrolled. The target for recruitment was 300 households, and this was achieved. The mail-out was the most effective method with respect to the number of households randomised, while recruitment via schools had the highest yield (57.3%) and was the most cost effective when considering cost per household randomised (AUD$147.20). Yield and cost effectiveness were lowest for media advertising.
Conclusion: The use of electoral roll mail-out and advertising via schools were effective in reaching households using untreated rainwater for drinking. Employing multiple strategies enabled success in achieving the recruitment target. In countries where electoral roll extracts are available to researchers, this method is likely to have a high yield for recruitment into community-based epidemiological studies.
Prevalence and dynamics of ribosomal DNA micro-heterogeneity are linked to population history in two contrasting yeast species
Despite the considerable number and taxonomic breadth of past and current genome sequencing projects, many of which necessarily encompass the ribosomal DNA, detailed information on the prevalence and evolutionary significance of sequence variation in this ubiquitous genomic region is severely lacking. Here, we attempt to address this issue in two closely related yet contrasting yeast species, the baker's yeast Saccharomyces cerevisiae and the wild yeast Saccharomyces paradoxus. By drawing on existing datasets from the Saccharomyces Genome Resequencing Project, we identify a rich seam of ribosomal DNA sequence variation, characterising 1,068 and 970 polymorphisms in 34 S. cerevisiae and 26 S. paradoxus strains respectively. We discover that the strain sets of the two species exhibit distinct mutational profiles. Furthermore, we show for the first time that unresolved rDNA sequence variation resulting from imperfect concerted evolution of the ribosomal DNA region follows a U-shaped allele frequency distribution in each species, similar to loci that evolve under non-concerted mechanisms but arising through rather different evolutionary processes. Finally, we link differences between the shapes of these allele frequency distributions to the two species' contrasting population histories.
The Paired Availability Design for Historical Controls
BACKGROUND: Although a randomized trial represents the most rigorous method of evaluating a medical intervention, some interventions would be extremely difficult to evaluate using this study design. One alternative, an observational cohort study, can give biased results if it is not possible to adjust for all relevant risk factors. METHODS: A recently developed and less well-known alternative is the paired availability design for historical controls. The paired availability design requires at least 10 hospitals or medical centers in which there is a change in the availability of the medical intervention. The statistical analysis involves a weighted average of a simple "before" versus "after" comparison from each hospital or medical center that adjusts for the change in availability. RESULTS: We expanded the requirements for the paired availability design to yield valid inference. (1) The hospitals or medical centers serve a stable population. (2) Other aspects of patient management remain constant over time. (3) Criteria for outcome evaluation are constant over time. (4) Patient preferences for the medical intervention are constant over time. (5) For hospitals where the intervention was available in the "before" group, a change in availability in the "after" group does not change the effect of the intervention on outcome. CONCLUSION: The paired availability design has promise for evaluating medical versus surgical interventions, in which it is difficult to recruit patients to a randomized trial.
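The weighted before/after comparison can be sketched as follows, reading "adjusts for the change in availability" as scaling each centre's outcome change by its availability change, in the spirit of instrumental-variable estimators. The weighting by centre size and all numbers are hypothetical illustrations, not prescriptions from the article:

```python
def paired_availability_estimate(centers):
    """Paired availability estimate from per-centre records of
    (n_patients, outcome_before, outcome_after, avail_before, avail_after).
    Each centre's before/after change in outcome is divided by its change
    in availability, then centres are averaged, weighted by size."""
    weighted = sum(n * (y1 - y0) / (a1 - a0) for n, y0, y1, a0, a1 in centers)
    total = sum(n for n, *_ in centers)
    return weighted / total

# Two hypothetical centres whose availability rose by 60 and 40 points;
# both imply an effect of about 0.10 per fully treated patient.
est = paired_availability_estimate([
    (100, 0.10, 0.16, 0.0, 0.6),
    (200, 0.20, 0.24, 0.1, 0.5),
])
```

The design's five requirements listed above are exactly what make this scaling valid: if the population, management, outcome criteria, and preferences shift over time, the before/after difference no longer isolates the intervention's effect.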
BioTorrents: A File Sharing Service for Scientific Data
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net
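The "built-in error checking" refers to BitTorrent's per-piece SHA-1 hashes declared in the torrent metadata. A minimal sketch of the idea (the piece size and names are illustrative):

```python
import hashlib

PIECE_SIZE = 256 * 1024  # illustrative; real torrents declare their own piece length

def verify_pieces(data, expected_hashes):
    """Check each fixed-size piece of a downloaded file against its
    SHA-1 hash, the per-piece error checking BitTorrent performs."""
    pieces = [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]
    return [hashlib.sha1(p).digest() == h for p, h in zip(pieces, expected_hashes)]

# A corrupted first piece is detected; the intact second piece still verifies,
# so only the bad piece needs to be re-downloaded.
data = b"A" * (PIECE_SIZE + 10)
hashes = [hashlib.sha1(data[:PIECE_SIZE]).digest(),
          hashlib.sha1(data[PIECE_SIZE:]).digest()]
ok = verify_pieces(b"B" + data[1:], hashes)  # [False, True]
```

Because failed pieces are re-fetched individually, from any peer holding them, large transfers remain reliable even over flaky connections.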