Random Graph Generator for Bipartite Networks Modeling

Author: Chojnacki Szymon
Kłopotek Mieczysław
Publication date: 02/11/2010

The purpose of this article is to introduce a new iterative algorithm that generates random graphs with properties resembling real-life bipartite graphs. The algorithm enables us to generate a wide range of random bigraphs whose features are determined by a set of parameters. We adapt the advances of the last decade in unipartite complex network modeling to the bigraph setting. This data structure can be observed in several situations. However, only a few datasets are freely available to test the algorithms (e.g. community detection, influential node identification, information retrieval) that operate on such data. Therefore, artificial datasets are needed to enhance the development and testing of these algorithms. We are particularly interested in applying the generator to the analysis of recommender systems. Therefore, we focus on two characteristics that, besides simple statistics, are in our opinion responsible for the performance of neighborhood-based collaborative filtering algorithms. The features are node degree distribution and local clustering coefficient.
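The two characteristics named in the abstract can be illustrated on a toy user-item graph. The following is a minimal sketch, not the article's generator: the edge list, the function names, and the choice of a Jaccard-overlap definition of bipartite local clustering are all assumptions made for illustration, and the article may define local clustering differently.

```python
from collections import Counter


def degree_distribution(edges):
    """Return two Counters mapping degree -> number of nodes (users, items)."""
    user_deg = Counter(u for u, _ in edges)
    item_deg = Counter(i for _, i in edges)
    return Counter(user_deg.values()), Counter(item_deg.values())


def user_clustering(edges):
    """Mean Jaccard overlap between each user and the users sharing an item.

    This is one common bipartite analogue of local clustering; it is an
    assumption here, not necessarily the definition used in the article.
    """
    items_of, users_of = {}, {}
    for u, i in edges:
        items_of.setdefault(u, set()).add(i)
        users_of.setdefault(i, set()).add(u)

    result = {}
    for u, items in items_of.items():
        # Users reachable through at least one shared item, excluding u itself.
        peers = set().union(*(users_of[i] for i in items)) - {u}
        if not peers:
            result[u] = 0.0
            continue
        overlaps = [
            len(items & items_of[v]) / len(items | items_of[v]) for v in peers
        ]
        result[u] = sum(overlaps) / len(overlaps)
    return result


# Hypothetical toy data: three users rating three items.
edges = [("u1", "a"), ("u1", "b"), ("u2", "a"),
         ("u2", "b"), ("u3", "b"), ("u3", "c")]
user_dist, item_dist = degree_distribution(edges)
clustering = user_clustering(edges)
```

In this toy graph every user has degree 2, while u1 and u2 share both of their items and therefore score higher than u3, which overlaps with each of them on a single item.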

arXiv.org e-Print Archive

CiteSeerX

Soft clustering analysis of galaxy morphologies: A worked example with SDSS

Author: Abazajian
Baldry
Ball
Bamford
Croton
Fukugita
Huertas-Company
Kelly
Lahav
M. Bartelmann
Massey
Melchior
Naim
P. Melchior
R. Andrae
Redner
Richards
Réfrégier
Storrie-Lombardi
Strateva
Publication venue: 'EDP Sciences'
Publication date: 01/01/2010

Context: The huge and still rapidly growing number of galaxies in modern sky surveys raises the need for an automated and objective classification method. Unsupervised learning algorithms are of particular interest, since they discover classes automatically. Aims: We briefly discuss the pitfalls of oversimplified classification methods and outline an alternative approach called "clustering analysis". Methods: We categorise different classification methods according to their capabilities. Based on this categorisation, we present a probabilistic classification algorithm that automatically detects the optimal classes preferred by the data. We explore the reliability of this algorithm in systematic tests. Using a small sample of bright galaxies from the SDSS, we demonstrate the performance of this algorithm in practice. We are able to disentangle the problems of classification and parametrisation of galaxy morphologies in this case. Results: We give physical arguments that a probabilistic classification scheme is necessary. The algorithm we present produces reasonable morphological classes and object-to-class assignments without any prior assumptions. Conclusions: There are sophisticated automated classification algorithms that meet all necessary requirements, but a lot of work is still needed on the interpretation of the results. Comment: 18 pages, 19 figures, 2 tables, submitted to A
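The idea of a probabilistic (soft) object-to-class assignment can be sketched with a one-dimensional Gaussian mixture. This is only an illustration under stated assumptions: the abstract does not specify the paper's algorithm, so the mixture model, the function names, and the example parameters below are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def soft_assign(x, weights, means, sigmas):
    """Return the membership probability of x for each mixture component."""
    densities = [
        w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)
    ]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical two-class model along a single morphological parameter.
weights, means, sigmas = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

ambiguous = soft_assign(0.0, weights, means, sigmas)   # halfway between classes
clear_cut = soft_assign(-3.0, weights, means, sigmas)  # far on the first class's side
```

An object halfway between the two classes receives equal membership in both, whereas a hard classifier would have to force it into one of them; that difference is what the soft assignment is meant to show.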

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Topics in social network analysis and network science

Author: O'Malley A. James
Onnela Jukka-Pekka
Publication date: 31/03/2014

This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Although several instances of social network analysis in health services research have appeared recently, the majority involve only the most basic methods and thus only scratch the surface of what might be accomplished. Cutting-edge methods are presented with relevant examples and illustrations from health services research.

arXiv.org e-Print Archive

CiteSeerX

FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome.

Author: Jiang Tao
Ma Shining
Wang Dongfang
Zeng Jianyang
Zhou Jianyu
Publication venue: eScholarship, University of California
Publication date: 09/11/2017

Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in, is often regarded as one of the most effective measures of alternative splicing events. The existing methods for estimating exon-inclusion ratios at the genome scale all require the existence of a reference transcriptome. In this paper, we propose an alignment-free method, FreePSI, to perform genome-wide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify the exon-inclusion ratios at the genome scale, and an efficient expectation-maximization algorithm based on a divide-and-conquer strategy and an ultrafast conjugate gradient projection descent method to solve the model. We compare FreePSI with the existing methods on simulated and real RNA-Seq data in terms of both accuracy and efficiency and show that it is able to achieve very good performance even though a reference transcriptome is not provided. Our results suggest that FreePSI may have important applications in performing alternative splicing analysis for organisms that do not have quality reference transcriptomes. FreePSI is implemented in C++ and freely available to the public on GitHub.
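For orientation, the quantity being estimated can be written down in its simplest count-based form: the fraction of evidence supporting inclusion of an exon. The sketch below is not FreePSI's k-mer model or its expectation-maximization procedure; the function name and the read counts are hypothetical, and real pipelines typically also normalise for effective length.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Count-based exon-inclusion ratio; None when there is no evidence."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # ratio is undefined without any supporting reads
    return inclusion_reads / total


# Hypothetical counts: 30 reads support inclusion, 10 support skipping.
psi = percent_spliced_in(30, 10)
no_evidence = percent_spliced_in(0, 0)
```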

Crossref

eScholarship - University of California