Professional development and human resources management in networks
Social networks play a growing role in the development of people and organizations. Trust in institutions and trust in social networking differ, and both are based on referentiality on the Internet. When communicating in networks, people choose different strategies and behaviors on LinkedIn, whose resources may be of interest, to varying degrees, to Human Resources Management in organizations. Members of different social groups and cultures show some differences in how they interact with people of native Russian identity. There are gender differences in behavior in networks. Participation in groups requires ethical behavior and norms in social networking to support future professional development and communication.
Machine Learning and Rule Mining Techniques in the Study of Gene Inactivation and RNA Interference
RNA interference (RNAi) and gene inactivation are extensively used biological terms in biomedical research. Two categories of small ribonucleic acid (RNA) molecules, viz., microRNA (miRNA) and small interfering RNA (siRNA), are central to RNAi. Various kinds of algorithms have been developed in relation to RNAi and gene silencing. In this book chapter, we provide a comprehensive review of machine learning and association rule mining algorithms developed to handle different biological problems such as the detection of gene signatures, biomarkers, gene modules, potentially disordered proteins, differentially methylated regions and many more. We also provide a comparative study of well-known classifiers along with other methods in use. In addition, we present brief biological background on the major challenges of gene activation, together with their advantages, disadvantages and possible therapeutic strategies. Finally, our study helps bioinformaticians understand the overall picture across different research directions, including several learning algorithms, for the benefit of disease discovery.
Sparse Estimation of Huge Networks with a Block-Wise Structure
Networks with a very large number of nodes appear in many application areas and pose challenges to the traditional Gaussian graphical modelling approaches. In this paper we focus on the estimation of a Gaussian graphical model when the dependence between variables has a block-wise structure. We propose a penalised likelihood estimation of the inverse covariance matrix, also called Graphical LASSO, applied to block averages of observations, and derive its asymptotic properties. Monte Carlo experiments, comparing the properties of our estimator with those of the conventional Graphical LASSO, show that the proposed approach works well in the presence of block-wise dependence structure and is also robust to possible model misspecification. We conclude the paper with an empirical study on economic growth and convergence of 1,088 European small regions in the years 1980 to 2012. While requiring a-priori information on the block structure, for example given by the hierarchical structure of data, our approach can be adopted for estimation and prediction using very large panel data sets. Also, it is particularly useful when there is a problem of missing values and outliers or when the focus of the analysis is on out-of-sample prediction.
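The block-averaging step mentioned in this abstract can be illustrated with a minimal sketch. It covers only the preprocessing (averaging variables within known blocks and forming the sample covariance of those averages), not the penalised estimator itself. The function name `block_averages`, the toy data and the block labels are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def block_averages(X, blocks):
    """Average the columns of X within each block.

    X      : (n_obs, n_vars) data matrix
    blocks : length-n_vars array of block labels (assumed known a priori)
    Returns the (n_obs, n_blocks) matrix of block averages and the sorted labels.
    """
    X = np.asarray(X, dtype=float)
    blocks = np.asarray(blocks)
    labels = np.unique(blocks)
    # One column per block: the row-wise mean over the variables in that block.
    averages = np.column_stack([X[:, blocks == b].mean(axis=1) for b in labels])
    return averages, labels

# Toy example (hypothetical data): 6 variables grouped into 3 blocks.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
blocks = np.array([0, 0, 0, 1, 1, 2])
Z, labels = block_averages(X, blocks)
S = np.cov(Z, rowvar=False)  # sample covariance of the block averages
```

Under these assumptions, `S` is the kind of low-dimensional covariance matrix that a graphical lasso solver would take as input in place of the full covariance of all variables.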
Joint Estimation of Sparse Networks with application to Paired Gene Expression data
We consider a method to jointly estimate sparse precision matrices and their
underlying graph structures using dependent high-dimensional datasets. We
present a penalized maximum likelihood estimator which encourages both sparsity
and similarity in the estimated precision matrices where tuning parameters are
automatically selected by controlling the expected number of false positive
edges. We also incorporate an extra step to remove edges which represent an
overestimation of triangular motifs. We conduct a simulation study to show that
the proposed methodology presents consistent results for different combinations
of sample size and dimension. Then, we apply the suggested approaches to a
high-dimensional real case study of gene expression data with samples in two
medical conditions, healthy and colon cancer tissues, to estimate a common
network of genes as well as the differentially connected genes that are
important to the disease. We find denser graph structures for healthy samples
than for tumor samples, with groups of genes interacting together in the shape
of clusters.
Comment: 34 pages, 10 figures, 7 tables
Statistical methods for the testing and estimation of linear dependence structures on paired high-dimensional data: application to genomic data
This thesis provides novel methodology for statistical analysis of paired high-dimensional genomic
data, with the aim to identify gene interactions specific to each group of samples as well as the gene
connections that change between the two classes of observations. An example of such groups can
be patients under two medical conditions, in which the estimation of gene interaction networks is
relevant to biologists as part of discerning gene regulatory mechanisms that control a disease process
like, for instance, cancer. We construct these interaction networks from data by considering the non-zero
structure of correlation matrices, which measure linear dependence between random variables,
and their inverse matrices, which are commonly known as precision matrices and determine linear
conditional dependence instead. In this regard, we study three statistical problems related to the
testing, single estimation and joint estimation of (conditional) dependence structures.
Firstly, we develop hypothesis testing methods to assess the equality of two correlation matrices,
and also two correlation sub-matrices, corresponding to two classes of samples, and hence the equality
of the underlying gene interaction networks. We consider statistics based on the average of squares,
maximum and sum of exceedances of sample correlations, which are suitable for both independent
and paired observations. We derive the limiting distributions for the test statistics where possible
and, for practical needs, we present a permuted samples based approach to find their corresponding
non-parametric distributions.
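As an illustration of the permuted-samples idea, the following is a minimal sketch for two independent groups of samples. The statistic is assumed to be the average squared difference between the off-diagonal sample correlations of the two groups. The function names, the label-permutation scheme and the toy data are hypothetical and are not the ldstatsHD implementation; paired observations would need a different permutation scheme.

```python
import numpy as np

def avg_sq_corr_diff(X1, X2):
    """Average squared difference between off-diagonal sample correlations."""
    R1 = np.corrcoef(X1, rowvar=False)
    R2 = np.corrcoef(X2, rowvar=False)
    iu = np.triu_indices(R1.shape[0], k=1)  # upper triangle, diagonal excluded
    return np.mean((R1[iu] - R2[iu]) ** 2)

def permutation_pvalue(X1, X2, n_perm=199, seed=0):
    """Permutation p-value for H0: both groups share one correlation matrix.

    Group labels are shuffled across the pooled samples (independent groups assumed).
    """
    rng = np.random.default_rng(seed)
    observed = avg_sq_corr_diff(X1, X2)
    pooled = np.vstack([X1, X2])
    n1 = X1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = avg_sq_corr_diff(pooled[idx[:n1]], pooled[idx[n1:]])
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)

# Toy data: group A has independent columns, group B strongly correlated ones.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
shared = rng.normal(size=(200, 1))
B = shared + 0.3 * rng.normal(size=(200, 5))
p_value = permutation_pvalue(A, B)
```

A small `p_value` here would count as evidence against equality of the two correlation matrices. The maximum or sum-of-exceedances statistics could be swapped in by replacing `avg_sq_corr_diff`.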
Cases where such hypothesis testing presents enough evidence against the null hypothesis of
equality of two correlation matrices give rise to the problem of estimating two correlation (or precision)
matrices. However, before that we address the statistical problem of estimating conditional
dependence between random variables in a single class of samples when data are high-dimensional,
which is the second topic of the thesis. We study the graphical lasso method which employs an L1
penalized likelihood expression to estimate the precision matrix and its underlying non-zero graph
structure. The lasso penalization term is given by the L1 norm of the precision matrix elements scaled
by a regularization parameter, which determines the trade-off between sparsity of the graph and fit
to the data, and its selection is our main focus of investigation. We propose several procedures to
select the regularization parameter in the graphical lasso optimization problem that rely on network
characteristics such as clustering or connectivity of the graph.
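For reference, the graphical lasso problem described in this paragraph is commonly written as below. The notation is assumed here and is not taken from the thesis: S is the sample covariance matrix, Θ the precision matrix and λ the regularization parameter. Some formulations penalize only the off-diagonal elements.

```latex
\hat{\Theta}_{\lambda}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_{1} \right\},
\qquad
\lVert \Theta \rVert_{1} = \sum_{i,j} \lvert \theta_{ij} \rvert
```

Larger values of λ shrink more elements of Θ to zero and give a sparser graph, while smaller values fit the data more closely.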
Thirdly, we address the more general problem of estimating two precision matrices that are
expected to be similar, when datasets are dependent, focusing on the particular case of paired
observations. We propose a new method to estimate these precision matrices simultaneously, a
weighted fused graphical lasso estimator. The analogous joint estimation method concerning two
regression coefficient matrices, which we call weighted fused regression lasso, is also developed in
this thesis under the same paired and high-dimensional setting. The two joint estimators maximize
penalized marginal log likelihood functions, which encourage both sparsity and similarity in the
estimated matrices, and that are solved using an alternating direction method of multipliers (ADMM)
algorithm. Sparsity and similarity of the matrices are determined by two tuning parameters and we
propose to choose them by controlling the corresponding average error rates related to the expected
number of false positive edges in the estimated conditional dependence networks.
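To make the roles of the two tuning parameters concrete, the standard (unweighted) fused graphical lasso objective for two classes with equal sample sizes can be sketched as follows. This is the generic form and not the weighted estimator proposed in the thesis, whose weights and marginal likelihood are not specified in this abstract. The symbols S_k, Θ_k, λ_1 and λ_2 are notation assumed for illustration.

```latex
(\hat{\Theta}_{1}, \hat{\Theta}_{2})
  = \arg\max_{\Theta_{1}, \Theta_{2} \succ 0}
    \sum_{k=1}^{2} \left[ \log\det\Theta_{k} - \operatorname{tr}(S_{k}\Theta_{k}) \right]
    - \lambda_{1} \sum_{k=1}^{2} \lVert \Theta_{k} \rVert_{1}
    - \lambda_{2} \lVert \Theta_{1} - \Theta_{2} \rVert_{1}
```

In this form λ_1 controls the sparsity of each estimated precision matrix and λ_2 controls how similar the two matrices are forced to be.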
These testing and estimation methods are implemented within the R package ldstatsHD, and are
applied to a comprehensive range of simulated data sets as well as to high-dimensional real case
studies of genomic data. We employ testing approaches with the purpose of discovering pathway
lists of genes that present significantly different correlation matrices on healthy and unhealthy (e.g.,
tumor) samples. Besides, we use hypothesis testing problems on correlation sub-matrices to reduce
the number of genes for estimation. The proposed joint estimation methods are then considered to
find gene interactions that are common between medical conditions as well as interactions that vary
in the presence of unhealthy tissues.
Dynamic factorial graphical models for dynamic networks
Dynamic network models describe a growing number of important scientific processes, from cell biology and epidemiology to sociology and finance. Estimating dynamic networks from noisy time series data is a difficult task since the number of components involved in the system is very large. As a result, the number of parameters to be estimated is typically larger than the number of observations. However, a characteristic of many real life networks is that they are sparse. For example, the molecular structure of genes makes interactions with other components a highly-structured and, therefore, a sparse process. Penalized Gaussian graphical models have been used to estimate sparse networks. However, the literature has focussed on static networks, which lack specific temporal interpretations.
We propose a flexible collection of ANOVA-like dynamic network models, where the user can select specific time dynamics, known presence or absence of links and a particular autoregressive structure. We use undirected graphical models with block equality constraints on the parameters. This reduces the number of parameters, increases the accuracy of the estimates and makes interpretation of the results more relevant. We show that the constrained likelihood optimization problem can be solved by taking advantage of an efficient solver, LogdetPPA, developed in convex optimization. Model selection strategies can be used to select a particular model. We illustrate the flexibility of the method on both synthetic and real data.