224 research outputs found
Overlapping community detection in massive social networks
Massive social networks have become increasingly popular in recent years. Community detection is one of the most important techniques for the analysis of such complex networks. A community is a set of cohesive vertices that has more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this thesis, we propose scalable overlapping community detection algorithms that effectively identify high quality overlapping communities in various real-world networks.
We first develop an efficient overlapping community detection algorithm using a seed set expansion approach. The key idea of this algorithm is to find good seeds and then greedily expand these seeds using a personalized PageRank clustering scheme. Experimental results show that our algorithm significantly outperforms other state-of-the-art overlapping community detection methods in terms of run time, cohesiveness of communities, and ground-truth accuracy.
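The seed expansion step described above can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the well-known push-style approximation of personalized PageRank followed by a sweep cut that minimizes conductance. The function names, the parameters `alpha` and `eps`, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank from one seed (assumed variant)."""
    p = {}           # approximate PageRank mass per vertex
    r = {seed: 1.0}  # residual mass still to be distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if deg_u == 0 or ru < eps * deg_u:
            continue
        # Keep a fraction alpha at u and push the rest to its neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg_u
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            # Enqueue v only when its residual first crosses the threshold.
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p


def sweep_cut(adj, p):
    """Return the prefix of the degree-normalized ranking with lowest conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set = set()
    vol = cut = 0
    best, best_phi = [], float("inf")
    for i, u in enumerate(order):
        inside = sum(1 for v in adj[u] if v in in_set)
        in_set.add(u)
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * inside  # edges to the set stop being cut edges
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break
        phi = cut / denom
        if phi < best_phi:
            best_phi, best = phi, order[: i + 1]
    return set(best), best_phi


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community, phi = sweep_cut(adj, approximate_ppr(adj, seed=0))
```

Running the expansion from several seeds and keeping each resulting set is what allows communities to overlap; how good seeds are chosen is a separate step that this sketch does not cover.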
To develop more principled methods, we formulate the overlapping community detection problem as a non-exhaustive, overlapping graph clustering problem where clusters are allowed to overlap with each other, and some nodes are allowed to be outside of any cluster. To tackle this non-exhaustive, overlapping clustering problem, we propose a simple and intuitive objective function that captures the issues of overlap and non-exhaustiveness in a unified manner. To optimize the objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using a low-rank semidefinite programming technique. Our experimental results show that the new objective and the algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness.
We extend our non-exhaustive, overlapping clustering techniques to co-clustering where the goal is to simultaneously identify a clustering of the rows as well as the columns of a data matrix. As an example application, consider recommender systems where users have ratings on items. This can be represented by a bipartite graph where users and items are denoted by two different types of nodes, and the ratings are denoted by weighted edges between the users and the items. In this case, co-clustering would be a simultaneous clustering of users and items. We propose a new co-clustering objective function and an efficient co-clustering algorithm that is able to identify overlapping clusters as well as outliers on both types of the nodes in the bipartite graph. We show that our co-clustering algorithm is able to effectively capture the underlying co-clustering structure of the data, which results in boosting the performance of a standard one-dimensional clustering.
Finally, we study the design of parallel data-driven algorithms, which enables us to further increase the scalability of our overlapping community detection algorithms. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms achieve significantly better scalability than standard PageRank implementations. The design choices affect both single-threaded performance and parallel scalability. The lessons learned from this study not only guide efficient implementations of many graph mining algorithms but also provide a framework for designing new scalable algorithms, especially for large-scale community detection.
Computer Science
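To make "data-driven, push-based" concrete, here is a minimal single-threaded sketch under stated assumptions; it is not the implementation evaluated in the thesis. It assumes the unnormalized PageRank equation x = (1 - d) + d * sum over in-neighbors u of x_u / outdeg(u), keeps a residual per vertex, and only processes vertices on a worklist (data-driven activation), each of which scatters its residual to its out-neighbors (push access). The FIFO worklist stands in for the scheduling axis. The function name, parameters, and toy graph are hypothetical, and every vertex is assumed to have at least one out-edge and to appear as a key.

```python
from collections import deque


def push_pagerank(out_edges, damping=0.85, eps=1e-8):
    """Data-driven, push-based PageRank with a FIFO worklist (illustrative sketch)."""
    x = {v: 0.0 for v in out_edges}            # current PageRank estimate
    r = {v: 1.0 - damping for v in out_edges}  # residual not yet applied
    worklist = deque(out_edges)
    queued = set(out_edges)
    while worklist:
        v = worklist.popleft()
        queued.discard(v)
        rv = r[v]
        if rv < eps:
            continue
        # Absorb the residual at v, then push a damped share to each out-neighbor.
        x[v] += rv
        r[v] = 0.0
        share = damping * rv / len(out_edges[v])
        for w in out_edges[v]:
            r[w] += share
            # Activate w only if its residual is now large enough to matter.
            if r[w] >= eps and w not in queued:
                queued.add(w)
                worklist.append(w)
    return x


# Toy directed graph (hypothetical); vertex 3 has no in-edges.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ranks = push_pagerank(graph)
```

A topology-driven variant would instead sweep over every vertex in every iteration; the worklist version touches only vertices whose residual is still above `eps`, which is the property the abstract credits for better scalability.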
Optimal distributed energy resource coordination: a decomposition method based on distribution locational marginal costs
In this paper, we consider the day-ahead operational planning problem of a radial distribution network hosting Distributed Energy Resources (DERs) including rooftop solar and storage-like loads, such as electric vehicles. We present a novel decomposition method that is based on a centralized AC Optimal Power Flow (AC OPF) problem interacting iteratively with self-dispatching DER problems adapting to real and reactive power Distribution Locational Marginal Costs. We illustrate the applicability and tractability of the proposed method on an actual distribution feeder, while modeling the full complexity of spatiotemporal DER capabilities and preferences, and accounting for instances of non-exact AC OPF convex relaxations. We show that the proposed method achieves optimal Grid-DER coordination, by successively improving feasible AC OPF solutions, and discovers spatiotemporally varying marginal costs in distribution networks that are key to optimal DER scheduling by modeling losses, ampacity and voltage congestion, and, most importantly, dynamic asset degradation.
Accepted manuscript
Statistical models with covariance constraints
Imperial Users only
Model-based segmentation for improved activation detection in single-subject functional Magnetic Resonance Imaging studies
Functional Magnetic Resonance Imaging (fMRI) maps cerebral activation in
response to stimuli but this activation is often difficult to detect,
especially in low-signal contexts and single-subject studies. Accurate
activation detection can be guided by the fact that very few voxels are, in
reality, truly activated and that these voxels are spatially localized, but it
is challenging to incorporate both these facts. We address these twin
challenges to single-subject and low-signal fMRI by developing a
computationally feasible and methodologically sound model-based approach,
implemented in the R package MixfMRI, that bounds the a priori expected
proportion of activated voxels while also incorporating spatial context. An
added benefit of our methodology is the ability to distinguish voxels and
regions having different intensities of activation. Our suggested approach is
evaluated in realistic two- and three-dimensional simulation experiments as
well as on multiple datasets. Finally, the value of our suggested approach in
low-signal and single-subject fMRI studies is illustrated on a sports
imagination experiment that is often used to detect awareness and improve
treatment in patients in persistent vegetative state (PVS). Our ability to
reliably distinguish activation in this experiment potentially opens the door
to the adoption of fMRI as a clinical tool for the improved treatment and
therapy of PVS survivors and other patients.
Comment: 20 pages, 9 figures, 1 table
High-performance Global Routing for Trillion-gate Systems-on-Chips.
Due to aggressive transistor scaling, modern-day CMOS circuits have continually increased in both complexity and productivity. Modern semiconductor designs have narrower and more resistive wires, thereby shifting the performance bottleneck to interconnect delay. These trends considerably impact timing closure and call for improvements in high-performance physical design tools to keep pace with the current state of IC innovation.
As leading-edge designs may incorporate tens of millions of gates, algorithm and software scalability are crucial to achieving reasonable turnaround time. Moreover, with decreasing device sizes, optimizing traditional objectives is no longer sufficient.
Our research focuses on (i) expanding the capabilities of standalone global routing, (ii) extending global routing for use in different design applications, and (iii) integrating routing within broader physical design optimizations and flows, e.g., congestion-driven placement. Our first global router relies on integer-linear programming (ILP), and can solve fairly large problem instances to optimality. Our second iterative global router relies on Lagrangian relaxation, where we relax the routing violation constraints to allow routing overflow at a penalty. In both approaches, our desire is to give the router the maximum degree of freedom within a specified context. Empirically, both routers produce competitive results within a reasonable amount of runtime. To improve routability, we explore the incorporation of routing with placement, where the router estimates congestion and feeds this information to the placer. In turn, the emphasis on runtime is heightened, as the router will be invoked multiple times. Empirically, our placement-and-route framework significantly improves the final solution's routability compared with performing the steps sequentially. To further enhance routability-driven placement, we (i) leverage incrementality to generate fast and accurate congestion maps, and (ii) develop several techniques to relieve cell-based and layout-based congestion. To broaden the scope of routing, we integrate a global router in a chip-design flow that addresses the buffer explosion problem.
PHD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/98025/1/jinhu_1.pd
Statistical integrative omics methods for disease subtype discovery
Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration becomes essential to improve statistical power and reproducibility. In this dissertation, the sparse K-means method will be extended in two directions.
The first extension is a meta-analytic framework to identify novel disease subtypes when expression profiles from multiple cohorts are available. The lasso regularization and meta-analysis can identify a unique set of gene features for subtype characterization. By adding a pattern matching reward function, consistency of subtype signatures across studies can be achieved.
The second extension integrates multi-level omics datasets by incorporating prior biological knowledge using a sparse overlapping group lasso approach. An algorithm using the alternating direction method of multipliers (ADMM) will be applied for fast optimization.
For both topics, simulations and real applications in breast cancer and leukemia will show the superior clustering accuracy, feature selection and functional annotation. These methods will improve the statistical power, prediction accuracy and reproducibility of disease subtype discovery analysis.
Contribution to public health: The proposed methods are able to identify disease subtypes from complex multi-level or multi-cohort omics data. Disease subtype definition is essential to deliver personalized medicine, since treating each subtype with its most appropriate medicine will achieve the most effective treatment effect and eliminate side effects. Omics data itself can provide a better definition of disease subtypes than regular pathological approaches. By using multi-level or multi-cohort omics data, we are able to gain statistical power and reproducibility, and the resulting subtype definition is much more reliable, convincing and reproducible than single study analysis.
Algorithms for Inferring Multiple Microbial Networks
The interactions among the constituent members of a microbial community play a major role in determining the overall behavior of the community and the abundance levels of its members. These interactions can be modeled using a network whose nodes represent microbial taxa and edges represent pairwise interactions. A microbial network is a weighted graph that is constructed from a sample-taxa count matrix and can be used to model co-occurrences and/or interactions of the constituent members of a microbial community. The nodes in this graph represent microbial taxa and the edges represent pairwise associations amongst these taxa. A microbial network is typically constructed from a sample-taxa count matrix that is obtained by sequencing multiple biological samples and identifying taxa counts. From large-scale microbiome studies, it is evident that microbial community compositions and interactions are impacted by environmental and/or host factors. Thus, it is not unreasonable to expect that a sample-taxa matrix generated as part of a large study involving multiple environmental or clinical parameters can be associated with more than one microbial network. However, to our knowledge, microbial network inference methods proposed thus far assume that the sample-taxa matrix is associated with a single network. This dissertation addresses the scenario when the sample-taxa matrix is associated with K microbial networks and considers the computational problem of inferring K microbial networks from a given sample-taxa matrix. The contributions of this dissertation include 1) new frameworks to generate synthetic sample-taxa count data; 2) novel methods to combine mixture modeling with probabilistic graphical models to infer multiple interaction/association networks from microbial count data; 3) dealing with the compositionality aspect of microbial count data; 4) extensive experiments on real and synthetic data; 5) new methods for model selection to infer the correct value of K.