3,010 research outputs found

    SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine

    Full text link
    Traditional medicine typically applies one-size-fits-all treatment for the entire patient population whereas precision medicine develops tailored treatment schemes for different patient subgroups. The fact that some factors may be more significant for a specific patient subgroup motivates clinicians and medical researchers to develop new approaches to subgroup detection and analysis, which is an effective strategy to personalize treatment. In this study, we propose a novel patient subgroup detection method, called Supervised Biclustring (SUBIC) using convex optimization and apply our approach to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-American). Our approach not only finds patient subgroups with guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variable

    Different approaches to community detection

    Full text link
    A precise definition of what constitutes a community in networks has remained elusive. Consequently, network scientists have compared community detection algorithms on benchmark networks with a particular form of community structure and classified them based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different reasons for why we would want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different approaches to community detection also delineates the many lines of research and points out open directions and avenues for future research.Comment: 14 pages, 2 figures. Written as a chapter for forthcoming Advances in network clustering and blockmodeling, and based on an extended version of The many facets of community detection in complex networks, Appl. Netw. Sci. 2: 4 (2017) by the same author

    Unsupervised Algorithms for Microarray Sample Stratification

    Get PDF
    The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery.Peer reviewe

    Characterization of complex networks: A survey of measurements

    Full text link
    Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements. Important related issues covered in this work comprise the representation of the evolution of complex networks in terms of trajectories in several measurement spaces, the analysis of the correlations between some of the most traditional measurements, perturbation analysis, as well as the use of multivariate statistics for feature selection and network classification. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the proper application and interpretation of measurements.Comment: A working manuscript with 78 pages, 32 figures. Suggestions of measurements for inclusion are welcomed by the author

    The stability of a graph partition: A dynamics-based framework for community detection

    Full text link
    Recent years have seen a surge of interest in the analysis of complex networks, facilitated by the availability of relational data and the increasingly powerful computational resources that can be employed for their analysis. Naturally, the study of real-world systems leads to highly complex networks and a current challenge is to extract intelligible, simplified descriptions from the network in terms of relevant subgraphs, which can provide insight into the structure and function of the overall system. Sparked by seminal work by Newman and Girvan, an interesting line of research has been devoted to investigating modular community structure in networks, revitalising the classic problem of graph partitioning. However, modular or community structure in networks has notoriously evaded rigorous definition. The most accepted notion of community is perhaps that of a group of elements which exhibit a stronger level of interaction within themselves than with the elements outside the community. This concept has resulted in a plethora of computational methods and heuristics for community detection. Nevertheless a firm theoretical understanding of most of these methods, in terms of how they operate and what they are supposed to detect, is still lacking to date. Here, we will develop a dynamical perspective towards community detection enabling us to define a measure named the stability of a graph partition. It will be shown that a number of previously ad-hoc defined heuristics for community detection can be seen as particular cases of our method providing us with a dynamic reinterpretation of those measures. Our dynamics-based approach thus serves as a unifying framework to gain a deeper understanding of different aspects and problems associated with community detection and allows us to propose new dynamically-inspired criteria for community structure.Comment: 3 figures; published as book chapte

    Using Feature Selection with Machine Learning for Generation of Insurance Insights

    Get PDF
    Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector via the selection of relevant features. The experimental results, based on five publicly available real insurance datasets, show the importance of applying feature selection for the removal of noisy features before performing machine learning techniques, to allow the algorithm to focus on influential features. An additional business benefit is the revelation of the most and least important features in the datasets. These insights can prove useful for decision making and strategy development in areas/business problems that are not limited to the direct target of the downstream algorithms. In our experiments, machine learning techniques based on a set of selected features suggested by feature selection algorithms outperformed the full feature set for a set of real insurance datasets. Specifically, 20% and 50% of features in our five datasets had improved downstream clustering and classification performance when compared to whole datasets. This indicates the potential for feature selection in the insurance sector to both improve model performance and to highlight influential features for business insights
    • …
    corecore