
    Block Belief Propagation for Parameter Learning in Markov Random Fields

    Traditional learning methods for training Markov random fields require running inference over all variables to compute the likelihood gradient, so the iteration complexity of these methods scales with the size of the graphical model. In this paper, we propose \emph{block belief propagation learning} (BBPL), which uses block-coordinate updates of approximate marginals to compute approximate gradients, removing the need to run inference on the entire graphical model. The iteration complexity of BBPL therefore does not scale with the size of the graph. We prove that, despite these approximations, the method converges to the same solution as that obtained with full inference per iteration, and we empirically demonstrate its scalability improvements over standard training methods.
    Comment: Accepted to AAAI 201
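    As a hedged illustration of the idea, the sketch below learns a single shared edge weight of a chain Ising MRF. The chain, block size, learning rate, and closed-form agreement marginal (exact on a chain without unary potentials, standing in for a belief propagation result) are all assumptions for the demo, not the authors' code; per iteration only one block of marginals is refreshed, yet the stale-gradient updates still reach the full-inference solution.

```python
# Hedged sketch of a BBPL-style training loop (demo assumptions, not the
# paper's code): chain Ising MRF, one shared edge weight, no unary
# potentials, so each edge's agreement marginal is exp(t)/(exp(t)+1).
import numpy as np

n_edges = 60                    # edges of a chain MRF
block_size = 10
blocks = [range(s, min(s + block_size, n_edges))
          for s in range(0, n_edges, block_size)]

true_theta = 1.2                # parameter that generated the "data"
theta = 0.0                     # shared edge weight being learned

def edge_agreement(t):
    """P(x_i == x_j) for one edge of the chain with edge weight t."""
    return np.exp(t) / (np.exp(t) + 1.0)

data_stat = edge_agreement(true_theta)   # expected data statistic
agree = np.full(n_edges, 0.5)            # stored (possibly stale) marginals

for it in range(300):
    # Refresh marginals on ONE block per iteration; all other blocks keep
    # marginals computed under older values of theta (BBPL's staleness).
    for e in blocks[it % len(blocks)]:
        agree[e] = edge_agreement(theta)
    grad = data_stat - agree.mean()      # approximate likelihood gradient
    theta += 0.5 * grad

print(f"learned theta = {theta:.3f} (true {true_theta})")
```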

    Memory Optimizations for High-Throughput Computer Systems

    The emergence of new non-volatile memory (NVM) technologies and deep neural network (DNN) inference brings challenges related to off-chip memory access. Ensuring crash consistency requires additional memory operations and exposes memory update operations on the critical execution path, while DNN inference on some accelerators suffers from intensive off-chip memory access. The focus of this dissertation is to tackle these off-chip memory issues in high-performance computing systems. The logging operations required for crash consistency impose a significant performance overhead due to the extra memory accesses. To reduce the persistence time of log requests, we introduce a load-aware log-entry allocation scheme that allocates each log request to an address whose bank has the lightest workload. To address the problem of intra-record ordering, we propose buffering log metadata in a non-volatile ADR buffer until the corresponding log can be removed. Moreover, the recently proposed LAD introduces unnecessary logging operations on multicore CPUs; to reduce these, we devise two-stage transaction execution and virtual ADR buffers. DNN inference, with its low-response-time requirements and high computational intensity, is often executed on customized accelerators. However, loading data from off-chip memory typically takes longer than computing, which reduces performance in some scenarios, especially on edge devices. To address this issue, we propose an optimization of the widely adopted Weight Stationary dataflow that removes redundant accesses to the IFMAP in off-chip memory by reordering the loops of the standard convolution operation. Furthermore, to enhance off-chip memory throughput, we introduce a load-aware placement of data tiles on off-chip memory that reduces intra- and inter-tile contention caused by concurrent accesses and improves device parallelism during access.
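    As a hedged toy of the load-aware allocation policy named above (for log entries; the abstract applies the same idea to data tiles), the sketch below greedily sends each request to the bank with the least pending work. The bank count, unit service time, and heap bookkeeping are demo assumptions, not the dissertation's design details.

```python
# Hedged sketch of "lightest workload" allocation: each log request goes
# to the NVM bank with the least pending work (demo assumptions only).
import heapq

NUM_BANKS = 8

def allocate_logs(log_requests, service_time=1.0):
    """Assign each log request to the bank with the least pending work,
    tracked as a min-heap of (pending_work, bank_id) pairs."""
    banks = [(0.0, b) for b in range(NUM_BANKS)]
    heapq.heapify(banks)
    placement = []
    for req in log_requests:
        load, bank = heapq.heappop(banks)          # lightest-loaded bank
        placement.append((req, bank))
        heapq.heappush(banks, (load + service_time, bank))
    return placement

for req, bank in allocate_logs([f"log{i}" for i in range(12)]):
    print(req, "-> bank", bank)
```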

    Relation Structure-Aware Heterogeneous Information Network Embedding

    Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider the heterogeneous relations in HINs, they usually employ a single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of the relations, RHINE employs different models specifically tailored to ARs and IRs, which better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
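    A hedged sketch of what the two relation-specific models could look like: a Euclidean-distance score for ARs (affiliated nodes sit close together) and a translation-based score for IRs (h + r lands near t), each trained with a margin ranking loss. The model-to-category pairing is inferred from the abstract, and the dimensions, margin, and sampling are illustrative.

```python
# Hedged sketch of relation-specific scoring for ARs vs. IRs
# (illustrative assumptions, not RHINE's exact formulation).
import numpy as np

dim = 16
rng = np.random.default_rng(1)

def ar_score(h, t):
    # Affiliation Relation: squared Euclidean distance between nodes
    return np.sum((h - t) ** 2)

def ir_score(h, r, t):
    # Interaction Relation: TransE-style translation distance
    return np.sum((h + r - t) ** 2)

def margin_loss(pos, neg, margin=1.0):
    # margin-based ranking loss shared by both components
    return max(0.0, margin + pos - neg)

h, t, t_neg = (rng.normal(size=dim) for _ in range(3))
r = rng.normal(size=dim)
print("AR loss:", margin_loss(ar_score(h, t), ar_score(h, t_neg)))
print("IR loss:", margin_loss(ir_score(h, r, t), ir_score(h, r, t_neg)))
```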

    An Ontology-Based Artificial Intelligence Model for Medicine Side-Effect Prediction: Taking Traditional Chinese Medicine as An Example

    In this work, an ontology-based model for AI-assisted medicine side-effect (SE) prediction is developed, and its three main components, the drug model, the treatment model, and the AI-assisted prediction model, are presented. To validate the proposed model, an ANN structure is established and trained on 242 TCM prescriptions. These data are gathered and classified from the most famous ancient TCM book and from more than one thousand SE reports, and two ontology-based attributes, hot and cold, are introduced to evaluate whether a prescription will cause an SE. The results preliminarily reveal a relationship between the ontology-based attributes and the corresponding predicted indicator that can be learned by the ANN, suggesting that the proposed model has potential for AI-assisted SE prediction. It should be noted, however, that the proposed model depends heavily on sufficient clinical data, so deeper exploration is needed to improve the accuracy of the prediction.
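    A minimal sketch of the kind of prediction component the abstract describes: a small feed-forward network mapping ontology-derived hot/cold attribute scores of a prescription to an SE indicator. The synthetic data, labeling rule, and network size below are invented for the demo; only the input/output roles come from the abstract.

```python
# Hedged sketch: tiny ANN from [hot_score, cold_score] to an SE indicator.
# Data and labels are synthetic stand-ins, not the paper's TCM dataset.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 242 prescriptions: [hot_score, cold_score]
X = rng.uniform(0.0, 1.0, size=(242, 2))
# Assumed rule for demo only: a large hot/cold imbalance causes an SE
y = (np.abs(X[:, 0] - X[:, 1]) > 0.5).astype(float)

# One hidden layer of 8 tanh units, sigmoid output
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(3000):
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()         # predicted SE probability
    g = (p - y)[:, None] / len(y)            # cross-entropy output gradient
    gh = (g @ W2.T) * (1.0 - h ** 2)         # backprop through tanh
    W2 -= lr * h.T @ g;  b2 -= lr * g.sum(0)
    W1 -= lr * X.T @ gh; b1 -= lr * gh.sum(0)

print("training accuracy:", ((p > 0.5) == y).mean())
```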

    Large Data Approaches to Thresholding Problems

    Statistical models with discontinuities see wide use in practical fields such as statistical process control, gene data processing, and econometrics. The study of such models is usually concerned with locating these discontinuities, which causes methodological issues because estimation requires solving nonstandard optimization problems. With the contemporary increase in computing power and memory, it becomes more relevant to view these problems in the context of very large datasets, a context which introduces further complications for estimation. In this thesis, we study two major topics in threshold estimation, with models, methodology, and results motivated by the concern of handling big data. Our first topic focuses on the change point problem, which involves detecting the locations where a change in distribution occurs within a data sequence. A variety of methods have been proposed and studied in this area, with novel approaches for the case where the number of change points is unknown and can exceed one, making exhaustive search methods infeasible. Our contribution to this problem is motivated by the principle that only the data points close to the change points are useful for their estimation, while other points are extraneous. From this observation we propose a zoom-in estimation method which efficiently subsamples the data for estimation without compromising accuracy. The resulting method runs in sublinear time, while existing methods all run in linear time or above. Furthermore, the nature of this new methodology allows us to characterize the asymptotic distribution even when the number of change point parameters increases without bound, a type of result not found elsewhere in this field. The second topic concerns the change plane model, which involves a real-valued signal over a multidimensional space with a discontinuity delineated by a hyperplane. In practice, the change plane model is used to combine regression between covariates and a response variable with unsupervised classification of the covariates. As change-plane models in growing dimensions have not been studied in the literature, we confine ourselves to canonical models in this dissertation as a first approach to these problems. In terms of details, we establish fundamental convergence and support selection properties (the latter for the high-dimensional case) and present some simulation results.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/153384/1/jlnlu_1.pd
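    A hedged sketch of the zoom-in principle for the one-change-point case (a CUSUM toy, not the thesis's estimator): locate the change on a coarse subsample of about sqrt(n) points, then re-estimate on a small window around the coarse guess, so the refinement never scans the full sequence.

```python
# Hedged zoom-in demo: coarse subsample -> local window refinement.
# The CUSUM statistic and window width are illustrative assumptions.
import numpy as np

def cusum_argmax(x):
    """Split index maximizing the standard CUSUM statistic for one mean shift."""
    n = len(x)
    ks = np.arange(1, n)
    prefix = np.cumsum(x)[:-1]
    left = prefix / ks
    right = (x.sum() - prefix) / (n - ks)
    stat = np.sqrt(ks * (n - ks) / n) * np.abs(left - right)
    return int(np.argmax(stat)) + 1          # size of the left segment

def zoom_in_estimate(x):
    n = len(x)
    step = max(1, int(np.sqrt(n)))
    coarse = np.arange(0, n, step)           # O(sqrt(n)) subsample
    k_coarse = coarse[cusum_argmax(x[coarse])]
    lo = max(0, k_coarse - 3 * step)         # zoom into a local window
    hi = min(n, k_coarse + 3 * step)
    return lo + cusum_argmax(x[lo:hi])

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 4000), rng.normal(1, 1, 6000)])
print("estimated change point:", zoom_in_estimate(x), "(true: 4000)")
```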

    Coordinate Descent Full Configuration Interaction for Excited States

    An efficient excited-state method in the configuration interaction framework, named xCDFCI, is proposed. xCDFCI extends the unconstrained nonconvex optimization problem in CDFCI to a multicolumn version for computing low-lying excited states. The optimization problem is addressed via a tailored coordinate descent method: in each iteration, a determinant is selected based on an approximate gradient, and the coefficients of all states associated with the selected determinant are updated. A deterministic compression is applied to limit memory usage. We test xCDFCI on H2O and N2 molecules under the cc-pVDZ basis set; for both systems, five low-lying excited states in the same symmetry sector are calculated together with the ground state. xCDFCI also produces accurate binding curves of the carbon dimer in the cc-pVDZ basis to within 10^{-2} mHa, where the ground state and four excited states in the same symmetry sector are benchmarked.
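    A toy-scale sketch of the multicolumn coordinate-descent pattern under loudly stated assumptions: a dense matrix with a known spectrum replaces the determinant space, the unconstrained objective is taken to be f(C) = ||A + CC^T||_F^2 with a shifted A = H - sigma*I (a CDFCI-style formulation assumed here, not taken from the paper), the full gradient is recomputed each step, and no compression is applied. Each iteration greedily picks one "determinant" (row) by gradient norm and updates the coefficients of all k states in that row; a Rayleigh-Ritz step on span(C) then reads off the low-lying eigenvalues.

```python
# Hedged toy of multicolumn coordinate descent for low-lying eigenpairs.
# All sizes, the shift, and the objective are demo assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                  # "determinants", states
lowest = np.array([-2.0, -1.5, -1.0])         # targeted eigenvalues
evals = np.concatenate([lowest, rng.uniform(0.5, 3.0, n - k)])
Q0, _ = np.linalg.qr(rng.normal(size=(n, n)))
H = (Q0 * evals) @ Q0.T                       # toy "Hamiltonian"

A = H - 0.25 * np.eye(n)                      # shift: targeted eigvals < 0
C = 0.01 * rng.normal(size=(n, k))            # coefficient columns
lr = 0.05 / np.linalg.norm(A, 2)

for _ in range(15000):
    G = 4.0 * (A + C @ C.T) @ C               # gradient of ||A + CC^T||_F^2
    i = int(np.argmax(np.linalg.norm(G, axis=1)))  # pick one "determinant"
    C[i] -= lr * G[i]                         # update all k states' coeffs

Qc, _ = np.linalg.qr(C)                       # Rayleigh-Ritz readout
print("estimated:", np.round(np.linalg.eigvalsh(Qc.T @ H @ Qc), 3))
print("target:   ", lowest)
```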