8,173 research outputs found

    Measurements over distributed high performance computing and storage systems

    Get PDF
    A strawman proposal is given for a framework for presenting a common set of metrics for supercomputers, workstations, file servers, mass storage systems, and the networks that interconnect them. Production control and database systems are also included. Though other applications and third part software systems are not addressed, it is important to measure them as well

    Exact Geosedics and Shortest Paths on Polyhedral Surface

    Full text link
    We present two algorithms for computing distances along a non-convex polyhedral surface. The first algorithm computes exact minimal-geodesic distances and the second algorithm combines these distances to compute exact shortest-path distances along the surface. Both algorithms have been extended to compute the exact minimalgeodesic paths and shortest paths. These algorithms have been implemented and validated on surfaces for which the correct solutions are known, in order to verify the accuracy and to measure the run-time performance, which is cubic or less for each algorithm. The exact-distance computations carried out by these algorithms are feasible for large-scale surfaces containing tens of thousands of vertices, and are a necessary component of near-isometric surface flattening methods that accurately transform curved manifolds into flat representations.National Institute for Biomedical Imaging and Bioengineering (R01 EB001550

    Birth/birth-death processes and their computable transition probabilities with biological applications

    Full text link
    Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth(death)/birth-death process, a tractable bivariate extension of the birth-death process. We develop an efficient and robust algorithm to calculate the transition probabilities of birth(death)/birth-death processes using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Get PDF
    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.Comment: 7 page

    Hybrid model for vascular tree structures

    Get PDF
    This paper proposes a new representation scheme of the cerebral blood vessels. This model provides information on the semantics of the vascular structure: the topological relationships between vessels and the labeling of vascular accidents such as aneurysms and stenoses. In addition, the model keeps information of the inner surface geometry as well as of the vascular map volume properties, i.e. the tissue density, the blood flow velocity and the vessel wall elasticity. The model can be constructed automatically in a pre-process from a set of segmented MRA images. Its memory requirements are optimized on the basis of the sparseness of the vascular structure. It allows fast queries and efficient traversals and navigations. The visualizations of the vessel surface can be performed at different levels of detail. The direct rendering of the volume is fast because the model provides a natural way to skip over empty data. The paper analyzes the memory requirements of the model along with the costs of the most important operations on it.Postprint (published version

    Prediction of Emerging Technologies Based on Analysis of the U.S. Patent Citation Network

    Full text link
    The network of patents connected by citations is an evolving graph, which provides a representation of the innovation process. A patent citing another implies that the cited patent reflects a piece of previously existing knowledge that the citing patent builds upon. A methodology presented here (i) identifies actual clusters of patents: i.e. technological branches, and (ii) gives predictions about the temporal changes of the structure of the clusters. A predictor, called the {citation vector}, is defined for characterizing technological development to show how a patent cited by other patents belongs to various industrial fields. The clustering technique adopted is able to detect the new emerging recombinations, and predicts emerging new technology clusters. The predictive ability of our new method is illustrated on the example of USPTO subcategory 11, Agriculture, Food, Textiles. A cluster of patents is determined based on citation data up to 1991, which shows significant overlap of the class 442 formed at the beginning of 1997. These new tools of predictive analytics could support policy decision making processes in science and technology, and help formulate recommendations for action
    corecore