254 research outputs found

    A polynomial time algorithm for computing the area under a GDT curve

    Get PDF
    Background Progress in the field of protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function, which maps each distance cutoff θ to the number of atoms in the protein model that can be fit within distance θ of the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of model quality. Unfortunately, while the well-known GDT_TS metric provides a crude approximation of GDT_A, no algorithm currently exists that is capable of computing accurate estimates of GDT_A. Methods We prove that GDT_A is well defined and that it can be approximated by Riemann sums, using available methods for computing accurate (near-optimal) GDT function values. Results In contrast to the GDT_TS metric, GDT_A is neither insensitive to large nor oversensitive to small changes in a model's coordinates. Moreover, the problem of computing GDT_A is tractable. More specifically, GDT_A can be computed in cubic asymptotic time in the size of the protein model. Conclusions This paper presents the first algorithm capable of computing near-optimal estimates of the area under the GDT function for a protein model. We believe that the techniques implemented in our algorithm will pave the way for the development of more practical and reliable procedures for estimating 3D model quality.
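    As a rough illustration of the idea, the sketch below approximates the area under a GDT curve with a Riemann sum over a grid of cutoffs. It is a simplification, not the paper's algorithm: it evaluates the GDT fraction under a single fixed superposition, whereas the true GDT value maximizes the fit count over rigid-body superpositions; the function names and the 10 Å upper cutoff are illustrative assumptions.

```python
import numpy as np

def gdt_fraction(model_xyz, ref_xyz, cutoff):
    """Fraction of model atoms within `cutoff` (Å) of their reference positions.
    Simplification: uses one fixed superposition instead of searching for the
    superposition that maximizes the count."""
    dists = np.linalg.norm(model_xyz - ref_xyz, axis=1)
    return float(np.mean(dists <= cutoff))

def gdt_area(model_xyz, ref_xyz, theta_max=10.0, n_steps=200):
    """Left Riemann-sum estimate of the (normalized) area under the GDT curve."""
    dtheta = theta_max / n_steps
    cutoffs = np.arange(n_steps) * dtheta
    values = [gdt_fraction(model_xyz, ref_xyz, c) for c in cutoffs]
    return sum(values) * dtheta / theta_max
```

    For comparison, GDT_TS averages the GDT fraction at just four cutoffs (1, 2, 4 and 8 Å), which is why it only crudely tracks the full area.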

    Digital twinning of existing reinforced concrete bridges from labelled point clusters

    Get PDF
    The automation of digital twinning for existing reinforced concrete bridges from point clouds remains an unresolved problem. Whilst current methods can automatically detect bridge objects in point clouds in the form of labelled point clusters, the fitting of accurate 3D shapes to point clusters remains largely dependent on humans: 95% of the total manual modelling time is spent on customizing shapes and fitting them correctly. The challenges exhibited in the fitting step are due to the irregular geometries of existing bridges. Existing methods can fit geometric primitives such as cuboids and cylinders to point clusters, assuming bridges are composed of generic shapes. However, the resulting geometric digital twins are too idealized to depict the real geometry of bridges. In addition, none of the existing methods have explicitly demonstrated how to evaluate the resulting Industry Foundation Classes bridge data models in terms of spatial accuracy using quantitative measurements. In this article, we tackle these challenges by delivering a slicing-based object fitting method that can generate the geometric digital twin of an existing reinforced concrete bridge from four types of labelled point cluster. The quality of the generated models is gauged using cloud-to-cloud distance-based metrics. Experiments on ten bridge point cloud datasets indicate that the method achieves an average modelling distance of 7.05 cm (while the manual method achieves 7.69 cm) and an average modelling time of 37.8 seconds. This is a substantial improvement over the current practice of manual digital twinning.
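    A hedged sketch of the kind of cloud-to-cloud metric the abstract refers to: for every point in a reference scan, take the distance to its nearest neighbour in the generated model cloud and average. The function name and the synthetic data below are illustrative assumptions, not the paper's exact evaluation pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_distance(reference_pts, model_pts):
    """Average nearest-neighbour distance from each reference point to the
    generated (twin) point cloud, in the same units as the input coordinates."""
    tree = cKDTree(model_pts)
    dists, _ = tree.query(reference_pts, k=1)
    return dists.mean()

# Example with two synthetic clouds (coordinates in metres)
ref = np.random.rand(10000, 3)
model = ref + np.random.normal(scale=0.05, size=ref.shape)  # ~5 cm perturbation
print(f"C2C distance: {cloud_to_cloud_distance(ref, model) * 100:.1f} cm")
```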

    Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery

    Get PDF
    Recent advances in biomedical data collection allow massive datasets to be gathered, measuring thousands of features in thousands to millions of individual cells. These data have the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods for understanding data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, much work remains to make them useful for discovery in data whose supervision is harder to represent. The flexibility and expressiveness of neural networks is sometimes a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is more common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models to understand this data. Encoding geometric priors into neural network and graph models allows us to characterize the models' solutions as they relate to the fields of graph signal processing and optimal transport. These links allow us to understand and interpret this data type. We divide this work into three sections. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate. For this we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works utilize our prior understanding of the data geometry to create more useful models of the data. We apply these methods to molecular graphs, images, single-cell sequencing, and health record data.
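    To make the last idea concrete, here is a minimal, hedged sketch of comparing two distributions over a shared graph by diffusing both and accumulating their differences across scales, in the spirit of diffusion-based approximations of optimal transport. The random-walk operator, the dyadic scales, and the scale weighting are illustrative choices, not the thesis's actual construction.

```python
import numpy as np

def diffusion_distance(W, p, q, scales=(1, 2, 4, 8)):
    """Compare two node distributions p, q on a graph with adjacency W by
    diffusing both with a random-walk operator and summing their L1
    differences across several diffusion scales."""
    d = W.sum(axis=1)
    P = W / d[:, None]                    # row-stochastic random-walk operator
    dist, steps_done = 0.0, 0
    p_t, q_t = p.astype(float), q.astype(float)
    for t in scales:
        for _ in range(t - steps_done):   # diffuse up to scale t
            p_t = p_t @ P
            q_t = q_t @ P
        steps_done = t
        # heuristic weighting: mass still unmatched at coarser scales is costlier
        dist += t * np.abs(p_t - q_t).sum()
    return dist
```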

    Experiment Planning for Protein Structure Elucidation and Site-Directed Protein Recombination

    Get PDF
    In order to most effectively investigate protein structure and improve protein function, it is necessary to carefully plan appropriate experiments. The combinatorial number of possible experiment plans demands effective criteria and efficient algorithms to choose the one that is in some sense optimal. This thesis addresses experiment planning challenges in two significant applications. The first part of this thesis develops an integrated computational-experimental approach for rapid discrimination of predicted protein structure models by quantifying their consistency with relatively cheap and easy experiments (cross-linking and site-directed mutagenesis followed by stability measurement). In order to obtain the most information from noisy and sparse experimental data, rigorous Bayesian frameworks have been developed to analyze the information content. Efficient algorithms have been developed to choose the most informative, least expensive, and most robust experiments. The effectiveness of this approach has been demonstrated using existing experimental data as well as simulations, and it has been applied to discriminate predicted structure models of the pTfa chaperone protein from bacteriophage lambda. The second part of this thesis seeks to choose optimal breakpoint locations for protein engineering by site-directed recombination. In order to increase the possibility of obtaining folded and functional hybrids in protein recombination, it is necessary to retain the evolutionary relationships among amino acids that determine protein stability and functionality. A probabilistic hypergraph model has been developed to model these relationships, with edge weights representing their statistical significance as derived from a database and a protein family. The effectiveness of this model has been validated by showing its ability to distinguish functional hybrids from non-functional ones in existing experimental data. Choosing the optimal breakpoint locations that minimize the total perturbation to these relationships has been proved to be NP-hard in general, but exact and approximate algorithms have been developed for a number of important cases.
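    As a toy illustration of the breakpoint-selection problem, the following sketch reduces the hypergraph to ordinary pairwise interaction weights and exhaustively scores every set of k breakpoints by the total weight of residue pairs that end up in different fragments. The pairwise reduction, the weights, and the brute-force search are illustrative assumptions suitable only for small cases, not the thesis's algorithms.

```python
import itertools
import numpy as np

def perturbation(weights, breakpoints):
    """Total weight of residue pairs (i, j) separated by at least one breakpoint,
    i.e. assigned to different recombination fragments. `weights[i, j]` is a
    symmetric pairwise interaction weight (a simplification of the hypergraph)."""
    n = weights.shape[0]
    fragment = np.zeros(n, dtype=int)
    for b in sorted(breakpoints):          # breakpoint placed after residue b
        fragment[b + 1:] += 1
    i, j = np.triu_indices(n, k=1)
    return weights[i, j][fragment[i] != fragment[j]].sum()

def best_breakpoints(weights, k):
    """Exhaustive search over all k-breakpoint sets (tractable for small n only)."""
    n = weights.shape[0]
    return min(itertools.combinations(range(n - 1), k),
               key=lambda bps: perturbation(weights, bps))
```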

    Efficient Monte Carlo methods for simulating diffusion-reaction processes in complex systems

    Full text link
    We briefly review the principles, mathematical bases, numerical shortcuts and applications of fast random walk (FRW) algorithms. This Monte Carlo technique allows one to simulate individual trajectories of diffusing particles in order to study various probabilistic characteristics (harmonic measure, first passage/exit time distribution, reaction rates, search times and strategies, etc.) and to solve the related partial differential equations. The adaptive character and flexibility of FRWs make them particularly efficient for simulating diffusive processes in porous, multiscale, heterogeneous, disordered or irregularly shaped media.
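    A minimal sketch of the simplest FRW variant, walk-on-spheres, applied to a case with a clean interpretation: estimating the harmonic measure (equivalently, solving a Laplace Dirichlet problem) at a point inside the unit disk. The domain, boundary data, and tolerance are illustrative assumptions; real applications use adaptive jumps tailored to the geometry.

```python
import numpy as np

def walk_on_spheres(x0, boundary_value, n_walkers=20000, eps=1e-3):
    """Estimate u(x0) for Laplace's equation in the unit disk with Dirichlet data
    `boundary_value(angle)`. Each jump moves the walker to a uniform point on the
    largest circle centred at the walker that stays inside the domain, which is
    exact for Brownian motion, until the walker is within eps of the boundary."""
    estimates = np.empty(n_walkers)
    for w in range(n_walkers):
        x = np.array(x0, dtype=float)
        while True:
            r = 1.0 - np.linalg.norm(x)        # distance to the unit circle
            if r < eps:
                estimates[w] = boundary_value(np.arctan2(x[1], x[0]))
                break
            phi = np.random.uniform(0.0, 2.0 * np.pi)
            x += r * np.array([np.cos(phi), np.sin(phi)])
    return estimates.mean()

# Example: boundary data equal to 1 on the upper half-circle, 0 on the lower half
print(walk_on_spheres((0.0, 0.3), lambda a: float(a > 0)))
```

    The large jumps are what make the method fast: the walker reaches the boundary in a handful of moves instead of many small diffusion steps.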

    New Approaches to Protein Structure Prediction

    Get PDF
    Protein structure prediction is concerned with predicting a protein's three-dimensional structure from its amino acid sequence. Such predictions are commonly performed by searching the possible structures and evaluating each structure with some scoring function. If it is assumed that the target protein structure resembles the structure of a known protein, the search space can be significantly reduced. Such an approach is referred to as comparative structure prediction. When such an assumption is not made, the approach is known as ab initio structure prediction. There are several difficulties in devising efficient searches or in computing the scoring function. Many of these problems have ready solutions from known mathematical methods. However, the problems that remain unsolved have kept structure prediction methods from producing more accurate predictions. The objective of this study is to present a complete framework for ab initio protein structure prediction. To achieve this, a new search strategy is proposed, and better techniques are devised for computing the known scoring functions. Some of the remaining problems in protein structure prediction are revisited. Several of them are shown to be intractable. In many of these cases, approximation methods are suggested as alternative solutions. The primary issues addressed in this thesis concern local structure prediction, structure assembly or sampling, side-chain packing, model comparison, and structural alignment. For brevity, we do not elaborate on these problems here; a concise introduction is given in the first section of this thesis. Results from these studies prompted the development of several programs, forming a utility suite for ab initio protein structure prediction. Due to the general usefulness of these programs, some of them are released under open source licenses to benefit the community.
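    One of the model-comparison subproblems mentioned above, superposing two models and measuring their RMSD, has a standard closed-form solution via the Kabsch algorithm. The sketch below shows that standard computation; it is offered as background, not as the thesis's particular scoring function.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition
    (Kabsch algorithm): centre both sets, find the optimal rotation via SVD,
    then measure the residual deviation."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    P_rot = P @ R.T
    return np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1)))
```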

    Algorithms for Geometric Optimization and Enrichment in Industrialized Building Construction

    Get PDF
    The burgeoning use of industrialized building construction, coupled with advances in digital technologies, is unlocking new opportunities to improve the status quo of construction projects running over budget, behind schedule, and below desired quality. Yet there are still several objective barriers that need to be overcome in order to realize the full potential of these innovations. Analysis of the literature and examples from industry reveals the following notable barriers: (1) geometric optimization methods need to be developed for the stricter dimensional requirements in industrialized construction, (2) methods are needed to preserve model semantics during the process of generating an updated as-built model, (3) semantic enrichment methods are required for the end-of-life stage of industrialized buildings, and (4) there is a need to develop pragmatic approaches for algorithms to ensure they achieve the required computational efficiency. The common thread across these examples is the need to develop algorithms that optimize and enrich geometric models. To date, a comprehensive approach paired with pragmatic solutions remains elusive. This research fills this gap by presenting a new approach for algorithm development along with pragmatic implementations for the industrialized building construction sector. Computational algorithms are effective for driving the design, analysis, and optimization of geometric models. As such, this thesis develops new computational algorithms for the design, fabrication and assembly, onsite construction, and end-of-life stages of industrialized buildings. A common theme throughout this work is the development and comparison of varied algorithmic approaches (i.e., exact vs. approximate solutions) to see which is optimal for a given process. This is implemented in the following ways. First, a probabilistic method is used to simulate the accumulation of dimensional tolerances in order to optimize geometric models during design. Second, a series of exact and approximate algorithms are used to optimize the topology of 2D panelized assemblies to minimize material use during fabrication and assembly. Third, a new approach to automatically updating geometric models is developed whereby initial model semantics are preserved during the process of generating an as-built model. Finally, a series of algorithms are developed to semantically enrich geometric models to enable industrialized buildings to be disassembled and reused. The developments made in this research form a rational and pragmatic approach to addressing the existing challenges faced in industrialized building construction. Such developments are shown not only to be effective in improving the status quo in the industry (i.e., improving cost, reducing project duration, and improving quality), but also to facilitate continuous innovation in construction.
    By way of assessing the potential impact of this work, the proposed algorithms can reduce rework risk during fabrication and assembly (65% rework reduction in the case study for the new tolerance simulation algorithm), reduce waste during manufacturing (11% waste reduction in the case study for the new panel unfolding and nesting algorithms), improve the accuracy and automation of as-built model generation (model error reduction from 50.4 mm to 5.7 mm in the case study for the new parametric BIM updating algorithms), reduce lifecycle cost for adapting industrialized buildings (15% reduction in capital costs in the computational building configurator) and reduce lifecycle impacts for reusing structural systems from industrialized buildings (between 54% and 95% reduction in average lifecycle impacts for the approach illustrated in Appendix B). From a computational standpoint, the novelty of the algorithms developed in this research can be described as follows. Complex geometric processes can be codified based solely on the innate properties of geometry; that is, by parameterizing geometry and using methods such as combinatorial optimization, topology can be optimized and semantics can be automatically enriched for building assemblies. Employing functional discretization (whereby continuous variable domains are converted into discrete variable domains) is shown to be highly effective for complex geometric optimization approaches. Finally, the algorithms encapsulate and balance the benefits of both parametric and non-parametric schemas, resulting in the ability to achieve both high representational accuracy and semantically rich information (which has previously not been achieved or demonstrated). In summary, this thesis makes several key improvements to industrialized building construction. One of the key findings is that, rather than pre-emptively determining the best-suited algorithm for a given process or problem, it is often more pragmatic to derive both an exact and an approximate solution and then decide which is optimal for a given process. Generally, most tasks related to optimizing or enriching geometric models are best solved using approximate methods. To this end, this research presents a series of key techniques that can be followed to improve the temporal performance of algorithms. The new approach for developing computational algorithms and the pragmatic demonstrations of geometric optimization and enrichment are expected to bring the industry forward and solve many of the current barriers it faces.
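    To illustrate the first of the algorithmic themes above, probabilistic simulation of tolerance accumulation, here is a minimal Monte Carlo sketch. The component counts, nominal dimensions, tolerance-to-sigma convention, and assembly limit are illustrative assumptions, not values from the thesis's case studies.

```python
import numpy as np

def tolerance_stackup(nominals, tolerances, allowable, n_trials=100000, seed=0):
    """Monte Carlo estimate of the probability that the accumulated length of a
    chain of components exceeds its allowable deviation. Each dimension is
    sampled as normal around its nominal, treating the stated tolerance as
    +/-3 sigma (a common simplifying assumption)."""
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(tolerances, dtype=float) / 3.0
    samples = rng.normal(nominals, sigmas, size=(n_trials, len(nominals)))
    deviation = samples.sum(axis=1) - np.sum(nominals)
    return np.mean(np.abs(deviation) > allowable)

# Example: five 1200 mm panels, each +/-2 mm, assembly limit +/-5 mm
print(tolerance_stackup([1200] * 5, [2.0] * 5, allowable=5.0))
```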

    Predicting Speech Recognition using the Speech Intelligibility Index (SII) for Cochlear Implant Users and Listeners with Normal Hearing

    Get PDF
    Although the AzBio test is well validated, has effective standardization data available, and is highly recommended for Cochlear Implant (CI) evaluation, no attempt has been made to derive a Frequency Importance Function (FIF) for its stimuli. In the first phase of this dissertation, we derived FIFs for the AzBio sentence lists using listeners with normal hearing. The traditional procedures described by Studebaker and Sherbecoe (1991) were applied for this purpose. Fifteen participants with normal hearing listened to a large number of AzBio sentences that were high- and low-pass filtered in speech-spectrum-shaped noise at various signal-to-noise ratios. Frequency weights for the AzBio sentences were greatest in the 1.5 to 2 kHz region, as is the case with other speech materials. A cross-procedure comparison was conducted between the traditional procedure (Studebaker and Sherbecoe, 1991) and the nonlinear optimization procedure (Kates, 2013). Subsequent data analyses provided speech recognition scores for the AzBio sentences in relation to the Speech Intelligibility Index (SII). Our findings provide empirically derived FIFs for the AzBio test that can be used in future studies. It is anticipated that the accuracy of predicting SIIs for CI patients will be improved when using these derived FIFs for the AzBio test. In the second study, the SII for CI recipients was calculated to investigate whether the SII is an effective tool for predicting speech perception performance in a CI population. A total of fifteen CI adults participated. The FIFs obtained from the first study were used to compute the SII in these CI listeners. The obtained SIIs were compared with predicted SIIs using a transfer function curve derived from the first study. Due to the considerably poorer hearing and large individual variability in performance in the CI population, the SII failed to predict speech perception performance for this group. Other predictive factors that have been associated with speech perception performance were also examined using a multiple regression analysis. Gap detection thresholds and duration of deafness were found to be significant predictive factors. These predictive factors and SIIs are discussed in relation to speech perception performance in CI users.
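    For readers unfamiliar with the SII, the core computation is an importance-weighted sum of band audibility. The sketch below uses the ANSI S3.5-style audibility mapping and placeholder importance weights that merely peak around 1.5 to 2 kHz as the study reports; they are not the derived AzBio FIF values, and level-distortion and masking corrections are omitted.

```python
import numpy as np

def speech_intelligibility_index(band_snr_db, band_importance):
    """Simplified SII: importance-weighted sum of band audibility, where
    audibility maps each band's SNR from [-15, +15] dB onto [0, 1]."""
    audibility = np.clip((np.asarray(band_snr_db, dtype=float) + 15.0) / 30.0, 0.0, 1.0)
    w = np.asarray(band_importance, dtype=float)
    return float(np.sum(w / w.sum() * audibility))

# Placeholder frequency-importance weights (NOT the derived AzBio FIF),
# peaking near 1.5-2 kHz, with illustrative band SNRs
importance = [0.05, 0.10, 0.15, 0.25, 0.25, 0.15, 0.05]
snr_db = [5, 8, 10, 12, 6, 0, -5]
print(f"SII = {speech_intelligibility_index(snr_db, importance):.2f}")
```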

    Neuromorphic System Design and Application

    Get PDF
    With the boom in large-scale, data-related applications, cognitive systems that leverage modern data processing technologies, e.g., machine learning and data mining, are widely used in various industry fields. These applications bring challenges to conventional computer systems in both semiconductor manufacturing and computing architecture. The invention of the neuromorphic computing system (NCS) is inspired by the working mechanism of the human brain, and it is a promising architecture for combating the well-known memory bottleneck of the von Neumann architecture. Recent breakthroughs in memristor devices and crossbar structures mark an important step toward realizing a low-power, small-footprint NCS on a chip. However, the currently low manufacturing reliability of nano-devices and circuit-level constraints, e.g., voltage IR-drop along metal wires and analog signal noise from the peripheral circuits, bring challenges to the scalability, precision and robustness of memristor-crossbar-based NCS. In this dissertation, we quantitatively analyze the robustness of memristor-crossbar-based NCS when considering device process variations, signal fluctuation and IR-drop. Based on this analysis, we explore hardware training methods, e.g., on-device training and off-device training, in depth. New techniques, e.g., noise-eliminating training, variation-aware training and adaptive mapping, specifically designed to improve training quality on memristor crossbar hardware, are then proposed. A digital initialization step for hardware training is also introduced to reduce training time. Circuit-level constraints also limit the scalability of a single memristor crossbar, which decreases the efficiency of NCS implementations; we therefore leverage system reduction/compression techniques to reduce the required crossbar size for certain applications. In addition, running machine learning algorithms on embedded systems brings new security concerns to service providers and users. We first explore these concerns using examples from real applications, demonstrating how attackers can access confidential user data, replicate a sensitive data-processing model without any access to model details, and expose key features of the training data while using the service as a normal user. Based on our understanding of these security concerns, we use unique properties of memristor devices to build a secure NCS.
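    As a hedged illustration of how device variation enters a memristor-crossbar computation, the sketch below performs a vector-matrix multiply with weights mapped to a differential pair of conductances and lognormal device-to-device variation applied. IR-drop and peripheral-circuit noise are ignored, and the conductance range and noise level are assumptions rather than measured device parameters.

```python
import numpy as np

def crossbar_mvm(weights, x, g_min=1e-6, g_max=1e-4, variation=0.05, rng=None):
    """Vector-matrix multiply on an idealized memristor crossbar. Weights
    (assumed in [-1, 1]) are mapped to a differential pair of conductances in
    [g_min, g_max]; lognormal multiplicative noise models process variation."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    g_pos = g_min + (g_max - g_min) * np.clip(w, 0, 1)
    g_neg = g_min + (g_max - g_min) * np.clip(-w, 0, 1)
    noise = lambda shape: rng.lognormal(mean=0.0, sigma=variation, size=shape)
    g_pos, g_neg = g_pos * noise(g_pos.shape), g_neg * noise(g_neg.shape)
    currents = x @ (g_pos - g_neg)        # column currents (Kirchhoff summation)
    return currents / (g_max - g_min)     # scale back to weight units

# Variation-aware training idea (sketch only): evaluate the network through many
# noisy crossbar_mvm instances during training so the learned weights tolerate
# device-to-device variation at deployment time.
```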