15 research outputs found

    Computer Science 2019 APR Self-Study & Documents

    UNM Computer Science APR self-study report and review team report for Spring 2019, fulfilling requirements of the Higher Learning Commission

    Hierarchical multi-label classification for protein function prediction going beyond traditional approaches

    Hierarchical multi-label classification is a variant of traditional classification in which instances can belong to several labels that are in turn organized in a hierarchy. Functional classification of genes is a challenging problem in functional genomics for several reasons. First, each gene participates in multiple biological activities. Hence, prediction models should support multi-label classification. Second, genes are organized and classified according to a hierarchical classification scheme that represents the relationships between the functions of the genes. These relationships should be maintained by the prediction models. In addition, various biomolecular data sources, such as gene expression data and protein-protein interaction data, can be used to assign biological functions to genes. Therefore, the integration of multiple data sources is required to acquire a precise picture of the roles of genes in living organisms by uncovering novel biology in the form of previously unknown functional annotations. To address these issues, the presented work deals with hierarchical multi-label classification. The purpose of this thesis is threefold. First, a Hierarchical Multi-Label classification algorithm using Boosting classifiers, HML-Boosting, is proposed for the hierarchical multi-label classification problem in the context of gene function prediction. HML-Boosting exploits the predefined hierarchical dependencies among the classes. Using two approaches for correcting class-membership inconsistencies during the testing phase, the top-down approach and the bottom-up approach, we demonstrate that HML-Boosting outperforms the flat classifier approach.
Moreover, the author proposes the HiBLADE algorithm (Hierarchical multi-label Boosting with LAbel DEpendency), a novel algorithm that not only takes advantage of the pre-established hierarchical taxonomy of the classes but also exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In the proposed approach, first, the predefined hierarchical taxonomy of the labels is used to decide upon the training set for each classifier. Second, the dependencies among the children of each label in the hierarchy are captured and analyzed using a Bayesian method and instance-based similarity. The primary objective of the proposed algorithm is to find and share a number of base models across the correlated labels. HiBLADE differs from conventional algorithms in two ways. First, it allows the prediction of multiple functions for genes at the same time while maintaining the hierarchy constraint. Second, the classifiers are built based on the label under study and its most similar sibling. Experimental results on several real-world biomolecular datasets show that the proposed method can improve the performance of hierarchical multi-label classification. More important, however, is the third part, which focuses on the integration of multiple heterogeneous data sources for improving hierarchical multi-label classification. Unlike most previous works, which consider a single data source for gene function prediction, the author explores the integration of heterogeneous data sources for genome-wide gene function prediction. This integration is addressed with a novel Hierarchical Bayesian iNtegration algorithm, HiBiN, a general framework that uses Bayesian reasoning to integrate heterogeneous data sources for accurate gene function prediction.
The system uses posterior probabilities to assign class memberships to samples from multiple data sources while maintaining the hierarchical constraint that governs the annotation of the genes. The author demonstrates, through HiBiN, that the integration of diverse datasets significantly improves the classification quality for hierarchical gene function prediction in terms of several measures, compared with the two baselines: single-source prediction models and a fused-flat model. Moreover, the system has been extended to include a weighting scheme that controls the contribution of each data source according to its relevance to the label under study. The results show that the new weighting scheme compares favorably with the other approach along various performance criteria.
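
The hierarchy constraint referred to in this abstract (a gene annotated with a function should also carry every ancestor of that function) can be illustrated with a small sketch. The abstract does not describe how the thesis implements its top-down and bottom-up corrections, so the code below is only one common reading of those terms, not the author's method. The label names, the `PARENT` map, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: child label -> parent label (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "metabolism": None,
    "amino_acid_metabolism": "metabolism",
    "lysine_biosynthesis": "amino_acid_metabolism",
    "transport": None,
    "ion_transport": "transport",
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return all ancestors of a label, nearest first."""
    chain: List[str] = []
    node = parent[label]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def top_down_correct(predicted: Set[str],
                     parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assumed top-down rule: keep a label only if all its ancestors were predicted."""
    return {
        label for label in predicted
        if all(a in predicted for a in ancestors(label, parent))
    }


def bottom_up_correct(predicted: Set[str],
                      parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assumed bottom-up rule: add every ancestor of each predicted label."""
    fixed = set(predicted)
    for label in predicted:
        fixed.update(ancestors(label, parent))
    return fixed


# An inconsistent raw prediction: two labels are present without their parents.
raw = {"metabolism", "lysine_biosynthesis", "ion_transport"}
print(sorted(top_down_correct(raw, PARENT)))
print(sorted(bottom_up_correct(raw, PARENT)))
```

Under these assumed rules, the top-down correction is the conservative choice (it removes unsupported descendants), while the bottom-up correction trusts the most specific predictions and fills in their ancestors.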

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

    On Pattern Mining in Graph Data to Support Decision-Making

    In recent years, graph data models have become increasingly important in both research and industry. Their core is a generic data structure of things (vertices) and connections among those things (edges). Rich graph models such as the property graph model promise extraordinary analytical power because relationships can be evaluated without knowledge of a domain-specific database schema. This dissertation studies the use of graph models for data integration and data mining of business data. Although a typical company's business data implicitly describes a graph, it is usually stored in multiple relational databases. Therefore, we propose the first semi-automated approach to transform data from multiple relational databases into a single graph whose vertices represent domain objects and whose edges represent their mutual relationships. This transformation is the basis of our conceptual framework BIIIG (Business Intelligence with Integrated Instance Graphs). We further propose a graph-based approach to data integration, which is executed after the transformation. In established data mining approaches, interrelated input data is mostly represented by tuples of measure values and dimension values. In the context of graphs, these values must be attached to the graph structure, and aggregated measure values are graph attributes. Since the latter was not supported by any existing model, we propose the use of collections of property graphs. They act as the data structure of the novel Extended Property Graph Model (EPGM). The model supports vertices and edges that may appear in different graphs, as well as graph properties. Furthermore, we propose operators that benefit from this data structure, for example, graph-based aggregation of measure values. A primitive operation of graph pattern mining is frequent subgraph mining (FSM). However, existing algorithms provided no support for directed multigraphs.
We extended the popular gSpan algorithm to overcome this limitation. Some patterns might not be frequent while their generalizations are. Generalized graph patterns can be mined by attaching vertices to taxonomies. We proposed a novel approach to Generalized Multidimensional Frequent Subgraph Mining (GM-FSM), in particular the first solution to generalized FSM that supports not only directed multigraphs but also multiple dimensional taxonomies. In scenarios that compare patterns of different categories, e.g., fraud or not, FSM is not sufficient since pattern frequencies may differ by category. Furthermore, determining all pattern frequencies without frequency pruning is not an option due to the computational complexity of FSM. Thus, we developed Characteristic Subgraph Mining (CSM), an FSM extension that extracts patterns characteristic of a specific category according to a user-defined interestingness function. Parts of this work were done in the context of GRADOOP, a framework for distributed graph analytics. To make the primitive operation of frequent subgraph mining available to this framework, we developed Distributed In-Memory gSpan (DIMSpan), a frequent subgraph miner that is tailored to the characteristics of shared-nothing clusters and distributed dataflow systems. Finally, the results of use case evaluations in cooperation with a large-scale enterprise are presented. This includes a report of practical experiences gained in the implementation and application of the proposed algorithms.

    MEMS Accelerometers

    Micro-electro-mechanical system (MEMS) devices are widely used for inertial, pressure, and ultrasound sensing applications. Research on integrated MEMS technology has undergone extensive development driven by the requirements of a compact footprint, low cost, and increased functionality. Accelerometers are among the most widely used sensors implemented in MEMS technology. MEMS accelerometers are showing a growing presence in almost all industries, ranging from automotive to medical. A traditional MEMS accelerometer employs a proof mass suspended on springs, which is displaced in response to an external acceleration. A single proof mass can be used for one- or multi-axis sensing. A variety of transduction mechanisms have been used to detect the displacement. They include capacitive, piezoelectric, thermal, tunneling, and optical mechanisms. Capacitive accelerometers are widely used due to their DC measurement interface, thermal stability, reliability, and low cost. However, they are sensitive to electromagnetic field interference and perform poorly in high-end applications (e.g., precise attitude control for satellites). Over the past three decades, steady progress has been made in the area of optical accelerometers for high-performance and high-sensitivity applications, but several challenges, such as chip-scale integration, scaling, and low bandwidth, must still be tackled by researchers and engineers to fully realize opto-mechanical accelerometers.
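
The proof-mass principle described in this abstract can be made concrete with the standard lumped spring-mass idealization. This is a textbook sketch rather than anything taken from the listed work: at steady state the spring force balances the inertial force, so k·x = m·a, and the undamped natural frequency is sqrt(k/m)/(2π). The function names and the numeric values are illustrative assumptions.

```python
import math


def acceleration_from_displacement(k: float, m: float, x: float) -> float:
    """Steady-state acceleration (m/s^2) from proof-mass displacement.

    Assumes an ideal linear spring (Hooke's law): k * x = m * a.
    k: spring constant in N/m, m: proof mass in kg, x: displacement in m.
    """
    return k * x / m


def natural_frequency_hz(k: float, m: float) -> float:
    """Undamped natural frequency of the spring-mass system in Hz."""
    return math.sqrt(k / m) / (2.0 * math.pi)


# Illustrative (assumed) values: 2 N/m suspension, 1 mg proof mass.
k_spring = 2.0      # N/m
mass = 1e-6         # kg
displacement = 4.9e-6  # m

print(acceleration_from_displacement(k_spring, mass, displacement))  # close to 9.8 m/s^2
print(natural_frequency_hz(k_spring, mass))
```

The sketch also hints at a familiar design trade-off in this idealization: a softer spring or heavier mass gives more displacement per unit acceleration but lowers the natural frequency, and with it the usable bandwidth.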

    Antioxidant and DPPH-Scavenging Activities of Compounds and Ethanolic Extract of the Leaf and Twigs of Caesalpinia bonduc L. Roxb.

    Antioxidant effects of the ethanolic extract of Caesalpinia bonduc and its isolated bioactive compounds were evaluated in vitro. The compounds included two new cassane diterpenes, 1α,7α-diacetoxy-5α,6β-dihydroxyl-cass-14(15)-epoxy-16,12-olide (1) and 12α-ethoxyl-1α,14β-diacetoxy-2α,5α-dihydroxyl cass-13(15)-en-16,12-olide (2); and others, bonducellin (3), 7,4’-dihydroxy-3,11-dehydrohomoisoflavanone (4), daucosterol (5), luteolin (6), quercetin-3-methyl ether (7) and kaempferol-3-O-α-L-rhamnopyranosyl-(1→2)-β-D-xylopyranoside (8). The antioxidant properties of the extract and compounds were assessed by measuring the total phenolic content, ascorbic acid content, total antioxidant capacity, and 1,1-diphenyl-2-picrylhydrazyl (DPPH) and hydrogen peroxide radical scavenging activities. Compounds 3, 6, 7 and the ethanolic extract had DPPH scavenging activities with IC50 values of 186, 75, 17 and 102 μg/ml respectively, compared with 15 μg/ml for vitamin C. On the other hand, no significant results were obtained for the hydrogen peroxide radical. In addition, compound 7 had the highest phenolic content of 0.81±0.01 mg/ml gallic acid equivalent, while compound 8 showed the highest total antioxidant capacity with 254.31±3.54 and 199.82±2.78 μg/ml gallic and ascorbic acid equivalent respectively. Compound 4 and the ethanolic extract showed high ascorbic acid contents of 2.26±0.01 and 6.78±0.03 mg/ml respectively. The results showed the antioxidant activity of the ethanolic extract of C. bonduc and indicated that this activity was mediated by its isolated bioactive compounds.