107 research outputs found
Unbiased taxonomic annotation of metagenomic samples
The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach to taxonomic annotation in a next release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample. Peer Reviewed. Postprint (published version)
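The set cover reduction mentioned in this abstract can be illustrated with the textbook greedy heuristic, which achieves a logarithmic approximation factor. The following is a minimal sketch under stated assumptions, not the TANGO implementation: the function name `greedy_set_cover`, the read identifiers and the taxon names are all hypothetical, and this simple version re-scans every candidate in each round, so it does not reach the linear running time the abstract refers to.

```python
def greedy_set_cover(universe, candidates):
    """Greedily choose candidate sets until every element of `universe` is covered.

    `candidates` maps a name (here: a hypothetical taxon) to the set of
    elements (here: hypothetical reads) it could explain.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered elements.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some elements cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy data: five reads and the reads each taxon could explain.
reads = {"r1", "r2", "r3", "r4", "r5"}
taxa = {
    "taxon_A": {"r1", "r2", "r3"},
    "taxon_B": {"r3", "r4"},
    "taxon_C": {"r4", "r5"},
    "taxon_D": {"r1"},
}

cover = greedy_set_cover(reads, taxa)
print(cover)
```

On this toy input the heuristic first selects the taxon that explains three reads and then the one that explains the two remaining reads, so ambiguous reads end up assigned to a small set of taxa. An exact minimum cover would instead require an integer linear programming solver, as the abstract notes.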
A feasibility-based algorithm for Computer Aided Molecular and Process Design of solvent-based separation systems
Computer-aided molecular and product design (CAMPD) can in principle be used to find simultaneously the optimal conditions in separation processes and the structure of the optimal solvents. In many cases, however, the solution of CAMPD problems is challenging. In this paper, we propose a solution approach for the CAMPD of solvent-based separation systems in which implicit constraints on phase behaviour in process models are used to test the feasibility of the process and solvent domains. The tests eliminate not only infeasible molecules from the search space but also infeasible combinations of solvent molecules and process conditions. The tests also provide bounds for the optimization of the process model (primal problem) for each solvent, facilitating numerical solution. This is demonstrated on a prototypical natural gas purification process.
Spatially explicit species distribution models: A missed opportunity in conservation planning?
Aim: Systematic conservation planning is vital for allocating protected areas given the spatial distribution of conservation features, such as species. Due to incomplete species inventories, species distribution models (SDMs) are often used for predicting species habitat suitability and species probability of occurrence. Currently, SDMs mostly ignore spatial dependencies in species and predictor data. Here, we provide a comparative evaluation of how accounting for spatial dependencies, that is, autocorrelation, affects the delineation of optimized protected areas. Location: Southeast Australia, Southeast U.S. Continental Shelf, Danube River Basin. Methods: We employ Bayesian spatially explicit and non-spatial SDMs for terrestrial, marine and freshwater species, using realm-specific planning unit shapes (grid, hexagon and subcatchment, respectively). We then apply the software Gurobi to optimize conservation plans based on species targets derived from spatial and non-spatial SDMs (10%-50% each to analyse sensitivity), and compare the delineation of the plans. Results: Across realms and irrespective of the planning unit shape, spatially explicit SDMs (a) produce on average more accurate predictions in terms of AUC, TSS, sensitivity and specificity, along with a higher species detection probability. All spatial optimizations meet the species conservation targets. Spatial conservation plans that use predictions from spatially explicit SDMs (b) are spatially substantially different compared to those that use non-spatial SDM predictions, but (c) encompass a similar number of planning units. The overlap in the selection of planning units is smallest for conservation plans based on the lowest targets and vice versa. Main conclusions: Species distribution models are core tools in conservation planning. Not surprisingly, accounting for the spatial characteristics in SDMs has drastic impacts on the delineation of optimized conservation plans. We therefore encourage practitioners to consider spatial dependencies in conservation features to improve the spatial representation of future protected areas. © 2019 The Authors. Diversity and Distributions Published by John Wiley and Sons Ltd. This study was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 642317. SDL has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska‐Curie grant agreement No. 748625, and SCJ from the German Federal Ministry of Education and Research (BMBF) for the “GLANCE” project (Global Change Effects in River Ecosystems; 01 LN1320A). We wish to thank Gwen Iacona and two anonymous referees for their constructive comments on an earlier version of the manuscript.
LibID: Reliable identification of obfuscated third-party Android libraries
Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps to tune the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust to code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of the 3,958 most popular apps on the Google Play Store. The Boeing Company, China Scholarship Council, Microsoft Research
On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor method may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with low accuracy, and it may be too complex to understand and explain. This paper proposes and studies a novel Mathematical Optimization model to build a sparse ensemble, which trades off the accuracy of the ensemble and the number of base regressors used. The latter is controlled by means of a regularization term that penalizes regressors with a poor individual performance. Our approach is flexible to incorporate desirable properties one may have on the ensemble, such as controlling the performance of the ensemble in critical groups of records, or the costs associated with the base regressors involved in the ensemble. We illustrate our approach with real data sets arising in the COVID-19 context. (c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). We thank the reviewers for their thorough comments and suggestions, which have been very valuable to strengthen the quality of the paper. This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214); FQM-329 and P18-FR-2369 (Junta de Andalucia, Spain); MTM2017-89422-P (Ministerio de Economia, Industria y Competitividad, Spain); PID2019-110886RB-I00 (Ministerio de Ciencia, Innovación y Universidades, Spain); PR2019-029 (Universidad de Cadiz, Spain); PITUFLOW-CM-UC3M (Comunidad de Madrid and Universidad Carlos III de Madrid, Spain); and EP/R00370X/1 (EPSRC, United Kingdom). This support is gratefully acknowledged.
Optimized synthesis of cost-effective, controllable oil system architectures for turbofan engines
Turbofan oil systems are used to provide lubrication and cooling in the engine. There is an increasing interest in oil system architectures which utilize electric pumps and/or valves to give optimized control of flows to individual oil chambers, leading to improved thermal management of oil and lubrication efficiency. The challenge here lies in the trade-off between increasing controllability and minimizing the addition of new components, which add unwanted production and maintenance costs. This paper formulates the oil system architecture design as a constrained, multiobjective optimization problem. An architecture is described using a graph with nodes representing components and edges representing interconnections between components. A fixed set of nodes called the architecture template is provided as an input and the edges are optimized for a multicriteria objective function. A heuristic method for determining similarities between the different oil chamber flow requirements is presented. This is used in the optimization to evaluate the controllability objective based on the structure of the valve architecture. The methodology provides benefits to system designers by selecting cheaper architectures with fewer valves when the need to control oil chambers separately is small. The effect of manipulating the cost/controllability criteria weightings is investigated to show the impact on the resulting architecture.
Development of a Framework to Compare Low-Altitude Unmanned Air Traffic Management Systems
Presented at the AIAA SciTech 2021 Forum. Several reports forecast a very high demand for Urban Air Mobility services such as package delivery and air taxi. This would lead to very dense low-altitude operations which cannot be safely accommodated by the current air traffic management system. Many different architectures for low-altitude air traffic management have been proposed in the literature; however, the lack of a common framework makes it difficult to compare strategies. The work presented here establishes efficiency, safety and capacity metrics, defines the components of an automated traffic management system architecture and introduces a preliminary framework to compare different alternatives. This common framework allows for the evaluation and comparison of different alternatives for unmanned traffic management. The framework is showcased on different strategies with different architectures. The impact of algorithmic choices and airspace architectures is evaluated. A decoupled approach to 4D trajectory planning is shown to scale poorly with agent density. The impact of segregating traffic by heading is shown to be very different depending on the algorithms and airspace access rules chosen.
Optimal motion planning for automated vehicles with scheduled arrivals at intersections
We design and compare three different optimal control strategies for the motion planning of automated vehicles approaching an intersection with scheduled arrivals. The objective is to minimize a combination of energy consumption and deviation from the schedule. The strategies differ in allowed deviations. When taking only vehicles inside the control region into account, the strategy that achieves the lowest energy consumption is the least strict one, albeit at the expense of higher travel times. When traffic conditions beyond the control region are considered, no strategy is able to achieve lower energy consumption or vehicle delay than the strategy that is the most strict in keeping with the schedule. Results suggest that in high traffic situations, from a global energy consumption standpoint, it is best to have vehicles crossing the intersection as soon as possible.
Autonomous Vehicle Decision-Making and Monitoring based on Signal Temporal Logic and Mixed-Integer Programming
Routing and schedule simulation of a biomass energy supply chain through SimPy simulation package
