41,013 research outputs found
CONCISE: Compressed 'n' Composable Integer Set
Bit arrays, or bitmaps, are used to significantly speed up set operations in
several areas, such as data warehousing, information retrieval, and data
mining, to cite a few. However, bitmaps usually use a large storage space, thus
requiring compression. Nevertheless, there is a space-time tradeoff among
compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades
some space to allow for bitwise operations without first decompressing bitmaps.
WAH has been recognized as the most efficient scheme in terms of computation
time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set),
a new scheme that enjoys significatively better performances than those of WAH.
In particular, when compared to WAH, our algorithm is able to reduce the
required memory up to 50%, by having similar or better performance in terms of
computation time. Further, we show that CONCISE can be efficiently used to
manipulate bitmaps representing sets of integral numbers in lieu of well-known
data structures such as arrays, lists, hashtables, and self-balancing binary
search trees. Extensive experiments over synthetic data show the effectiveness
of our approach.Comment: Preprint submitted to Information Processing Letters, 7 page
A computer program for thermal radiation from gaseous rocket exhuast plumes (GASRAD)
A computer code is presented for predicting incident thermal radiation from defined plume gas properties in either axisymmetric or cylindrical coordinate systems. The radiation model is a statistical band model for exponential line strength distribution with Lorentz/Doppler line shapes for 5 gaseous species (H2O, CO2, CO, HCl and HF) and an appoximate (non-scattering) treatment of carbon particles. The Curtis-Godson approximation is used for inhomogeneous gases, but a subroutine is available for using Young's intuitive derivative method for H2O with Lorentz line shape and exponentially-tailed-inverse line strength distribution. The geometry model provides integration over a hemisphere with up to 6 individually oriented identical axisymmetric plumes, a single 3-D plume, Shading surfaces may be used in any of 7 shapes, and a conical limit may be defined for the plume to set individual line-of-signt limits. Intermediate coordinate systems may specified to simplify input of plumes and shading surfaces
Algebraic shortcuts for leave-one-out cross-validation in supervised network inference
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models
On empirical methodology, constraints, and hierarchy in artificial grammar learning
This paper considers the AGL literature from a psycholinguistic perspective. It first presents a taxonomy of the experimental familiarization test procedures used, which is followed by a consideration of shortcomings and potential improvements of the empirical methodology. It then turns to reconsidering the issue of grammar learning from the point of view of acquiring constraints, instead of the traditional AGL approach in terms of acquiring sets of rewrite rules. This is, in particular, a natural way of handling longâdistance dependences. The final section addresses an underdeveloped issue in the AGL literature, namely how to detect latent hierarchical structure in AGL response patterns
Mapping Wide Row Crops with Video Sequences Acquired from a Tractor Moving at Treatment Speed
This paper presents a mapping method for wide row crop fields. The resulting map shows the crop rows and weeds present in the inter-row spacing. Because field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image sequence stabilization was needed and consequently designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a birdâs-eye view that is centered on the image and that enables map generation. The algorithm developed has been tested on several video sequences of different fields recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and crop rows in the resulting maps appear straight
- âŠ