170 research outputs found

    Better quality score compression through sequence-based quality smoothing

    Current NGS techniques are becoming exponentially cheaper. As a result, genomic data is growing exponentially, but storage capacity is not keeping pace, which makes compression necessary. Most of the entropy of NGS data lies in the quality values associated with each read. Those values are often more diverse than necessary. For this reason, many tools, such as Quartz or GeneCodeq, try to change (smooth) quality scores in order to improve compressibility without altering the important information they carry for downstream analyses such as SNP calling.

    Efficient Reconciliation of Genomic Datasets of High Similarity

    We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originating from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating the Jaccard similarity of the underlying k-mer sets than MinHash, which is a go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of Jaccard similarity. A key property of our method is that the data structures involved require space proportional to the difference of the k-mer sets and independent of the size of the sets themselves. As another application, we show how our ideas can be applied to efficiently compute (an approximation of) the k-mers that differ between two datasets, still using space proportional only to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus pneumoniae genomes).
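
    The sampling idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses closed syncmers (a k-mer is kept when its smallest s-mer sits at the first or last position) and plain Python sets in place of IBLTs. The function names, the BLAKE2b-based s-mer ordering, and the parameter values are all assumptions made for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _order(smer):
    """Hypothetical s-mer ordering: a 64-bit BLAKE2b hash of the s-mer."""
    digest = hashlib.blake2b(smer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def is_closed_syncmer(kmer, s):
    """True if the smallest s-mer of kmer starts at the first or last offset.

    The decision depends only on the k-mer itself, never on its neighbours,
    which is what makes the sampling context-independent.
    """
    ranks = [_order(kmer[i:i + s]) for i in range(len(kmer) - s + 1)]
    pos = ranks.index(min(ranks))
    return pos == 0 or pos == len(ranks) - 1


def sample(kmer_set, s):
    """Keep only the k-mers that are closed syncmers."""
    return {km for km in kmer_set if is_closed_syncmer(km, s)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = kmers("ACGTACGTTGCATGCATGACGTTAGC", 7)
    y = kmers("ACGTACGTTGCATGCTTGACGTTAGC", 7)
    print("full-set Jaccard:   ", jaccard(x, y))
    print("sampled-set Jaccard:", jaccard(sample(x, 3), sample(y, 3)))
```

    Because each k-mer is kept or dropped on its own, sampling commutes with set intersection and union, so the Jaccard similarity of the sampled sets can stand in for that of the full sets.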

    Locality-preserving minimal perfect hashing of k-mers

    Motivation: Minimal perfect hashing is the problem of mapping a static set of n distinct keys into the address space {1,...,n} bijectively. It is well known that n log_2(e) bits are necessary to specify a minimal perfect hash function (MPHF) f when no additional knowledge of the input keys is used. In practice, however, the input keys often have intrinsic relationships that can be exploited to lower the bit complexity of f. For example, consider a string and the set of all its distinct k-mers as input keys: since two consecutive k-mers share an overlap of k - 1 symbols, it seems possible to beat the classic log_2(e) bits/key barrier in this case. Moreover, we would like f to map consecutive k-mers to consecutive addresses, so as to preserve as much of their relationship as possible in the codomain. This is a useful feature in practice because it guarantees a certain degree of locality of reference for f, resulting in a better evaluation time when querying consecutive k-mers. Results: Motivated by these premises, we initiate the study of a new type of locality-preserving MPHF designed for k-mers extracted consecutively from a collection of strings. We design a construction whose space usage decreases for growing k and discuss experiments with a practical implementation of the method: in practice, the functions built with our method can be several times smaller and even faster to query than the most efficient MPHFs in the literature.
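
    The two properties named in this abstract, bijectivity onto {1,...,n} and consecutive addresses for consecutive k-mers, can be illustrated with a toy sketch. It stores the mapping in an ordinary dictionary, so it is nowhere near the compact construction the paper describes; `build_toy_lp_mphf` and `lower_bound_bits` are hypothetical names introduced only for this example.

```python
import math


def lower_bound_bits(n):
    """Classic lower bound of n * log2(e) bits (about 1.44 bits/key) for an MPHF on n keys."""
    return n * math.log2(math.e)


def build_toy_lp_mphf(seq, k):
    """Map each distinct k-mer of seq to an address in {1,...,n}.

    Addresses are handed out in order of first appearance, so a run of
    previously unseen consecutive k-mers receives consecutive addresses.
    The explicit dict is an assumption of this toy; a real MPHF would not
    store the keys at all.
    """
    table = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer not in table:
            table[kmer] = len(table) + 1
    return table


if __name__ == "__main__":
    seq, k = "ACGTTGCA", 3
    f = build_toy_lp_mphf(seq, k)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        print(kmer, "->", f[kmer])
    print("lower bound for", len(f), "keys:", lower_bound_bits(len(f)), "bits")
```

    Querying k-mers in the order they occur in the string then touches neighbouring addresses, which is the locality of reference the abstract refers to.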

    Consideration of the Behaviour of a Wind Turbine Wake Using High-Fidelity CFD Simulations

    During operation of a wind turbine, a wake forms behind it, reducing the power output and the service life of downwind turbines. To understand wind turbine wake flow, Computational Fluid Dynamics (CFD) simulations were conducted using 'RIAM-COMPACT' to reproduce the wake. There is no significant difference in the wake flow field between upwind-type and downwind-type turbines. At 5D downstream of the wind turbine, the vertical distribution of the mainstream velocity component is almost the same regardless of the power of the inflow profile in the swept area. When the inflow wind direction changes by up to 10 degrees, the wind turbine wake is quite diffuse, and its vertical distribution is in good agreement with field measurements made by a vertical profile lidar.

    Geodesy reference points within Syowa Station, Antarctica, and their local geodetic ties

    In order to study geodynamics in relation to atmospheric, oceanographic and glaciological interactions on a global scale, an adequate distribution of precise geodesy stations over the Earth is important. Syowa Station (69.0°S, 39.6°E), Antarctica, serves as one of the observatories in the Southern Hemisphere. This report briefly summarizes the location coordinates of the geodetic sensors and the chronology of related activity as of 2005, based on standardized format sheets for each sensor monument. Exchange of these formatted sheets among Antarctic stations will provide a database for reviewing and archiving geodetic activity in Antarctica. Local geodetic ties among the monument marks are updated from the results given by M. Kanao et al. (J. Geod. Soc. Jpn., 41, 357, 1995), including later surveying with improved accuracy.

    Indexing k-mers in linear space for quality value compression.

    Many bioinformatics tools rely heavily on k-mer dictionaries to describe the composition of sequences and to allow for faster reference-free algorithms or look-ups. Unfortunately, naive k-mer dictionaries are very memory-inefficient, requiring a very large amount of storage space to save each k-mer. This problem is generally worsened by the need for an index for fast queries. In this work, we discuss how to build an indexed linear reference containing a set of input k-mers and its application to the compression of quality scores in FASTQ files. Most of the entropy of sequencing data lies in the quality scores, and thus they are difficult to compress. Here, we present an application to improve the compressibility of quality values while preserving the information for SNP calling. We show how a dictionary of significant k-mers, obtained from SNP databases and multiple genomes, can be indexed in linear space and used to improve the compression of quality values. Availability: The software is freely available at https://github.com/yhhshb/yalff

    The first year of Antarctic VLBI observations

    We are undertaking a series of geodetic VLBI observations between the Syowa Station 11-m antenna in Antarctica, and the 26-m antennas in Hobart, Tasmania, and Hartebeesthoek, South Africa. These observations are the beginning of our campaign to monitor the motion and stability of the Antarctic plate. We describe here the results of the first year's observations made during the southern summer and winter of 1998. Two mutually incompatible recording systems, K4 and S2, are used. The Mitaka FX Correlator was used to correlate these data. By using software called CALC3/MSOLV, the mean position of the antenna's geodetic reference point was found to be X = 1766194.152 ± 0.006 m, Y = 1460410.923 ± 0.005 m and Z = -5932273.329 ± 0.015 m at the epoch of 1998.9 in the International Terrestrial Reference Frame 2000 (ITRF2000) system. From a comparison with measurements made with other space geodetic techniques we estimate that our results have typical uncertainties of no more than 2 to 3 cm in each coordinate.
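
    As a rough plausibility check on the Cartesian coordinates quoted in this abstract, they can be converted to latitude and longitude. The sketch below is an illustration only: it assumes the GRS80 ellipsoid and a simple fixed-point iteration, neither of which is stated in the abstract, and `ecef_to_geodetic` is a hypothetical helper name. The result should land near the station position quoted in the Syowa geodesy entry above (69.0°S, 39.6°E).

```python
import math

# GRS80 ellipsoid constants (an assumption of this sketch).
A = 6378137.0              # semi-major axis in metres
F = 1 / 298.257222101      # flattening


def ecef_to_geodetic(x, y, z, iterations=10):
    """Convert Earth-centred Cartesian X, Y, Z (metres) to geodetic
    latitude and longitude (degrees) and ellipsoidal height (metres)."""
    e2 = F * (2 - F)                      # first eccentricity squared
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                  # distance from the rotation axis
    lat = math.atan2(z, p * (1 - e2))     # starting guess (height = 0)
    for _ in range(iterations):
        n = A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    n = A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h


if __name__ == "__main__":
    # Mean antenna reference point from the abstract (epoch 1998.9).
    lat, lon, h = ecef_to_geodetic(1766194.152, 1460410.923, -5932273.329)
    print(f"latitude  {lat:.4f} deg")
    print(f"longitude {lon:.4f} deg")
    print(f"height    {h:.1f} m")
```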

    Clinical evaluation of barium sulfate suspensions "Barytgen HD" - Second report -

    We examined the optimal suspension concentration of the mixed-particle barium sulfate preparation "Barytgen HD". Suspension stability was good at 200 w/v% and 190 w/v% but poor at 180 w/v%, which appeared unsuitable for clinical use. In the clinical evaluation, no significant difference was observed between 200 w/v% and 190 w/v% in coating, visualization of the gastric margin, or visualization of the gastric areas. The aggregation, uneven coating and bubbles that were frequent at 200 w/v% were reduced at 190 w/v%. Both 200 w/v% and 190 w/v% were rated easy to drink, with 190 w/v% tending to be easier. In our study, 190 w/v% appears to be the optimal suspension concentration for Barytgen HD.

    A study of x-rays protection in a hip-joint radiography examination

    In hip-joint radiographic examinations of young people, including infants and children, the gonads are usually protected with a lead shield or similar. Since the male gonads lie outside the body, wrapping them in a lead shield achieves this to a reasonable degree. The female gonads, however, lie within the pelvic cavity, so the lead shield is cut to a shape that covers the ovaries and the uterus without overlapping the diagnostic region, and it is placed on the abdominal wall during the exposure. Because the grid removes scattered radiation from the X-ray film, the shadow of the lead shield appears sharply in the image, and the gonads seem to be completely protected. The shield protects them almost completely from the primary X-ray beam, but it cannot protect them from scattered radiation, so considerable exposure from scatter is likely inside the body. To estimate the scattered dose under the lead shield, we measured it at various depths in a phantom while varying the shield width and the tube voltage. A considerable scattered dose was observed under the lead shield, peaking at a depth of 3 to 4 cm. The dose increased as the shield became narrower, and it was higher at 80 kV than at 60 kV. Compared with the unshielded case, it increased as depth in the phantom increased. Utmost care is therefore required in clinical practice to avoid repeat exposures caused by displacement of the lead shield.