7 research outputs found

Privacy-preserving document similarity detection

Author: Khelik Ksenia
Publication venue: Universitetet i Agder / University of Agder
Publication date: 01/01/2011

Document similarity detection is an important technique used in many applications. A tool that guarantees the privacy of documents during comparison would expand the range of settings in which this technique can be applied. The goal of this project is to develop a method for privacy-preserving document similarity detection capable of identifying documents that are either semantically or syntactically similar. As a result, two methods were designed, implemented, and evaluated. The first method applies a privacy-preserving data comparison protocol for secure comparison; this original protocol was created as part of this thesis. The second method uses a modified private-matching scheme. Both methods use natural language processing techniques to capture the semantic relations between documents. During the testing phase, the first method was found to be too slow for practical application. The second method, by contrast, was rather fast and effective. It can be used to build a tool for detecting syntactic and semantic similarity in a privacy-preserving way.

NORA - Norwegian Open Research Archives

Agder University Research Archive

Improved reference genome uncovers novel sex-linked regions in the Guppy (Poecilia reticulata)

Author: Alkan
Altschul
Andrews
Ankenbrand
Audoux
Bachtrog
Bankevich
Bergero
Bissegger
Bonnie A Fraser
Buchfink
Camacho
Cameron J Weadick
Chalopin
Chang Liu
Charlesworth
Christine Dreyer
Cotter
Danecek
Deborah Charlesworth
Dechaud
Detlef Weigel
Dor
Endler
Faber-Hammond
Felix Bemm
Fraser
Fry
Fu
García-Alcalde
Garrison
Girgis
Gordon
Gurevich
Haskins
Hatakeyama
Herpin
Hoff
Huerta-Cepas
Hughes
James R Whiting
Josephine R Paris
Kasimatis
Khelik
Kim
Kobayashi
Kondo
Krueger
Krzywinski
Kurtz
Künstner
Lajoie
Li
Lieberman-Aiden
Lindholm
Lisachov
Liu
Lomsadze
Lubieniecki
Ma
Magurran
Margarete Hoffmann
Maria Costantini
Martin
Marçais
McKenna
Mitchell
Morris
Nanda
Nei
Okonechnikov
Olendorf
O’Neil
Paul J Parsons
Pfeifer
Ponnikas
Purcell
Ramírez
Rice
Roberta Bergero
Schartl
Schulz
Seppey
Sharma
Smit
Song
Stanke
Tomaszkiewicz
Tripathi
Verena A Kottler
Volff
Willing
Winge
Wright
Zhou
Publication venue: Oxford University Press (OUP)
Publication date: 03/09/2020

This is the author accepted manuscript. The final version is available on open access from Oxford University Press via the DOI in this record.
Data availability: Population genomics data are available on ENA: Study PRJEB10680. PCR-free data are available on ENA: Study PRJEB36450. The genome assembly is available on ENA ID: PRJEB36704; ERP119926. All scripts and pipelines are available on GitHub: https://github.com/bfrasercommits/guppy_genome
Theory predicts that the sexes can achieve greater fitness if loci with sexually antagonistic polymorphisms become linked to the sex determining loci, and this can favour the spread of reduced recombination around sex determining regions. Given that sex-linked regions are frequently repetitive and highly heterozygous, few complete Y chromosome assemblies are available to test these ideas. The guppy system (Poecilia reticulata) has long been invoked as an example of sex chromosome formation resulting from sexual conflict. Early genetics studies revealed that male colour patterning genes are mostly but not entirely Y-linked, and that X-linkage may be most common in low predation populations. More recent population genomic studies of guppies have reached varying conclusions about the size and placement of the Y-linked region. However, this previous work used a reference genome assembled from short-read sequences from a female guppy. Here, we present a new guppy reference genome assembly from a male, using long-read PacBio single-molecule real-time sequencing (SMRT) and chromosome contact information. Our new assembly sequences across repeat- and GC-rich regions and thus closes gaps and corrects mis-assemblies found in the short-read female-derived guppy genome. Using this improved reference genome, we then employed broad population sampling to detect sex differences across the genome. We identified two small regions that showed consistent male-specific signals. Moreover, our results help reconcile the contradictory conclusions put forth by past population genomic studies of the guppy sex chromosome. Our results are consistent with a small Y-specific region and rare recombination in male guppies.
Funding: Max Planck Society; European Research Council (ERC); Natural Environment Research Council (NERC)

Crossref

Edinburgh Research Explorer

Open Research Exeter

MPG.PuRe

NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

Author: Khelik Ksenia
Lagesen Karin
Nederbragt Alexander J
Rognes Torbjørn
Sandve Geir K
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Background: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration.
Results: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff.
Conclusions: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.

Directory of Open Access Journals

NORA - Norwegian Open Research Archives

Additional file 1: of NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

Author: Alexander Nederbragt (3664882)
Geir Sandve (4034936)
Karin Lagesen (234312)
Ksenia Khelik (4244317)
Torbjørn Rognes (4244320)

Figure S1. Reference fragments placement order depending on query fragment orientations during detection of local differences. Figure S2. Circular genome alignment alternatives. Figure S3. Number of differences in each category obtained by NucDiff with the default parameter settings for all assemblers. Figure S4. Comparison of multiple assemblies against one reference using NucDiff. Figure S5. Examples of detection of long deletions located in all assemblies at the same place in the reference sequence. Table S1. Alignment fragmentation cases caused by simple differences. Table S2. Genome modifications implemented during the simulation process. Table S3. List of E. coli genomes used in the "Comparison of genomes from different strains of the same species" section. Table S4. Parameter values used for each parameter settings. Table S5. Correspondence between the QUAST difference types and the simulated difference types. Table S6. Correspondence between the QUAST, dnadiff and NucDiff difference types and the expected difference types. (PDF 989 kb)

FigShare

NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

Author: A Gurevich
Alexander Johan Nederbragt
AV Zimin
DR Zerbino
Geir Kjetil Sandve
H Thorvaldsdóttir
JH Choi
JR Miller
JT Robinson
JT Simpson
Karin Lagesen
Ksenia Khelik
M Blanchette
NA Belal
R Engels
S Kurtz
T Magoc
Torbjørn Rognes
Publication venue: Springer Science and Business Media LLC

Crossref

Clinicopathologic, hemodynamic, and echocardiographic effects of short-term oral administration of anti-inflammatory doses of prednisolone to systemically normal cats

Author: Belanger MC
Darren J. Berger
Imal A. Khelik
Jean-Sébastien Palerme
Jessica L. Ward
Jonathan P. Mochel
Lombard CW
Middleton DJ
Nakamoto H
Ortega TM
Plumb DC
Reece W
Smith SA
Stockham SL
Suzuki H
Wendy A. Ware
Yeon-Jung Seo
Publication venue: American Veterinary Medical Association (AVMA)

Crossref

High-throughput long paired-end sequencing of a Fosmid library by PacBio

Author: A Adey
AC Mak
AM Wenger
AR Quinlan
AS Mikheyev
BJ Clavijo
C Camacho
CCYR Wu
CL Peichel
CM Carvalho
D Bovee
D Deamer
DE Jarvis
E Karlsson
E Tuzun
ES Lander
EW Loomis
F Sanger
FH Lu
G Marcais
GQ Zhang
H Bayley
H Chen
H Jo
H Li
H Lin
H Shizuya
HQ Ling
I Maccallum
International Human Genome Sequencing C
J Eid
J Safar
J Shendure
J Zhang
Jiadong Li
JL Bennetzen
JM Belton
JM Kidd
JN Burton
JS Baxter
K Khelik
KF Au
KHY Wong
KJ Travers
L Clarke
LE Vissers
LJ Williams
M Boetzer
M Luo
M Rosa-Garrido
MD Adams
ME Talkowski
Meizhong Luo
MJ Levene
N Chen
NM Springer
O Dudchenko
O Wang
OA Hampton
Zirui Dong
Q Liu
R Avni
RH Waterston
S Guo
S Koren
SH Johnson
Sha Tang
Tong Li
TP Michael
UJ Kim
W Chen
W Shen
X Shi
X Wang
X Wei
Xianmin Diao
Y Jiao
Y Pan
Y Yang
Yonglong Pan
Z Dong
Z Zhang
ZG Wei
Zhaozhao Dai
Zhifei Han
Publication venue: Springer Science and Business Media LLC

Crossref