6 research outputs found

    AIRR Data Under the EU Trade Secrets Directive: Aligning Scientific Practices with Commercial Realities

    Whether the E.U. Trade Secrets Directive sufficiently and appropriately covers cutting-edge complex technologies is of critical interest to policy-makers, scientists, and commercial developers alike. One such technology, adaptive immune receptor repertoire sequencing (AIRR-seq), raises difficult questions concerning what information is and should be protected under the new Directive, and how best to align scientific practices with commercial realities. The 'raw' form of AIRR-seq data, massive genetic datasets of hundreds of millions of individuals' immune cells, tends to be freely shared among academic researchers, typically destroying the protectability of the underlying information. But follow-on data, essentially information interpreting that raw data, is nonetheless protectable under the Directive because it is both economically valuable and not readily available from an examination of the raw data itself. Protecting this follow-on information while encouraging the free sharing of AIRR-seq data best accords with the purpose of the Trade Secrets Directive. Lessons from the case of AIRR-seq data also shed light on other puzzles concerning the tension between disclosure and various forms of legal protection, such as the mutual exclusivity of patents and trade secrets, the sharing of clinical trial data, and the protection of genetic diagnostics.

    A Computational Framework for Host-Pathogen Protein-Protein Interactions

    Infectious diseases cause millions of illnesses and deaths every year and raise great health concerns worldwide. How to monitor and cure infectious diseases has become a prevalent and intractable problem. Since host-pathogen interactions are considered the key infection processes of infectious diseases at the molecular level, a large body of research has focused on them to understand infection mechanisms and to develop novel therapeutic solutions. For years, the continuous development of biological technologies has benefitted wet-lab experiments, from small-scale biochemical, biophysical, and genetic experiments to large-scale methods such as yeast two-hybrid analysis and cryogenic electron microscopy. As a result of the past decades of effort, biological data have accumulated explosively, including multi-omics data such as genomics and proteomics data. Chapter 2 therefore reviews omics data, demonstrating recent developments in 'omics' studies with a particular focus on proteomics and genomics. With high-throughput technologies, the amount of 'omics' data, including genomics and proteomics, has grown even further, and the resulting surge of interest in data analytics within bioinformatics comes as no surprise to researchers from a variety of disciplines. Specifically, the astonishing rate at which genomics and proteomics data are generated leads researchers into the realm of 'Big Data' research. Chapter 2 thus provides an update on the omics background and the state-of-the-art developments in the omics area, with a focus on genomics data, from the perspective of big data analytics.

    A novel compression approach for mapped high-throughput sequencing data set

    A major challenge of current high-throughput sequencing (HTS) experiments is not only the generation of the sequencing data itself but also their processing, storage, and transmission. The enormous size of these data motivates the development of data compression algorithms usable for the implementation of the various storage policies that are applied to the produced intermediate and final result files. This thesis gives a brief introduction to the field of high-throughput nucleic acid sequencing and to current approaches for the compression of the data resulting from such experiments. In the main part of the thesis, NGC, a tool for the compression of mapped read data stored in the SAM format (one kind of HTS data), is presented. NGC enables lossless and lossy compression and introduces two novel ideas: first, a way to reduce the number of required code words by exploiting common features of the sequenced reads mapped to the same genomic positions; second, a highly configurable way to quantize per-base quality values that takes their influence on downstream analyses into account. NGC, evaluated with several real-world data sets, saves 33-66% of disk space using lossless and up to 98% of disk space using lossy compression. By applying two popular variant and genotype prediction tools to the decompressed data, we show that the lossy compression modes preserve over 99% of all called variants while outperforming comparable methods in some configurations.
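    To make the lossy mode's quality-value quantization concrete, the sketch below bins per-base Phred scores from a SAM QUAL field into a small set of representative levels, shrinking the symbol alphabet a downstream entropy coder has to encode. This is a minimal illustration in Python under assumed bin boundaries (QUALITY_BINS is hypothetical), not NGC's actual implementation.

    # Illustrative sketch of per-base quality-value quantization (not the NGC code).
    # Phred qualities are mapped to a few representative levels, which reduces the
    # number of distinct symbols (and hence the entropy) of the quality stream.

    # Hypothetical bins: (inclusive upper bound, representative quality)
    QUALITY_BINS = [(1, 1), (9, 6), (19, 15), (24, 22), (29, 27), (34, 33), (40, 37)]

    def quantize_quality(q: int) -> int:
        """Map a single Phred quality score to its bin's representative value."""
        for upper, representative in QUALITY_BINS:
            if q <= upper:
                return representative
        return QUALITY_BINS[-1][1]  # clamp scores above the last bin

    def quantize_sam_qual(qual_field: str) -> str:
        """Quantize the QUAL column of a SAM record (ASCII, Phred+33 encoding)."""
        return "".join(chr(quantize_quality(ord(c) - 33) + 33) for c in qual_field)

    if __name__ == "__main__":
        original = "IIIIHHGF@@@###"        # made-up quality string
        quantized = quantize_sam_qual(original)
        print(original, "->", quantized)   # fewer distinct symbols -> better compression

    How the bins are chosen determines both the compression gain and the effect on downstream variant calling, which is why NGC exposes the quantization of quality values as a configurable step.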