20 research outputs found

Summary of the 1st ICSSP-ICGSE Joint Event

Author: Britto Ricardo
Clarke Paul
Huang Liguo
Raffo David
Steinmacher Igor
Tell Paolo
Tuzun Eray
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2020

The IT University of Copenhagen's Repository

Population Stratification of a Common APOBEC Gene Deletion Polymorphism

Author: Eray Tuzun
Evan E Eichler
Gilean A. T McVean
Jeffrey M Kidd
Rajinder Kaul
Tera L Newman
Publication venue: Public Library of Science
Publication date: 01/04/2007

The APOBEC3 gene family plays a role in innate cellular immunity inhibiting retroviral infection, hepatitis B virus propagation, and the retrotransposition of endogenous elements. We present a detailed sequence and population genetic analysis of a 29.5-kb common human deletion polymorphism that removes the APOBEC3B gene. We developed a PCR-based genotyping assay, characterized 1,277 human diversity samples, and found that the frequency of the deletion allele varies significantly among major continental groups (global F_ST = 0.2843). The deletion is rare in Africans and Europeans (frequency of 0.9% and 6%), more common in East Asians and Amerindians (36.9% and 57.7%), and almost fixed in Oceanic populations (92.9%). Despite a worldwide frequency of 22.5%, analysis of data from the International HapMap Project reveals that no single existing tag single nucleotide polymorphism may serve as a surrogate for the deletion variant, emphasizing that without careful analysis its phenotypic impact may be overlooked in association studies. Application of haplotype-based tests for selection revealed potential pitfalls in the direct application of existing methods to the analysis of genomic structural variation. These data emphasize the importance of directly genotyping structural variation in association studies and of accurately resolving variant breakpoints before proceeding with more detailed population-genetic analysis

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Non-alignment comparison of human and high primate genomes

Author: Alkan Can
Bailey Jeffrey A.
Eichler Evan E.
Green Eric D.
Liu Ge
Program NISC Comparative Sequencing
Sahinalp S. Cenk
Tuzun Eray
Zhao Shaying
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/03/2003

Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in the comparisons of whole-genome sequences, a good similarity was also found after masking gene sequences, indicating that CS analysis manages to reveal phylogenetic signal in the organization of the noncoding part of the genome sequences, including repetitive DNA and the genome "dark matter". Obviously, the possibility to reveal parallel ordering depends on the signal of common ancestor sequence organization varying locally along the corresponding segments of the compared genomes. We explored two sources contributing to this signal: sequence composition (GC content) and sequence organization (abundances of k-mers in the usual A,T,G,C or purine-pyrimidine alphabets). Whole-genome comparisons based on GC distribution along the analyzed sequences indeed give reasonable results, but combining it with k-mer abundances dramatically improves the ordering quality, indicating that compositional and organizational heterogeneity comprise complementary sources of information on evolutionary conserved similarity of genome sequences

arXiv.org e-Print Archive

Crossref

PubMed Central
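
The abstract of the entry above names two signals, GC content and k-mer abundances, without giving the scoring details. The following is a minimal sketch of how such signals could be computed for a DNA segment. The function names, the default k, and the Euclidean distance are illustrative assumptions, not the method used in the paper.

```python
from collections import Counter
from itertools import product
from typing import List
import math


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_spectrum(seq: str, k: int = 3) -> List[float]:
    """Relative abundance of every k-mer over the A, C, G, T alphabet.

    K-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def spectrum_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two spectra (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical toy segments, for illustration only.
    seg_a = "ACGTACGTGGCCAATT"
    seg_b = "ACGTTCGTGGCCTATT"
    print(gc_content(seg_a), gc_content(seg_b))
    print(spectrum_distance(kmer_spectrum(seg_a), kmer_spectrum(seg_b)))
```

Computing such a distance for every pair of windows from two genomes would yield the kind of similarity matrix a dot-plot visualizes; window size and metric would need to be chosen for the data at hand.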

A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Author: Aghamohammadi Alireza
Ahmadabadi Matin Nili
Aktas Ethem Utku
Alam Omar
Albrecht Ella
Aldaeej Abdullah
Amit Idan
Bossenmaier Tim
Chahal Kuljit Kaur
Chakroborti Debasish
Colomo-Palacios Ricardo
Davis James
Davis Willard
Eismann Simon
Erbel Johannes
Fard Fatemeh
Ghaleb Taher Ahmed
Henley Austin Z.
Herbold Steffen
Hoy Nathaniel
Kourtzanidis Stratos
Ledel Benjamin
Lenarduzzi Valentina
Madeja Matej
Makedonski Philip
Malavolta Ivano
Marcilio Diego
Nagaria Bhaveet
Pashchenko Ivan
Qin Yihao
Rodríguez-Pérez Gema
Serebrenik Alexander
Shamasbi Simin Maleki
Singh Paramvir
Spieker Helge
Strüber Daniel
Sulir Matus
Szabados Kristof
Trautsch Alexander
Treude Christoph
Turhan Burak
Tuzun Eray
Verdecchia Roberto
Walunj Vijay
Wang Shangwen
Wickert Anna-Katharina
Wu Hongjun
Wyrich Marvin
Publication date: 01/01/2021

Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise. Comment: Status: Accepted at Empirical Software Engineering

arXiv.org e-Print Archive

University of Oulu Repository - Jultika

Monash University Research Portal
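
The consensus rule stated in the abstract of the entry above (four labels per line, consensus when at least three agree) can be sketched as follows. This is only an illustration of that rule: the function name and the label names are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agree: int = 3) -> Optional[str]:
    """Return the label given by at least `min_agree` participants, else None."""
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


if __name__ == "__main__":
    # Hypothetical label names, for illustration only.
    votes_per_line = {
        "line 1": ["bugfix", "bugfix", "bugfix", "refactoring"],
        "line 2": ["bugfix", "test", "refactoring", "bugfix"],
    }
    for line, votes in votes_per_line.items():
        print(line, consensus_label(votes))
```

With four participants and a threshold of three, at most one label can reach consensus, so ties at the top of the count never produce an ambiguous result; lines that return None correspond to the cases without consensus.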

Erratum: Corrigendum: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution

Author: Abril Josep F
Aerts Jan
Andersson Leif
Antonarakis Stylianos E
Arakawa Hiroshi
Axelsson Erik
Backström Niclas
Bao Zhirong
Beattie Kevin J
Bejerano Gill
Berlin Sofia
Betran Esther
Bezzubov Yuri
Birney Ewan
Boardman Paul E
Bonfield James K
Bork Peer
Bourque Guillaume
Brandstrom Mikael
Brent Michael R
Brown William R A
Buerstedde Jean-Marie
Burt Peer
Bye Jacqueline M
Caldwell Randolph B
Camara Francisco
Castellano Sergi
Castelo Robert
Chatterjee Sourav
Cheng Ze
Chiaromonte Francesca
Chinwalla Asif
Cliften Paul F
Clifton Sandra W
Croning Michael D R
Crooijmans Richard P M A
Daniels Laura
Davies Robert M
de Jong Pieter J
Delany Mary E
Delehaunty Kimberly D
Dewey Colin
Dickens Nicholas J
Dodgson Jerry B
Dupanloup Isabelle
Eichler Evan
Ellegren Hans
Elnitski Laura
Emerson J J
Eswara Pallavi
Eyras Eduardo
Fillon Valerie
Flicek Paul
Francis Matthew D
Fronick Catrina
Fulton Lucinda A.
Fulton Robert S
Furey Terrence S
Goodstadt Leo
Gordon Laurie
Grafham Darren V
Graves Tina A
Griffin Darren K.
Griffiths-Jones Sam
Groenen Martien A M
Guigo Roderic
Hardison Ross C
Harris Robert S
Harte Rachel A
Hatzigeorgiou Artemis G
Haussler David
He Jianbin
Hillier LaDeana W
Hinrichs Angie S
Hoffman Michael M
Hubbard Simon J
Huckle Elizabeth J
Humphray Sean J
Inoko Hidetoshi
International Chicken Genome Sequencing Consortium
Ivarie Robert
Jacobbson Lina
Kaessmann Henrik
Kaufman Jim
Kent W James
Kerje Susanne
Kierzek Andrzej M
King David C
Koriabine Maxim
Kouranov Andrei
Kremitzki Colin
Law Andy S
Layman Dan
Letunic Ivica
Liu Bin
Long Manyuan
Lucas Susan
Magrini Vincent
Makova Kateryna
Mardis Elaine R.
Masabanda Julio S
McLaren Stuart R
McPherson John D
Miller Marcia M
Miller Webb
Miner Tracie L
Minx Patrick
Morrice David
Mourelatos Zissimos
Nash William E
Nefedov Mikhail
Nekrutenko Anton
Nelson Joanne O
Nhan Michael N
Oddy Lachlan G
Ovcharenko Ivan
Overton Ian M
Pachter Lior
Parra Genis
Paterson Andrew H
Paton Bob
Pevzner Pavel A.
Pohl Craig S
Ponting Chris P
Pourquie Olivier
Radakrishnan Anusha
Randall-Maher Jennifer
Raney Brian
Reymond Alexandre
Rijnkels Monique
Robertson Lindsay
Rogers Jane
Romanov Michael N
Rondelli Catherine M
Salomonsen Jan
Scott Carol E
Searle Stephen M J
Severin Jessica
Shiina Takashi
Shteynberg David D
Siepel Adam
Skjoedt Karsten
Smit Arian F. A.
Smith Jacqueline
Smith Scott M
Speed David
Stubbs Lisa
Suyama Mikita
Taylor James
Taylor Ruth G
Tempest Helen G
Tesler Glenn
Tickle Cheryll
Torrents David
Tuzun Eray
Tyekucheva Svitlana
Ucla Catherine
Ureta-Vidal Abe
van der Poel Jan J
Vignal Alain
von Mering Christian
Waddington David
Wallis John W
Wang Jian
Wang Jun
Warren Wesley
Webber Caleb
Webster Matthew T
Wilson Richard K.
Wilson Stuart A
Wong Gane Ka-Shu
Yang Huanming
Yang Shan
Yang Shiaw-Pyng
Yu Jun
Zdobnov Evgeny M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/02/2005

International Chicken Genome Sequencing Consortium. The Original Article was published on 09 December 2004. Nature 432, 695–716 (2004). In Table 5 of this Article, the last four values listed in the ‘Copy number’ column were incorrect. These should be: LTR elements, 30,000; DNA transposons, 20,000; simple repeats, 140,000; and satellites, 4,000. These errors do not affect any of the conclusions in our paper. Additional information. The online version of the original article can be found at 10.1038/nature0315

Kent Academic Repository

Understanding the knowledge gaps of software engineers: An empirical analysis based on SWEBOK

Author: Garousi Vahid
Giray Gorkem
Tuzun Eray
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Context: Knowledge level and productivity of the software engineering (SE) workforce are the subject of regular discussions among practitioners, educators, and researchers. There have been many efforts to measure and improve the knowledge gap between SE education and industrial needs. Objective: Although the existing efforts for aligning SE education and industrial needs have provided valuable insights, there is a need for analyzing the SE topics in a more “fine-grained” manner; i.e., knowing that SE university graduates should know more about requirements engineering is important, but it is more valuable to know the exact topics of requirements engineering that are most important in the industry. Method: We achieve the above objective by assessing the knowledge gaps of software engineers by designing and executing an opinion survey on levels of knowledge learned in universities versus skills needed in industry. We designed the survey by using the SE knowledge areas (KAs) from the latest version of the Software Engineering Body of Knowledge (SWEBOK v3), which classifies the SE knowledge into 12 KAs, which are themselves broken down into 67 subareas (sub-KAs) in total. Our analysis is based on (opinion) data gathered from 129 practitioners, who are mostly based in Turkey. Results: Based on our findings, we recommend that educators should include more materials on software maintenance, software configuration management, and testing in their SE curriculum. Based on the literature as well as the current trends in industry, we provide actionable suggestions to improve SE curriculum to decrease the knowledge gap

Queen's University Belfast Research Portal

Bilkent University Institutional Repository

Recent Segmental Duplications in the Working Draft Assembly of the Brown Norway Rat

Author: Bailey Jeffrey A.
Eichler Evan E.
Tuzun Eray
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/04/2004

We assessed the content, structure, and distribution of segmental duplications (≥90% sequence identity, ≥5 kb length) within the published version of the Rattus norvegicus genome assembly (v.3.1). The overall fraction of duplicated sequence within the rat assembly (2.92%) is greater than that of the mouse (1%–1.2%) but significantly less than that of human (∼5%). Duplications were nonuniformly distributed, occurring predominantly as tandem and tightly clustered intrachromosomal duplications. Regions containing extensive interchromosomal duplications were observed, particularly within subtelomeric and pericentromeric regions. We identified 41 discrete genomic regions greater than 1 Mb in size, termed “duplication blocks.” These appear to have been the target of extensive duplication over millions of years of evolution. Gene content within duplicated regions (∼1%) was lower than expected based on the genome representation. Interestingly, sequence contigs lacking chromosome assignment (“the unplaced chromosome”) showed a marked enrichment for segmental duplication (45% of 75.2 Mb), indicating that segmental duplications have been problematic for sequence and assembly of the rat genome. Further targeted efforts are required to resolve the organization and complexity of these regions

Crossref

PubMed Central

Closing the Gap Between Software Engineering Education and Industrial Needs

Author: Catal Cagatay
Felderer Michael
Garousi Vahid
Giray Görkem
Tuzun Eray
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

According to different reports, many recent software engineering graduates often face difficulties when beginning their professional careers, due to misalignment of the skills learnt in their university education with what is needed in industry. To address that need, many studies have been conducted to align software engineering education with industry needs. To synthesize that body of knowledge, we present in this paper a systematic literature review (SLR) which summarizes the findings of 33 studies in this area. By doing a meta-analysis of all those studies and using data from 12 countries and over 4,000 data points, this study will enable educators and hiring managers to adapt their education / hiring efforts to best prepare the software engineering workforce.

Queen's University Belfast Research Portal

Bilkent University Institutional Repository

Wageningen University & Research Publications

Using Continuous Integration and Automated Test Techniques for a Robust C4ISR System

Author: Baykal Buyurman
Biyikli Emrah
Gelirli Erdogan
Tuzun Eray
Yuksel H. Mehmet
Publication date: 16/09/2009

We have used CI (Continuous Integration) and various software testing techniques to achieve a robust C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance) multi-platform system. Because of rapid changes in the C4ISR domain and in the software technology, frequent critical design adjustments and in turn vast code modifications or additions become inevitable. Defect fixes might also incur code changes. These unavoidable code modifications may put the reliability of a mission-critical system at great risk. Also, in order to stay competitive in the C4ISR market, a company must make recurring releases without sacrificing quality. We have designed and implemented an XML-driven automated test framework that enabled us to develop numerous high-quality tests rapidly. While using CI with automated software test techniques, we have aimed at speeding up the delivery of high-quality and robust software by decreasing the integration procedure, which is one of the main bottleneck points in the industry. This work describes how we have used CI and software test techniques in a large-scale, multi-platform, multi-language, distributed C4ISR project and what the benefits of such a system are

OpenMETU (Middle East Technical University)