Same Difference: Detecting Collusion by Finding Unusual Shared Elements

Author: Bennett Steve
Green Pam
Lane Peter
Rainer Austen
Scholz Sven-Bodo
Publication date: 01/01/2012

Pam Green, Peter Lane, Austen Rainer, Sven-Bodo Scholz, Steve Bennett, ‘Same Difference: Detecting Collusion by Finding Unusual Shared Elements’, paper presented at the 5th International Plagiarism Conference, Sage Gateshead, Newcastle, UK, 17-18 July 2012. Many academic staff will recognise that unusual shared elements in student submissions trigger suspicion of inappropriate collusion. These elements may be odd phrases, strange constructs, peculiar layout, or spelling mistakes. In this paper we review twenty-nine approaches to source-code plagiarism detection, showing that most focus on overall file similarity rather than on unusual shared elements, and that none measures these elements directly. We describe an approach to detecting similarity between files which focuses on these unusual similarities. The approach is token-based and therefore largely language independent, and is tested on a set of student assignments, each consisting of a mix of programming languages. We also introduce a technique for visualising one document in relation to another in the context of the group. This visualisation separates code which is unique to the document, code shared by just the two files, code shared by small groups, and uninteresting areas of the file. Peer reviewed.
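The idea of flagging elements shared by only two submissions can be illustrated with a minimal sketch. This is not the authors' algorithm: the crude regex tokeniser, the bigram window, the function names `token_ngrams` and `unusual_shared`, and the sample files are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def token_ngrams(text, n=2):
    """Return the set of token n-grams from a crude word/punctuation tokenisation."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def unusual_shared(files, n=2):
    """Map each pair of file names to the n-grams found in those two files only."""
    owners = defaultdict(set)
    for name, text in files.items():
        for gram in token_ngrams(text, n):
            owners[gram].add(name)
    pairs = defaultdict(set)
    for gram, names in owners.items():
        if len(names) == 2:  # shared by exactly two files in the whole group
            pairs[tuple(sorted(names))].add(gram)
    return dict(pairs)

# Hypothetical submissions: two of them share the same misspelt comment.
files = {
    "a.py": "total = 0\nfor x in items: total += x  # sumation",
    "b.py": "total = 0\nfor y in stuff: total += y  # sumation",
    "c.py": "total = 0\nfor z in data: total += z",
}
shared = unusual_shared(files)
```

Here the boilerplate that all three files share is ignored, and only the misspelt comment common to `a.py` and `b.py` is reported for that pair.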

CiteSeerX

University of Hertfordshire Research Archive

SeqDoC: rapid SNP and mutation detection by direct comparison of DNA sequence chromatograms

Author: Crowe Mark L
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: This paper describes SeqDoC, a simple, web-based tool to carry out direct comparison of ABI sequence chromatograms. This allows the rapid identification of single nucleotide polymorphisms (SNPs) and point mutations without the need to install or learn more complicated analysis software. RESULTS: SeqDoC produces a subtracted trace showing differences between a reference and test chromatogram, and is optimised to emphasise those characteristic of single base changes. It automatically aligns sequences and produces straightforward graphical output. The use of direct comparison of the sequence chromatograms means that artefacts introduced by automatic base-calling software are avoided. Homozygous and heterozygous substitutions and insertion/deletion events are all readily identified. SeqDoC successfully highlights nucleotide changes missed by the Staden package 'tracediff' program. CONCLUSION: SeqDoC is ideal for small-scale SNP identification, for identification of changes in random mutagenesis screens, and for verification of PCR amplification fidelity. Differences are highlighted, not interpreted, allowing the investigator to make the ultimate decision on the nature of the change.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Weather persistence on sub-seasonal to seasonal timescales: a methodological review

Author: Martius Olivia
Tuel Alexandre
Publication venue: Copernicus
Publication date: 20/02/2023

Persistence is an important concept in meteorology. It refers to surface weather or the atmospheric circulation either remaining in approximately the same state (stationarity) or repeatedly occupying the same state (recurrence) over some prolonged period of time. Persistence can be found at many different timescales; however, the sub-seasonal to seasonal (S2S) timescale is especially relevant in terms of impacts and atmospheric predictability. For these reasons, S2S persistence has been attracting increasing attention from the scientific community. The dynamics responsible for persistence and their potential evolution under climate change are a notable focus of active research. However, one important challenge facing the community is how to define persistence, from both a qualitative and quantitative perspective. Despite a general agreement on the concept, many different definitions and perspectives have been proposed over the years, among which it is not always easy to find one’s way. The purpose of this review is to present and discuss existing concepts of weather persistence, associated methodologies and physical interpretations. In particular, we call attention to the fact that persistence can be defined as a global or as a local property of a system, with important implications in terms of methods but also impacts. We also highlight the importance of timescale and similarity metric selection, and illustrate some of the concepts using the example of summertime atmospheric circulation over Western Europe.

Bern Open Repository and Information System (BORIS)

The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

Author: AC de Avila
C Büchen-Osmond
C Macken
CJ Peters
CM Fauquet
CS Schmaljohn
DJ Esteban
J Ehlers
M Clamp
Mark J Gibbs
Mathieu Fourment
R Vorou
RC Edgar
SF Altschul
SF Khaiboullina
Y Bao
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Australian National University

Macquarie University ResearchOnline

Persistent topology of the reionisation bubble network. I: Formalism & Phenomenology

Author: Elbers Willem
van de Weygaert Rien
Publication venue: Oxford University Press (OUP)
Publication date: 02/12/2018

We present a new formalism for studying the topology of HII regions during the Epoch of Reionisation, based on persistent homology theory. With persistent homology, it is possible to follow the evolution of topological features over time. We introduce the notion of a persistence field as a statistical summary of persistence data and we show how these fields can be used to identify different stages of reionisation. We identify two new stages common to all bubble ionisation scenarios. Following an initial pre-overlap and subsequent overlap stage, the topology is first dominated by neutral filaments (filament stage) and then by enclosed patches of neutral hydrogen undergoing outside-in ionisation (patch stage). We study how these stages are affected by the degree of galaxy clustering. We also show how persistence fields can be used to study other properties of the ionisation topology, such as the bubble size distribution and the fractal-like topology of the largest ionised region. Comment: 18 pages, 12 figures, 1 table. Submitted to MNRAS.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Statistical investigation of the factors influencing the performance of parallel programs, with application to a study of process migration strategies

Author: Phillips Joseph
Publication venue: The University of Edinburgh
Publication date: 01/01/1994

Edinburgh Research Archive

Time domain deconvolution in nonlinear elastoplastic soil deposits

Author: Hidalgo-Leiva Diego
Mánica Malcom Miguel Ángel
Ordaz Schroeder Mario Gustavo
Pinzón Ureña Luis
Pujades Beneit Lluís
Publication venue: Elsevier
Publication date: 01/01/2023

The paper presents an iterative procedure for time domain deconvolution in nonlinear elastoplastic materials. The approach is intended for the generation of input motions for dynamic soil–structure interaction (DSSI) numerical analyses when the desired earthquake is specified at the surface of a nonlinear soil deposit. The main advantage is that the same constitutive model (or models) to be used in the DSSI simulation to characterise the soil deposit is also employed in the deconvolution procedure. Therefore, the desired surface motion is recovered from the free-field propagation of the resulting input motion at the base of the numerical model, accounting for the assumed constitutive behaviour of the ground. An application example is also presented, where the potential of the proposed approach is shown. Peer Reviewed. Postprint (published version).

Repositorio Institucional de la Universidad de Costa Rica

UPCommons. Portal del coneixement obert de la UPC

Big Chord Data Extraction and Mining

Author: Barthet M.
Dykes J.
Kachkaev A.
Plumbley M. D.
Weyde T.
Wolff D.
Publication date: 01/01/2014

Harmonic progression is one of the cornerstones of tonal music composition and is thereby essential to many musical styles and traditions. Previous studies have shown that musical genres and composers could be discriminated based on chord progressions modeled as chord n-grams. These studies were, however, conducted on small-scale datasets and using symbolic music transcriptions. In this work, we apply pattern mining techniques to over 200,000 chord progression sequences out of 1,000,000 extracted from the I Like Music (ILM) commercial music audio collection. The ILM collection spans 37 musical genres and includes pieces released between 1907 and 2013. We developed a single-program, multiple-data parallel computing approach whereby audio feature extraction tasks are split up and run simultaneously on multiple cores. An audio-based chord recognition model (Vamp plugin Chordino) was used to extract the chord progressions from the ILM set. To keep low-weight feature sets, the chord data were stored using a compact binary format. We used the CM-SPADE algorithm, which performs a vertical mining of sequential patterns using co-occurrence information, and which is fast and efficient enough to be applied to big data collections like the ILM set. In order to derive key-independent frequent patterns, the transitions between chords are modeled by changes of qualities (e.g. major, minor, etc.) and root keys (e.g. fourth, fifth, etc.). The resulting key-independent chord progression patterns vary in length (from 2 to 16) and frequency (from 2 to 19,820) across genres. As illustrated by graphs generated to represent frequent 4-chord progressions, some patterns like circle-of-fifths movements are well represented in most genres but in varying degrees. These large-scale results offer the opportunity to uncover similarities and discrepancies between sets of musical pieces and therefore to build classifiers for search and recommendation. They also support the empirical testing of music theory.
It is, however, more difficult to derive new hypotheses from such a dataset due to its size. This can be addressed by using pattern detection algorithms or suitable visualisation, which we present in a companion study.
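The key-independent modelling of chord transitions described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the `root:quality` label format, the `NOTE_TO_PITCH` table, and the function names `parse_chord` and `key_independent` are hypothetical, and the root change is expressed here in semitones.

```python
# Pitch classes for common root spellings (assumed label vocabulary).
NOTE_TO_PITCH = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

def parse_chord(label):
    """Split a label such as 'A:min' into (pitch class, quality); default quality is 'maj'."""
    root, _, quality = label.partition(":")
    return NOTE_TO_PITCH[root], quality or "maj"

def key_independent(progression):
    """Encode each chord change as (root interval in semitones, old quality, new quality)."""
    chords = [parse_chord(label) for label in progression]
    return [
        ((r2 - r1) % 12, q1, q2)
        for (r1, q1), (r2, q2) in zip(chords, chords[1:])
    ]

# The same I-IV-V-I progression in two keys yields the same transition sequence.
c_major = key_independent(["C:maj", "F:maj", "G:maj", "C:maj"])
d_major = key_independent(["D:maj", "G:maj", "A:maj", "D:maj"])
```

Because only intervals and quality changes are kept, transposed versions of a progression map to one pattern, which is what allows frequent patterns to be counted across pieces in different keys.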

City Research Online

Surrey Research Insight