5,930 research outputs found
Same Difference: Detecting Collusion by Finding Unusual Shared Elements
Pam Green, Peter Lane, Austen Rainer, Sven-Bodo Scholz, Steve Bennett, "Same Difference: Detecting Collusion by Finding Unusual Shared Elements", paper presented at the 5th International Plagiarism Conference, Sage Gateshead, Newcastle, UK, 17-18 July, 2012. Many academic staff will recognise that unusual shared elements in student submissions trigger suspicion of inappropriate collusion. These elements may be odd phrases, strange constructs, peculiar layout, or spelling mistakes. In this paper we review twenty-nine approaches to source-code plagiarism detection, showing that the majority focus on overall file similarity, and not on unusual shared elements, and that none directly measure these elements. We describe an approach to detecting similarity between files which focuses on these unusual similarities. The approach is token-based and therefore largely language independent, and is tested on a set of student assignments, each one consisting of a mix of programming languages. We also introduce a technique for visualising one document in relation to another in the context of the group. This visualisation separates code which is unique to the document, that shared by just the two files, code shared by small groups, and uninteresting areas of the file. Peer reviewed
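The idea of scoring pairs by rare shared token sequences, rather than by overall similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the n-gram length, the `max_group` cut-off and all function names are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    # Crude, language-agnostic tokenizer (an assumption): identifiers,
    # digit runs, or any single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def ngrams(tokens, n):
    # Set of contiguous token n-grams in one document.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def unusual_shared(docs, n=3, max_group=2):
    """Map each document pair to the n-grams that occur in at least two
    and at most max_group documents of the whole collection."""
    owners = defaultdict(set)
    for name, text in docs.items():
        for gram in ngrams(tokenize(text), n):
            owners[gram].add(name)

    pairs = defaultdict(set)
    for gram, names in owners.items():
        # N-grams found in many submissions are treated as uninteresting.
        if 2 <= len(names) <= max_group:
            for a, b in combinations(sorted(names), 2):
                pairs[(a, b)].add(gram)
    return dict(pairs)
```

With `max_group=2`, a misspelt identifier that appears in exactly two submissions is reported for that pair, while boilerplate shared by most of the class is ignored.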
SeqDoC: rapid SNP and mutation detection by direct comparison of DNA sequence chromatograms
BACKGROUND: This paper describes SeqDoC, a simple, web-based tool to carry out direct comparison of ABI sequence chromatograms. This allows the rapid identification of single nucleotide polymorphisms (SNPs) and point mutations without the need to install or learn more complicated analysis software. RESULTS: SeqDoC produces a subtracted trace showing differences between a reference and test chromatogram, and is optimised to emphasise those characteristic of single base changes. It automatically aligns sequences, and produces straightforward graphical output. The use of direct comparison of the sequence chromatograms means that artefacts introduced by automatic base-calling software are avoided. Homozygous and heterozygous substitutions and insertion/deletion events are all readily identified. SeqDoC successfully highlights nucleotide changes missed by the Staden package 'tracediff' program. CONCLUSION: SeqDoC is ideal for small-scale SNP identification, for identification of changes in random mutagenesis screens, and for verification of PCR amplification fidelity. Differences are highlighted, not interpreted, allowing the investigator to make the ultimate decision on the nature of the change
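A subtracted trace of the kind described above can be sketched as follows. This is only an illustration under stated assumptions, not SeqDoC's actual algorithm: the traces are assumed to be already aligned and normalised, the dictionary layout is hypothetical, and the "one channel rises while another falls" test is just one plausible heuristic for spotting a single base change.

```python
def subtract_traces(reference, test):
    """Per-channel difference (test minus reference) for two aligned traces.

    Each trace is a dict mapping a base letter to a list of intensities;
    this layout is an assumption made for the sketch.
    """
    return {
        base: [t - r for r, t in zip(reference[base], test[base])]
        for base in "ACGT"
    }


def candidate_substitutions(diff, threshold):
    """Positions where one channel gains and another loses at least
    `threshold` intensity, the pattern expected for a base substitution."""
    hits = []
    for i in range(len(diff["A"])):
        values = [diff[base][i] for base in "ACGT"]
        if max(values) >= threshold and min(values) <= -threshold:
            hits.append(i)
    return hits
```

The flagged positions are only candidates; as the abstract notes, interpreting the change is left to the investigator.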
Weather persistence on sub-seasonal to seasonal timescales: a methodological review
Persistence is an important concept in meteorology. It refers to surface weather or the atmospheric circulation either remaining in approximately the same state (stationarity) or repeatedly occupying the same state (recurrence) over some prolonged period of time. Persistence can be found at many different timescales; however, the sub-seasonal to seasonal (S2S) timescale is especially relevant in terms of impacts and atmospheric predictability. For these reasons, S2S persistence has been attracting increasing attention from the scientific community. The dynamics responsible for persistence and their potential evolution under climate change are a notable focus of active research. However, one important challenge facing the community is how to define persistence, from both a qualitative and quantitative perspective. Despite a general agreement on the concept, many different definitions and perspectives have been proposed over the years, among which it is not always easy to find one's way. The purpose of this review is to present and discuss existing concepts of weather persistence, associated methodologies and physical interpretations. In particular, we call attention to the fact that persistence can be defined as a global or as a local property of a system, with important implications in terms of methods but also impacts. We also highlight the importance of timescale and similarity metric selection, and illustrate some of the concepts using the example of summertime atmospheric circulation over Western Europe
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences
Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
Persistent topology of the reionisation bubble network. I: Formalism & Phenomenology
We present a new formalism for studying the topology of HII regions during
the Epoch of Reionisation, based on persistent homology theory. With persistent
homology, it is possible to follow the evolution of topological features over
time. We introduce the notion of a persistence field as a statistical summary
of persistence data and we show how these fields can be used to identify
different stages of reionisation. We identify two new stages common to all
bubble ionisation scenarios. Following an initial pre-overlap and subsequent
overlap stage, the topology is first dominated by neutral filaments (filament
stage) and then by enclosed patches of neutral hydrogen undergoing outside-in
ionisation (patch stage). We study how these stages are affected by the degree
of galaxy clustering. We also show how persistence fields can be used to study
other properties of the ionisation topology, such as the bubble size
distribution and the fractal-like topology of the largest ionised region. Comment: 18 pages, 12 figures, 1 table. Submitted to MNRAS
Time domain deconvolution in nonlinear elastoplastic soil deposits
The paper presents an iterative procedure for the time domain deconvolution in nonlinear elastoplastic materials. The approach is intended for the generation of input motions for dynamic soil-structure interaction (DSSI) numerical analyses when the desired earthquake is specified at the surface of a nonlinear soil deposit. The main advantage is that the same constitutive model (or models) to be used in the DSSI simulation to characterise the soil deposit is also employed in the deconvolution procedure. Therefore, the desired surface motion is recovered from the free-field propagation of the resulting input motion at the base of the numerical model, accounting for the assumed constitutive behaviour of the ground. An application example is also presented, where the potential of the proposed approach is shown. Peer Reviewed. Postprint (published version)
Big Chord Data Extraction and Mining
Harmonic progression is one of the cornerstones of tonal music composition and is thereby essential to many musical styles and traditions. Previous studies have shown that musical genres and composers could be discriminated based on chord progressions modeled as chord n-grams. These studies were however conducted on small-scale datasets and using symbolic music transcriptions.
In this work, we apply pattern mining techniques to over 200,000 chord progression sequences out of 1,000,000 extracted from the I Like Music (ILM) commercial music audio collection. The ILM collection spans 37 musical genres and includes pieces released between 1907 and 2013. We developed a single program multiple data parallel computing approach whereby audio feature extraction tasks are split up and run simultaneously on multiple cores. An audio-based chord recognition model (Vamp plugin Chordino) was used to extract the chord progressions from the ILM set. To keep the feature sets lightweight, the chord data were stored using a compact binary format. We used the CM-SPADE algorithm, which performs a vertical mining of sequential patterns using co-occurrence information, and which is fast and efficient enough to be applied to big data collections like the ILM set. In order to derive key-independent frequent patterns, the transitions between chords are modeled by changes of qualities (e.g. major, minor, etc.) and root keys (e.g. fourth, fifth, etc.). The resulting key-independent chord progression patterns vary in length (from 2 to 16) and frequency (from 2 to 19,820) across genres. As illustrated by graphs generated to represent frequent 4-chord progressions, some patterns like circle-of-fifths movements are well represented in most genres but to varying degrees.
These large-scale results offer the opportunity to uncover similarities and discrepancies between sets of musical pieces and therefore to build classifiers for search and recommendation. They also support the empirical testing of music theory. It is however more difficult to derive new hypotheses from such a dataset due to its size. This can be addressed by using pattern detection algorithms or suitable visualisation, which we present in a companion study.
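The key-independent encoding described in this abstract can be sketched in a few lines. The representation below is an assumption: each chord is a hypothetical `(root, quality)` pair, and a transition is stored as the root movement in semitones together with the two qualities. The counting step is a plain tally of contiguous transition n-grams, which stands in for CM-SPADE and is not that algorithm.

```python
from collections import Counter

# Pitch classes for sharp-spelled roots only (a simplification).
PITCH_CLASS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def transitions(chords):
    """Turn [(root, quality), ...] into key-independent steps:
    (root movement in semitones, quality before, quality after)."""
    return [
        ((PITCH_CLASS[r2] - PITCH_CLASS[r1]) % 12, q1, q2)
        for (r1, q1), (r2, q2) in zip(chords, chords[1:])
    ]


def count_patterns(sequences, length):
    """Count contiguous runs of `length` transitions across all sequences."""
    counts = Counter()
    for chords in sequences:
        steps = transitions(chords)
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    return counts
```

Because only root movements and qualities are kept, the progression C-F-G and its transposition D-G-A map to the same pattern and are counted together.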