Editor's Note

Author
Publication venue: DigitalCommons@Macalester College
Publication date: 31/08/2008

Finding and tracking multi-density clusters in an online dynamic data stream

Author: Fahy Conor
Yang Shengxiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2019

The file attached to this record is the author's final peer-reviewed version. Change is one of the biggest challenges in dynamic stream mining. From a data-mining perspective, adapting to and tracking change is desirable in order to understand how and why change has occurred. Clustering, a form of unsupervised learning, can be used to identify the underlying patterns in a stream. Density-based clustering identifies clusters as areas of high density separated by areas of low density. This paper proposes a Multi-Density Stream Clustering (MDSC) algorithm to address two problems: the multi-density problem and the problem of discovering and tracking changes in a dynamic stream. MDSC consists of two on-line components: discovered, labelled clusters and an outlier buffer. Incoming points are assigned to a live cluster or passed to the outlier buffer. New clusters are discovered in the buffer using an ant-inspired swarm intelligence approach. Each newly discovered cluster is uniquely labelled and added to the set of live clusters. Processed data is subject to an ageing function and will disappear when it is no longer relevant. MDSC is shown to perform favourably compared with state-of-the-art peer stream-clustering algorithms on a range of real and synthetic data-streams. Experimental results suggest that MDSC can discover qualitatively useful patterns while being scalable and robust to noise.
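
The control flow described in this abstract (assign a point to a live cluster or send it to an outlier buffer, discover new clusters in the buffer, age out old data) can be sketched as follows. This is only an illustration under stated assumptions, not the MDSC algorithm itself: the class name `StreamClusterer`, its parameters `radius`, `min_points`, and `max_age`, and the fixed-radius density check are all hypothetical, and the density check stands in for the ant-inspired discovery step, which the abstract does not specify.

```python
import math


class StreamClusterer:
    """Toy stream clusterer: live clusters plus an outlier buffer (hypothetical)."""

    def __init__(self, radius=1.0, min_points=3, max_age=100):
        self.radius = radius          # assumed assignment/discovery distance
        self.min_points = min_points  # assumed density threshold for a new cluster
        self.max_age = max_age        # assumed lifetime of a point, in arrivals
        self.clusters = {}            # label -> list of (point, timestamp)
        self.buffer = []              # outlier buffer: list of (point, timestamp)
        self.next_label = 0
        self.clock = 0

    @staticmethod
    def _centroid(members):
        points = [p for p, _ in members]
        return tuple(sum(axis) / len(points) for axis in zip(*points))

    def insert(self, point):
        """Process one point; return its cluster label, or None if buffered."""
        self.clock += 1
        self._age()
        best, best_dist = None, self.radius
        for label, members in self.clusters.items():
            dist = math.dist(point, self._centroid(members))
            if dist <= best_dist:
                best, best_dist = label, dist
        if best is not None:
            self.clusters[best].append((point, self.clock))
            return best
        self.buffer.append((point, self.clock))
        return self._discover(point)

    def _discover(self, seed):
        # Stand-in for the ant-inspired step: a simple fixed-radius density check.
        near = [(p, t) for p, t in self.buffer if math.dist(p, seed) <= self.radius]
        if len(near) < self.min_points:
            return None
        label = self.next_label
        self.next_label += 1
        self.clusters[label] = near
        self.buffer = [(p, t) for p, t in self.buffer
                       if math.dist(p, seed) > self.radius]
        return label

    def _age(self):
        # Ageing: drop points older than max_age; drop clusters left empty.
        cutoff = self.clock - self.max_age
        self.buffer = [(p, t) for p, t in self.buffer if t > cutoff]
        for label in list(self.clusters):
            kept = [(p, t) for p, t in self.clusters[label] if t > cutoff]
            if kept:
                self.clusters[label] = kept
            else:
                del self.clusters[label]
```

Each call to `insert` returns the label of the cluster the point joined, or `None` while the point waits in the buffer. A single fixed radius does not handle multiple densities, which is the very problem the paper targets, so the sketch shows only the buffer-and-ageing structure.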

De Montfort University Open Research Archive

AsterixDB: A Scalable, Open Source BDMS

Author: Alsubaiee Sattam
Altowim Yasser
Altwaijry Hotham
Behm Alexander
Borkar Vinayak
Bu Yingyi
Carey Michael
Cetindil Inci
Cheelangi Madhusudan
Faraaz Khurram
Gabrielova Eugenia
Grover Raman
Heilbron Zachary
Kim Young-Seok
Li Chen
Li Guangqiang
Ok Ji Mahn
Onose Nicola
Pirzadeh Pouria
Tsotras Vassilis
Vernica Rares
Wen Jian
Westmann Till
Publication venue
Publication date: 02/07/2014

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.

arXiv.org e-Print Archive

CiteSeerX

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

Author: Albutiu Martina-Cezara
Kemper Alfons
Neumann Thomas
Publication venue
Publication date: 01/01/2012

Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. Comment: VLDB201
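
The idea of joining independently sorted runs without a final global merge can be sketched as below. This is a minimal single-threaded illustration under stated assumptions, not the authors' MPSM implementation: the function names `split_into_runs`, `merge_join`, and `run_based_join`, the `(key, payload)` row layout, and the `n_workers` parameter are all hypothetical, and the loop over runs only marks where parallel workers would operate.

```python
def split_into_runs(rows, n_workers):
    """Split rows into at most n_workers chunks and sort each chunk on its own."""
    chunk = max(1, -(-len(rows) // n_workers))  # ceiling division
    return [sorted(rows[i:i + chunk], key=lambda row: row[0])
            for i in range(0, len(rows), chunk)]


def merge_join(left_run, right_run):
    """Merge-join two runs of (key, payload) rows, each sorted by key."""
    out = []
    i = j = 0
    while i < len(left_run) and j < len(right_run):
        left_key, right_key = left_run[i][0], right_run[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the block of right rows sharing this key.
            j_end = j
            while j_end < len(right_run) and right_run[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that block.
            while i < len(left_run) and left_run[i][0] == left_key:
                for k in range(j, j_end):
                    out.append((left_key, left_run[i][1], right_run[k][1]))
                i += 1
            j = j_end
    return out


def run_based_join(r_rows, s_rows, n_workers=4):
    """Join R and S by merge-joining every R run with every S run.

    No global sort order is ever built; each (R run, S run) pair is
    independent work that a separate worker could process.
    """
    r_runs = split_into_runs(r_rows, n_workers)
    s_runs = split_into_runs(s_rows, n_workers)
    result = []
    for r_run in r_runs:
        for s_run in s_runs:
            result.extend(merge_join(r_run, s_run))
    return result
```

For example, `run_based_join([(1, "a"), (2, "b")], [(2, "x")], n_workers=2)` joins on the first tuple element. The output is not globally ordered, which is the trade-off for skipping the final merge.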

arXiv.org e-Print Archive

CiteSeerX

The Rhetoric of Predictability: Reclaiming the Lay Ear in Music Copyright Infringement Litigation

Author: Padgett Austin
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/12/2008

[Excerpt] “Some things cannot be described. This is the theory that recent literary criticism has placed as its cornerstone. Philosopher-critic Roland Barthes identified this trend in his Mythologies, stating that critics often “suddenly decide that the true subject of criticism is ineffable, and criticism, as a consequence, unnecessary.” Unfortunately, this view has become singular within the legal academy whenever an author discusses music copyright infringement analysis. It seems that scholars fear the thought of trusting a jury with such an “ineffable” subject as music and must propose alternatives, such as expert testimony, specialized courts, or mechanical analysis, that will diminish the ability of a jury of lay ears to determine what is or is not substantially similar. This article proposes that the simplest and best approach to music copyright infringement litigation is to accept the jury’s determination of substantial similarity in its most classic form. Part II of this paper will explore the development of the current standards that the federal courts use in music copyright infringement cases. Part III will survey scholarly reactions to these standards, detailing and categorizing the variety of proposals put forth by different authors. Part IV will describe the shortcomings and unnecessary complexity of these proposals, advocating for the simplest and original approach put forth by the courts in Part II.”

UNH Scholars' Repository

The nested structure of urban business clusters

Author: Arcaute Elsa
Cottineau Clémentine
Publication venue
Publication date: 13/05/2019

Although the cluster theory literature is bountiful in economics and regional science, there is still a lack of understanding of how the geographical scales of analysis (neighbourhood, city, region) relate to one another and impact the observed phenomenon, and to what extent the clusters are industrially bound or geographically consistent. In this paper, we cluster spatial economic activities through a multi-scalar approach following percolation theory. We consider both the industrial similarity and the geographical proximity of firms, through their joint probability function, which is constructed as a copula. This gives rise to an emergent nested hierarchy of geoindustrial clusters, which enables us to analyse the relationships between the different scales and specific industrial sectors. Using longitudinal business microdata from the Office for National Statistics, we look at the evolution of clusters, which span from very local groups of businesses to the metropolitan level, in 2007 and in 2014, so that the changes stemming from the financial crisis can be observed. Comment: 20 pages, 10 figures

arXiv.org e-Print Archive

UCL Discovery