Communication Efficient Checking of Big Data Operations

Authors: Hübschle-Schneider Lorenz, Sanders Peter
Publication date: 01/01/2018

We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis.

arXiv.org e-Print Archive

Crossref

KITopen
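
The checking idea in the abstract of "Communication Efficient Checking of Big Data Operations" can be illustrated with a small, single-process sketch of a sum-aggregation check. This is not the authors' implementation and does not use Thrill: the function names, the salted use of Python's built-in `hash`, and the parameters `num_buckets` and `rounds` are assumptions made for illustration. The input pairs and the claimed result are each condensed into a few bucket sums, and only those sums are compared, so in a distributed setting only `num_buckets` numbers per round would presumably need to be exchanged.

```python
import random


def bucket_sums(pairs, salt, num_buckets):
    """Condense (key, value) pairs into num_buckets sums using a salted hash."""
    sums = [0] * num_buckets
    for key, value in pairs:
        sums[hash((salt, key)) % num_buckets] += value
    return sums


def check_sum_aggregation(inputs, claimed, num_buckets=16, rounds=4, seed=0):
    """Return False if the claimed per-key sums are inconsistent with the inputs.

    inputs  -- iterable of (key, integer value) pairs, re-iterable (e.g. a list)
    claimed -- dict mapping each key to its claimed sum
    A True result means no inconsistency was found, not a proof of correctness.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        salt = rng.getrandbits(64)  # fresh hash function for every round
        expected = bucket_sums(inputs, salt, num_buckets)
        observed = bucket_sums(claimed.items(), salt, num_buckets)
        if expected != observed:
            return False
    return True
```

A correct result always passes, because both sides add up the same values per bucket. An error confined to a single key always changes one bucket sum and is caught; errors spread over several keys can escape a round only if they cancel inside the same bucket, which is why the sketch repeats the check with independent salts.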

Of Bogus Hunters, Queenpins and Mules: The Varied Roles of Women in Transnational Organized Crime in Southern Africa

Author: Hübschle A.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Organized crime scholars have paid scant attention to gender and to the stereotyped roles of women in the commission of organized crime activities. Traditionally, organized crime is seen as a form of criminality perpetrated by men only. Women are usually portrayed as victims of organized crime or as “mean girls”, girlfriends, wives, lovers or brides of notorious gangsters and mobsters. In the southern African context, little historical or comparative data is available on the role of women in organized crime. Existing data is basic and proceeds on the assumption of gender-neutrality or the implied male composition of organized crime groups. The link of women to organized crime is one of suffering and exploitation. However, in reality women fulfill varied roles and functions within transnational organized crime networks in the region. In some instances, they are the foot soldiers of drug and human trafficking syndicates. Sometimes they are the intermediaries or powerful matriarchs at the apex of transnational organized crime networks. Drawing on empirical findings from a regional 3-year project on organized crime trends in southern Africa, this paper examines the dynamism of the role of women in organized crime in the region and argues that women play a multifaceted role with implications for themselves, their families, society and organized crime. Gender mainstreaming within scholarly literature and policy research is in its nascent stages; this paper pleads for a more gender-sensitive approach to organized crime analysis.

MPG.PuRe

Assumptions and Reality: The securitisation of human trafficking in Southern Africa

Author: Hübschle Annette
Publication venue: Korean Institute of Criminology
Publication date: 01/01/2010

Our understanding of the concept of security has changed since the end of the Cold War. A cursory look at our daily news headlines confirms that a plethora of phenomena are phrased in security terms. The 'war on drugs' and the 'global war on terrorism' are the most obvious examples. Trafficking in persons has also been elevated to a security issue. The trend of 'securitising' non-traditional security threats has not stirred much controversy as yet. This dissertation will question why and how the issue of human trafficking has been securitised. In using the Copenhagen School's securitisation theory as an analytical framework, the dissertation will examine the international and regional (southern Africa) dimensions of the securitisation of human trafficking. The emergence of human trafficking as a social problem in public discourse will be discussed. Of principal concern are the underlying interests that propel the moral panic. Another chapter will look at global strategies aimed at combating and preventing trafficking. Before exploring the parallels between the 'Global War on Terrorism' and the dominant anti-trafficking paradigm, existing research evidence on the prevalence, scale and size of human trafficking will be scrutinised.

Cape Town University OpenUCT

Interview with Major General Johan Jooste (Retired), South African National Parks, Head of Special Projects

Author: Hübschle A.
Publication venue: Academy of Science of South Africa
Publication date: 01/01/2017

A multitude of measures, including regulatory changes, law enforcement measures and demand reduction campaigns, appear to have done little to stem the tide of organised environmental crimes. However, fewer rhinos were poached in South Africa’s signature national park, the Kruger National Park (KNP), in 2015 and 2016 than in the year before and a steady decline was evident at the time of the interview in June 2017. The KNP is home to the largest number of free roaming rhinos in the world. The park has been in the ‘eye of the storm’, losing close to 4 000 rhinos to poaching between 2006 and 2016. In 2012, the South African National Parks (SANParks) management formed a unit named Special Projects. The function of the project team was to develop and implement mitigation measures to deal with the drastic increase in wildlife crime and, in particular, rhino poaching in the KNP. Major General Johan Jooste (Ret) heads the unit. Critical voices have questioned the efficacy of the anti-poaching strategy, suggesting that park authorities are waging a ‘war on poaching’ with unintended long-term consequences for protected areas management and community relations. Scholars have argued that ‘green militarisation’ has led to an arms race between poachers and rangers and, moreover, that ‘green violence’ has led to the deployment of violent instruments and tactics in pursuit of the protection of nature, and ideas and aspirations related to nature conservation.

MPG.PuRe

Communication-Efficient Probabilistic Algorithms: Selection, Sampling, and Checking

Author: Hübschle-Schneider Lorenz
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 14/12/2020

This dissertation addresses three fundamental classes of problems in big data systems, for which we develop communication-efficient probabilistic algorithms. The first part considers various selection problems, the second part weighted sampling, and the third part the probabilistic correctness checking of basic operations in big data frameworks (checking). This work is motivated by a growing need for communication efficiency, which arises because the share attributable to the network and its use keeps growing in the acquisition cost and energy consumption of supercomputers as well as in the running time of distributed applications. Surprisingly few communication-efficient algorithms are known for fundamental big data problems. In this work, we close some of these gaps. We first consider several selection problems, beginning with the distributed version of the classical selection problem, i.e., finding the element of rank k in a large distributed input. We show how this problem can be solved communication-efficiently without assuming that the elements of the input are randomly distributed. To this end, we replace the pivot selection method in a long-known algorithm and show that this suffices. We then show that selection from locally sorted sequences (multisequence selection) can be solved considerably faster if the exact rank of the output element may vary within a certain range. We subsequently use this to construct a distributed priority queue with bulk operations, which we later use to draw weighted samples from data streams (reservoir sampling). Finally, we consider the problem of identifying the globally most frequent objects, as well as those whose associated values have the largest sums, using a sampling-based approach. The chapter on weighted sampling first presents new construction algorithms for a classical data structure for this problem, so-called alias tables. We begin with the first linear-time construction algorithm for this data structure that requires only constant additional space. We then parallelize this algorithm for shared memory and thus obtain the first parallel construction algorithm for alias tables. Next, we show how the problem can be approached for distributed systems with a two-level algorithm. We then present an output-sensitive algorithm for weighted sampling with replacement. Output-sensitive means that the running time of the algorithm depends on the number of unique elements in the output rather than on the size of the sample. This algorithm can be used sequentially as well as on shared-memory machines and distributed systems, and it is the first such algorithm in all three categories. We then adapt it to weighted sampling without replacement by combining it with an estimator for the number of unique elements in a sample drawn with replacement. Poisson sampling, a generalization of Bernoulli sampling to weighted elements, can be reduced to integer sorting, and we show how an existing approach can be parallelized. For sampling from data streams, we adapt a sequential algorithm and show how it can be parallelized in a mini-batch model using the bulk priority queue introduced in the selection chapter. The chapter concludes with an extensive evaluation of our alias table construction algorithms, our output-sensitive algorithm for weighted sampling with replacement, and our algorithm for weighted reservoir sampling. To probabilistically verify the correctness of distributed algorithms, we propose checkers for basic operations of big data frameworks. We show that checking numerous operations can be reduced to two "core" checkers, namely checking aggregations and checking whether one sequence is a permutation of another. While several approaches to the latter problem have long been known and are easy to parallelize, our sum aggregation checker is a novel application of the same data structure that underlies counting Bloom filters and the count-min sketch. We implemented both checkers in Thrill, a big data framework. Experiments with deliberately injected errors confirm the detection accuracy predicted by our theoretical analysis. This holds even when we use commonly used fast hash functions with theoretically suboptimal properties. Scaling experiments on a supercomputer show that our checkers have very low running time overhead, in the range of 2%, while all but guaranteeing the correctness of the result.

KITopen

Communication-Efficient (Weighted) Reservoir Sampling from Fully Distributed Data Streams

Authors: Hübschle-Schneider Lorenz, Sanders Peter
Publication date: 01/01/2020

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our goal is to showcase its usefulness. We present and analyze fully distributed, communication-efficient algorithms for both versions of the problem. An experimental evaluation of weighted reservoir sampling on up to 256 nodes (5120 processors) shows good speedups, while theoretical analysis promises further scaling to much larger machines. Comment: A previous version of this paper was titled "Communication-Efficient (Weighted) Reservoir Sampling

arXiv.org e-Print Archive

KITopen

Linear work generation of R-MAT graphs

Authors: Hübschle-Schneider L., Sanders P.
Publication venue: Cambridge University Press
Publication date: 09/05/2019

R-MAT (for Recursive MATrix) is a simple, widely used model for generating graphs with a power law degree distribution, a small diameter, and community structure. It is particularly attractive for generating very large graphs because edges can be generated independently by an arbitrary number of processors. However, current R-MAT generators need time logarithmic in the number of nodes to generate an edge: constant time for each bit of the node IDs of the connected nodes, generated one bit at a time. We achieve constant time per edge by precomputing pieces of node IDs of logarithmic length. Using an alias table data structure, these pieces can then be sampled in constant time. This simple technique leads to practical improvements by an order of magnitude. This further pushes the limits of attainable graph size and makes generation overhead negligible in most situations.

arXiv.org e-Print Archive

KITopen
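
The precomputation idea described in the abstract of "Linear work generation of R-MAT graphs" can be sketched as follows. This is a simplified illustration, not the authors' generator: the function names are hypothetical, the piece length `bits` is fixed by the caller rather than chosen logarithmically, and pieces are drawn with `random.choices` over cumulative weights (a binary search) instead of the alias table the paper uses. Each piece bundles `bits` bits of the source ID and `bits` bits of the target ID, so an edge needs `scale // bits` draws instead of `scale` single-bit draws.

```python
import itertools
import random


def piece_distribution(a, b, c, d, bits):
    """List every (src_piece, dst_piece) pair of `bits` bits with its probability.

    a, b, c, d are the R-MAT quadrant probabilities (they should sum to 1).
    """
    quadrant = {(0, 0): a, (0, 1): b, (1, 0): c, (1, 1): d}
    pieces, weights = [], []
    for choice in itertools.product(quadrant, repeat=bits):
        src = dst = 0
        prob = 1.0
        for src_bit, dst_bit in choice:
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
            prob *= quadrant[(src_bit, dst_bit)]
        pieces.append((src, dst))
        weights.append(prob)
    return pieces, weights


def rmat_edge(pieces, cum_weights, bits, scale, rng):
    """Draw one edge of a graph with 2**scale nodes; scale must be a multiple of bits."""
    src = dst = 0
    for _ in range(scale // bits):
        src_piece, dst_piece = rng.choices(pieces, cum_weights=cum_weights)[0]
        src = (src << bits) | src_piece
        dst = (dst << bits) | dst_piece
    return src, dst


# Illustrative quadrant probabilities (an assumption, not taken from the abstract).
pieces, weights = piece_distribution(0.57, 0.19, 0.19, 0.05, bits=2)
cum_weights = list(itertools.accumulate(weights))
edges = [rmat_edge(pieces, cum_weights, 2, 8, random.Random(i)) for i in range(5)]
```

The table has `4 ** bits` entries, so `bits` trades memory for fewer draws per edge.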

Communication-Efficient Weighted Reservoir Sampling from Fully Distributed Data Streams

Authors: Hübschle-Schneider Lorenz, Sanders Peter
Publication venue: Association for Computing Machinery
Publication date: 01/01/2020

We consider weighted random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our goal is to showcase its usefulness. We present and analyze a fully distributed, communication-efficient algorithm for weighted reservoir sampling in this model. An experimental evaluation on up to 256 nodes (5120 processors) shows good speedups, while theoretical analysis promises further scaling to much larger machines.

KITopen
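
To make the mini-batch model of "Communication-Efficient Weighted Reservoir Sampling from Fully Distributed Data Streams" concrete, here is a sequential sketch of weighted reservoir sampling without replacement. It uses the classic key-based scheme usually attributed to Efraimidis and Spirakis (each item gets the key u^(1/w) for a uniform random u, and the k largest keys are kept); it is not the distributed algorithm from the paper, and the function name and signature are assumptions for illustration.

```python
import heapq
import random


def weighted_reservoir(batches, k, rng):
    """Keep a weighted sample of up to k items from a stream of mini-batches.

    batches -- iterable of mini-batches, each an iterable of (item, weight) pairs
    rng     -- a random.Random instance
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (key, arrival index, item); the root has the smallest key
    seen = 0
    for batch in batches:
        for item, weight in batch:
            seen += 1
            if weight <= 0:
                continue  # items without positive weight are never sampled
            key = rng.random() ** (1.0 / weight)
            entry = (key, seen, item)  # arrival index breaks ties without comparing items
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif key > heap[0][0]:
                heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

For example, `weighted_reservoir([[("a", 1.0), ("b", 2.0)], [("c", 3.0)]], 2, random.Random(0))` returns two distinct items, with heavier items more likely to be kept.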

Parallel Weighted Random Sampling

Authors: Hübschle-Schneider Lorenz, Sanders Peter
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 01/01/2019

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.

KITopen
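
The alias table mentioned in the abstract of "Parallel Weighted Random Sampling" can be illustrated with the textbook sequential construction (often called Vose's method). This is a minimal sketch, not the improved or parallel algorithms from the paper; the function names are hypothetical, and it uses two work lists, so it does not have the constant extra space of the construction the authors describe.

```python
import random


def build_alias_table(weights):
    """Build an alias table for positive weights in linear time.

    Returns (prob, alias): row i yields item i with probability prob[i]
    and item alias[i] otherwise.
    """
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # the average row holds mass 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        light = small.pop()
        heavy = large.pop()
        prob[light] = scaled[light]
        alias[light] = heavy  # the heavy item fills the rest of this row
        scaled[heavy] -= 1.0 - scaled[light]
        (small if scaled[heavy] < 1.0 else large).append(heavy)
    # Rows left in either list keep prob 1.0, which absorbs rounding error.
    return prob, alias


def sample(prob, alias, rng):
    """Draw one item index in constant time: pick a row, then flip a biased coin."""
    row = rng.randrange(len(prob))
    return row if rng.random() < prob[row] else alias[row]
```

For weights `[1, 2, 3, 4]`, the table reproduces the probabilities 0.1, 0.2, 0.3 and 0.4: summing `prob[i] / n` for each row's own item and `(1 - prob[i]) / n` for its alias recovers each item's share.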