3,492 research outputs found
On utilising change over time in data mining
Magdeburg, Univ., Fak. für Informatik, Diss., 2013, by Mirko Böttche
Exploiting incomparability in solution dominance : improving general purpose constraint-based mining
In data mining, finding interesting patterns is a challenging task. Constraint-based mining is a well-known approach to this, and one for which constraint programming has been shown to be a well-suited and generic framework. Constraint dominance programming (CDP) has been proposed as an extension that can capture an even wider class of constraint-based mining problems, by allowing us to compare relations between patterns. In this paper we improve CDP with the ability to specify an incomparability condition. This allows us to overcome two major shortcomings of CDP: finding dominated solutions that must then be filtered out after search, and unnecessarily adding dominance blocking constraints between incomparable solutions. We demonstrate the efficacy of our approach by extending the problem specification language ESSENCE and implementing it in a solver-independent manner on top of the constraint modelling tool CONJURE. Our experiments on pattern mining tasks with both a CP solver and a SAT solver show that using the incomparability condition during search significantly improves the efficiency of dominance programming and reduces (and often eliminates entirely) the need for post-processing to filter dominated solutions.
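As a rough illustration of the post-processing step the abstract refers to, the sketch below filters dominated patterns out of a solution list. The dominance and incomparability rules are assumptions chosen for illustration (closed itemsets: a strict superset with the same support dominates, and patterns with different supports are incomparable). The function names are hypothetical and this is not the ESSENCE or CONJURE formulation.

```python
# Each solution is a (pattern, support) pair; the pattern is a frozenset of items.

def incomparable(a, b):
    """Assumed rule: patterns with different supports never dominate each other."""
    return a[1] != b[1]

def dominates(a, b):
    """Assumed rule: a dominates b if a is a strict superset with equal support."""
    return a[1] == b[1] and a[0] > b[0]

def filter_dominated(solutions):
    """Keep only solutions that no comparable solution dominates."""
    kept = []
    for candidate in solutions:
        dominated = any(
            not incomparable(other, candidate) and dominates(other, candidate)
            for other in solutions
        )
        if not dominated:
            kept.append(candidate)
    return kept

solutions = [
    (frozenset({"a"}), 3),
    (frozenset({"a", "b"}), 3),
    (frozenset({"b"}), 4),
]
closed = filter_dominated(solutions)
```

Here `{"a"}` is dropped because `{"a", "b"}` has the same support, while `{"b"}` is incomparable to both and is kept without any dominance check being needed.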
Intermarried Couples: Transnationalism, and Racialized Experiences in Denmark and Canada
Despite an increase in interracial or mixed marriages (intermarriages) globally, the experiences of couples in such marriages are generally under-researched, particularly within psychology. Using a cultural psychological framework and qualitative methods, this paper studies the psychosocial experiences of couples in intermarriages. It focuses on four South Asians in ethnic intermarriages in two settings: two Indian-origin men married to native Danish women in Denmark, and two Indian-origin women married to Euro-American men in Canada. Data from in-depth interviews were subjected to a thematic analysis yielding an array of themes, of which this paper presents the two most dominant themes across the two contexts: “transnationalism” and “racialized experiences in social situations”. The results demonstrate that the participants lived transnational lives to varying degrees depending on their gender, socio-economic status and age, which in turn intersected with variables such as the nature of the transnational relationships they were attempting to sustain, and their own motivations and agency in maintaining these ties. While in some cases participants maintained a high level of contact with India through visits and digital technology, others kept up limited ongoing contact with the country of origin. Furthermore, varying racialized experiences emerged from the narratives, with differences in how these experiences were interpreted. While some participants recognized them as racial discrimination, others chose to rationalize these experiences in various ways. After offering an account of these results, the paper reflects briefly on the implications of these findings.
Routes of freedom: slave resistance and the politics of literary geography
This dissertation integrates rhetorical, historical, and spatial analysis in an effort to expand our understanding of the cultural work performed by antebellum narratives that take slavery in the United States as their subject matter. Specifically, it focuses on the complicated relationship between place and human praxis as revealed in five texts: The Confessions of Nat Turner, Harriet Beecher Stowe’s Dred, Martin R. Delany’s Blake, Frederick Douglass’s “The Heroic Slave,” and Herman Melville’s Benito Cereno. In my attention to literary geographies, I trace spatial patterns in which considerations of organized resistance and slave rebellion are repeatedly placed in “wild-spaces” such as the Great Dismal Swamp, the Red River region of Louisiana, and the open ocean. Exploring their strict alignment with considerations of violence, I argue that these wild-spaces do not function as passive settings, supporting and paralleling narrative events or themes. Instead they can be seen to drive narrative action as they carry with them powerful cultural associations that translate into plot momentum. My methodological approach employs two general steps. First I document how antislavery writers developed a historically resonant narrative landscape to defuse criticism and buttress their rhetorical indictments of slavery. Second, I investigate how these writers negotiated the complicated demands of such landscapes in order to supplement moral interpretations with creative imaginings of how alternative forms of slave resistance might play out. By isolating the ties between literary landscapes and the narratives’ imaginings of slave resistance, we are able to see the intensely pragmatic, real world problem-solving in which these writers were engaged. Such a methodology highlights the formative function of place in literary output, while also providing insight into obstacles to real-world reform.
I conclude that the narratives I examine served as a forum for cultural experimentation as their writers attempted to work through social and political problems that had no easy or ready solutions. Considerations of place are shown to be essential to antislavery writers’ attempts to see through the shadow of slavery to its end, and, in doing so, point the way forward.
Using and extending itemsets in data mining : query approximation, dense itemsets, and tiles
Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated to the terms corresponding to the frequent itemsets: how good are the approximations thereby obtained? The answer is twofold: in theory, the worst-case bound for the algorithm is very large, and a construction is given that shows the bound to be tight; but in practice, the approximations tend to be much closer to the correct answer than in the worst case. While some other algorithms based on frequent itemsets yield even better approximations, they are not as widely applicable.
The second question concerns extending the definition of frequent itemsets to relax the requirement of perfect co-occurrence: highly correlated items may form an interesting set, even if they never co-occur in a single record. The problem is to formalize this idea in a way that still admits efficient mining algorithms. Two different approaches are used. First, dense itemsets are defined in a manner similar to the usual frequent itemsets and can be found using a modification of the original itemset mining algorithm. Second, tiles are defined in a different way so as to form a model for the whole data, unlike frequent and dense itemsets. A heuristic algorithm based on spectral properties of the data is given and some of its properties are explored.
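The definition of a frequent itemset given in this abstract (items that co-occur in sufficiently many records) can be sketched in a few lines. This is a brute-force toy, not the mining algorithm the thesis refers to; the function names, the `min_support` threshold, and the sample records are assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, records):
    """Count the records that contain every item of `itemset`."""
    return sum(1 for record in records if itemset <= record)

def frequent_itemsets(records, min_support):
    """Enumerate all itemsets by brute force; suitable for toy data only."""
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(frozenset(combo), records)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result

# Hypothetical sample database: each record is a set of items.
records = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = frequent_itemsets(records, min_support=2)
```

With a threshold of 2, `{"bread", "milk"}` is frequent because it appears in two records, while `{"milk", "butter"}` appears in only one and is not.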
Large-Scale Pattern-Based Information Extraction from the World Wide Web
Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.
Aggressive aggregation
Among the first steps in a compilation pipeline is the construction of an Intermediate Representation
(IR), an in-memory representation of the input program. Any attempt at program
optimisation, both in terms of size and running time, has to operate on this structure. There may
be one or multiple such IRs; however, most compilers use some form of a Control Flow Graph
(CFG) internally. This representation clearly aims at general-purpose programming languages,
for which it is well suited and allows for many classical program optimisations. On the other
hand, a growing structural difference between the input program and the chosen IR can lose
or obfuscate information that can be crucial for effective optimisation. With today’s rise of a
multitude of different programming languages, Domain-Specific Languages (DSLs), and computing
platforms, the classical machine-oriented IR is reaching its limits and a broader variety of
IRs is needed. This realisation yielded, e.g., Multi-Level Intermediate Representation (MLIR),
a compiler framework that facilitates the creation of a wide range of IRs and encourages their
reuse among different programming languages and the corresponding compilers.
In this modern spirit, this dissertation explores the potential of Algebraic Decision Diagrams
(ADDs) as an IR for (domain-specific) program optimisation. The data structure has remained the
state of the art for Boolean function representation for more than thirty years and is well-known
for its optimality in size and depth, i.e. running time. As such, it is ideally suited to represent
the corresponding classes of programs in the role of an IR. We will discuss its application in
a variety of different program domains, ranging from DSLs to machine-learned programs and
even to general-purpose programming languages.
Two representatives for DSLs, a graphical and a textual one, prove the adequacy of ADDs
for the program optimisation of modelled decision services. The resulting DSLs facilitate
experimentation with ADDs and provide valuable insight into their potential and limitations:
input programs can be aggregated in a radical fashion, at the risk of the occasional exponential
growth. With the aggregation of large Random Forests into a single aggregated ADD, we
bring this potential to a program domain of practical relevance. The results are impressive:
both running time and size of the Random Forest program are reduced by multiple orders of
magnitude. It turns out that this ADD-based aggregation can be generalised, even to
general-purpose programming languages. The resulting method achieves impressive speedups for a
seemingly optimal program: the iterative Fibonacci implementation.
Altogether, ADDs facilitate effective program optimisation where the input programs allow
for a natural transformation to the data structure. In these cases, they have proven to be an
extremely powerful tool for the optimisation of a program’s running time and, in some cases,
of its size. The exploration of their potential as an IR has only started and deserves attention in
future research.
Latin Christians in the Literary Landscape of Early Rus, c. 988-1330
In the wake of the recent wave of interest in the ties between Early Rus and the Latin world, this dissertation investigates conceptions and depictions of Latin Christians in Early Rusian texts. Unlike previous smaller-scale studies, the present study takes into consideration all indigenous Early Rusian narrative sources which make reference to Latins or the Latin world. Its contribution is twofold. Firstly, it overturns the still prevalent assumption that Early Rusian writers tended to portray Latins as religious Others. There was certainly a place in Early Rusian writing for religious polemic against the Latin faith, but as I show, this place was very restricted. Secondly, having established the considerable diversity and complexity of rhetorical approaches to Latins, this study analyses and explains rhetorical patterns in Early Rusian portrayals of Latins and Latin Christendom. Scholars have tended to interpret these patterns as primarily influenced by extra-textual factors (most often, a text’s time of composition). This study, however, establishes that textual factors, specifically genre and theme, are the best predictors of a text’s portrayal of Latins, and explains the appearance and evolution of particular generic and thematic representations. It also demonstrates that a text’s place of composition tends to have a greater influence on its depictions of Latins than its time of composition. Through close engagement with the subtleties and ambiguities of Early Rusian depictions of Latins, this study furthers contemporary debate on questions of narrative, identity and difference in Rus and the medieval world. Funded by the Centre for East European Language-Based Area Studies (CEELBAS).
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication on ACM Computing Survey