3,492 research outputs found

    On utilising change over time in data mining

    Dissertation, University of Magdeburg, Faculty of Computer Science, 2013, by Mirko Böttcher

    Exploiting incomparability in solution dominance : improving general purpose constraint-based mining

    In data mining, finding interesting patterns is a challenging task. Constraint-based mining is a well-known approach to this, and one for which constraint programming has been shown to be a well-suited and generic framework. Constraint dominance programming (CDP) has been proposed as an extension that can capture an even wider class of constraint-based mining problems, by allowing us to compare relations between patterns. In this paper we improve CDP with the ability to specify an incomparability condition. This allows us to overcome two major shortcomings of CDP: finding dominated solutions that must then be filtered out after search, and unnecessarily adding dominance blocking constraints between incomparable solutions. We demonstrate the efficacy of our approach by extending the problem specification language ESSENCE and implementing it in a solver-independent manner on top of the constraint modelling tool CONJURE. Our experiments on pattern mining tasks with both a CP solver and a SAT solver show that using the incomparability condition during search significantly improves the efficiency of dominance programming and reduces (and often eliminates entirely) the need for post-processing to filter dominated solutions.
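
    As a concrete (if drastically simplified) illustration of the dominance-with-incomparability idea in this abstract, the Python sketch below mines a toy Pareto front over itemsets. The database, the two quality criteria (coverage and size), and all names are assumptions made for this example; the paper's actual implementation works at the level of ESSENCE models solved via CONJURE, not like this.

```python
from itertools import combinations

# Toy transaction database: each record is a set of items. The data
# and the two criteria below are illustrative assumptions only.
DB = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c", "d"}]

def coverage(itemset):
    """Number of records that contain every item of the itemset."""
    return sum(1 for record in DB if itemset <= record)

def dominates(p, q):
    """p dominates q if p is at least as good on both criteria
    (coverage and size) and strictly better on at least one."""
    better_eq = coverage(p) >= coverage(q) and len(p) >= len(q)
    strictly = coverage(p) > coverage(q) or len(p) > len(q)
    return better_eq and strictly

def incomparable(p, q):
    """Neither pattern dominates the other; both may coexist in the
    solution set, so no blocking constraint between them is needed."""
    return not dominates(p, q) and not dominates(q, p)

def pareto_mine(items, max_size=3):
    """Enumerate candidate itemsets and keep only non-dominated ones.
    Testing dominance (and hence incomparability) during the sweep
    prunes dominated candidates immediately, instead of finding them
    and filtering them out after search."""
    front = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(items), k):
            cand = set(combo)
            if any(dominates(kept, cand) for kept in front):
                continue  # dominated during search: never emitted
            front = [kept for kept in front if not dominates(cand, kept)]
            front.append(cand)
    return front

print(incomparable({"a"}, {"a", "b"}))  # True: both stay on the front
for pattern in pareto_mine(set().union(*DB)):
    print(sorted(pattern), "coverage:", coverage(pattern))
```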

    Intermarried Couples: Transnationalism, and Racialized Experiences in Denmark and Canada

    Despite an increase in interracial or mixed marriages (intermarriages) globally, the experiences of couples in such marriages are generally under-researched, particularly within psychology. Using a cultural psychological framework and qualitative methods, this paper studies the psychosocial experiences of couples in intermarriages. It focuses on four South Asians in ethnic intermarriages in two settings: two Indian-origin men married to native Danish women in Denmark, and two Indian-origin women married to Euro-American men in Canada. Data from in-depth interviews were subjected to a thematic analysis yielding an array of themes, of which this paper presents the two most dominant themes across the two contexts: ‘transnationalism’ and ‘racialized experiences in social situations’. The results demonstrate that the participants lived transnational lives to varying degrees depending on their gender, socio-economic status and age, which in turn intersected with variables such as the nature of the transnational relationships they were attempting to sustain, and their own motivations and agency in maintaining these ties. While in some cases participants maintained a high level of contact with India through visits and digital technology, others kept up limited ongoing contact with the country of origin. Furthermore, varying racialized experiences emerged from the narratives, with differences in how these experiences were interpreted. While some participants recognized them as racial discrimination, others chose to rationalize these experiences in various ways. After offering an account of these results, the paper reflects briefly on the implications of these findings.

    Routes of freedom: slave resistance and the politics of literary geography

    This dissertation integrates rhetorical, historical, and spatial analysis in an effort to expand our understanding of the cultural work performed by antebellum narratives that take slavery in the United States as their subject matter. Specifically, it focuses on the complicated relationship between place and human praxis as revealed in five texts: The Confessions of Nat Turner, Harriet Beecher Stowe’s Dred, Martin R. Delany’s Blake, Frederick Douglass’s “The Heroic Slave,” and Herman Melville’s Benito Cereno. In my attention to literary geographies, I trace spatial patterns in which considerations of organized resistance and slave rebellion are repeatedly placed in “wild-spaces” such as the Great Dismal Swamp, the Red River region of Louisiana, and the open ocean. Exploring their strict alignment with considerations of violence, I argue that these wild-spaces do not function as passive settings, supporting and paralleling narrative events or themes. Instead they can be seen to drive narrative action as they carry with them powerful cultural associations that translate into plot momentum. My methodological approach employs two general steps. First I document how antislavery writers developed a historically resonant narrative landscape to defuse criticism and buttress their rhetorical indictments of slavery. Second, I investigate how these writers negotiated the complicated demands of such landscapes in order to supplement moral interpretations with creative imaginings of how alternative forms of slave resistance might play out. By isolating the ties between literary landscapes and the narratives’ imaginings of slave resistance, we are able to see the intensely pragmatic, real-world problem-solving in which these writers were engaged. Such a methodology highlights the formative function of place in literary output, while also providing insight into obstacles to real-world reform. I conclude that the narratives I examine served as a forum for cultural experimentation as their writers attempted to work through social and political problems that had no easy or ready solutions. Considerations of place are shown to be essential to antislavery writers’ attempts to see through the shadow of slavery to its end, and, in doing so, point the way forward.

    Using and extending itemsets in data mining : query approximation, dense itemsets, and tiles

    Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated to the terms corresponding to the frequent itemsets: how good are the approximations thereby obtained? The answer is twofold: in theory, the worst-case bound for the algorithm is very large, and a construction is given that shows the bound to be tight; but in practice, the approximations tend to be much closer to the correct answer than in the worst case. While some other algorithms based on frequent itemsets yield even better approximations, they are not as widely applicable. The second question concerns extending the definition of frequent itemsets to relax the requirement of perfect co-occurrence: highly correlated items may form an interesting set, even if they never co-occur in a single record. The problem is to formalize this idea in a way that still admits efficient mining algorithms. Two different approaches are used. First, dense itemsets are defined in a manner similar to the usual frequent itemsets and can be found using a modification of the original itemset mining algorithm. Second, tiles are defined in a different way so as to form a model for the whole data, unlike frequent and dense itemsets. A heuristic algorithm based on spectral properties of the data is given and some of its properties are explored.
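
    For readers unfamiliar with the base notion, a minimal sketch of classic level-wise (Apriori-style) frequent itemset mining follows. It illustrates only the co-occurrence definition the thesis starts from, not the thesis's own query-approximation, dense-itemset, or tiling algorithms; the toy database and threshold are illustrative assumptions.

```python
from itertools import combinations

def apriori(db, min_support):
    """Classic level-wise mining: an itemset is frequent if all of its
    items co-occur in at least min_support records, and every subset
    of a frequent itemset is itself frequent (the pruning rule)."""
    items = sorted(set().union(*db))
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        counts = {s: sum(1 for r in db if s <= r) for s in level}
        survivors = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets all survived (downward closure).
        candidates = set()
        for a, b in combinations(survivors, 2):
            cand = a | b
            if len(cand) == len(a) + 1 and all(
                frozenset(sub) in survivors
                for sub in combinations(cand, len(a))
            ):
                candidates.add(cand)
        level = list(candidates)
    return frequent

db = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("abd")]
result = apriori(db, min_support=2)
for s, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), count)
```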

    Large-Scale Pattern-Based Information Extraction from the World Wide Web

    Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.
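
    A minimal sketch of what pattern-based extraction can look like, using Hearst-style lexico-syntactic patterns of the kind commonly applied to web text. The patterns, corpus, and helper names here are illustrative assumptions, not the patterns used in this work.

```python
import re

# Two classic "is-a" patterns; real systems learn or curate many more.
PATTERNS = [
    re.compile(r"(\w[\w ]*?)\s+such as\s+(\w[\w ]*)"),
    re.compile(r"(\w[\w ]*?),?\s+including\s+(\w[\w ]*)"),
]

def extract_isa(text):
    """Return (instance, class) pairs matched by the patterns."""
    facts = []
    for pat in PATTERNS:
        for m in pat.finditer(text):
            cls = m.group(1).strip().split()[-1]  # head noun of the class
            for inst in re.split(r",| and ", m.group(2)):
                if inst.strip():
                    facts.append((inst.strip(), cls))
    return facts

print(extract_isa("He studied languages such as Python and Haskell."))
# expected: [('Python', 'languages'), ('Haskell', 'languages')]
```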

    Aggressive aggregation

    Among the first steps in a compilation pipeline is the construction of an Intermediate Representation (IR), an in-memory representation of the input program. Any attempt at program optimisation, both in terms of size and running time, has to operate on this structure. There may be one or multiple such IRs; however, most compilers use some form of a Control Flow Graph (CFG) internally. This representation clearly aims at general-purpose programming languages, for which it is well suited and allows for many classical program optimisations. On the other hand, a growing structural difference between the input program and the chosen IR can lose or obfuscate information that can be crucial for effective optimisation. With today’s rise of a multitude of different programming languages, Domain-Specific Languages (DSLs), and computing platforms, the classical machine-oriented IR is reaching its limits and a broader variety of IRs is needed. This realisation yielded, e.g., the Multi-Level Intermediate Representation (MLIR), a compiler framework that facilitates the creation of a wide range of IRs and encourages their reuse among different programming languages and the corresponding compilers. In this modern spirit, this dissertation explores the potential of Algebraic Decision Diagrams (ADDs) as an IR for (domain-specific) program optimisation. The data structure has remained the state of the art for Boolean function representation for more than thirty years and is well known for its optimality in size and depth, i.e. running time. As such, it is ideally suited to represent the corresponding classes of programs in the role of an IR. We will discuss its application in a variety of different program domains, ranging from DSLs to machine-learned programs and even to general-purpose programming languages. Two representatives for DSLs, a graphical and a textual one, prove the adequacy of ADDs for the program optimisation of modelled decision services. The resulting DSLs facilitate experimentation with ADDs and provide valuable insight into their potential and limitations: input programs can be aggregated in a radical fashion, at the risk of the occasional exponential growth. With the aggregation of large Random Forests into a single aggregated ADD, we bring this potential to a program domain of practical relevance. The results are impressive: both running time and size of the Random Forest program are reduced by multiple orders of magnitude. It turns out that this ADD-based aggregation can be generalised, even to general-purpose programming languages. The resulting method achieves impressive speedups for a seemingly optimal program: the iterative Fibonacci implementation. Altogether, ADDs facilitate effective program optimisation where the input programs allow for a natural transformation to the data structure. In these cases, they have proven to be an extremely powerful tool for the optimisation of a program’s running time and, in some cases, of its size. The exploration of their potential as an IR has only started and deserves attention in future research.
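
    To make the central data structure concrete, here is a minimal Python sketch of an ADD-like diagram with a unique table (hash-consing) and a memoised pointwise apply, the kind of operation that underlies aggregating several decision trees into one diagram. It is a toy illustration of the idea under simplifying assumptions (fixed variable order, numeric terminals, made-up names), not the dissertation's framework.

```python
class ADD:
    """A decision node: branch on var, low = false branch, high = true
    branch. Terminals are plain numbers, not ADD instances."""
    _unique = {}  # (var, id(low), id(high)) -> node, shared globally

    def __init__(self, var, low, high):
        self.var, self.low, self.high = var, low, high

    @classmethod
    def node(cls, var, low, high):
        if low == high:          # redundant test: collapse the node
            return low
        key = (var, id(low), id(high))
        if key not in cls._unique:
            cls._unique[key] = cls(var, low, high)
        return cls._unique[key]  # identical subgraphs are stored once

def apply(op, f, g, memo=None):
    """Pointwise combination of two diagrams (e.g., summing the votes
    of two decision trees), memoised so shared subgraphs combine once."""
    memo = {} if memo is None else memo
    key = (id(f), id(g))
    if key in memo:
        return memo[key]
    if not isinstance(f, ADD) and not isinstance(g, ADD):
        result = op(f, g)        # both terminals: combine the values
    else:
        var = min(x.var for x in (f, g) if isinstance(x, ADD))
        fl, fh = (f.low, f.high) if isinstance(f, ADD) and f.var == var else (f, f)
        gl, gh = (g.low, g.high) if isinstance(g, ADD) and g.var == var else (g, g)
        result = ADD.node(var, apply(op, fl, gl, memo), apply(op, fh, gh, memo))
    memo[key] = result
    return result

# Two one-node "trees" over the same variable, aggregated into one ADD:
t1 = ADD.node("x1", 0, 1)
t2 = ADD.node("x1", 1, 1)   # both branches equal: collapses to terminal 1
agg = apply(lambda a, b: a + b, t1, t2)
print(agg.var, agg.low, agg.high)   # x1 1 2
```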

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
    Comment: Accepted for publication in ACM Computing Surveys
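
    The inductive process the survey describes (learning a classifier from preclassified documents) can be sketched in a few lines. The pipeline below, a TF-IDF document representation plus a naive Bayes classifier via scikit-learn, and the toy corpus are illustrative assumptions; the survey itself covers a much broader range of representations and learners.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny preclassified training corpus (illustrative only).
train_docs = [
    "stocks fell as markets reacted to interest rates",
    "the central bank raised rates again",
    "the team won the championship final",
    "a late goal decided the match",
]
train_labels = ["finance", "finance", "sports", "sports"]

# Document representation (TF-IDF) + classifier construction (naive Bayes):
# the classifier is built automatically from the labelled examples.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["rates and markets moved sharply"]))  # expected: ['finance']
print(model.predict(["the final match was thrilling"]))    # expected: ['sports']
```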