
    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semi-automatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR- and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2, which distinguishes it from other visual query systems for XML, is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.
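The mixed structural-and-textual querying the abstract describes can be illustrated with a small sketch. The XML schema, element names, and keyword below are invented for illustration; this is not X2's actual query language.

```python
# Hypothetical sketch: a query combining a structural constraint with a
# textual keyword, in the spirit of X2's semi-automatically composed queries.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <article year="2004">
    <title>XML retrieval with X2</title>
    <abstract>Visual exploration of XML document collections.</abstract>
  </article>
  <article year="2001">
    <title>Relational query processing</title>
    <abstract>Join algorithms for SQL systems.</abstract>
  </article>
</library>
""")

# Structural part (.//article that has an <abstract> child) expressed in
# ElementTree's limited XPath; the IR-style keyword test applied in Python.
hits = [a.findtext("title")
        for a in doc.findall(".//article[abstract]")
        if "XML" in a.findtext("abstract")]
print(hits)  # ['XML retrieval with X2']
```

A full-featured system would of course evaluate both constraints inside the query engine rather than filtering in application code.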

    Image Retrieval in Digital Libraries - A Large Scale Multicollection Experimentation of Machine Learning techniques

    While digital heritage libraries were historically first populated in image mode, they quickly took advantage of OCR technology to index printed collections and consequently improve the scope and performance of the information-retrieval services offered to users. But access to iconographic resources has not progressed in the same way, and these resources remain in the shadows: manual indexing that is incomplete and heterogeneous; data silos by iconographic genre; content-based image retrieval (CBIR) still barely operational on heritage collections. Today, however, it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced over the last two decades, and thus to valorize these engravings, drawings, photographs, maps, etc., both for their own value and as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in printed materials (dailies, magazines, monographs); transform, harmonize and enrich the image descriptive metadata (in particular with machine-learning classification tools); and load it all into a web app dedicated to image retrieval. The approach is pragmatic in two respects, since it involves leveraging existing digital resources and (virtually) off-the-shelf technologies.
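The extract-transform-load pipeline the article outlines can be sketched minimally as below. All field names, genre labels, and the keyword rule standing in for the machine-learning classifier are invented for illustration.

```python
# Minimal ETL sketch for harmonizing image metadata from heterogeneous
# sources. classify() is a naive keyword stand-in for the ML genre
# classifier the article describes; a real system would plug a trained
# model in here.

def extract(sources):
    # Pull raw metadata records out of each silo (image collections,
    # OCRed printed materials, ...).
    for source in sources:
        yield from source

def classify(record):
    # Stand-in for an ML image classifier: tag by a trivial keyword rule.
    title = record.get("title", "").lower()
    return "map" if "map" in title else "illustration"

def transform(records):
    # Harmonize field names and enrich records missing a genre.
    for r in records:
        yield {
            "id": r["id"],
            "title": r.get("title", "").strip(),
            "genre": r.get("genre") or classify(r),
        }

def load(records):
    # Toy "index" for the retrieval app: a dict keyed by record id.
    return {r["id"]: r for r in records}

ocr_docs = [{"id": "b1", "title": "Map of Paris "}]            # no genre yet
image_silos = [{"id": "i1", "title": "Portrait", "genre": "photograph"}]

index = load(transform(extract([ocr_docs, image_silos])))
print(index["b1"]["genre"])  # map
```

The enrichment step only fills in a genre when the source record lacks one, so curated metadata from the image silos is preserved.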

    Sentimental classification analysis of polarity multi-view textual data using data mining techniques

    The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. Data with these characteristics form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. Data mining practices in particular can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated text data mining algorithms are vital for obtaining helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying polarity documents into two categories of useful information. This paper discusses a proposed framework with integrated data mining algorithms, achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-review dataset on given topics.
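The first stage of such a framework, clustering documents into two polarity groups, can be sketched with off-the-shelf tools. Neither X-means nor Weka's HotSpot is available in scikit-learn, so this sketch substitutes plain k-means (which X-means extends by estimating the number of clusters); the tiny review corpus is invented.

```python
# Sketch of polarity clustering over TF-IDF vectors, using KMeans as a
# stand-in for the X-means algorithm named in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [
    "great phone excellent battery",   # positive
    "great screen excellent camera",   # positive
    "terrible battery awful support",  # negative
    "terrible screen awful refund",    # negative
]

X = TfidfVectorizer().fit_transform(reviews)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# The two clusters approximate the two polarity categories; a rule-mining
# stage (HotSpot in the paper) would then characterize each cluster.
print(labels)
```

On this toy corpus the shared sentiment words ("great"/"excellent" vs. "terrible"/"awful") dominate the similarity, so the two clusters line up with polarity.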

    Adaptive Algorithms for Automated Processing of Document Images

    Large scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach is in its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multilingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and recognize characters for any complex syllabic or non-syllabic script, using font models. This concept is based on the fact that font files contain all the information necessary to render text, and thus a model for how to decompose it. Instead of script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.
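The intuition behind distance-transform clutter detection can be shown on a toy binary image: thin, text-like strokes have small maximal distance-transform values, while large solid blobs contain pixels deep inside the foreground. This is only an illustration of the general idea, not the paper's algorithm; the image and threshold are invented.

```python
# Toy sketch of distance-transform-based clutter removal on a binary page.
import numpy as np
from scipy import ndimage

page = np.zeros((20, 20), dtype=bool)
page[2:4, 2:14] = True     # a thin, text-like stroke (2 pixels tall)
page[8:18, 8:18] = True    # a large solid blob, standing in for clutter

dist = ndimage.distance_transform_edt(page)  # distance to nearest background
labels, n = ndimage.label(page)              # connected components

cleaned = page.copy()
for comp in range(1, n + 1):
    # A component whose deepest pixel lies far from the background is too
    # thick to be text; drop it as clutter (threshold invented for the toy).
    if dist[labels == comp].max() > 2.0:
        cleaned[labels == comp] = False

print(cleaned[2:4, 2:14].all(), cleaned[8:18, 8:18].any())  # True False
```

The paper's adaptive technique goes further, estimating the clutter-content boundary instead of using a fixed threshold.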

    The Constitution and Legislative History

    In this article, the author provides an extended analysis of the constitutional claims against legislative history, arguing that, under textualists’ own preference for constitutional text, the use of legislative history should be constitutional to the extent it is supported by Congress’s rulemaking power, a constitutionally enumerated power. This article has five parts. In part I, the author explains the importance of this question, considering the vast range of cases to which this claim of unconstitutionality could possibly apply—after all, statutory interpretation cases are the vast bulk of the work of the federal courts. She also explains why these claims should be of greater concern to a variety of constitutional theorists, particularly those who embrace theories of popular and common law constitutionalism, but as well to originalists. In part II, the author considers the textualist arguments against the constitutionality of legislative history. Article I, Section 7 provides that any bill must pass the House and the Senate and be presented to the President for veto or signature. As a number of textualists have argued, legislative history is not passed by both houses or signed by the President. Call this the “bicameralism argument.” Her answer to the bicameralism argument lies in a constitutional text that statutory textualists seem to have forgotten: Article I, Section 5 gives explicit power to Congress to set its own procedures, a power that gives legitimacy to legislative history created pursuant to those procedures. In fact, new developments in statutory interpretation theory (decision process theory) suggest that, in some cases, the only way to resolve textual conflict is to consider legislative procedure. In part III, the author considers a second prominent argument against the constitutionality of legislative history: non-delegation. 
Critics argue that Congress may not delegate the “legislative power” granted under the Constitution to members or committees, as only the entire Congress may constitutionally exercise that power. Call this the “non-delegation” argument. Again, her response is based on constitutional text: Article I, Section 5 specifically sanctions delegation to less than the whole of Congress; more importantly, there is no general norm against self-delegation stated explicitly or even implicitly in the Constitution. Finally, the author suggests that there is a certain inconsistency in the assertion of these claims: the non-self-delegation and bicameralism arguments can both be used to indict canons of construction, which textualists offer as the leading alternative to legislative history, but which have no supporting text comparable to Article I, Section 5 in the Constitution. In part IV, she considers arguments that judges’ use of legislative history violates the separation of powers because it allows the legislature to exceed the bounds of the “judicial power.” This argument can rather easily be turned on its head: in the quotations offered at the beginning of this article, members of Congress argue that judges are exercising the “legislative power” when they rewrite statutes without considering legislative history. As has been argued at length elsewhere, the use of “adjectival” argument in structural controversies—relying upon the terms “legislative, executive, and judicial”—perpetuates a weak understanding of the separation of powers, and one that the Constitution’s own text belies. The separation of powers does not prevent recourse to legislative history; in fact, as the article explains, blindness to legislative history may create different kinds of structural risks—risks to federalism, rather than risks to the separation of powers. 
    Finally, in part V, the author concludes by suggesting that we should retire the strong form of the legislative history unconstitutionality argument, by which she means the claim that the Constitution bars any and all use of legislative history. Instead, we should far more actively interrogate serious questions about the use of legislative history in particular cases. Can it really be wise—or even constitutional—for a judge to impose a meaning on an ambiguous statute with reference to the statements of a filibustering minority, or privilege some texts in ways that violate Congress’s rules? Fidelity to Congress, and the importance of Congress’s constitutional rules—what Francis Lieber once called the “common law” of the Congress—has yet to be theorized within this more pressing, but particular, sphere.