702 research outputs found
Linking Text and Image with SVG
Annotation and linking (or referring) have been described as "scholarly primitives", basic methods used in scholarly research and publication of all kinds. The online publication of manuscript images is one basic use case where the need for linking and annotation is very clear. High-resolution images are of great use to scholars, and transcriptions of texts provide for search and browsing, so the ideal method for the digital publication of manuscript works is the presentation of page images plus a transcription of the text therein. This has become a standard method, but it leaves open the questions of how deep the linking can go and how best to handle the annotation of sections of the image. This paper presents a new method (named img2xml) for connecting text and image using an XML-based tracing of the text on the page image. The tracing method was developed as part of a series of experiments in text and image linking beginning in the summer of 2008, and the work will continue under a grant funded by the National Endowment for the Humanities. It employs Scalable Vector Graphics (SVG) to represent the text in an image of a manuscript page in a referenceable form and enables linking and annotation of the page image in a variety of ways. The paper goes on to discuss the scholarly requirements for tools that will be developed around the tracing method, and explores some of the issues raised by the img2xml method.
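As a rough illustration of what "referenceable" SVG tracing could look like, the sketch below stores one SVG path per traced word and resolves fragment references from a transcription to those paths. It is not the img2xml implementation: the ids (`w1`, `w2`), the path data, the sample words, and the helper names `build_tracing` and `resolve` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_tracing(words):
    """Build an SVG element holding one identified <path> per traced word."""
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 1000 1400"})
    for word_id, path_data in words:
        ET.SubElement(svg, f"{{{SVG_NS}}}path", {"id": word_id, "d": path_data})
    return svg


def resolve(svg, fragment):
    """Return the <path> that a '#id' fragment points at, or None."""
    target = fragment.lstrip("#")
    for element in svg.iter(f"{{{SVG_NS}}}path"):
        if element.get("id") == target:
            return element
    return None


# Hypothetical traced outlines (simple boxes stand in for real tracings).
traced = [
    ("w1", "M10 10 L60 10 L60 30 L10 30 Z"),
    ("w2", "M70 10 L130 10 L130 30 L70 30 Z"),
]
# Hypothetical transcription: each word points at its traced shape.
transcription = [("In", "#w1"), ("principio", "#w2")]

svg = build_tracing(traced)
links = {text: resolve(svg, ref).get("d") for text, ref in transcription}
```

Because each shape carries its own id, an annotation or a transcription token can target a region of the page image with an ordinary fragment reference, without a separate table of coordinates.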
Towards a multimedia formatting vocabulary
Time-based, media-centric Web presentations can be described declaratively in the XML world through the development of languages such as SMIL. It is difficult, however, to fully integrate them in a complete document transformation processing chain. In order to achieve the desired processing of data-driven, time-based, media-centric presentations, the text-flow based formatting vocabularies used by style languages such as XSL, CSS and DSSSL need to be extended. The paper presents a selection of use cases which are used to derive a list of requirements for a multimedia style and transformation formatting vocabulary. The boundaries of applicability of existing text-based formatting models for media-centric transformations are analyzed. The paper then discusses the advantages and disadvantages of a fully fledged time-based multimedia formatting model. Finally, the discussion is illustrated by describing the key properties of the example multimedia formatting vocabulary currently implemented in the back-end of our Cuypers multimedia transformation engine.
Visualization of Online Deals
This project identifies the impact of online deals and coupons on people's lives. The basic idea is to find the trend in the sales of these deals, which would be helpful for companies, restaurants and dealers who are trying to sell coupons to popularize their products. The final output of the project is a timeline graph with the deals displayed by the month of their sale. Information such as the original price, deal price, discount and number of coupons sold is displayed in a pop-up window when a deal is selected. Different colors are used to differentiate the types of deals. A pie chart depicting this information for various US states provides a summarized view of the deals. In this way, an overall picture of the deals sold in a particular month can be obtained.
Libraries and Information Systems Need XML/RDF... but Do They Know It?
This article presents an approach to the uses of XML (eXtensible Markup Language) and Semantic Web technologies in
the field of information services, focusing mainly on the creation and management of digital libraries compared to traditional
libraries, while paying special attention to the concept and application of metadata and to RDF-based integration.
A Survey for Graphic Design Intelligence
Graphic design is an effective language for visual communication. Using
complex composition of visual elements (e.g., shape, color, font) guided by
design principles and aesthetics, design helps produce more visually-appealing
content. The creation of a harmonious design requires carefully selecting and
combining different visual elements, which can be challenging and
time-consuming. To expedite the design process, emerging AI techniques have
been proposed to automate tedious tasks and facilitate human creativity.
However, most current work focuses only on specific tasks targeting different
scenarios without a high-level abstraction. This paper aims to provide a
systematic overview of graphic design intelligence and summarize literature in
the taxonomy of representation, understanding and generation. Specifically we
consider related works for individual visual elements as well as the overall
design composition. Furthermore, we highlight some of the potential directions
for future exploration. Comment: 10 pages, 2 figures
Common Atlas Format and 3D Brain Atlas Reconstructor: Infrastructure for Constructing 3D Brain Atlases
One of the challenges of modern neuroscience is integrating voluminous data of different modalities derived from a variety of specimens. This task requires a common spatial framework that can be provided by brain atlases. The first atlases were limited to two-dimensional presentation of structural data. Recently, attempts at creating 3D atlases have been made to offer navigation within non-standard anatomical planes and to improve the localization of different types of data within the brain volume. The 3D atlases available so far have been created using frameworks which make it difficult for other researchers to replicate the results. To facilitate reproducible research and data sharing in the field, we propose an SVG-based Common Atlas Format (CAF) to store 2D atlas delineations or other compatible data, and 3D Brain Atlas Reconstructor (3dBAR), software dedicated to automated reconstruction of three-dimensional brain structures from 2D atlas data. The basic functionality is provided by (1) a set of parsers which translate various atlases from a number of formats into the CAF, and (2) a module generating 3D models from CAF datasets. The whole reconstruction process is reproducible and can easily be configured, tracked and reviewed, which facilitates fixing errors. Manual corrections can be made when automatic reconstruction is not sufficient. The software was designed to simplify interoperability with other neuroinformatics tools by using open file formats. The content can easily be exchanged at any stage of data processing. The framework allows for the addition of new public or proprietary content.
Management of Scientific Images: An approach to the extraction, annotation and retrieval of figures in the field of High Energy Physics
The information environment of the first decade of the 21st century is unprecedented. The physical barriers that have limited access to knowledge are disappearing as traditional methods of accessing information are replaced or improved by computer-based systems. Digital systems can manage much larger collections of documents, confronting information users with an avalanche of documents related to their topic of interest. This new situation has created an incentive to develop data mining techniques and to build more efficient search engines capable of limiting search results to a small subset of the most relevant ones. However, most search engines today work with textual descriptions. These descriptions can be extracted either from the content or from external sources. Retrieval based on the non-textual content of documents is a subject of ongoing research. In particular, image retrieval and unraveling the information contained in images are attracting great interest in the scientific community. Digital libraries occupy a special position among the systems that facilitate access to knowledge. They act as repositories of documents that share some common characteristics (for example, belonging to the same area of knowledge or being published by the same institution) and as such contain documents considered of interest to a particular group of users. In addition, they provide retrieval functionality over the collections they manage. Normally, scientific publications are the smallest units managed by scientific digital libraries. However, the process of scientific creation involves different types of artifacts, among others figures and datasets.
Figures play a particularly important role in the scientific publication process. They represent data in a graphical form that allows us to show patterns across large datasets and to convey complex ideas in an easily understandable way. Existing digital library systems provide access to figures, but only as part of the files into which the entire publication is serialized. The objective of this thesis is to propose a set of methods and techniques that make figures first-class products within the scientific publication process, allowing researchers to obtain the maximum benefit when searching and reviewing the existing literature. The proposed methods and techniques are aimed at facilitating the acquisition, semantic annotation and search of figures contained in scientific publications. To demonstrate the completeness of the research, the proposed theories are illustrated with examples from the field of Particle Physics (also known as High Energy Physics). For those cases in which more detailed proposals were needed, the work concentrates on the figures that appear most frequently in Particle Physics publications: the scientific graphics known as plots. The prototypes developed for this thesis have been partially integrated into the Invenio (1) digital library software, as well as into INSPIRE, one of the largest digital libraries in Particle Physics, maintained through the collaboration of major laboratories and research centers such as CERN, SLAC, DESY and Fermilab. (1) http://invenio-software.org
Automatic Analysis and Validation of the Chemical Literature
Thesis. Methods to automatically extract and validate data from the chemical literature, converting legacy formats to machine-understandable forms, are examined. The work focuses on three types of data: analytical data reported in articles, computational chemistry output files and crystallographic information files (CIFs). It is shown that machines are capable of reading and extracting analytical data from the current legacy formats with high recall and precision. Regular expressions cannot identify chemical names with high precision or recall, but non-deterministic methods perform significantly better. The lack of machine-understandable connection tables in the literature has been identified as the major issue preventing molecule-based data-driven science from being performed in the area. The extraction of data from computational chemistry output files using parser-like approaches is shown to be not generally possible, although such methods work well for input files. A hierarchical regular expression based approach can parse > 99.9% of the output files correctly, although significant human input is required to prepare the templates. CIFs may be parsed with extremely high recall and precision, contain connection tables, and hold data of high quality. The comparison of bond lengths calculated by two computational chemistry programs shows good agreement in general, but structures containing specific moieties cause discrepancies. An initial protocol for the high-throughput geometry optimisation of molecules extracted from the CIFs is presented and the refinement of this protocol is discussed. Differences in bond length between calculated and experimentally determined values from the CIFs of less than 0.03 Angstrom are shown to be expected from random error. The final protocol is used to find high-quality structures from crystallography which can be reused for further science. Unilever Centre for Molecular Science Informatics
Halcyon -- A Pathology Imaging and Feature analysis and Management System
Halcyon is a new pathology imaging analysis and feature management system
based on W3C linked-data open standards and is designed to scale to support the
needs for the voluminous production of features from deep-learning feature
pipelines. Halcyon can support multiple users with a web-based UX with access
to all user data over a standards-based web API allowing for integration with
other processes and software systems. Identity management and data security are
also provided. Comment: 15 pages, 11 figures. arXiv admin note: text overlap with
arXiv:2005.0646