Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions on real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system
maintains its performance; it thus provides a practical and generic approach
to dealing with new document collections without requiring any expensive and
tedious manual annotation step. Comment: Accepted to WACV 202
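The abstract does not spell out the adaptation mechanism. One common pattern for unsupervised adaptation in recognition tasks is confidence-filtered self-training, sketched below; the function, threshold, and data are purely illustrative assumptions, not the authors' method (which may instead use, e.g., adversarial feature alignment):

```python
# Illustrative sketch of confidence-filtered self-training for
# unsupervised writer adaptation. The recognizer's outputs and the
# threshold are hypothetical stand-ins, not the paper's algorithm.

def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only transcriptions the synthetic-trained model is
    confident about, to fine-tune on the new writer's images."""
    return [(img, text) for img, text, conf in predictions
            if conf >= threshold]

# Unlabelled images from the new writer, paired with the generic
# model's transcriptions and confidence scores (dummy values).
preds = [
    ("img_001.png", "the", 0.97),
    ("img_002.png", "quick", 0.55),   # too uncertain: discarded
    ("img_003.png", "brown", 0.93),
]

pseudo_set = select_pseudo_labels(preds)
# The retained pairs would then be used to fine-tune the recognizer
# on the target writer, without any manual annotation.
print(pseudo_set)  # [('img_001.png', 'the'), ('img_003.png', 'brown')]
```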
Considering documents in lifelog information retrieval
Lifelogging is a research topic that is receiving increasing attention and although lifelog research has progressed in recent years, the concept of what represents a document in lifelog retrieval has not yet been sufficiently explored. Hence, the generation of multimodal lifelog documents is a fundamental concept that must be addressed. In this paper, I introduce my general perspective on generating documents in lifelogging and reflect on learnings from collecting multimodal lifelog data from a number of participants in a study on lifelog data organization. In addition, the main motivation behind document generation is presented, and the challenges faced while collecting data and generating documents are discussed in detail. Finally, a process for organizing the documents in lifelog data retrieval is proposed, which I intend to follow in my PhD research.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an
optimized solution for a specific real-world problem, big data systems are no
exception. As far as the storage aspect of any big data system is concerned,
the primary facet is the storage infrastructure, and NoSQL appears to be the
technology that best fulfils its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits a different data model. This paper presents a feature
and use-case analysis and comparison of the four main data models, namely
document-oriented, key-value, graph, and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
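To make the four data models concrete, the sketch below shapes one (invented) user record under each model. These are plain Python structures used as stand-ins for illustration only, not client calls against any actual store such as Redis, MongoDB, Cassandra, or Neo4j:

```python
# One hypothetical user record, shaped for each of the four NoSQL
# data models compared in the paper. All fields are invented.

# 1. Key-value: an opaque value addressed by a single key; the store
#    cannot query inside the value.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# 2. Document-oriented: the value is a nested, queryable document.
document = {"_id": 42, "name": "Ada",
            "address": {"city": "London", "zip": "N1"}}

# 3. Wide-column: rows hold sparse, dynamically named columns grouped
#    into column families (modelled here as nested dicts).
wide_column = {"row:42": {"profile": {"name": "Ada"},
                          "location": {"city": "London"}}}

# 4. Graph: entities are nodes and relationships are first-class
#    edges that can be traversed directly.
nodes = {42: {"label": "User", "name": "Ada"},
         7:  {"label": "City", "name": "London"}}
edges = [(42, "LIVES_IN", 7)]
```

The same record thus trades off queryability, schema flexibility, and relationship traversal differently in each model, which is what makes the data characteristics of an application decisive for the choice.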
Producing Malaria Indicators Through District Health Information Software (DHIS2): Practices, Processes And Challenges In Kenya
Globally there is increasing interest in malaria indicators produced through routine information systems. Deficiencies in routine health information systems in many malaria endemic countries are well recognized and interventions such as the computerization of District Health Information Systems have been implemented to improve data quality, demand and use. However, little is known about the micro-practices and processes that shape routine malaria data generation at the frontline where these data are collected and reported.
Using an ethnographic approach, this thesis critically examined how data for constructing malaria indicators are collected and reported through the District Health Information Software (DHIS2) in Kenya. The study was conducted over 18-months in four frontline health facilities and two sub-county health records offices. Data collection involved observations, review of tools and data quality audits, interviews and document reviews. Data were analysed using a thematic analysis approach.
This study found that malaria indicator data generation at the health facility level was undermined by a range of factors including: understaffing; human resource management challenges; stock-out of essential commodities; poorly designed tools; and unclear/missing instructions for data collection and collation. In response to these challenges, health workers adopted various coping mechanisms such as informal task shifting and role sharing. They also used improvised tools which sustained the data collection process but had varied implications for the outcome of the process. Data quality problems were concealed in aggregated monthly reports. The DHIS2 autocorrected errors and masked data quality problems. Problems were compounded by inadequate data collection support systems such as supervision.
Many challenges for malaria data generation were not HMIS or disease specific but reflected wider health system weaknesses. Any interventions seeking to improve routine malaria data generation must therefore look beyond malaria or HMIS initiatives to also include those that address the broader contextual factors that shape malaria data generation.
Applying digital content management to support localisation
The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how these technologies can support localisation and localised content before concluding with some impressions of future directions in DCM.
The Economic Importance of Draught Oxen on Small Farms in Namibia's Eastern Caprivi Region
The main aim of this study was to analyse and document the value of smallholder farmers' use of Draught Animal Power (DAP) systems in the Eastern Caprivi Region and to test the economic viability of DAP usage versus using tractors. This study applied Rapid Rural Appraisal (RRA) techniques, including a survey. Semi-structured interviews were conducted with 312 farmers at their farms and data were gathered on the use of and economics related to the draught animal power system. Crop enterprise budgets, project reports, expert opinions and group discussions were analysed. The research found that the use of animal power performs better in terms of physical productivity per ha compared to tractor usage. Furthermore, agricultural production in the Sibinda village area using oxen outperformed the other systems when evaluated with parametric analysis. From a financial perspective, farmers in Sibinda and Linyanti using oxen ranked above their counterparts using tractors. Further, the exercise indicated that farmers are facing a multitude of challenges such as damage incurred from wild animals and high input costs. There were many difficulties facing the next generation in entering commercial agricultural production in Caprivi within the current cost-price squeeze environment. The study therefore noted the role that draught oxen power can play as a tool to increase the level of success of new farmers in agricultural production and management.
EURL ECVAM Workshop on New Generation of Physiologically-Based Kinetic Models in Risk Assessment
The European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM) Strategy Document on Toxicokinetics (TK) outlines strategies to enable prediction of systemic toxicity by applying new approach methodologies (NAM). The central feature of the strategy focuses on using physiologically-based kinetic (PBK) modelling to integrate data generated by in vitro and in silico methods for absorption, distribution, metabolism, and excretion (ADME) in humans for predicting whole-body TK behaviour, for environmental chemicals, drugs, nano-materials, and mixtures. In order to facilitate acceptance and use of this new generation of PBK models, which do not rely on animal/human in vivo data in the regulatory domain, experts were invited by EURL ECVAM to (i) identify current challenges in the application of PBK modelling to support regulatory decision making; (ii) discuss challenges in constructing models with no in vivo kinetic data and opportunities for estimating parameter values using in vitro and in silico methods; (iii) present the challenges in assessing model credibility relying on non-animal data and address strengths, uncertainties and limitations in such an approach; (iv) establish a good kinetic modelling practice workflow to serve as the foundation for guidance on the generation and use of in vitro and in silico data to construct PBK models designed to support regulatory decision making.
To gauge the current state of PBK applications, experts were asked ahead of the workshop to complete a short survey. In the workshop, using presentations and discussions, the experts elaborated on the importance of being transparent about the model construct, assumptions, and applications to support assessment of model credibility. The experts offered several recommendations to address commonly perceived limitations in the parameterization and evaluation of PBK models developed using non-animal data and their use in risk assessment; these include: (i) develop a decision tree for model construction; (ii) set up a task force for independent model peer review; (iii) establish a scoring system for model evaluation; (iv) attract additional funding to develop accessible modelling software; (v) improve and facilitate communication between scientists (model developers, data providers) and risk assessors/regulators; and (vi) organise specific training for end users. The experts also acknowledged the critical need for developing a guidance document on building, characterising, reporting and documenting PBK models using non-animal data. This document would also need to include guidance on interpreting the model analysis for various risk assessment purposes, such as incorporating PBK models in integrated strategy approaches and integrating them with in vitro toxicity testing and adverse outcome pathways. This proposed guidance document will promote the development of PBK models using in vitro and in silico data and facilitate the regulatory acceptance of PBK models for assessing the safety of chemicals.