Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions on real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system
maintains its performance; it thus provides a practical and generic approach
to dealing with new document collections without requiring any expensive and
tedious manual annotation step. Comment: Accepted to WACV 202
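The abstract does not spell out the adaptation mechanism. One common pattern for unsupervised adaptation in recognition tasks is confidence-filtered self-training, sketched below; the function, threshold, and data are purely illustrative assumptions, not the authors' method (which may instead use, e.g., adversarial feature alignment):

```python
# Illustrative sketch of confidence-filtered self-training for
# unsupervised writer adaptation. The recognizer's outputs and the
# threshold are hypothetical stand-ins, not the paper's algorithm.

def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only transcriptions the synthetic-trained model is
    confident about, to fine-tune on the new writer's images."""
    return [(img, text) for img, text, conf in predictions
            if conf >= threshold]

# Unlabelled images from the new writer, paired with the generic
# model's transcriptions and confidence scores (dummy values).
preds = [
    ("img_001.png", "the", 0.97),
    ("img_002.png", "quick", 0.55),   # too uncertain: discarded
    ("img_003.png", "brown", 0.93),
]

pseudo_set = select_pseudo_labels(preds)
# The retained pairs would then be used to fine-tune the recognizer
# on the target writer, without any manual annotation.
print(pseudo_set)  # [('img_001.png', 'the'), ('img_003.png', 'brown')]
```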
Considering documents in lifelog information retrieval
Lifelogging is a research topic that is receiving increasing attention and although lifelog research has progressed in recent years, the concept of what represents a document in lifelog retrieval has not yet been sufficiently explored. Hence, the generation of multimodal lifelog documents is a fundamental concept that must be addressed. In this paper, I introduce my general perspective on generating documents in lifelogging and reflect on learnings from collecting multimodal lifelog data from a number of participants in a study on lifelog data organization. In addition, the main motivation behind document generation is presented, and the challenges faced while collecting data and generating documents are discussed in detail. Finally, a process for organizing the documents in lifelog data retrieval is proposed, which I intend to follow in my PhD research.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an
optimized solution for a specific real-world problem, big data systems are no
exception. As far as the storage aspect of any big data system is concerned,
the primary facet is the storage infrastructure, and NoSQL appears to be the
technology that best fulfils its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits a different data model. This paper presents a feature
and use-case analysis and comparison of the four main data models, namely
document-oriented, key-value, graph, and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
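To make the four data models concrete, the sketch below shapes one (invented) user record under each model. These are plain Python structures used as stand-ins for illustration only, not client calls against any actual store such as Redis, MongoDB, Cassandra, or Neo4j:

```python
# One hypothetical user record, shaped for each of the four NoSQL
# data models compared in the paper. All fields are invented.

# 1. Key-value: an opaque value addressed by a single key; the store
#    cannot query inside the value.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# 2. Document-oriented: the value is a nested, queryable document.
document = {"_id": 42, "name": "Ada",
            "address": {"city": "London", "zip": "N1"}}

# 3. Wide-column: rows hold sparse, dynamically named columns grouped
#    into column families (modelled here as nested dicts).
wide_column = {"row:42": {"profile": {"name": "Ada"},
                          "location": {"city": "London"}}}

# 4. Graph: entities are nodes and relationships are first-class
#    edges that can be traversed directly.
nodes = {42: {"label": "User", "name": "Ada"},
         7:  {"label": "City", "name": "London"}}
edges = [(42, "LIVES_IN", 7)]
```

The same record thus trades off queryability, schema flexibility, and relationship traversal differently in each model, which is what makes the data characteristics of an application decisive for the choice.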
Producing Malaria Indicators Through District Health Information Software (DHIS2): Practices, Processes And Challenges In Kenya
Globally there is increasing interest in malaria indicators produced through routine information systems. Deficiencies in routine health information systems in many malaria endemic countries are well recognized and interventions such as the computerization of District Health Information Systems have been implemented to improve data quality, demand and use. However, little is known about the micro-practices and processes that shape routine malaria data generation at the frontline where these data are collected and reported.
Using an ethnographic approach, this thesis critically examined how data for constructing malaria indicators are collected and reported through the District Health Information Software (DHIS2) in Kenya. The study was conducted over 18-months in four frontline health facilities and two sub-county health records offices. Data collection involved observations, review of tools and data quality audits, interviews and document reviews. Data were analysed using a thematic analysis approach.
This study found that malaria indicator data generation at the health facility level was undermined by a range of factors including: understaffing; human resource management challenges; stock-out of essential commodities; poorly designed tools; and unclear/missing instructions for data collection and collation. In response to these challenges, health workers adopted various coping mechanisms such as informal task shifting and role sharing. They also used improvised tools which sustained the data collection process but had varied implications for the outcome of the process. Data quality problems were concealed in aggregated monthly reports. The DHIS2 autocorrected errors and masked data quality problems. Problems were compounded by inadequate data collection support systems such as supervision.
Many challenges for malaria data generation were not HMIS or disease specific but reflected wider health system weaknesses. Any interventions seeking to improve routine malaria data generation must therefore look beyond malaria or HMIS initiatives to also include those that address the broader contextual factors that shape malaria data generation.
Applying digital content management to support localisation
The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how these technologies can support localisation and localised content before concluding with some impressions of future directions in DCM.
The Economic Importance of Draught Oxen on Small Farms in Namibia's Eastern Caprivi Region
The main aim of this study was to analyse and document the value of smallholder farmers' use of Draught Animal Power (DAP) systems in the Eastern Caprivi Region and to test the economic viability of DAP usage versus using tractors. This study applied Rapid Rural Appraisal (RRA) techniques, including a survey. Semi-structured interviews were conducted with 312 farmers at their farms and data were gathered on the use of and economics related to the draught animal power system. Crop enterprise budgets, project reports, expert opinions and group discussions were analysed. The research found that the use of animal power performs better in terms of physical productivity per ha compared to tractor usage. Furthermore, agricultural production in the Sibinda village area using oxen outperformed the other systems when evaluated with parametric analysis. From a financial perspective, farmers in Sibinda and Linyanti using oxen ranked above their counterparts using tractors. Further, the exercise indicated that farmers are facing a multitude of challenges such as damage incurred from wild animals and high input costs. There were many difficulties facing the next generation in entering commercial agricultural production in Caprivi within the current cost-price squeeze environment. The study therefore noted the role that draught oxen power can play as a tool to increase the level of success of new farmers in agricultural production and management.
EURL ECVAM Workshop on New Generation of Physiologically-Based Kinetic Models in Risk Assessment
The European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM) Strategy Document on Toxicokinetics (TK) outlines strategies to enable prediction of systemic toxicity by applying new approach methodologies (NAM). The central feature of the strategy focuses on using physiologically-based kinetic (PBK) modelling to integrate data generated by in vitro and in silico methods for absorption, distribution, metabolism, and excretion (ADME) in humans for predicting whole-body TK behaviour, for environmental chemicals, drugs, nano-materials, and mixtures. In order to facilitate acceptance and use of this new generation of PBK models, which do not rely on animal/human in vivo data in the regulatory domain, experts were invited by EURL ECVAM to (i) identify current challenges in the application of PBK modelling to support regulatory decision making; (ii) discuss challenges in constructing models with no in vivo kinetic data and opportunities for estimating parameter values using in vitro and in silico methods; (iii) present the challenges in assessing model credibility relying on non-animal data and address strengths, uncertainties and limitations in such an approach; (iv) establish a good kinetic modelling practice workflow to serve as the foundation for guidance on the generation and use of in vitro and in silico data to construct PBK models designed to support regulatory decision making.
To gauge the current state of PBK applications, experts were asked ahead of the workshop to complete a short survey. In the workshop, using presentations and discussions, the experts elaborated on the importance of being transparent about the model construct, assumptions, and applications to support assessment of model credibility. The experts offered several recommendations to address commonly perceived limitations in the parameterization and evaluation of PBK models developed using non-animal data and their use in risk assessment; these include: (i) develop a decision tree for model construction; (ii) set up a task force for independent model peer review; (iii) establish a scoring system for model evaluation; (iv) attract additional funding to develop accessible modelling software; (v) improve and facilitate communication between scientists (model developers, data providers) and risk assessors/regulators; and (vi) organise specific training for end users. The experts also acknowledged the critical need for developing a guidance document on building, characterising, reporting and documenting PBK models using non-animal data. This document would also need to include guidance on interpreting the model analysis for various risk assessment purposes, such as incorporating PBK models in integrated strategy approaches and integrating them with in vitro toxicity testing and adverse outcome pathways. This proposed guidance document will promote the development of PBK models using in vitro and in silico data and facilitate the regulatory acceptance of PBK models for assessing the safety of chemicals.