612 research outputs found
Metadata enrichment for digital heritage: users as co-creators
This paper espouses the concept of metadata enrichment through an expert and user-focused approach to metadata creation and management. To this end, it is argued that the Web 2.0 paradigm enables users to be proactive metadata creators. As Shirky (2008, p. 47) argues, Web 2.0’s social tools enable “action by loosely structured groups, operating without managerial direction and outside the profit motive”. Lagoze (2010, p. 37) advises, “the participatory nature of Web 2.0 should not be dismissed as just a popular phenomenon [or fad]”. Carletti (2016) proposes a participatory digital cultural heritage approach where Web 2.0 approaches such as crowdsourcing can be used to enrich digital cultural objects. It is argued that “heritage crowdsourcing, community-centred projects or other forms of public participation”. On the other hand, the new collaborative approaches of Web 2.0 neither negate nor replace contemporary standards-based metadata approaches. Hence, this paper proposes a mixed metadata approach where user-created metadata augments expert-created metadata and vice versa. The metadata creation process is no longer the sole prerogative of the metadata expert. The Web 2.0 collaborative environment now allows users to participate in both adding and re-using metadata. The expert-created (standards-based, top-down) and user-generated (socially-constructed, bottom-up) approaches to metadata are complementary rather than mutually exclusive. The two approaches are often mistakenly treated as a dichotomy (Gruber, 2007; Wright, 2007).
This paper espouses the importance of enriching digital information objects with descriptions pertaining to the about-ness of information objects. Such richness and diversity of description, it is argued, could chiefly be achieved by involving users in the metadata creation process. This paper presents the importance of the paradigm of metadata enriching and metadata filtering for the cultural heritage domain. Metadata enriching states that a priori metadata that is instantiated and granularly structured by metadata experts is continually enriched through socially-constructed (post-hoc) metadata, whereby users are proactively engaged in co-creating metadata. The principle also states that metadata that is enriched is also contextually and semantically linked and openly accessible. In addition, metadata filtering states that metadata resulting from implementing the principle of enriching should be displayed for users in line with their needs and convenience. In both enriching and filtering, users should be considered as prosumers, resulting in what is called collective metadata intelligence.
Time management : how to better manage your workload and time
Meeting proceedings of a seminar by the same name, held September 17, 2020
Time Management_ How To Better Manage Your Workload & Time
Meeting proceedings of a seminar by the same name, held September 16, 202
Automated classification of receipts and invoices along with document extraction
Companies might receive dozens or even hundreds of receipts and invoices per day. It
consumes a lot of working hours to keep them all organized – invoices must be paid on
time and receipts must be archived properly. This research aims to reduce the amount of
manual labor the organizing requires with automated classification.
I am writing this thesis in collaboration with my workplace, a company called
Eneroc Ltd. They had a problem with document classification consuming too many working
hours. Therefore, they created a system to automate this process. The existing system
uses a text-based approach that searches for specific key words in the documents. The
system works rather well, but the company wanted to find out if some modern approach
could outperform the existing system and add more features into the process.
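The keyword-based approach described above can be pictured with a minimal sketch. The abstract does not say which key words or scoring rule the existing system uses, so the `KEYWORDS` table and the `classify_by_keywords` function below are hypothetical illustrations, not the company's implementation.

```python
# Hypothetical keyword lists; the real system's key words are not given in the abstract.
KEYWORDS = {
    "invoice": ["invoice", "due date", "payment terms"],
    "receipt": ["receipt", "cash", "change", "card payment"],
}


def classify_by_keywords(text: str) -> str:
    """Return the label whose keywords occur most often, or 'unknown' if none match."""
    lowered = text.lower()
    # Count every occurrence of every keyword for each label.
    scores = {
        label: sum(lowered.count(word) for word in words)
        for label, words in KEYWORDS.items()
    }
    # On a tie, max() keeps the first label in KEYWORDS order.
    best_label = max(scores, key=scores.get)
    return best_label if scores[best_label] > 0 else "unknown"
```

A rule of this kind is easy to audit but only recognizes wording that someone has listed in advance, which is one plausible reason to compare it against a learned model.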
The goal of this research is to find out if a machine learning based approach could be
used to classify documents into invoices and receipts. In addition to the classification, the
approach should also be able to collect key information from the documents. This thesis
describes the workflow of creating a machine learning based solution to tackle the given
challenge.
The research resulted in an application that takes in invoices and receipts in PDF format.
The system trains a k-nearest neighbors model with training data that was created in the
process of the research. The model is then used to classify different parts of the new PDF
files into predefined categories. The key information is extracted from these categories.
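As an illustration of how a k-nearest neighbors model assigns a document part to a category, here is a minimal sketch. The abstract does not describe the features or the categories, so the two features (relative vertical position on the page, fraction of digit characters) and the labels `header`, `line_item` and `total` are assumptions made only for this example.

```python
import math
from collections import Counter

# Hypothetical training data: (relative page position, digit fraction) -> category.
TRAINING_PARTS = [
    ((0.05, 0.10), "header"), ((0.10, 0.05), "header"), ((0.08, 0.15), "header"),
    ((0.50, 0.40), "line_item"), ((0.55, 0.45), "line_item"), ((0.45, 0.35), "line_item"),
    ((0.90, 0.80), "total"), ((0.95, 0.70), "total"), ((0.85, 0.75), "total"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

For example, `knn_predict(TRAINING_PARTS, (0.92, 0.78))` votes among the three closest training parts, which in this toy data all carry the label `total`.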
The k-NN model was validated with k-fold cross-validation. The validation showed that
the model is performing correctly. Some preprocessing was also introduced in the process,
which further improved the results. Good results with the k-NN model imply that using a
proper machine learning solution would be profitable.
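The k-fold cross-validation mentioned above can be sketched as follows. The number of folds, the data and the stand-in 1-nearest-neighbour classifier are all assumptions for illustration; the abstract does not report the actual setup or scores.

```python
import math


def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds whose sizes differ by at most one.

    Assumes 1 <= n_folds <= n_samples. Real data is usually shuffled first.
    """
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbour(train, query):
    """Stand-in classifier: label of the single closest training point."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def cross_validate(samples, n_folds):
    """Return the accuracy on each held-out fold, training on the remaining folds."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), n_folds):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        correct = sum(
            nearest_neighbour(train, samples[i][0]) == samples[i][1] for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return accuracies


# Hypothetical, well-separated toy samples, interleaved so every fold sees both labels.
SAMPLES = [
    ((0.1, 0.1), "a"), ((0.9, 0.9), "b"), ((0.2, 0.1), "a"),
    ((0.8, 0.9), "b"), ((0.1, 0.2), "a"), ((0.9, 0.8), "b"),
]
```

Each sample is held out exactly once, so the per-fold accuracies describe how the model behaves on data it was not trained on.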
The final classification between receipts and invoices, as well as the key information extraction,
is done based on the classified document parts. This works rather well for
classification and simple key information extraction, but more complex key information
extraction, such as product list extraction, still requires more work.
The research proved that a machine learning solution could be used to classify documents
into invoices and receipts, and also to collect key information from the documents. The
created application isn’t yet ready for deployment, but it gives a good foundation for
future development. The research also shows which steps to take next and where to focus
when improving the system.
Effective recordkeeping technologies to manage aging
Pacific Northwest Laboratory has investigated the capability of current recordkeeping technology to support aging management. This paper discusses technical issues associated with potential enhancements of nuclear plant records systems, from the perspective of the lessons learned about equipment aging degradation mechanisms and associated surveillance and monitoring techniques during the U.S. Nuclear Regulatory Commission's Nuclear Plant Aging Research Program. The paper considers both the specific types of technical data needed to ensure continued safe operation and the use of new technology to upgrade record systems. Specific topics discussed include: equipment reliability data needed to support the assessment of the impact of aging on the continued operation of the plant; operational history data to support the assessment of residual life of mechanical and structural components and piping; tools for the analysis and trending of equipment reliability data and operational history data; design and implementation of plant record systems that will provide a comprehensive and usable engineering design basis for the plant; proposed improvements in the data input process for the plant records system; computerization of plant records systems, including conversion of existing records into machine-readable forms.
Analysis and Comparison of various Methods for Text Detection from Images using MSER Algorithm
This paper analyses and compares methods for text detection in images, using the Canny edge detection algorithm and an MSER-based method together with image enhancement, which improves text detection performance. In addition, we improve current MSERs by developing a contrast enhancement mechanism that enhances the region stability of text patterns. To remove the blurring caused during image capture, the Lucy-Richardson deblurring algorithm is used.
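The Lucy-Richardson deblurring step named in this abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it works on a 1-D signal rather than a 2-D image, it assumes the blur kernel (`psf`) is known, and the helper names `convolve_same` and `richardson_lucy` are chosen for this example. It is not the paper's implementation.

```python
def convolve_same(signal, kernel):
    """Zero-padded 1-D convolution returning a list as long as signal (odd-length kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                total += signal[idx] * weight
        out.append(total)
    return out


def richardson_lucy(observed, psf, iterations=200, eps=1e-12):
    """Iteratively refine an estimate of the sharp signal from a blurred observation."""
    estimate = [1.0] * len(observed)  # flat, strictly positive starting guess
    psf_flipped = psf[::-1]
    for _ in range(iterations):
        reblurred = convolve_same(estimate, psf)
        # Ratio of what was observed to what the current estimate would produce.
        ratio = [o / (r + eps) for o, r in zip(observed, reblurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


def abs_error(a, b):
    """Sum of absolute differences between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy example: a single sharp spike blurred by a small symmetric kernel.
sharp = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
psf = [0.25, 0.5, 0.25]
blurred = convolve_same(sharp, psf)
restored = richardson_lucy(blurred, psf)
```

In this noise-free toy case the restored signal concentrates back around the original spike; on real photographs the kernel is usually only estimated and the number of iterations has to be limited to avoid amplifying noise.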
Recognition and classification: the use of computer vision in the retail industry
Project work presented as a partial requisite to obtain the Master Degree in Information Management with specialization in Knowledge Management and Business Intelligence. Automatic recognition and classification of text, using image processing techniques such as optical character recognition and machine learning, are indicating new ways of capturing information on fast-moving consumer goods. Such systems can play an important role in making market research processes and operations more efficient and agile. The need is for a system that is able to extract all text available on the packaging and quickly arrange it into attributes. The goal of this investigation is to use a combination of optical character recognition and machine learning to achieve a satisfactory level of efficiency and quality. In order for such a system to be introduced to the organization, it needs to be faster and more effective than the current process. One of the advantages of using such a system is its independence from the human factor, which carries a higher probability of error.
Time Management_ How to Better Manage Your Workload & Time
Meeting proceedings of a seminar by the same name, held September 16, 202