Ground-Truth production in the tranScriptorium Project
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. tranScriptorium is a 3-year project that aims to develop innovative, cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using Handwritten Text Recognition (HTR) technology. The production of ground truth (GT) for a dataset of handwritten document images is among the first tasks. We address novel approaches for the faster production of this GT based on crowd-sourcing and on prior-knowledge methods. We also address a novel low-cost semi-supervised procedure for obtaining correctly line-level aligned pairs of detected/extracted text line images and text line transcripts, especially suitable for training the models of the HTR technology employed in tranScriptorium. Work supported by the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 600707 - tranScriptorium. Gatos, B.; Louloudis, G.; Caser, T.; Grint, K.; Romero Gómez, V.; Sánchez Peiró, JA.; Toselli, AH.... (2014). Ground-Truth production in the tranScriptorium Project. In Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on. IEEE Computer Society - Conference Publishing Services (CPS). 237-241. https://doi.org/10.1109/DAS.2014.23
Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study
Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues. - Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material. - Findings: Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified. - Research limitations/implications: The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc. - Practical implications: Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field. - Social implications: The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals. - Originality/value: This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.
Character recognition in historical documents: Handwritten, cursive and printed documents
“Character recognition” refers to the procedure of ‘reading’ text with a computer: taking a document image as input and converting it to electronic text. This dissertation focuses on the segmentation of handwritten document images into the basic semantic units that comprise them, namely text lines and words. Concerning the problem of text line segmentation, we developed a new methodology whose novelties are: (i) an efficient block-based Hough transform in which voting occurs on the basis of equally spaced blocks after splitting of the connected components’ bounding box; (ii) a partitioning of the connected component domain into three spatial sub-domains, for which a different processing strategy of the corresponding connected components can be employed; and (iii) the efficient separation of vertically connected parts of text lines. The proposed text line segmentation methodology has been evaluated against other state-of-the-art text line segmentation methodologies and has proven to achieve better results. Concerning the word segmentation stage, we developed two different methodologies. In the first, the decision on whether a gap lies between two words or inside a single word is made with a threshold calculated from several characteristics of the document image. In the second, we use a well-known unsupervised clustering technique, Gaussian mixture modeling, to classify the gaps into the two classes. Experimental results demonstrate the efficiency of the proposed methodologies. Finally, a novel two-stage evaluation methodology for word segmentation techniques is proposed.
This methodology treats the distance computation stage and the gap classification stage independently, in contrast to current evaluation methodologies for word segmentation.
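The Gaussian mixture approach to gap classification mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the initialisation at the smallest and largest gap, the fixed iteration count, and the sample gap widths are all assumptions made for illustration. It fits a two-component one-dimensional mixture to gap widths with expectation-maximisation and labels a gap as inter-word when the component with the larger mean is more probable.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(gaps, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to gap widths with EM.

    Assumption: components start at the smallest and largest gap.
    """
    lo, hi = min(gaps), max(gaps)
    means = [float(lo), float(hi)]
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of each component for each gap.
        resp = []
        for x in gaps:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, gaps)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, gaps)) / nk
            variances[k] = max(var, 1e-6)  # floor avoids a collapsed component
            weights[k] = nk / len(gaps)
    return weights, means, variances


def classify_gaps(gaps):
    """Return True for gaps assigned to the wider (inter-word) component."""
    weights, means, variances = fit_two_gaussians(gaps)
    wide = 0 if means[0] > means[1] else 1
    narrow = 1 - wide
    labels = []
    for x in gaps:
        p_wide = weights[wide] * gaussian_pdf(x, means[wide], variances[wide])
        p_narrow = weights[narrow] * gaussian_pdf(x, means[narrow], variances[narrow])
        labels.append(p_wide > p_narrow)
    return labels


if __name__ == "__main__":
    # Hypothetical gap widths in pixels along one text line.
    sample = [2, 3, 2, 4, 3, 15, 3, 2, 18, 4, 3, 16]
    print(classify_gaps(sample))
```

A fixed threshold, as in the first methodology, would replace the fitted mixture with a single cut-off value; the mixture instead adapts the decision boundary to each set of gaps.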
A Review about the Sustainability of Pit Lakes as a Rehabilitation Factor after Mine Closure
At the end of surface mining activities, the remnant voids are of great concern regarding rehabilitating the final open pits. The investigation of the sustainability of pit lakes in post-mining regions constitutes a challenging research problem. This paper aims to highlight the effectiveness of pit lakes as a rehabilitation factor. In this framework, several cases worldwide and in Greece were examined in detail and evaluated. The results indicate that mine pit lakes must be evaluated as dynamic systems, natural or artificial, which demand rational mine water management to ensure their sustainability. This is especially important in Greece during the transition to the post-lignite era.
Rational and Sustainable Water Resource Management in the Ptolemais Lignite Basin Using Remotely Sensed Data
Future investment feasibility studies concerning post-mining repurposing utilities and economic transitions should focus on regional water resource management and the hydraulic protection of any utilities. Satellite images in different bands and Digital Elevation Models (DEM) of the Ptolemais basin were processed, leading to a more accurate estimation of the runoff ratio and percolation ratio. Furthermore, the saturated and unsaturated areas were delineated, leading to the recognition of potential artificial ground water recharge zones and zones where appropriate hydraulic protection measures are necessary
Adeno-Associated Virus-Mediated Gain-of-Function mPCSK9 Expression in the Mouse Induces Hypercholesterolemia, Monocytosis, Neutrophilia, and a Hypercoagulative State
Hypercholesterolemia has previously been induced in the mouse by a single intravenous injection of adeno-associated virus (AAV)-based vector harboring gain-of-function pro-protein convertase subtilisin/kexin type 9. Despite the recent emergence of the PCSK9-AAV model, the profile of hematological and coagulation parameters associated with it has yet to be characterized. We injected 1.0 × 10^11 viral particles of mPCSK9-AAV or control AAV into juvenile male C57BL/6N mice and fed them with either a Western-type high-fat diet (HFD) or standard diet over the course of 3 weeks. mPCSK9-AAV mice on HFD exhibited greater plasma PCSK9 concentration and lower low-density lipoprotein levels, concomitant with increased total cholesterol and non-high-density lipoprotein (non-HDL)-cholesterol concentrations, and lower HDL-cholesterol concentrations than control mice. Furthermore, mPCSK9-AAV-injected mice on HFD exhibited no signs of atherosclerosis at 3 weeks after the AAV injection. Hypercholesterolemia was associated with a thromboinflammatory phenotype, as neutrophil levels, monocyte levels, and neutrophil-to-lymphocyte ratios were higher and activated partial thromboplastin times (aPTTs) were lower in HFD-fed mPCSK9-AAV mice. Therefore, the mPCSK9-AAV is a suitable model of hypercholesterolemia to examine the role of thromboinflammatory processes in the pathogenesis of cardiovascular and cerebrovascular diseases.
Multi-Seam Coal Deposit Modeling via Principal Component Analysis & GIS
Spatial modeling and evaluation is a critical step for planning the exploitation of mineral deposits. In this work, a methodology for investigating the spatial variability of a multi-seam coal deposit is proposed. The study area includes the Klidi (Florina, Greece) multi-seam lignite deposit, which is suitable for surface mining. The analysis is based on the original data of 76 exploratory drill-holes in an area of 10 km², in conjunction with the geological and geomorphological data of the deposit. The analytical methods include drill-hole data analysis and evaluation based on an appropriate algorithm, principal component analysis and geographic information techniques. The results proved to be very satisfactory for the explanation of the maximum variance of the initial data values as well as the identification of the deposit structure and the optimum planning of mine development. The proposed analysis can also be helpful for minimizing cost and optimizing efficiency of surface mining operations. Furthermore, the provided methods could be applied in other areas of geosciences, indicating the theoretical value as well as the important practical implications of the analysis.
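As a rough illustration of what "explaining the maximum variance" means in principal component analysis, the sketch below computes the first principal component of a small table and the fraction of total variance it explains. It is not the paper's algorithm: the function names, the use of power iteration, the unstandardised covariance matrix, and the drill-hole values are all hypothetical choices made for this example.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvalue and eigenvector of cov via power iteration.

    Assumption: the all-ones start vector is not orthogonal to the
    leading eigenvector, which holds for typical data.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def explained_variance_ratio(rows):
    """Fraction of total variance captured by the first component."""
    cov = covariance_matrix(rows)
    eigenvalue, direction = first_principal_component(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total, direction


if __name__ == "__main__":
    # Hypothetical drill-hole records: seam thickness (m), overburden (m), ash (%).
    holes = [
        [2.1, 35.0, 18.0],
        [2.8, 41.0, 16.5],
        [1.5, 28.0, 21.0],
        [3.2, 47.0, 15.0],
        [2.4, 38.0, 17.5],
    ]
    ratio, direction = explained_variance_ratio(holes)
    print(f"first component explains {ratio:.1%} of the variance")
```

When the variables have different units, as in this made-up table, they are commonly standardised before the covariance is computed; that step is omitted here to keep the sketch short.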