35 research outputs found

    Document and Scene Text Image Rectification Based on Alignment Properties

    Thesis (Ph.D.) -- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, August 2017. Advisor: Nam Ik Cho.
The optical character recognition (OCR) of text images captured by cameras plays an important role in scene understanding. However, OCR of camera-captured images is still considered a challenging problem, even after text detection (localization). This is mainly due to the geometric distortions caused by page curve and perspective view, so their rectification has become an essential pre-processing step for recognition. Accordingly, many text image rectification methods have been proposed that recover a fronto-parallel view from a single distorted image. Recently, many researchers have focused on the properties of well-rectified text. In this respect, this dissertation presents novel alignment properties for text image rectification, which are encoded into the proposed cost functions. By minimizing the cost functions, the transformation parameters for rectification are obtained. These properties are applied to three topics: document image dewarping, scene text rectification, and curved surface dewarping in real scenes. First, a document image dewarping method is proposed based on the alignment of text-lines and line segments. Conventional text-line-based document dewarping methods have problems when handling complex layouts and/or very few text-lines; when there are few aligned text-lines in the image, photos, graphics, and/or tables usually take up a large portion of the input instead. Hence, for robust document dewarping, the proposed method uses line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that transformed line segments remain straight (line-to-line mapping) and that many of them are horizontally or vertically aligned in well-rectified images, the proposed method encodes these properties into the cost function in addition to the text-line-based cost.
By minimizing the function, the proposed method obtains the transformation parameters for the page curve, camera pose, and focal length, which are used for document image rectification. Considering that, in some cases, there are many outliers such as line segments with arbitrary directions and mis-detected text-lines, the overall algorithm is designed in an iterative manner: at each step, the proposed method removes the text-lines and line segments that are not well aligned, and then minimizes the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. Second, this dissertation presents a method for scene text rectification. Conventional methods for scene text rectification mainly exploit glyph properties, i.e., that characters in many languages have horizontal/vertical strokes and some symmetric shapes. However, since they consider only the shape properties of individual characters and ignore the alignment between characters, they work well only for images with a single character and yield misaligned results for images with multiple characters. To alleviate this problem, the proposed method explicitly imposes alignment constraints on the rectified results. To be precise, character alignments as well as glyph properties are encoded in the proposed cost function, and the transformation parameters are obtained by minimizing the function. Also, in order to encode the alignment of characters into the cost function, the proposed method separates the text into individual characters using a projection-profile method before optimizing the cost function. Then, top and bottom lines are estimated using least-squares line fitting with RANSAC. The overall algorithm performs character segmentation, line fitting, and rectification iteratively. Since the cost function is non-convex and involves many variables, an optimization scheme based on the Augmented Lagrange Multiplier method is also developed. The proposed method is evaluated on real and synthetic text images, and the experimental results show that it achieves higher OCR accuracy than conventional approaches while also yielding visually pleasing results. Finally, the proposed method is extended to curved surface dewarping in real scenes. In real scenes, there are many cylindrical objects such as medicine bottles or beverage cans, and their curved surfaces can be modeled as Generalized Cylindrical Surfaces (GCS). These curved surfaces contain a great deal of significant text and figures; however, the text has an irregular structure compared to documents. Therefore, conventional dewarping methods based on the properties of well-rectified text have difficulty rectifying them. Based on the observation that many curved surfaces include well-aligned line segments (object boundary lines or barcodes), the proposed method rectifies the curved surfaces by exploiting the proposed line-segment terms.
Experimental results on a range of images with curved surfaces of circular objects show that the proposed method performs rectification robustly.
    Contents:
    1 Introduction
      1.1 Document image dewarping
      1.2 Scene text rectification
      1.3 Curved surface dewarping in real scene
      1.4 Contents
    2 Related work
      2.1 Document image dewarping (dewarping methods using additional information; text-line based dewarping methods)
      2.2 Scene text rectification
      2.3 Curved surface dewarping in real scene
    3 Document image dewarping
      3.1 Proposed cost function (parametric model of dewarping process; cost function design; line segment properties and cost function)
      3.2 Outlier removal and optimization (Jacobian matrix of the proposed cost function)
      3.3 Document region detection and dewarping
      3.4 Experimental results (text-abundant document images; non-conventional document images)
      3.5 Summary
    4 Scene text rectification
      4.1 Proposed cost function for rectification (cost function design; character alignment properties and alignment terms)
      4.2 Overall algorithm (initialization; character segmentation; estimation of the alignment parameters; cost function optimization for rectification)
      4.3 Experimental results
      4.4 Summary
    5 Curved surface dewarping in real scene
      5.1 Proposed curved surface dewarping method (pre-processing)
      5.2 Experimental results
      5.3 Summary
    6 Conclusions
    Bibliography
    Abstract (Korean)
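As a rough illustration of the iterative scheme summarized in the abstract above (alternately minimizing an alignment cost over text-lines and line segments and discarding the ones that remain poorly aligned), the following Python sketch is a simplified stand-in rather than the thesis implementation: it models the transformation as a plain 2-D homography instead of the page-curve/camera-pose/focal-length parametrization, its residual only measures how far each rectified line deviates from the nearest image axis, and the function names and the 0.05 outlier threshold are illustrative assumptions.

```python
# Minimal sketch, NOT the thesis implementation: a plain homography replaces the
# page-curve/camera model, and the cost only penalizes deviation of each
# rectified text-line or line segment from the nearest (horizontal/vertical) axis.
import numpy as np
from scipy.optimize import least_squares

def warp(points, h):
    """Apply a homography (8 free parameters, H[2,2] fixed to 1) to Nx2 points."""
    H = np.append(h, 1.0).reshape(3, 3)
    p = np.c_[points, np.ones(len(points))] @ H.T
    return p[:, :2] / p[:, 2:3]

def alignment_residuals(h, lines):
    """One residual per line: distance of its rectified direction from the nearest axis."""
    res = []
    for pts in lines:                              # pts: Nx2 points sampled along one line
        q = warp(pts, h)
        d = q[-1] - q[0]
        d = d / (np.linalg.norm(d) + 1e-12)
        res.append(min(abs(d[0]), abs(d[1])))      # 0 when the line is horizontal or vertical
    return np.asarray(res)

def iterative_dewarp(lines, n_iters=5, outlier_thresh=0.05):
    """Alternate cost minimization and removal of poorly aligned lines (outliers)."""
    h = np.array([1., 0., 0., 0., 1., 0., 0., 0.])   # identity homography
    active = list(lines)
    for _ in range(n_iters):
        if not active:
            break
        h = least_squares(alignment_residuals, h, args=(active,)).x
        r = alignment_residuals(h, active)
        keep = r < outlier_thresh
        if keep.all():
            break                                    # every remaining line is well aligned
        active = [l for l, k in zip(active, keep) if k]
    return h, active
```

In the thesis the minimization is over page-curve, camera-pose and focal-length parameters and the line-segment term is combined with the text-line term; the loop structure (minimize, prune outliers, re-minimize) is the part this sketch is meant to convey.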

    Learning to Read by Spelling: Towards Unsupervised Text Recognition

    This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets. We present a detailed analysis of various aspects of the proposed method, namely: (1) the impact of the length of training sequences on convergence, (2) the relation between character frequencies and the order in which they are learnt, (3) the generalisation ability of our recognition network to inputs of arbitrary lengths, and (4) the impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.
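The abstract does not spell out how the distribution alignment is implemented; one common way to realize this kind of unpaired objective, shown below purely as an assumption for illustration, is adversarial training: a recogniser turns images into soft character sequences, and a critic is trained to distinguish them from one-hot encodings of valid corpus strings while the recogniser learns to fool it. The 27-symbol vocabulary, network sizes, and all names are placeholders, not the paper's architecture.

```python
# Hypothetical sketch of unpaired distribution matching with an adversarial critic.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MAX_LEN = 27, 24   # 26 letters + blank/space (illustrative)

class Recognizer(nn.Module):
    """Image (B, 1, 32, W) -> per-position character logits (B, MAX_LEN, VOCAB)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d((1, MAX_LEN)))
        self.head = nn.Linear(64, VOCAB)
    def forward(self, img):
        f = self.conv(img).squeeze(2).transpose(1, 2)   # (B, MAX_LEN, 64)
        return self.head(f)

class Critic(nn.Module):
    """Character-probability sequence (B, MAX_LEN, VOCAB) -> realness score (B, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(MAX_LEN * VOCAB, 256),
                                 nn.ReLU(), nn.Linear(256, 1))
    def forward(self, seq):
        return self.net(seq)

def training_step(rec, critic, opt_r, opt_c, images, corpus_strings):
    """images: unlabelled text-line images; corpus_strings: (B, MAX_LEN) char indices."""
    pred = F.softmax(rec(images), dim=-1)                 # soft strings read from images
    real = F.one_hot(corpus_strings, VOCAB).float()       # one-hot valid corpus strings
    # Critic learns to separate corpus strings from the recogniser's predictions.
    loss_c = F.binary_cross_entropy_with_logits(critic(real), torch.ones(len(real), 1)) + \
             F.binary_cross_entropy_with_logits(critic(pred.detach()), torch.zeros(len(pred), 1))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # Recogniser learns to make its predictions look like valid strings.
    loss_r = F.binary_cross_entropy_with_logits(critic(pred), torch.ones(len(pred), 1))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    return loss_c.item(), loss_r.item()
```

The point of the sketch is only the unpaired objective: "real" samples come from a text corpus, "fake" samples come from the recogniser's predictions on unlabelled images, so no image-text pairs are ever needed.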

    Real-Time On-Site OpenGL-Based Object Speed Measuring Using Constant Sequential Image

    This thesis presents a method that can detect moving objects and measure their speed of movement using a constant-rate series of sequential images, such as video recordings. It uses the industry-standard, non-vendor-specific OpenGL ES, so it can be implemented on any platform with OpenGL ES support. It can run on low-end embedded systems because it relies on simple, basic foundations and a few assumptions that lower the overall implementation complexity in OpenGL ES. It also does not require any special peripheral devices, so existing infrastructure can be used with minimal modification, which further lowers the cost of the system. The sequential images are streamed from an I/O device via the CPU into the GPU, where a custom shader detects changing pixels between frames to find potential moving objects. The GPU shader then measures the pixel displacement of each object and maps it to a real-world distance. These results are sent back to the CPU for further processing. The algorithm was tested on two real-world traffic videos (720p video at 10 FPS) and successfully extracted the speed of the road vehicles in view on a low-end embedded system (Raspberry Pi 4).
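To make the pipeline concrete, here is a minimal NumPy stand-in (an assumption, not the thesis's OpenGL ES shader code) for the per-pixel work described above: difference two consecutive frames to find moving pixels, take the centroid of the moving region as the object position, and convert its displacement between frames into a speed using the frame rate and a pixels-to-metres calibration.

```python
# NumPy stand-in (assumption) for the frame-differencing shader described above.
import numpy as np

def moving_mask(prev_gray, curr_gray, thresh=25):
    """Pixels whose intensity changed by more than `thresh` between two frames."""
    return np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16)) > thresh

def centroid(mask):
    """Centroid (x, y) of the changed pixels, or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    return None if xs.size == 0 else np.array([xs.mean(), ys.mean()])

def speed_kmh(frames, fps, metres_per_pixel):
    """Speed of the dominant moving region from three consecutive grayscale frames."""
    c01 = centroid(moving_mask(frames[0], frames[1]))
    c12 = centroid(moving_mask(frames[1], frames[2]))
    if c01 is None or c12 is None:
        return None
    pixels_per_frame = np.linalg.norm(c12 - c01)            # displacement in pixels per frame
    return pixels_per_frame * metres_per_pixel * fps * 3.6  # m/s -> km/h

# e.g. for 720p video at 10 FPS: speed_kmh([f0, f1, f2], fps=10, metres_per_pixel=0.05)
```

The thesis performs the differencing and displacement measurement on the GPU inside an OpenGL ES shader and handles multiple objects; this single-object CPU version only shows how a pixel displacement becomes a speed once the frame rate and the pixel-to-distance mapping are known.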

    Facial expression recognition in the wild : from individual to group

    The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing systems equipped with such intelligence is having accurate automatic Facial Expression Recognition (FER) methods. In computer vision, automatic facial expression analysis has been an active field of research for over two decades, yet many questions remain unanswered. The research presented in this thesis addresses some of the key issues of FER in challenging conditions: 1) creating a facial expression database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods that are independent of facial part locations; 3) creating automatic methods for the analysis of the mood of a group of people. The central hypothesis of the thesis is that extracting close-to-real-world data from movies and performing facial expression analysis on it is a stepping stone towards the analysis of faces in real-world, unconstrained conditions. A temporal facial expression database, Acted Facial Expressions in the Wild (AFEW), is proposed. The database is constructed and labelled using a semi-automatic process based on closed-caption subtitle keyword search. Currently, AFEW is the largest facial expression database representing challenging conditions available to the research community. To provide researchers with a common platform for evaluating and extending their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge, based on AFEW, is proposed. An image-only facial expression database, Static Facial Expressions In The Wild (SFEW), extracted from AFEW, is also proposed. Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points; however, as fiducial point detection is an open problem for real-world images, such HPN can be error-prone. An HPN method based on response maps generated from part detectors is proposed. The proposed shape-constrained method does not require fiducial points or head pose information, which makes it suitable for real-world images. Data from movies and the internet, representing real-world conditions, poses another major challenge to the research community: the presence of multiple subjects. This defines another focus of this thesis, where a novel approach for modeling the perception of the mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: an averaging-based Group Expression Model (GEM), a Weighted Group Expression Model (GEM_w) and an Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group's mood. GEM_LDA is based on topic modelling and feature augmentation. The proposed framework is applied to group candid shot selection and event summarisation. The application of the Structural SIMilarity (SSIM) index metric is explored for finding similar facial expressions, and this framework is applied to creating image albums based on facial expressions and to finding corresponding expressions for training facial performance transfer algorithms.
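As a small illustration of the weighted group model idea (GEM_w), the sketch below averages per-face expression intensities with weights derived from social-context cues; the specific cues used here (relative face size and distance from the image centre) and the function name are placeholder assumptions, not the attributes or formulation used in the thesis.

```python
# Hypothetical sketch of a weighted Group Expression Model: the group mood is a
# weighted average of per-face expression scores, with weights from contextual cues.
import numpy as np

def gem_w(face_scores, face_sizes, face_centres, image_centre):
    """face_scores: per-face mood intensities in [0, 1]; returns the group-level mood."""
    scores = np.asarray(face_scores, dtype=float)
    sizes = np.asarray(face_sizes, dtype=float)
    dists = np.linalg.norm(np.asarray(face_centres, dtype=float) - image_centre, axis=1)
    # Larger and more central faces contribute more (placeholder social-context weights).
    weights = (sizes / sizes.max()) / (1.0 + dists / (dists.max() + 1e-9))
    weights = weights / weights.sum()
    return float(np.dot(weights, scores))

# e.g. gem_w(face_scores=[0.9, 0.4, 0.7], face_sizes=[120, 80, 60],
#            face_centres=[(300, 200), (640, 360), (980, 300)],
#            image_centre=np.array([640.0, 360.0]))
```

The plain averaging GEM corresponds to the special case of uniform weights.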

    What remains is the book: The idea of the book in and around electronic space

    The purpose of this study is to question the idea of the book in general and how this idea is transforming in electronic space, understood as a space of flows as distinct from a space of places (Castells, 1989, p. 349). In order to question the idea of the book in electronic space we must begin at its ending, or more specifically, at a point in the histories of the book that is widely understood as the closing of a parenthesis: one that began with the invention of the printing press halfway through the 15th century in Western Europe and, spanning some 500 years, runs up to the end of print.

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. The extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis, and converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multi-method approach based on empirical research, surveys, and interviews performed with selected companies. The expert system developed in this thesis focuses on two distinct areas of research: text/object detection and text extraction. For text/object detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation: a generator network implemented with the help of the Faster R-CNN model and a discriminator based on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of various processes including XML processing (in the case of an existing OCR engine), bounding-box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is then used to extract the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and sought new, optimized solutions. To confirm the results, the engines were used to return XML-based files with the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses the existing OCR engine alongside a novel, self-adaptive, learning-based OCR engine based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London with expertise in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation.
This helped the company to reduce its reliance on human effort and allowed for greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to test and develop a novel solution that does not depend on any specific OCR technology; second, to increase the information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
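As a concrete (and deliberately simplified) example of the pattern-matching stage of the extraction framework described above, the sketch below cleans up OCR text and pulls a few invoice fields out with regular expressions, returning whatever it finds in key-value form. The field names, patterns, and clean-up rules are illustrative assumptions; the actual system also handles bounding boxes, spell/type checks, and a learning mechanism.

```python
# Simplified sketch (assumption, not the thesis pipeline) of rule/pattern-based
# key-value extraction from OCR output.
import re

PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "date":           re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})\b"),
    "total":          re.compile(r"total\s*(?:due|amount)?\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})", re.I),
}

def clean_ocr_text(text):
    """Common OCR clean-ups: collapse spacing, fix a frequent O/0 confusion inside numbers."""
    text = re.sub(r"[ \t]+", " ", text)
    return re.sub(r"(?<=\d)[Oo](?=\d)", "0", text)

def extract_fields(ocr_text):
    """Return whichever fields the patterns can find, as a key-value dict."""
    text = clean_ocr_text(ocr_text)
    out = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            out[key] = match.group(1)
    return out

# e.g. extract_fields("INVOICE NO: INV-2O21-004\nDate: 12/03/2021\nTotal due: $1,234.50")
#      -> {'invoice_number': 'INV-2021-004', 'date': '12/03/2021', 'total': '1,234.50'}
```

In a full pipeline of the kind described above, a stage like this would sit after OCR and bounding-box pre-processing, with its successful extractions feeding the learning mechanism for future documents.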

    Reading poetry and dreams in the wake of Freud

    Adapting the question at the end of Keats's 'Ode to a Nightingale', this thesis argues that reading poetic texts involves a form of suspension between waking and sleeping. Poems are not the product of an empirical dreamer, but psychoanalytic understandings of dream-work help to provide an account of certain poetic effects. Poetic texts resemble dreams in that both induce identificatory desires within, while simultaneously estranging, the reading process. In establishing a theoretical connection between poetic texts and dream-work, the discussion raises issues concerning death, memory and the body. The introduction relates Freudian and post-Freudian articulations of dream-work to the language of poetry, and addresses the problem of attributing desire "in" a literary text. Interweaving the work of Borch-Jacobsen, Derrida and Blanchot, the discussion proposes a different space of poetry. By reconfiguring the subject-of-desire and the structure of poetic address, the thesis argues that poetic "dreams" characterize points in texts which radically question the identity and position of the reader. Several main chapters focus on texts - poems by Frost and Keats, and Freud's reading of literary dreams - in which distinctions between waking and sleeping, familiarity and strangeness, order and confusion are profoundly disturbed. The latter part of the thesis concentrates on a textual "unconscious" that insists undecidably between the cultural and the individual. Poems by Eliot, Tennyson, Arnold and Walcott are shown to figure strange dreams and enact displacements that blur the categories of public and private. Throughout, the study confronts the recurrent interpretive problem of reading "inside" and "outside" textual dreams. This thesis offers an original perspective on reading poetry in conjunction with psychoanalysis, in that it challenges traditional assumptions about phantasy and poetry that depend upon a subject constituted in advance of a poetic event or scene of phantasy. It brings poetry into systematic relation with Freud's work on dreams and consistently identifies conceptual and performative links between psychoanalysis and literature in later modernity.

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities for Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.