431 research outputs found
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our methods for automatic segmentation of text in digital videos. The output is directly passed to a standard OCR software package in order to translate the segmented text into ASCII. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. Especially the inter-frame dependencies of the characters provide new possibilities for their refinement. Then, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher semantics in videos.
Use of the Smartphone Camera to Monitor Adherence to Inhaled Therapy
Self-management strategies can lead to improved health outcomes, fewer unscheduled treatments, and improved disease control. Compliance with inhaled control drugs is essential to achieve good clinical outcomes in patients with chronic respiratory diseases. However, compliance assessments are difficult to trust, because patients often self-report high compliance rates and these self-reports are considered unreliable. This thesis aims to enable reliable adherence measurement by developing a mobile application module that objectively verifies inhaler usage from image snapshots of the inhaler's dose counter. To achieve this, a mobile application module featuring pre- and post-processing techniques and a default machine learning framework was built to detect inhalers and the numbers on their dose counters. In addition, to improve the app's text recognition on one of the worst-performing inhalers, a machine learning model was trained on an inhaler image dataset. Some of the features developed during this project were incorporated into the current version of InspirerMundi, a medication management mobile application, planned to be made available on the Play Store by the end of 2021. The proposed approach was validated on a series of different inhaler image datasets. Tests with the default machine learning configuration showed correct detection of dose counters for 70% of inhaler registration events and 93% for three commonly used inhalers in Portugal. The trained model, in turn, had an average accuracy of 88% in recognizing the digits on the dose counter of one of the worst-performing inhaler models. These results show the potential of mobile and embedded capabilities to provide additional evidence of inhaler compliance. These systems can help bridge the gap between patients and healthcare professionals.
By empowering patients with disease self-management and drug adherence tools and providing additional relevant data, these systems pave the way for informed disease management decisions.
Neural text line extraction in historical documents: a two-stage clustering approach
Accessibility of the valuable cultural heritage which is hidden in countless scanned historical documents is the motivation for the presented dissertation. The developed (fully automatic) text line extraction methodology combines state-of-the-art machine learning techniques and modern image processing methods. It demonstrates its quality by outperforming several other approaches on a couple of benchmark datasets. The method is already being used by a wide audience of researchers from different disciplines and thus contributes its (small) part to the aforementioned goal.
Bridging the Gap Between People, Mobile Devices, and the Physical World
Human-computer interaction (HCI) is being revolutionized by computational design and artificial intelligence. As the diversity of user interfaces shifts from personal desktops to mobile and wearable devices, yesterday’s tools and interfaces are insufficient to meet the demands of tomorrow’s devices. This dissertation describes my research on leveraging different physical channels (e.g., vibration, light, capacitance) to enable novel interaction opportunities. We first introduce FontCode, an information embedding technique for text documents. Given a text document with specific fonts, our method can embed user-specified information (e.g., URLs, metadata, etc.) in the text by perturbing the glyphs of text characters while preserving the text content. The embedded information can later be retrieved using a smartphone in real time. Then, we present Vidgets, a family of mechanical widgets, specifically push buttons and rotary knobs, that augment mobile devices with tangible user interfaces. When these widgets are attached to a mobile device and a user interacts with them, the nonlinear mechanical response of the widgets shifts the device slightly and quickly. This subtle motion can then be detected by the Inertial Measurement Units (IMUs), which are commonly installed on mobile devices.
Next, we propose BackTrack, a trackpad placed on the back of a smartphone to track fine-grained finger motions. Our system has a small form factor, with all the circuits encapsulated in a thin layer attached to a phone case. It can be used with any off-the-shelf smartphone, requiring no power supply or modification of the operating system. BackTrack simply extends the finger tracking area of the front screen, without interrupting the use of the front screen.
Lastly, we demonstrate MoiréBoard, a new camera tracking method that leverages a seemingly irrelevant visual phenomenon, the moiré effect. Based on a systematic analysis of the moiré effect under camera projection, MoiréBoard requires neither power nor camera calibration. It can easily be made at a low cost (e.g., through 3D printing) and is ready to use with any stock mobile device with a camera. Its tracking algorithm is computationally efficient and can run at a high frame rate. It is not only simple to implement, but also tracks devices with high accuracy, comparable to state-of-the-art commercial VR tracking systems.
Engineering data compendium. Human perception and performance. User's guide
The concept underlying the Engineering Data Compendium was the product of a research and development program (Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design of military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability by systems designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is the first volume, the User's Guide, containing a description of the program and instructions for its use.
The generalization of the R-transform for invariant pattern representation
The beneficial properties of the Radon transform make it a useful intermediate representation for the extraction of invariant features from pattern images for the purpose of indexing/matching. This paper revisits the problem of Radon image utilization with a generic view on a popular Radon transform-based transform and pattern descriptor, the R-transform and R-signature, bringing in a class of transforms and descriptors spatially describing patterns at all directions and at different levels, while maintaining the beneficial properties of the conventional R-transform and R-signature. The domain of this class, which is delimited due to the existence of singularities and the effect of sampling/quantization and additive noise, is examined. Moreover, the ability of the generic R-transform to encode the dominant directions of a pattern is also discussed, adding to the robustness to additive noise of the generic R-signature. The stability of dominant direction encoding by the generic R-transform and the superiority of the generic R-signature over existing invariant pattern descriptors on grayscale and binary noisy datasets have been confirmed by experiments.
Human-Centric Machine Vision
Recently, algorithms for processing visual information have greatly evolved, providing efficient and effective solutions to cope with the variability and complexity of real-world environments. These achievements have led to the development of Machine Vision systems that go beyond typical industrial applications, where the environments are controlled and the tasks are very specific, towards innovative solutions that address the everyday needs of people. Human-Centric Machine Vision can help to solve the problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. In such applications it is necessary to handle changing, unpredictable and complex situations, and to take into account the presence of humans.
Digital scaling of binary images
Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1979. By Robert A. Ulichney. Microfiche copy available in Archives and Engineering. Includes bibliographical references.