AGE ESTIMATION FOR INTELLIGENT ADVERTISING ON DIGITAL POSTERS USING CONVOLUTIONAL NEURAL NETWORK
As part of intelligent advertising, age estimation is used to tailor advertisements to the estimated age of the audience. Age estimation (AE) can be built with deep learning using a ConvNet, but this faces constraints such as scarce training data for older faces, class imbalance within the dataset, and the need for a large amount of data. One solution to these problems is data augmentation with the generative model ACGAN, which generates images conditioned on class. Intelligent advertising on digital posters was only simulated on a computer. The intelligent advertising simulation worked well despite the limited number of advertisements and inconsistent age estimation results. Using the generative model ACGAN for data augmentation improved the performance of the AE model despite low IS and FID scores and the low quality of the generated images. The effect of data augmentation was most visible in model B, with an increase in cumulative score accuracy of 4.8% and an MAE score of 1.297.
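The abstract reports results as cumulative score (CS) accuracy and MAE without defining them. The sketch below assumes the definitions common in the age-estimation literature: MAE is the mean absolute difference between true and predicted ages, and CS(θ) is the fraction of test faces whose absolute error is at most θ years. The threshold used for model B is not stated in the abstract, so `threshold` is a free parameter here, and the sample ages are invented for illustration, not taken from the study.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def cumulative_score(true_ages, predicted_ages, threshold):
    """Fraction of samples whose absolute age error is at most `threshold` years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(t - p) <= threshold)
    return hits / len(true_ages)


# Invented example: absolute errors are 2, 2, 7 and 0 years.
true_ages = [25, 40, 63, 18]
predicted_ages = [27, 38, 70, 18]
print(mean_absolute_error(true_ages, predicted_ages))  # 2.75
print(cumulative_score(true_ages, predicted_ages, threshold=5))  # 0.75
```

Lower MAE is better, while higher CS is better; the two are complementary because CS ignores how large the errors above the threshold are.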
Comparative analysis of features extraction techniques for black face age estimation
Computer-based age estimation is a technique that predicts an individual's
age from visual traits derived by analyzing a 2D picture of the individual's
face. Age estimation is critical for access control, e-government, and effective
human–computer interaction. The other-race effect can cause
techniques designed for white faces to underperform when used in a region
with black faces. This underperformance stems from training predominantly on
faces of a single race and from the encoding structure of the trained face images,
which depends on the feature extraction technique used. This study
contributes a constructive comparison of three feature-extraction
techniques, namely local binary pattern (LBP), Gabor wavelet (GW), and
wavelet transformation, used in the development of a genetic algorithm (GA)-artificial
neural network (ANN)-based age estimation system. The feature
extraction techniques used have been shown to yield rich shape and
texture information. The GA-ANN constitutes the age classifier module. The
correct classification rate was chosen as the performance metric in this study.
The results demonstrate that LBP is a more robust representation of the
black face than the GW and wavelet transformations, as evidenced by its accuracy of 91.76% compared with 89.41% and 84.71% for the GW-based and
wavelet-transformation-based age estimation systems, respectively.
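To make the LBP descriptor concrete, here is a minimal sketch of the basic 3x3, 8-neighbour LBP operator and its 256-bin histogram in plain Python. The abstract does not state which LBP variant the study used (radius, number of neighbours, uniform patterns, or region-wise histograms), so these choices are assumptions. Face-analysis systems often compute histograms per image region and concatenate them into one feature vector, which this sketch omits.

```python
def lbp_code(image, row, col):
    """Basic LBP code of an interior pixel: each of the 8 neighbours >= centre sets one bit."""
    center = image[row][col]
    # Neighbour offsets, clockwise from the top-left pixel; list index = bit position.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    height, width = len(image), len(image[0])
    histogram = [0] * 256
    # Border pixels are skipped because they lack a full 3x3 neighbourhood.
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


patch = [[6, 1, 7],
         [2, 5, 9],
         [3, 8, 4]]
print(lbp_code(patch, 1, 1))  # 45
```

Because each bit only records whether a neighbour is at least as bright as the centre, the code is unchanged by any monotonic change in illumination, which is one reason LBP is a popular texture descriptor for faces.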
The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking
In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile poses. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In the Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making 2D face alignment full-pose. In the Menpo 3D benchmark, a unified landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also consistent with the real-world 3D space. Based on the considerable number of annotated images, we organised the Menpo 2D and Menpo 3D Challenges for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with abundant data, lead to excellent results. We also provide a simple yet effective solution for 2D and 3D face alignment, named the Cascade Multi-view Hourglass Model. In our method, we exploit all 2D and 3D facial landmark annotations jointly. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment.
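The idea of joint supervision from two landmark configurations can be illustrated with a weighted sum of two landmark losses. This is a hypothetical sketch, not the loss of the Cascade Multi-view Hourglass Model, which the abstract does not specify: the function names, the mean-squared-error form and the `weight_3d` parameter are assumptions, and landmarks are plain coordinate tuples (of any dimension, e.g. image-plane projections of the 3D configuration) rather than the heatmaps an hourglass network typically predicts.

```python
def landmark_mse(predicted, target):
    """Mean squared error over every coordinate of two equally sized landmark lists."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("landmark lists must be non-empty and of equal length")
    total, count = 0.0, 0
    for p, t in zip(predicted, target):
        if len(p) != len(t):
            raise ValueError("landmark points must have matching dimensions")
        for pc, tc in zip(p, t):
            total += (pc - tc) ** 2
            count += 1
    return total / count


def joint_landmark_loss(pred_2d, gt_2d, pred_3d, gt_3d, weight_3d=1.0):
    """Hypothetical joint objective: 2D-configuration loss plus weighted 3D-configuration loss."""
    return landmark_mse(pred_2d, gt_2d) + weight_3d * landmark_mse(pred_3d, gt_3d)


# Toy example with invented coordinates.
loss = joint_landmark_loss(
    pred_2d=[(0.0, 0.0), (1.0, 1.0)], gt_2d=[(0.0, 1.0), (1.0, 1.0)],
    pred_3d=[(2.0, 2.0)], gt_3d=[(2.0, 4.0)], weight_3d=0.5,
)
print(loss)  # 1.25
```

A single weight is the simplest way to balance the two terms; in practice such a weight would be tuned on validation data.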
Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions
Since 2020, breast cancer has had the highest incidence rate worldwide among
all malignancies. Breast imaging plays a significant role in early
diagnosis and intervention to improve the outcome of breast cancer patients. In
the past decade, deep learning has shown remarkable progress in breast cancer
imaging analysis, holding great promise in interpreting the rich information
and complex context of breast imaging modalities. Considering the rapid
improvement in the deep learning technology and the increasing severity of
breast cancer, it is critical to summarize past progress and identify future
challenges to be addressed. In this paper, we provide an extensive survey of
deep learning-based breast cancer imaging research, covering studies on
mammogram, ultrasound, magnetic resonance imaging, and digital pathology images
over the past decade. The major deep learning methods, publicly available
datasets, and applications on imaging-based screening, diagnosis, treatment
response prediction, and prognosis are described in detail. Drawn from the
findings of this survey, we present a comprehensive discussion of the
challenges and potential avenues for future research in deep learning-based
breast cancer imaging. Comment: Survey, 41 pages
A Survey on Computer Vision based Human Analysis in the COVID-19 Era
The emergence of COVID-19 has had a global and profound impact, not only on
society as a whole, but also on the lives of individuals. Various prevention
measures were introduced around the world to limit the transmission of the
disease, including face masks, mandates for social distancing and regular
disinfection in public spaces, and the use of screening applications. These
developments also triggered the need for novel and improved computer vision
techniques capable of (i) providing support to the prevention measures through
an automated analysis of visual data, on the one hand, and (ii) facilitating
normal operation of existing vision-based services, such as biometric
authentication schemes, on the other. Especially important here are computer
vision techniques that focus on the analysis of people and faces in visual data
and that have been affected the most by the partial occlusions introduced by
face-mask mandates. Such computer vision based human analysis techniques
include face and face-mask detection approaches, face recognition techniques,
crowd counting solutions, age and expression estimation procedures, models for
detecting face-hand interactions and many others, and have received considerable
attention in recent years. The goal of this survey is to provide an
introduction to the problems induced by COVID-19 into such research and to
present a comprehensive review of the work done in the computer vision based
human analysis field. Particular attention is paid to the impact of facial
masks on the performance of various methods and recent solutions to mitigate
this problem. Additionally, a detailed review of existing datasets useful for
the development and evaluation of methods for COVID-19 related applications is
also provided. Finally, to help advance the field further, a discussion of the
main open challenges and future research directions is given. Comment: Submitted to Image and Vision Computing, 44 pages, 7 figures
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals
translate speech between any two languages? While recent breakthroughs in
text-based models have pushed machine translation coverage beyond 200
languages, unified speech-to-speech translation models have yet to achieve
similar strides. More specifically, conventional speech-to-speech translation
systems rely on cascaded systems that perform translation progressively,
putting high-performing unified systems out of reach. To address these gaps, we
introduce SeamlessM4T, a single model that supports speech-to-speech
translation, speech-to-text translation, text-to-speech translation,
text-to-text translation, and automatic speech recognition for up to 100
languages. To build this, we used 1 million hours of open speech audio data to
learn self-supervised speech representations with w2v-BERT 2.0. Subsequently,
we created a multimodal corpus of automatically aligned speech translations.
Filtered and combined with human-labeled and pseudo-labeled data, we developed
the first multilingual system capable of translating from and into English for
both speech and text. On FLEURS, SeamlessM4T sets a new standard for
translations into multiple target languages, achieving an improvement of 20%
BLEU over the previous SOTA in direct speech-to-text translation. Compared to
strong cascaded models, SeamlessM4T improves the quality of into-English
translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in
speech-to-speech. In robustness tests, our system handles background noise and
speaker variation in speech-to-text tasks better than
the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and
added toxicity to assess translation safety. Finally, all contributions in this
work are open-sourced and accessible at
https://github.com/facebookresearch/seamless_communicatio
A continuum robotic platform for endoscopic non-contact laser surgery: design, control, and preclinical evaluation
The application of laser technologies in surgical interventions has been accepted in the clinical
domain due to their atraumatic properties. In addition to manual application of fibre-guided
lasers with tissue contact, non-contact transoral laser microsurgery (TLM) of laryngeal tumours
has become established in ENT surgery. However, TLM requires many years of surgical training
for tumour resection in order to preserve the function of adjacent organs and thus preserve the
patient’s quality of life. The positioning of the microscopic laser applicator outside the patient
can also impede a direct line-of-sight to the target area due to anatomical variability and limit
the working space. Further clinical challenges include positioning the laser focus on the tissue
surface, imaging, planning and performing laser ablation, and motion of the target area during
surgery. This dissertation aims to address the limitations of TLM through robotic approaches and
intraoperative assistance. Although a trend towards minimally invasive surgery is apparent, no
highly integrated platform for endoscopic delivery of focused laser radiation is available to date.
Likewise, there are no known devices that incorporate scene information from endoscopic imaging
into ablation planning and execution. To focus the laser beam close to the target tissue, this
work first presents miniaturised focusing optics that can be integrated into endoscopic systems.
Experimental trials characterise the optical properties and the ablation performance. A robotic
platform is realised for manipulation of the focusing optics. This is based on a variable-length
continuum manipulator. The latter enables movements of the endoscopic end effector in five
degrees of freedom with a mechatronic actuation unit. The kinematic modelling and control of the
robot are integrated into a modular framework that is evaluated experimentally. The manipulation
of focused laser radiation also requires precise adjustment of the focal position on the tissue. For
this purpose, visual, haptic and visual-haptic assistance functions are presented. These support
the operator during teleoperation to set an optimal working distance. Advantages of visual-haptic
assistance are demonstrated in a user study. The system performance and usability of the overall
robotic system are assessed in an additional user study. Analogous to a clinical scenario, the
subjects follow predefined target patterns with a laser spot. The mean positioning accuracy of the
spot is 0.5 mm. Finally, methods of image-guided robot control are introduced to automate laser
ablation. Experiments confirm a positive effect of proposed automation concepts on non-contact
laser surgery.
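The abstract does not say which kinematic model the modular framework uses. A common textbook starting point for continuum manipulators is the constant-curvature assumption, sketched below for a single bending section whose arc length can vary, as in a variable-length manipulator. This is an assumed stand-in: the function and parameter names are hypothetical, and the sketch models only the tip position of one section, not the five-degree-of-freedom model or the actuation mapping developed in the dissertation.

```python
import math


def constant_curvature_tip(length, curvature, bend_plane_angle):
    """Tip position (x, y, z) of one constant-curvature section.

    The base sits at the origin with its axis along z.
    length: arc length of the section (variable for a variable-length manipulator)
    curvature: kappa = 1 / bending radius; 0 means the section is straight
    bend_plane_angle: phi, rotation of the bending plane about the base z-axis, in radians
    """
    if abs(curvature) < 1e-12:
        # Straight section: the tip lies on the base axis.
        return (0.0, 0.0, length)
    theta = curvature * length  # total bending angle of the arc
    r = (1.0 - math.cos(theta)) / curvature  # in-plane offset from the base axis
    return (
        r * math.cos(bend_plane_angle),
        r * math.sin(bend_plane_angle),
        math.sin(theta) / curvature,
    )


print(constant_curvature_tip(5.0, 0.0, 0.0))  # (0.0, 0.0, 5.0)
```

For example, a section of arc length 10π/2 with curvature 0.1 bends through a quarter circle of radius 10, so with a bending-plane angle of 0 its tip lies at approximately (10, 0, 10).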
- …