222 research outputs found
A Framework for Devanagari Script-based Captcha
Human Interactive Proofs (HIPs) are automatic reverse Turing tests designed
to distinguish between various groups of users. Completely Automatic Public
Turing test to tell Computers and Humans Apart (CAPTCHA) is a HIP system that
distinguish between humans and malicious computer programs. Many CAPTCHAs have
been proposed in the literature that text-graphical based, audio-based,
puzzle-based and mathematical questions-based. The design and implementation of
CAPTCHAs fall in the realm of Artificial Intelligence. We aim to utilize
CAPTCHAs as a tool to improve the security of Internet based applications. In
this paper we present a framework for a text-based CAPTCHA based on Devanagari
script which can exploit the difference in the reading proficiency between
humans and computer programs. Our selection of Devanagari script-based CAPTCHA
is based on the fact that it is used by a large number of Indian languages
including Hindi which is the third most spoken language. There is potential for
an exponential rise in the applications that are likely to be developed in that
script thereby making it easy to secure Indian language based applications.Comment: 10 pages, 8 Figures, CCSEA 2011 - First International Conference,
Chennai, July 15-17, 201
Proposing a Scheme for Human Interactive Proof Test using Plasma Effect
Human Interactive Proofs (HIPs) are automatic inverse Turing tests, which are intended to differentiate between people and malicious computer programs. The mission of making good HIP system is a challenging issue, since the resultant HIP must be secure against attacks and in the same time it must be practical for humans. Text-based HIPs is one of the most popular HIPs types. It exploits the capability of humans to recite text images more than Optical Character Recognition (OCR), but the current text-based HIPs are not well-matched with rapid development of computer vision techniques, since they are either vey simply passed or very hard to resolve, thus this motivate that continuous efforts are required to improve the development of HIPs base text. In this paper, a new proposed scheme is designed for animated text-based HIP; this scheme exploits the gap between the usual perception of human and the ability of computer to mimic this perception and to achieve more secured and more human usable HIP. This scheme could prevent attacks since it's hard for the machine to distinguish characters with animation environment displayed by digital video, but it's certainly still easy and practical to be used by humans because humans are attuned to perceiving motion easily. The proposed scheme has been tested by many Optical Character Recognition applications, and it overtakes all these tests successfully and it achieves a high usability rate of 95%
Deep Self-Taught Learning for Handwritten Character Recognition
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep architectures,
i.e., function classes obtained by composing multiple non-linear
transformations. Self-taught learning (exploiting unlabeled examples or
examples from other distributions) has already been applied to deep learners,
but mostly to show the advantage of unlabeled examples. Here we explore the
advantage brought by {\em out-of-distribution examples}. For this purpose we
developed a powerful generator of stochastic variations and noise processes for
character images, including not only affine transformations but also slant,
local elastic deformations, changes in thickness, background images, grey level
changes, contrast, occlusion, and various types of noise. The
out-of-distribution examples are obtained from these highly distorted images or
by including examples of object classes different from those in the target test
set. We show that {\em deep learners benefit more from out-of-distribution
examples than a corresponding shallow learner}, at least in the area of
handwritten character recognition. In fact, we show that they beat previously
published results and reach human-level performance on both handwritten digit
classification and 62-class handwritten character recognition
CAPTCHA Types and Breaking Techniques: Design Issues, Challenges, and Future Research Directions
The proliferation of the Internet and mobile devices has resulted in
malicious bots access to genuine resources and data. Bots may instigate
phishing, unauthorized access, denial-of-service, and spoofing attacks to
mention a few. Authentication and testing mechanisms to verify the end-users
and prohibit malicious programs from infiltrating the services and data are
strong defense systems against malicious bots. Completely Automated Public
Turing test to tell Computers and Humans Apart (CAPTCHA) is an authentication
process to confirm that the user is a human hence, access is granted. This
paper provides an in-depth survey on CAPTCHAs and focuses on two main things:
(1) a detailed discussion on various CAPTCHA types along with their advantages,
disadvantages, and design recommendations, and (2) an in-depth analysis of
different CAPTCHA breaking techniques. The survey is based on over two hundred
studies on the subject matter conducted since 2003 to date. The analysis
reinforces the need to design more attack-resistant CAPTCHAs while keeping
their usability intact. The paper also highlights the design challenges and
open issues related to CAPTCHAs. Furthermore, it also provides useful
recommendations for breaking CAPTCHAs
Evaluating the usability and security of a video CAPTCHA
A CAPTCHA is a variation of the Turing test, in which a challenge is used to distinguish humans from computers (`bots\u27) on the internet. They are commonly used to prevent the abuse of online services. CAPTCHAs discriminate using hard articial intelligence problems: the most common type requires a user to transcribe distorted characters displayed within a noisy image. Unfortunately, many users and them frustrating and break rates as high as 60% have been reported (for Microsoft\u27s Hotmail). We present a new CAPTCHA in which users provide three words (`tags\u27) that describe a video. A challenge is passed if a user\u27s tag belongs to a set of automatically generated ground-truth tags. In an experiment, we were able to increase human pass rates for our video CAPTCHAs from 69.7% to 90.2% (184 participants over 20 videos). Under the same conditions, the pass rate for an attack submitting the three most frequent tags (estimated over 86,368 videos) remained nearly constant (5% over the 20 videos, roughly 12.9% over a separate sample of 5146 videos). Challenge videos were taken from YouTube.com. For each video, 90 tags were added from related videos to the ground-truth set; security was maintained by pruning all tags with a frequency 0.6%. Tag stemming and approximate matching were also used to increase human pass rates. Only 20.1% of participants preferred text-based CAPTCHAs, while 58.2% preferred our video-based alternative. Finally, we demonstrate how our technique for extending the ground truth tags allows for different usability/security trade-offs, and discuss how it can be applied to other types of CAPTCHAs
Mothers\u27 Adaptation to Caring for a New Baby
To date, most research on parents\u27 adjustment after adding a new baby to their family unit has focused on mothers\u27 initial transition to parenthood. This past research has examined changes in mothers\u27 marital satisfaction and perceived well-being across the transition, and has compared their prenatal expectations to their postnatal experiences. This project assessed first-time and experienced mothers\u27 stress and satisfaction associated with parenting, their adjustment to competing demands, and their perceived well-being longitudinally before and after the birth of a baby. Additionally, how maternal and child-related variables influenced the trajectory of mothers\u27 postnatal adaptation was assessed. These variables included mothers\u27 age, their education level, their prenatal expectations and postnatal experiences concerning shared infant care, their satisfaction with the division of infant caregiving, and their perceptions of their infant\u27s temperament. Mothers (N = 136) completed an online survey during their third trimester and additional online surveys when their baby was approximately 2, 4, 6, and 8 weeks old.;First-time mothers prenatally expected a more equal division of infant caregiving between themselves and their partners than did experienced mothers. Both first-time and experienced mothers reported less assistance from their partners than they had prenatally expected. Additionally, they experienced almost twice as many violated expectations than met expectations. Growth curve modeling revealed that a cubic function of time best fit the trajectory of mothers\u27 postnatal parenting satisfaction. Mothers reported less parenting satisfaction at 4 weeks, compared to 2 and 6 weeks, and reported stability in their satisfaction between 6 and 8 weeks. A quadratic function of time best fit the trajectories of mothers\u27 postnatal parenting stress and adjustment to the demands of their baby. Mothers reported more stress and difficulty adjusting to their baby\u27s demands at 4 and 6 weeks, compared to 2 and 8 weeks. A linear function of time best fit the trajectories of mothers\u27 adjustment to home demands, generalized state anxiety, and depressive symptoms. Mothers reported less difficulty meeting home demands, less generalized anxiety, and fewer depressive symptoms across the postnatal period. Mothers\u27 violated expectations were associated with level differences in all aspects of mothers\u27 postnatal adaptation except their adjustment to home demands. Specifically, more violated expectations, in number or in magnitude, were associated with poorer postnatal adaptation. Mothers\u27 violated expectations were not associated with the slope of mothers\u27 postnatal adaptation trajectories. Exploratory models revealed that other maternal and child-related variables also impacted the level and slope of mothers\u27 postnatal adaptation.;Overall, first-time and experienced mothers were more similar than different in regards to their postnatal adaptation. This study suggests that prior findings concerning adults\u27 initial transition to parenthood may also apply to adults during each addition of a new baby into the family unit. Additionally, mothers who reported less of a mismatch between their expectations and experiences concerning shared infant care had fewer issues adapting the postnatal period. Thus, methods to increase the assistance mothers receive from their partner should be sought. Limitations of this study and suggestions for future research are also discussed
Enhancing Online Security with Image-based Captchas
Given the data loss, productivity, and financial risks posed by security breaches, there is a great need to protect online systems from automated attacks. Completely Automated Public Turing Tests to Tell Computers and Humans Apart, known as CAPTCHAs, are commonly used as one layer in providing online security. These tests are intended to be easily solvable by legitimate human users while being challenging for automated attackers to successfully complete. Traditionally, CAPTCHAs have asked users to perform tasks based on text recognition or categorization of discrete images to prove whether or not they are legitimate human users. Over time, the efficacy of these CAPTCHAs has been eroded by improved optical character recognition, image classification, and machine learning techniques that can accurately solve many CAPTCHAs at rates approaching those of humans. These CAPTCHAs can also be difficult to complete using the touch-based input methods found on widely used tablets and smartphones.;This research proposes the design of CAPTCHAs that address the shortcomings of existing implementations. These CAPTCHAs require users to perform different image-based tasks including face detection, face recognition, multimodal biometrics recognition, and object recognition to prove they are human. These are tasks that humans excel at but which remain difficult for computers to complete successfully. They can also be readily performed using click- or touch-based input methods, facilitating their use on both traditional computers and mobile devices.;Several strategies are utilized by the CAPTCHAs developed in this research to enable high human success rates while ensuring negligible automated attack success rates. One such technique, used by fgCAPTCHA, employs image quality metrics and face detection algorithms to calculate a fitness value representing the simulated performance of human users and automated attackers, respectively, at solving each generated CAPTCHA image. A genetic learning algorithm uses these fitness values to determine customized generation parameters for each CAPTCHA image. Other approaches, including gradient descent learning, artificial immune systems, and multi-stage performance-based filtering processes, are also proposed in this research to optimize the generated CAPTCHA images.;An extensive RESTful web service-based evaluation platform was developed to facilitate the testing and analysis of the CAPTCHAs developed in this research. Users recorded over 180,000 attempts at solving these CAPTCHAs using a variety of devices. The results show the designs created in this research offer high human success rates, up to 94.6\% in the case of aiCAPTCHA, while ensuring resilience against automated attacks
Mathematical Expression Recognition based on Probabilistic Grammars
[EN] Mathematical notation is well-known and used all over the
world. Humankind has evolved from simple methods representing
countings to current well-defined math notation able to account for
complex problems. Furthermore, mathematical expressions constitute a
universal language in scientific fields, and many information
resources containing mathematics have been created during the last
decades. However, in order to efficiently access all that information,
scientific documents have to be digitized or produced directly in
electronic formats.
Although most people is able to understand and produce mathematical
information, introducing math expressions into electronic devices
requires learning specific notations or using editors. Automatic
recognition of mathematical expressions aims at filling this gap
between the knowledge of a person and the input accepted by
computers. This way, printed documents containing math expressions
could be automatically digitized, and handwriting could be used for
direct input of math notation into electronic devices.
This thesis is devoted to develop an approach for mathematical
expression recognition. In this document we propose an approach for
recognizing any type of mathematical expression (printed or
handwritten) based on probabilistic grammars. In order to do so, we
develop the formal statistical framework such that derives several
probability distributions. Along the document, we deal with the
definition and estimation of all these probabilistic sources of
information. Finally, we define the parsing algorithm that globally
computes the most probable mathematical expression for a given input
according to the statistical framework.
An important point in this study is to provide objective performance
evaluation and report results using public data and standard
metrics. We inspected the problems of automatic evaluation in this
field and looked for the best solutions. We also report several
experiments using public databases and we participated in several
international competitions. Furthermore, we have released most of the
software developed in this thesis as open source.
We also explore some of the applications of mathematical expression
recognition. In addition to the direct applications of transcription
and digitization, we report two important proposals. First, we
developed mucaptcha, a method to tell humans and computers apart by
means of math handwriting input, which represents a novel application
of math expression recognition. Second, we tackled the problem of
layout analysis of structured documents using the statistical
framework developed in this thesis, because both are two-dimensional
problems that can be modeled with probabilistic grammars.
The approach developed in this thesis for mathematical expression
recognition has obtained good results at different levels. It has
produced several scientific publications in international conferences
and journals, and has been awarded in international competitions.[ES] La notación matemática es bien conocida y se utiliza en todo el
mundo. La humanidad ha evolucionado desde simples métodos para
representar cuentas hasta la notación formal actual capaz de modelar
problemas complejos. Además, las expresiones matemáticas constituyen
un idioma universal en el mundo científico, y se han creado muchos
recursos que contienen matemáticas durante las últimas décadas. Sin
embargo, para acceder de forma eficiente a toda esa información, los
documentos científicos han de ser digitalizados o producidos
directamente en formatos electrónicos.
Aunque la mayoría de personas es capaz de entender y producir
información matemática, introducir expresiones matemáticas en
dispositivos electrónicos requiere aprender notaciones especiales o
usar editores. El reconocimiento automático de expresiones matemáticas
tiene como objetivo llenar ese espacio existente entre el conocimiento
de una persona y la entrada que aceptan los ordenadores. De este modo,
documentos impresos que contienen fórmulas podrían digitalizarse
automáticamente, y la escritura se podría utilizar para introducir
directamente notación matemática en dispositivos electrónicos.
Esta tesis está centrada en desarrollar un método para reconocer
expresiones matemáticas. En este documento proponemos un método para
reconocer cualquier tipo de fórmula (impresa o manuscrita) basado en
gramáticas probabilísticas. Para ello, desarrollamos el marco
estadístico formal que deriva varias distribuciones de probabilidad. A
lo largo del documento, abordamos la definición y estimación de todas
estas fuentes de información probabilística. Finalmente, definimos el
algoritmo que, dada cierta entrada, calcula globalmente la expresión
matemática más probable de acuerdo al marco estadístico.
Un aspecto importante de este trabajo es proporcionar una evaluación
objetiva de los resultados y presentarlos usando datos públicos y
medidas estándar. Por ello, estudiamos los problemas de la evaluación
automática en este campo y buscamos las mejores soluciones. Asimismo,
presentamos diversos experimentos usando bases de datos públicas y
hemos participado en varias competiciones internacionales. Además,
hemos publicado como código abierto la mayoría del software
desarrollado en esta tesis.
También hemos explorado algunas de las aplicaciones del reconocimiento
de expresiones matemáticas. Además de las aplicaciones directas de
transcripción y digitalización, presentamos dos propuestas
importantes. En primer lugar, desarrollamos mucaptcha, un método para
discriminar entre humanos y ordenadores mediante la escritura de
expresiones matemáticas, el cual representa una novedosa aplicación
del reconocimiento de fórmulas. En segundo lugar, abordamos el
problema de detectar y segmentar la estructura de documentos
utilizando el marco estadístico formal desarrollado en esta tesis,
dado que ambos son problemas bidimensionales que pueden modelarse con
gramáticas probabilísticas.
El método desarrollado en esta tesis para reconocer expresiones
matemáticas ha obtenido buenos resultados a diferentes niveles. Este
trabajo ha producido varias publicaciones en conferencias
internacionales y revistas, y ha sido premiado en competiciones
internacionales.[CA] La notació matemàtica és ben coneguda i s'utilitza a tot el món. La
humanitat ha evolucionat des de simples mètodes per representar
comptes fins a la notació formal actual capaç de modelar
problemes complexos. A més, les expressions matemàtiques
constitueixen un idioma universal al món científic, i s'han creat
molts recursos que contenen matemàtiques durant les últimes
dècades. No obstant això, per accedir de forma eficient a tota
aquesta informació, els documents científics han de ser
digitalitzats o produïts directament en formats electrònics.
Encara que la majoria de persones és capaç d'entendre i produir
informació matemàtica, introduir expressions matemàtiques en
dispositius electrònics requereix aprendre notacions especials o usar
editors. El reconeixement automàtic d'expressions matemàtiques
té per objectiu omplir aquest espai existent entre el coneixement
d'una persona i l'entrada que accepten els ordinadors. D'aquesta
manera, documents impresos que contenen fórmules podrien
digitalitzar-se automàticament, i l'escriptura es podria utilitzar per
introduir directament notació matemàtica en dispositius electrònics.
Aquesta tesi està centrada en desenvolupar un mètode per reconèixer
expressions matemàtiques. En aquest document proposem un mètode per
reconèixer qualsevol tipus de fórmula (impresa o manuscrita) basat en
gramàtiques probabilístiques. Amb aquesta finalitat, desenvolupem el
marc estadístic formal que deriva diverses distribucions de
probabilitat. Al llarg del document, abordem la definició i estimació
de totes aquestes fonts d'informació probabilística. Finalment,
definim l'algorisme que, donada certa entrada, calcula globalment
l'expressió matemàtica més probable d'acord al marc estadístic.
Un aspecte important d'aquest treball és proporcionar una avaluació
objectiva dels resultats i presentar-los usant dades públiques i
mesures estàndard. Per això, estudiem els problemes de l'avaluació
automàtica en aquest camp i busquem les millors solucions. Així
mateix, presentem diversos experiments usant bases de dades públiques
i hem participat en diverses competicions internacionals. A més, hem
publicat com a codi obert la majoria del software desenvolupat en
aquesta tesi.
També hem explorat algunes de les aplicacions del reconeixement
d'expressions matemàtiques. A més de les aplicacions directes de
transcripció i digitalització, presentem dues propostes
importants. En primer lloc, desenvolupem mucaptcha, un mètode per
discriminar entre humans i ordinadors mitjançant l'escriptura
d'expressions matemàtiques, el qual representa una nova aplicació del
reconeixement de fórmules. En segon lloc, abordem el problema de
detectar i segmentar l'estructura de documents utilitzant el marc
estadístic formal desenvolupat en aquesta tesi, donat que ambdós són
problemes bidimensionals que poden modelar-se amb gramàtiques
probabilístiques.
El mètode desenvolupat en aquesta tesi per reconèixer expressions
matemàtiques ha obtingut bons resultats a diferents nivells. Aquest
treball ha produït diverses publicacions en conferències
internacionals i revistes, i ha sigut premiat en competicions
internacionals.Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665TESI
- …