9,306 research outputs found

    Real-time on-board pedestrian detection using generic single-stage algorithms and on-road databases

    [EN] Pedestrian detection is a particular case of object detection that helps to reduce accidents in advanced driver-assistance systems and autonomous vehicles. It is not an easy task because of the variability of the objects and the time constraints. A performance comparison of object detection methods, including both GPU and non-GPU implementations over a variety of on-road-specific databases, is provided. Computer vision multi-class object detection can be integrated into sensor fusion modules, where recall is preferred over precision. For this reason, ad hoc training with a single pedestrian class has been performed, achieving a significant increase in recall. Experiments have been carried out on several architectures, and a special effort has been devoted to achieving a feasible computational time for a real-time system. Finally, an analysis of the input image size makes it possible to fine-tune the model and obtain better results at a practical computational cost.
    The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the PRYSTINE project, which received funding within the Electronic Components and Systems for European Leadership Joint Undertaking (ECSEL JU) in collaboration with the European Union's H2020 Framework Programme and National Authorities, under grant agreement no. 783190. It was also funded by Generalitat Valenciana through the Instituto Valenciano de Competitividad Empresarial (IVACE).
    Ortiz, V.; Del Tejo Catala, O.; Salvador Igual, I.; Perez-Cortes, J. (2020). Real-time on-board pedestrian detection using generic single-stage algorithms and on-road databases. International Journal of Advanced Robotic Systems, 17(5). https://doi.org/10.1177/1729881420929175
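    The abstract above reports recall gains from single-class pedestrian training. For context, the sketch below shows one common way such recall figures are computed, matching predicted and ground-truth boxes at an IoU threshold of 0.5. It is a generic illustration, not the authors' evaluation code; the box format and threshold are assumptions.

```python
# Hypothetical sketch: recall of a pedestrian detector at IoU >= 0.5.
# Boxes are [x1, y1, x2, y2]; this is not the paper's evaluation code.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def recall(gt_boxes, pred_boxes, thr=0.5):
    """Fraction of ground-truth boxes matched by at least one prediction."""
    if len(gt_boxes) == 0:
        return 1.0
    matched, used = 0, set()
    for g in gt_boxes:
        best, best_j = 0.0, None
        for j, p in enumerate(pred_boxes):
            if j in used:
                continue
            o = iou(g, p)
            if o > best:
                best, best_j = o, j
        if best >= thr and best_j is not None:
            matched += 1
            used.add(best_j)
    return matched / len(gt_boxes)

# Toy example: two pedestrians, one detected well, one missed.
gt = np.array([[10, 20, 50, 120], [200, 30, 240, 130]], dtype=float)
pred = np.array([[12, 22, 48, 118]], dtype=float)
print(recall(gt, pred))  # 0.5
```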

    Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning

    Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervision is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.
    Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
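    As a rough illustration of the multi-fold idea described above, the following sketch splits the positive images into K folds and, in each relocalization step, re-selects a window in an image using a classifier trained only on the other folds, so the detector never rescores images it was trained on. The features, window proposals and classifier here are random placeholders; this is a simplified skeleton under those assumptions, not the authors' implementation.

```python
# Simplified multi-fold MIL relocalization skeleton (illustrative only).
# Each positive image is a "bag" of candidate-window feature vectors; the
# currently selected window plays the role of a positive training example.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
D, K, ITERS = 64, 3, 5                                     # feature dim, folds, iterations
pos_bags = [rng.normal(size=(20, D)) for _ in range(30)]   # candidate windows per positive image
neg_feats = rng.normal(size=(200, D))                      # windows from negative images

selected = [0] * len(pos_bags)              # index of the selected window per bag
folds = np.arange(len(pos_bags)) % K        # assign positive images to K folds

for _ in range(ITERS):
    for k in range(K):
        train_idx = [i for i in range(len(pos_bags)) if folds[i] != k]
        X = np.vstack([pos_bags[i][selected[i]] for i in train_idx] + [neg_feats])
        y = np.array([1] * len(train_idx) + [0] * len(neg_feats))
        clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
        # Relocalize only on the held-out fold, with a detector that never saw it.
        for i in range(len(pos_bags)):
            if folds[i] == k:
                selected[i] = int(np.argmax(clf.decision_function(pos_bags[i])))

print(selected[:10])
```

    Training on K-1 folds and relocalizing only on the held-out fold is what keeps the detector from reinforcing its own early localization mistakes, which is the failure mode the abstract describes for standard MIL with high-dimensional features.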

    The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

    [EN] This paper presents the 'NoisyOffice' database. It consists of images of printed text documents with noise mainly caused by the uncleanliness of a generic office, such as coffee stains and footprints on documents, or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for cleaning, binarization and enhancement of noisy images of grayscale text documents. As an example, several experiments on image enhancement and binarization using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using the database is described in order to show its suitability for benchmarking image processing systems.
    This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER funds.
    Castro-Bleda, MJ.; España Boquera, S.; Pastor Pellicer, J.; Zamora Martínez, FJ. (2020). The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing. The Computer Journal, 63(11), 1658-1667. https://doi.org/10.1093/comjnl/bxz098
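    The corpus described above pairs noisy and clean images so that a filter can be trained in a supervised, pixel-by-pixel fashion. The sketch below shows a minimal filter of that kind, mapping a small noisy patch to the clean value of its centre pixel with a multilayer perceptron; the toy data, patch size and regressor are assumptions for illustration, not the setup used in the paper.

```python
# Minimal sketch of a supervised image-cleaning filter: a regressor maps each
# noisy w x w patch to the clean value of its centre pixel. Illustrative only;
# the toy images and patch size are placeholders for real corpus pairs.
import numpy as np
from sklearn.neural_network import MLPRegressor

def patches(img, w=9):
    """Return one flattened w x w patch per pixel (reflect-padded borders)."""
    r = w // 2
    padded = np.pad(img, r, mode="reflect")
    out = np.empty((img.size, w * w), dtype=np.float32)
    k = 0
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[k] = padded[i:i + w, j:j + w].ravel()
            k += 1
    return out

# Placeholder data: in practice these would be aligned noisy/clean image pairs
# from the corpus, with grey levels scaled to [0, 1].
rng = np.random.default_rng(0)
clean = (rng.random((40, 60)) > 0.7).astype(np.float32)        # toy "text" image
noisy = np.clip(clean + 0.3 * rng.normal(size=clean.shape), 0, 1).astype(np.float32)

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
model.fit(patches(noisy), clean.ravel())                       # learn the filter

cleaned = model.predict(patches(noisy)).reshape(clean.shape)   # apply it
print(float(np.mean((cleaned - clean) ** 2)))                  # reconstruction error
```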

    Expanded Parts Model for Semantic Description of Humans in Still Images

    We introduce an Expanded Parts Model (EPM) for recognizing human attributes (e.g. young, short hair, wearing a suit) and actions (e.g. running, jumping) in still images. An EPM is a collection of part templates which are learnt discriminatively to explain specific scale-space regions in the images (in human-centric coordinates). This is in contrast to current models, which consist of relatively few (i.e. a mixture of) 'average' templates. An EPM uses only a subset of the parts to score an image and scores the image sparsely in space, i.e. it ignores redundant and random background in an image. To learn our model, we propose an algorithm which automatically mines parts and learns the corresponding discriminative templates, together with their respective locations, from a large number of candidate parts. We validate our method on three recent challenging datasets of human attributes and actions, obtaining convincing qualitative and state-of-the-art quantitative results.
    Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
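    The sparse scoring idea sketched in the abstract above (score an image with only the best-responding subset of parts) can be illustrated roughly as follows. The part templates, region features and the choice of k are random stand-ins invented for the example and do not reproduce the EPM training or scoring details.

```python
# Rough illustration of sparse part-based scoring: keep only the top-k part
# responses over candidate regions. Templates and features are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_parts, n_regions, dim, k = 50, 120, 32, 10

templates = rng.normal(size=(n_parts, dim))     # discriminative part templates
regions = rng.normal(size=(n_regions, dim))     # features of scale-space regions

# Each part takes its best response over all candidate regions ...
responses = templates @ regions.T               # shape (n_parts, n_regions)
best_per_part = responses.max(axis=1)

# ... and the image score sums only the k strongest parts, so most parts
# (and the background regions they would otherwise explain) are ignored.
top_k = np.sort(best_per_part)[-k:]
image_score = float(top_k.sum())
print(image_score)
```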