14,537 research outputs found

    Horizontal accuracy assessment of very high resolution Google Earth images in the city of Rome, Italy

    Google Earth (GE) has recently become the focus of increasing interest and popularity among the online virtual globes used in scientific research projects, owing to the free, easily accessed satellite imagery it provides with global coverage. Nevertheless, the use of this service raises several research questions about the quality and uncertainty of its spatial data (e.g. positional accuracy, precision, consistency), with implications for potential uses such as data collection and validation. This paper analyzes the horizontal accuracy of very high resolution (VHR) GE images of the city of Rome (Italy) for the years 2007, 2011, and 2013. The evaluation uses both Global Positioning System ground-truth data and cadastral photogrammetric vertices as independent check points. The validation process includes the comparison of histograms and graph plots, tests of normality, analysis of azimuthal error directions, and the calculation of standard statistical parameters. The results show that the GE VHR imagery of Rome has an overall positional accuracy close to 1 m, sufficient for deriving ground-truth samples, measurements, and large-scale planimetric maps.
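
    To illustrate the kind of computation such an assessment involves, the following is a minimal sketch (not the authors' code) of standard horizontal accuracy statistics for paired check points; the function name, coordinate arrays, and synthetic example data are assumptions for illustration.

        import numpy as np

        def horizontal_accuracy(ge_xy, ref_xy):
            # ge_xy, ref_xy: (N, 2) arrays of easting/northing in metres,
            # e.g. Google Earth-derived points vs GPS/cadastral check points.
            d = ge_xy - ref_xy                            # per-point error vectors (dE, dN)
            dist = np.hypot(d[:, 0], d[:, 1])             # horizontal error magnitudes
            azimuth = np.degrees(np.arctan2(d[:, 0], d[:, 1])) % 360.0  # error bearings
            return {
                "mean_m": dist.mean(),                    # mean horizontal error
                "rmse_m": np.sqrt((dist ** 2).mean()),    # root-mean-square error
                "std_m": dist.std(ddof=1),                # sample standard deviation
                "ce90_m": np.percentile(dist, 90),        # 90th-percentile circular error
                "azimuth_deg": azimuth,                   # directions for azimuthal analysis
            }

        # Synthetic example: 30 check points with ~0.7 m random displacement.
        rng = np.random.default_rng(0)
        ref = rng.uniform(0, 1000, size=(30, 2))
        ge = ref + rng.normal(0.0, 0.7, size=(30, 2))
        print(horizontal_accuracy(ge, ref)["rmse_m"])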

    ๋‹จ์ผ ์ด๋ฏธ์ง€๋กœ๋ถ€ํ„ฐ ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์˜ ํ‘œํ˜„์  ์ „์‹  3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ ์ถ”์ •

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2021 (advisor: Kyoung Mu Lee).

    Humans are the most central and interesting objects in our lives: many human-centric techniques and studies, such as motion capture and human-computer interaction, have been proposed by both industry and academia. Recovering the accurate 3D geometry of humans (i.e., 3D human pose and shape) is a key component of these human-centric techniques and studies. With the rapid spread of cameras, a single RGB image has become a popular input, and many single-RGB-based 3D human pose and shape estimation methods have been proposed.

    The 3D pose and shape of the whole body, which includes the hands and face, provides expressive and rich information, including human intention and feeling. Unfortunately, recovering whole-body 3D pose and shape is highly challenging; thus, it has been attempted by only a few works, called expressive methods. Instead of directly solving expressive 3D pose and shape estimation, the literature has developed methods that recover the 3D pose and shape of each part (i.e., body, hands, and face) separately, called part-specific methods. There are several further simplifications. For example, many works estimate only the 3D pose, without shape, because the additional 3D shape estimation makes the problem much harder. In addition, most works assume a single-person case and do not consider the multi-person case. The current literature can therefore be categorized in several ways: 1) part-specific methods vs. expressive methods, 2) 3D human pose estimation methods vs. 3D human pose and shape estimation methods, and 3) methods for a single person vs. methods for multiple persons. The difficulty increases as the outputs become richer: from part-specific to expressive, from 3D pose estimation to 3D pose and shape estimation, and from the single-person case to the multi-person case.

    This dissertation introduces three approaches towards expressive 3D multi-person pose and shape estimation from a single image, so that the output finally provides the richest information: the first approach performs 3D multi-person body pose estimation, the second 3D multi-person body pose and shape estimation, and the final one expressive 3D multi-person pose and shape estimation. Each approach tackles critical limitations of previous state-of-the-art methods, bringing the literature closer to the real-world environment.

    First, a 3D multi-person body pose estimation framework is introduced. In contrast to the single-person case, the multi-person case additionally requires the camera-relative 3D positions of the persons, and estimating a camera-relative 3D position from a single image involves high depth ambiguity. The proposed framework combines a deep image feature with the camera pinhole model to recover the camera-relative 3D position. Because the framework can be combined with any single-person 3D pose and shape estimation method to obtain 3D multi-person pose and shape, the following two approaches focus on the single-person case and can easily be extended to the multi-person case through this framework. Second, a 3D multi-person body pose and shape estimation method is introduced. It extends the first approach to additionally predict accurate 3D shape, and it significantly outperforms previous state-of-the-art methods thanks to a new target representation, the lixel-based 1D heatmap.
    Finally, an expressive 3D multi-person pose and shape estimation method is introduced. It integrates the part-specific 3D pose and shape estimates of the above approaches and can therefore provide expressive 3D human pose and shape. In addition, it boosts the accuracy of the estimated 3D pose and shape through a 3D positional-pose-guided 3D rotational pose prediction system. The proposed approaches successfully overcome the limitations of previous state-of-the-art methods, and extensive experimental results demonstrate their superiority both qualitatively and quantitatively.
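
    As a hedged illustration of the pinhole-model idea behind the first approach (a simplified stand-in, not the dissertation's exact formulation, which learns a correction from a deep image feature), the camera-relative depth of a person can be recovered from an assumed real-world size and the detected size in pixels; all names and numbers below are illustrative.

        import numpy as np

        def pinhole_depth(focal_px, real_height_m, bbox_height_px):
            # Pinhole relation: an object of physical height H imaged with
            # focal length f (pixels) spans h pixels at depth Z = f * H / h.
            return focal_px * real_height_m / bbox_height_px

        def backproject(u, v, depth, fx, fy, cx, cy):
            # Lift pixel (u, v) at the recovered depth to camera coordinates.
            return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

        # Two detected persons, assumed ~1.7 m tall: the smaller box is farther.
        f = 1500.0                                  # focal length in pixels (illustrative)
        for box_height_px in (300.0, 150.0):
            z = pinhole_depth(f, 1.7, box_height_px)
            print(backproject(960.0, 540.0, z, f, f, 960.0, 540.0))
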
๋‹จ์ผ ์‚ฌ๋žŒ์˜ ๊ฒฝ์šฐ์™€๋Š” ๋‹ค๋ฅด๊ฒŒ ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์˜ ๊ฒฝ์šฐ ์‚ฌ๋žŒ๋งˆ๋‹ค ์นด๋ฉ”๋ผ ์ƒ๋Œ€์ ์ธ 3D ์œ„์น˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค. ์นด๋ฉ”๋ผ ์ƒ๋Œ€์ ์ธ 3D ์œ„์น˜๋ฅผ ๋‹จ์ผ ์ด๋ฏธ์ง€๋กœ๋ถ€ํ„ฐ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ๋†’์€ ๊นŠ์ด ๋ชจํ˜ธ์„ฑ์„ ๋™๋ฐ˜ํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์‹ฌ์ธต ์ด๋ฏธ์ง€ ํ”ผ์ณ์™€ ์นด๋ฉ”๋ผ ํ•€ํ™€ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์นด๋ฉ”๋ผ ์ƒ๋Œ€์ ์ธ 3D ์œ„์น˜๋ฅผ ๋ณต์›ํ•œ๋‹ค. ์ด ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์–ด๋–ค ๋‹จ์ผ ์‚ฌ๋žŒ์„ ์œ„ํ•œ 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ ์ถ”์ • ๋ฐฉ๋ฒ•๊ณผ ํ•ฉ์ณ์งˆ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๋‹ค์Œ์— ์†Œ๊ฐœ๋  ๋‘ ์ ‘๊ทผ๋ฒ•์€ ์˜ค์ง ๋‹จ์ผ ์‚ฌ๋žŒ์„ ์œ„ํ•œ 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ ์ถ”์ •์— ์ดˆ์ ์„ ๋งž์ถ˜๋‹ค. ๋‹ค์Œ์— ์†Œ๊ฐœ๋  ๋‘ ์ ‘๊ทผ๋ฒ•์—์„œ ์ œ์•ˆ๋œ ๋‹จ์ผ ์‚ฌ๋žŒ์„ ์œ„ํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ์ฒซ ๋ฒˆ์งธ ์ ‘๊ทผ๋ฒ•์—์„œ ์†Œ๊ฐœ๋˜๋Š” ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์„ ์œ„ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‰ฝ๊ฒŒ ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์˜ ๊ฒฝ์šฐ๋กœ ํ™•์žฅํ•  ์ˆ˜ ์žˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ์ ‘๊ทผ๋ฒ•์€ ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์„ ์œ„ํ•œ 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ ์ถ”์ • ๋ฐฉ๋ฒ•์ด๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์ฒซ ๋ฒˆ์งธ ์ ‘๊ทผ๋ฒ•์„ ํ™•์žฅํ•˜์—ฌ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ์ถ”๊ฐ€๋กœ 3D ํ˜•ํƒœ๋ฅผ ์ถ”์ •ํ•˜๊ฒŒ ํ•œ๋‹ค. ๋†’์€ ์ •ํ™•๋„๋ฅผ ์œ„ํ•ด ๋ฆญ์…€ ๊ธฐ๋ฐ˜์˜ 1D ํžˆํŠธ๋งต์„ ์ œ์•ˆํ•˜๊ณ , ์ด๋กœ ์ธํ•ด ๊ธฐ์กด์— ๋ฐœํ‘œ๋œ ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ํฐ ํญ์œผ๋กœ ๋†’์€ ์„ฑ๋Šฅ์„ ์–ป๋Š”๋‹ค. ๋งˆ์ง€๋ง‰ ์ ‘๊ทผ๋ฒ•์€ ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์„ ์œ„ํ•œ ํ‘œํ˜„์ ์ธ 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ ์ถ”์ • ๋ฐฉ๋ฒ•์ด๋‹ค. ์ด๊ฒƒ์€ ๋ชธ, ์†, ๊ทธ๋ฆฌ๊ณ  ์–ผ๊ตด๋งˆ๋‹ค 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ๋ฅผ ํ•˜๋‚˜๋กœ ํ†ตํ•ฉํ•˜์—ฌ ํ‘œํ˜„์ ์ธ 3D ์ž์„ธ ๋ฐ ํ˜•ํƒœ๋ฅผ ์–ป๋Š”๋‹ค. ๊ฒŒ๋‹ค๊ฐ€, ์ด๊ฒƒ์€ 3D ์œ„์น˜ ํฌ์ฆˆ ๊ธฐ๋ฐ˜์˜ 3D ํšŒ์ „ ํฌ์ฆˆ ์ถ”์ •๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•จ์œผ๋กœ์จ ๊ธฐ์กด์— ๋ฐœํ‘œ๋œ ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ํ›จ์”ฌ ๋†’์€ ์„ฑ๋Šฅ์„ ์–ป๋Š”๋‹ค. ์ œ์•ˆ๋œ ์ ‘๊ทผ๋ฒ•๋“ค์€ ๊ธฐ์กด์— ๋ฐœํ‘œ๋˜์—ˆ๋˜ ๋ฐฉ๋ฒ•๋“ค์ด ๊ฐ–๋Š” ํ•œ๊ณ„์ ๋“ค์„ ์„ฑ๊ณต์ ์œผ๋กœ ๊ทน๋ณตํ•œ๋‹ค. 
    Contents:
    1 Introduction: Background and Research Issues; Outline of the Dissertation
    2 3D Multi-Person Pose Estimation: Introduction; Related works; Overview of the proposed model; DetectNet; PoseNet (model design, loss function); RootNet (model design, camera normalization, network architecture, loss function); Implementation details; Experiment (dataset and evaluation metric, experimental protocol, ablation study, comparison with state-of-the-art methods, running time of the proposed framework, qualitative results); Conclusion
    3 3D Multi-Person Pose and Shape Estimation: Introduction; Related works; I2L-MeshNet (PoseNet, MeshNet, final 3D human pose and mesh, loss functions); Implementation details; Experiment (datasets and evaluation metrics, ablation study, comparison with state-of-the-art methods); Conclusion
    4 Expressive 3D Multi-Person Pose and Shape Estimation: Introduction; Related works; Pose2Pose (PositionNet, RotationNet); Expressive 3D human pose and mesh estimation (body part, hand part, face part, training the networks, integration of all parts in the testing stage); Implementation details; Experiment (training sets and evaluation metrics, ablation study, comparison with state-of-the-art methods, running time); Conclusion
    5 Conclusion and Future Work: Summary and Contributions of the Dissertation; Future Directions (global context-aware 3D multi-person pose estimation; unified framework for expressive 3D human pose and shape estimation; enhancing appearance diversity of images captured from a multi-view studio; extension to video for temporally consistent estimation; 3D clothed human shape estimation in the wild; robust human action recognition from video)
    Bibliography

    Code Prediction by Feeding Trees to Transformers

    We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. First, we report that the recently proposed Transformer architecture, even out of the box, outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a Facebook-internal Python corpus. Our code and data-preparation pipeline will be made available as open source.
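
    As one illustrative way of exposing syntactic structure to a sequence model (not necessarily the serialization the paper settles on; it compares several), a Python AST can be linearized by a pre-order traversal into a flat token stream suitable for next-token prediction. The helper below is a hypothetical sketch.

        import ast

        def linearize(tree):
            # Pre-order traversal of a Python AST: interior nodes contribute
            # their type name, leaves (identifiers, constants) their value.
            tokens = []
            def visit(node):
                tokens.append(type(node).__name__)
                for _field, value in ast.iter_fields(node):
                    if isinstance(value, ast.AST):
                        visit(value)
                    elif isinstance(value, list):
                        for item in value:
                            if isinstance(item, ast.AST):
                                visit(item)
                    elif value is not None:
                        tokens.append(repr(value))
            visit(tree)
            return tokens

        print(linearize(ast.parse("x = foo(1) + y")))
        # ['Module', 'Assign', 'Name', "'x'", 'Store', 'BinOp', 'Call', ...]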

    Hybrid Architectures for Object Pose and Velocity Tracking at the Intersection of Kalman Filtering and Machine Learning

    The study of object perception algorithms is fundamental to the development of robotic platforms capable of planning and executing actions involving objects with high precision, reliability, and safety. Indeed, this topic has been explored extensively in both the robotics and computer vision research communities using diverse techniques, ranging from classical Bayesian filtering to more modern Machine Learning techniques, and complementary sensing modalities such as vision and touch. Recently, the ever-growing availability of tools for synthetic data generation has substantially increased the adoption of Deep Learning for both 2D tasks, such as object detection and segmentation, and 6D tasks, such as object pose estimation and tracking. The proposed methods exhibit interesting performance on computer vision benchmarks and robotic tasks, e.g. using object pose estimation for grasp planning. Nonetheless, they generally do not consider useful information connected with the physics of the object motion or the peculiarities and requirements of robotic systems: examples are the necessity to provide well-behaved output signals for robot motion control and the possibility to integrate modelling priors on the motion of the object as well as algorithmic priors. These help exploit the temporal correlation of the object poses, handle pose uncertainties, and mitigate the effect of outliers. Most of these concepts are considered in classical approaches, e.g. from the Bayesian and Kalman filtering literature, which however are not as powerful as Deep Learning in handling visual data. As a consequence, the development of hybrid architectures that combine the best features of both worlds is particularly appealing in a robotic setting. Motivated by these considerations, in this Thesis I devise hybrid architectures for object perception, focusing on the task of object pose and velocity tracking. The proposed architectures use Kalman filtering supported by state-of-the-art Deep Neural Networks to track the 6D pose and velocity of objects from images. The devised solutions exhibit state-of-the-art performance and increased modularity, and they do not require training to implement the actual tracking behaviors. Furthermore, they can track even fast object motions despite the possibly non-negligible inference times of the adopted neural networks. Also, by relying on data-driven Kalman filtering, I explored a paradigm that makes it possible to track the state of systems that cannot easily be modeled analytically. Specifically, I used this approach to learn the measurement model of soft 3D tactile sensors and to address the problem of tracking the sliding motion of hand-held objects.
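
    A minimal sketch of the hybrid idea, assuming a constant-velocity motion prior and treating a neural network's 3D position output as the Kalman measurement (orientation handling, which full 6D pose tracking requires, is omitted); the class and the synthetic measurement schedule are illustrative, not the thesis's architecture.

        import numpy as np

        class ConstantVelocityKF:
            # Kalman filter with state [position(3), velocity(3)] and a
            # constant-velocity motion prior; measurements are 3D positions.
            def __init__(self, dt, q=1e-3, r=1e-2):
                self.x = np.zeros(6)
                self.P = np.eye(6)
                self.F = np.eye(6)
                self.F[:3, 3:] = dt * np.eye(3)            # p_k = p_{k-1} + v * dt
                self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
                self.Q = q * np.eye(6)                     # process noise
                self.R = r * np.eye(3)                     # measurement noise

            def predict(self):
                self.x = self.F @ self.x
                self.P = self.F @ self.P @ self.F.T + self.Q

            def update(self, z):
                S = self.H @ self.P @ self.H.T + self.R
                K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
                self.x = self.x + K @ (z - self.H @ self.x)
                self.P = (np.eye(6) - K @ self.H) @ self.P

        # The filter predicts at frame rate; the (slower) network output is
        # fused only when it arrives, keeping the track smooth and well-behaved.
        kf = ConstantVelocityKF(dt=1 / 30)
        rng = np.random.default_rng(0)
        for t in range(90):
            kf.predict()
            if t % 10 == 0:                                # network result every 10 frames
                z = np.array([0.01 * t, 0.0, 0.5]) + rng.normal(0.0, 0.01, 3)
                kf.update(z)
        print(kf.x[:3], kf.x[3:])                          # tracked position and velocity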

    Evaluating the differences and accuracies between GNSS applications using PPP

    Global Navigation Satellite Systems (GNSS) are satellite systems with global coverage. Several GNSS are in operation today, including the United States' NAVSTAR Global Positioning System (GPS), the Russian GLONASS, the Chinese Beidou, and the European Union's Galileo system. The Galileo and Beidou systems are currently being upgraded to achieve more sustainable and comprehensive worldwide coverage, ultimately providing users with a broader choice of systems and wider, more reliable coverage. In recent years, the ability to use extra satellites made available through the GLONASS and Beidou systems, in addition to the GPS constellation, has enhanced the capabilities and possible applications of the precise point positioning (PPP) method. PPP has been used for the last decade as a cost-effective alternative to conventional Differential GPS (DGPS), with an estimated precision adequate for many applications. PPP requires handling different types of errors using proper models, and its precision varies with the use of observations from different satellite systems (GPS, GLONASS, and mixed GPS/GLONASS/Beidou) and with the duration of observations. However, the fundamental differences between GPS, GLONASS, Beidou, and Galileo and the lack of a fully tested global multi-GNSS tracking network necessitate the evaluation of their combined use; more studies are required to confirm the reliability and accuracy of the results obtained by the various PPP methods, which is outside the scope of this paper. This paper evaluates and analyses the accuracy and reliability of different GNSS using the PPP technique, with emphasis on the function and performance of single systems compared with combined GNSS. A methodology was designed to ensure that accurate and reliable results were achieved: solutions generated from identical data were compared for bias, accuracy, and reliability between standalone GPS and combined GNSS. The study focused on the performance of these systems over a twenty-four-hour observation period, decimated into 1, 2, 6, 12, and 24 hours. It found that the gain in reliability and performance of combined GNSS over standalone GPS was insignificant over a twenty-four-hour period; in fact, where satellite availability and constellation geometry are at a premium, standalone GPS can produce results of equivalent quality to combined GNSS. That said, the combined GNSS achieved quicker convergence times than the standalone system. Given limited access to resources, in particular GNSS receivers, the results can be seen as preliminary testing that enhances the knowledge of GNSS users. Nonetheless, this dissertation covers a wide range of topics and field testing, providing relevant, reliable data on the accuracy, precision, and performance of both standalone and combined Global Navigation Satellite Systems.
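
    As a hedged sketch of how the convergence times mentioned above might be measured, the following assumes a hypothetical per-epoch horizontal error series for each PPP solution and reports the first epoch after which the error stays below a threshold; the threshold, names, and synthetic series are assumptions.

        import numpy as np

        def convergence_epoch(errors_m, threshold=0.10):
            # First epoch after which the error stays below the threshold
            # for the rest of the session; None if it never converges.
            below = np.asarray(errors_m) < threshold
            bad = np.flatnonzero(~below)
            if bad.size == 0:
                return 0
            first_ok = int(bad[-1]) + 1
            return first_ok if first_ok < below.size else None

        # Synthetic 1 Hz error series over 2 hours: the combined solution
        # decays faster and therefore converges earlier.
        epochs = np.arange(1, 7201)
        gps_only = 2.0 / np.sqrt(epochs) + 0.02
        combined = 1.2 / np.sqrt(epochs) + 0.02
        print(convergence_epoch(gps_only), convergence_epoch(combined))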

    Implementation of an acceleration-estimator-based compensation scheme to increase load data accuracy for a robotic testing system for CPR manikins

    Laerdal Medical is a producer of cardiopulmonary resuscitation (CPR) training manikins, all of which undergo rigorous endurance and accuracy testing. This work proposes an acceleration-estimator-based compensation scheme for an industrial-robot product-testing system, with the intention of increasing load data accuracy for the purposes of product review and calibration. As part of the compensation scheme, four different acceleration estimators are implemented and compared. Results indicate that the compensation scheme increases the load data accuracy by 1.5-6% of the reference value, depending on compression depth and spring rate; however, the accuracy goal of 0.4 kg is not reached. The work has also uncovered the presence of positional error in the robot, so further improvement of the compensation scheme and compensation of the positional error are required.
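
    A minimal sketch of the compensation idea, assuming the load-cell reading is the spring load plus the inertial force of a moving effector mass (F_measured = F_load + m*a): acceleration is estimated from the sampled position by double differencing with exponential smoothing, one simple estimator standing in for the four the thesis compares; all names, masses, and signals below are illustrative.

        import numpy as np

        def estimate_acceleration(position, dt, alpha=0.2):
            # Double finite differencing of the sampled position, followed by
            # first-order exponential smoothing to tame differentiation noise.
            velocity = np.gradient(position, dt)
            acc = np.gradient(velocity, dt)
            smoothed = np.zeros_like(acc)
            for i in range(1, len(acc)):
                smoothed[i] = alpha * acc[i] + (1 - alpha) * smoothed[i - 1]
            return smoothed

        def compensate_load(measured_force_n, moving_mass_kg, acceleration):
            # Remove the inertial component m*a from the load-cell reading.
            return measured_force_n - moving_mass_kg * acceleration

        # Synthetic 2 Hz compression cycle, 50 mm deep, 0.5 kg moving mass.
        dt = 1e-3
        t = np.arange(0.0, 2.0, dt)
        pos = 0.025 * (1.0 - np.cos(4.0 * np.pi * t))          # 0-50 mm stroke
        true_acc = 0.025 * (4.0 * np.pi) ** 2 * np.cos(4.0 * np.pi * t)
        raw = 500.0 * pos + 0.5 * true_acc                     # spring + inertial force
        est = estimate_acceleration(pos, dt)
        residual = compensate_load(raw, 0.5, est) - 500.0 * pos
        print(np.abs(residual).max())    # residual from estimator lag and start-up transient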