Abstract:
Environment perception, 3D object detection, and estimation of the distance of objects from
the camera are currently active topics in computer vision and robotics, and are widely
explored with the aim of maximizing detection accuracy for autonomous vehicles.
For reliable and safe driving, self-driving cars must perceive their
surroundings accurately. 3D object detection and distance estimation
are challenging tasks because moving vehicles appear at varying angles and because
processing video data requires substantial computational resources. Camera-based distance
estimation is widely used in autonomous vehicles and robots for safe driving.
In this research, a two-stage deep learning architecture is proposed for 3D object detection,
pose estimation, and distance estimation using monocular cameras installed in
vehicles. In contrast to methods that regress only 3D dimensions, we
propose a method in which a deep neural network regresses 2D bounding boxes,
geometric estimates, and the distance from the camera; these estimates are then used to
regress accurate 3D object properties and estimate pose in order to construct a stable 3D
bounding box.
Our model is tested on the KITTI dataset, which consists of images of vehicles in different
environments. The dataset contains separate sets for training and testing
(7481 and 7518 images, respectively), with cars and pedestrians as the main target classes. In this
thesis we discuss deep learning techniques for computer vision. More precisely, we
focus on estimating 3D bounding boxes and object distances from the scene using only
a single camera, for autonomous vehicles and robots. In this chapter we introduce
the problem and present the contributions and objectives of this thesis.