Noise Reduction of Depth Cameras Images Based on Deep Neural Network
Abstract
Today, infrared depth sensors are widely used in application control, games, information acquisition, and the capture of dynamic and static 3D scenes. Despite their widespread use, depth images are of limited quality, because the infrared sensor has low resolution and the images it produces are noisy. Given the importance of 3D imaging, this quality must be improved before depth cameras can provide accurate images. This article therefore considers noise reduction of depth images using convolutional neural networks. A convolutional neural network with a depth of 20 and three layers, together with a pre-trained neural network, is used, and simulations were run on two datasets of depth and color images, Middlebury and EURECOM Kinect Face. According to the results, the PSNR improvement is approximately 8 to 15 dB for EURECOM Kinect Face images and about 5 to 14 dB for Middlebury images.
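The results above are reported as PSNR improvements in dB. As a minimal sketch of how such a figure can be computed, the snippet below evaluates PSNR from the mean squared error against a clean reference image and takes the difference before and after denoising. The function names `psnr` and `psnr_gain` are hypothetical, and the peak value of 255 assumes 8-bit images; the abstract does not state the exact evaluation procedure used in the article.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are given as lists of rows. `peak` is the maximum possible pixel
    value; 255 is an assumption that fits 8-bit images.
    """
    ref_flat = [float(v) for row in reference for v in row]
    est_flat = [float(v) for row in estimate for v in row]
    if not ref_flat or len(ref_flat) != len(est_flat):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref_flat, est_flat)) / len(ref_flat)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_gain(reference, noisy, denoised, peak=255.0):
    """PSNR improvement in dB obtained by denoising, relative to the noisy input."""
    return psnr(reference, denoised, peak) - psnr(reference, noisy, peak)


if __name__ == "__main__":
    # Toy 2x2 example with made-up values, for illustration only.
    clean = [[100, 100], [100, 100]]
    noisy = [[110, 90], [110, 90]]
    denoised = [[101, 99], [101, 99]]
    print(f"PSNR gain: {psnr_gain(clean, noisy, denoised):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, every 10 dB of improvement corresponds to a tenfold reduction in that error.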