AY 109-2 Seminar: Monday, March 8, 10:00, Prof. Jenq-Neng Hwang (黃正能), University of Washington (rescheduled from March 12)

Department of Computer Science and Information Engineering, National Dong Hwa University


Post published: 2021-02-25
Post category: Seminar Announcements / Department Announcements
Post author: Arna Wu (#5016)

Title: When 5G Meets with Big IoT Data for Coordinated Mining of 3D World

Venue: Lecture Hall 4, Science and Engineering Building II

Speaker: Prof. Jenq-Neng Hwang, University of Washington, USA

Abstract:

Thanks to the ultra-reliable low-latency communication (URLLC) capability of emerging 5G mobile networks, information derived from static roadside surveillance sensors or moving on-board IoT sensors (e.g., video cameras, radars, and lidars) can be jointly explored by mobile edge computing (MEC) and shared in real time among all locally connected users for various smart city applications. To achieve this coordinated mining of different IoT data modalities, every detected/segmented and tracked human or vehicle object must be localized in 3D world coordinates so that local dynamic evolutions can be understood effectively in 3D. In this talk, I will discuss some of the challenges and potential solutions. More specifically, I will present a robust method for tracking and 3D localization of detected objects from either static or moving monocular video cameras. It is based on a variant of the Cascade R-CNN detector trained with triplet loss, which obtains in one shot both accurate localization and discriminative, identity-aware features for the tracking association of each detected object, even under long-term occlusion. When cameras cannot perform these tasks reliably because of poor lighting or adverse weather, radars and lidars can offer more robust localization than monocular cameras. However, the semantic information carried by radio signals or point cloud data is limited and difficult to extract. I will therefore also introduce a radio object detection network (RODNet) that detects objects purely from radio signals captured by radar. It is built on an innovative cross-modal supervision framework that uses the rich information extracted from cameras to teach radar object detection, avoiding tedious and laborious human labeling of ground truth on the radar signals.
Moreover, to compensate for the weakness of lidar in detecting small, distant objects, lidar-based detections are effectively integrated with 2D object detections and 3D localization from monocular images through 3D tracking association, achieving superior tracking and 3D localization performance. Finally, I will present an efficient 3D human pose estimation method that describes the actions of detected humans in natural monocular videos, enabling finer-grained 3D scene understanding for smart city applications.
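The abstract mentions training the detector with triplet loss so that each detection carries identity-aware features for tracking association. As a rough illustration only, and not the speaker's implementation, the sketch below shows the standard triplet margin loss on plain Python vectors, plus a hypothetical nearest-track association step. The function names, the 0.3 margin, and the greedy matching rule are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.3) -> float:
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive (same
    identity) than to the negative (different identity) by at least
    `margin`; otherwise it penalizes the violation. The 0.3 default is
    an assumed value for illustration.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def associate(track_embeddings: List[Sequence[float]],
              detection_embeddings: List[Sequence[float]]) -> List[int]:
    """Hypothetical greedy association: map each detection to the index of
    the track whose embedding is nearest. Several detections may map to
    the same track; no one-to-one constraint is enforced here."""
    return [
        min(range(len(track_embeddings)),
            key=lambda i: euclidean(det, track_embeddings[i]))
        for det in detection_embeddings
    ]


if __name__ == "__main__":
    # Negative is already far away, so the loss is zero.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))
    # Negative is too close, so the margin is violated and the loss is positive.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0]))
    # Two tracks, two new detections: each matches its nearest track.
    print(associate([[0.0, 0.0], [5.0, 5.0]], [[4.9, 5.1], [0.2, -0.1]]))
```

Practical multi-object trackers usually enforce one-to-one matching (for example with the Hungarian algorithm) and combine appearance features with motion cues, rather than relying on this greedy lookup alone.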

Registration for cross-domain self-directed learning hours (Link)