3/04 speaker:
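This week we are pleased to welcome Professor 鄭錫齊 of the Department of Computer Science and Engineering, National Taiwan Ocean University, as our speaker.
The title and abstract are as follows: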
Topic: Intelligent Computer Vision and Deep Learning
Abstract:
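Object-based scene analysis plays a crucial role in emerging Internet-of-Things (IoT) applications. Accurate object detection helps build models of image objects and scenes that are consistent with the human visual system. Combined with big data analytics, these models can be used to interpret the interaction patterns among the many objects in an image scene, enabling intelligent applications such as video surveillance, human-object interaction, autonomous driving, and behavior detection. However, an object's appearance often varies widely with viewing angle, deformation, scaling, occlusion, motion, and lighting conditions, so detecting and recognizing objects in the physical world remains challenging. To address these difficulties, an important research topic has emerged: using newly developed object detection and recognition algorithms on multi-channel image data to build robust visual and spatial-topological descriptors for modeling real-world objects.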
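Combining information from multiple image channels can significantly improve object modeling, detection, and classification. Current state-of-the-art object recognition methods have three clear shortcomings. First, conventional methods rely on discriminative features within local patches but ignore the topological features needed to assemble those patches into a whole object. Second, most existing systems depend on domain experts' heuristics to predefine which features to extract; such hand-crafted features lack any mechanism for feature evolution, their effectiveness is limited by domain-specific factors, and they therefore do not generalize to real scenes with complex backgrounds. Third, earlier methods mostly assume that each signal channel can be processed independently, ignoring the spatial correlations among channels within an image patch and their relationship to the object model. To address these problems, and building on deep learning, this research develops fast and effective object modeling algorithms that learn, in an unsupervised manner from raw image data and by exploiting the links among multiple channels, robust object features covering appearance, spatial topology, and temporal trajectories. These features are then used for object (scene) modeling, detection, and classification. Furthermore, we plan to combine application-specific big data analytics on image scenes to build decision support models and key technologies for object-centric intelligent IoT applications.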
Keywords: Internet of Things; computer vision; object detection; object modeling; scene interpretation; deep learning; image-based big data analytics.
Object-based scene analytics plays an important role in emerging smart Internet-of-Things (IoT) applications. In real-world images, object appearance often varies because of changes in pose, shape deformation, scaling, occlusion, motion, and lighting conditions. Object detection and recognition in images with cluttered backgrounds therefore remains a very challenging problem. To tackle these difficulties, recent approaches use multi-channel (RGB-D) videos to discover robust object descriptors that incorporate discriminative visual features, spatial trajectories, and topological features. These features facilitate precise object modeling in real-world images. Using learning algorithms to construct concise object models is thus an important issue for object detection and recognition.
State-of-the-art feature representations for image objects have three deficiencies. First, they focus on extracting discriminative visual features but usually ignore other useful recognition cues, such as the spatial relationships among local patches. Second, hand-crafted features are domain dependent, and their effectiveness relies heavily on expert experience. Third, previous work typically assumes that channel signals are mutually independent and ignores the relationships between channels. To address these problems, it is desirable to develop efficient and effective learning algorithms that automatically learn robust representations from raw image data. Inspired by the rapid progress of deep learning, this work focuses on developing unsupervised object detection in videos. The detected objects, associated with a pre-learned scene model, facilitate smart object-based applications. The proposed object-based technique can easily be extended to incorporate image-based big data analytics for understanding real-world images.
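To make the third deficiency concrete, here is a minimal, hypothetical Python sketch; it is not the speaker's method. It builds two two-channel patches (think of them as intensity and depth) that have identical per-channel statistics but opposite cross-channel correlation. A descriptor that treats channels independently cannot tell them apart, while adding a cross-channel correlation term can. The function names and the choice of mean, standard deviation, and Pearson correlation as features are illustrative assumptions.

```python
from statistics import mean, pstdev


def per_channel_descriptor(patch):
    """Treat each channel independently: (mean, std) per channel."""
    return [(mean(ch), pstdev(ch)) for ch in patch]


def cross_channel_correlation(a, b):
    """Pearson correlation between two channels of the same patch."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / (sa * sb)


def joint_descriptor(patch):
    """Per-channel statistics plus every pairwise cross-channel correlation."""
    corrs = [
        cross_channel_correlation(patch[i], patch[j])
        for i in range(len(patch))
        for j in range(i + 1, len(patch))
    ]
    return per_channel_descriptor(patch), corrs


# Hypothetical toy patches: each inner list is one channel's pixel values.
patch_a = [[0, 1, 0, 1], [0, 1, 0, 1]]  # channels rise and fall together
patch_b = [[0, 1, 0, 1], [1, 0, 1, 0]]  # same values, opposite co-variation

print(per_channel_descriptor(patch_a) == per_channel_descriptor(patch_b))  # True
print(joint_descriptor(patch_a)[1], joint_descriptor(patch_b)[1])  # [1.0] [-1.0]
```

A learned representation would discover such cross-channel dependencies automatically rather than through a hand-picked correlation term; the sketch only shows why ignoring them discards information.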