Outlines of Major Research Experience

·        PERCEPTUAL IMAGE QUALITY EVALUATION

Mean square error (MSE) and peak signal-to-noise ratio (PSNR) are the traditional distortion/quality measures in visual processing tasks, despite their poor agreement with the perception of the human visual system (HVS) in many situations. To formulate a measure that aligns more closely with human perception, new perceptual models based upon relevant physiological and psychological knowledge have been developed by incorporating the HVS’s visibility threshold, visual attention, and spatio-temporal mechanisms. The common visual artifacts that are most disturbing to the HVS (e.g., blockiness, blurring, motion jitter) are also considered.
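
As a minimal sketch of why MSE and PSNR can be poor perceptual measures, the example below computes both for two differently distorted versions of the same 16-pixel signal. The pixel values, the function names and the 8-bit peak of 255 are assumptions made for illustration only. One version has a uniform brightness shift and the other has a single large spike, which would generally not be perceived as the same kind of degradation, yet both produce an identical MSE and therefore an identical PSNR.

```python
import math


def mse(ref, test):
    """Mean square error between two equal-length sequences of pixel values."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=255.0):
    """PSNR in dB for a given peak value; infinite for identical inputs."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Hypothetical 16-pixel reference signal (8-bit values).
reference = [52, 55, 61, 66, 70, 61, 64, 73, 63, 59, 55, 90, 109, 85, 69, 72]

# Distortion 1: every pixel is brightened by 2 (total squared error 16 * 4 = 64).
shifted = [p + 2 for p in reference]

# Distortion 2: a single pixel is off by 8 (total squared error 64).
spiky = list(reference)
spiky[7] += 8

print(mse(reference, shifted), mse(reference, spiky))
print(psnr(reference, shifted), psnr(reference, spiky))
```

Both calls report the same error because MSE only accumulates squared differences and ignores where and how they are distributed.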

It is well known that the HVS cannot sense all changes in an image. It is therefore advantageous to estimate the just-noticeable difference (JND) and to use it in image processing algorithms and systems, for benefits such as resource saving and/or performance enhancement. JND models have been proposed for the subband domain, with spatial and temporal contrast sensitivity functions (CSF), luminance adaptation and contrast masking, and for the pixel domain, with luminance masking and activity masking.

The models have been validated with psychophysical data from publicly accessible databases and with large-scale subjective viewing experiments (651 test sequences in total) conducted during the different projects. The results of the visual attention model are in line with eye-tracker findings, and with a possible physiological link (e.g., functional magnetic resonance imaging (fMRI)).

·        IMAGE/VIDEO COMPRESSION

In image/video transmission, bandwidth is a scarce resource. To maximize perceptual coding quality, more bits can be assigned to the signal components with higher visual significance, while those below the JND (just-noticeable difference) can be discarded. A perceptual video coder can thus “kill three birds with one stone”: reduced computation, improved perceptual coding quality, and higher signal fidelity.
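
The idea of discarding components below the JND can be sketched as follows. This is an illustration only, not the coder described here: the residual values and the per-sample thresholds are made up, and a real JND model would derive the thresholds from factors such as luminance adaptation and masking.

```python
def suppress_below_jnd(residual, jnd):
    """Zero every residual sample whose magnitude is below its JND threshold.

    Samples at or above the threshold are kept unchanged, so the bits they
    would cost remain available for visually significant content.
    """
    if len(residual) != len(jnd):
        raise ValueError("residual and jnd must have the same length")
    return [0 if abs(r) < t else r for r, t in zip(residual, jnd)]


# Hypothetical prediction residual and per-sample JND thresholds.
residual = [1, -7, 3, 0, 12, -2]
jnd = [3, 3, 5, 4, 6, 2]

kept = suppress_below_jnd(residual, jnd)
nonzero_before = sum(1 for r in residual if r != 0)
nonzero_after = sum(1 for r in kept if r != 0)
print(kept, nonzero_before, nonzero_after)
```

Fewer non-zero samples remain to be coded after suppression, which is where the bit and computation savings in this sketch would come from.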

With automatic perceptual-ROI (region of interest) generation, the quantization steps and the coding scalability can be determined; the related scheme has been proposed to the SVC (Scalable Video Coding) standard. Better image coding can be achieved at low bit rates (<0.5 bpp) if adaptive down-sampling is used.

Improved rate control can be achieved for H.264 encoding by following perceptual cues and by using a more accurate rate-distortion model that considers header information and zero coefficients. For streaming of pre-coded video over different channels and to different decoders, the leaky bucket parameters have been determined.
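
To illustrate what determining leaky bucket parameters for pre-coded video can involve, the sketch below finds the smallest bucket (buffer) size that contains a given sequence of frame sizes at a given drain rate. It is a simplified encoder-side model assumed for illustration, with hypothetical frame sizes; the method actually used in the projects is not described here.

```python
def min_bucket_size(frame_bits, rate_bps, fps):
    """Smallest bucket size (in bits) that never overflows for this stream.

    Simplified model: each coded frame enters the bucket instantaneously,
    and the bucket then leaks at a constant rate for one frame interval.
    """
    drain_per_frame = rate_bps / fps
    fullness = 0.0
    peak = 0.0
    for bits in frame_bits:
        fullness += bits                      # frame arrives
        peak = max(peak, fullness)            # track the worst-case fullness
        fullness = max(0.0, fullness - drain_per_frame)  # bucket leaks
    return peak


# Hypothetical coded frame sizes in bits (e.g. two large frames, then small ones).
frames = [8000, 6000, 2000, 2000, 2000]

b_low_rate = min_bucket_size(frames, rate_bps=100_000, fps=25)
b_high_rate = min_bucket_size(frames, rate_bps=200_000, fps=25)
print(b_low_rate, b_high_rate)
```

Running the same stream at several rates gives a set of (rate, bucket size) pairs; in this model a faster channel needs a smaller buffer, which is the trade-off relevant to serving different channels and decoders.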

Various strategies have been developed for regulating motion-estimation and DCT/IDCT complexity, for effective and efficient video compression, coding and system implementation, especially at low bit rates.

·        COMPUTER VISION AND PATTERN RECOGNITION

A complete model-based object recognition and location system has been developed, consisting of image acquisition, feature extraction, hypothesis generation, and hypothesis verification and extension. Its efficiency makes it applicable to industrial inspection. A model-indexing module has been designed, tested and integrated into the system, in collaboration with DSO (Singapore). For infrared (IR) face identification, faces are first detected and formalized, and features are extracted via PCA. Classification is performed by an RBF neural network. A blood-perfusion-based model has been proposed, since it transforms the sensory data, which are susceptible to ambient conditions, into more fundamental features of a physiological and thermodynamic nature.

The aforementioned visual quality metrics can be used to examine the acquired images. As a prerequisite of object recognition, pattern recognition and other content-based manipulations, automatic object segmentation techniques have been developed for both stationary and moving cameras. A PC-based on-line segmentation system has been tested and delivered to an industrial customer. A new algorithm based upon disjoint set union has been devised for marker-based watershed segmentation; it allows cost-effective software/firmware/hardware implementation.
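
The sketch below shows how a disjoint set union (union-find) structure can drive a marker-based flooding. It is a deliberately simplified 1-D illustration written for this page, not the devised algorithm: the function `flood_1d`, the signal and the marker labels are all hypothetical. Pixels are visited in increasing intensity order, each pixel is merged with already visited neighbours, and two sets that carry different marker labels are never merged.

```python
class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:         # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:     # attach the smaller tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def flood_1d(signal, markers):
    """Label each sample of a 1-D signal with the marker of its basin.

    markers maps a sample index to a label. Ties at a basin boundary are
    resolved in favour of the left neighbour (an arbitrary choice here).
    """
    n = len(signal)
    dsu = DisjointSet(n)
    label = [None] * n                        # marker label kept at set roots
    visited = [False] * n
    for p in sorted(range(n), key=lambda i: signal[i]):
        visited[p] = True
        label[p] = markers.get(p)
        for q in (p - 1, p + 1):
            if not (0 <= q < n and visited[q]):
                continue
            rp, rq = dsu.find(p), dsu.find(q)
            if rp == rq:
                continue
            lp, lq = label[rp], label[rq]
            if lp is not None and lq is not None and lp != lq:
                continue                      # different basins meet: keep apart
            root = dsu.union(rp, rq)
            label[root] = lp if lp is not None else lq
    return [label[dsu.find(i)] for i in range(n)]


# Hypothetical intensity profile with one marker in each of two valleys.
labels = flood_1d([3, 1, 2, 5, 4, 0, 2], {1: "A", 5: "B"})
print(labels)
```

Because the structure needs only two integer arrays and simple loops, this style of bookkeeping is one plausible reason such an approach maps well onto firmware or hardware.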

·        EMBEDDED, PARALLEL AND REAL-TIME SYSTEMS

H.324 and MPEG-4 videoconferencing solutions were completed by integrating H.263/MPEG-4 video, G.723.1/G.728 speech, the H.223 multiplexer and H.245 control on various DSPs (TI’s families and SHARCs), and were delivered to a multinational company as the result of industrial collaboration projects. A mini-navigation system, including an RF module, DSPs, a video camera, GPS, and various sensors, was also investigated.

A large collection of firmware modules was developed for different projects, running on different DSP/µC-based platforms for image processing, video segmentation and coding, and vision-based automation. An MP3 player prototype was built with Xilinx FPGAs and a µC for a customer project, while a system for wireless security with video object segmentation was tested in both software and Verilog HDL.

Multi-processor/DSP implementations were completed for various algorithms and systems (video codecs and machine vision systems), aiming at operating efficiency or low power dissipation. Approaches to task division and dynamic load balancing were explored for optimum performance. For real-time implementation, algorithm-level methodology was investigated for effective MIPS reduction.

·        PRODUCT DESIGN & DEVELOPMENT

Multimedia processing involves intensive computation and large data volumes. To bring R&D prototypes towards industrial specifications (e.g., performance, system cost, device size, power dissipation), the operational complexity and the memory requirement of the system need to be minimized. The overall architecture, the hardware/software (firmware) partition, and algorithm selection play global roles in shaping the system. Static memory allocation allows higher efficiency, and re-use of memory reduces the system cost, device size and power consumption (in hardwired or hardware/software co-designed systems). Code optimization and/or assembly programming of critical modules typically brings a speed-up of over 10x. Further firmware optimization includes minimizing cache misses, nested calls, and the amount of stack used in each call.

·        BIOMEDICAL SIGNAL PROCESSING

The algorithms and the system (a combination of software and hardware) were established for electrogastrographic (EGG) detection from the human abdominal surface for clinical use. Active filters were designed to reduce ambient noise (ECG, drifts, etc.), and an adaptive canceling technique was then used to eliminate the disturbance from respiration, whose spectrum is similar to that of the EGG. Discriminant analysis revealed the characteristics of the signals detected from patients.

A software model was formed for analyzing the human upper skull in a static magnetic field, in collaboration with GEC (UK). Automatic landmark identification and location were achieved in cephalogram images for surgical simulation. The project was funded by the NUS Academic Research Fund and carried out in collaboration with the Government Dental Center (Singapore).

·        MULTIMEDIA COMMUNICATION AND ERROR RESILIENCE

In collaboration with NTT DoCoMo, Inc., the H.324 and MPEG-4 videoconferencing systems with the H.223 multiplexer and H.245 control were tested over different connections (fixed phone line, ISDN, LAN and W-CDMA), and error resilience was incorporated into video/audio codec development. Error concealment improves the quality of a video decoder linked to a mobile network. A self-authentication and self-recovery system was built for images, in which the authentication and correction bits were watermarked onto the image itself, guided by local perceptual masking measures. An unequal error protection (UEP) scheme was realized for the MPEG AAC (Advanced Audio Coding) codec, based upon the concatenation of convolutional and Reed-Solomon codes.

·        IMAGE RESTORATION AND AUDIO FEATURE EXTRACTION

Images are reconstructed (demosaiced) from single-sensor data, making fuller use of inter-color correlation and edge information. Post-processing improves the quality of decompressed visual signals. Super-resolution images can be reconstructed from video. Techniques have been proposed for efficient perceptual audio coding, and for pitch extraction (for both audio and speech) towards content retrieval.