·
PERCEPTUAL IMAGE QUALITY EVALUATION
Mean square error (MSE) and peak signal-to-noise ratio (PSNR) are the
traditional distortion/quality measures in visual processing tasks,
even though in many situations they correlate poorly with the
perception of the human visual system (HVS). To obtain measures more
closely aligned with human perception, new perceptual models based on
relevant physiological and psychological knowledge have been developed
by incorporating the HVS's visibility threshold, visual attention, and
spatio-temporal mechanisms. The common visual artifacts that are most
disturbing to the HVS (e.g., blockiness, blurring, motion jitter) are
also considered.
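As a concrete reference point for the MSE/PSNR discussion above, the following minimal Python sketch computes both for 8-bit images stored as lists of rows, using PSNR = 10·log10(peak²/MSE) with peak = 255 assumed. The function names are illustrative. It also shows the blind spot mentioned above: on a 2×2 block, a uniform +5 brightness shift and a single +10 error in one pixel both give MSE = 25 and therefore the same PSNR, although the two distortions are structurally very different.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    if len(ref) != len(test) or any(len(a) != len(b) for a, b in zip(ref, test)):
        raise ValueError("images must have the same size")
    n = sum(len(row) for row in ref)
    return sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)) / n


def psnr(ref, test, peak=255.0):
    """PSNR in dB, 10*log10(peak^2 / MSE); infinite for identical images."""
    e = mse(ref, test)
    return math.inf if e == 0 else 10.0 * math.log10(peak ** 2 / e)
```

For example, `psnr([[0, 0], [0, 0]], [[5, 5], [5, 5]])` equals `psnr([[0, 0], [0, 0]], [[0, 0], [0, 10]])`.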
It is well known that the HVS cannot sense all changes in an image.
Therefore it is advantageous to estimate the just-noticeable difference
(JND) and to exploit it in image processing algorithms and systems, for
benefits such as resource savings and/or performance enhancement. JND
models have been proposed for the subband domain, incorporating the
spatial and temporal CSF (contrast sensitivity function), luminance
adaptation and contrast masking, and for the pixel domain,
incorporating luminance masking and activity masking.
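The pixel-domain idea can be sketched as below. This is an illustration under stated assumptions, not the models described here: luminance masking uses the widely cited Chou-Li background-luminance curve (threshold 20 at black, 3 at mid-grey 127, rising linearly to 6 at 255); background luminance is a plain 3×3 mean; activity masking is a hypothetical `beta` times the largest 4-neighbour difference; and the two effects are combined with `max`.

```python
import math


def luminance_threshold(bg):
    """Luminance-adaptation visibility threshold (Chou-Li style curve), 8-bit background bg."""
    if bg <= 127:
        return 17.0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
    return 3.0 / 128.0 * (bg - 127.0) + 3.0


def pixel_jnd(img, beta=0.1):
    """Per-pixel JND map: max of luminance masking and a simple activity-masking term.

    beta, the 3x3 mean and the 4-neighbour activity operator are illustrative assumptions.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbours.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    jnd = []
    for y in range(h):
        row = []
        for x in range(w):
            bg = sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
            activity = max(abs(px(y, x) - px(y + dy, x + dx))
                           for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
            row.append(max(luminance_threshold(bg), beta * activity))
        jnd.append(row)
    return jnd
```

Taking the maximum keeps the sketch simple. Published models often use a non-linear combination that partially overlaps the two masking effects instead.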
The models have been validated with psychophysical data from publicly
accessible databases and with large-scale subjective viewing
experiments (651 test sequences in total) conducted during the
different projects. The results of the visual attention model are
consistent with eye-tracker findings and with a possible physiological
link (e.g., via functional magnetic resonance imaging (fMRI)).
·
IMAGE/VIDEO COMPRESSION
In image/video transmission, bandwidth is a scarce resource. To
maximize perceptual coding quality, more bits can be assigned to the
signal components with higher visual significance, while components
below the JND can be discarded. A perceptual video coder can thus
"kill three birds with one stone": reduced computation, improved
perceptual coding quality, and higher signal fidelity.
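One simple way to realize "discard what is below the JND" is a JND-adaptive quantizer. The sketch below is an assumption, since the text does not specify the quantizer. Each residual sample is quantized with step 2·JND, so the reconstruction error never exceeds the local JND, and every sample with magnitude below the JND maps to level 0, which is cheap to entropy-code. `jnd_quantize` is a hypothetical name, and the JND map could come from any pixel-domain model (thresholds must be positive).

```python
def jnd_quantize(residual, jnd):
    """Quantize each residual sample with step 2*JND (JND values must be > 0).

    Rounding to the nearest level keeps |error| <= step/2 = JND, and any
    |r| < JND lands on level 0. Returns (levels, reconstruction).
    """
    levels, recon = [], []
    for r_row, j_row in zip(residual, jnd):
        q_row = [int(round(r / (2.0 * j))) for r, j in zip(r_row, j_row)]
        levels.append(q_row)
        recon.append([q * 2.0 * j for q, j in zip(q_row, j_row)])
    return levels, recon
```

For instance, with a uniform JND of 3, the residuals `[[1, 10], [-7, 2]]` become levels `[[0, 2], [-1, 0]]`. Only the two samples above the JND cost bits.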
With automatic perceptual-ROI (region of interest) generation, the
quantization steps and the coding scalability can be determined, and
the related scheme has been proposed to the SVC (Scalable Video
Coding) standard. Better image coding can be achieved at low bit rates
(below 0.5 bpp) if adaptive down-sampling is used.
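To illustrate how an ROI map can drive the quantization steps, the sketch below converts a per-macroblock saliency value in [0, 1] into an H.264 quantization parameter (QP). In H.264 the quantization step doubles for every increase of 6 in QP, and QP ranges from 0 to 51 for 8-bit video. The linear mapping, `base_qp` and `max_offset` are hypothetical choices, not the proposed SVC scheme.

```python
def roi_qp_map(saliency, base_qp=30, max_offset=4):
    """Map per-macroblock saliency in [0, 1] to H.264 QPs (hypothetical linear rule).

    Salient blocks (s -> 1) get up to max_offset lower QP (finer quantization);
    non-salient blocks (s -> 0) get up to max_offset higher QP. Clipped to [0, 51].
    """
    return [[min(51, max(0, int(round(base_qp + max_offset * (1.0 - 2.0 * s)))))
             for s in row]
            for row in saliency]
```

With the defaults, saliencies `[[1.0, 0.5, 0.0]]` map to QPs `[[26, 30, 34]]`. The most salient block is quantized with a step about 2^(8/6) ≈ 2.5 times finer than the least salient one.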
Improved rate control can be achieved for H.264 encoding by using
perceptual cues and a more accurate rate-distortion model that accounts
for header information and zero coefficients. For streaming pre-coded
video over different channels and to different decoders, the leaky
bucket parameters have been determined.
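The leaky-bucket question can be made concrete with a simple constant-bit-rate model. This model is an assumption for illustration, since the actual parameter derivation is not described here. The channel delivers `rate_per_frame` bits into the decoder buffer per frame interval, F bits are preloaded before decoding starts, and each frame's bits are removed instantly at its decoding time. Avoiding underflow requires F ≥ max_i(S_i - i·R), where S_i is the cumulative size of frames 0..i. The largest fullness just before a removal then gives the minimum buffer size B.

```python
from itertools import accumulate


def min_leaky_bucket(frame_bits, rate_per_frame):
    """Return (F, B): minimum initial fullness and buffer size for a CBR channel.

    Model assumptions: rate_per_frame bits arrive per frame interval; frame i is
    removed instantaneously at interval i after F bits were preloaded.
    frame_bits must be non-empty.
    """
    cum = list(accumulate(frame_bits))  # cum[i] = total bits of frames 0..i
    # No underflow: F + i*R >= cum[i] for every frame i.
    f = max(0, max(c - i * rate_per_frame for i, c in enumerate(cum)))
    # Peak fullness occurs just before each removal: F + i*R - cum[i-1].
    b = max(f + i * rate_per_frame - (cum[i - 1] if i > 0 else 0)
            for i in range(len(frame_bits)))
    return f, b
```

For frames of `[100, 20, 20, 100]` bits at 60 bits per interval, this gives F = 100 and B = 140. Running it once per candidate rate yields one (R, B, F) triple per channel. A higher rate never increases the required preload F.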
Various strategies have been developed for regulating motion-estimation
and DCT/IDCT complexity, for effective and efficient video compression,
coding and system implementation, especially at low bit rates.
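As one example of complexity regulation in motion estimation, the sketch below performs full-search block matching with partial distortion elimination. The specific strategies used are not detailed here, so this technique is an illustrative choice. The SAD accumulation for a candidate stops as soon as the partial sum reaches the best SAD found so far. The selected motion vector is the same as with exhaustive SAD search, but fewer absolute differences are computed. Block size `n` and `search` range are hypothetical defaults. A vector (dx, dy) means the block at (bx, by) in `cur` matches `ref` at (bx+dx, by+dy).

```python
def block_sad(cur, ref, bx, by, dx, dy, n, best):
    """SAD of an n x n block, abandoned early once the partial sum reaches best."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        if total >= best:  # partial distortion elimination: candidate cannot win
            return total
    return total


def full_search_pde(cur, ref, bx, by, n=4, search=2):
    """Full-search block matching with row-wise early termination.

    Returns ((dx, dy), sad) of the best candidate inside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue  # skip candidates that fall outside the reference frame
            sad = block_sad(cur, ref, bx, by, dx, dy, n, best)
            if sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv, best
```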
·
COMPUTER VISION AND PATTERN RECOGNITION
A complete model-based object recognition and localization system has
been developed, consisting of image acquisition, feature extraction,
hypothesis generation, and hypothesis verification and extension.
Owing to its efficiency, it can be applied to industrial inspection. A
model-indexing module has been designed, tested and integrated into
the system in collaboration with DSO (Singapore). For infrared (IR)
face identification, faces are first detected and normalized, and
features are extracted via PCA. Classification is performed by an RBF
neural network. A blood-perfusion-based model has been proposed
because it transforms the sensory data, which are sensitive to ambient
conditions, into more fundamental features of a physiological and
thermodynamic nature.
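The PCA-plus-RBF classification chain can be sketched in NumPy as follows. Several assumptions apply. Face images are taken to be already detected, normalized and flattened into row vectors, and the blood-perfusion transform is not modelled. PCA is computed by SVD of the centred training matrix. The RBF network is a simple variant whose Gaussian centres are the training features, with output weights found by ridge-regularized least squares. `sigma` and `ridge` are hypothetical parameters.

```python
import numpy as np


def pca_fit(X, k):
    """Return (mean, components): the top-k principal axes of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def pca_project(X, mean, comps):
    """Project rows of X onto the retained principal axes."""
    return (X - mean) @ comps.T


class RBFNet:
    """Gaussian RBF network: centres = training features, linear output layer."""

    def __init__(self, sigma=1.0, ridge=1e-3):
        self.sigma, self.ridge = sigma, ridge

    def _hidden(self, Z):
        # Squared distances between every input row and every centre.
        d2 = ((Z[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, Z, labels):
        self.centres = np.asarray(Z, dtype=float)
        labels = np.asarray(labels)
        self.classes = np.unique(labels)
        T = (labels[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        H = self._hidden(self.centres)
        # Ridge-regularized least squares keeps the solve well conditioned.
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.W = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, Z):
        scores = self._hidden(np.asarray(Z, dtype=float)) @ self.W
        return self.classes[np.argmax(scores, axis=1)]
```

Typical use is `mean, comps = pca_fit(X_train, k)`, then `RBFNet(sigma).fit(pca_project(X_train, mean, comps), y_train)`, and finally `predict` on test vectors projected with the same `mean` and `comps`.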
The aforementioned visual quality metrics can be used to examine the
acquired images. As a prerequisite for object recognition, pattern
recognition and other content-based manipulations, automatic object
segmentation techniques have been developed for both stationary and
moving cameras. A PC-based online segmentation system has been tested
and delivered to an industrial customer. A new algorithm based on
disjoint-set union has been devised for marker-based watershed
segmentation; it allows cost-effective software/firmware/hardware
implementation.
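The disjoint-set (union-find) idea behind a marker-based watershed can be sketched as a flooding procedure. This is one possible formulation, not necessarily the devised algorithm. Pixels are visited in increasing intensity, and each visited pixel is united with its already-visited 4-neighbour regions. A union is refused when both regions already carry different marker labels, so each marker grows into its own catchment basin. Path halving keeps `find` cheap. No watershed-line pixels are produced, and a ridge pixel goes to the first labelled region that reaches it.

```python
def watershed_dsu(img, markers):
    """Marker-based flooding with union-find.

    img: 2-D list of intensities. markers: same-size 2-D list, 0 = unlabelled,
    k > 0 = marker k. Returns a 2-D label map.
    """
    h, w = len(img), len(img[0])
    parent = list(range(h * w))
    label = [markers[i // w][i % w] for i in range(h * w)]  # meaningful at roots only

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    visited = [False] * (h * w)
    order = sorted(range(h * w), key=lambda i: img[i // w][i % w])  # flood from low to high
    for p in order:
        visited[p] = True
        y, x = divmod(p, w)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or not visited[ny * w + nx]:
                continue
            rp, rq = find(p), find(ny * w + nx)
            if rp == rq:
                continue
            lp, lq = label[rp], label[rq]
            if lp and lq and lp != lq:
                continue  # two different markers meet: keep the regions apart
            parent[rp] = rq
            label[rq] = lp or lq  # merged root inherits the non-zero label
    return [[label[find(y * w + x)] for x in range(w)] for y in range(h)]
```

For the 1×5 profile `[[0, 5, 9, 5, 0]]` with markers 1 and 2 at the two ends, the result is `[[1, 1, 1, 2, 2]]`.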
·
EMBEDDED, PARALLEL AND REAL-TIME SYSTEMS
H.324 and MPEG-4 videoconferencing solutions have been completed by
integrating H.263/MPEG-4 video, G.723.1/G.728 speech, the H.223
multiplexer and H.245 control on various DSPs (TI DSP families and
SHARCs), and delivered to a multinational company as a result of
industrial collaboration projects. A mini-navigation system, including
an RF module, DSPs, a video camera, GPS and various sensors, has also
been investigated.
A large collection of firmware modules has been developed for
different projects, running on different DSP/µC-based platforms for
image processing, video segmentation and coding, and vision-based
automation. An MP3 player prototype has been built for a customer
project using Xilinx FPGAs and a µC, while a wireless security system
with video object segmentation has been tested in both software and
Verilog HDL.
Implementations on multiple processors/DSPs have been completed for
various algorithms and systems (video codecs and machine vision
systems), targeting operating efficiency or low power dissipation.
Approaches to task division and dynamic load balancing have been
explored for optimum performance. For real-time implementation,
algorithm-level methodologies for effective MIPS reduction have been
investigated.
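Dynamic load balancing can be illustrated by simulating a shared work queue, in which each processor takes the next task as soon as it becomes idle, and comparing it with a fixed contiguous division of the same tasks. Task costs are abstract time units (e.g., per-slice cycle counts). Both functions are hypothetical sketches, not the approaches explored here. With costs `[8, 8, 1, 1, 1, 1, 1, 1]` on two processors, the static split finishes at 18 units and the work queue at 11.

```python
import heapq


def dynamic_schedule(task_costs, n_procs):
    """Simulate a shared work queue: an idle processor grabs the next task.

    Returns (makespan, per-processor list of task indices).
    """
    idle = [(0.0, p) for p in range(n_procs)]  # (time processor becomes free, id)
    heapq.heapify(idle)
    assignment = [[] for _ in range(n_procs)]
    for t, cost in enumerate(task_costs):
        free_at, p = heapq.heappop(idle)  # earliest-idle processor takes task t
        assignment[p].append(t)
        heapq.heappush(idle, (free_at + cost, p))
    return max(f for f, _ in idle), assignment


def static_schedule(task_costs, n_procs):
    """Fixed contiguous division into equal-count chunks; returns the makespan."""
    chunk = -(-len(task_costs) // n_procs)  # ceiling division
    return max(sum(task_costs[i:i + chunk]) for i in range(0, len(task_costs), chunk))
```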
·
PRODUCT DESIGN & DEVELOPMENT
Multimedia processing involves intensive computation and large data
volumes. To bring R&D prototypes up to industrial specifications
(e.g., performance, system cost, device size, power dissipation), the
operational complexity and memory requirements of the system need to
be minimized. Overall architecture, hardware/software (firmware)
partitioning, and algorithm selection play global roles in shaping the
system. Static memory allocation allows higher efficiency, and memory
reuse reduces system cost, device size and power consumption (in
hardwired or hardware/software co-designed systems). Code optimization
and/or assembly programming of critical modules typically brings a
speed-up of more than 10x. Further firmware optimization includes
minimizing cache misses, nested calls, and the amount of stack used in
each call.
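Memory reuse under static allocation can be planned offline when buffer lifetimes are known. The sketch below is a hypothetical planner, not a description of any particular system. It gives each buffer a fixed offset in one arena so that buffers alive at the same pipeline step never overlap, placing larger buffers first at the lowest free offset.

```python
def plan_static_memory(buffers):
    """Assign fixed arena offsets so simultaneously live buffers never overlap.

    buffers: dict name -> (size, first_step, last_step), lifetimes inclusive.
    Greedy first-fit, largest buffers first. Returns (offsets, arena_size).
    """
    placed = []  # (offset, size, first, last) of buffers already assigned
    offsets = {}
    for name, (size, first, last) in sorted(buffers.items(), key=lambda kv: -kv[1][0]):
        # Memory ranges held by buffers whose lifetimes overlap this one.
        busy = sorted((o, o + s) for o, s, f, l in placed if f <= last and first <= l)
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # fits in the gap before this range
            offset = max(offset, end)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    arena = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, arena
```

For example, `{"frame": (100, 0, 3), "tmpA": (50, 0, 1), "tmpB": (50, 2, 3), "out": (30, 3, 3)}` totals 230 bytes but fits in a 180-byte arena, because `tmpA` and `tmpB` are never alive together and share offset 100.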
·
BIOMEDICAL SIGNAL PROCESSING
The algorithms and the system (a combination of software and hardware)
were established for electrogastrographic (EGG) detection from the
human abdominal surface for clinical use. Active filters were designed
to reduce ambient noise (ECG, drifts, etc.), and an adaptive
cancellation technique was then used to eliminate the respiratory
disturbance, whose spectrum is similar to that of the EGG.
Discriminant analysis characterized the signals detected from
patients.
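The adaptive cancelling step can be sketched with the least-mean-squares (LMS) adaptive noise canceller. The actual technique and its parameters are not given here, so this choice is an assumption. The sketch presumes a separate reference channel correlated with the respiratory disturbance (e.g., a respiration sensor). An FIR filter adapts to predict the disturbance in the primary abdominal signal from that reference, and the prediction error is the cleaned EGG. `n_taps` and `mu` are hypothetical defaults, and `mu` must stay small relative to the reference power for the update to be stable.

```python
import numpy as np


def lms_cancel(primary, reference, n_taps=4, mu=0.01):
    """Adaptive noise cancellation with the LMS algorithm.

    primary:   measured signal = wanted signal + interference correlated with reference
    reference: separately measured interference source (assumed available)
    Returns the error signal e[n], i.e. the estimate of the wanted signal.
    """
    w = np.zeros(n_taps)
    e = np.zeros(len(primary))
    for n in range(len(primary)):
        # Tap vector [r[n], r[n-1], ..., r[n-n_taps+1]], zero before the start.
        x = np.array([reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)])
        y = w @ x                 # current estimate of the interference
        e[n] = primary[n] - y     # cleaned output sample
        w += 2 * mu * e[n] * x    # LMS weight update
    return e
```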
A software model was built for analyzing the human upper skull in a
static magnetic field, in collaboration with GEC (UK). Automatic
landmark identification and localization were achieved in cephalogram
images for surgical simulation. The project was funded by the NUS
Academic Research Fund and carried out in collaboration with the
Government Dental Center (Singapore).
·
MULTIMEDIA COMMUNICATION AND ERROR RESILIENCE
In collaboration with NTT DoCoMo, Inc., the H.324 and MPEG-4
videoconferencing systems with the H.223 multiplexer and H.245 control
have been tested over different connections (fixed phone line, ISDN,
LAN and W-CDMA), and error resilience has been incorporated into
video/audio codec development. Error concealment improves the quality
of a video decoder connected to a mobile network. A self-authentication
and self-recovery system was built for images, in which the
authentication and correction bits are watermarked into the image
itself, guided by local perceptual masking measures. An unequal error
protection (UEP) scheme has been realized for the MPEG AAC (Advanced
Audio Coding) codec, based on concatenated convolutional and
Reed-Solomon codes.
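Error concealment at the decoder can be illustrated with a common temporal approach. This is an assumption, since the concealment method used is not specified. Each lost block takes the component-wise median motion vector of its correctly received 4-neighbours, or zero if none is available. The block is then copied from the correspondingly displaced position in the previous decoded frame. The block size `n` and the data layout are hypothetical. A displacement (dx, dy) means the block's pixels come from `prev` at (x+dx, y+dy).

```python
from statistics import median


def conceal_lost_blocks(prev, cur, lost, mvs, n=4):
    """Temporal error concealment with median motion-vector recovery.

    prev: previous decoded frame (2-D list); cur: current frame, garbage in lost blocks.
    lost: set of (block_row, block_col); mvs: dict (block_row, block_col) -> (dx, dy)
    for received blocks. Returns a concealed copy of cur.
    """
    h, w = len(prev), len(prev[0])
    out = [row[:] for row in cur]
    for br, bc in lost:
        nbrs = [mvs[(br + dr, bc + dc)] for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if (br + dr, bc + dc) in mvs and (br + dr, bc + dc) not in lost]
        dx = int(round(median(v[0] for v in nbrs))) if nbrs else 0
        dy = int(round(median(v[1] for v in nbrs))) if nbrs else 0
        for y in range(n):
            for x in range(n):
                sy = min(max(br * n + y + dy, 0), h - 1)  # clamp source inside frame
                sx = min(max(bc * n + x + dx, 0), w - 1)
                out[br * n + y][bc * n + x] = prev[sy][sx]
    return out
```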
·
IMAGE RESTORATION AND AUDIO FEATURE EXTRACTION
Images are reconstructed (demosaiced) from single-sensor data by
making fuller use of inter-color correlation and edge information.
Post-processing improves the quality of decompressed visual signals.
Super-resolution images can be reconstructed from video. Techniques
have been proposed for efficient perceptual audio coding and for pitch
extraction (for both audio and speech) towards content retrieval.
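Pitch extraction can be illustrated with the classic autocorrelation method. It is used here as an assumed baseline rather than the proposed techniques. After the mean is removed, the frame's autocorrelation is evaluated over lags corresponding to a plausible pitch range, and the lag with the largest value gives the period. `fmin` and `fmax` are hypothetical defaults. For a 200 Hz harmonic tone sampled at 8 kHz, the peak falls at lag 40, which gives 200 Hz.

```python
def autocorr_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation.

    Searches lags corresponding to [fmin, fmax] and returns fs / best_lag.
    """
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # remove DC so it does not dominate the correlation
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(x) - 1)

    def r(lag):
        # Unnormalized autocorrelation; fewer terms at long lags slightly favour
        # the true period over its multiples.
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    best_lag = max(range(lo, hi + 1), key=r)
    return fs / best_lag
```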