Objective No-Reference Visual Quality Assessment of Images/Videos
Applications of Visual Quality Assessment (VQA):
- Image/Video Coding;
- Image/Video Watermarking;
- Image/Video Denoising;
- Image/Video Artifact removal;
- Image/Video Error Protection;
- Image/Video Synthesis/Rendering.
Categories of VQA:
- Subjective rating:
- MOS (Mean opinion score);
- DMOS (Difference Mean opinion score);
- Subjective tests:
- DSCQS (Double stimulus continuous quality scale);
- DSIS (Double stimulus impairment scale);
- SSCQE (Single stimulus continuous Quality Evaluation);
- PC (Pair Comparison), ACR (Absolute Category Rating).
- Objective methods:
- Full Reference;
- Reduced Reference;
- No Reference.
Fig. 1, Subjective methods by MOS
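The subjective ratings listed above reduce to simple statistics over a panel of viewers. The following is a minimal sketch, assuming integer ACR-style ratings on a 1 to 5 scale and a normal-approximation confidence interval; the function names and the sample ratings are hypothetical and not taken from any standard.

```python
import math
import statistics

def mos(ratings):
    """Mean opinion score and the half-width of an approximate 95% confidence interval."""
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        return mean, 0.0
    # Normal approximation (1.96 * standard error); small panels may prefer Student's t.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

def dmos(reference_ratings, test_ratings):
    """Difference MOS: mean of the per-subject (reference - test) rating differences."""
    diffs = [r - t for r, t in zip(reference_ratings, test_ratings)]
    return statistics.mean(diffs)

ref_scores = [5, 4, 5, 4, 5]   # hypothetical ratings of the reference clip
test_scores = [3, 2, 3, 3, 2]  # the same subjects rating the processed clip
print(mos(test_scores))
print(dmos(ref_scores, test_scores))
```

A larger DMOS means the processed clip was judged further from its reference, so DMOS runs in the opposite direction to MOS.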
Objective VQA Metrics
- Signal fidelity measures;
- Perceptual visual quality metrics (PVQMs);
- Vision-based modeling;
- Subband Decomposition;
- Luma Adaptation;
- Visual masking.
- Signal-driven approach.
- Statistical features;
- structural similarity.
- VQEG: ITU.
Fig. 2, Vision-based Modeling for VQM
No-Reference VQM
- Contrast/Sharpness: a bandpass-filtered/lowpass-filtered image;
- Luminance, color, motion, texture.
- Blockiness: discontinuities at block boundaries (horizontal and vertical), harmonic analysis;
- Profiles along vert. and horiz. and local peaks detection;
- Luminance adaptation and texture masking as well.
- Noise: noise energy after denoising;
- Blurring: contrast decrease on edges;
- Color Bleeding: color contrast higher at the edges between areas;
- Ringing/Mosquito: ratio for deviation of the noise spectrum after edge preserving filter;
- Jerkiness: frame dropping/freeze measure.
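As an illustration of the blockiness item above, one simple measure compares pixel differences that straddle block boundaries with differences inside blocks. This is a sketch under stated assumptions: an 8x8 block grid aligned with the image origin, a grayscale image given as a list of rows, and a hypothetical ratio score; it omits the luminance adaptation and texture masking mentioned in the list.

```python
def boundary_ratio(rows, block=8):
    """Mean |difference| across block boundaries divided by the mean elsewhere, along rows."""
    boundary, interior = [], []
    for row in rows:
        for x in range(len(row) - 1):
            diff = abs(row[x + 1] - row[x])
            # Pixel pairs (block-1, block), (2*block-1, 2*block), ... straddle a block edge.
            if (x + 1) % block == 0:
                boundary.append(diff)
            else:
                interior.append(diff)
    if not boundary or not interior:
        raise ValueError("image must be wider than one block")
    mean_boundary = sum(boundary) / len(boundary)
    mean_interior = sum(interior) / len(interior)
    return mean_boundary / (mean_interior + 1e-6)  # small constant avoids division by zero

def blockiness(image, block=8):
    """Average of the horizontal and vertical boundary ratios; about 1 means no visible grid."""
    horizontal = boundary_ratio(image, block)
    vertical = boundary_ratio(list(zip(*image)), block)  # transpose to scan columns
    return 0.5 * (horizontal + vertical)

# Synthetic test images: a ramp with jumps at every 8th pixel, and a plain ramp.
blocky = [[100 + 20 * (x // 8) + 20 * (y // 8) + (x % 8) + (y % 8)
           for x in range(16)] for y in range(16)]
smooth = [[100 + x + y for x in range(16)] for y in range(16)]
print(blockiness(blocky), blockiness(smooth))
```

A score well above 1 suggests a visible block grid; real content would need the grid offset estimated first if the image has been cropped or scaled.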
Human Visual System (HVS)
- Visual masking: the visibility of a signal is reduced by the background it appears on;
- Physiological & Psychological mechanism.
- Luminance adaptation;
- Pattern masking;
- Contrast masking;
- Noise masking;
- Edge masking;
- Texture masking (semi-local, entropy or activity masking).
- Contrast sensitivity function (CSF);
- Visual Attention model (saliency map);
- Mono/Multi-channel model:
- Frequency-based signal decomposition to create a spatial-frequency hierarchy: steerable pyramid, wavelet, cortex filter, QMF, HOP, ...
Fig. 3, Campbell and Robson chart
Fig. 4, Spatial CSF
Fig. 5, Velocity CSF
Fig. 6, Spatial-Temporal CSF
Fig. 7, Luminance Adaptation
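The band-pass shape of the spatial CSF can be illustrated with a closed-form fit. The sketch below uses the Mannos-Sakrison approximation, which is a swapped-in model and not necessarily the curve plotted in the figures above; the viewing-distance helper and its parameter name are assumptions introduced here.

```python
import math

def csf_mannos_sakrison(f_cpd):
    """Relative contrast sensitivity at spatial frequency f_cpd (cycles per degree)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-((0.114 * f_cpd) ** 1.1))

def pixels_per_degree(viewing_distance_px):
    """Pixels subtended by one degree of visual angle; distance is expressed in pixels."""
    return 2.0 * viewing_distance_px * math.tan(math.radians(0.5))

def to_cycles_per_degree(f_cycles_per_pixel, viewing_distance_px):
    """Convert an image-domain frequency to the retinal frequency the CSF expects."""
    return f_cycles_per_pixel * pixels_per_degree(viewing_distance_px)

for f in (1.0, 8.0, 30.0):
    print(f, round(csf_mannos_sakrison(f), 3))
```

Sensitivity peaks near 8 cycles per degree and falls off on both sides, which is why the same coding error is less visible in very fine detail than in mid-frequency texture.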
Cortex Filter [Watson'87]
- Gabor function: a Gaussian multiplied by a sinusoid, which gives a good, albeit preliminary, description of V1 simple cell receptive fields;
- Cortex transform mimics HVS's critical band (Gabor shape);
- Radial freq. selectivity is symmetric on a log freq. axis with bandwidths nearly constant at one octave.
- Orientation selectivity is symmetric about a center peak angle with tuning bandwidths varying w.r.t. the radial freq.;
- Scalability means identical processing for each resolution/layer.
- Cortex filter models the space frequency localization aspects of HVS by splitting the original image spectrum into many spatial images;
- Mesa filter: a 2-D low-pass filter, a blurred disc in the frequency domain whose impulse response is a Gaussian multiplied by a Bessel function;
- Difference of Mesa (dom) filter for radial frequency selectivity: subtraction of a smaller disk from a large one;
- Bisection filter: bisect the frequency space, created by a 2d step function convolved with scaled Gaussian;
- Fan filter for orientation selectivity: a set of fan shaped regions in frequency space, created by repeated bisection filters;
- The cortex filter multiplies the dom filter by the fan filter to select a small band of 2-D spatial frequency, resembling the 2-D Gabor function;
- DCT-cortex filter mapping: how the DCT coefficients contribute to each cortex band;
- Threshold elevation (masking) model for texture energy (spatial details).
Fig. 8, Typical cortex transform decomposition in frequency domain
Fig. 9, Cortex Filter from dom filter and fan filter
Fig. 10, Cortex Filter response (right) similar to Gabor function (left) of a cortical receptive field
Fig. 11, DCT Cortex Filter Mapping
Fig. 12, A Simple Threshold Elevation Model
Visible Differences Predictor (VDP) [Daly'93, Bradley'99, Ninassi'08]
- Extended from the Cortex filter;
- Hanning function in the transition band of the mesa filter;
- Polar degree in dom filter;
- Gaussian baseband;
- High-frequency residual is not used.
- Conditions of a spatial-frequency hierarchy:
- invertible, unit frequency response, orientation sensitivity to at least 4 directions;
- minimal overlap btw adjacent frequency channels;
- shift invariant, orthogonal;
- limited spatial extent, linear phase.
- Masking: phase coherent/incoherent masking, mutual masking;
- Distortions: blurring and contouring (banding) in addition.
- Noise: minimal threshold on WT coeff.
- Activities: smooth or highly textured.
- Wavelet Visible Difference Predictor: approximate correspondence between the HVS and the WT, computationally more efficient than the cortex transform;
- WQA: Wavelet-based Quality Assessment proposed by Thomson;
- Separable wavelets: weaker match to HVS;
- Non-separable wavelets: sampling pattern alternates by level (rectangular or quincunx), but no increase of regularity;
- HVS wavelets: rotated by -45 degrees and scaled after separable WD (input sampled rectangularly, output sampled on a quincunx grid).
- CSF for spatial frequency and orientation over the DWT subband;
- Contrast masking;
- Semi-local masking;
- Error pooling.
- How is DWT-based VQA applied to image coding?
- Codestream quality layer in JPEG 2000 [Zeng'02].
- Multiscale geometric analysis:
Wavelet, curvelet, bandelet, contourlet.
Fig. 13, DWT-based subband detection (noise) threshold
Fig. 14, DWT-based subband threshold elevation model
Fig. 15, Spatial frequency dependent DWT (separable) and its rotation to match HVS better.
Fig. 16, DWT-based CSF and rotated DWT-based CSF
Fig. 17, WQA Flowchart
- Visibility threshold: contrast value for which a visual cell responds;
- Visual masking: the sensitivity of the human eye to natural pictures can be modulated (increased or decreased) by the stimulus (background).
- Sub-band decomposition:
- intra-channel (same orientation) masking, intra-component (same frequency) masking;
- inter-channel (different orientation) masking and inter-component (different frequency) masking.
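The threshold elevation (masking) idea used by the cortex and DWT models above can be sketched for a single sub-band coefficient. This is a minimal illustration, assuming a power-law elevation with a hypothetical slope of 0.7 and a base threshold that would in practice come from the CSF; it covers intra-channel masking only and uses the mutual-masking rule (the smaller of the two elevations) mentioned in the VDP notes.

```python
def threshold_elevation(mask_amplitude, base_threshold, slope=0.7):
    """Factor by which the detection threshold rises in the presence of a masker."""
    normalized = abs(mask_amplitude) / base_threshold
    # Below the base threshold the masker has no effect, so the factor never drops under 1.
    return max(1.0, normalized ** slope)

def is_visible(ref_coeff, dist_coeff, base_threshold, slope=0.7):
    """True if the coefficient difference exceeds the masked threshold."""
    elevation = min(
        threshold_elevation(ref_coeff, base_threshold, slope),
        threshold_elevation(dist_coeff, base_threshold, slope),
    )  # mutual masking: take the smaller elevation of reference and distorted signals
    return abs(dist_coeff - ref_coeff) >= base_threshold * elevation

print(is_visible(0.0, 1.5, 1.0))    # error on a flat background
print(is_visible(20.0, 21.5, 1.0))  # the same error on a strongly textured background
```

The same error magnitude is flagged on the flat background and hidden on the textured one, which is the behavior the threshold elevation figures describe.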
JND (Just-Noticeable-Distortion) Model
- Visibility threshold: the distortion level that a majority (75%) of observers can detect; compare the minimally-noticeable-distortion (MND) profile;
- Edge detection & Block type classification.
- Edge density for plain, edge, texture;
- Texture masking
- Spatial and disorder degree (gradient distribution).
- DWT domain (others: Laplacian pyramid...).
- DWT coeff: LL, HL, LH, HH;
- DCT domain
- DCT coeff: DC, L(low freq), E(edge), H(high freq.);
- Edge energy: L+E;
- Texture energy: E+H;
- Intra-band masking: direct comparison of coefficients;
- Inter-band masking: coefficient combination and block classification (see the figure), followed by low, medium and high masking for smooth, edge and texture blocks respectively;
- Luminance adaptation (Weber's law).
- Mean luminance in the block;
- Contrast masking
- structural and textural masking.
- Pixel-domain decomposition by edge preserving filters (separation of structure and texture).
- Foveation (peripheral vision): weights computed from the distance to the fixation point.
- JND for video: take into account temporal contrast sensitivity function (TCSF), eye movement and object motion (local).
Fig. 18, JND in Psycho Curve
Fig. 19, DCT-based Block Classification
Fig. 20, Block Classification and JND Masking Results
Note: more results are shown in the following webpage.
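The luminance adaptation term of a JND model can be illustrated with a piecewise function of the local mean luminance. The sketch below uses the approximation from Chou and Li that is common in pixel-domain JND work; it is a swapped-in formula for 8-bit luma, not necessarily the one behind the figures above, and the block helper is hypothetical.

```python
import math

def luminance_jnd(mean_luma):
    """Visibility threshold (in gray levels) as a function of background luminance 0..255."""
    if mean_luma <= 127:
        # Dark backgrounds hide larger errors; the threshold falls as luminance rises.
        return 17.0 * (1.0 - math.sqrt(mean_luma / 127.0)) + 3.0
    # Bright backgrounds: the threshold grows slowly and linearly.
    return 3.0 / 128.0 * (mean_luma - 127.0) + 3.0

def block_luminance_jnd(block):
    """Threshold for a block, using the mean luminance of its pixels as the background."""
    pixels = [p for row in block for p in row]
    return luminance_jnd(sum(pixels) / len(pixels))

for luma in (0, 64, 127, 255):
    print(luma, round(luminance_jnd(luma), 2))
```

A full JND profile would combine this term with the texture or contrast masking term, for example by taking their maximum or a sum with an overlap correction.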
DVQ (Digital Video Quality) Model [Watson'01]
- DCT domain;
- Local contrast (AC/DC).
- Convert to JND (just-noticeable-difference) by the spatial CSF.
- Contrast masking (DCT coeff.);
- Temporal masking (or apply different spatial CSF matrix in JND).
- Gamma function in display.
- Spatial/frequency error.
- Minkowski metric (beta-norm).
- Weighted pooling of distortion.
- Application: How to design DCT Quantization matrix for images?
- To provide optimal visual quality: RDO(rate distortion optimization);
- Threshold for DCT quantization error: adjusted by contrast, luminance, texture masking etc.
Fig. 21, Watson's Digital Video Quality model
Fig. 22, Temporal, spatial and orientation components of DCT threshold model
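The Minkowski (beta-norm) pooling step listed above collapses a map of JND-scaled errors into one number. A minimal sketch, assuming the normalized form (mean of powers rather than sum) and a hypothetical default exponent of 4:

```python
def minkowski_pool(errors, beta=4.0):
    """Pool per-coefficient errors with a beta-norm: (mean(|e|**beta)) ** (1/beta)."""
    if not errors:
        raise ValueError("errors must not be empty")
    mean_power = sum(abs(e) ** beta for e in errors) / len(errors)
    return mean_power ** (1.0 / beta)

errors = [1.0, 1.0, 1.0, 5.0]  # hypothetical JND-scaled errors, one outlier
for beta in (1.0, 2.0, 50.0):
    print(beta, minkowski_pool(errors, beta))
```

With beta = 1 the result is the mean absolute error; as beta grows the pooled value approaches the largest error, so the exponent controls how strongly the worst region dominates the score.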
Visual Attention (Saliency Map) Model
- Pre-attentive (bottom-up);
- Itti model
- Context-Aware Saliency Detection [Goferman'10]
- Frequency-tuned Salient Region Detection [Achanta'09]
- Attentive (top-down):
- Face/skin detection, fg/bg segmentation, Identified objects...
- How are saliency maps used in visual quality assessment?
- A simple way: weighted distortion.
Fig. 23, Context Saliency Results
Note: more results are shown in the following webpage.
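The "weighted distortion" use of a saliency map can be sketched as a saliency-weighted mean squared error. This assumes the saliency map is already computed by some model, is non-negative, and has the same size as the images; the function name and the toy data are hypothetical.

```python
def saliency_weighted_mse(reference, distorted, saliency):
    """Squared error averaged with saliency weights, so salient pixels count more."""
    weighted_sum = 0.0
    weight_total = 0.0
    for ref_row, dist_row, sal_row in zip(reference, distorted, saliency):
        for r, d, w in zip(ref_row, dist_row, sal_row):
            weighted_sum += w * (r - d) ** 2
            weight_total += w
    if weight_total == 0:
        raise ValueError("saliency map must contain at least one positive weight")
    return weighted_sum / weight_total

ref = [[10, 10], [10, 10]]
dist = [[14, 10], [10, 10]]          # a single error in the top-left pixel
uniform = [[1, 1], [1, 1]]           # no attention model: plain MSE
focused = [[1, 0], [0, 0]]           # attention entirely on the damaged pixel
print(saliency_weighted_mse(ref, dist, uniform))
print(saliency_weighted_mse(ref, dist, focused))
```

The same weighting can be applied to any per-pixel distortion map, for example a local SSIM map or a JND-normalized error map.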
Structural Similarity Metric (SSIM) [Wang'04]
- Luminance: mean;
- Contrast: standard deviation;
- Structure: mean-removed signal normalized by its standard deviation;
- Similarity metric: product of the luminance, contrast and structure comparisons;
- Single/Multi scale (MS)-SSIM;
- Speed-SSIM: visual speed perception.
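The three comparisons above combine into the usual single-window SSIM expression. The sketch below assumes unweighted statistics over one patch (the published index uses a sliding Gaussian-weighted window and averages the local values), 8-bit data, and the customary constants K1 = 0.01 and K2 = 0.03.

```python
def ssim_patch(x, y, dynamic_range=255.0):
    """SSIM between two equally sized patches given as flat lists of pixel values."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2   # stabilizes the contrast/structure term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

patch = [52, 55, 61, 59, 79, 61, 76, 61]
shifted = [p + 10 for p in patch]   # brightness change only, structure preserved
flat = [60] * len(patch)            # all structure removed
print(ssim_patch(patch, patch), ssim_patch(patch, shifted), ssim_patch(patch, flat))
```

The brightness shift costs little because only the luminance term changes, while flattening the patch removes the structure term almost entirely.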
VSNR (Visual SNR) [Chandler'07]
- Wavelet domain: DWT is an efficient space-frequency decomposition method;
- Detect distortion by threshold and then model it with VSNR;
- Near-threshold and suprathreshold stimuli in the HVS;
- Contrast threshold: wavelet-based visual masking and visual summation (LL, LH, HL, HH);
- Perceived contrast and global precedence (integrate edges in a coarse to fine scale fashion): SNR in a wavelet decomposition;
Other Video Quality Metrics
- VQM (Video Quality Metric):
- NTIA: consists of Television model, General model, Video Conferencing model;
- blurring, block distortion, jerky/unnatural motion, noise;
- PVQM (Perceptual Video Quality Measure):
- edginess, temporal decorrelation, color error;
- PQSM (Perceptual Quality Significance Map):
- model visual attention, eye fixation/movement with motion, luma, contrast, texture, skin/face detection;
- MOVIE (Motion-based Video Integrity Evaluation);
- optic flow to guide spatial-temporal filtering by Gabor filterbanks;
- Motion estimation and compensation;
- Opticom: PEVQ from regions of interest;
- Psytechnics: video registration, spatial-temporal distortion analysis.
Machine Learning-based Image/Video VQA
- Feature extraction and dimension reduction;
- Training data collection: Full Reference;
- Good or bad quality identified by classifier: NN, SVM, Adaboost, ...;
- Quality score predicted by regression: NN, SVR, ...;
- Quality score estimated by statistical modeling: SVD,...
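The regression route listed above can be illustrated with the simplest possible learner. In this sketch a one-feature least-squares line stands in for the NN or SVR regressors; the feature (a blockiness score), the training MOS values and the function names are all hypothetical, and a real system would use many features and a held-out test set.

```python
def fit_line(features, scores):
    """Least-squares fit of score = slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_s = sum(scores) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxy = sum((f - mean_f) * (s - mean_s) for f, s in zip(features, scores))
    slope = sxy / sxx
    return slope, mean_s - slope * mean_f

def pearson(a, b):
    """Pearson linear correlation, a usual figure of merit for predicted vs. subjective scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm_a = sum((p - mean_a) ** 2 for p in a) ** 0.5
    norm_b = sum((q - mean_b) ** 2 for q in b) ** 0.5
    return cov / (norm_a * norm_b)

blockiness_feature = [1.0, 2.0, 3.0, 4.0]   # hypothetical no-reference feature values
subjective_mos = [4.5, 3.5, 2.5, 1.5]       # hypothetical MOS from a subjective test
slope, intercept = fit_line(blockiness_feature, subjective_mos)
predicted = [slope * f + intercept for f in blockiness_feature]
print(slope, intercept, pearson(predicted, subjective_mos))
```

Swapping the line fit for a nonlinear regressor changes only `fit_line`; the evaluation against subjective scores stays the same.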
3-D Image/Video VQA
References
1. Campbell and Robson, Application of Fourier analysis to the visibility of gratings, J. of Physiology 197, 551-566, 1968.
2. F. van Nes, M. Bouman, Spatial modulation transfer in the human eye, J. of the Optical Society of America 57, 401-406, 1967.
3. X. Zhang, W. Lin, P. Xue, Just Noticeable Difference Estimation with Pixels in Images, J. of Visual Comm. & Image Represent., vol. 19, no. 1, 2008.
4. A.B. Watson, The cortex transform: rapid computation of simulated neural images, Comput. Vis. Graph. Image Process. 39 (3) (1987) 311-327.
5. A.B. Watson, J. Hu, J.F. McGowan III, DVQ: a digital video quality metric based on human vision, J. Electron. Imaging 10 (1) (2001) 20-29.
6. Weisi Lin, C.-C. Jay Kuo, Perceptual visual quality metrics: A survey, J. Vis. Commun. Image R., 22 (2011) 297-312.
7. H.Y. Tong, A.N. Venetsanopoulos. A perceptual model for jpeg applications based on block classification, texture masking, and luminance masking, IEEE Int. Conf. Image Processing (ICIP), 1998.
8. D. H. Kelly, Motion and vision. II. Stabilized spatiotemporal threshold surface, J. of the Optical Society of America 69, 1340-1349, 1979.
9. Zhou Wang, Alan Bovik, Ligang Lu, Video Quality Assessment Based on Structural Distortion Measurement, Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, February 2004.
10. T D Tran, R Safranek, A locally adaptive masking threshold model for image coding, IEEE ICASSP, 1996.
11. R. Achanta, S. Hemami, F. Estrada and S. Suesstrunk, Frequency-tuned Salient Region Detection, IEEE CVPR, 2009.
12. S. Goferman, L. Zelnik-Manor, A. Tal, Context-Aware Saliency Detection, CVPR 2010.
13. D. Chandler, S. Hemami, VSNR: a wavelet-based visual SNR for natural images, IEEE T-IP, 16(9), 2007.
14. H. Sheikh, A. Bovik, Image information and visual quality, IEEE T-IP, 15(2), 2006.
15. S. Daly, The Visible Differences Predictor: An algorithm for the assessment of image fidelity, Digital Images and Human Vision, MIT Press, pp. 179-206, 1993.
16. A. Hekstra et al., PVQM - a perceptual video quality measure, Signal Processing: Image Communication, 17(1), 2002.
17. Z. Lu et al., PQSM-based RR and NR video quality metrics, SPIE, vol. 5150, 2003.
18. A. Watson, DCT quantization matrices visually optimized for individual images, Human Vision, Visual Processing, and Digital Display IV, SPIE, vol. 1913-14, 1993.
19. W. Zeng, S. Daly, and S. Lei, An overview of the visual optimization tools in JPEG2000, Signal Processing: Image Communication 17(1), pp. 85-104, 2002.
20. A. Ninassi, O. Le Meur, P. Le Callet, D. Barba, On The Performance of Human Visual System Based Image Quality Assessment Metric Using Wavelet Domain, SPIE Human Vision and Electronic Imaging XIII Conference HVEI 2008, 27-31, January 2008.
21. A. Bradley, A wavelet visible difference predictor, IEEE T-IP, 8(5), May 1999.