Linear Spatial Pyramid Matching using Sparse Coding for Image Classification (CVPR'09)

Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang

Abstract

Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of O(n^2) to O(n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks using a single type of descriptor.
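The key change over the original SPM is twofold: each SIFT descriptor is encoded by solving an L1-regularized least-squares (sparse coding) problem instead of being hard-assigned to one codeword, and the codes in each pyramid cell are max-pooled instead of averaged into a histogram. A minimal numerical sketch in Python/NumPy (an ISTA solver standing in as an illustration; the released package uses Honglak Lee's Matlab codes, and all function names here are hypothetical):

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Encode descriptors X (n x d) over a dictionary D (k x d) by
    approximately solving min_U 0.5*||X - U D||^2 + lam*||U||_1
    with ISTA (iterative soft-thresholding). Illustrative sketch only."""
    L = np.linalg.norm(D @ D.T, 2)          # Lipschitz constant of the gradient
    U = np.zeros((X.shape[0], D.shape[0]))  # one sparse code per descriptor
    for _ in range(n_iter):
        V = U - (U @ D - X) @ D.T / L       # gradient step on the squared error
        U = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)  # soft-threshold
    return U

def max_pool(U):
    """Pool the codes in one spatial cell by taking the per-dimension
    absolute maximum, instead of the average used by histogram SPM."""
    return np.abs(U).max(axis=0)
```

Because the soft-thresholding step zeros out small coefficients exactly, each descriptor activates only a few dictionary atoms, and the pooled feature stays informative under a plain linear kernel.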

Matlab Codes

Description

This page contains the Matlab codes implementing the ScSPM algorithm described in the CVPR'09 paper "Linear Spatial Pyramid Matching using Sparse Coding for Image Classification".

Our algorithm is composed of the following parts:
a. SIFT descriptor extraction.
b. Sparse coding. We integrated Honglak Lee's Matlab codes for dictionary learning.
c. Multi-scale spatial max pooling of the sparse codes.
d. Linear SVM classification. We implemented an efficient linear SVM with a squared hinge loss function.
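The squared hinge loss in part d is differentiable everywhere, which is what makes a simple and efficient linear solver possible. A minimal binary sketch in Python/NumPy (gradient descent with illustrative hyperparameters; the released Matlab solver is more carefully optimized):

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, lr=0.01, n_iter=500):
    """Binary linear SVM with squared hinge loss:
        min_w  lam*||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)^2
    y must be in {-1, +1}. Sketch only, not the released implementation."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margin = 1.0 - y * (X @ w)
        active = margin > 0                 # examples violating the margin
        grad = 2 * lam * w - (2.0 / n) * ((y * margin * active) @ X)
        w -= lr * grad
    return w
```

Multi-class classification is then handled one-vs-all, training one such w per category; since both training and prediction use only inner products with w, test cost is constant per image.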

Combining sparse coding with spatial max pooling, the algorithm achieves state-of-the-art performance on Caltech 101 using SIFT descriptors alone. The most encouraging aspect of this algorithm is that the extracted feature favors linear models, and thus can be easily scaled up to large datasets. Currently, the sparse coding step is relatively slow; however, it can be sped up to real-application speed by parallel computing or by approximating it with neural networks.

Download

Matlab codes
Paper

Feedback

Email me if you have any questions.
