Locality-constrained Linear Coding for Image Classification (CVPR'10)

Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, and Yihong Gong

[Download the Matlab Package ]


The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simply but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy, the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest neighbor search and then solving a small constrainted least square fitting problem, bearing computational complexity of O(M+K^2). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical values of LLC for real applications.


Our locality-constrained linear coding (LLC) algorithm follows the standard descriptor coding + spatial pooling strategy to extract features from the image:

The advantages of this method are: simple, fast and scalable, and competitive with ScSPM [1] with linear classifiers. The following table shows the comparison of our algorithm with SPM [3] and ScSPM [1] in the literature.

Table 1: Comparison of LLC with SPM and ScSPM
Algorithms SPM ScSPM LLC
Local descriptors SIFT descriptors extracted from a dense grid on the image SIFT descriptors extracted from a dense grid on the image SIFT descriptors extracted from a dense grid on the image
Descriptor coding Vector quantization Sparse coding LLC
Sparse codes pooling Multiscale spatial average pooling (histogram) Multiscale spatial max pooling Multiscale spatial max pooling
Classifier Nonlinear (chi-square or intersection kernel) Linear Linear
Speed Fast Relatively slow Fast

Experiment Results

Experiment results are reported based on three widely used datasets: Caltech 101, Caltech 256, and PASCAL VOC 2007. Only a single descriptor, the dHoG feature (similar to SIFT, but faster), is used throughout the experiments. In our setup, the dHoG features were extracted from patches densely located by every 8 pixels on the image, under three scales, 16x16, 25x25, and 31x31 respectively. The dimension of each dHoG descriptor is 128. Please refer to the paper for more details.

Caltech 101 Dataset

We train a codebok with 2048 bases, and used 4x4, 2x2, and 1x1 sub-regions for spatial max pooling. The images are resized to be no larger than 300x300 with preserved aspect ratio. For classification, we use one-vs-all trained linear SVM.

Table 2: Classification Results on Caltech 101
Training Images 5 10 15 20 25 30
Zhang 46.6 55.8 59.1 62.0 - 66.20
SPM - - 56.40 - - 64.60
Griffin 44.2 54.5 59.0 63.3 65.80 67.60
Boiman - - 65.00 - - 70.40
Jain - - 61.00 - - 69.10
ScSPM - - 67.00 - - 73.20
Ours 51.15 59.77 65.43 67.74 70.16 73.44

Caltech 256 Dataset

The experiment settings are the same with Caltech 101, except that we use a larger codebook, containing 4096 bases. We do find that using a larger codebook helps the performance from what we have tried.

Table 3: Classification Results on Caltech 256
Training Images 15 30 45 60
Griffin 28.30 34.10 - -
Gemert - 27.17 - -
ScSPM 27.73 34.02 37.46 40.14
Ours 34.36 41.19 45.31 47.68


The results are reported using the standard Average Precision (AP) for this challenge competition. Note that our system only use one single descriptor and linear SVM.

Table 4: Classification Results on PASCAL 2007
Object class aero bicyc bird boat bottle bus car cat chair cow
Obj.+Contex 80.2 61.0 49.8 69.6 21.0 66.8 80.7 51.1 51.4 35.9
Best PASCAL'07 77.5 63.6 56.1 71.9 33.1 60.6 78.0 58.8 53.5 42.6
Ours 74.8 65.2 50.7 70.9 28.7 68.8 78.5 61.7 54.3 48.6
Object class table dog horse mbike person plant sheep sofa train tv average
Obj.+contex 62.0 38.6 69.0 61.4 84.6 28.7 53.5 61.9 81.7 59.5 58.4
Best of PASCAL'07 54.9 45.8 77.5 64.0 85.9 36.3 44.7 50.9 79.2 53.2 59.4
Ours 51.8 44.1 76.6 66.9 83.5 30.8 44.6 53.4 78.2 53.5 59.3



Matlab codes


Email me if you have any questions, or if you find any bugs with the codes.


[1] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR'09.

[2] Kai Yu, Tong Zhang, and Yihong Gong. Nonlinear learning using local coordinate coding. NIPS'09.

[3] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. CVPR'06.

Return to Jianchao Yang's homepage.