Heterogeneous Feature Machines for Visual Recognition





Abstract

With the recent efforts made by computer vision researchers, more and more types of features have been designed to describe various aspects of visual characteristics. Modeling such heterogeneous features has become an increasingly critical issue. In this paper, we propose a machinery called the Heterogeneous Feature Machine (HFM) to effectively solve visual recognition tasks in need of multiple types of features. Our HFM builds a kernel logistic regression model based on similarities that combine different features and distance metrics. Different from existing approaches that use a linear weighting scheme to combine different features, HFM does not require the weights to remain the same across different samples, and therefore can effectively handle features of different types with different metrics. To prevent the model from overfitting, we employ the so-called group LASSO constraints to reduce model complexity. In addition, we propose a fast algorithm based on co-ordinate gradient descent to efficiently train an HFM. The power of the proposed scheme is demonstrated across a wide variety of visual recognition tasks including scene, event and action recognition.

Citation
Liangliang Cao, Jiebo Luo, Feng Liang, and Thomas S. Huang
Heterogeneous Feature Machines for Visual Recognition
IEEE Proc. Int'l Conf. Computer Vision (ICCV), 2009


Motivation
In recent years, more and more features have been designed to describe different aspects of visual characteristics, such as Color Moment, SIFT, HOG, MSER, GIST, Shape context, LBP, etc. These features demand different metrics, including:
  • Euclidean distance
  • Chi-square distance
  • Pyramid matching kernel
In addition, the importance of these heterogeneous features differs from sample to sample. For example:
  • Color features are dominant from a distant viewpoint
  • Shape features are significant for close-up photos
To handle this problem, we look for a model which
  • provides a flexible weighting scheme for heterogeneous features
  • finds effective samples (like support vectors)


Related work

Kernel Machines minimize the objective function with the empirical loss and a regularization term:
According to the Representer theorem, the solution takes the following form (when the regularization term is a monotone function of the RKHS norm)
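In standard textbook notation, the objective and the representer-theorem solution are usually written as follows. The symbols L (loss function), \lambda (regularization weight), \mathcal{H}_K (the RKHS induced by kernel K), and \alpha_i (per-sample coefficients) are notational assumptions, not necessarily the authors' own:

```latex
\min_{f \in \mathcal{H}_K} \; \sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) + \lambda \, \lVert f \rVert_{\mathcal{H}_K}^{2},
\qquad
f(x) = \sum_{i=1}^{N} \alpha_i \, K(x_i, x).
```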
SVM is an example of a kernel machine, with an L2 regularization term and the hinge loss. SVM has the following properties:
  • Leads to sparse selection of samples (support vectors)
  • Not trivial to select kernel functions
  • Traditional kernel machines are based on a single feature type
In Multiple Kernel Learning (MKL), the classifier is modeled as a weighted combination of several kernels, with one weight per kernel.
Drawback: the weights are the same across all samples, so the model fails to describe possible nonlinear relationships.
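As a rough illustration of the MKL form described above, the following Python sketch scores a query with kernel weights that are shared by every training sample. The function names, the two example kernels (a Gaussian kernel on the Euclidean distance and an exponential kernel on one common convention of the chi-square distance), and the `gamma` parameter are illustrative assumptions, not the authors' implementation.

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel built on the squared Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)

def chi2_kernel(u, v, gamma=1.0):
    """Exponential kernel built on a chi-square distance between histograms."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(u, v) if a + b > 0)
    return math.exp(-gamma * d)

def mkl_score(x, train, alpha, beta, kernels, bias=0.0):
    """f(x) = sum_i alpha_i * sum_m beta_m * K_m(x_i[m], x[m]) + bias.

    x and each train[i] hold one feature vector per feature type.
    beta has one weight per kernel, identical for all training samples.
    """
    total = bias
    for a_i, x_i in zip(alpha, train):
        combined = sum(b_m * k(x_i[m], x[m])
                       for m, (b_m, k) in enumerate(zip(beta, kernels)))
        total += a_i * combined
    return total
```

With N training samples and M kernels, this form has N coefficients in `alpha` and M weights in `beta`, which is why the per-kernel weights cannot adapt to individual samples.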

Localized MKL [2] uses a parameterized function of x to compute the kernel weights.
Drawback: it is extremely difficult to find a parameterized function for multiple heterogeneous features with different metrics and coordinate systems.


Our Model
We build such a model by
  • employing many more weighting coefficients
  • using the logistic loss so that gradient-based methods can be applied
  • introducing group lasso regularization, which prefers sparse selection of groups
We formulate our classifier as

which minimizes the objective function:

We named our model “Heterogeneous Feature Machine” (HFM).
HFM is significantly different from classical kernel machines:
  • The solution no longer lies in an RKHS, because of the group lasso norm.
  • Unlike the hinge loss in SVM, the logistic loss in HFM has a continuous gradient everywhere.
  • The number of parameters increases significantly: O(M+N) for MKL vs. O(M*N) for HFM, with M feature types and N training samples.
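To make the parameter count concrete, here is a minimal Python sketch of a kernel logistic regression model with one coefficient per (sample, feature) pair and a group-lasso penalty. It is not the authors' exact formulation: the names `hfm_score` and `hfm_objective`, the bias term, and in particular the grouping (one group per training sample, so that a zeroed group discards that sample) are assumptions made for illustration.

```python
import math

def hfm_score(k_col, beta, bias=0.0):
    """f(x) = sum_i sum_m beta[i][m] * K_m(x_i, x) + bias.

    k_col[m][i] holds K_m(x_i, x) for the query x; beta is an N x M table.
    """
    n, m_count = len(beta), len(k_col)
    return bias + sum(beta[i][m] * k_col[m][i]
                      for i in range(n) for m in range(m_count))

def logistic_loss(y, f):
    """log(1 + exp(-y * f)) for labels y in {-1, +1}, computed stably."""
    z = -y * f
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def hfm_objective(K, y, beta, lam, bias=0.0):
    """Logistic loss plus a group-lasso penalty.

    K[m][i][j] = K_m(x_i, x_j) on the training set.
    Assumed grouping: each row beta[i] (one training sample) is a group.
    """
    n = len(y)
    loss = 0.0
    for j in range(n):
        k_col = [[K_m[i][j] for i in range(n)] for K_m in K]
        loss += logistic_loss(y[j], hfm_score(k_col, beta, bias))
    penalty = sum(math.sqrt(sum(b * b for b in row)) for row in beta)
    return loss + lam * penalty
```

The table `beta` has N x M entries, matching the O(M*N) count above. Because the penalty sums unsquared Euclidean norms of groups, it tends to drive whole groups to exactly zero.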
To estimate the parameters, we first approximate the cost function by its quadratic expansion

which can be minimized efficiently using the block coordinate gradient descent method [5][6].
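The core of such a block update can be sketched as follows. This is a simplified variant, assuming a scalar curvature h (the block Hessian is approximated by h times the identity) and omitting the line search that a full coordinate gradient descent method would use; the function names are hypothetical. Under that assumption, the quadratic model with a group-lasso term has a closed-form minimizer given by group soft-thresholding.

```python
import math

def group_soft_threshold(v, t):
    """Shrink v toward zero by t in Euclidean norm; return zeros if ||v|| <= t."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * a for a in v]

def block_update(beta_g, grad_g, h, lam):
    """Return the new block beta_g + d, where d minimizes

        grad_g . d + (h / 2) * ||d||^2 + lam * ||beta_g + d||_2

    grad_g is the gradient of the smooth loss with respect to this block,
    and h > 0 is an assumed scalar curvature estimate.
    """
    v = [b - g / h for b, g in zip(beta_g, grad_g)]
    return group_soft_threshold(v, lam / h)
```

Cycling this update over all groups, recomputing the gradient before each block, gives a basic block-wise scheme; a group whose shifted vector has norm below lam / h is set exactly to zero.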


Experimental Results
HFM vs. MKL on a toy dataset:
We test HFM and MKL models on the UCI Liver dataset, randomly selecting 70% of the samples for training and 30% for testing, and repeating the experiment 20 times. Our model consistently outperforms recent MKL models (SILP and SimpleMKL [1]), and its computation time is comparable to the best of them.
The UCI Liver experiment uses 91 kernels, and SimpleMKL selects a sparse representation over these 91 kernel matrices. In visual recognition tasks, the number of features is usually much smaller (5 to 6 in this paper). With so few kernels, searching for a sparse representation over them is no longer sensible, so we use HFM to fuse multiple features.
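The evaluation protocol described above (random 70/30 splits, repeated 20 times) can be sketched as follows. The helper `repeated_holdout` and its `evaluate` callback are hypothetical names introduced for illustration; `evaluate` is assumed to train on the first index list, test on the second, and return an accuracy.

```python
import random

def repeated_holdout(n_samples, evaluate, train_frac=0.7, repeats=20, seed=0):
    """Average the score of `evaluate` over repeated random train/test splits.

    evaluate(train_idx, test_idx) is a user-supplied callback (an assumption
    of this sketch) that returns a score such as test accuracy.
    """
    rng = random.Random(seed)
    cut = int(round(train_frac * n_samples))
    scores = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random permutation for every repeat
        scores.append(evaluate(idx[:cut], idx[cut:]))
    return sum(scores) / len(scores)
```

Fixing the seed makes the splits reproducible, so that different models can be compared on identical partitions.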
HFM for visual recognition from images:
We test our algorithms on two image datasets:
  • Li and Fei-Fei’s Princeton sports events dataset [3] (bocce, croquet, polo, rowing, snowboarding, badminton, sailing, and rock climbing).
  • Jain’s Flickr sports event dataset [4] (baseball, basketball, football, soccer, and tennis).
We use HFM with five features: Gist, HOG, LBP, Color Moment, and spatial pyramid matching using SIFT.
The results show that our approach outperforms the state of the art on these two datasets. Our HFM model also beats other feature fusion approaches such as Bayesian networks and random forests.

HFM for visual recognition from videos:
We also applied HFM to Ke’s CMU video event dataset, which has 5 categories: jumping, one-hand wave, pickup, push button, and two-hand wave. We employ 4 video features (Efros’s motion feature, motion history, and Laptev’s STIP-HOG and STIP-HOF) for HFM. Our approach significantly improves accuracy over using any single feature.


References
  1. A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. “SimpleMKL”, JMLR, 2008.
  2. M. Gönen and E. Alpaydin. “Localized multiple kernel learning”, ICML, 2008.
  3. L.-J. Li and L. Fei-Fei. “What, where and who? Classifying events by scene and object recognition”, ICCV, 2007.
  4. V. Jain, A. Singhal, and J. Luo. "Selective hidden random fields: Exploiting domain specific saliency for event classification", CVPR, 2008.
  5. P. Tseng and S. Yun. “A coordinate gradient descent method for nonsmooth separable minimization”, Mathematical Programming B, 117(1-2), 2009.
  6. L. Meier, S. van de Geer, and P. Bühlmann. “The group lasso for logistic regression”, Journal of the Royal Statistical Society: Series B, 70(1):53–71, 2008.