Abstract
With the recent efforts of computer vision researchers, more and more types of features have been designed to describe various aspects of visual characteristics. Modeling such heterogeneous features has become an increasingly critical issue. In this paper, we propose a learning machine called the Heterogeneous Feature Machine (HFM) to effectively solve visual recognition tasks that require multiple types of features. Our HFM builds a kernel logistic regression model based on similarities that combine different features and distance metrics. Unlike existing approaches that use a linear weighting scheme to combine different features, HFM does not require the weights to remain the same across different samples, and therefore can effectively handle features of different types with different metrics. To prevent the model from overfitting, we employ so-called group LASSO constraints to reduce model complexity. In addition, we propose a fast algorithm based on coordinate gradient descent to train an HFM efficiently. The power of the proposed scheme is demonstrated across a wide variety of visual recognition tasks, including scene, event, and action recognition.
Citation
|
Liangliang Cao, Jiebo Luo, Feng Liang, and Thomas S. Huang. Heterogeneous Feature Machines for Visual Recognition. IEEE Proc. Int'l Conf. Computer Vision (ICCV), 2009.
Motivation
In recent years, more and more features have been designed to describe different aspects of visual characteristics, such as
Color Moment, SIFT, HOG, MSER, GIST, Shape context, LBP, etc.
These features demand different metrics, including:
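As an illustration of why different features call for different metrics, the sketch below assumes two common pairings: chi-square distance for histogram features (for example, a bag of visual words over SIFT descriptors) and Euclidean distance for a holistic descriptor such as GIST. Each distance matrix is turned into a similarity with exp(-D / sigma), using the mean distance as sigma. These pairings, the normalization, and the function names are our assumptions, not necessarily the metrics listed on this page.

```python
import numpy as np

def chi2_distance(H1, H2, eps=1e-10):
    """Pairwise chi-square distance between rows of two histogram matrices."""
    A = H1[:, None, :]
    B = H2[None, :, :]
    # eps guards against empty bins where both histograms are zero.
    return 0.5 * (((A - B) ** 2) / (A + B + eps)).sum(-1)

def euclidean_distance(X1, X2):
    """Pairwise Euclidean distance between rows of two feature matrices."""
    return np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))

def distance_to_kernel(D, sigma=None):
    """Similarity K = exp(-D / sigma); sigma defaults to the mean distance (an assumption)."""
    if sigma is None:
        sigma = D.mean()
    return np.exp(-D / sigma)
```

Each resulting matrix can then serve as one per-feature similarity; the learning method (MKL or HFM) decides how these similarities are combined.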
Related work
Kernel machines minimize an objective function that combines an empirical loss with a regularization term:
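For reference, a generic form of this objective is shown below. The notation is ours: a training set $\{(x_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{-1, +1\}$, a loss $\ell$, a reproducing kernel Hilbert space $\mathcal{H}_K$, and a trade-off parameter $\lambda > 0$. With the logistic loss it becomes kernel logistic regression, the model family HFM builds on, and by the representer theorem the minimizer (ignoring a bias term) has the form $f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x)$.

```latex
\min_{f \in \mathcal{H}_K} \; \sum_{i=1}^{n} \ell\big(y_i, f(x_i)\big) + \lambda \, \|f\|_{\mathcal{H}_K}^2,
\qquad
\ell_{\text{logistic}}\big(y, f(x)\big) = \log\big(1 + e^{-y f(x)}\big).
```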
In Multiple Kernel Learning (MKL), the classifier is modeled as a weighted combination of base kernels, with the same weights applied to all samples.
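A standard way to write the MKL classifier is given below, in generic notation that may differ from the one used in the original figure: $M$ base kernels $K_m$ are combined with nonnegative weights $\beta_m$ that are shared by every sample.

```latex
f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} \beta_m K_m(x_i, x) + b,
\qquad \beta_m \ge 0, \quad \sum_{m=1}^{M} \beta_m = 1.
```

Because each $\beta_m$ scales the whole kernel matrix $K_m$, the relative importance of a feature type is the same for every training sample.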
Localized MKL instead uses a parameterized function of x to compute the combination weights, so the weights can vary from sample to sample.
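To make the contrast concrete, here is a small NumPy sketch comparing a global (MKL-style) kernel combination with a localized one. It assumes a softmax gating function eta_m(x) = softmax_m(V_m · x + v0_m) and the combined kernel sum_m eta_m(x_i) K_m(x_i, x_j) eta_m(x_j), a common form of localized MKL; the exact gating form, the parameters `V` and `v0`, and the function names are illustrative assumptions.

```python
import numpy as np

def global_combination(kernels, beta):
    """MKL-style: one weight per kernel, shared by every sample pair.

    kernels: (M, n, n) stack of Gram matrices; beta: (M,) weights.
    """
    return np.tensordot(beta, kernels, axes=1)

def softmax_gating(X, V, v0):
    """Per-sample gates eta_m(x) = softmax_m(V_m . x + v0_m); returns shape (n, M)."""
    scores = X @ V.T + v0
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def localized_combination(kernels, eta):
    """K(x_i, x_j) = sum_m eta_m(x_i) * K_m(x_i, x_j) * eta_m(x_j)."""
    return np.einsum('im,mij,jm->ij', eta, kernels, eta)
```

With `V = 0` and `v0 = 0` every gate equals `1/M`, so the localized kernel reduces to a global combination with weights `1/M**2`; with nonzero gating parameters, each sample pair gets its own mixture of the base kernels.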
Our Model
We build such a model by
which minimizes the objective function:
We call our model the “Heterogeneous Feature Machine” (HFM). HFM differs significantly from classical kernel machines:
which can be minimized efficiently using the block coordinate gradient descent method [1][2].
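The following NumPy sketch illustrates these ingredients under explicit assumptions; it is not the authors' implementation. We assume the decision function f(x) = sum_j sum_m beta_j^m K_m(x_j, x) + b, so each training sample j carries its own weight for each feature type instead of one weight per kernel; the mean logistic loss; and a group LASSO penalty lam * sum_j ||beta_j||_2 with one group per training sample (beta_j collects that sample's M coefficients). The solver cycles over the groups and takes a proximal gradient step (block soft-thresholding) on each, a simple form of block coordinate descent with a per-block Lipschitz step size. The grouping, the function names, and the solver details are all hypothetical.

```python
import numpy as np

def _sigmoid_neg(z):
    # Numerically stable sigma(-z) = 1 / (1 + exp(z)).
    return 0.5 * (1.0 - np.tanh(0.5 * z))

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def train_hfm_sketch(kernels, y, lam=1e-3, n_sweeps=50):
    """Hypothetical HFM-style trainer.

    kernels: (M, n, n) array, one training Gram matrix per feature type.
    y: (n,) labels in {-1, +1}.
    Returns beta with shape (n, M) and intercept b.
    """
    M, n, _ = kernels.shape
    beta = np.zeros((n, M))
    b = 0.0
    f = np.zeros(n)  # current decision values f(x_i), kept in sync with beta and b
    for _ in range(n_sweeps):
        for j in range(n):
            Z = kernels[:, :, j].T  # (n, M): entry (i, m) is K_m(x_i, x_j)
            g_f = -y * _sigmoid_neg(y * f) / n  # d(mean logistic loss) / d f_i
            grad = Z.T @ g_f  # gradient w.r.t. the block beta_j
            # Logistic curvature is at most 1/4, which bounds this block's Lipschitz constant.
            L = 0.25 / n * np.linalg.eigvalsh(Z.T @ Z)[-1] + 1e-12
            new = group_soft_threshold(beta[j] - grad / L, lam / L)
            f += Z @ (new - beta[j])
            beta[j] = new
        # Unpenalized intercept: one gradient step with Lipschitz constant 1/4.
        g_f = -y * _sigmoid_neg(y * f) / n
        step = g_f.sum() / 0.25
        b -= step
        f -= step
    return beta, b

def predict_hfm_sketch(test_kernels, beta, b):
    """test_kernels: (M, n_test, n_train), entry (m, t, j) is K_m(x_t, x_j)."""
    scores = np.einsum('mtj,jm->t', test_kernels, beta) + b
    return np.where(scores >= 0, 1, -1)
```

Grouping by sample lets the penalty zero out whole training samples (much like selecting support vectors) while still letting each retained sample weight the feature types differently; grouping by feature type instead would push toward selecting kernels, as sparse MKL does.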
Experimental Results
We compare HFM with MKL models on the UCI Liver dataset, randomly selecting 70% of the data for training and 30% for testing and repeating the experiment 20 times.
HFM is consistently better than recent MKL models (SILP and SimpleMKL), and its computational time is comparable to that of the fastest of them. The UCI Liver experiment uses 91 kernels, and SimpleMKL selects a sparse representation over these 91 kernel matrices. In visual recognition tasks, the number of features is usually much smaller (5 or 6 in this paper). When searching for a sparse representation over so few kernels is no longer sensible, we use HFM to fuse multiple features.
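The repeated random-split protocol described above can be sketched as follows. The `fit_predict` callable and the nearest-centroid stand-in classifier are hypothetical placeholders for whichever model (HFM, SILP, SimpleMKL) is being evaluated; the defaults mirror the 70/30 split and 20 repetitions described here.

```python
import numpy as np

def repeated_holdout(X, y, fit_predict, train_frac=0.7, n_repeats=20, seed=0):
    """Mean and standard deviation of test accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)  # fresh random split each repetition
        tr, te = perm[:n_train], perm[n_train:]
        y_hat = fit_predict(X[tr], y[tr], X[te])
        accs.append(np.mean(y_hat == y[te]))
    return float(np.mean(accs)), float(np.std(accs))

def nearest_centroid(X_tr, y_tr, X_te):
    """Hypothetical stand-in classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]
```

Reporting the standard deviation along with the mean makes it easier to judge whether a gap between two methods exceeds the split-to-split variation.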
We test our algorithms on two image datasets:
The results show that our approach outperforms the state of the art on both datasets. HFM also outperforms other feature fusion approaches, such as Bayesian networks and random forests.
We also applied HFM to Ke’s CMU video event dataset, which contains 5 categories: jumping, one-hand wave, pickup, push button, and two-hand wave. We use 4 video features with HFM: Efros’s motion feature, motion history, and Laptev’s STIP-HOG and STIP-HOF. Our approach significantly improves accuracy over using any single feature.
References