Original paper: Efficient Multiple Feature Fusion With Hashing for Hyperspectral Imagery Classification: A Comparative Study


  • I. Introduction
  • II. MFH Framework
  • III. Feature Hashing
  • IV. Experiment Setting
    • A. Data Sets
    • B. Multiple Feature Extraction
    • C. Evaluated Methods
    • D. Evaluation Criterion
  • V. Results and Analysis
    • A. Experiment 1: Indian Pines Data Set
    • B. Experiment 2: University of Pavia Data Set With Given Training Samples
    • C. Experiment 3: Salinas Data Set
    • D. Experiment 4: Houston Data Set With Given Training Samples
  • VI. Discussion
  • VII. Conclusion and Future Work

I. Introduction

The existing methods for multiple feature fusion are mainly focused on improving their classification accuracy without considering the computational and storage cost.


As a powerful technique to obtain compact features and fast nearest neighbor search, hashing has not been introduced in remote sensing processing until very recently, where it is adopted for large-scale remote sensing image retrieval. To the best of our knowledge, it has not been used in hyperspectral image classification.


The main contributions of our work are summarized as follows.

  1. We propose an MFH framework to use hash technique in fusing multiple features for hyperspectral image classification and show encouraging results.
  2. We conduct an extensive performance evaluation of different hashing methods on fusing multiple features for classification on four popular hyperspectral data sets. Based on the evaluation results, we provide an in-depth discussion of the advantages, disadvantages, and applicability of different hashing methods in this task.
  3. We conduct comparative experiments with five classical subspace-based dimension reduction methods and six different multiple feature fusion methods. Experiments show that, when equipped with a proper hashing learning strategy, the proposed MFH method can achieve comparable or even competitive performance. Meanwhile, the obtained binary features require much less storage and classification time.


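  1. An MFH (Multiple Feature Fusion Hashing) framework is proposed, bringing hashing into multiple feature fusion for hyperspectral image classification;
  2. Performance is evaluated on four popular hyperspectral data sets, and the strengths and weaknesses of hashing methods for this task are discussed;
  3. Comparative experiments are run against five classical subspace-based dimension reduction methods and six different multiple feature fusion methods.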

II. MFH Framework

The proposed MFH framework can be divided into three steps: 1) perform feature extraction in the hyperspectral image via efficient approaches, and concatenate multiple features into a long feature vector for each pixel; 2) perform hashing learning on these feature vectors with or without class label information, and map the original float-type feature vectors into compact binary codes; and 3) perform classification with the obtained binary codes, and output the final classification results. The flowchart of MFH is shown in Fig. 1.

Fig. 1 Flowchart of the proposed MFH framework.



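  1. Extract features from the hyperspectral image and concatenate the multiple features of each pixel into one long feature vector;
  2. Perform hashing learning on the feature vectors, mapping the float features to binary hash codes;
  3. Classify with the resulting hash codes and output the classification result.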
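
The three steps can be sketched end to end in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the hashing step uses sign random projections (the simplest LSH variant) as a stand-in for whichever hashing method is chosen, the classifier is assumed to be 1-nearest-neighbor under Hamming distance, and the toy data, block sizes (6, 4, 5, 3), and 64-bit code length are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def concatenate_features(feature_blocks):
    """Step 1: stack per-pixel feature blocks side by side into one long vector per pixel."""
    return np.hstack(feature_blocks)

def fit_random_hash(dim, n_bits, rng):
    """Step 2 (stand-in): draw random hyperplanes, as in sign-random-projection LSH."""
    return rng.standard_normal((dim, n_bits))

def encode(X, mean, W):
    """Map float vectors to 0/1 codes by thresholding the centred projections at zero."""
    return ((X - mean) @ W > 0).astype(np.uint8)

def classify_1nn_hamming(train_codes, train_labels, test_codes):
    """Step 3: give each test code the label of its nearest training code."""
    # Pairwise Hamming distance = number of differing bits.
    dists = (test_codes[:, None, :] != train_codes[None, :, :]).sum(axis=2)
    return train_labels[dists.argmin(axis=1)]

# Toy data: two synthetic classes, four feature blocks of hypothetical sizes.
n_per_class = 50
labels = np.repeat([0, 1], n_per_class)
blocks = [rng.standard_normal((2 * n_per_class, d)) + 4.0 * labels[:, None]
          for d in (6, 4, 5, 3)]

X = concatenate_features(blocks)                      # shape (100, 18)
train, test = np.arange(0, 100, 2), np.arange(1, 100, 2)

mean = X[train].mean(axis=0)                          # centre with training statistics only
W = fit_random_hash(X.shape[1], n_bits=64, rng=rng)
train_codes = encode(X[train], mean, W)
test_codes = encode(X[test], mean, W)

pred = classify_1nn_hamming(train_codes, labels[train], test_codes)
accuracy = float((pred == labels[test]).mean())
print(f"accuracy on the toy test split: {accuracy:.2f}")
```

Swapping in a learned hashing method only changes how `W` (or a nonlinear encoder) is fitted in step 2; steps 1 and 3 stay the same.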


III. Feature Hashing


  • LSH
  • KLSH
  • SH


  • KSH
  • FastHash
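  • CCA-Based ITQ: When label information is available, canonical correlation analysis (CCA) [46] can be used in place of PCA, which yields the CCA-based ITQ algorithm (CCA-ITQ). As shown in [45], CCA-ITQ is very effective and can significantly improve image retrieval performance.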
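
To make the ITQ step concrete, here is a minimal sketch of the unsupervised (PCA) variant described in [45]: project the data, then alternate between updating the binary codes and solving an orthogonal Procrustes problem for the rotation. The function names, the iteration count, and the random toy data are assumptions for illustration. CCA-ITQ would replace `pca_project` with a CCA projection computed from the labels, which is not shown here.

```python
import numpy as np

def pca_project(X, n_bits):
    """Centre the data and project it onto its top n_bits principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_bits].T

def itq_rotation(V, n_iter=50, seed=0):
    """Learn an orthogonal R that reduces the quantization loss ||sign(VR) - VR||_F^2."""
    rng = np.random.default_rng(seed)
    n_bits = V.shape[1]
    # Random orthogonal initialisation via QR decomposition.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    losses = []
    for _ in range(n_iter):
        Z = V @ R
        B = np.where(Z >= 0, 1.0, -1.0)          # fix R, update the binary codes
        losses.append(float(np.linalg.norm(B - Z) ** 2))
        # Fix B, update R: orthogonal Procrustes problem, solved with an SVD.
        U, _, Vt = np.linalg.svd(B.T @ V)
        R = Vt.T @ U.T
    return R, losses

# Toy usage on random correlated data (hypothetical sizes).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20)) @ rng.standard_normal((20, 20))
V = pca_project(X, n_bits=8)
R, losses = itq_rotation(V)
codes = (V @ R >= 0).astype(np.uint8)            # final 8-bit codes, one row per sample
```

Each of the two alternating updates is optimal for its own variable, so the recorded loss does not increase from one iteration to the next.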

[45] Y. Gong and S. Lazebnik, “Iterative quantization: A Procrustean approach to learning binary codes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 817–824.
[46] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, Dec. 1936.

IV. Experiment Setting

A. Data Sets

Indian Pines: This data set was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over a mixed agricultural/forested region in Northwest Indiana, on June 12, 1992. This data set has a spatial size of 145×145 pixels and 220 spectral bands with a spatial resolution of 20 m/pixel. It has 16 land-cover classes, whose labeled sample sizes are highly imbalanced, ranging from 20 to 2468 pixels. In the experiments, we remove 20 noisy bands (104–108, 150–163, and 220) due to water absorption and use the remaining 200 bands.

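Indian Pines: This data set has a spatial size of 145×145 pixels and 220 spectral bands, with a spatial resolution of 20 m/pixel. It contains 16 land-cover classes whose labeled sample counts are highly imbalanced, ranging from 20 to 2468. In the experiments, the 20 noisy water-absorption bands (104–108, 150–163, and 220) are removed and the remaining 200 bands are used.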

University of Pavia: This data set was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS), covering an urban area of the University of Pavia, Italy. Originally, the ROSIS sensor provided 115 bands from 0.43 to 0.86 μm. After removing the 12 most noisy bands, the remaining 103 bands are used for experiments. The spatial size of this data set is 610 × 340 pixels, and its spatial resolution is 1.3 m/pixel. There are 9 classes, with sizes of labeled samples ranging from 1026 to 18686.

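University of Pavia: This data set originally contains 115 bands; after removing the 12 noisiest bands, the remaining 103 bands are used in the experiments. Its spatial size is 610×340 pixels and its spatial resolution is 1.3 m/pixel. It contains 9 classes, with labeled sample counts ranging from 1026 to 18686.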

Salinas: This data set was captured by the AVIRIS sensor over Salinas Valley, CA, USA, with a spatial resolution of 3.7 m/pixel. This data set has a spatial size of 512 × 217 with 224 spectral bands. In our experiments, 20 water absorption bands (108–112, 154–167, and 224) are discarded. This data set has 16 classes, whose sample sizes range from 916 to 11271.

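Salinas: This data set has a spatial resolution of 3.7 m/pixel, 224 spectral bands, and a spatial size of 512×217 pixels. In the experiments, 20 water-absorption bands (108–112, 154–167, and 224) are discarded. It contains 16 classes, with sample counts ranging from 916 to 11271.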

Houston: This data set was initially distributed in the 2013 IEEE Geoscience and Remote Sensing Data Fusion Contest, which includes an urban hyperspectral data set and light detection and ranging (LiDAR) derived digital surface model. Both are geographically referenced and at the same spatial resolution (2.5 m). The hyperspectral data set has 144 bands in the 380–1050-nm spectral region. There are 15 classes of interest selected by the organizers.

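Houston: This data set includes an urban hyperspectral data set and a digital surface model derived from light detection and ranging (LiDAR). Both have a spatial resolution of 2.5 m/pixel. The hyperspectral data set has 144 bands in the 380–1050 nm spectral region. The experiments use the 15 classes of interest.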

B. Multiple Feature Extraction

Four kinds of commonly used features in hyperspectral image processing are extracted for each pixel, including the following: 1) the original spectral feature (denoted as Spectral); 2) the EMP feature (denoted as EMP) [19]; 3) the EAP feature (denoted as EAP) [20], [21]; and 4) the Gabor filtering feature (denoted as Gabor).


  1. Original spectral feature (Spectral), denoted $x_{spe}\in\mathbb{R}^{d_1}$, where $d_1$ is the number of bands of the hyperspectral image.
  2. Extended morphological profile (EMP), denoted $x_{emp}\in\mathbb{R}^{d_2}$.
  3. Extended morphological attribute profile (EAP), denoted $x_{eap}\in\mathbb{R}^{d_3}$.
  4. Gabor filtering feature (Gabor), denoted $x_{gabor}\in\mathbb{R}^{d_4}$.

After extraction, these features are simply concatenated into one fused high-dimensional multi-feature vector $x_{multi}=[x_{spe},x_{emp},x_{eap},x_{gabor}]\in\mathbb{R}^{1\times D}$, with $D=\sum_{k=1}^{4}d_k$, which serves as the input to the subsequent hashing learning.

C. Evaluated Methods



D. Evaluation Criterion


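  1. Overall accuracy (OA): the number of correctly classified samples divided by the number of test samples;
  2. The kappa statistic ($\kappa$), used for consistency testing and for measuring classification quality;
  3. Per-class classification accuracy.

In classification problems, the most common evaluation metric is accuracy (acc): it directly reflects the proportion of correctly classified samples and is very simple to compute. In practice, however, class sizes are often imbalanced. On such an imbalanced data set, a model that is not adjusted for the imbalance easily favors the large classes and gives up on the small ones (e.g., with a positive-to-negative ratio of 1:9, predicting everything as negative still gives an acc of 90%, but the positive samples are completely "abandoned"). The overall acc is then high, yet some classes are never recalled.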
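
The 1:9 example can be checked numerically. The sketch below computes OA, per-class accuracy, and Cohen's kappa from a confusion matrix with plain NumPy; the helper names are illustrative, and the kappa formula is the standard one, $\kappa=(p_o-p_e)/(1-p_e)$, which is assumed to be the statistic meant here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def overall_accuracy(cm):
    """Correctly classified samples divided by all test samples."""
    return np.trace(cm) / cm.sum()

def per_class_accuracy(cm):
    """Correct samples of each class divided by that class's size (each class must occur)."""
    return np.diag(cm) / cm.sum(axis=1)

def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = cm.sum()
    p_o = np.trace(cm) / n
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (p_o - p_e) / (1 - p_e)

# 10 positives (class 1) and 90 negatives (class 0); everything is predicted negative.
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.zeros(100, dtype=np.int64)

cm = confusion_matrix(y_true, y_pred, n_classes=2)
oa = overall_accuracy(cm)          # 0.9
pca = per_class_accuracy(cm)       # [1.0, 0.0]: the positive class is never recalled
k = kappa(cm)                      # 0.0: no better than chance agreement
print(oa, pca, k)
```

OA alone reports 90%, while kappa and the per-class accuracy expose that the minority class is missed entirely, which is why all three are reported together.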



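  1. Training time for learning the hash functions, in seconds;
  2. Time to extract the hash code from the concatenated multiple features, in microseconds;
  3. Average time to compute the distance between any two features (float vectors or binary codes), in nanoseconds.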
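
The third criterion comes down to comparing a float-vector distance with a Hamming distance on packed bits. The following sketch shows one way to compute both and to compare storage; the 64-bit code length is a hypothetical choice, the 465-dimensional float vector mirrors the Indian Pines feature length, and the timings printed by a Python script are dominated by interpreter overhead, so they only illustrate the measurement and are not comparable to the nanosecond figures of a compiled implementation.

```python
import timeit
import numpy as np

# Lookup table: number of set bits in every possible byte value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)

def hamming_distance(a, b):
    """Hamming distance between two packed codes: XOR, then count the set bits."""
    return int(POPCOUNT[np.bitwise_xor(a, b)].sum())

def euclidean_distance(x, y):
    """Euclidean distance between two float vectors."""
    return float(np.linalg.norm(x - y))

rng = np.random.default_rng(0)
n_bits = 64                                   # hypothetical code length
bits_a = rng.integers(0, 2, n_bits, dtype=np.uint8)
bits_b = rng.integers(0, 2, n_bits, dtype=np.uint8)
code_a, code_b = np.packbits(bits_a), np.packbits(bits_b)   # 8 bytes each

x = rng.standard_normal(465)                  # float64 vector: 465 values of 8 bytes
y = rng.standard_normal(465)

d_hamming = hamming_distance(code_a, code_b)
d_euclid = euclidean_distance(x, y)

reps = 10_000
t_ham = timeit.timeit(lambda: hamming_distance(code_a, code_b), number=reps) / reps
t_euc = timeit.timeit(lambda: euclidean_distance(x, y), number=reps) / reps
print(f"storage: {x.nbytes} bytes (float) vs {code_a.nbytes} bytes (binary)")
print(f"time per distance: {t_euc * 1e9:.0f} ns (float) vs {t_ham * 1e9:.0f} ns (binary)")
```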

V. Results and Analysis

A. Experiment 1: Indian Pines Data Set



The extracted features are: $x_{spe}\in\mathbb{R}^{200}$, $x_{emp}\in\mathbb{R}^{45}$, $x_{eap}\in\mathbb{R}^{180}$, $x_{gabor}\in\mathbb{R}^{40}$, and $x_{multi}=[x_{spe},x_{emp},x_{eap},x_{gabor}]\in\mathbb{R}^{1\times 465}$.

B. Experiment 2: University of Pavia Data Set With Given Training Samples

C. Experiment 3: Salinas Data Set

D. Experiment 4: Houston Data Set With Given Training Samples

VI. Discussion


VII. Conclusion and Future Work

In this paper, we have proposed an MFH framework and have given a comparative evaluation of several existing hashing methods for hyperspectral imagery classification. The main characteristics of this work lie in the following aspects. First, the hashing technique has been introduced into multiple feature fusion for generating compact binary feature representations. Second, the classification experiments conducted on four real hyperspectral data sets have demonstrated that the obtained compact binary codes can not only preserve similarity in the original data space but also allow more economical subsequent processing, while achieving comparable or better performance. Finally, along with the powerful features extracted on hyperspectral images, the feature hashing in multiple feature fusion is very effective and efficient, as expected.

As future work, further investigation of MFH can be pursued mainly in two aspects: theory and application. From the perspective of theory, the first improvement is to propose more flexible fusion schemes to take advantage of complementary but vital information from multiple types of features. Another possible improvement is to develop more efficient hashing methods to obtain more compact and discriminative binary codes. From the viewpoint of application, with the continued development of imaging technologies, large volumes of remote sensing data have been captured and stored. How to explore such large-scale remote sensing data effectively and efficiently urgently needs to be studied. One possible application is remote sensing data compression. With the efficient MFH, large-scale data can be compressed into binary codes without significant loss of information, which will greatly reduce the storage required. Another possible exploration is fast retrieval of near-duplicate spectra or similar objects. Owing to the compact binary codes, nearest neighbor search or approximate nearest neighbor search would be very efficient, thus greatly decreasing the time complexity.




