volume 37 issue 9 pages 1904-1916

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Publication type: Journal Article
Publication date: 2015-09-01
Scimago: Q1
WoS: Q1
SJR: 3.910
CiteScore: 35.0
Impact factor: 18.6
ISSN: 0162-8828, 2160-9292, 1939-3539
Computational Theory and Mathematics
Artificial Intelligence
Applied Mathematics
Software
Computer Vision and Pattern Recognition
Abstract
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 $\times$ 224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102 $\times$ faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
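The fixed-length property described in the abstract can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of rows, max pooling, and pyramid levels of 1x1, 2x2, and 4x4 bins. The function names `spp_max_pool` and `crop`, the level choice, and the floor/ceil bin boundaries are illustrative assumptions, not the authors' implementation.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid of bins
    and the maximum of each bin is kept, so the output length is
    sum(n * n for n in levels) whatever the input height and width.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end, in integer
            # arithmetic, so every bin is non-empty and bins cover the map.
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


def crop(feature_map, top, left, bottom, right):
    """Return the sub-window rows [top, bottom) and columns [left, right)."""
    return [row[left:right] for row in feature_map[top:bottom]]


# Two feature maps of different sizes (toy values, not real activations).
small = [[r * 5 + c for c in range(5)] for r in range(4)]      # 4 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]    # 9 x 13

vec_small = spp_max_pool(small)
vec_large = spp_max_pool(large)

# Detection-style use: compute the map once, then pool an arbitrary region.
vec_region = spp_max_pool(crop(large, 2, 3, 7, 10))

print(len(vec_small), len(vec_large), len(vec_region))  # 21 21 21
```

All three vectors have length 1 + 4 + 16 = 21, even though the inputs differ in size. In a multi-channel network the same pooling would be applied to each channel, which multiplies the output length by the number of channels.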

Top-30

Journals

IEEE Access - 462 publications, 5.1%
Sensors - 314 publications, 3.46%
Lecture Notes in Computer Science - 266 publications, 2.93%
Remote Sensing - 245 publications, 2.7%
Applied Sciences (Switzerland) - 195 publications, 2.15%
Multimedia Tools and Applications - 148 publications, 1.63%
Electronics (Switzerland) - 141 publications, 1.56%
IEEE Transactions on Geoscience and Remote Sensing - 124 publications, 1.37%
Scientific Reports - 113 publications, 1.25%
Neurocomputing - 110 publications, 1.21%
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing - 86 publications, 0.95%
IEEE Transactions on Instrumentation and Measurement - 82 publications, 0.9%
Expert Systems with Applications - 78 publications, 0.86%
Journal of Physics: Conference Series - 74 publications, 0.82%
Engineering Applications of Artificial Intelligence - 69 publications, 0.76%
Communications in Computer and Information Science - 68 publications, 0.75%
IEEE Transactions on Image Processing - 66 publications, 0.73%
Computers and Electronics in Agriculture - 62 publications, 0.68%
IEEE Transactions on Circuits and Systems for Video Technology - 56 publications, 0.62%
Neural Computing and Applications - 55 publications, 0.61%
Signal, Image and Video Processing - 55 publications, 0.61%
Pattern Recognition - 55 publications, 0.61%
IEEE Transactions on Pattern Analysis and Machine Intelligence - 55 publications, 0.61%
IEEE Transactions on Intelligent Transportation Systems - 52 publications, 0.57%
Lecture Notes in Electrical Engineering - 52 publications, 0.57%
IET Image Processing - 51 publications, 0.56%
Journal of Real-Time Image Processing - 45 publications, 0.5%
Biomedical Signal Processing and Control - 45 publications, 0.5%
Applied Intelligence - 44 publications, 0.49%

Publishers

Institute of Electrical and Electronics Engineers (IEEE) - 3233 publications, 35.66%
Springer Nature - 1503 publications, 16.58%
MDPI - 1314 publications, 14.49%
Elsevier - 1311 publications, 14.46%
Association for Computing Machinery (ACM) - 184 publications, 2.03%
IOP Publishing - 147 publications, 1.62%
Wiley - 135 publications, 1.49%
Taylor & Francis - 124 publications, 1.37%
Hindawi Limited - 119 publications, 1.31%
Frontiers Media S.A. - 116 publications, 1.28%
Institution of Engineering and Technology (IET) - 105 publications, 1.16%
SPIE-Intl Soc Optical Eng - 96 publications, 1.06%
SAGE - 76 publications, 0.84%
Public Library of Science (PLoS) - 36 publications, 0.4%
World Scientific - 28 publications, 0.31%
AIP Publishing - 27 publications, 0.3%
Tech Science Press - 25 publications, 0.28%
Hans Publishers - 25 publications, 0.28%
American Society of Civil Engineers (ASCE) - 23 publications, 0.25%
PeerJ - 22 publications, 0.24%
Optica Publishing Group - 21 publications, 0.23%
IGI Global - 19 publications, 0.21%
Oxford University Press - 18 publications, 0.2%
EDP Sciences - 15 publications, 0.17%
Cambridge University Press - 12 publications, 0.13%
IOS Press - 11 publications, 0.12%
Cold Spring Harbor Laboratory - 11 publications, 0.12%
Emerald - 10 publications, 0.11%
Institute of Electronics, Information and Communications Engineers (IEICE) - 10 publications, 0.11%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics
9.1k
Cite this
GOST
He K. et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015. Vol. 37. No. 9. pp. 1904-1916.
GOST (all authors)
He K., Zhang X., Ren S., Sun J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015. Vol. 37. No. 9. pp. 1904-1916.
RIS
TY  - JOUR
DO  - 10.1109/TPAMI.2015.2389824
UR  - https://doi.org/10.1109/TPAMI.2015.2389824
TI  - Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
T2  - IEEE Transactions on Pattern Analysis and Machine Intelligence
AU  - He, Kaiming
AU  - Zhang, Xiangyu
AU  - Ren, Shaoqing
AU  - Sun, Jian
PY  - 2015
DA  - 2015/09/01
PB  - Institute of Electrical and Electronics Engineers (IEEE)
SP  - 1904
EP  - 1916
IS  - 9
VL  - 37
PMID  - 26353135
SN  - 0162-8828
SN  - 2160-9292
SN  - 1939-3539
ER  -
BibTeX
@article{2015_He,
author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
title = {Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2015},
volume = {37},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
month = {sep},
url = {https://doi.org/10.1109/TPAMI.2015.2389824},
number = {9},
pages = {1904--1916},
doi = {10.1109/TPAMI.2015.2389824}
}
MLA
He, Kaiming, et al. “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, Sep. 2015, pp. 1904-1916. https://doi.org/10.1109/TPAMI.2015.2389824.