This paper presents a deep learning-based image annotation method that segments an image into multiple regions and identifies the target objects in each region. The proposed algorithm first extracts image features with a deep convolutional neural network, then aggregates those features with a spatial pooling layer, and finally classifies the pooled features with a regression layer. Experimental results show that the proposed method performs well on the object detection task.
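
The three stages named above (convolutional feature extraction, spatial pooling, and a regression layer used for classification) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a single convolution-plus-ReLU stage stands in for the deep network, the pooling is assumed to be a fixed 2 x 2 grid of max-pooled cells, the regression layer is assumed to be softmax regression, and all weights are random rather than trained. The segmentation step is not shown, so the sketch takes one already-cropped region as input. Every function name, shape, and parameter here is hypothetical.

```python
import numpy as np


def conv2d_relu(image, kernels):
    """Valid 2-D cross-correlation followed by ReLU.

    image: (H, W); kernels: (K, kh, kw); returns (K, H-kh+1, W-kw+1).
    Assumption: one stage stands in for the paper's deep CNN.
    """
    # windows has shape (H-kh+1, W-kw+1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(image, kernels.shape[1:])
    feats = np.einsum("ijmn,kmn->kij", windows, kernels)
    return np.maximum(feats, 0.0)


def spatial_max_pool(features, grid=2):
    """Max-pool each feature map over a grid x grid partition.

    features: (K, h, w); returns a fixed-length vector of size K * grid * grid,
    regardless of the spatial size of the input region.
    """
    _, h, w = features.shape
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    pooled = [
        features[:, r[0]:r[-1] + 1, c[0]:c[-1] + 1].max(axis=(1, 2))
        for r in rows
        for c in cols
    ]
    return np.concatenate(pooled)


def softmax_regression(x, weights, bias):
    """Class probabilities from a linear layer plus softmax (assumed form)."""
    logits = weights @ x + bias
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def annotate_region(region, kernels, weights, bias):
    """Run one image region through the three stages."""
    feats = conv2d_relu(region, kernels)
    pooled = spatial_max_pool(feats)
    return softmax_regression(pooled, weights, bias)


# Illustrative run with random, untrained parameters.
rng = np.random.default_rng(0)
region = rng.random((8, 8))               # one grayscale region
kernels = rng.standard_normal((4, 3, 3))  # 4 filters of size 3 x 3
weights = rng.standard_normal((3, 16))    # 3 classes, 4 filters * 2 * 2 cells
bias = np.zeros(3)

probs = annotate_region(region, kernels, weights, bias)
label = int(np.argmax(probs))             # index of the most probable class
```

Because the pooling stage always emits `K * grid * grid` values, regions of different sizes map to feature vectors of the same length, which is what lets a single classifier be applied to every region.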