Image captioning is the process of automatically describing an image with one or more natural language sentences. In recent years, image captioning has progressed rapidly, from early template-based models to current models based on deep neural networks. This paper gives an overview of open issues and recent research in image captioning, with particular emphasis on models that use the deep encoder-decoder architecture. It discusses the advantages and disadvantages of the different approaches and reviews some of the most commonly used evaluation metrics and datasets.
Keywords – image captioning, encoder-decoder, attention mechanism, deep neural networks