Volume- 10
Issue- 6
Year- 2023
DOI: 10.55524/ijirem.2023.10.6.8 | DOI URL: https://doi.org/10.55524/ijirem.2023.10.6.8
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
Indrani Vasireddy, G. HimaBindu, Ratnamala B.
This paper presents an image caption generator that combines Vision Transformers (ViT) with GPT-2. Drawing on both computer vision and natural language processing (NLP), the system uses ViT to extract salient image features and GPT-2 to generate contextual, human-like descriptions from them. It offers an intuitive interface through which users upload an image and receive a coherent caption. The technology holds considerable potential for the visually impaired community, improving the accessibility of image-based content and the overall user experience.
The primary objective of this work is to develop a system that automatically generates descriptive, coherent textual captions for images. This involves integrating computer vision and NLP techniques so that the system can analyze the content of an image and produce a relevant, meaningful description. The broader goal is to improve the accessibility of visual content, enhance image search capabilities, and facilitate applications such as automated content tagging. The work also addresses the needs of visually impaired individuals by providing assistive technology that interprets and communicates image content effectively.
This paper exemplifies the symbiotic relationship between computer vision and NLP, illustrating how their integration can pave the way for transformative AI applications. The resulting synergy not only contributes to the development of advanced image captioning systems but also opens avenues for innovative applications across diverse domains. The conference presentation will delve into the technical aspects of our approach, showcasing the significance of this integration and its potential impact on the future of AI applications.
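The encoder-decoder flow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `encode` and `toy_next_token` are hypothetical stand-ins for the ViT encoder and the GPT-2 decoder, and the 16-pixel patch size is an assumption borrowed from common ViT configurations, since the abstract does not name a variant. The sketch shows only how an image becomes a sequence of patches and how a caption is produced one token at a time, conditioned on the image features.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Cut an H x W image into non-overlapping patch vectors, ViT style."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def encode(patches: List[List[float]]) -> List[float]:
    """Stand-in for the ViT encoder: one mean-intensity feature per patch."""
    return [sum(p) / len(p) for p in patches]


def toy_next_token(features: List[float], tokens: List[str]) -> str:
    """Hypothetical decoder step: picks a word from overall brightness."""
    tone = "bright" if sum(features) / len(features) > 0.5 else "dark"
    script = ["a", tone, "image", "<eos>"]
    return script[len(tokens) - 1]


def greedy_decode(features: List[float],
                  next_token: Callable[[List[float], List[str]], str],
                  max_len: int = 10) -> List[str]:
    """Autoregressive loop: each step sees the image features and tokens so far."""
    tokens = ["<bos>"]
    for _ in range(max_len):
        token = next_token(features, tokens)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens[1:]  # drop the start marker


# Usage: a 224 x 224 all-white image with 16 x 16 patches.
image = [[1.0] * 224 for _ in range(224)]
patches = split_into_patches(image, 16)
caption = greedy_decode(encode(patches), toy_next_token)
print(len(patches), " ".join(caption))
```

In a real system the per-patch features would be learned embeddings and the decoder step would be a language model attending to them; the loop structure stays the same.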
Associate Professor, Department of Computer Science and Engineering, Geethanjali College of Engineering, Hyderabad, India