Institute of Electrical and Electronics Engineers (IEEE)

Pages 1522-1529

Integrating Optical Characteristic Recognition with Conversational AI: A Multimodal Chatbot Featuring Speech and Poster Generation

Publication type: Proceedings Article

Publication date: 2025-05-21

DOI: 10.1109/icaiss61471.2025.11041931

Abstract

Artificial Intelligence (AI) has advanced to the point where human-computer interaction feels far more natural and engaging. This paper introduces a multimodal chatbot system that combines Optical Character Recognition (OCR) with Conversational AI to process visual input alongside textual input and generate intelligent, context-aware responses. The proposed system extracts text from images, processes user queries with Natural Language Processing (NLP), and enhances the interaction with speech output through text-to-speech synthesis. Notably, the chatbot does not accept speech as an input modality; instead, it converts its text responses to speech to make the interface more accessible to visually impaired users. The system also includes a poster generation module that visually summarizes conversations and extracted content. The chatbot uses state-of-the-art deep learning models and language frameworks to support real-time processing and grammatical accuracy across different scenarios. Applications range from education and assistive technologies to customer support, all of which benefit from multimodal, voice-enriched, and visually enhanced communication that is inclusive of users.

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
