International Journal of Information Technology

Combining audio and visual speech recognition using LSTM and deep convolutional neural network

R Shashidhar ¹

S Patilkulkarni ¹

S B Puneeth ²


¹ Department of Electronics and Communication Engineering, JSS Science and Technology University, Mysore, India


² Department of Electronics and Communication Engineering, Presidency University, Bangalore, India

Publication type: Journal Article

Publication date: 2022-02-24

Publisher: Springer Nature

Journal: International Journal of Information Technology

SCImago quartile: Q2

SJR: 0.717

CiteScore: 8.5

ISSN: 2511-2104 (Print), 2511-2112 (Electronic)

DOI: 10.1007/s41870-022-00907-y


Subject areas: Computer Science Applications; Electrical and Electronic Engineering; Computational Theory and Mathematics; Information Systems; Computer Networks and Communications; Artificial Intelligence; Applied Mathematics

Abstract

Human speech is bimodal: audio speech refers to the speaker's acoustic waveform, while lip motions constitute visual speech. Audiovisual speech recognition (AVSR) is an emerging field of research, particularly valuable when the audio is corrupted by noise. For the proposed AVSR system, a custom dataset was designed for the English language. Mel-frequency cepstral coefficients (MFCCs) were used for audio processing, and a Long Short-Term Memory (LSTM) network was used for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The system achieved accuracies of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, results better than those of existing approaches. Ultimately, the model was able to make many correct decisions when predicting the spoken word for the dataset used.
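The abstract names MFCCs as the audio front end but does not state the extraction parameters. The following is a minimal NumPy sketch of standard MFCC extraction (pre-emphasis, Hamming-windowed framing, power spectrum, mel filterbank, log compression, DCT), assuming common textbook defaults that are not taken from the paper: 16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point FFT, 26 mel filters and 13 coefficients. The function names `mfcc`, `mel_filterbank` and `dct_ii` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the transform is orthonormal
    return x @ basis.T


# Assumed defaults, not from the paper
def mfcc(signal, sr=16000, n_mfcc=13, n_filters=26, frame_ms=25, hop_ms=10, n_fft=512):
    """Return an (n_frames, n_mfcc) array of MFCCs for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97 * x[n-1]
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Slice into overlapping frames and apply a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 3. Periodogram estimate of the power spectrum (frames zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, log-compressed (the floor avoids log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. DCT decorrelates the log energies; keep the first n_mfcc coefficients
    return dct_ii(log_energies, n_mfcc)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic 440 Hz tone
    print(mfcc(tone, sr=sr).shape)  # (98, 13)
```

For a 1 s clip at 16 kHz these settings yield a 98 × 13 feature matrix, one row per frame, which is the kind of sequence a downstream classifier would consume. Libraries such as librosa offer `librosa.feature.mfcc`, which follows the same idea with different defaults (for example, 20 coefficients).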




Cite this

GOST

Shashidhar R., Patilkulkarni S., Puneeth S. B. Combining audio and visual speech recognition using LSTM and deep convolutional neural network // International Journal of Information Technology. 2022.

RIS

TY  - JOUR
DO  - 10.1007/s41870-022-00907-y
UR  - https://doi.org/10.1007/s41870-022-00907-y
TI  - Combining audio and visual speech recognition using LSTM and deep convolutional neural network
T2  - International Journal of Information Technology
AU  - Shashidhar, R
AU  - Patilkulkarni, S
AU  - Puneeth, S B
PY  - 2022
DA  - 2022/02/24
PB  - Springer Nature
SN  - 2511-2104
SN  - 2511-2112
ER  -

BibTeX

@article{2022_Shashidhar,
  author = {R Shashidhar and S Patilkulkarni and S B Puneeth},
  title = {Combining audio and visual speech recognition using {LSTM} and deep convolutional neural network},
  journal = {International Journal of Information Technology},
  year = {2022},
  publisher = {Springer Nature},
  month = feb,
  url = {https://doi.org/10.1007/s41870-022-00907-y},
  doi = {10.1007/s41870-022-00907-y}
}
