Combining audio and visual speech recognition using LSTM and deep convolutional neural network
Publication type: Journal Article
Publication date: 2022-02-24
SCImago Q2
SJR: 0.717
CiteScore: 8.5
Impact factor: —
ISSN: 2511-2104, 2511-2112
Computer Science Applications
Electrical and Electronic Engineering
Computational Theory and Mathematics
Information Systems
Computer Networks and Communications
Artificial Intelligence
Applied Mathematics
Abstract
Human speech is bimodal: audio speech is the speaker's acoustic waveform, and visual speech is the accompanying lip motion. Audiovisual Speech Recognition (AVSR) is an emerging field of research, particularly for cases where the audio is corrupted by noise. For the proposed AVSR system, a custom dataset was designed for the English language. Mel Frequency Cepstral Coefficients (MFCC) were used for audio processing and the Long Short-Term Memory (LSTM) method for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The results showed an accuracy of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, which was better than the existing approaches. Ultimately, the model was able to make suitable decisions when predicting the spoken word for the dataset that was used.
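The abstract does not say how the audio and visual streams are combined, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation followed by a single dense softmax layer. The function names, feature sizes, class count, and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(audio_feat, visual_feat, weights, biases):
    """Concatenate both feature vectors, apply one dense layer,
    and return a probability for each word class."""
    fused = list(audio_feat) + list(visual_feat)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical sizes (assumptions, not from the paper):
# 13 audio features (e.g. MFCCs), 8 visual features, 5 word classes.
rng = random.Random(0)
audio_feat = [rng.uniform(-1, 1) for _ in range(13)]
visual_feat = [rng.uniform(-1, 1) for _ in range(8)]
num_classes = 5

# Untrained placeholder weights; a real system would learn these.
weights = [[rng.gauss(0, 0.1) for _ in range(13 + 8)] for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = fuse_and_classify(audio_feat, visual_feat, weights, biases)
predicted_word = max(range(num_classes), key=lambda i: probs[i])
```

Because the weights here are random, the predicted class is meaningless; the sketch only shows how two modality-specific feature vectors can feed one classifier, which is why a fused model can still use the visual stream when the audio is noisy.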
Top-30
Journals
- International Journal of Information Technology: 12 publications, 25.53%
- Neural Computing and Applications: 2 publications, 4.26%
- Expert Systems with Applications: 2 publications, 4.26%
- IEEE Access: 2 publications, 4.26%
- Cognitive Computation: 1 publication, 2.13%
- Frontiers in Earth Science: 1 publication, 2.13%
- Computer Vision and Image Understanding: 1 publication, 2.13%
- Lecture Notes in Electrical Engineering: 1 publication, 2.13%
- IETE Journal of Research: 1 publication, 2.13%
- Data Science and Management: 1 publication, 2.13%
- Arabian Journal for Science and Engineering: 1 publication, 2.13%
- Vibrational Spectroscopy: 1 publication, 2.13%
- ETRI Journal: 1 publication, 2.13%
- Computers and Geosciences: 1 publication, 2.13%
- MethodsX: 1 publication, 2.13%
- IEEE Transactions on Geoscience and Remote Sensing: 1 publication, 2.13%
- Biomedical Signal Processing and Control: 1 publication, 2.13%
- IEEE Transactions on Mobile Computing: 1 publication, 2.13%
- Big Data and Cognitive Computing: 1 publication, 2.13%
- Smart Sensors, Measurement and Instrumentation: 1 publication, 2.13%
- Mathematical Methods in the Applied Sciences: 1 publication, 2.13%
- International Journal of Machine Learning and Cybernetics: 1 publication, 2.13%
Publishers
- Springer Nature: 19 publications, 40.43%
- Institute of Electrical and Electronics Engineers (IEEE): 14 publications, 29.79%
- Elsevier: 8 publications, 17.02%
- Wiley: 2 publications, 4.26%
- Frontiers Media S.A.: 1 publication, 2.13%
- Taylor & Francis: 1 publication, 2.13%
- MDPI: 1 publication, 2.13%
- Cold Spring Harbor Laboratory: 1 publication, 2.13%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.
Metrics
Total citations: 47
Citations from 2024: 30 (63.83%)
Cite this
GOST
Shashidhar R., Patilkulkarni S., Puneeth S. B. Combining audio and visual speech recognition using LSTM and deep convolutional neural network // International Journal of Information Technology. 2022.
RIS
TY - JOUR
DO - 10.1007/s41870-022-00907-y
UR - https://doi.org/10.1007/s41870-022-00907-y
TI - Combining audio and visual speech recognition using LSTM and deep convolutional neural network
T2 - International Journal of Information Technology
AU - Shashidhar, R
AU - Patilkulkarni, S
AU - Puneeth, S B
PY - 2022
DA - 2022/02/24
PB - Springer Nature
SN - 2511-2104
SN - 2511-2112
ER -
BibTex (up to 50 authors)
@article{2022_Shashidhar,
author = {R Shashidhar and S Patilkulkarni and S B Puneeth},
title = {Combining audio and visual speech recognition using LSTM and deep convolutional neural network},
journal = {International Journal of Information Technology},
year = {2022},
publisher = {Springer Nature},
month = {feb},
url = {https://doi.org/10.1007/s41870-022-00907-y},
doi = {10.1007/s41870-022-00907-y}
}