An Efficient Optimal Reconstruction Based Speech Separation Based on Hybrid Deep Learning Technique

Received: 11 Feb 2022, Revised: 13 Feb 2022, Accepted: 26 Mar 2022, Available online: 29 Mar 2022, Version of Record: 29 Mar 2022

Yannam Vasantha Koteswararao; C.B. Rama Rao

Abstract


Conventional single-channel speech separation has two long-standing issues. The first, over-smoothing, is addressed by using estimated signals to expand the training data set. The second, incomplete separation, is addressed by having a DNN generate prior knowledge, which also mitigates speech distortion. To overcome these issues, we propose an efficient optimal reconstruction-based speech separation (ERSS) method built on a hybrid deep learning technique. First, we propose an integral fox ride optimization (IFRO) algorithm for spectral structure reconstruction that uses multiple spectral features: time-dynamic information, binaural features, and mono features. Second, we introduce a hybrid retrieval-based deep neural network (RDNN) that directly reconstructs the magnitude spectrograms of speech and noise. The input signals are processed with the Short-Time Fourier Transform (STFT), which converts a clean input signal into spectrograms; the IFRO-based feature extraction technique then extracts features from these spectrograms. The extracted features are classified by the RDNN, and the classified features are passed through a softmax layer. Finally, the inverse STFT (ISTFT) is applied to the softmax output to separate the speech signals. Experiments show that the proposed method achieves the highest gains in SDR, SIR, SAR, STOI, and PESQ, at 10.9, 15.3, 10.8, 0.08, and 0.58, respectively, compared with 9.6, 13.4, 10.4, 0.07, and 0.50 for the Joint-DNN-SNMF method. The proposed method is also compared with other methods and previous work, and it yields better results than previous research.
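The processing chain described in the abstract (STFT, a network producing per-source scores, softmax, ISTFT) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' IFRO/RDNN implementation: the network is replaced by a hypothetical `logits_fn`, reading "converted to softmax" as a softmax taken across sources (so the resulting time-frequency masks sum to one) is an assumption, and the masks are applied to the complex mixture STFT before inversion with SciPy's `scipy.signal.stft` and `scipy.signal.istft`. The 16 kHz sampling rate, 512-sample window, and toy frequency-split "network" (`toy_logits`) are illustrative choices only.

```python
import numpy as np
from scipy.signal import istft, stft


def softmax(logits, axis=0):
    # Numerically stable softmax across the source axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def separate(mixture, logits_fn, fs=16000, nperseg=512):
    """Split a mono mixture into sources using softmax time-frequency masks.

    logits_fn is a hypothetical stand-in for a trained network (such as the
    paper's RDNN): it maps the magnitude spectrogram, shape (F, T), to
    per-source logits, shape (S, F, T).
    """
    _, _, mix_spec = stft(mixture, fs=fs, nperseg=nperseg)
    logits = logits_fn(np.abs(mix_spec))
    masks = softmax(logits, axis=0)  # masks sum to 1 over sources in every bin
    sources = []
    for mask in masks:
        # Mask the complex STFT, invert it, and trim the ISTFT padding.
        _, est = istft(mask * mix_spec, fs=fs, nperseg=nperseg)
        sources.append(est[: len(mixture)])
    return np.stack(sources)


def toy_logits(mag, cutoff_bin=32):
    # Hypothetical "network": favour source 0 below cutoff_bin, source 1 above.
    bins = np.arange(mag.shape[0])[:, None]
    score = np.where(bins < cutoff_bin, 5.0, -5.0) * np.ones_like(mag)
    return np.stack([score, -score])


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    low = np.sin(2 * np.pi * 300 * t)          # toy source below 1 kHz
    high = 0.5 * np.sin(2 * np.pi * 3000 * t)  # toy source above 1 kHz
    est = separate(low + high, toy_logits, fs=fs)
    print(est.shape)  # (2, 16000)
```

Because the masks sum to one in every time-frequency bin and the ISTFT is linear, the separated estimates add back up to the mixture; a real system would replace `toy_logits` with a trained model.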
Figure: Illustration of the proposed ERSS using a hybrid deep learning technique.

Figure: Speech separation performance on various metrics for existing and proposed techniques.

Figure: gSDR, matched and unmatched noise.

Figure: SAR, matched and unmatched noise.

Figure: gSIR, matched and unmatched noise.
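SDR, SIR, and SAR are normally computed with the BSS Eval method, which first projects the estimate onto the reference (allowing a short distortion filter) before splitting the remaining error into interference and artifact terms. The sketch below is a deliberately simplified stand-in, not the evaluation used in this work: `simple_sdr` computes a plain energy-ratio SDR in dB, and `sdr_gain` reports the improvement of an estimate over the unprocessed mixture. Treating the reported "gains" as improvement over the mixture is an assumption, and this simplified ratio will not reproduce the values above.

```python
import numpy as np


def simple_sdr(reference, estimate):
    """Plain energy-ratio SDR in dB (no BSS Eval projection or filtering)."""
    reference = np.asarray(reference, dtype=float)
    error = np.asarray(estimate, dtype=float) - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def sdr_gain(reference, estimate, mixture):
    """Improvement of the estimate over the unprocessed mixture, in dB."""
    return simple_sdr(reference, estimate) - simple_sdr(reference, mixture)


if __name__ == "__main__":
    s = np.array([1.0, 1.0, 1.0, 1.0])          # toy reference signal
    e = np.array([1.0, -1.0, 1.0, -1.0])        # toy interference
    print(round(simple_sdr(s, s + 0.1 * e), 6))  # 20.0
    print(round(sdr_gain(s, s + 0.1 * e, s + e), 6))  # 20.0
```

For real evaluations, a BSS Eval implementation (for example the `mir_eval.separation` module) is the usual choice over this simplified ratio.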

Indexed in Scopus

https://www.scopus.com/authid/detail.uri?authorId=57221477786

DOI: 10.31763/DSJ.v5i1.1674

Conflict of interest


The authors state no conflict of interest.


Funding Information


This research received no external funding or grants.


Peer review:


Peer review under responsibility of Defence Science Journal.


Ethics approval:


Not applicable.


Consent for publication:


Not applicable.


Acknowledgements:


None.