Voice Conversion Algorithm Using Codebook Mapping by Weight Proportion Assignment
  • Authors: BAO Jingyi; XU Ning
  • Affiliations: School of Electrical and Information Engineering, Changzhou Institute of Technology; College of IoT Engineering, Hohai University
  • Keywords: voice conversion; codebook mapping; threshold; proportion assignment; efficient mapping
  • Journal: Journal of Changzhou Institute of Technology (常州工学院学报)
  • Journal code: CZGB
  • Publication date: 2019-04-28
  • Year: 2019
  • Volume/Issue: Vol. 32; No. 156
  • Funding: National Natural Science Foundation of China project (61401148)
  • Language: Chinese
  • CNKI record number: CZGB201902002
  • Pages: 13-18 (6 pages)
  • CN: 32-1598/T
Abstract
To address the low similarity between the converted speech and the target speech in traditional codebook-mapping-based voice conversion, a new codebook-mapping voice conversion method based on weight proportion assignment is proposed. In the training stage, the codebook weight matrix is sparsified by applying different weight thresholds, the influence on the converted speech of the codebook weights that exceed the threshold is analyzed, and, exploiting the fact that weights of different magnitude contribute to the converted speech in different proportions, a mapping relationship between the source and target codebook weights is trained. In the conversion stage, this mapping relationship is used to map the source codebook weights to target codebook weights, which are then combined with the target codebook to synthesize the converted speech. The proposed algorithm is compared with the traditional algorithm in simulation experiments with both subjective and objective evaluations. The results show that the codebook-mapping voice conversion algorithm based on weight proportion assignment not only effectively improves the quality of the converted speech but also greatly increases the similarity of the speaker characteristics between the converted and the target speech.
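The training and conversion procedure described in the abstract can be illustrated with a short sketch. The following Python/NumPy code is only a hypothetical illustration under stated assumptions, not the authors' implementation: it assumes the source and target codebooks have already been learned from parallel, time-aligned spectra, estimates non-negative codebook weights with standard NMF-style multiplicative updates, sparsifies them with a threshold while preserving the proportions of the surviving weights, and fits a simple regularized least-squares mapping between source and target weight matrices; the function names, the NMF-style encoder, and the least-squares mapping are all assumptions made for the sake of the example.

import numpy as np

def encode_weights(frames, codebook, n_iter=200, eps=1e-8):
    # Estimate non-negative weights H so that frames ≈ codebook @ H
    # (standard NMF multiplicative updates with the codebook held fixed).
    # frames:   (n_features, n_frames), non-negative spectra
    # codebook: (n_features, n_atoms), codebook entries as columns
    rng = np.random.default_rng(0)
    H = rng.random((codebook.shape[1], frames.shape[1]))
    for _ in range(n_iter):
        H *= (codebook.T @ frames) / (codebook.T @ codebook @ H + eps)
    return H

def sparsify(H, threshold):
    # Zero out weights below the threshold, then renormalize each frame so the
    # surviving weights share the mass in proportion to their magnitude
    # (one reading of "weight proportion assignment"; an assumption here).
    H = np.where(H >= threshold, H, 0.0)
    return H / np.maximum(H.sum(axis=0, keepdims=True), 1e-8)

def train_weight_mapping(H_src, H_tgt, reg=1e-6):
    # Fit a linear map M with H_tgt ≈ M @ H_src on parallel, time-aligned
    # training frames (ridge-regularized least squares, an assumption here).
    K = H_src.shape[0]
    return H_tgt @ H_src.T @ np.linalg.inv(H_src @ H_src.T + reg * np.eye(K))

def convert(frames_src, codebook_src, codebook_tgt, M, threshold):
    # Conversion stage: encode the source frames, map the weights, and
    # synthesize converted spectral frames with the target codebook.
    H_src = sparsify(encode_weights(frames_src, codebook_src), threshold)
    H_tgt = np.maximum(M @ H_src, 0.0)   # keep mapped weights non-negative
    return codebook_tgt @ H_tgt

In a complete system the spectral frames would typically come from, and be resynthesized by, a vocoder (for example STRAIGHT or WORLD, not specified by the abstract), and the threshold would be chosen by the kind of training-stage analysis the paper describes.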
References
    [1] D'SOUZA Kevin, TALELE K T V. Voice conversion using Gaussian mixture models[C]//2015 IEEE International Conference on Communication, Information & Computing Technology (ICCICT). Mumbai, 2015: 1-5.
    [2] ABE M, NAKAMURA S, SHIKANO K, et al. Voice conversion through vector quantization[C]//ICASSP-88, IEEE International Conference on Acoustics, Speech, and Signal Processing. New York, 1988: 655-658.
    [3] ZHAO Guanlong, GUTIERREZ-OSUNA Ricardo. Exemplar selection methods in voice conversion[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, 2017: 5525-5529.
    [4] AIHARA Ryo, TAKIGUCHI Tetsuya, ARIKI Yasuo. Parallel dictionary learning for voice conversion using discriminative graph-embedded non-negative matrix factorization[C]//Interspeech 2016. San Francisco, 2016: 292-296.
    [5] MOHAMMADI Seyed Hamidreza, KAIN Alexander. An overview of voice conversion systems[J]. Speech Communication, 2017, 88: 65-82.
    [6] MING Huaiping, HUANG Dongyan, XIE Lei, et al. Exemplar-based sparse representation of timbre and prosody for voice conversion[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, 2016.
    [7] XU Ning, SHAO Xi, YANG Zhen. A novel voice morphing system using bi-GMM for high quality transformation[C]//2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing. Phuket, 2008: 485-489.
    [8] PERCYBROOKS Winston, MOORE Elliot, MCMILLAN Correy. Phoneme independent HMM voice conversion[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada, 2013: 6925-6929.
    [9] NAKASHIKA Toru, TAKIGUCHI Tetsuya, ARIKI Yasuo. Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(3): 580-587.
    [10] ARSLAN L M. Speaker transformation algorithm using segmental codebooks (STASC)[J]. Speech Communication, 1999, 28(3): 211-226.
    [11] AIHARA Ryo, NAKASHIKA Toru, TAKIGUCHI Tetsuya, et al. Voice conversion based on non-negative matrix factorization using phoneme-categorized dictionary[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Florence, 2014: 7894-7898.
    [12] TURK Oytun, ARSLAN Levent M. Robust processing techniques for voice conversion[J]. Computer Speech & Language, 2006, 20(4): 441-467.