This is the sample site for the DART paper, accepted at the Audio Imagination workshop of NeurIPS 2024. Below, you can find audio samples from this paper.
For code, please refer to: https://github.com/amaai-lab/DART
These samples were used for evaluating audio quality in the listening test.
Utterance 1: The eastern heavens were equally spectacular.
Utterance 2: Philip did not pursue the subject.
Utterance 3: At the same time spears and arrows began to fall among the invaders.
Utterance 4: Men who endure it call it living death.
Utterance 5: In the crib the baby sat up and began to prattle.
Utterance 6: In the Bohemian Club of San Francisco there are some crack sailors.
Utterance 7: Everything was working smoothly, better than I had expected.
Ground Truth | MLVAE-Tacotron | Fastspeech2-GE2E | Fastspeech2-GST | Fastspeech2-GST-GE2E | DARTscratch | DART without VQ | DART |
Speaker: ABA (Arabic) | Speaker: EBVS (Spanish) | Speaker: HKK (Korean) | Speaker: LXC (Chinese) | Speaker: NCC (Chinese) | Speaker: SVBI (Hindi) | Speaker: THV (Vietnamese) |
In these samples, each speaker's accent was converted to a target accent.
Utterance 1: This piece of cake is so yummy, I can't wait to bake another one.
Utterance 2: Without you, I would not be able to do it.
Utterance 3: I will go inside and tell the truth.
Utterance 4: And you always come to that shop to order the same meal.
Note that the ground truth reference is a different sentence...
Source Ground Truth | DART no conversion (for reference) | MLVAE-Tacotron | Fastspeech2-GST | Fastspeech2-GST-GE2E | DARTscratch | DART without VQ | DART64 | DART128 | DART512 |
Speaker: ABA (Arabic), Accent: Vietnamese | Speaker: NCC (Chinese), Accent: Arabic | Speaker: SVBI (Hindi), Accent: Chinese | Speaker: THV (Vietnamese), Accent: Hindi |