DialogueAgents:

A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

Abstract. Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgent, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents—a script writer, a speech synthesizer, and a dialogue critic—to collaboratively generate dialogues. Grounded in a diverse character pool, the framework iteratively revises scripts and synthesizes speech based on speech review, boosting emotional expressiveness and paralinguistic features of the synthesized dialogues. Using DialogueAgent, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset covering diverse topics. Extensive experiments demonstrate the effectiveness of our framework and the high quality of the MultiTalk dataset.

This page is intended solely for the purpose of research demonstration.

Overview

Chinese Samples

Scripts Framework Synthesis
{
    "id": "0",
    "topic": "讨论家庭聚会的安排",
    "conversation": [
        {
            "speaker": "张志强",
            "text": "大家好,今天我们来聊聊下周的家庭聚会吧。我觉得应该有一个相对详细的安排。",
            "prompt": "Friendly"
        },
        {
            "speaker": "王娟",
            "text": "我同意,但我们也要考虑每个人的时间安排,特别是小雪,她可能有其他计划。",
            "prompt": "Thoughtful"
        },
        {
            "speaker": "李小雪",
            "text": "没错妈妈,其实...[breath]我和小美在昨天已经约好了周末去逛街。要不我们调整一下时间?",
            "prompt": "Hesitant"
        },
        {
            "speaker": "王娟",
            "text": "我觉得可以,那你记得提前和小美沟通一下。",
            "prompt": "Agreeing"
        },
        {
            "speaker": "李小雪",
            "text": "好的妈妈,没问题。",
            "prompt": "Affirmative"
        }
    ]
}
                            
{
	"id": "1",
	"topic": "选择大学专业的讨论",
	"conversation": [
		{
			"speaker": "李小雪",
			"text": "妈妈,我还在纠结选哪个专业。你觉得我应该选什么?",
			"prompt": "Worried"
		},
		{
			"speaker": "王娟",
			"text": "选择你喜欢的,但也要考虑未来的就业前景。[breath]",
			"prompt": "Caring"
		},
		{
			"speaker": "张志强",
			"text": "小雪,选专业是很重要的,一定不要盲目跟随潮流。[laughter]",
			"prompt": "Advisory"
		},
		{
			"speaker": "李小雪",
			"text": "那我该如何权衡呢?[sigh]",
			"prompt": "Confused"
		},
		{
			"speaker": "王娟",
			"text": "找到你的兴趣所在,同时还要结合现实考虑。",
			"prompt": "Encouraging"
		},
		{
			"speaker": "李小雪",
			"text": "好的妈妈,那我再综合这两个方面好好考虑一下。[breath] 谢谢妈妈。",
			"prompt": "Thoughtful"
		}
	]
}											
											
{
	"id": "2",
	"topic": "朋友之间的争议",
	"conversation": [
		{
			"speaker": "李小雪",
			"text": "小美,最近你和王冰之间好像有点不对劲,能给我说说吗?",
			"prompt": "Curious"
		},
		{
			"speaker": "小美",
			"text": "哎,也没什么大事,就是工作上有些分歧。他好像总是倾向于持不同意见。[breath]",
			"prompt": "Frustrated"
		},
		{
			"speaker": "李小雪",
			"text": "你们有尝试过沟通吗?或者你觉得有什么障碍?",
			"prompt": "Concerned"
		},
		{
			"speaker": "小美",
			"text": "当然有尝试,不过他说话好像总是冷冰冰的,让我不太好接受。[breath]",
			"prompt": "Troubled"
		},
		{
			"speaker": "李小雪",
			"text": "或许找个更静下来的时机,你们可以心平气和地谈谈。[breath]",
			"prompt": "Empathetic"
		},
		{
			"speaker": "小美",
			"text": "是的,我希望这次我们能慢慢理解彼此的想法,找到一个双方都能接受的解决方案。[laughter]",
			"prompt": "Hopeful"
		}
	]
}
							

English Samples

Scripts Framework Synthesis
{
	"id": 0,
	"topic": "Discussing the importance of financial literacy for young adults.",
	"conversation": [
		{
			"speaker": "James Carter",
			"text": "You know, financial literacy is so crucial for young adults today. Wouldn't you agree, Mark?",
			"prompt": "Engaging"
		},
		{
			"speaker": "Mark Williams",
			"text": "Absolutely, James! [breath] Understanding budgeting and saving early can really set them up for success in the future. It's so important!",
			"prompt": "Agreeable"
		},
		{
			"speaker": "James Carter",
			"text": "Right! [breath] But how do we make these concepts engaging for them? Maybe we could use practical examples and interactive tools.",
			"prompt": "Curious"
		},
		{
			"speaker": "Mark Williams",
			"text": "We could let them manage a mock portfolio, perhaps. [breath] Real-life simulations could boost their interest and understanding.",
			"prompt": "Innovative"
		},
		{
			"speaker": "James Carter",
			"text": "That's a solid idea, Mark. [laughter] Those hands-on experiences can make all the difference!",
			"prompt": "Encouraged"
		}
	]
}
                            
{
	"id": 1,
	"topic": "Contemplating the impact of art on mental health.",
	"conversation": [
		{
			"speaker": "Michael Bennett",
			"text": "Art can be such a healing process, don’t you think, David?",
			"prompt": "thoughtful"
		},
		{
			"speaker": "David Johnson",
			"text": "Absolutely, Michael! It’s like a therapy session without words. [breath] Ever tried painting to relax?",
			"prompt": "enthusiastic"
		},
		{
			"speaker": "Michael Bennett",
			"text": "Yes, and it’s amazing how colors can express emotions we can’t put into words.",
			"prompt": "reflective"
		},
		{
			"speaker": "David Johnson",
			"text": "Exactly! Plus, it’s a great way to disconnect from the digital world for a bit.",
			"prompt": "agreeable"
		},
		{
			"speaker": "Michael Bennett",
			"text": "True. Everyone should find their creative outlet to improve their mental well-being.",
			"prompt": "supportive"
		}
	]
}											
											
{
	"id": 2,
	"topic": "Debating the role of discipline in education.",
	"conversation": [
		{
			"speaker": "Sarah Miller",
			"text": "Discipline is key in education, but how strict should we be, Emily?",
			"prompt": "Curious"
		},
		{
			"speaker": "Emily Davis",
			"text": "I feel discipline should guide, not restrain. Encouraging self-regulation is vital.",
			"prompt": "Thoughtful"
		},
		{
			"speaker": "Sarah Miller",
			"text": "I see your point. But what about maintaining order in the classroom?",
			"prompt": "Inquisitive"
		},
		{
			"speaker": "Emily Davis",
			"text": "Establishing clear expectations and mutual respect can naturally lead to order.",
			"prompt": "Confident"
		},
		{
			"speaker": "Sarah Miller",
			"text": "Yes, that balance can foster a positive learning environment.",
			"prompt": "Agreeable"
		}
	]
}
							

Ethics Statement

Given the ability of DialogueAgents to synthesize speech while preserving the speaker's identity, potential risks of misuse, such as deceiving voice recognition systems or impersonating specific individuals, may arise. In our experiments, we operate under the assumption that users willingly agree to be the designated speaker for speech synthesis. In the event of the model's application to unknown speakers in real-world scenarios, it is imperative to establish a protocol ensuring explicit consent from speakers for the utilization of their voices. Additionally, implementing a synthetic speech detection model is recommended to mitigate the potential for misuse.