Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Abstract

Author(s): Mahmood Hegazy*

Large Language Models (LLMs) excel at natural language generation but often confidently produce incorrect responses, especially on tasks such as mathematical reasoning. Chain-of-thought prompting, self-verification, and multi-agent debate are among the strategies proposed to improve the reasoning and factual accuracy of LLMs. Building on the multi-agent debate framework, we find that debate helps at every model scale and that diversity of thought elicits stronger reasoning in debating LLMs. Across model sizes, performance on mathematical reasoning tasks benefits most when diversely trained models are used. Remarkably, after four rounds of debate, a diverse set of medium-capacity models (Gemini-Pro, Mixtral 8x7B, and PaLM 2-M) outperforms GPT-4 on the GSM-8K benchmark, reaching 91% accuracy. By comparison, three instances of Gemini-Pro reach only 82%. Finally, this diverse set of medium-capacity models achieves a new state-of-the-art result on the ASDiv benchmark (94%). These results underscore the idea that the future of AI is agentic, with diverse cooperating agents yielding emergent capabilities beyond even the most powerful individual models.
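The debate protocol described above can be sketched as a simple round-based loop: each agent first answers independently, then in subsequent rounds revises its answer after seeing the other agents' responses. This is a minimal illustration only; the `query` callback, agent names, and prompt wording are hypothetical placeholders, not the paper's exact implementation.

```python
# Minimal sketch of a multi-agent debate loop. The `query` callback
# (model_name, prompt) -> answer is a hypothetical stand-in for a real
# LLM API call; agent names and prompt phrasing are illustrative.

from typing import Callable, List


def debate(question: str,
           agents: List[str],
           query: Callable[[str, str], str],
           rounds: int = 4) -> List[str]:
    """Run `rounds` of debate among `agents` over `question`.

    Round 1: each agent answers independently.
    Later rounds: each agent sees the other agents' previous answers
    and may revise its own.
    """
    # Initial independent answers.
    answers = [query(name, question) for name in agents]

    for _ in range(rounds - 1):
        new_answers = []
        for i, name in enumerate(agents):
            # Show each agent what the *other* agents said last round.
            others = "\n\n".join(
                f"Agent {j + 1} said:\n{a}"
                for j, a in enumerate(answers) if j != i
            )
            prompt = (
                f"Question: {question}\n\n"
                f"Other agents' answers:\n{others}\n\n"
                "Considering these responses, give your updated answer."
            )
            new_answers.append(query(name, prompt))
        answers = new_answers  # All agents update simultaneously.

    return answers
```

A final answer would then typically be taken by majority vote over the returned list; with diversely trained agents (e.g. three different model families rather than three copies of one model), the vote aggregates genuinely different reasoning paths, which is the effect the abstract attributes the accuracy gains to.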
