TR2025-116
Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions
"Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions", Journal of the Audio Engineering Society, August 2025.
@article{Steinmetz2025aug,
  author = {Steinmetz, Christian and Uhle, Christian and Everardo, Flavio and Mitcheltree, Christopher and McElveen, J. Keith and Jot, Jean-Marc and Wichern, Gordon},
  title = {{Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions}},
  journal = {Journal of the Audio Engineering Society},
  year = 2025,
  month = aug,
  url = {https://www.merl.com/publications/TR2025-116}
}
Abstract:
Artificial intelligence (AI) has seen significant advancement in recent years, leading to increasing interest in integrating these techniques to solve both existing and emerging problems in audio engineering. In this paper, we investigate current trends in the application of AI for audio engineering, outlining open problems and applications in the research field. We begin by providing an overview of AI-based algorithm development in the context of audio, discussing problem selection and taxonomy. We then explore human-centric AI challenges and how they relate to audio engineering, including ethics, trustworthiness, explainability, and interaction, emphasizing the need for ethically sound and human-centered AI systems. Subsequently, we examine technical challenges that arise when applying modern AI techniques to audio, including robust generalization, audio quality, high sample rates, and real-time processing with low latency. Finally, we outline applications of AI in audio engineering, covering the development of machine learning-powered audio effects, synthesizers, and automated mixing systems, as well as spatial audio, speech enhancement, dialog separation, and music generation. We emphasize the need for a balanced approach that integrates human-centric concerns with technological advancements, advocating for responsible and effective application of AI.