Post-Training Latent Dimension Reduction in Neural Audio Coding
Preprint / Working Paper, Year: 2024


Abstract

This work addresses the problem of latent space quantization in neural audio coding. A covariance analysis of the latent space is performed on several pre-trained audio coding models (Lyra V2, EnCodec, AudioDec). We propose to truncate the latent space dimension using a fixed linear transform: the Karhunen-Loève transform (KLT) is applied to learned residual vector quantization (RVQ) codebooks. The proposed method is applied in a backward-compatible way to EnCodec, and we show that quantization complexity and codebook storage are reduced (by 43.4%), with no noticeable difference in subjective AB tests.
Main file: EUSIPCO_2024_V3.pdf (555.1 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04488929 , version 1 (04-03-2024)

Identifiers

  • HAL Id : hal-04488929 , version 1

Cite

Thomas Muller, Stéphane Ragot, Pierrick Philippe, Pascal Scalart. Post-Training Latent Dimension Reduction in Neural Audio Coding. 2024. ⟨hal-04488929⟩
