Tutorials

Step-by-step tutorials are available in the tutorials/ folder of the nnAudio2 repository. Each tutorial is a self-contained Jupyter notebook that can be run locally.

Part 1

Computing Mel spectrograms with nnAudio2 — loading audio, initialising the MelSpectrogram layer, and visualising the output.
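As a rough illustration of what a Mel spectrogram layer computes, the sketch below builds a triangular Mel filterbank in pure Python and applies it to a single power-spectrum frame. It is not nnAudio2's implementation or API. The function names (hz_to_mel, mel_to_hz, mel_filterbank, apply_filterbank), the HTK-style Mel formula, and the parameter values are assumptions chosen for illustration. See the Part 1 notebook for the actual MelSpectrogram layer.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style Mel scale (an assumption; other conventions such as Slaney exist)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    fmax = sr / 2.0 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    m_lo, m_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 points equally spaced on the Mel scale: filter edges and centres
    pts = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_mels + 1))
           for i in range(n_mels + 2)]
    # Centre frequency (Hz) of every FFT bin
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    bank = []
    for i in range(n_mels):
        left, centre, right = pts[i], pts[i + 1], pts[i + 2]
        row = []
        for f in bin_freqs:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            # Triangle: the smaller of the two slopes, clipped at zero
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def apply_filterbank(bank, power_frame):
    # One Mel frame = filterbank matrix times one power-spectrum frame
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]

# Illustrative values only: 16 kHz audio, 512-point FFT, 40 Mel bands
bank = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
flat_frame = [1.0] * (512 // 2 + 1)  # a flat power spectrum as dummy input
mel_frame = apply_filterbank(bank, flat_frame)
```

In a trainable front-end, the weights in a filterbank like this one become learnable parameters rather than fixed constants.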

Part 2

Training a linear keyword spotter with trainable basis functions — embedding nnAudio2 inside a LightningModule, enabling trainable_mel and trainable_STFT, and training on Google Speech Commands.

Part 3

Evaluation and visualisation — loading a saved checkpoint, running the test set, and plotting the learned Mel filterbank and STFT kernels.

Part 4

Using more complex non-linear models — swapping the linear classifier for a BC-ResNet while keeping the nnAudio2 front-end unchanged.

Part 5

Fast & differentiable audio features with HuggingFace — benchmarking librosa, torchaudio, and nnAudio2 on MPS/GPU; integrating MelSpectrogram as the first layer of a HuggingFace Trainer-compatible model; enabling trainable_mel=True and visualising filterbank adaptation. The notebook reports a 28% relative accuracy improvement over a fixed Mel baseline on Google Speech Commands v0.02 (35 classes).
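A relative improvement is measured against the baseline's own accuracy, so it differs from a gain in absolute percentage points. The sketch below shows the arithmetic. The helper relative_improvement and both accuracy values are hypothetical, chosen only so the numbers work out cleanly. They are not the figures from the notebook.

```python
def relative_improvement(baseline_acc, new_acc):
    """Relative gain of new_acc over baseline_acc, as a fraction of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc

# Hypothetical accuracies, NOT results from the tutorial
baseline, improved = 0.50, 0.64

rel = relative_improvement(baseline, improved)
abs_points = (improved - baseline) * 100  # gain in percentage points

print(f"relative: {rel:+.0%}, absolute: {abs_points:+.0f} points")
# prints "relative: +28%, absolute: +14 points"
```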

To run the tutorials, install the dependencies listed in tutorials/requirements.txt and open the notebooks in Jupyter:

pip install -r tutorials/requirements.txt
jupyter notebook tutorials/