nnAudio
nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on-the-fly during neural network training, and the Fourier kernels (e.g. CQT kernels) can be trained. Kapre follows a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but it is built on Keras.
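To illustrate what "Fourier kernels as convolution weights" means, here is a minimal PyTorch sketch. It is not nnAudio's implementation or API: the class name `ConvSTFT` and its parameters (`n_fft`, `hop_length`, `trainable`) are hypothetical. The sketch assumes a Hann window, a one-sided linear-frequency magnitude STFT with no padding, and PyTorch 1.7 or later (for `return_complex`). Each Conv1d output channel is one windowed cosine or sine basis function, so the layer produces the same result as `torch.stft` while keeping the basis as ordinary weights.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvSTFT(nn.Module):
    """Hypothetical sketch: magnitude STFT computed as a strided 1D convolution.

    Each output channel is one windowed Fourier basis function, so the
    kernels are ordinary Conv1d weights that can optionally be trained.
    """

    def __init__(self, n_fft=512, hop_length=128, trainable=False):
        super().__init__()
        self.hop_length = hop_length
        n_bins = n_fft // 2 + 1  # one-sided spectrum for real input
        n = torch.arange(n_fft)
        k = torch.arange(n_bins).unsqueeze(1)
        # Reduce k*n modulo n_fft before scaling so the angles stay precise.
        angle = 2 * math.pi * ((k * n) % n_fft).double() / n_fft
        window = torch.hann_window(n_fft, dtype=torch.float64)
        # Conv1d weights have shape (out_channels, in_channels, kernel_size).
        wcos = (torch.cos(angle) * window).float().unsqueeze(1)
        wsin = (torch.sin(angle) * window).float().unsqueeze(1)
        # requires_grad decides whether backprop can update the Fourier basis.
        self.wcos = nn.Parameter(wcos, requires_grad=trainable)
        self.wsin = nn.Parameter(wsin, requires_grad=trainable)

    def forward(self, x):
        # x: (batch, samples) -> (batch, 1, samples), as Conv1d expects
        x = x.unsqueeze(1)
        real = F.conv1d(x, self.wcos, stride=self.hop_length)
        imag = F.conv1d(x, self.wsin, stride=self.hop_length)
        # Magnitude spectrogram with shape (batch, n_bins, n_frames)
        return torch.sqrt(real ** 2 + imag ** 2)


# Usage: with fixed kernels the layer matches torch.stft without centering.
torch.manual_seed(0)
audio = torch.randn(2, 4096)
layer = ConvSTFT(n_fft=512, hop_length=128)
conv_spec = layer(audio)
ref_spec = torch.stft(
    audio, n_fft=512, hop_length=128, window=torch.hann_window(512),
    center=False, return_complex=True,
).abs()
print(conv_spec.shape)  # torch.Size([2, 257, 29])
```

Passing `trainable=True` registers the kernels as learnable parameters, so gradients from a downstream loss reach the Fourier basis itself; a fixed transform such as `torch.stft` offers no such weights. Keeping separate cosine and sine kernels lets the whole transform run as real-valued Conv1d operations.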
Other GPU audio processing tools include torchaudio and tf.signal. However, they do not use the neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows due to sox. nnAudio is more portable across operating systems because it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
Installation
pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation
or
pip install nnAudio==0.3.0
Documentation
Comparison with other libraries
| Feature | | | | | | | |
|---|---|---|---|---|---|---|---|
| Trainable | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ |
| Differentiable | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Linear frequency STFT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Logarithmic frequency STFT | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Inverse STFT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Griffin-Lim | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| Mel | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
| MFCC | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| CQT | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Gammatone | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| CFP¹ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| GPU support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
✅: Fully supported · ☑️: In development (only available in the dev version) · ❌: Not supported
¹ Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music