nnAudio2 2.0.0

Welcome to nnAudio2 2.0.0. A big shout-out to Miguel Pérez, who made this update possible. Please feel free to check out his GitHub repositories too.

This version restructures the code to make it more modular and Pythonic. In terms of functionality, everything remains the same. In future releases, nnAudio2.Spectrogram will be replaced by nnAudio2.features (see also features()).

VQT() is finally available in version 2.0.0 thanks to Hao Hao Tan!

Reminder: if you use nnAudio2, please cite the paper describing its release.

Quick Start

from nnAudio2 import features
from scipy.io import wavfile
import torch
sr, song = wavfile.read('./Bach.wav')  # Load your audio (sample rate, samples)
x = song.mean(1)  # Convert stereo to mono by averaging the channels (assumes a 2-D stereo array)
x = torch.tensor(x, device='cuda:0').float()  # Cast the array into a float32 PyTorch tensor on the GPU

spec_layer = features.STFT(n_fft=2048, freq_bins=None, hop_length=512,
                           window='hann', freq_scale='linear', center=True, pad_mode='reflect',
                           fmin=50, fmax=11025, sr=sr)  # Initialize the model
spec_layer = spec_layer.to(x.device)  # Keep the layer's kernels on the same device as the input

spec = spec_layer(x)  # Feed-forward your waveform to get the spectrogram

For inverse STFT, use the standard uniform-bin configuration with freq_scale='no'. The non-uniform linear, log, and log2 frequency scales should be treated as analysis-only.
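One way to see why the uniform-bin configuration is the one recommended for inversion: bins spaced evenly at multiples of sr / n_fft form a complete DFT basis, so a frame can be recovered exactly from its coefficients. Bins placed at other frequencies do not, in general, form such a basis. The sketch below is a plain-Python illustration of that round trip, not nnAudio2's implementation; the helper names uniform_dft and inverse_uniform_dft are hypothetical, and windowing and overlap-add are left out.

```python
import cmath
import math

def uniform_dft(frame):
    """One-sided DFT of a real frame, with bins k = 0 .. n/2 at uniform spacing."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def inverse_uniform_dft(bins, n):
    """Rebuild a real frame of even length n from its one-sided uniform DFT."""
    frame = []
    for t in range(n):
        total = bins[0].real  # DC term
        for k in range(1, n // 2):
            # Each interior bin stands in for itself and its mirrored conjugate.
            total += 2 * (bins[k] * cmath.exp(2j * math.pi * k * t / n)).real
        total += bins[n // 2].real * math.cos(math.pi * t)  # Nyquist term
        frame.append(total / n)
    return frame

frame = [0.0, 0.5, -0.25, 1.0, 0.75, -1.0, 0.3, 0.1]
restored = inverse_uniform_dft(uniform_dft(frame), len(frame))
max_error = max(abs(a - b) for a, b in zip(frame, restored))
```

Here max_error stays at floating-point rounding level, which is the property the inverse transform relies on.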

nnAudio2 is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (or CQT kernels) can be trained. Kapre follows a similar concept, using 1D convolutional neural networks to extract spectrograms, but it is based on Keras.
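The idea of computing a spectrogram with convolution kernels can be sketched in plain Python. This is a minimal illustration under simplifying assumptions (no window function, no padding, fixed kernels), not nnAudio2's actual code; the function names fourier_kernels and conv_spectrogram are hypothetical. Each frequency bin gets a cosine and a sine kernel, and sliding them across the waveform with a stride of hop_length is what a strided 1D convolution layer does. In a neural network, these kernel values would simply be the layer's weights, which is why they can be trained.

```python
import math

def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin 0 .. n_fft/2."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k

def conv_spectrogram(x, n_fft, hop_length):
    """Magnitude spectrogram: slide every kernel over x with stride hop_length."""
    cos_k, sin_k = fourier_kernels(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop_length):
        segment = x[start:start + n_fft]
        frame = []
        for c, s in zip(cos_k, sin_k):
            real = sum(v * w for v, w in zip(segment, c))
            imag = sum(v * w for v, w in zip(segment, s))
            frame.append(math.hypot(real, imag))  # magnitude of the bin
        frames.append(frame)
    return frames  # indexed as [time_step][freq_bin]

sr = 800          # sample rate in Hz (illustrative value)
n_fft = 64
tone_hz = 100.0   # falls exactly on bin tone_hz * n_fft / sr = 8
x = [math.sin(2 * math.pi * tone_hz * n / sr) for n in range(256)]

spec = conv_spectrogram(x, n_fft=n_fft, hop_length=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

For this test tone, the energy of the first frame concentrates in bin 8, matching the bin predicted from tone_hz, n_fft, and sr.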

Other GPU audio processing tools include torchaudio and tf.signal. However, they do not use the neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of its dependency on sox. nnAudio2 is more compatible across operating systems since it relies mostly on PyTorch convolutional neural networks. The name nnAudio2 comes from torch.nn.

The implementation details of nnAudio2 have also been published in IEEE Access; readers who are interested can refer to the paper.

The source code for nnAudio2 can be found on GitHub.

Citation
