nnAudio2.features.mel.MFCC
- class nnAudio2.features.mel.MFCC(sr=22050, n_mfcc=20, norm='ortho', verbose=True, ref=1.0, amin=1e-10, top_db=80.0, **kwargs)
Bases: Module
This class calculates the Mel-frequency cepstral coefficients (MFCCs) of the input signal. It first extracts Mel spectrograms from the audio clips, then applies the discrete cosine transform (DCT) to obtain the final MFCCs. Because of this, the Mel spectrogram part can be made trainable using trainable_mel and trainable_STFT. Only the type-II DCT is supported at the moment.
Input signal should be in one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
The correct shape will be inferred automatically if the input follows one of these three shapes. Most of the arguments follow the convention from librosa. This class inherits from nn.Module, so it is used in the same way as nn.Module.
- Parameters:
sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
n_mfcc (int) – The number of Mel-frequency cepstral coefficients.
norm (string) – Normalization for the DCT basis. The default value is ‘ortho’.
**kwargs – Other arguments for Melspectrogram, such as n_fft, n_mels, hop_length, and window.
- Returns:
MFCCs – A tensor of MFCCs with shape (num_samples, n_mfcc, time_steps).
- Return type:
torch.tensor
Examples
>>> from nnAudio2.features.mel import MFCC
>>> spec_layer = MFCC()
>>> mfcc = spec_layer(x)
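The pipeline described above (Mel power values, then log compression, then a type-II DCT) can be sketched for a single frame in plain Python. This is a minimal illustration, not the library's implementation: the helper names power_to_db, dct_ii_ortho, and mfcc_frame are hypothetical, the decibel conversion assumes the librosa convention for ref, amin, and top_db, and top_db is applied per frame here purely for simplicity.

```python
import math


def power_to_db(frame, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert Mel power values to decibels (assumed librosa-style convention)."""
    offset = 10.0 * math.log10(max(amin, ref))
    log_spec = [10.0 * math.log10(max(amin, s)) - offset for s in frame]
    if top_db is not None:
        # Clip everything more than top_db below the peak of this frame.
        floor = max(log_spec) - top_db
        log_spec = [max(v, floor) for v in log_spec]
    return log_spec


def dct_ii_ortho(x):
    """Type-II DCT with orthonormal ('ortho') scaling."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(
            x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)
        )
        # 'ortho' scaling makes the transform matrix orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def mfcc_frame(mel_power_frame, n_mfcc=20, **db_kwargs):
    """Log-compress one Mel frame, apply the DCT, keep the first n_mfcc values."""
    log_mel = power_to_db(mel_power_frame, **db_kwargs)
    return dct_ii_ortho(log_mel)[:n_mfcc]


# Toy frame with 4 Mel bands, keeping 2 coefficients.
coeffs = mfcc_frame([1.0, 0.5, 0.25, 0.125], n_mfcc=2)
```

With ‘ortho’ scaling, a constant input puts all of its energy into the first coefficient, which is a quick sanity check for a DCT implementation.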
Methods
__init__ – Initialize internal Module state, shared by both nn.Module and ScriptModule.
extra_repr – Return the extra representation of the module.
forward – Convert a batch of waveforms to MFCC.
- extra_repr() → str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x)
Convert a batch of waveforms to MFCC.
- Parameters:
x (torch tensor) – Input signal should be in one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
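The shape handling can be illustrated on plain shape tuples. The sketch below assumes that the "right shape" is (num_audio, 1, len_audio); the documentation does not state the internal layout, and infer_batch_shape is a hypothetical helper, not part of nnAudio2.

```python
def infer_batch_shape(shape):
    """Map one of the three accepted input shapes to (num_audio, 1, len_audio)."""
    shape = tuple(shape)
    if len(shape) == 1:
        # (len_audio): a single clip, so add batch and channel axes.
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (num_audio, len_audio): add the channel axis.
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # (num_audio, 1, len_audio): already in the target layout.
        return shape
    raise ValueError(f"Unsupported input shape: {shape}")


single = infer_batch_shape((22050,))
batch = infer_batch_shape((4, 22050))
```

Anything outside the three documented shapes, such as a stereo batch of shape (4, 2, 22050), is rejected in this sketch rather than guessed at.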