Problem Formulation

Given a spectrogram \(X\in \mathbb{R}^{F \times T}\) with \(F\) frequency bins and \(T\) time frames, we want to obtain a posteriorgram \(Y_{\text{post}}\in [0,1]^{88\times T}\), with one row per piano key, from a neural network model \(Y_{\text{post}}=f(X)\). The piano roll \(Y\in \{0,1\}^{88\times T}\) is then obtained by applying a hard threshold to \(Y_{\text{post}}\).
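
The thresholding step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `posteriorgram_to_piano_roll`, the threshold value of 0.5, and the use of `>=` rather than `>` are choices made here, not something the formulation above prescribes, and random values stand in for the model output \(f(X)\).

```python
import numpy as np


def posteriorgram_to_piano_roll(y_post: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a posteriorgram of shape (88, T) into a piano roll in {0, 1}.

    The name and the default threshold of 0.5 are assumptions for illustration.
    """
    if y_post.ndim != 2 or y_post.shape[0] != 88:
        raise ValueError("expected a posteriorgram of shape (88, T)")
    # Hard threshold: an entry becomes 1 when its value reaches the threshold.
    return (y_post >= threshold).astype(np.uint8)


# Stand-in for the model output f(X): random values in [0, 1) with T = 5 frames.
rng = np.random.default_rng(0)
y_post = rng.random((88, 5))

piano_roll = posteriorgram_to_piano_roll(y_post)
print(piano_roll.shape)  # (88, 5)
```

In practice the threshold is a tunable value: raising it tends to produce fewer active notes, lowering it tends to produce more.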

../../../_images/amt_formulation.png

Fig. 1 The goal of AMT is to transform audio data into symbolic representations.