Sonification

Overview

Neural networks consist of layers, each of which holds features that activate in response to certain patterns in the input. For image-based tasks, networks have been studied using feature visualization, which synthesizes interpretable images that maximally stimulate the response of each feature map individually. Visualization methods help us understand and interpret what networks “see.” In particular, they elucidate the layer-dependent semantic meaning of features, with shallow features representing edges and deep features representing objects. While this approach has been quite effective for vision models, our understanding of networks that process auditory inputs, such as automatic speech recognition models, is more limited because their inputs are not visual.
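The core idea behind feature visualization can be sketched in a few lines: starting from a noise input, perform gradient ascent to maximize a chosen feature's activation. The toy example below uses a single linear "feature" (a dot product with an arbitrary weight vector) so the gradient is analytic; a real implementation would backpropagate through a trained network with an autodiff framework, and all names here are illustrative assumptions, not part of the original article.

```python
import numpy as np

# Toy feature visualization: find the input that maximally activates a
# single linear "feature" (a dot product with weight vector w) by
# gradient ascent on the input itself. A real method does the same
# through a deep network; this is only an illustrative sketch.

rng = np.random.default_rng(0)
w = rng.normal(size=64)            # hypothetical feature weights (assumed)
x = rng.normal(size=64) * 0.01     # start from small random noise

for _ in range(200):
    grad = w                        # d/dx of the activation w . x
    x = x + 0.1 * grad              # ascend the activation
    x = x / np.linalg.norm(x)       # norm constraint keeps x bounded

# The optimized input aligns with the feature's preferred direction.
cosine = float(w @ x / (np.linalg.norm(w) * np.linalg.norm(x)))
```

After optimization, `cosine` is close to 1: the input has converged to the pattern the feature responds to most strongly, which is exactly what an "interpretable image" is for a convolutional feature map.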
