effectively has a size of 1×1. A 1×1 convolution kernel operates on each pixel individually, without
considering its neighbors.
Let’s break down the implications of this:
1. Independent Operation on Each Pixel: A 1×1 convolution operates on each pixel independently. For an input with C_in channels, each pixel is represented by a C_in-dimensional vector. The 1×1 convolution transforms this C_in-dimensional vector into a C_out-dimensional vector, where C_out is the number of output channels.
2. MLP for Each Set of Channels: The transformation from C_in to C_out channels at each pixel can be viewed as a fully connected layer (or MLP) applied independently at each spatial location. This is because the weights of the 1×1 convolution kernel can be seen as the weights of a fully connected layer that connects each input channel to each output channel.
3. Network in Network (NiN): The idea of using 1×1 convolutions to implement an MLP for each set of channels led to the Network in Network (NiN) architecture. In NiN, multiple 1×1 convolutional layers are stacked, allowing for deep per-pixel transformations. This introduces additional non-linearity for each pixel without changing the spatial dimensions.
In summary, a 1×1 convolution kernel effectively applies an MLP to each pixel independently, transforming the channels at each spatial location. This concept is central to the Network in Network (NiN) architecture, which uses 1×1 convolutions to introduce additional non-linearities in the network without altering the spatial dimensions.
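A minimal PyTorch sketch (framework assumed; the sizes c_in, c_out, h, w are illustrative) verifying numerically that a 1×1 convolution and a per-pixel fully connected layer with the same weights produce identical outputs:

    import torch
    from torch import nn

    c_in, c_out, h, w = 3, 5, 4, 4
    x = torch.randn(1, c_in, h, w)

    conv = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
    fc = nn.Linear(c_in, c_out, bias=False)
    with torch.no_grad():
        # Copy the 1x1 kernel (shape [c_out, c_in, 1, 1]) into the linear layer.
        fc.weight.copy_(conv.weight.view(c_out, c_in))

    y_conv = conv(x)
    # Apply the linear layer to each pixel's channel vector: move channels
    # last, let nn.Linear act on them, then restore the original layout.
    y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

    print(torch.allclose(y_conv, y_fc))  # True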
5. What happens with convolutions when an object is at the boundary of an image? When an object is at the boundary of an image and we apply convolutions, several issues and considerations arise:
1. Loss of Information: The primary issue is that pixels on the boundary do not have enough neighboring pixels for the convolution kernel to be applied fully. This means that these boundary pixels might not be processed adequately, leading to potential loss of information about objects at the image boundary.
2. Padding: To address the boundary issue, one common approach is to add padding, i.e., extra pixels around the boundary of the image. There are different types of padding:
- Zero Padding: The added pixels have a value of zero.
- Reflect Padding: The boundary pixels are mirrored. For instance, if the last row of an image is [a, b, c], it becomes [a, b, c, c, b, a] when the mirror includes the edge pixel (often called symmetric padding); implementations that exclude the edge pixel, such as PyTorch's reflect mode, give [a, b, c, b, a] instead.
- Replicate Padding: The boundary pixels are repeated. Using the previous example, it becomes [a, b, c, c, c, c].
Padding ensures that every pixel, including those on the boundary, has a full neighborhood over which the convolution kernel can be applied.
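A short sketch of these modes using torch.nn.functional.pad (PyTorch assumed; the numeric row stands in for [a, b, c]):

    import torch
    import torch.nn.functional as F

    row = torch.tensor([[[1., 2., 3.]]])  # shape (batch, channel, width)

    # Zero padding: two zeros on each side.
    print(F.pad(row, (2, 2), mode='constant', value=0))  # [0, 0, 1, 2, 3, 0, 0]
    # Reflect padding: mirrors without repeating the edge pixel.
    print(F.pad(row, (2, 2), mode='reflect'))            # [3, 2, 1, 2, 3, 2, 1]
    # Replicate padding: repeats the edge pixel.
    print(F.pad(row, (2, 2), mode='replicate'))          # [1, 1, 1, 2, 3, 3, 3]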
3. Valid vs. Same Convolutions:
- Valid Convolution: No padding is used. The resulting feature map is smaller than the input image because the kernel is applied only at positions where it fits entirely within the image boundaries.
- Same Convolution: Padding is added so that the output feature map has the same spatial dimensions as the input image.
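For stride 1, an n×n input, a k×k kernel, and padding p per side, the output side length is n - k + 2p + 1. A minimal PyTorch sketch (framework assumed) comparing the two modes:

    import torch
    from torch import nn

    x = torch.randn(1, 1, 8, 8)

    valid = nn.Conv2d(1, 1, kernel_size=3, padding=0)      # no padding
    same = nn.Conv2d(1, 1, kernel_size=3, padding='same')  # pad to keep size

    print(valid(x).shape)  # torch.Size([1, 1, 6, 6]), i.e., 8 - 3 + 1 = 6
    print(same(x).shape)   # torch.Size([1, 1, 8, 8])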
4. Edge Effects: Even with padding, objects at the boundary might be affected differently than objects in the center of the image. This is because the artificial values introduced by padding might not represent the actual image content, which can lead to edge artifacts in the output feature map.
5. Dilated Convolutions: In dilated convolutions, where the kernel elements are spaced out by introducing gaps, the boundary effects can be even more pronounced, especially if the dilation rate is high.
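With dilation rate d, a k×k kernel covers an effective region of d(k - 1) + 1 pixels per side, so a valid convolution discards even more of the boundary. A hedged PyTorch sketch:

    import torch
    from torch import nn

    x = torch.randn(1, 1, 7, 7)

    plain = nn.Conv2d(1, 1, kernel_size=3, dilation=1)    # effective 3x3 kernel
    dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)  # effective 5x5 kernel

    print(plain(x).shape)    # torch.Size([1, 1, 5, 5])
    print(dilated(x).shape)  # torch.Size([1, 1, 3, 3])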
6. Pooling Layers: The boundary effects can also impact pooling layers (like max-pooling or average-pooling). If the pooling window doesn't fit entirely within the image or feature map boundaries, padding might be needed, or the pooling window might be truncated.
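In PyTorch (assumed here), the ceil_mode flag of MaxPool2d controls whether a partial window at the boundary is kept or dropped; a minimal sketch:

    import torch
    from torch import nn

    x = torch.randn(1, 1, 5, 5)  # a 2x2 window with stride 2 does not fit evenly

    drop = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=False)  # drop partial window
    keep = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)   # keep partial window

    print(drop(x).shape)  # torch.Size([1, 1, 2, 2])
    print(keep(x).shape)  # torch.Size([1, 1, 3, 3])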
6. Prove that the convolution is symmetric, f ∗ g = g ∗ f. To prove that convolution is symmetric, we start from the definition of the convolution of two functions f(t) and g(t).
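The following LaTeX sketch completes the standard change-of-variables argument:

    \[
    (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau .
    \]
    % Substitute u = t - \tau, so \tau = t - u and d\tau = -du; the sign
    % flip restores the orientation of the limits of integration:
    \[
    (f * g)(t) = \int_{-\infty}^{\infty} f(t - u)\, g(u)\, du = (g * f)(t),
    \]
    % which is the definition of (g * f)(t), hence f * g = g * f.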