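ResNet significantly changed the view of how to parametrize the functions in deep networks. DenseNet (dense convolutional network) is to some extent the logical extension of this (Huang et al., 2017). DenseNet is characterized by two things: a connectivity pattern in which each layer connects to all preceding layers, and the use of concatenation (rather than the addition operator in ResNet) to preserve and reuse features from earlier layers. To understand how to arrive at it, let's take a small detour into mathematics.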
# PyTorch
import torch
from torch import nn
from d2l import torch as d2l

# MXNet
from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l
npx.set_np()

# JAX
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l
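8.7.1. From ResNet to DenseNet
Recall the Taylor expansion for functions. At the point $x = 0$ it can be written as
(8.7.1) $f(x) = f(0) + x \cdot \left[f'(0) + x \cdot \left[\frac{f''(0)}{2!} + x \cdot \left[\frac{f'''(0)}{3!} + \ldots \right]\right]\right].$
The key point is that it decomposes a function into terms of increasingly higher order. In a similar vein, ResNet decomposes functions into
(8.7.2) $f(x) = x + g(x).$
That is, ResNet decomposes $f$ into a simple linear term and a more complex nonlinear one. What if we wanted to capture (not necessarily add) information beyond two terms? One such solution is DenseNet (Huang et al., 2017).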
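To make the nested form of (8.7.1) concrete, the following minimal sketch evaluates it from the innermost bracket outward. The function name `nested_taylor` and the choice of $f(x) = e^x$ (whose derivatives at 0 all equal 1) are illustrative assumptions, not part of the original text.

```python
import math

def nested_taylor(derivs_at_zero, x):
    """Evaluate f(0) + x*[f'(0) + x*[f''(0)/2! + ...]] from the inside out.

    derivs_at_zero[n] is the n-th derivative of f at 0 (illustrative input).
    """
    result = 0.0
    # Start at the highest-order term and wrap one bracket per step.
    for n in reversed(range(len(derivs_at_zero))):
        result = derivs_at_zero[n] / math.factorial(n) + x * result
    return result

# For exp, every derivative at 0 equals 1 (truncated after 12 terms).
approx = nested_taylor([1.0] * 12, 0.5)
print(approx, math.exp(0.5))
```

Each loop iteration adds one higher-order term around the previous result, which mirrors how (8.7.1) builds a function from increasingly complex pieces.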
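Fig. 8.7.1 The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: use of addition versus use of concatenation.
As shown in Fig. 8.7.1, the key difference between ResNet and DenseNet is that in the latter case outputs are concatenated (denoted by $[,]$) rather than added. As a result, we perform a mapping from $x$ to its values after applying an increasingly complex sequence of functions:
(8.7.3) $x \to \left[x, f_1(x), f_2\left(\left[x, f_1(x)\right]\right), f_3\left(\left[x, f_1(x), f_2\left(\left[x, f_1(x)\right]\right)\right]\right), \ldots\right].$
In the end, all these functions are combined in an MLP to reduce the number of features again. In terms of implementation this is quite simple: rather than adding terms, we concatenate them. The name DenseNet arises from the fact that the dependency graph between variables becomes quite dense. The final layer of such a chain is densely connected to all previous layers. The dense connections are shown in Fig. 8.7.2.
Fig. 8.7.2 Dense connections in DenseNet. Note how the dimensionality increases with depth.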
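The contrast between adding and concatenating can be sketched without any deep learning framework. The toy example below uses plain Python lists in place of feature vectors; the helper names and the two toy "layers" are hypothetical and only serve to show how the width behaves under each scheme.

```python
def residual_chain(x, fns):
    """ResNet-style: each step adds g(x) to x, so the width never changes."""
    for g in fns:
        x = [a + b for a, b in zip(x, g(x))]
    return x

def dense_chain(x, fns):
    """DenseNet-style: each step appends f(x) to x, as in (8.7.3)."""
    for f in fns:
        x = x + f(x)  # list concatenation, i.e. [x, f(x)]
    return x

# Toy "layers" (hypothetical stand-ins for learned functions).
def same_width(x):
    return [0.5 * v for v in x]

def two_features(x):
    return [sum(x), max(x)]

x = [1.0, 2.0, 3.0]
print(len(residual_chain(x, [same_width] * 3)))  # width stays 3
print(len(dense_chain(x, [two_features] * 3)))   # width grows to 3 + 3 * 2 = 9
print(dense_chain(x, [two_features]))            # the original x is kept as a prefix
```

With concatenation the earlier features survive unchanged at the front of the vector, which is what allows later layers to reuse them, at the cost of a steadily growing width.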
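The main components that make up a DenseNet are dense blocks and transition layers. The former define how the inputs and outputs are concatenated, while the latter control the number of channels so that it does not grow too large, since the expansion $x \to \left[x, f_1(x), f_2\left(\left[x, f_1(x)\right]\right), \ldots\right]$ can be quite high-dimensional.
8.7.2. Dense Blocks
DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 8.6). First, we implement this convolution block structure.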
# PyTorch
def conv_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=3, padding=1))

# MXNet
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk

# JAX
class ConvBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        Y = nn.relu(nn.BatchNorm(not self.training)(X))
        Y = nn.Conv(self.num_channels, kernel_size=(3, 3), padding=(1, 1))(Y)
        Y = jnp.concatenate((X, Y), axis=-1)
        return Y

# TensorFlow
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels):
        super(ConvBlock, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')
        self.listLayers = [self.bn, self.relu, self.conv]

    def call(self, x):
        y = x
        for layer in self.listLayers.layers:
            y = layer(y)
        y = tf.keras.layers.concatenate([x,y], axis=-1)
        return y
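A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block along the channel dimension. Lazy evaluation allows us to adjust the dimensionality automatically.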
# PyTorch
class DenseBlock(nn.Module):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = torch.cat((X, Y), dim=1)
        return X

# MXNet
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = np.concatenate((X, Y), axis=1)
        return X

# JAX
class DenseBlock(nn.Module):
    num_convs: int
    num_channels: int
    training: bool = True

    def setup(self):
        layer = []
        for i in range(self.num_convs):
            layer.append(ConvBlock(self.num_channels, self.training))
        self.net = nn.Sequential(layer)

    def __call__(self, X):
        return self.net(X)

# TensorFlow
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        self.listLayers = []
        for _ in range(num_convs):
            self.listLayers.append(ConvBlock(num_channels))

    def call(self, x):
        for layer in self.listLayers.layers:
            x = layer(x)
        return x
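In the following example, we define a DenseBlock instance with two convolution blocks of 10 output channels each. When using an input with three channels, we get an output with 3 + 10 + 10 = 23 channels. The number of channels of the convolution blocks controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the growth rate.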
# PyTorch
blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape
torch.Size([4, 23, 8, 8])

# MXNet
blk = DenseBlock(2, 10)
X = np.random.uniform(size=(4, 3, 8, 8))
blk.initialize()
Y = blk(X)
Y.shape
(4, 23, 8, 8)

# JAX
blk = DenseBlock(2, 10)
X = jnp.zeros((4, 8, 8, 3))
Y = blk.init_with_output(d2l.get_key(), X)[0]
Y.shape
(4, 8, 8, 23)

# TensorFlow
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape
TensorShape([4, 8, 8, 23])
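The channel arithmetic behind the growth rate can be written down directly. This is a small framework-free sketch; the helper name `dense_block_out_channels` is made up for illustration.

```python
def dense_block_out_channels(in_channels, num_convs, growth_rate):
    """Channels after a dense block: the input plus growth_rate per conv block."""
    channels = in_channels
    for _ in range(num_convs):
        channels += growth_rate  # each conv block's output is concatenated
    return channels

print(dense_block_out_channels(3, 2, 10))  # 3 + 10 + 10 = 23
```

The result is simply `in_channels + num_convs * growth_rate`, so the output width grows linearly with the number of convolution blocks.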
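8.7.3. Transition Layers
Since each dense block increases the number of channels, adding too many of them leads to an excessively complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels with a 1×1 convolution. Moreover, it halves the height and width via average pooling with a stride of 2.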
# PyTorch
def transition_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))

# MXNet
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk

# JAX
class TransitionBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        X = nn.BatchNorm(not self.training)(X)
        X = nn.relu(X)
        X = nn.Conv(self.num_channels, kernel_size=(1, 1))(X)
        X = nn.avg_pool(X, window_shape=(2, 2), strides=(2, 2))
        return X

# TensorFlow
class TransitionBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels, **kwargs):
        super(TransitionBlock, self).__init__(**kwargs)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
        self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

    def call(self, x):
        x = self.batch_norm(x)
        x = self.relu(x)
        x = self.conv(x)
        return self.avg_pool(x)
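Apply a transition layer with 10 channels to the output of the dense block from the previous example. This reduces the number of output channels to 10 and halves the height and width.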
# PyTorch
blk = transition_block(10)
blk(Y).shape
torch.Size([4, 10, 4, 4])

# MXNet
blk = transition_block(10)
blk.initialize()
blk(Y).shape
(4, 10, 4, 4)

# JAX
blk = TransitionBlock(10)
blk.init_with_output(d2l.get_key(), Y)[0].shape
(4, 4, 4, 10)

# TensorFlow
blk = TransitionBlock(10)
blk(Y).shape
TensorShape([4, 4, 4, 10])
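The effect of a transition layer on shapes, and how cheap its 1×1 convolution is, can be checked by hand. The two helpers below are hypothetical names for illustration, and the shape convention assumed is the channels-first layout (batch, channels, height, width) used by the PyTorch and MXNet examples.

```python
def transition_output_shape(shape, num_channels):
    """Shape after a 1x1 convolution and 2x2 average pooling with stride 2.

    `shape` is (batch, channels, height, width), the PyTorch/MXNet layout.
    """
    batch, _, height, width = shape
    return (batch, num_channels, height // 2, width // 2)

def conv1x1_num_params(in_channels, out_channels):
    """Weights plus biases of a 1x1 convolution."""
    return in_channels * out_channels + out_channels

print(transition_output_shape((4, 23, 8, 8), 10))  # (4, 10, 4, 4)
print(conv1x1_num_params(23, 10))                  # 23 * 10 + 10 = 240
```

A 1×1 convolution mixes channels at each spatial position independently, so its parameter count depends only on the input and output channel counts, not on the image size.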
8.7.4. DenseNet Model
Next, we construct a DenseNet model. DenseNet first uses the same single convolutional layer and max-pooling layer as in ResNet.
# PyTorch
class DenseNet(d2l.Classifier):
    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

# MXNet
class DenseNet(d2l.Classifier):
    def b1(self):
        net = nn.Sequential()
        net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
                nn.BatchNorm(), nn.Activation('relu'),
                nn.MaxPool2D(pool_size=3, strides=2, padding=1))
        return net

# JAX
class DenseNet(d2l.Classifier):
    num_channels: int = 64
    growth_rate: int = 32
    arch: tuple = (4, 4, 4, 4)
    lr: float = 0.1
    num_classes: int = 10
    training: bool = True

    def setup(self):
        self.net = self.create_net()

    def b1(self):
        return nn.Sequential([
            nn.Conv(64, kernel_size=(7, 7), strides=(2, 2), padding='same'),
            nn.BatchNorm(not self.training),
            nn.relu,
            lambda x: nn.max_pool(x, window_shape=(3, 3),
                                  strides=(2, 2), padding='same')
        ])

# TensorFlow
class DenseNet(d2l.Classifier):
    def b1(self):
        return tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(
                64, kernel_size=7, strides=2, padding='same'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(
                pool_size=3, strides=2, padding='same')])
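Then, similar to the four modules made up of residual blocks that ResNet uses, DenseNet uses four dense blocks. As with ResNet, we can set the number of convolutional layers used in each dense block. Here, we set it to 4, consistent with the ResNet-18 model in Section 8.6. Furthermore, we set the number of channels (i.e., the growth rate) for the convolutional layers in the dense block to 32, so each dense block adds 128 channels.
In ResNet, the height and width are reduced between modules by a residual block with a stride of 2. Here, we use the transition layer to halve the height and width and to halve the number of channels. As in ResNet, a global pooling layer and a fully connected layer are attached at the end to produce the output.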
# PyTorch
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add_module(f'dense_blk{i+1}', DenseBlock(num_convs,
                                                          growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add_module(f'tran_blk{i+1}', transition_block(
                num_channels))
    self.net.add_module('last', nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.LazyLinear(num_classes)))
    self.net.apply(d2l.init_cnn)

# MXNet
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential()
    self.net.add(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(transition_block(num_channels))
    self.net.add(nn.BatchNorm(), nn.Activation('relu'),
                 nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    self.net.initialize(init.Xavier())
# JAX
@d2l.add_to_class(DenseNet)
def create_net(self):
    net = self.b1()
    # Running channel count, carried from one dense block to the next
    num_channels = self.num_channels
    for i, num_convs in enumerate(self.arch):
        net.layers.extend([DenseBlock(num_convs, self.growth_rate,
                                      training=self.training)])
        # The number of output channels in the previous dense block
        num_channels += num_convs * self.growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(self.arch) - 1:
            num_channels //= 2
            net.layers.extend([TransitionBlock(num_channels,
                                               training=self.training)])
    net.layers.extend([
        nn.BatchNorm(not self.training),
        nn.relu,
        lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
                              strides=x.shape[1:3], padding='valid'),
        lambda x: x.reshape((x.shape[0], -1)),
        nn.Dense(self.num_classes)
    ])
    return net
# TensorFlow
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = tf.keras.models.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(TransitionBlock(num_channels))
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes)]))
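The channel bookkeeping performed in the constructor can be traced by hand. The sketch below repeats only the update rule of the PyTorch constructor (add `num_convs * growth_rate`, then halve between blocks) without building any layers; the function name `densenet_channel_trace` is a made-up helper, and the stage names follow the `dense_blk`/`tran_blk` labels used in the PyTorch code.

```python
def densenet_channel_trace(num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)):
    """Return the channel count after the stem and after every block."""
    trace = [("stem", num_channels)]
    for i, num_convs in enumerate(arch):
        # Each dense block adds num_convs * growth_rate channels.
        num_channels += num_convs * growth_rate
        trace.append((f"dense_blk{i + 1}", num_channels))
        # A transition layer halves the channels between dense blocks.
        if i != len(arch) - 1:
            num_channels //= 2
            trace.append((f"tran_blk{i + 1}", num_channels))
    return trace

for name, channels in densenet_channel_trace():
    print(f"{name}: {channels}")
```

With the default settings the counts run 64, 192, 96, 224, 112, 240, 120, 248, so the final batch normalization and linear layer see 248 channels.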
8.7.5. Training
Since we are using a deeper network here, in this section we reduce the input height and width from 224 to 96 to simplify the computation.
# PyTorch, MXNet, and JAX
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)

# TensorFlow
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
with d2l.try_gpu():
    model = DenseNet(lr=0.01)
    trainer.fit(model, data)
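To see what the smaller input buys, one can trace the spatial size through the stem and the three transition layers. This is a rough sketch using the standard output-size formula with the explicit padding values from the PyTorch and MXNet code; the helper names are hypothetical, and for these input sizes the 'same' padding used in the JAX and TensorFlow versions yields the same numbers.

```python
def conv_out_size(n, kernel_size, stride, padding):
    """Output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

def densenet_spatial_trace(n, num_transitions=3):
    """Heights (= widths) after the stem layers and each transition layer."""
    sizes = [n]
    n = conv_out_size(n, kernel_size=7, stride=2, padding=3)  # stem conv
    sizes.append(n)
    n = conv_out_size(n, kernel_size=3, stride=2, padding=1)  # stem max-pool
    sizes.append(n)
    for _ in range(num_transitions):
        n = conv_out_size(n, kernel_size=2, stride=2, padding=0)  # avg-pool
        sizes.append(n)
    return sizes

print(densenet_spatial_trace(96))   # [96, 48, 24, 12, 6, 3]
print(densenet_spatial_trace(224))  # [224, 112, 56, 28, 14, 7]
```

At every stage the 96×96 input produces feature maps with roughly 5.4 times fewer values than the 224×224 input, which is where the savings in computation and memory come from.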
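8.7.6. Summary and Discussion
The main components that make up DenseNet are dense blocks and transition layers. For the latter, we need to keep the dimensionality under control when composing the network by adding transition layers that shrink the number of channels again. In terms of cross-layer connections, in contrast to ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs along the channel dimension. Although these concatenation operations reuse features to achieve computational efficiency, they unfortunately lead to heavy GPU memory consumption. As a result, applying DenseNet may require more memory-efficient implementations, which may increase training time (Pleiss et al., 2017).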
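A back-of-the-envelope count hints at why concatenation is memory-hungry. The sketch below assumes, as a simplification, that every concatenation allocates a new tensor that stays alive for the backward pass, and it ignores all other buffers; the "shared" figure is an idealized lower bound in which each feature is stored once, not a description of any particular implementation. Both function names are hypothetical.

```python
def concat_channels_naive(in_channels, num_convs, growth_rate):
    """Total channels held if every concatenation result is a new tensor."""
    total, channels = 0, in_channels
    for _ in range(num_convs):
        channels += growth_rate
        total += channels  # a fresh copy of [X, Y] is allocated
    return total

def concat_channels_shared(in_channels, num_convs, growth_rate):
    """Channels held if all features live once in a shared buffer."""
    return in_channels + num_convs * growth_rate

for num_convs in (4, 8, 16):
    naive = concat_channels_naive(64, num_convs, 32)
    shared = concat_channels_shared(64, num_convs, 32)
    print(num_convs, naive, shared)
```

Under these assumptions the naive count grows quadratically with the number of convolution blocks (576, 1664, 5376 channels for 4, 8, 16 blocks), while the shared count grows only linearly (192, 320, 576).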
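8.7.7. Exercises
1. Why do we use average pooling rather than max-pooling in the transition layer?
2. One of the advantages mentioned in the DenseNet paper is that its model has fewer parameters than ResNet. Why is this the case?
3. One problem for which DenseNet has been criticized is its high memory consumption.
   1. Is this really the case? Try changing the input shape to 224×224 and measure the actual GPU memory consumption empirically.
   2. Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?
4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
5. Design an MLP-based model by applying the DenseNet idea. Apply it to the housing price prediction task in Section 5.7.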