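Introduction to Residual Networks
ResNet (Residual Network) is a deep convolutional neural network architecture proposed by Microsoft Research. Its distinguishing feature is "residual learning", which addresses the degradation problem of deep networks, where accuracy saturates and then drops as more layers are stacked.
In a plain convolutional network, the output of each convolutional layer is simply passed forward through the network until it reaches the final fully connected layer. In a residual network, the output of a layer is not only fed into the next layer as input but is also added to that layer's output. This shortcut is called a "residual connection"; with it, the stacked layers only have to learn the residual F(x) = H(x) - x rather than the full mapping H(x).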
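The idea of a residual connection can be illustrated without any deep learning library. The following is a minimal sketch in plain Python, not the TensorFlow code used later; the names `residual_add`, `zero_branch`, and `scale_branch` are hypothetical helpers introduced only for this illustration.

```python
def residual_add(x, branch):
    """Return branch(x) + x element-wise; x is a list of floats."""
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("branch must preserve the length of x for the shortcut addition")
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    # A branch whose output is all zeros contributes nothing to the sum.
    return [0.0 for _ in x]


def scale_branch(x):
    # A made-up "learned" branch: a small correction proportional to x.
    return [0.1 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_add(x, zero_branch)    # the block reduces to the identity
corrected_out = residual_add(x, scale_branch)  # x plus a 10% correction
```

When the branch outputs zeros, the block passes its input through unchanged, which is why adding such blocks should not make a deep network harder to fit than a shallower one.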
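Implementing ResNet on the MNIST Dataset
In this article, we use TensorFlow (the 1.x API) to implement a ResNet-based model that classifies MNIST handwritten digits.
Building the Model
First, we define a function that builds a residual block: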
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.layers, tf.variable_scope)


def residual_block(input_layer, output_channel):
    """Two 3x3 convolutions plus a shortcut connection from the block input."""
    input_channel = input_layer.get_shape().as_list()[-1]

    if input_channel * 2 == output_channel:
        # Doubling the channels: the first convolution also halves height and width.
        increase_dim = True
        strides = [2, 2]
    elif input_channel == output_channel:
        increase_dim = False
        strides = [1, 1]
    else:
        raise ValueError('Output and input channels do not match in residual block')

    conv1 = tf.layers.conv2d(input_layer, output_channel, [3, 3], strides=strides, padding='same',
                             activation=tf.nn.relu, name='conv1')
    conv2 = tf.layers.conv2d(conv1, output_channel, [3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu, name='conv2')

    if increase_dim:
        # Match the shortcut to conv2: average-pool to halve height and width,
        # then zero-pad the channel axis from input_channel to output_channel.
        pooled_input = tf.layers.average_pooling2d(input_layer, [2, 2], strides=[2, 2], padding='valid')
        padded_input = tf.pad(pooled_input, [[0, 0], [0, 0], [0, 0], [input_channel // 2, input_channel // 2]])
    else:
        padded_input = input_layer

    # Residual connection: add the (possibly reshaped) input to the convolution output.
    output = conv2 + padded_input
    return output
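The addition at the end of `residual_block` only works if both paths produce the same shape. The sketch below reproduces the shape arithmetic in plain Python, assuming TensorFlow's usual rules ('same' padding gives ceil(size / stride); 2x2 'valid' pooling with stride 2 gives floor(size / 2)). `conv_branch_shape` and `shortcut_shape` are hypothetical helpers, not part of the model code.

```python
import math


def conv_branch_shape(height, width, out_channels, stride):
    """Shape after the block's convolutions with 'same' padding."""
    return (math.ceil(height / stride), math.ceil(width / stride), out_channels)


def shortcut_shape(height, width, in_channels, increase_dim):
    """Shape of the shortcut path as residual_block builds it."""
    if not increase_dim:
        return (height, width, in_channels)
    # 2x2 average pooling, stride 2, 'valid' padding: floor(size / 2).
    pooled_h, pooled_w = height // 2, width // 2
    # Zero-padding adds in_channels // 2 channels on each side of the channel axis.
    padded_c = in_channels + 2 * (in_channels // 2)
    return (pooled_h, pooled_w, padded_c)


# 28x28 input, 16 -> 32 channels: both paths agree.
even_case = (conv_branch_shape(28, 28, 32, 2), shortcut_shape(28, 28, 16, True))
# 7x7 input, 32 -> 64 channels: the two paths disagree on height and width.
odd_case = (conv_branch_shape(7, 7, 64, 2), shortcut_shape(7, 7, 32, True))
```

Under these assumptions the dimension-increasing branch is only safe when height and width are even. The model in this article keeps every block at 16 channels, so that branch is never exercised here.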
Next, we use this block to build the model, as shown below:
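def res_net(x, num_blocks, num_classes):
    """Stem convolution, num_blocks residual blocks, then pooling and a dense classifier."""
    conv0 = tf.layers.conv2d(x, 16, [3, 3], strides=[1, 1], padding='same', activation=tf.nn.relu, name='conv0')

    res_output = conv0
    for i in range(num_blocks):
        # A separate variable scope per block keeps the 'conv1'/'conv2' layer names unique.
        with tf.variable_scope('block{}'.format(i)):
            res_output = residual_block(res_output, 16)

    # training defaults to False here, so the layer normalizes with its moving statistics.
    bn = tf.layers.batch_normalization(res_output, axis=3)
    relu = tf.nn.relu(bn)
    # 7x7 average pooling with stride 7: a 28x28 feature map becomes 4x4.
    out = tf.layers.average_pooling2d(relu, [7, 7], strides=[7, 7], padding='same')
    flatten = tf.layers.flatten(out)
    logits = tf.layers.dense(flatten, num_classes)
    return logits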
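In this model, a $3\times3$ convolution serves as the input layer, followed by a number of residual blocks. Batch normalization, ReLU, a $7\times7$ average pooling layer, and a fully connected layer then produce the output. Note that on a $28\times28$ feature map the $7\times7$ pooling with stride 7 yields a $4\times4$ map, so it is not a true global average pooling.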
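To see what the pooling layer hands to the fully connected layer, the size arithmetic can be worked out by hand. This is a small sketch in plain Python, assuming TensorFlow's standard output-size rules for 'same' and 'valid' padding; `pool_output_size` is a hypothetical helper name.

```python
import math


def pool_output_size(size, window, stride, padding):
    """Output length along one spatial axis for a pooling layer."""
    if padding == 'same':
        return math.ceil(size / stride)
    if padding == 'valid':
        return math.ceil((size - window + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


feature_map = 28  # height and width after conv0 and the stride-1 residual blocks
channels = 16     # every layer in res_net uses 16 channels
pooled = pool_output_size(feature_map, window=7, stride=7, padding='same')
flattened_features = pooled * pooled * channels
dense_weights = flattened_features * 10 + 10  # kernel plus bias for 10 classes
```

With these numbers the dense layer receives 256 features per image. A true global average pooling would instead average each channel over the full 28x28 map and pass on only 16 features.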
Model Training and Evaluation
Next, we train the model on the MNIST dataset:
from tensorflow.examples.tutorials.mnist import input_data  # TensorFlow 1.x module

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

learning_rate = 0.01
training_epochs = 10
batch_size = 128
display_step = 1

tf.reset_default_graph()

# Inputs: flattened 28x28 images and one-hot labels.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
x_reshaped = tf.reshape(x, [-1, 28, 28, 1])

# One residual block, ten output classes.
logits = res_net(x_reshaped, 1, 10)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimizer, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
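In the code above, we train the model for 10 epochs, using the Adam optimizer to minimize the cross-entropy loss, and finally compute the accuracy on the test set as the evaluation metric.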
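The loss and the metric used above can be spelled out by hand. The sketch below computes softmax cross-entropy and accuracy in plain Python for two made-up three-class examples; the logits and labels are illustrative values, not MNIST data, and the function names are chosen only for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


# Two made-up examples: the first is classified correctly, the second is not.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
batch_labels = [[1, 0, 0], [0, 1, 0]]

losses = [cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
hits = [argmax(l) == argmax(t) for l, t in zip(batch_logits, batch_labels)]
accuracy = sum(hits) / len(hits)
```

The misclassified second example contributes a much larger loss than the first, which is the signal the optimizer uses to adjust the weights. Accuracy only counts whether the largest logit matches the label.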
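Parameter Tuning
To improve model performance, we can adjust various parameters, for example hyperparameters such as the model depth, the model width, and the learning rate: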
# Reuses mnist and display_step from the previous listing.
learning_rate = 0.05
training_epochs = 20
batch_size = 256

tf.reset_default_graph()

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
x_reshaped = tf.reshape(x, [-1, 28, 28, 1])

# Three residual blocks instead of one.
logits = res_net(x_reshaped, 3, 10)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimizer, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
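Here we change the learning rate to 0.05, the number of training epochs to 20, and the batch size to 256; the code also increases the number of residual blocks from 1 to 3. With these settings, the test accuracy improves by about 1% compared with the previous run.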
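Because several settings change at once in the run above, it is hard to tell which one is responsible for the gain. One possible way to organize further experiments is to enumerate the combinations explicitly. The following is a sketch in plain Python; the search space values are taken from the two runs in this article, and `iter_configs` is a hypothetical helper. Training each configuration would still require wrapping the TensorFlow code above in a function, which is not shown here.

```python
from itertools import product

# Values taken from the two runs in this article.
search_space = {
    'learning_rate': [0.01, 0.05],
    'batch_size': [128, 256],
    'num_blocks': [1, 3],
}


def iter_configs(space):
    """Yield one dict per combination of the listed hyperparameter values."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
for config in configs:
    # Each config would be passed to a training function in a real experiment.
    print(config)
```

Comparing configurations that differ in a single value makes it possible to attribute an accuracy change to that value alone.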