Implementing the Lightweight ShuffleNet Network in Keras: A Tutorial

I. Introduction to ShuffleNet

ShuffleNet is a lightweight convolutional neural network architecture that offers low-latency inference while maintaining competitive accuracy. Compared with conventional architectures, it adopts a new modular idea, channel shuffle, which reduces the parameter count and the amount of computation while preserving accuracy.

The idea of channel shuffle is to permute the feature channels so that channels from different groups are recombined into new groups, letting information flow between the groups of a grouped convolution. It is the grouped convolutions that cut parameters and computation; the shuffle prevents the accuracy loss that isolated groups would otherwise cause. ShuffleNet is built from two kinds of modular structures: the shuffle unit and the pointwise group convolution.

1. Shuffle Unit

The shuffle unit is the core module of ShuffleNet, and it too relies on channel shuffle to keep the parameter count and computation low. A shuffle unit consists of three parts:

group convolution

channel shuffle

pointwise (1x1) convolution

The group convolution and the pointwise convolution work just as they do in a conventional convolutional network; the new step is the channel shuffle, which proceeds in two stages: first, the channels are divided into several groups of equal size; second, the channels within the groups are interleaved and merged back together, as the sketch below shows.
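The two stages boil down to a reshape followed by a transpose. Here is a minimal NumPy sketch with 6 channels and 2 groups (the array values are just channel indices, used for illustration):

import numpy as np

channels = np.arange(6)                  # channel indices 0..5
groups = 2
grouped = channels.reshape(groups, -1)   # [[0, 1, 2], [3, 4, 5]]
shuffled = grouped.T.reshape(-1)         # [0, 3, 1, 4, 2, 5]
print(shuffled)

After the shuffle, each new group of channels draws one channel from every original group, which is exactly what lets information cross group boundaries.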

2. Point-wise Group Convolution

A pointwise group convolution extracts features with 1x1 convolutions while splitting the 1x1 convolution into several groups that are computed independently, which reduces both the parameter count and the computation. See the sketch below for one way to realize this in Keras.
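Conv2D in this version of Keras has no groups argument, so the implementation in the next section approximates the grouped 1x1 convolution with an ordinary pointwise convolution. If a true grouped pointwise convolution is needed, one common workaround (a hedged sketch, not part of the reference code; the helper name group_pointwise_conv is ours) is to slice the channels, convolve each slice independently, and concatenate the results:

from keras import backend as K
from keras.layers import Lambda, Conv2D, Concatenate

def group_pointwise_conv(x, out_channels, groups, name):
    # Split the input channels into `groups` slices, apply an independent
    # 1x1 convolution to each slice, then concatenate along the channel axis.
    in_channels = K.int_shape(x)[-1]
    group_in = in_channels // groups
    group_out = out_channels // groups
    outputs = []
    for g in range(groups):
        s = Lambda(lambda t, g=g: t[..., g * group_in:(g + 1) * group_in],
                   name=f'{name}_slice{g}')(x)
        outputs.append(Conv2D(group_out, (1, 1), use_bias=False,
                              name=f'{name}_conv{g}')(s))
    return Concatenate(name=f'{name}_concat')(outputs)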

II. Implementing ShuffleNet in Keras

Implementing the ShuffleNet architecture in Keras relies on the TensorFlow backend. The full implementation is given below.

from keras import backend as K
from keras.layers import (Input, Conv2D, DepthwiseConv2D, BatchNormalization,
                          ReLU, Add, Lambda, MaxPooling2D, AveragePooling2D,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model

def channel_shuffle(x, groups):
    # Assumes channels_last data format: (batch, height, width, channels).
    height, width, in_channels = K.int_shape(x)[1:]
    channels_per_group = in_channels // groups
    # Step 1: reshape so the channels are split into `groups` equal groups.
    x = K.reshape(x, [-1, height, width, groups, channels_per_group])
    # Step 2: transpose the two group axes to (..., channels_per_group, groups),
    # interleaving the channels across groups.
    x = K.permute_dimensions(x, (0, 1, 2, 4, 3))
    # Flatten back to (batch, height, width, in_channels).
    x = K.reshape(x, [-1, height, width, in_channels])
    return x
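Because channel_shuffle is a raw backend operation rather than a Keras layer, it must be wrapped in a Lambda layer to take part in a Keras model graph, which is exactly what shuffle_unit does below:

x = Lambda(channel_shuffle, arguments={'groups': groups})(x)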

def shuffle_unit(inputs, out_channels, strides=2, bottleneck_ratio=1, groups=1, stage=1, block=1):
    prefix = f'stage{stage}_block{block}_'
    bottleneck_channels = int(out_channels * bottleneck_ratio)
    bn_axis = 3 if K.image_data_format() == 'channels_last' else 1

    # 1x1 bottleneck convolution followed by channel shuffle. This version of
    # Keras's Conv2D has no `groups` argument, so the grouped 1x1 convolution
    # is approximated with a plain pointwise convolution; the shuffle is kept.
    x = Conv2D(bottleneck_channels, (1, 1), strides=1, padding='same',
               use_bias=False, name=prefix + 'gconv1')(inputs)
    x = BatchNormalization(axis=bn_axis, name=prefix + 'bn1')(x)
    x = ReLU(name=prefix + 'relu1')(x)
    # Backend ops must be wrapped in a Lambda layer to stay in the Keras graph.
    x = Lambda(channel_shuffle, arguments={'groups': groups},
               name=prefix + 'channel_shuffle')(x)

    # 3x3 depthwise convolution, then a 1x1 pointwise convolution.
    x = DepthwiseConv2D((3, 3), strides=strides, padding='same', use_bias=False,
                        name=prefix + 'dwconv')(x)
    x = BatchNormalization(axis=bn_axis, name=prefix + 'bn2')(x)
    x = Conv2D(out_channels, (1, 1), strides=1, padding='same',
               use_bias=False, name=prefix + 'gconv2')(x)
    x = BatchNormalization(axis=bn_axis, name=prefix + 'bn3')(x)
    x = ReLU(name=prefix + 'relu2')(x)

    # Shortcut branch: downsample so its spatial size matches the main branch,
    # then project to out_channels if the channel counts differ.
    if strides == 2:
        inputs = AveragePooling2D(pool_size=3, strides=2, padding='same',
                                  name=prefix + 'skip_pool')(inputs)
    input_channels = K.int_shape(inputs)[bn_axis]
    if input_channels != out_channels:
        inputs = Conv2D(out_channels, (1, 1), strides=1, padding='same',
                        use_bias=False, name=prefix + 'skip_conv')(inputs)
        inputs = BatchNormalization(axis=bn_axis, name=prefix + 'skip_bn')(inputs)

    out = Add(name=prefix + 'add')([x, inputs])
    return out
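A quick shape check for a single unit (the input shape here is chosen arbitrarily for illustration):

demo_in = Input(shape=(56, 56, 24))
demo_out = shuffle_unit(demo_in, out_channels=48, strides=2, groups=3, stage=0, block=1)
Model(demo_in, demo_out).summary()   # final output shape: (None, 28, 28, 48)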

def shuffle_netV2(input_shape=(224, 224, 3), scale_factor=1.0, num_classes=1000):
    inputs = Input(shape=input_shape)
    bn_axis = 3 if K.image_data_format() == 'channels_last' else 1

    # Stem: strided 3x3 convolution followed by max pooling.
    first_filters = int(24 * scale_factor)
    x = Conv2D(first_filters, (3, 3), strides=2, padding='same',
               use_bias=False, name='conv1')(inputs)
    x = BatchNormalization(axis=bn_axis, name='bn1')(x)
    x = ReLU(name='relu1')(x)
    x = MaxPooling2D(pool_size=3, strides=2, padding='same', name='maxpool1')(x)

    # Stacked shuffle units; every block in this variant downsamples by 2.
    x = shuffle_unit(x, out_channels=int(48 * scale_factor), strides=2, groups=3, stage=2, block=1)
    x = shuffle_unit(x, out_channels=int(96 * scale_factor), strides=2, groups=3, stage=2, block=2)
    x = shuffle_unit(x, out_channels=int(192 * scale_factor), strides=2, groups=3, stage=2, block=3)
    x = shuffle_unit(x, out_channels=int(384 * scale_factor), strides=2, groups=3, stage=3, block=1)
    x = shuffle_unit(x, out_channels=int(576 * scale_factor), strides=2, groups=3, stage=3, block=2)
    x = shuffle_unit(x, out_channels=int(960 * scale_factor), strides=2, groups=3, stage=3, block=3)

    # Head: 1x1 convolution, global average pooling, softmax classifier.
    x = Conv2D(int(1024 * scale_factor), (1, 1), strides=1, padding='same',
               use_bias=False, name='conv_last')(x)
    x = BatchNormalization(axis=bn_axis, name='conv_last_bn')(x)
    x = ReLU(name='conv_last_relu')(x)
    x = GlobalAveragePooling2D(name='global_avg_pool')(x)
    x = Dense(num_classes, activation='softmax', name='fc')(x)

    model = Model(inputs=inputs, outputs=x, name='ShuffleNetV2')
    return model
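With the builder in place, constructing and compiling the network takes one line each (the optimizer and loss below are illustrative choices, not prescribed by the architecture):

model = shuffle_netV2(input_shape=(224, 224, 3), scale_factor=1.0, num_classes=1000)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()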

III. Summary

As this article shows, ShuffleNet is a lightweight convolutional neural network architecture that delivers low-latency inference while maintaining competitive accuracy. Channel shuffle and pointwise group convolutions together cut the parameter count and the amount of computation without giving up accuracy. The Keras implementation relies on the TensorFlow backend, and the whole architecture can be assembled from standard convolution and BatchNormalization layers plus a Lambda-wrapped channel shuffle.
