YOLOv5 Hyperparameters and Optimization Strategies

1. YOLOv5 Hyperparameters

Hyperparameter settings matter a great deal when training deep networks. YOLOv5 ships with sensible defaults, but tuning the hyperparameters to your own task can noticeably improve model accuracy.

1.1 Model

Model size refers to the overall capacity of the network. Three variants are commonly used: small, medium, and large. In the official repository the size is set through the depth_multiple and width_multiple values in the model configuration files under yolov5/models (e.g. yolov5s.yaml), which models/yolo.py expands into the actual network block by block. The simplified, Python-style snippets below illustrate how the three variants differ in block count and channel width:

# small
depth_multiple = 0.33  # 1.0 for the large model; small uses 1/3 of its depth
width_multiple = 0.50
backbone = [  # the backbone is written as an ordered list so each block takes effect
    nn.Conv2d(3, 32, 3, 1, 1),    # 0
    nn.Conv2d(32, 64, 3, 2, 1),   # 1-P1/2
    nn.CELU(),                    # 2
    SPP(),                        # 3
    nn.Conv2d(320, 64, 1, 1),     # 4
    nn.Conv2d(64, 64, 3, 1, 1),   # 5
    nn.CELU(),                    # 6
    nn.Conv2d(64, 32, 1, 1),      # 7
]  # 8 blocks in total

# medium
depth_multiple = 0.67  # 1.0 for the large model; medium uses 2/3 of its depth
width_multiple = 0.75
backbone = [  # the backbone is written as an ordered list so each block takes effect
    nn.Conv2d(3, 32, 3, 1, 1),    # 0
    nn.Conv2d(32, 64, 3, 2, 1),   # 1-P1/2
    nn.CELU(),                    # 2
    nn.Conv2d(64, 64, 3, 2, 1),   # 3-P2/4
    nn.CELU(),                    # 4
    SPP(),                        # 5
    nn.Conv2d(768, 128, 1, 1),    # 6
    nn.CELU(),                    # 7
    nn.Conv2d(128, 64, 1, 1),     # 8
]  # 9 blocks in total

# large
depth_multiple = 1.0  # 1.0 for the large model
width_multiple = 1.0
backbone = [  # the backbone is written as an ordered list so each block takes effect
    nn.Conv2d(3, 32, 3, 1, 1),    # 0
    nn.Conv2d(32, 64, 3, 2, 1),   # 1-P1/2
    nn.CELU(),                    # 2
    nn.Conv2d(64, 64, 3, 2, 1),   # 3-P2/4
    nn.CELU(),                    # 4
    nn.Conv2d(64, 128, 3, 2, 1),  # 5-P3/8
    nn.CELU(),                    # 6
    SPP(),                        # 7
    nn.Conv2d(1536, 256, 1, 1),   # 8
    nn.CELU(),                    # 9
    nn.Conv2d(256, 128, 1, 1),    # 10
]  # 11 blocks in total

Here, depth_multiple is the scaling factor for the number of layers: it stretches or shrinks the overall depth of the network while leaving the width of each layer untouched. width_multiple is the scaling factor for the channel count: it widens or narrows every layer without changing the depth of the network.
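As a minimal sketch of how the two multipliers are applied when a layer specification is expanded (the rounding rules mirror what parse_model in models/yolo.py does, but the function below is an illustration, not the repository code):

import math

def scale_layer(repeats, out_channels, depth_multiple, width_multiple):
    # depth: scale the number of repeated blocks, keeping at least one
    n = max(round(repeats * depth_multiple), 1)
    # width: scale the channel count and round up to a multiple of 8
    c = math.ceil(out_channels * width_multiple / 8) * 8
    return n, c

print(scale_layer(9, 512, 0.33, 0.50))  # small:  (3, 256)
print(scale_layer(9, 512, 0.67, 0.75))  # medium: (6, 384)
print(scale_layer(9, 512, 1.0, 1.0))    # large:  (9, 512)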

1.2 Train

Batch_size is the number of samples fed to the network in each iteration. For a first run it can be set as low as 2 and then adjusted; if training runs out of memory, reducing the batch size is usually enough to keep the model training normally. Increasing the batch size, when memory allows, can also improve model accuracy.

Image_size is the resolution images are resized to for training. YOLOv5 trains at 640 × 640 by default, and a smaller size such as 416 × 416 is a common choice; lowering the image size is an effective remedy when OOM (out-of-memory) errors occur.

Epochs is the number of full passes over the training set. YOLOv5 defaults to 300 epochs, but the appropriate number varies with dataset and model size. Training can be shortened by reducing the epoch count or by starting from pretrained weights.
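These three settings are exposed as command-line flags in the official train.py; a typical invocation (the dataset and weights are the quickstart examples) looks like:

python train.py --img 640 --batch 16 --epochs 300 --data coco128.yaml --weights yolov5s.pt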

1.3 Test

iou_threshold is the IoU threshold used during NMS: when the IoU of two boxes exceeds this value they are considered to cover the same object, and the lower-scoring box is removed. The default of 0.45 is usually fine.

Confidence_threshold is the confidence cut-off for keeping predictions: the higher it is, the fewer boxes are output, which raises precision but lowers recall, since low-confidence true detections are discarded as well. The default of 0.25 is usually fine.
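Both thresholds can be overridden at inference time; for example (the source path is illustrative):

python detect.py --weights yolov5s.pt --source data/images --conf-thres 0.25 --iou-thres 0.45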

These are the hyperparameters most commonly adjusted in YOLOv5.

2. Optimization Strategies

Building on the hyperparameters above, the following strategies can be used to obtain better model accuracy.

2.1 Learning Rate

The learning rate is the step size applied at each gradient-descent update. It is set to a small fraction; YOLOv5's default initial value is 0.01 for SGD (0.001 when Adam is used), and it can be adjusted during experimentation, typically within the range 0.0001 to 0.1.

YOLOv5's official learning-rate schedule uses different rates at different stages of training (warm-up, main schedule, final value). A simplified version of the setup and training loop is shown below:

import math
from torch.optim import lr_scheduler

hyp['lr0'] = 0.01              # initial learning rate
hyp['lrf'] = 1e-4              # final learning rate that the schedule decays to
hyp['momentum'] = 0.937        # SGD momentum
hyp['weight_decay'] = 0.0005   # optimizer weight decay; too large a value hinders convergence
break_epochs = 0               # early-stopping patience (0 disables early stopping)
warmup_epochs = max(3, round(0.05 * epochs))  # warm up for roughly the first 3-5% of epochs
# (in the official train.py the warm-up ramps the lr up batch-by-batch during these first epochs)

# official-style one-cycle schedule: cosine decay from lr0 down to lrf over the whole run
lf = lambda x: ((1 - math.cos(x * math.pi / epochs)) / 2) * (hyp['lrf'] / hyp['lr0'] - 1) + 1
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

optimizer.zero_grad()
for epoch in range(start_epoch, epochs):  # loop over epochs
    # ... one training epoch: forward pass, loss, backward pass, optimizer.step() ...
    scheduler.step()  # advance the learning-rate schedule once per epoch

    # print mAP after the first epoch and every print_interval epochs
    if epoch == start_epoch or epoch % print_interval == 0:
        results, maps, _ = val.run(data_dict,
                                   batch_size=batch_size,
                                   imgsz=imgsz_test,
                                   model=model,
                                   single_cls=single_cls,
                                   dataloader=val_loader,
                                   save_dir=save_dir,
                                   plots=plot_imgs and epoch == epochs - 1,
                                   conf_thres=conf_thres,
                                   iou_thres=iou_thres,
                                   save_json=save_json,
                                   verbose=verbose and not plot_imgs)

    # write epoch results
    with open(results_file, 'a') as f:
        f.write(s + '\n')  # 's' is the formatted metrics line built during the training step (elided above)
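The optimizer referenced above is not part of the snippet; assuming plain SGD, it can be built from the same hyp values (YOLOv5 itself additionally splits parameters into groups so that weight decay is only applied to weights, which is omitted here):

import torch
optimizer = torch.optim.SGD(model.parameters(), lr=hyp['lr0'],
                            momentum=hyp['momentum'],
                            weight_decay=hyp['weight_decay'], nesterov=True)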

2.2 CutMix

CutMix cuts a rectangular patch out of one image and pastes it into another, synthesizing new training samples; the patch location is chosen by drawing random coordinates, and the labels are mixed in proportion to the patch area.

Adding this kind of augmentation during training improves the model's robustness and makes better use of the data. A CutMix implementation looks like this:

#----------------------------------------------------------#
# in util.py
#----------------------------------------------------------#
import numpy as np
import torch

class CutMix(object):
    def __init__(self, cutmix_prob=1., cutmix_alpha=1., cutmix_beta=1.):
        self.cutmix_prob = cutmix_prob
        self.cutmix_alpha = cutmix_alpha
        self.cutmix_beta = cutmix_beta

    def forward(self, x, y):
        r = np.random.rand(1)
        if self.cutmix_prob > 0 and r < self.cutmix_prob:
            if self.cutmix_beta > 0:
                lam = np.random.beta(self.cutmix_alpha, self.cutmix_beta)  # mixing ratio
            else:
                lam = self.cutmix_alpha  # default cutmix: fixed ratio
            rand_index = torch.randperm(x.size()[0]).to(x.device)  # shuffled pairing of samples
            target_a = y
            target_b = y[rand_index]
            bbx1, bby1, bbx2, bby2 = rand_bbox(x.size(), lam)
            # paste the patch from the shuffled samples into the originals
            x[:, :, bbx1:bbx2, bby1:bby2] = x[rand_index, :, bbx1:bbx2, bby1:bby2]
            # recompute lam from the actual patch area
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (x.size()[-1] * x.size()[-2]))
            return x, target_a, target_b, lam
        else:
            return x, y, y, 1.

    def backward(self, criterion, pred, y, target_a, target_b, lam):
        # mix the two losses with the same ratio used to mix the images
        if self.cutmix_prob > 0 and lam < 1:
            return lam * criterion(pred, target_a) + (1 - lam) * criterion(pred, target_b)
        else:
            return criterion(pred, y)

def rand_bbox(size, lam):
    # sample a random box covering (1 - lam) of the image area
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)
    # uniform centre
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
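A possible way to wire the class into a training step (criterion, model, train_loader and optimizer are placeholders for whatever the surrounding training code defines):

cutmix = CutMix(cutmix_prob=0.5)
for x, y in train_loader:
    x, y_a, y_b, lam = cutmix.forward(x, y)
    pred = model(x)
    loss = cutmix.backward(criterion, pred, y, y_a, y_b, lam)  # mixes the two losses by lam
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()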

2.3 MixUp

MixUp blends pairs of images with a randomly drawn ratio during training, generating new training samples and improving model accuracy.

Like CutMix, MixUp is hooked into the training code: when either augmentation is enabled, the batch is simply preprocessed (mixed) before being fed to the model.
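As a minimal sketch of that preprocessing step for image batches of shape (N, C, H, W) (an illustration in the same style as the CutMix helper above, not the yolov5 implementation, which applies MixUp inside its dataloader):

import numpy as np
import torch

def mixup(x, y, alpha=32.0):
    lam = np.random.beta(alpha, alpha)                  # mixing ratio
    index = torch.randperm(x.size(0), device=x.device)  # shuffled pairing of samples
    mixed_x = lam * x + (1 - lam) * x[index]            # pixel-wise blend of the two images
    # the loss is mixed the same way: lam * L(pred, y) + (1 - lam) * L(pred, y[index])
    return mixed_x, y, y[index], lam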

2.4 Soft NMS

Soft NMS is a refinement of the standard NMS algorithm. NMS is needed in object detection because the same object is usually detected several times: standard NMS computes the IoU (overlap) between boxes and, when the IoU of two boxes exceeds the threshold, deletes the lower-scoring one outright. Soft NMS instead lowers the confidence score of the overlapping, lower-scoring box in proportion to its overlap rather than removing it. Because boxes survive with reduced scores, more candidates are retained, and heavily overlapping or small objects are less likely to be suppressed away.
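As a minimal sketch of the score-decay idea (a generic Gaussian-penalty variant for illustration, not the yolov5 code quoted below; the sigma parameter and the small IoU helper are assumptions):

import torch

def soft_nms(boxes, scores, sigma=0.5, score_thres=0.001):
    # boxes: (N, 4) in xyxy format, scores: (N,)
    boxes, scores = boxes.clone(), scores.clone()
    keep = []
    idxs = scores.argsort(descending=True)
    while idxs.numel() > 0:
        i = idxs[0]
        keep.append(int(i))
        if idxs.numel() == 1:
            break
        rest = idxs[1:]
        iou = box_iou_xyxy(boxes[i].unsqueeze(0), boxes[rest]).squeeze(0)
        scores[rest] *= torch.exp(-(iou ** 2) / sigma)    # decay scores instead of deleting boxes
        rest = rest[scores[rest] > score_thres]           # drop boxes whose score fell below the floor
        idxs = rest[scores[rest].argsort(descending=True)]
    return keep

def box_iou_xyxy(a, b):
    # pairwise IoU between two sets of xyxy boxes
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)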

For Soft NMS, the corresponding yolov5-style implementation is as follows:

import torch
import torch.nn as nn
# non_max_suppression and box_iou are assumed to be provided by the surrounding code base (yolov5 utils)

class NMS(nn.Module):
    def __init__(self, anchors, num_classes, conf_thres=0.1, nms_thres=0.6, stride=32):
        super(NMS, self).__init__()
        self.stride = stride
        self.anchors = torch.Tensor(anchors) / self.stride  # scale anchors to feature-map units
        self.num_anchors = self.anchors.shape[0]  # number of anchors
        self.num_classes = num_classes  # number of classes
        self.ignore_thres = conf_thres
        self.conf_thres = conf_thres
        self.nms_thresh = nms_thres
        self.label_smooth_eps = 0.15  # label smoothing
        self._build_extensions()

    def forward(self, p, img_size, augment=False):
        return non_max_suppression(p, self.conf_thres, self.nms_thresh, self.num_classes,
                                   self.anchors.to(p.device), img_size,
                                   augment=augment, label_smoothing=self.label_smooth_eps)

    def _build_extensions(self):
        # ----------- NMS ----------
        def xx(ltrb, conf, min_wh=None, multi_label=True, classes=None):  # NMS globally across classes
            default_anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192,
                               243, 459, 401]  # P3-P7
            conf = conf.sigmoid().squeeze() if conf is not None else None
            # if multi_label and conf is not None and conf.ndim > 1:
            #     pred, conf = pred[conf > self.conf_thres], conf[conf > self.conf_thres]
            #     if not len(conf):  # no boxes
            #         return []
            l, t, r, b = ltrb.unbind(1)
            areas = (r - l) * (b - t)
            if min_wh is not None:  # ignore boxes whose area is below min_wh pixels
                mask = areas >= min_wh
                l, t, r, b = l[mask], t[mask], r[mask], b[mask]
                if conf is not None:
                    conf = conf[mask]
                # if multi_label:
                #     pred = pred[mask]
                areas = areas[mask]
            if not len(areas):  # no boxes
                return []
            if conf is None:
                conf = torch.ones((len(l),), device=l.device)
            # compute sort order and IoU
            order = torch.argsort(conf, descending=True)  # order boxes by objectness, highest score first
            l = l[order]
            t = t[order]
            r = r[order]
            b = b[order]
            areas = areas[order]
            # if multi_label:
            #     pred = pred[order]
            if classes is not None:  # filter by class
                class_indices = (classes[order] == classes[:, None]).max(-1)[0]  # binary mask
                l = l[class_indices]
                t = t[class_indices]
                r = r[class_indices]
                b = b[class_indices]
                # if multi_label:
                #     pred = pred[class_indices]
                areas = areas[class_indices]
            boxes = torch.stack((l, t, r, b), dim=1)
            iou = box_iou(boxes, boxes).tril_(diagonal=-1)  # pairwise IoU, lower triangle only
            iou, indices = iou.sort(descending=True)