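1. YOLOv5 Hyperparameters
Hyperparameter settings matter a great deal in deep learning training. YOLOv5 ships with defaults, but tuning the hyperparameters for your own task can noticeably improve model accuracy.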
1.1 Model
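Model size refers to the size of the network. Three options are covered here: small, medium, and large (the official repository also provides nano and extra-large variants). The author sets the size by changing the number of blocks in yolov5/models/yolo.py; in the official repository the same choice is expressed through depth_multiple and width_multiple in the model YAML files, such as models/yolov5s.yaml. The following schematic code shows the three versions obtained by changing the number of blocks: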
# small
depth_multiple = 0.33  # 1.0 for the large model; small is 1/3 of large
width_multiple = 0.50
backbone = [  # the backbone must be written this way for the blocks to take effect
    nn.Conv2d(3, 32, 3, 1, 1),  # 0
    nn.Conv2d(32, 64, 3, 2, 1),  # 1-P1/2
    nn.CELU(),  # 2
    SPP(),  # 3
    nn.Conv2d(320, 64, 1, 1),  # 4
    nn.Conv2d(64, 64, 3, 1, 1),  # 5
    nn.CELU(),  # 6
    nn.Conv2d(64, 32, 1, 1),  # 7
]  # contains 8 blocks (no trailing comma, which would turn the list into a tuple)
# medium
depth_multiple = 0.67  # 1.0 for the large model; medium is 2/3 of large
width_multiple = 0.75
backbone = [  # the backbone must be written this way for the blocks to take effect
    nn.Conv2d(3, 32, 3, 1, 1),  # 0
    nn.Conv2d(32, 64, 3, 2, 1),  # 1-P1/2
    nn.CELU(),  # 2
    nn.Conv2d(64, 64, 3, 2, 1),  # 3-P2/4
    nn.CELU(),  # 4
    SPP(),  # 5
    nn.Conv2d(768, 128, 1, 1),  # 6
    nn.CELU(),  # 7
    nn.Conv2d(128, 64, 1, 1),  # 8
]  # contains 9 blocks (no trailing comma, which would turn the list into a tuple)
# large
depth_multiple = 1.0  # 1.0 for the large model
width_multiple = 1.0
backbone = [  # the backbone must be written this way for the blocks to take effect
    nn.Conv2d(3, 32, 3, 1, 1),  # 0
    nn.Conv2d(32, 64, 3, 2, 1),  # 1-P1/2
    nn.CELU(),  # 2
    nn.Conv2d(64, 64, 3, 2, 1),  # 3-P2/4
    nn.CELU(),  # 4
    nn.Conv2d(64, 128, 3, 2, 1),  # 5-P3/8
    nn.CELU(),  # 6
    SPP(),  # 7
    nn.Conv2d(1536, 256, 1, 1),  # 8
    nn.CELU(),  # 9
    nn.Conv2d(256, 128, 1, 1),  # 10
]  # contains 11 blocks (no trailing comma, which would turn the list into a tuple)
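Here, depth_multiple is the scaling factor for depth: it changes how many times the blocks in each stage are repeated, making the network deeper without changing the input resolution. width_multiple is the scaling factor for the number of channels: it widens every layer while the depth stays the same.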
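To make the two multipliers concrete, the sketch below follows the way the official model parser is understood to apply them: a stage's repeat count is multiplied by depth_multiple and rounded (never below 1), and a channel count is multiplied by width_multiple and rounded up to a multiple of 8. The function names scale_depth and scale_width are illustrative, and the rounding rule is an assumption taken from the official repository, not something stated in this article.

```python
import math


def scale_depth(repeats: int, depth_multiple: float) -> int:
    """Scale how many times a block is repeated in one stage."""
    if repeats <= 1:
        return repeats  # single blocks are left untouched
    return max(round(repeats * depth_multiple), 1)


def scale_width(channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# Compare the three sizes for a stage with 9 repeats and 1024 output channels.
sizes = {"small": (0.33, 0.50), "medium": (0.67, 0.75), "large": (1.0, 1.0)}
for name, (depth, width) in sizes.items():
    print(name, scale_depth(9, depth), scale_width(1024, width))
```

The same base architecture therefore yields all three models; only the two multipliers differ.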
1.2 Train
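Batch_size is the number of samples fed to the network in each iteration. For a first run it can be set to 2 and then adjusted as needed; if memory runs out during training, reduce the batch size so that training can proceed. Increasing the batch size may also improve accuracy in some cases.
Image_size is the image size used for training. The author typically trains YOLOv5 at 416 × 416 (the official train.py defaults to 640); if you hit out-of-memory (OOM) errors, reduce the image size.
Epochs is the number of full passes over the training set. The YOLOv5 default cited here is 300, although the right value depends on the dataset size and the model size. Training for fewer epochs, or starting from pretrained weights, shortens training time.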
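The three settings interact, and a quick calculation helps when planning a run. The sketch below is a minimal illustration: it rounds an image size up to a multiple of the model stride (32 for the standard YOLOv5 models) and counts the optimizer iterations implied by a dataset size, batch size, and epoch count. The helper names and the example numbers are hypothetical.

```python
import math


def round_up_to_stride(img_size: int, stride: int = 32) -> int:
    """Round an image size up to the nearest multiple of the model stride."""
    return max(math.ceil(img_size / stride) * stride, stride)


def total_iterations(num_images: int, batch_size: int, epochs: int) -> int:
    """Number of batches processed over a whole run (last partial batch kept)."""
    return math.ceil(num_images / batch_size) * epochs


# Hypothetical run: 1000 training images, batch size 16, 300 epochs.
print(round_up_to_stride(416))          # already a multiple of 32
print(round_up_to_stride(500))          # rounded up to the next multiple
print(total_iterations(1000, 16, 300))  # batches over the full run
```

Halving the batch size to fit in memory doubles the number of iterations per epoch, which is worth keeping in mind when comparing runs.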
1.3 Test
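iou_threshold is the IoU threshold used by non-maximum suppression (NMS): when the IoU of two boxes exceeds it, the boxes are treated as duplicates of the same object and the lower-scoring one is removed. The default of 0.45 is usually fine.
Confidence_threshold is the confidence threshold. The larger it is, the fewer boxes are output, and precision tends to rise accordingly. At the same time, recall drops, because correct detections with low confidence are discarded as well. The default of 0.25 is usually fine.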
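The effect of the two thresholds can be seen in a small pure-Python sketch: boxes below the confidence threshold are dropped first, then greedy NMS keeps a box only if it does not overlap an already kept, higher-scoring box by more than the IoU threshold. This is a simplified, class-agnostic illustration, not the repository's implementation; the function names and the sample boxes are made up.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return the indices of the boxes that survive both thresholds."""
    # Step 1: confidence filtering.
    candidates = [i for i, s in enumerate(scores) if s > conf_thres]
    # Step 2: greedy NMS, highest score first.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_and_nms(boxes, scores))                   # default thresholds
print(filter_and_nms(boxes, scores, conf_thres=0.05))  # lower confidence threshold
```

Lowering conf_thres lets the weak fourth box through, while the second box is removed in both cases because it overlaps the first one too heavily.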
That covers the commonly used YOLOv5 hyperparameters.
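2. Optimization Strategies
Building on the hyperparameters introduced above, the following optimization strategies can help reach better model accuracy.
2.1 Learning Rate
The learning rate is the step size of each gradient-descent update during training. It is a small decimal value; a common default initial value is 0.001, and it can be adjusted during training within roughly 0.0001 to 0.1. The example below starts from lr0 = 0.01.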
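Use a learning rate scheduler so that different training stages run at different learning rates (the official repository builds its scheduler on torch.optim.lr_scheduler.LambdaLR). The author's code is as follows:
hyp['lr0'] = 0.01  # initial learning rate (first training run, 60 epochs)
hyp['lrf'] = 1e-4  # final learning rate (the rate decays down to lrf)
hyp['momentum'] = 0.937  # SGD momentum
hyp['weight_decay'] = 0.0005  # optimizer weight decay; too large a value causes under-fitting
break_epochs = 0  # epochs of break (early stopping): stop training early
warmup_epochs = min(round(float(hyp['warmup_epochs']) * epochs), max(3, round(0.05 * epochs)))  # warm-up length, capped at max(3, 5% of the epochs); the lr is ramped up during this phase
# GPTLR is the scheduler class named by the author; it is not part of PyTorch,
# so it has to be defined or imported elsewhere.
scheduler = GPTLR(optimizer, warmup_epochs=warmup_epochs, final_lr=hyp['lrf'], epochs=epochs,
                  n_batches=len(dataloader))  # no trailing comma, which would wrap the scheduler in a tuple
optimizer.zero_grad()
for epoch in range(start_epoch, epochs):  # loop over the epochs and train each one
    # Print mAP on the first epoch and then every print_interval epochs
    if epoch == start_epoch or epoch % print_interval == 0:
        results, maps = val.run(data_dict, batch_size=batch_size,
                                imgsz=imgsz_test,
                                model=model,
                                single_cls=single_cls,
                                dataloader=val_loader,
                                save_dir=save_dir,
                                plot_imgs=plot_imgs and epoch == epochs - 1,
                                dataset=test_set,
                                conf=conf_thres,
                                iou=iou_thres,
                                save_json=save_json,
                                verbose=verbose and not plot_imgs)
    # Write epoch results (s is the result string built during the epoch)
    with open(results_file, 'a') as f:
        f.write(s + '\n')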
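The shape of such a schedule can be sketched without any framework. The function below is a minimal illustration, assuming a linear warm-up followed by a linear or cosine decay from lr0 down to lr0 * lrf. Note that lrf is treated here as a fraction of lr0, which is the convention in the official hyperparameter files, so lr0 = 0.01 with lrf = 0.01 ends at 1e-4. The name lr_at_epoch and its parameters are hypothetical.

```python
import math


def lr_at_epoch(epoch, epochs, lr0=0.01, lrf=0.01, warmup_epochs=3, cosine=False):
    """Learning rate at a given epoch: warm-up ramp times a decay factor."""
    if cosine:
        # Cosine decay of the multiplier from 1 to lrf.
        factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1
    else:
        # Linear decay of the multiplier from 1 to lrf.
        factor = (1 - epoch / epochs) * (1.0 - lrf) + lrf
    # Linear warm-up over the first warmup_epochs epochs, then 1.0.
    warmup = min(1.0, (epoch + 1) / warmup_epochs) if warmup_epochs > 0 else 1.0
    return lr0 * factor * warmup


for e in (0, 1, 2, 3, 50, 100):
    print(e, round(lr_at_epoch(e, 100), 6), round(lr_at_epoch(e, 100, cosine=True), 6))
```

To use a function like this with torch.optim.lr_scheduler.LambdaLR, return the multiplier (factor * warmup) and not the absolute rate, since LambdaLR scales the optimizer's initial learning rate.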
2.2 CutMix
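CutMix cuts a rectangular region out of one image and pastes it into another to synthesize new training data, with the labels mixed in proportion to the pasted area. The position of the region is chosen by generating random coordinates.
Adding this augmentation during training can improve the model's robustness and may also make learning more efficient. The CutMix code is as follows: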
#----------------------------------------------------------#
# in util.py
#----------------------------------------------------------#
import numpy as np
import torch

class CutMix(object):
    def __init__(self, cutmix_prob=1., cutmix_alpha=1., cutmix_beta=1.):
        self.cutmix_prob = cutmix_prob
        self.cutmix_alpha = cutmix_alpha
        self.cutmix_beta = cutmix_beta

    def forward(self, x, y):
        # x: image batch of shape (N, C, H, W); y: one target per image
        r = np.random.rand()
        if self.cutmix_prob > 0 and r < self.cutmix_prob:
            if self.cutmix_beta > 0:
                lam = np.random.beta(self.cutmix_alpha, self.cutmix_beta)
            else:
                lam = self.cutmix_alpha  # default cutmix
            # shuffle on the same device as x (the original hard-coded .cuda())
            rand_index = torch.randperm(x.size()[0], device=x.device)
            target_a = y
            target_b = y[rand_index]
            bbx1, bby1, bbx2, bby2 = rand_bbox(x.size(), lam)
            x[:, :, bbx1:bbx2, bby1:bby2] = x[rand_index, :, bbx1:bbx2, bby1:bby2]
            # recompute lam as the fraction of the image left unchanged
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (x.size()[-1] * x.size()[-2]))
            return x, target_a, target_b, lam
        else:
            return x, y, y, 1.

    def backward(self, criterion, pred, y, target_a, target_b, lam):
        # mixed loss, weighted by the area ratio (per-image targets, as in classification)
        if self.cutmix_prob > 0 and lam < 1:
            return lam * criterion(pred, target_a) + (1 - lam) * criterion(pred, target_b)
        else:
            return criterion(pred, y)

def rand_bbox(size, lam):
    # CutMix box sampling: W and H name tensor dims 2 and 3, matching the slicing in forward()
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed from recent NumPy releases
    cut_h = int(H * cut_rat)
    # uniformly sample the box center
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
2.3 MixUp
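MixUp blends different images at a given ratio during training to generate new training data, which can improve model accuracy.
MixUp is wired in the same way as CutMix: the training code checks whether MixUp or CutMix is enabled and, if so, applies it to the data as a preprocessing step.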
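For detection data, one simple way to realize MixUp is to blend the two images pixel by pixel and keep the boxes of both. The NumPy sketch below assumes that formulation, with the blend ratio drawn from a Beta(alpha, alpha) distribution; the function name mixup, the alpha value, and the label layout (one row per box: class, x, y, w, h) are assumptions made for illustration.

```python
import numpy as np


def mixup(img1, labels1, img2, labels2, alpha=32.0, rng=None):
    """Blend two same-sized images and concatenate their box labels."""
    rng = np.random.default_rng() if rng is None else rng
    ratio = rng.beta(alpha, alpha)  # close to 0.5 when alpha is large
    mixed = img1.astype(np.float32) * ratio + img2.astype(np.float32) * (1 - ratio)
    labels = np.concatenate((labels1, labels2), axis=0)
    return mixed.astype(np.uint8), labels, ratio


# Two dummy 4x4 RGB images with one and two boxes respectively.
img_a = np.zeros((4, 4, 3), dtype=np.uint8)
img_b = np.full((4, 4, 3), 200, dtype=np.uint8)
labels_a = np.array([[0, 0.5, 0.5, 0.2, 0.2]])
labels_b = np.array([[1, 0.3, 0.3, 0.1, 0.1], [2, 0.7, 0.7, 0.2, 0.3]])

mixed, labels, ratio = mixup(img_a, labels_a, img_b, labels_b, rng=np.random.default_rng(0))
print(mixed.shape, labels.shape)
```

Because both sets of boxes are kept, the network sees every object from both images, only at reduced contrast.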
2.4 Soft NMS
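Soft NMS is a refinement of the NMS algorithm. NMS is standard in object detection because the same object is often detected several times. In plain NMS, the IoU (degree of overlap) between two boxes is computed, and if it is high the lower-scoring box is deleted. Soft NMS instead lowers the score of the overlapping box according to how much it overlaps the higher-scoring one, and discards a box only when its score falls below a small threshold. More boxes therefore survive, which can help preserve detections of heavily overlapping objects, including small ones.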
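The score-decay idea can be sketched in a few lines of pure Python. The version below assumes the Gaussian penalty, where each remaining score is multiplied by exp(-IoU² / sigma) after a box is selected. The function name soft_nms, the default sigma and score_thres values, and the sample boxes are illustrative choices, not taken from the article's code.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thres=0.001):
    """Return (index, decayed score) pairs in the order the boxes are selected."""
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        kept.append((best, scores[best]))
        remaining.remove(best)
        # Decay the scores of the other boxes instead of deleting them.
        for i in remaining:
            scores[i] *= math.exp(-(iou(boxes[best], boxes[i]) ** 2) / sigma)
        remaining = [i for i in remaining if scores[i] > score_thres]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for index, score in soft_nms(boxes, scores):
    print(index, round(score, 4))
```

Here the second box overlaps the first heavily, so its score is reduced and it is ranked last, but it is still returned, whereas plain NMS with a threshold of 0.45 would have removed it.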
For Soft NMS, the author's implementation for yolov5 is as follows:
import torch
import torch.nn as nn
from torchvision.ops import box_iou

class NMS(nn.Module):
    def __init__(self, anchors, num_classes, conf_thres=0.1, nms_thres=0.6, stride=1.0):
        super(NMS, self).__init__()
        self.stride = stride  # assumption: the original read self.stride without defining it
        self.anchors = torch.Tensor(anchors) / self.stride  # scale anchors by the stride
        self.num_anchors = self.anchors.shape[0]  # number of anchors
        self.num_classes = num_classes  # number of classes
        self.conf_thres = conf_thres  # read by forward(); not set in the original
        self.ignore_thres = conf_thres
        self.obj_thresh = conf_thres
        self.nms_thresh = nms_thres
        self.label_smooth_eps = 0.15  # label smoothing
        self._build_extensions()

    def forward(self, p, img_size, augment=False):
        # non_max_suppression is assumed to be defined elsewhere with this signature
        return non_max_suppression(p, self.conf_thres, self.nms_thresh, self.num_classes, self.anchors.to(p.device),
                                   img_size, augment=augment, label_smoothing=self.label_smooth_eps)

    def _build_extensions(self):
        # ----------- NMS ----------
        def xx(ltrb, conf, min_wh=None, multi_label=True, classes=None):  # NMS globally across classes
            default_anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192,
                               243, 459, 401]  # P3-P7
            conf = conf.sigmoid().squeeze() if conf is not None else None
            # if multi_label and conf is not None and conf.ndim > 1:
            #     pred, conf = pred[conf > self.conf_thres], conf[conf > self.conf_thres]
            #     if not len(conf):  # no boxes
            #         return []
            l, t, r, b = ltrb.unbind(1)
            areas = (r - l) * (b - t)
            if min_wh is not None:  # drop boxes whose area is below min_wh pixels
                mask = areas >= min_wh
                l, t, r, b = l[mask], t[mask], r[mask], b[mask]
                if conf is not None:
                    conf = conf[mask]
                # if multi_label:
                #     pred = pred[mask]
                areas = areas[mask]
            if not len(areas):  # no boxes
                return []
            if conf is None:
                conf = torch.ones((len(l),), device=l.device)
            # compute sort order and IoU
            order = torch.argsort(conf, descending=True)  # highest objectness first
            conf = conf[order]  # keep scores aligned with the reordered boxes
            l = l[order]
            t = t[order]
            r = r[order]
            b = b[order]
            areas = areas[order]
            # if multi_label:
            #     pred = pred[order]
            if classes is not None:  # filter by class
                class_indices = (classes[order] == classes[:, None]).max(-1)[0]  # binary mask
                l = l[class_indices]
                t = t[class_indices]
                r = r[class_indices]
                b = b[class_indices]
                # if multi_label:
                #     pred = pred[class_indices]
                areas = areas[class_indices]
            boxes = torch.stack((l, t, r, b), dim=1)  # box_iou expects (N, 4) tensors
            # lower triangle: IoU of each box with the higher-ranked boxes before it
            iou = box_iou(boxes, boxes).tril_(diagonal=-1)
            iou, indices = iou.sort(descending=True)