Changing default padding

Hi, can someone confirm if I’m correctly modifying the YOLO convolution module? I’m editing the ultralytics/nn/modules/conv.py file directly (inside the modules folder) to change how Conv blocks behave. Is that the proper way to apply custom changes to YOLO layers?

ChatGPT recommended changing the default code:

def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
    """
    Initialize Conv layer with given parameters.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        k (int): Kernel size.
        s (int): Stride.
        p (int, optional): Padding.
        g (int): Groups.
        d (int): Dilation.
        act (bool | nn.Module): Activation function.
    """
    super().__init__()
    self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
    self.bn = nn.BatchNorm2d(c2)
    self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

def forward(self, x):
    """
    Apply convolution, batch normalization and activation to input tensor.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Output tensor.
    """
    return self.act(self.bn(self.conv(x)))

def forward_fuse(self, x):
    """
    Apply convolution and activation without batch normalization.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Output tensor.
    """
    return self.act(self.conv(x))

to:

def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
    super().__init__()
    # amount of padding we need to add manually
    self.pad = autopad(k, p, d)  # int or list[int] for asymmetric kernels
    # set Conv2d padding=0; we will pad explicitly in forward()
    self.conv = nn.Conv2d(c1, c2, k, s, padding=0, groups=g, dilation=d, bias=False)
    self.bn = nn.BatchNorm2d(c2)
    self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

def _reflect_pad(self, x):
    # Needs `import torch.nn.functional as F` at module level if conv.py does not already import it.
    # autopad returns an int for int kernels, or a per-dimension list [pad_h, pad_w] for (kh, kw) kernels.
    if isinstance(self.pad, int):
        left = right = top = bottom = self.pad
    elif len(self.pad) == 2:
        # These are already per-side amounts (k // 2), so they must not be halved again.
        # F.pad expects the last dimension (width) first: [left, right, top, bottom].
        top = bottom = self.pad[0]
        left = right = self.pad[1]
    else:
        # Assumes an explicit [left, right, top, bottom] list was passed in as p.
        left, right, top, bottom = self.pad
    if any(v > 0 for v in (left, right, top, bottom)):
        x = F.pad(x, [left, right, top, bottom], mode="reflect")
    return x

def forward(self, x):
    x = self._reflect_pad(x)
    return self.act(self.bn(self.conv(x)))

def forward_fuse(self, x):
    x = self._reflect_pad(x)
    return self.act(self.conv(x))

My main objective is to improve model performance on border objects. Is this the correct way to change the padding and do you think my strategy could yield good results?

Thank you

Before modifying the convolution padding, I would suspect there could be other, simpler tactics that you could test first. What have you tried so far?

I am doing a master's thesis, so I am basically exploring as many options as I can. It is not for practical purposes, just for research, but if it returns interesting results that would be great haha.

So far I have mostly tried zone-specific data augmentations.

In the future I will also be trying to change the loss function to deal with certain bboxes differently and I will for sure ask many questions for that too.

Any help is very much appreciated haha :)

So I will be upfront and say I don’t know a lot about modifying these parameters. Since that’s the case, I threw your changes into an LLM (a local one) and it seemed to think that the changes were reasonable, but that it would add some computational overhead. It suspects that it could help improve feature representation by better preserving texture or edge information than zero-padding.
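If it helps as a sanity check, PyTorch's `nn.Conv2d` also accepts `padding_mode="reflect"`, which should give the same result as padding manually with `F.pad` before a `padding=0` convolution. Below is a minimal sketch of that comparison, assuming an integer padding of 1 and a 3x3 kernel; the channel counts and input size are made up for illustration and are not taken from any YOLO config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Built-in option: Conv2d mirrors the border pixels itself before convolving.
conv_builtin = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1,
                         padding_mode="reflect", bias=False)

# Manual option, as in the proposed change: padding=0 in the conv, F.pad beforehand.
conv_manual = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=0, bias=False)
conv_manual.load_state_dict(conv_builtin.state_dict())  # use identical weights

x = torch.randn(1, 3, 16, 16)  # illustrative input, not a real image
with torch.no_grad():
    y_builtin = conv_builtin(x)
    y_manual = conv_manual(F.pad(x, [1, 1, 1, 1], mode="reflect"))

same_shape = y_builtin.shape == y_manual.shape
max_diff = (y_builtin - y_manual).abs().max().item()
print(same_shape, max_diff)
```

One thing to keep in mind with either variant: reflect padding requires the pad width to be smaller than the corresponding input dimension, so very small feature maps can raise an error that zero padding would not.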

That said, I would probably start by just increasing the image resolution for inference. If you're using the default imgsz=640, you could run experiments that increase the resolution to find the best setting for your use case. You could also try tiled inference with SAHI, as it could help better preserve edge information. Another thing you could try is copying the edge pixels of the input image during a pre-processing step and keeping the default zero padding. It's not the most ideal solution, but it's something that you could test if you wanted.

Thank you so much for all the information! I will definitely look into every possibility you have suggested. Can you just please explain a bit deeper the strategy of “copying the edge pixels of the input image during a pre-processing step”

You can use the OpenCV cv2.copyMakeBorder function with the cv2.BORDER_REPLICATE border type. This adds the requested number of extra rows/columns on each side of the image, duplicating the nearest edge pixel values.

The example from the OpenCV docs:

# Input image pixels:
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
 
 
# Border type: cv2.BORDER_REPLICATE
[[ 0  0  0  1  2  3  4  5  5  5]
 [ 0  0  0  1  2  3  4  5  5  5]
 [ 0  0  0  1  2  3  4  5  5  5]
 [ 6  6  6  7  8  9 10 11 11 11]
 [12 12 12 13 14 15 16 17 17 17]
 [18 18 18 19 20 21 22 23 23 23]
 [18 18 18 19 20 21 22 23 23 23]
 [18 18 18 19 20 21 22 23 23 23]]