Lightweight Portrait Segmentation Models: SINet and ExtremeC3Net

2025-07-31
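This article introduces two lightweight portrait segmentation models, SINet and ExtremeC3Net, which have 0.087 M and 0.038 M parameters and cost 0.064 GFLOPs and 0.128 GFLOPs respectively. Both can be called quickly through PaddleHub or deployed for inference with Paddle Inference, and the Paddle 2.0 implementation code is included.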

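Introduction

  • As compute and algorithms keep improving, the models we can train keep growing, and their accuracy keeps rising with them
  • Very large models, however, are inconvenient to deploy
  • This article introduces two lightweight portrait segmentation models: SINet and ExtremeC3Net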

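Project notes

  • The models in this project were converted from the open-source project ext_portrait_segmentation
  • Thanks to that project for its open-source code and models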

Model specifications

  • The detailed specifications are listed in the table below:
    Model          Params     FLOPs
    SINet          0.087 M    0.064 G
    ExtremeC3Net   0.038 M    0.128 G
  • Both models are very lightweight
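To put the parameter counts in the table above in perspective, the short sketch below converts them into an approximate weight-storage size. It assumes the weights are stored as 32-bit floats (4 bytes each), which is an assumption rather than something stated by the original project; the helper name `weights_size_kb` is hypothetical.

```python
# Parameter counts (in millions) taken from the table above.
PARAMS_M = {"SINet": 0.087, "ExtremeC3Net": 0.038}


def weights_size_kb(params_millions, bytes_per_param=4):
    """Approximate weight storage in KB, assuming float32 (4 bytes per parameter)."""
    return params_millions * 1_000_000 * bytes_per_param / 1024


for name, params in PARAMS_M.items():
    print(f"{name}: ~{weights_size_kb(params):.0f} KB of float32 weights")
```

Under that assumption, both models need well under 1 MB for their weights.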

Results

  • ExtremeC3Net:

[Figure: ExtremeC3Net portrait segmentation result]

  • SINet:

[Figure: SINet portrait segmentation result]

Quick start

  • As usual, both models have been packaged as PaddleHub Modules
  • They can be called quickly through PaddleHub
In [1]
!pip install paddlehub==2.0.0b2
    In [9]
# Import PaddleHub
import paddlehub as hub

# Load the model
# Available models: SINet_Portrait_Segmentation and ExtremeC3_Portrait_Segmentation
model = hub.Module(directory='SINet_Portrait_Segmentation')

# Run portrait segmentation
outputs = model.Segmentation(images=None,
                       paths=['00001.jpg'],
                       batch_size=1,
                       output_dir='output',
                       visualization=True)

# Display the original image, the mask, and the result side by side
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = np.concatenate([
    cv2.imread('00001.jpg'),
    cv2.cvtColor(outputs[0]['mask'], cv2.COLOR_GRAY2BGR),
    outputs[0]['result']
], 1)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
       
<Figure size 432x288 with 1 Axes>
                In [3]
# Import PaddleHub
import paddlehub as hub

# Load the model
# Available models: SINet_Portrait_Segmentation and ExtremeC3_Portrait_Segmentation
model = hub.Module(directory='ExtremeC3_Portrait_Segmentation')

# Run portrait segmentation
outputs = model.Segmentation(images=None,
                       paths=['00001.jpg'],
                       batch_size=1,
                       output_dir='output',
                       visualization=True)

# Display the original image, the mask, and the result side by side
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = np.concatenate([
    cv2.imread('00001.jpg'),
    cv2.cvtColor(outputs[0]['mask'], cv2.COLOR_GRAY2BGR),
    outputs[0]['result']
], 1)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
       
<Figure size 432x288 with 1 Axes>
               

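Inference deployment

  • Besides calling the models through PaddleHub, you can also deploy the exported inference models directly
  • The following briefly shows how to run inference with Paddle Inference
  • For more details, see my other project: PaddleQuickInference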
In [4]
# Install PaddleQuickInference
!pip install ppqi -i https://pypi.python.org/simple
    In [5]
from ppqi import InferenceModel
from SINet.processor import preprocess, postprocess

# Configuration
configs = {
    'img_path': '00001.jpg',
    'save_dir': 'save_img',
    'model_name': 'SINet_Portrait_Segmentation',
    'use_gpu': False,
    'use_mkldnn': False
}

# Step 1: preprocess the input data
input_data = preprocess(configs['img_path'])

# Step 2: load the model
model = InferenceModel(
    modelpath='SINet/'+configs['model_name'],
    use_gpu=configs['use_gpu'],
    use_mkldnn=configs['use_mkldnn']
)
model.eval()

# Step 3: run inference
output = model(input_data)

# Step 4: postprocess the results
postprocess(
    output,
    configs['save_dir'],
    configs['img_path'],
    configs['model_name']
)
    In [6]
from ppqi import InferenceModel
from ExtremeC3Net.processor import preprocess, postprocess

# Configuration
configs = {
    'img_path': '00001.jpg',
    'save_dir': 'save_img',
    'model_name': 'ExtremeC3_Portrait_Segmentation',
    'use_gpu': False,
    'use_mkldnn': False
}

# Step 1: preprocess the input data
input_data = preprocess(configs['img_path'])

# Step 2: load the model
model = InferenceModel(
    modelpath='ExtremeC3Net/'+configs['model_name'],
    use_gpu=configs['use_gpu'],
    use_mkldnn=configs['use_mkldnn']
)
model.eval()

# Step 3: run inference
output = model(input_data)

# Step 4: postprocess the results
postprocess(
    output,
    configs['save_dir'],
    configs['img_path'],
    configs['model_name']
)
   

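Model implementation

  • Next, let's look at how to implement these two models in Paddle 2.0
  • The code closely follows the PyTorch code of the original project
  • Some operators were replaced to account for differences between the frameworks
  • See the code below for details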
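One framework difference worth keeping in mind when porting is the meaning of the BatchNorm `momentum` argument. As far as the documented conventions go, PyTorch updates running statistics as `(1 - momentum) * running + momentum * batch`, whereas Paddle uses `momentum * running + (1 - momentum) * batch`, so a PyTorch momentum of 0.1 would correspond to a Paddle momentum of 0.9. The plain-Python sketch below only illustrates the two update rules; the function names are hypothetical and the code is not taken from either framework.

```python
def pytorch_style_update(running, batch, momentum):
    """PyTorch convention: momentum weights the NEW batch statistic."""
    return (1.0 - momentum) * running + momentum * batch


def paddle_style_update(running, batch, momentum):
    """Paddle convention: momentum weights the OLD running statistic."""
    return momentum * running + (1.0 - momentum) * batch


running_mean, batch_mean = 0.0, 1.0

# The same numeric momentum gives different updates under the two conventions.
print(pytorch_style_update(running_mean, batch_mean, 0.1))
print(paddle_style_update(running_mean, batch_mean, 0.1))

# A PyTorch momentum m matches a Paddle momentum of 1 - m.
print(paddle_style_update(running_mean, batch_mean, 1.0 - 0.1))
```

This matters only for training or fine-tuning: at inference time the stored running statistics are used as-is, so the converted pretrained weights are unaffected.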
In [7]
# model/sinet.py
'''
ExtPortraitSeg
Copyright (c) 2019-present NAVER Corp.
MIT license
'''
import paddle
import paddle.nn as nn

BN_moment = 0.1


def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.shape

    channels_per_group = num_channels // groups

    # reshape
    x = x.reshape([batchsize, groups,
               channels_per_group, height, width])

    # transpose
    x = paddle.transpose(x, [0, 2, 1, 3, 4])

    # flatten
    x = x.reshape([batchsize, groups*channels_per_group, height, width])

    return x


class CBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)

        return output


class separableCBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Sequential(
            nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False),
            nn.Conv2D(nIn, nOut,  kernel_size=1, stride=1, bias_attr=False),
        )
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum= BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)

        return output


class SqueezeBlock(nn.Layer):
    def __init__(self, exp_size, divide=4.0):
        super(SqueezeBlock, self).__init__()

        if divide > 1:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, int(exp_size / divide)),
                nn.PReLU(int(exp_size / divide)),
                nn.Linear(int(exp_size / divide), exp_size),
                nn.PReLU(exp_size),
            )
        else:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, exp_size),
                nn.PReLU(exp_size)
            )

    def forward(self, x):
        batch, channels, height, width = x.shape
        out = paddle.nn.functional.avg_pool2d(x, kernel_size=[height, width]).reshape([batch, channels])

        out = self.dense(out)
        out = out.reshape([batch, channels, 1, 1])

        return paddle.multiply(out, x)


class SEseparableCBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1, divide=2.0):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Sequential(
            nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False),
            SqueezeBlock(nIn, divide=divide),
            nn.Conv2D(nIn, nOut,  kernel_size=1, stride=1, bias_attr=False),
        )

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum= BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)

        return output


class BR(nn.Layer):
    '''
        This class groups the batch normalization and PReLU activation
    '''

    def __init__(self, nOut):
        '''
        :param nOut: output feature maps
        '''
        super().__init__()
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum= BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: normalized and thresholded feature map
        '''
        output = self.bn(input)
        output = self.act(output)

        return output


class CB(nn.Layer):
    '''
       This class groups the convolution and batch normalization
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum= BN_moment)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)

        return output


class C(nn.Layer):
    '''
    This class is for a convolutional layer.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1,group=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride,
                              padding=(padding, padding), bias_attr=False, groups=group)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)

        return output


class S2block(nn.Layer):
    '''
    This class defines the dilated convolution.
    '''

    def __init__(self, nIn, nOut, config):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        :param d: optional dilation rate
        '''
        super().__init__()
        kSize = config[0]
        avgsize = config[1]

        self.resolution_down = False
        if avgsize > 1:
            self.resolution_down = True
            self.down_res = nn.AvgPool2D(avgsize, avgsize)
            self.up_res = nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=avgsize)
            self.avgsize = avgsize

        padding = int((kSize - 1) / 2 )
        self.conv = nn.Sequential(
                        nn.Conv2D(nIn, nIn, kernel_size=(kSize, kSize), stride=1,
                                  padding=(padding, padding), groups=nIn, bias_attr=False),
                        nn.BatchNorm2D(nIn, epsilon=1e-03, momentum=BN_moment))

        self.act_conv1x1 = nn.Sequential(
            nn.PReLU(nIn),
            nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False),
        )

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        if self.resolution_down:
            input = self.down_res(input)
        output = self.conv(input)
        output = self.act_conv1x1(output)
        if self.resolution_down:
            output = self.up_res(output)

        return self.bn(output)


class S2module(nn.Layer):
    '''
    This class defines the ESP block, which is based on the following principle
        Reduce ---> Split ---> Transform --> Merge
    '''

    def __init__(self, nIn, nOut, add=True, config= [[3,1],[5,1]]):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param add: if true, add a residual connection through identity operation. You can use projection too as
                in ResNet paper, but we avoid to use it if the dimensions are not the same because we do not want to
                increase the module complexity
        '''
        super().__init__()

        group_n = len(config)
        n = int(nOut / group_n)
        n1 = nOut - group_n * n

        self.c1 = C(nIn, n, 1, 1, group=group_n)

        for i in range(group_n):
            var_name = 'd{}'.format(i + 1)
            if i == 0:
                self.__dict__["_sub_layers"][var_name] = S2block(n, n + n1, config[i])
            else:
                self.__dict__["_sub_layers"][var_name] = S2block(n, n,  config[i])

        self.BR = BR(nOut)
        self.add = add
        self.group_n = group_n

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        # reduce
        output1 = self.c1(input)
        output1= channel_shuffle(output1, self.group_n)

        for i in range(self.group_n):
            var_name = 'd{}'.format(i + 1)
            result_d = self.__dict__["_sub_layers"][var_name](output1)
            if i == 0:
                combine = result_d
            else:
                combine = paddle.concat([combine, result_d], 1)

        # if residual version
        if self.add:
            combine = paddle.add(input, combine)
        output = self.BR(combine)

        return output


class InputProjectionA(nn.Layer):
    '''
    This class projects the input image to the same spatial dimensions as the feature map.
    For example, if the input image is 512 x512 x3 and spatial dimensions of feature map size are 56x56xF, then
    this class will generate an output of 56x56x3
    '''

    def __init__(self, samplingTimes):
        '''
        :param samplingTimes: The rate at which you want to down-sample the image
        '''
        super().__init__()
        self.pool = nn.LayerList()
        for i in range(0, samplingTimes):
            self.pool.append(nn.AvgPool2D(2, stride=2))

    def forward(self, input):
        '''
        :param input: Input RGB Image
        :return: down-sampled image (pyramid-based approach)
        '''
        for pool in self.pool:
            input = pool(input)

        return input


class SINet_Encoder(nn.Layer):

    def __init__(self, config,classes=20, p=5, q=3,  chnn=1.0):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()
        dim1 = 16
        dim2 = 48 + 4 * (chnn - 1)
        dim3 = 96 + 4 * (chnn - 1)

        self.level1 = CBR(3, 12, 3, 2)

        self.level2_0 = SEseparableCBR(12,dim1, 3,2, divide=1)

        self.level2 = nn.LayerList()
        for i in range(0, p):
            if i == 0:
                self.level2.append(S2module(dim1, dim2, config=config[i], add=False))
            else:
                self.level2.append(S2module(dim2, dim2,config=config[i]))
        self.BR2 = BR(dim2+dim1)

        self.level3_0 =SEseparableCBR(dim2+dim1,dim2, 3,2, divide=2)
        self.level3 = nn.LayerList()
        for i in range(0, q):
            if i == 0:
                self.level3.append(S2module(dim2, dim3, config=config[2 + i], add=False))
            else:
                self.level3.append(S2module(dim3, dim3,config=config[2+i]))
        self.BR3 = BR(dim3+dim2)

        self.classifier = C(dim3+dim2, classes, 1, 1)

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output1 = self.level1(input) #8h 8w


        output2_0 = self.level2_0(output1)  # 4h 4w

        for i, layer in enumerate(self.level2):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2) # 2h 2w


        output3_0 = self.level3_0(self.BR2(paddle.concat([output2_0, output2],1)))  # h w

        for i, layer in enumerate(self.level3):
            if i == 0:
                output3 = layer(output3_0)
            else:
                output3 = layer(output3)

        output3_cat = self.BR3(paddle.concat([output3_0, output3], 1))

        classifier = self.classifier(output3_cat)

        return classifier


class SINet(nn.Layer):

    def __init__(self,config, classes=20, p=2, q=3, chnn=1.0):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()
        dim2 = 48 + 4 * (chnn - 1)

        self.encoder = SINet_Encoder(config, classes, p, q, chnn)

        self.up = nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2)
        self.bn_3 = nn.BatchNorm2D(classes, epsilon=1e-03)

        self.level2_C = CBR(dim2, classes, 1, 1)

        self.bn_2 = nn.BatchNorm2D(classes, epsilon=1e-03)

        self.classifier = nn.Sequential(
        nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2),
        nn.Conv2D(classes, classes, 3, 1, 1, bias_attr=False))

    def forward(self, input):
        '''
        :param input: RGB image
        :return: transformed feature map
        '''
        output1 = self.encoder.level1(input)  # 8h 8w
        output2_0 = self.encoder.level2_0(output1)  # 4h 4w

        for i, layer in enumerate(self.encoder.level2):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)  # 2h 2w

        output3_0 = self.encoder.level3_0(self.encoder.BR2(paddle.concat([output2_0, output2], 1)))  # h w

        for i, layer in enumerate(self.encoder.level3):
            if i == 0:
                output3 = layer(output3_0)
            else:
                output3 = layer(output3)

        output3_cat = self.encoder.BR3(paddle.concat([output3_0, output3], 1))
        Enc_final = self.encoder.classifier(output3_cat) #1/8

        Dnc_stage1 = self.bn_3(self.up(Enc_final))  # 1/4
        stage1_confidence = paddle.max(nn.functional.softmax(Dnc_stage1, 1), axis=1)

        b, c, h, w = Dnc_stage1.shape
        stage1_gate = (1-stage1_confidence).unsqueeze(1).expand([b, c, h, w])

        Dnc_stage2_0 = self.level2_C(output2)  # 2h 2w
        Dnc_stage2 = self.bn_2(self.up(paddle.add(paddle.multiply(Dnc_stage2_0, stage1_gate), (Dnc_stage1))))  # 4h 4w

        classifier = self.classifier(Dnc_stage2)

        return classifier
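The `channel_shuffle` function above is a reshape, transpose, reshape sequence. The plain-Python sketch below applies the same permutation to a list of channel indices so the resulting order is easy to see; `shuffled_channel_order` is a hypothetical helper written for illustration and is not part of the original project.

```python
def shuffled_channel_order(num_channels, groups):
    """Return the channel order produced by reshape -> transpose -> flatten."""
    channels_per_group = num_channels // groups
    # reshape: split the channels into `groups` rows of `channels_per_group` entries
    grid = [
        list(range(g * channels_per_group, (g + 1) * channels_per_group))
        for g in range(groups)
    ]
    # transpose + flatten: read the grid column by column
    return [grid[g][c] for c in range(channels_per_group) for g in range(groups)]


# 6 channels in 2 groups: [0, 1, 2] and [3, 4, 5] get interleaved
print(shuffled_channel_order(6, 2))
```

In `S2module`, this interleaving mixes the outputs of the grouped 1x1 convolution before they are passed to the `S2block` branches.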
    In [8]
# model/extremeC3.py
'''
ExtPortraitSeg
Copyright (c) 2019-present NAVER Corp.
MIT license
'''
import paddle
import paddle.nn as nn

basic_0 = 24
basic_1 = 48
basic_2 = 56
basic_3 = 24


class CBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        # self.conv = nn.Conv2D(nIn, nOut, kSize, stride=stride, padding=padding, bias_attr=False)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        # self.conv1 = nn.Conv2D(nOut, nOut, (1, kSize), stride=1, padding=(0, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)
        self.act = nn.PReLU(nOut)
        # self.act = nn.ReLU()


    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        # output = self.conv1(output)
        output = self.bn(output)
        output = self.act(output)

        return output


class BR(nn.Layer):
    '''
        This class groups the batch normalization and PReLU activation
    '''

    def __init__(self, nOut):
        '''
        :param nOut: output feature maps
        '''
        super().__init__()
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)
        self.act = nn.PReLU(nOut)
        # self.act = nn.ReLU()

    def forward(self, input):
        '''
        :param input: input feature map
        :return: normalized and thresholded feature map
        '''
        output = self.bn(input)
        output = self.act(output)

        return output


class CB(nn.Layer):
    '''
       This class groups the convolution and batch normalization
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)

        return output


class C(nn.Layer):
    '''
    This class is for a convolutional layer.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)

        return output


class C3block(nn.Layer):
    '''
    This class defines the dilated convolution.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1, d=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        :param d: optional dilation rate
        '''
        super().__init__()
        padding = int((kSize - 1) / 2) * d
        if d == 1:
            self.conv =nn.Sequential(
                nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False,
                          dilation=d),
                nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False)
            )
        else:
            combine_kernel = 2 * d - 1

            self.conv = nn.Sequential(
                nn.Conv2D(nIn, nIn, kernel_size=(combine_kernel, 1), stride=stride, padding=(padding - 1, 0),
                          groups=nIn, bias_attr=False),
                nn.BatchNorm2D(nIn),
                nn.PReLU(nIn),
                nn.Conv2D(nIn, nIn, kernel_size=(1, combine_kernel), stride=stride, padding=(0, padding - 1),
                          groups=nIn, bias_attr=False),
                nn.BatchNorm2D(nIn),
                nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False,
                          dilation=d),
                nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False))

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)

        return output


class Down_advancedC3(nn.Layer):
    def __init__(self, nIn, nOut, ratio=[2,4,8]):
        super().__init__()
        n = int(nOut // 3)
        n1 = nOut - 3 * n
        self.c1 = C(nIn, n, 3, 2)

        self.d1 = C3block(n, n+n1, 3, 1, ratio[0])
        self.d2 = C3block(n, n, 3, 1, ratio[1])
        self.d3 = C3block(n, n, 3, 1, ratio[2])

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-3)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        output1 = self.c1(input)
        d1 = self.d1(output1)
        d2 = self.d2(output1)
        d3 = self.d3(output1)

        combine = paddle.concat([d1, d2, d3], 1)

        output = self.bn(combine)
        output = self.act(output)

        return output


class AdvancedC3(nn.Layer):
    '''
    This class defines the ESP block, which is based on the following principle
        Reduce ---> Split ---> Transform --> Merge
    '''

    def __init__(self, nIn, nOut, add=True, ratio=[2,4,8]):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param add: if true, add a residual connection through identity operation. You can use projection too as
                in ResNet paper, but we avoid to use it if the dimensions are not the same because we do not want to
                increase the module complexity
        '''
        super().__init__()
        n = int(nOut // 3)
        n1 = nOut - 3 * n
        self.c1 = C(nIn, n, 1, 1)

        self.d1 = C3block(n, n + n1, 3, 1, ratio[0])
        self.d2 = C3block(n, n, 3, 1, ratio[1])
        self.d3 = C3block(n, n, 3, 1, ratio[2])
        # self.d4 = Double_CDilated(n, n, 3, 1, 12)
        # self.conv =C(nOut, nOut, 1,1)

        self.bn = BR(nOut)
        self.add = add

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        # reduce
        output1 = self.c1(input)
        d1 = self.d1(output1)
        d2 = self.d2(output1)
        d3 = self.d3(output1)

        combine = paddle.concat([d1, d2, d3], 1)

        if self.add:
            combine = paddle.add(input, combine)
        output = self.bn(combine)

        return output


class InputProjectionA(nn.Layer):
    '''
    This class projects the input image to the same spatial dimensions as the feature map.
    For example, if the input image is 512 x512 x3 and spatial dimensions of feature map size are 56x56xF, then
    this class will generate an output of 56x56x3
    '''

    def __init__(self, samplingTimes):
        '''
        :param samplingTimes: The rate at which you want to down-sample the image
        '''
        super().__init__()
        self.pool = nn.LayerList()
        for i in range(0, samplingTimes):
            # pyramid-based approach for down-sampling
            self.pool.append(nn.AvgPool2D(2, stride=2, padding=0))

    def forward(self, input):
        '''
        :param input: Input RGB Image
        :return: down-sampled image (pyramid-based approach)
        '''
        for pool in self.pool:
            input = pool(input)

        return input


class ExtremeC3NetCoarse(nn.Layer):
    '''
    This class defines the ESPNet-C network in the paper
    '''

    def __init__(self, classes=20, p=5, q=3):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()


        self.level1 = CBR(3, basic_0, 3, 2)
        self.sample1 = InputProjectionA(1)
        self.sample2 = InputProjectionA(2)

        self.b1 = BR(basic_0 + 3)
        self.level2_0 = Down_advancedC3(basic_0 + 3, basic_1, ratio=[1, 2, 3])  # , ratio=[1,2,3]

        self.level2 = nn.LayerList()
        for i in range(0, p):
            self.level2.append(
                AdvancedC3(basic_1, basic_1, ratio=[1, 3, 4]))  # , ratio=[1,3,4]
        self.b2 = BR(basic_1 * 2 + 3)

        self.level3_0 = AdvancedC3(basic_1 * 2 + 3, basic_2, add=False,
                                                            ratio=[1, 3, 5])  # , ratio=[1,3,5]

        self.level3 = nn.LayerList()
        for i in range(0, q):
            self.level3.append(AdvancedC3(basic_2, basic_2))
        self.b3 = BR(basic_2 * 2)


        self.Coarseclassifier = C(basic_2*2, classes, 1, 1)

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output0 = self.level1(input)
        inp1 = self.sample1(input)
        inp2 = self.sample2(input)

        output0_cat = self.b1(paddle.concat([output0, inp1], 1))
        output1_0 = self.level2_0(output0_cat)  # down-sampled

        for i, layer in enumerate(self.level2):
            if i == 0:
                output1 = layer(output1_0)
            else:
                output1 = layer(output1)

        output1_cat = self.b2(paddle.concat([output1, output1_0, inp2], 1))

        output2_0 = self.level3_0(output1_cat)  # down-sampled
        for i, layer in enumerate(self.level3):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)

        output2_cat = self.b3(paddle.concat([output2_0, output2], 1))

        classifier = self.Coarseclassifier(output2_cat)

        return classifier


class ExtremeC3Net(nn.Layer):
    '''
    This class defines the ESPNet-C network in the paper
    '''

    def __init__(self, classes=20, p=5, q=3):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()


        self.encoder = ExtremeC3NetCoarse(classes, p, q)
        # # load the encoder modules
        del self.encoder.Coarseclassifier

        self.upsample = nn.Sequential(
            nn.Conv2D(kernel_size=(1, 1), in_channels=basic_2*2, out_channels=basic_3,bias_attr=False),
            nn.BatchNorm2D(basic_3),
            nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2)

        )

        self.Fine = nn.Sequential(
            # nn.Conv2D(kernel_size=3, stride=2, padding=1, in_channels=3, out_channels=basic_3,bias_attr=False),
            C(3, basic_3, 3, 2),
            AdvancedC3(basic_3, basic_3, add=True),
            # nn.BatchNorm2D(basic_3, epsilon=1e-03),

        )
        self.classifier = nn.Sequential(
            BR(basic_3),
            nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2),
            nn.Conv2D(kernel_size=(1, 1), in_channels=basic_3, out_channels=classes, bias_attr=False),
        )

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output0 = self.encoder.level1(input)
        inp1 = self.encoder.sample1(input)
        inp2 = self.encoder.sample2(input)

        output0_cat = self.encoder.b1(paddle.concat([output0, inp1], 1))
        output1_0 = self.encoder.level2_0(output0_cat)  # down-sampled

        for i, layer in enumerate(self.encoder.level2):
            if i == 0:
                output1 = layer(output1_0)
            else:
                output1 = layer(output1)

        output1_cat = self.encoder.b2(paddle.concat([output1, output1_0, inp2], 1))

        output2_0 = self.encoder.level3_0(output1_cat)  # down-sampled
        for i, layer in enumerate(self.encoder.level3):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)

        output2_cat = self.encoder.b3(paddle.concat([output2_0, output2], 1))

        Coarse = self.upsample(output2_cat)
        Fine =  self.Fine(input)
        classifier = self.classifier(paddle.add(Coarse, Fine))        
        return classifier
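The padding choices in `C3block` appear intended to keep the spatial size unchanged at stride 1. The sketch below checks this with the standard convolution output-size formula; `conv_out_size` and `c3block_sizes` are hypothetical helpers, and the check assumes `kSize=3` and `stride=1`, which is how `C3block` is instantiated in the code above.

```python
def conv_out_size(size, kernel, padding, stride=1, dilation=1):
    """Standard output-size formula for a convolution along one axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def c3block_sizes(size, d, k_size=3):
    """Sizes after the (2d-1)-tap factorized conv and the dilated conv (d > 1 branch)."""
    padding = int((k_size - 1) / 2) * d
    combine_kernel = 2 * d - 1
    # Factorized convolution: kernel 2d-1 with padding d-1 along its own axis
    after_factorized = conv_out_size(size, combine_kernel, padding - 1)
    # Dilated 3x3 convolution: dilation d with padding d
    after_dilated = conv_out_size(after_factorized, k_size, padding, dilation=d)
    return after_factorized, after_dilated


# Dilation rates greater than 1 that appear in the ratio lists above
for d in (2, 3, 4, 5, 8):
    print(d, c3block_sizes(56, d))
```

Because every branch preserves the feature-map size, the three branch outputs in `AdvancedC3` can be concatenated along the channel axis without any cropping or resizing.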
   
