Reproducing CoTNet with PaddlePaddle
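Introduction
The Transformer architecture, built on self-attention, was first proposed for NLP tasks and has recently shown very strong results on CV tasks. However, most existing Transformers apply self-attention directly on the 2D feature map and compute the attention matrix from the query and key at each spatial position, so the contextual information among neighboring keys is not fully used. The paper designs a new attention structure, the CoT Block, which exploits the context among keys to guide the learning of the dynamic attention matrix and thereby strengthens visual representation.
The authors replace the 3x3 convolutions in ResNet with CoT Blocks to form CoTNet, which achieves very strong performance on a range of vision tasks (classification, detection, segmentation). CoTNet also won the open-domain image recognition challenge at CVPR.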
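Model overview
Related resources:
- Paper: 【Contextual Transformer Networks for Visual Recognition】
- Code: 【JDAI-CV/CoTNet】
- Core code: 【CoTAttention-Usage】
Paper summary:
Conventional self-attention in CV computes the attention matrix directly on the 2D feature map. It captures global context well but ignores the neighboring context that convolution provides, and that neighboring context carries a lot of useful information.
The CoT block proposed in the paper keeps the global context of conventional self-attention and combines it with the neighboring context from convolution, which improves visual representation. The CoT block is also plug-and-play: it can directly replace the 3x3 convolution in the ResNet Bottleneck to form a Transformer-style architecture.
Self-attention was introduced into CV mainly for the following reasons:
- In a CNN, each convolutional layer produces its output features through a kernel whose receptive field is small
- Stacking convolutional layers to enlarge the receptive field is inefficient
- The performance of many computer vision tasks is limited by insufficient semantic information
- Self-attention captures global information and therefore obtains a much larger receptive field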
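To make the receptive-field argument concrete, here is a minimal sketch that assumes stride-1 convolutions without dilation, for which n stacked k x k layers see 1 + n(k - 1) pixels per side. The function names are illustrative and do not come from the original article.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels per side) of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(target, kernel_size=3):
    """Smallest number of stacked layers whose receptive field covers `target` pixels."""
    layers = 0
    while stacked_receptive_field(layers, kernel_size) < target:
        layers += 1
    return layers


# Three 3x3 layers only see a 7x7 window.
print(stacked_receptive_field(3))
# Covering a 56x56 feature map with 3x3 layers alone takes 28 layers.
print(layers_needed(56))
```

Under these assumptions the receptive field grows only linearly with depth, whereas a self-attention layer can relate any two positions in a single step.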
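1. Motivation
- CNNs were initially used across CV tasks because of their strong visual representation learning. Their local modeling makes full use of spatial locality and translation equivariance. For the same reason, a CNN can only model local information and lacks long-range modeling and perception, which many vision tasks need.
- Transformers are widely used in NLP because of their strong global modeling. Inspired by the Transformer, models such as ViT and DETR borrow its structure for long-range modeling. However, the self-attention in the original Transformer (see the corresponding figure in the paper) computes the attention matrix only from the interaction between query and key, and so ignores the relations among neighboring keys.
This led the authors to ask: "Is there an elegant way to enhance the Transformer structure by exploiting the context among input keys in the 2D feature map?" Their answer is the CoT block. Conventional self-attention computes the attention matrix only from isolated query-key pairs and therefore does not fully use the context of the keys.
The authors first apply a 3x3 convolution on the keys to model static context. They then concatenate the query with the contextualized key and apply two consecutive 1x1 convolutions to produce the attention, which yields the dynamic context. The static and dynamic contexts are finally fused as the output. (In short, convolution first extracts local information, which brings out the static context inside the keys.)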
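2. Method
2.1. Multi-head Self-attention
Vision backbones commonly use scalable local multi-head self-attention (see the corresponding figure in the paper). First, 1x1 convolutions map X into three different spaces Q, K, and V. Q and K are multiplied within each k x k grid to obtain the local relation matrix, where $\circledast$ denotes local matrix multiplication:
$$R = K \circledast Q$$
Because the original self-attention is insensitive to the position of the input features, position information P is applied to Q and the result is added to the relation matrix:
$$\hat{R} = R + P \circledast Q$$
Next, the result is normalized with Softmax to obtain the attention map:
$$A = \mathrm{Softmax}(\hat{R})$$
With the attention map in hand, the values inside each k x k grid are aggregated with the attention weights to give the attention output:
$$Y = V \circledast A$$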
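The four steps above can be sketched in NumPy for a single head. This is an illustration under stated assumptions, not the paper's implementation: the projections are plain matrices standing in for 1x1 convolutions, the border is zero-padded, and `pos` is a hypothetical table with one position embedding per neighbor offset.

```python
import numpy as np


def local_self_attention(x, w_q, w_k, w_v, pos, k=3):
    """Single-head local self-attention over k x k neighborhoods.

    x: (H, W, C) feature map; w_q, w_k, w_v: (C, C) projections;
    pos: (k*k, C) position embeddings, one per neighbor offset (assumed layout).
    """
    h, w, _ = x.shape
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    pad = k // 2
    key_p = np.pad(key, ((pad, pad), (pad, pad), (0, 0)))
    v_p = np.pad(v, ((pad, pad), (pad, pad), (0, 0)))
    offsets = [(dy, dx) for dy in range(k) for dx in range(k)]
    # Gather the k*k neighbors of every position: (H, W, k*k, C)
    key_n = np.stack([key_p[dy:dy + h, dx:dx + w] for dy, dx in offsets], axis=2)
    v_n = np.stack([v_p[dy:dy + h, dx:dx + w] for dy, dx in offsets], axis=2)
    # Local relation matrix: query against each neighboring key
    r = np.einsum('hwc,hwnc->hwn', q, key_n)
    # Add the position term computed from the query
    r_hat = r + np.einsum('hwc,nc->hwn', q, pos)
    # Softmax over the k*k neighbors (shifted for numerical stability)
    r_hat = r_hat - r_hat.max(axis=-1, keepdims=True)
    a = np.exp(r_hat)
    a = a / a.sum(axis=-1, keepdims=True)
    # Aggregate the neighboring values with the attention weights
    y = np.einsum('hwn,hwnc->hwc', a, v_n)
    return y, a


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6, 4))
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
pos = rng.standard_normal((9, 4))
y, a = local_self_attention(x, w_q, w_k, w_v, pos)
print(y.shape, a.shape)
```

Each query is compared with its neighboring keys one pair at a time, which is exactly the limitation the CoT block addresses.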
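2.2. Contextual Transformer Block
Conventional self-attention triggers feature interactions across different spatial positions well. However, all query-key relations are learned from isolated query-key pairs, without exploring the rich context between them, which severely limits visual representation learning.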
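The authors therefore propose the CoT Block (see the corresponding figure in the paper), which unifies context mining and self-attention learning in a single structure.
For the input feature $X$, three variables are defined first: $Q = X$, $K = X$, and $V = XW_v$ (only V is a feature projection; Q and K keep the original value of X).
The authors first apply a k x k group convolution on K to obtain a key that carries local context, written $K^1$. This $K^1$ can be viewed as static modeling of local information.
Next, $K^1$ and Q are concatenated, and two consecutive 1x1 convolutions are applied to the result:
$$A = [K^1, Q]\,W_\theta W_\delta$$
Unlike conventional self-attention, the matrix A here comes from the interaction between the query and the local context $K^1$, and not only from the relation between query and key. In other words, the guidance of local context modeling strengthens the self-attention.
The attention map is then multiplied with V to obtain the dynamic context $K^2$:
$$K^2 = V \circledast A$$
Finally, the output of CoT is the fusion of the local static context $K^1$ and the global dynamic context $K^2$.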
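The channel bookkeeping of the block can be traced with a small sketch. It assumes an input with `dim` channels, a k x k attention grid, and a bottleneck reduction `factor` of 4 between the two 1x1 convolutions; the factor is an implementation choice, and the function name is illustrative.

```python
def cot_block_channels(dim, kernel_size=3, factor=4):
    """Channel counts at each stage of a CoT block (illustrative bookkeeping only)."""
    concat = 2 * dim                            # [K1, Q] concatenated along channels
    hidden = concat // factor                   # after the first 1x1 convolution
    attention = kernel_size * kernel_size * dim  # after the second 1x1 convolution
    return {
        "concat": concat,
        "hidden": hidden,
        "attention": attention,
        "output": dim,                          # K1 and K2 both keep `dim` channels
    }


print(cot_block_channels(64))
```

For dim=64 this gives 128, 32, and 576 channels for the concatenation and the two 1x1 convolutions, so the block's output has the same shape as its input and can stand in for a 3x3 convolution.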
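2.3. Contextual Transformer Networks
CoT is designed as a unified self-attention building block that can serve as a drop-in replacement for the standard convolution in a ConvNet.
The authors therefore replace the 3x3 convolutions in ResNet and ResNeXt with CoT, forming CoTNet and CoTNeXt.
According to the paper, CoTNet-50 has slightly fewer parameters and FLOPs than ResNet-50.
Compared with ResNeXt-50, CoTNeXt-50 has slightly more parameters but similar FLOPs.
Quick start
- The 【Paddle-Image-Models】 project is recommended for loading this model quickly
- For detailed usage, see: 【【Paddle-Image-Models】飞桨预训练图像模型库】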
Model construction
Import the required packages
In [1]
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import numpy as np
In [2]
# Build CoTNetLayer
class CoTNetLayer(nn.Layer):
    def __init__(self, dim=512, kernel_size=3):
        super().__init__()
        self.dim = dim
        self.kernel_size = kernel_size
        # A k*k convolution extracts context and gives the static context of the input X
        self.key_embed = nn.Sequential(
            nn.Conv2D(dim, dim, kernel_size=kernel_size, padding=kernel_size // 2, stride=1, bias_attr=False),
            nn.BatchNorm2D(dim),
            nn.ReLU()
        )
        # A 1*1 convolution encodes the value
        self.value_embed = nn.Sequential(
            nn.Conv2D(dim, dim, kernel_size=1, stride=1, bias_attr=False),
            nn.BatchNorm2D(dim)
        )
        factor = 4
        # Two consecutive 1*1 convolutions compute the attention matrix
        self.attention_embed = nn.Sequential(
            nn.Conv2D(2 * dim, 2 * dim // factor, 1, bias_attr=False),  # input is the concatenated feature, channels = 2*C
            nn.BatchNorm2D(2 * dim // factor),
            nn.ReLU(),
            nn.Conv2D(2 * dim // factor, kernel_size * kernel_size * dim, 1, stride=1)  # out: H * W * (K*K*C)
        )

    def forward(self, x):
        bs, c, h, w = x.shape
        k1 = self.key_embed(x)  # shape: bs, c, h, w; static context used as the key
        v = paddle.flatten(self.value_embed(x), start_axis=2)  # shape: bs, c, h*w; value encoding
        y = paddle.concat([k1, x], axis=1)  # shape: bs, 2c, h, w
        att = self.attention_embed(y)  # shape: bs, c*k*k, h, w; attention matrix
        att = paddle.reshape(att, [bs, c, self.kernel_size * self.kernel_size, h, w])
        att = att.mean(2, keepdim=False)  # shape: bs, c, h, w; average over the k*k dimension
        att = paddle.flatten(att, start_axis=2)  # shape: bs, c, h*w
        k2 = F.softmax(att, axis=-1) * v  # softmax over the h*w positions, then weight the value
        k2 = paddle.reshape(k2, [bs, c, h, w])
        return k1 + k2  # fuse static and dynamic context

# if __name__ == '__main__':
#     input = paddle.randn(shape=[50, 512, 7, 7])
#     cot = CoTNetLayer(dim=512, kernel_size=3)
#     output = cot(input)
#     print(output.shape)
In [3]
# CoTNetLayer is defined in the previous cell. If it is kept in a separate
# CoTNetBlock.py file instead, import it with:
# from CoTNetBlock import CoTNetLayer

zeros_ = nn.initializer.Constant(value=0.)
ones_ = nn.initializer.Constant(value=1.)
kaiming_normal_ = nn.initializer.KaimingNormal()


def get_n_params(model):
    """Return the total number of elements over all parameters of the model."""
    return sum(int(np.prod(p.shape)) for p in model.parameters())


# Build the Bottleneck
class Bottleneck(nn.Layer):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # width = int(planes * (base_width / 64.)) * groups
        self.conv1 = nn.Conv2D(in_planes, planes, kernel_size=1, bias_attr=False)
        self.bn1 = nn.BatchNorm2D(planes)
        self.cot_layer = CoTNetLayer(dim=planes, kernel_size=3)
        self.bn2 = nn.BatchNorm2D(planes)
        self.conv3 = nn.Conv2D(planes, planes * self.expansion, 1, bias_attr=False)
        self.bn3 = nn.BatchNorm2D(planes * self.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride
        # The CoT layer keeps the spatial size, so strided blocks downsample with average pooling
        if stride > 1:
            self.avd = nn.AvgPool2D(3, 2, padding=1)
        else:
            self.avd = None

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        if self.avd is not None:
            out = self.avd(out)
        out = self.cot_layer(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out


# Build CoTResNet
class CoTResNet(nn.Layer):
    def __init__(self, block, layers, num_classes=1000):
        super(CoTResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2D(3, 64, kernel_size=7, stride=2, padding=3, bias_attr=False)
        self.bn1 = nn.BatchNorm2D(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._maker_layer(block, 64, layers[0])
        self.layer2 = self._maker_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._maker_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._maker_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2D(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    # Not applied automatically; call model.apply(model._init_weights) to use it
    def _init_weights(self, m):
        if isinstance(m, nn.Conv2D):
            kaiming_normal_(m.weight)
        elif isinstance(m, nn.BatchNorm2D):
            ones_(m.weight)
            zeros_(m.bias)

    def _maker_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_planes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(self.in_planes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias_attr=False),
                nn.BatchNorm2D(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.in_planes, planes, stride, downsample))
        self.in_planes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.in_planes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = paddle.flatten(x, start_axis=1)
        x = self.fc(x)
        return x


def cotnet50(**kwargs):
    model = CoTResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model

# if __name__ == '__main__':
#     # x = paddle.rand(shape=[1, 3, 224, 224])
#     # y = model(x)
#     # print(y.shape)
#     paddle.Model(cotnet50()).summary((1, 3, 224, 224))
#     # paddle.Model(paddle.vision.resnet50()).summary((1, 3, 224, 224))
Preset model
In [4]
def cotnet50(**kwargs):
    model = CoTResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model
Model training
- Train the model built above.
Model validation
In [ ]
# !mkdir ~/data/ILSVRC2012
# !tar -xf ~/data/data89857/ILSVRC2012mini.tar -C ~/data/ILSVRC2012
In [5]
import os
import paddle
import numpy as np
from PIL import Image


class ILSVRC2012(paddle.io.Dataset):
    def __init__(self, root, label_list, transform):
        self.transform = transform
        self.root = root
        self.label_list = label_list
        self.load_datas()

    def load_datas(self):
        self.imgs = []
        self.labels = []
        with open(self.label_list, 'r') as f:
            for line in f:
                line = line.strip()  # also handles a last line without a trailing newline
                if not line:
                    continue
                img, label = line.split(' ')
                self.imgs.append(os.path.join(self.root, img))
                self.labels.append(int(label))

    def __getitem__(self, idx):
        label = self.labels[idx]
        image = self.imgs[idx]
        image = Image.open(image).convert('RGB')
        image = self.transform(image)
        return image.astype('float32'), np.array(label).astype('int64')

    def __len__(self):
        return len(self.imgs)
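The dataset class reads a label list with one sample per line. As a minimal sketch, assuming each line has the form "relative/image/path label" with a single space before the integer label, the parsing step can be isolated and checked on its own. The helper name and the sample path below are hypothetical.

```python
def parse_label_line(line):
    """Split one label-list line into (relative image path, integer label).

    Assumes the label is the last space-separated field on the line.
    """
    parts = line.strip().rsplit(' ', 1)
    if len(parts) != 2:
        raise ValueError(f"malformed label line: {line!r}")
    path, label = parts
    return path, int(label)


# Hypothetical sample lines, with and without a trailing newline
print(parse_label_line("train/n01440764/sample_0001.JPEG 0\n"))
print(parse_label_line("val/sample_0002.JPEG 17"))
```

Splitting from the right keeps the label intact even if an image path happens to contain a space.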
Model training
In [6]
# 1. Build the model
network = cotnet50(num_classes=1000)
# Wrap it with the Paddle high-level API
model = paddle.Model(network)
model.summary((1, 3, 224, 224))  # print the model structure
W0426 17:09:21.611791 1204 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0426 17:09:21.615837 1204 device_context.cc:465] device: 0, cuDNN Version: 7.6.
---------------------------------------------------------------------------
Layer (type)  Input Shape  Output Shape  Param #
===========================================================================
Conv2D-1  [[1, 3, 224, 224]]  [1, 64, 112, 112]  9,408
BatchNorm2D-1  [[1, 64, 112, 112]]  [1, 64, 112, 112]  256
ReLU-1  [[1, 64, 112, 112]]  [1, 64, 112, 112]  0
MaxPool2D-1  [[1, 64, 112, 112]]  [1, 64, 56, 56]  0
Conv2D-3  [[1, 64, 56, 56]]  [1, 64, 56, 56]  4,096
BatchNorm2D-3  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-4  [[1, 256, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-4  [[1, 64, 56, 56]]  [1, 64, 56, 56]  36,864
BatchNorm2D-4  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-2  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
Conv2D-5  [[1, 64, 56, 56]]  [1, 64, 56, 56]  4,096
BatchNorm2D-5  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-6  [[1, 128, 56, 56]]  [1, 32, 56, 56]  4,096
BatchNorm2D-6  [[1, 32, 56, 56]]  [1, 32, 56, 56]  128
ReLU-3  [[1, 32, 56, 56]]  [1, 32, 56, 56]  0
Conv2D-7  [[1, 32, 56, 56]]  [1, 576, 56, 56]  19,008
CoTNetLayer-1  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
BatchNorm2D-7  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-8  [[1, 64, 56, 56]]  [1, 256, 56, 56]  16,384
BatchNorm2D-8  [[1, 256, 56, 56]]  [1, 256, 56, 56]  1,024
Conv2D-2  [[1, 64, 56, 56]]  [1, 256, 56, 56]  16,384
BatchNorm2D-2  [[1, 256, 56, 56]]  [1, 256, 56, 56]  1,024
Bottleneck-1  [[1, 64, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-9  [[1, 256, 56, 56]]  [1, 64, 56, 56]  16,384
BatchNorm2D-9  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-7  [[1, 256, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-10  [[1, 64, 56, 56]]  [1, 64, 56, 56]  36,864
BatchNorm2D-10  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-5  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
Conv2D-11  [[1, 64, 56, 56]]  [1, 64, 56, 56]  4,096
BatchNorm2D-11  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-12  [[1, 128, 56, 56]]  [1, 32, 56, 56]  4,096
BatchNorm2D-12  [[1, 32, 56, 56]]  [1, 32, 56, 56]  128
ReLU-6  [[1, 32, 56, 56]]  [1, 32, 56, 56]  0
Conv2D-13  [[1, 32, 56, 56]]  [1, 576, 56, 56]  19,008
CoTNetLayer-2  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
BatchNorm2D-13  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-14  [[1, 64, 56, 56]]  [1, 256, 56, 56]  16,384
BatchNorm2D-14  [[1, 256, 56, 56]]  [1, 256, 56, 56]  1,024
Bottleneck-2  [[1, 256, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-15  [[1, 256, 56, 56]]  [1, 64, 56, 56]  16,384
BatchNorm2D-15  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-10  [[1, 256, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-16  [[1, 64, 56, 56]]  [1, 64, 56, 56]  36,864
BatchNorm2D-16  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
ReLU-8  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
Conv2D-17  [[1, 64, 56, 56]]  [1, 64, 56, 56]  4,096
BatchNorm2D-17  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-18  [[1, 128, 56, 56]]  [1, 32, 56, 56]  4,096
BatchNorm2D-18  [[1, 32, 56, 56]]  [1, 32, 56, 56]  128
ReLU-9  [[1, 32, 56, 56]]  [1, 32, 56, 56]  0
Conv2D-19  [[1, 32, 56, 56]]  [1, 576, 56, 56]  19,008
CoTNetLayer-3  [[1, 64, 56, 56]]  [1, 64, 56, 56]  0
BatchNorm2D-19  [[1, 64, 56, 56]]  [1, 64, 56, 56]  256
Conv2D-20  [[1, 64, 56, 56]]  [1, 256, 56, 56]  16,384
BatchNorm2D-20  [[1, 256, 56, 56]]  [1, 256, 56, 56]  1,024
Bottleneck-3  [[1, 256, 56, 56]]  [1, 256, 56, 56]  0
Conv2D-22  [[1, 256, 56, 56]]  [1, 128, 56, 56]  32,768
BatchNorm2D-22  [[1, 128, 56, 56]]  [1, 128, 56, 56]  512
ReLU-13  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
AvgPool2D-1  [[1, 128, 56, 56]]  [1, 128, 28, 28]  0
Conv2D-23  [[1, 128, 28, 28]]  [1, 128, 28, 28]  147,456
BatchNorm2D-23  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-11  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
Conv2D-24  [[1, 128, 28, 28]]  [1, 128, 28, 28]  16,384
BatchNorm2D-24  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-25  [[1, 256, 28, 28]]  [1, 64, 28, 28]  16,384
BatchNorm2D-25  [[1, 64, 28, 28]]  [1, 64, 28, 28]  256
ReLU-12  [[1, 64, 28, 28]]  [1, 64, 28, 28]  0
Conv2D-26  [[1, 64, 28, 28]]  [1, 1152, 28, 28]  74,880
CoTNetLayer-4  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
BatchNorm2D-26  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-27  [[1, 128, 28, 28]]  [1, 512, 28, 28]  65,536
BatchNorm2D-27  [[1, 512, 28, 28]]  [1, 512, 28, 28]  2,048
Conv2D-21  [[1, 256, 56, 56]]  [1, 512, 28, 28]  131,072
BatchNorm2D-21  [[1, 512, 28, 28]]  [1, 512, 28, 28]  2,048
Bottleneck-4  [[1, 256, 56, 56]]  [1, 512, 28, 28]  0
Conv2D-28  [[1, 512, 28, 28]]  [1, 128, 28, 28]  65,536
BatchNorm2D-28  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-16  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-29  [[1, 128, 28, 28]]  [1, 128, 28, 28]  147,456
BatchNorm2D-29  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-14  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
Conv2D-30  [[1, 128, 28, 28]]  [1, 128, 28, 28]  16,384
BatchNorm2D-30  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-31  [[1, 256, 28, 28]]  [1, 64, 28, 28]  16,384
BatchNorm2D-31  [[1, 64, 28, 28]]  [1, 64, 28, 28]  256
ReLU-15  [[1, 64, 28, 28]]  [1, 64, 28, 28]  0
Conv2D-32  [[1, 64, 28, 28]]  [1, 1152, 28, 28]  74,880
CoTNetLayer-5  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
BatchNorm2D-32  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-33  [[1, 128, 28, 28]]  [1, 512, 28, 28]  65,536
BatchNorm2D-33  [[1, 512, 28, 28]]  [1, 512, 28, 28]  2,048
Bottleneck-5  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-34  [[1, 512, 28, 28]]  [1, 128, 28, 28]  65,536
BatchNorm2D-34  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-19  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-35  [[1, 128, 28, 28]]  [1, 128, 28, 28]  147,456
BatchNorm2D-35  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-17  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
Conv2D-36  [[1, 128, 28, 28]]  [1, 128, 28, 28]  16,384
BatchNorm2D-36  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-37  [[1, 256, 28, 28]]  [1, 64, 28, 28]  16,384
BatchNorm2D-37  [[1, 64, 28, 28]]  [1, 64, 28, 28]  256
ReLU-18  [[1, 64, 28, 28]]  [1, 64, 28, 28]  0
Conv2D-38  [[1, 64, 28, 28]]  [1, 1152, 28, 28]  74,880
CoTNetLayer-6  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
BatchNorm2D-38  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-39  [[1, 128, 28, 28]]  [1, 512, 28, 28]  65,536
BatchNorm2D-39  [[1, 512, 28, 28]]  [1, 512, 28, 28]  2,048
Bottleneck-6  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-40  [[1, 512, 28, 28]]  [1, 128, 28, 28]  65,536
BatchNorm2D-40  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-22  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-41  [[1, 128, 28, 28]]  [1, 128, 28, 28]  147,456
BatchNorm2D-41  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
ReLU-20  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
Conv2D-42  [[1, 128, 28, 28]]  [1, 128, 28, 28]  16,384
BatchNorm2D-42  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-43  [[1, 256, 28, 28]]  [1, 64, 28, 28]  16,384
BatchNorm2D-43  [[1, 64, 28, 28]]  [1, 64, 28, 28]  256
ReLU-21  [[1, 64, 28, 28]]  [1, 64, 28, 28]  0
Conv2D-44  [[1, 64, 28, 28]]  [1, 1152, 28, 28]  74,880
CoTNetLayer-7  [[1, 128, 28, 28]]  [1, 128, 28, 28]  0
BatchNorm2D-44  [[1, 128, 28, 28]]  [1, 128, 28, 28]  512
Conv2D-45  [[1, 128, 28, 28]]  [1, 512, 28, 28]  65,536
BatchNorm2D-45  [[1, 512, 28, 28]]  [1, 512, 28, 28]  2,048
Bottleneck-7  [[1, 512, 28, 28]]  [1, 512, 28, 28]  0
Conv2D-47  [[1, 512, 28, 28]]  [1, 256, 28, 28]  131,072
BatchNorm2D-47  [[1, 256, 28, 28]]  [1, 256, 28, 28]  1,024
ReLU-25  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
AvgPool2D-2  [[1, 256, 28, 28]]  [1, 256, 14, 14]  0
Conv2D-48  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-48  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-23  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-49  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-49  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-50  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-50  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-24  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-51  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-8  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-51  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-52  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-52  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Conv2D-46  [[1, 512, 28, 28]]  [1, 1024, 14, 14]  524,288
BatchNorm2D-46  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-8  [[1, 512, 28, 28]]  [1, 1024, 14, 14]  0
Conv2D-53  [[1, 1024, 14, 14]]  [1, 256, 14, 14]  262,144
BatchNorm2D-53  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-28  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-54  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-54  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-26  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-55  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-55  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-56  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-56  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-27  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-57  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-9  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-57  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-58  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-58  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-9  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-59  [[1, 1024, 14, 14]]  [1, 256, 14, 14]  262,144
BatchNorm2D-59  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-31  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-60  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-60  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-29  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-61  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-61  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-62  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-62  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-30  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-63  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-10  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-63  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-64  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-64  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-10  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-65  [[1, 1024, 14, 14]]  [1, 256, 14, 14]  262,144
BatchNorm2D-65  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-34  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-66  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-66  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-32  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-67  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-67  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-68  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-68  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-33  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-69  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-11  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-69  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-70  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-70  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-11  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-71  [[1, 1024, 14, 14]]  [1, 256, 14, 14]  262,144
BatchNorm2D-71  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-37  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-72  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-72  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-35  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-73  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-73  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-74  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-74  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-36  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-75  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-12  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-75  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-76  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-76  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-12  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-77  [[1, 1024, 14, 14]]  [1, 256, 14, 14]  262,144
BatchNorm2D-77  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-40  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-78  [[1, 256, 14, 14]]  [1, 256, 14, 14]  589,824
BatchNorm2D-78  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
ReLU-38  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
Conv2D-79  [[1, 256, 14, 14]]  [1, 256, 14, 14]  65,536
BatchNorm2D-79  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-80  [[1, 512, 14, 14]]  [1, 128, 14, 14]  65,536
BatchNorm2D-80  [[1, 128, 14, 14]]  [1, 128, 14, 14]  512
ReLU-39  [[1, 128, 14, 14]]  [1, 128, 14, 14]  0
Conv2D-81  [[1, 128, 14, 14]]  [1, 2304, 14, 14]  297,216
CoTNetLayer-13  [[1, 256, 14, 14]]  [1, 256, 14, 14]  0
BatchNorm2D-81  [[1, 256, 14, 14]]  [1, 256, 14, 14]  1,024
Conv2D-82  [[1, 256, 14, 14]]  [1, 1024, 14, 14]  262,144
BatchNorm2D-82  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  4,096
Bottleneck-13  [[1, 1024, 14, 14]]  [1, 1024, 14, 14]  0
Conv2D-84  [[1, 1024, 14, 14]]  [1, 512, 14, 14]  524,288
BatchNorm2D-84  [[1, 512, 14, 14]]  [1, 512, 14, 14]  2,048
ReLU-43  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  0
AvgPool2D-3  [[1, 512, 14, 14]]  [1, 512, 7, 7]  0
Conv2D-85  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,359,296
BatchNorm2D-85  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
ReLU-41  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
Conv2D-86  [[1, 512, 7, 7]]  [1, 512, 7, 7]  262,144
BatchNorm2D-86  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-87  [[1, 1024, 7, 7]]  [1, 256, 7, 7]  262,144
BatchNorm2D-87  [[1, 256, 7, 7]]  [1, 256, 7, 7]  1,024
ReLU-42  [[1, 256, 7, 7]]  [1, 256, 7, 7]  0
Conv2D-88  [[1, 256, 7, 7]]  [1, 4608, 7, 7]  1,184,256
CoTNetLayer-14  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
BatchNorm2D-88  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-89  [[1, 512, 7, 7]]  [1, 2048, 7, 7]  1,048,576
BatchNorm2D-89  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  8,192
Conv2D-83  [[1, 1024, 14, 14]]  [1, 2048, 7, 7]  2,097,152
BatchNorm2D-83  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  8,192
Bottleneck-14  [[1, 1024, 14, 14]]  [1, 2048, 7, 7]  0
Conv2D-90  [[1, 2048, 7, 7]]  [1, 512, 7, 7]  1,048,576
BatchNorm2D-90  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
ReLU-46  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  0
Conv2D-91  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,359,296
BatchNorm2D-91  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
ReLU-44  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
Conv2D-92  [[1, 512, 7, 7]]  [1, 512, 7, 7]  262,144
BatchNorm2D-92  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-93  [[1, 1024, 7, 7]]  [1, 256, 7, 7]  262,144
BatchNorm2D-93  [[1, 256, 7, 7]]  [1, 256, 7, 7]  1,024
ReLU-45  [[1, 256, 7, 7]]  [1, 256, 7, 7]  0
Conv2D-94  [[1, 256, 7, 7]]  [1, 4608, 7, 7]  1,184,256
CoTNetLayer-15  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
BatchNorm2D-94  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-95  [[1, 512, 7, 7]]  [1, 2048, 7, 7]  1,048,576
BatchNorm2D-95  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  8,192
Bottleneck-15  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  0
Conv2D-96  [[1, 2048, 7, 7]]  [1, 512, 7, 7]  1,048,576
BatchNorm2D-96  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
ReLU-49  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  0
Conv2D-97  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,359,296
BatchNorm2D-97  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
ReLU-47  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
Conv2D-98  [[1, 512, 7, 7]]  [1, 512, 7, 7]  262,144
BatchNorm2D-98  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-99  [[1, 1024, 7, 7]]  [1, 256, 7, 7]  262,144
BatchNorm2D-99  [[1, 256, 7, 7]]  [1, 256, 7, 7]  1,024
ReLU-48  [[1, 256, 7, 7]]  [1, 256, 7, 7]  0
Conv2D-100  [[1, 256, 7, 7]]  [1, 4608, 7, 7]  1,184,256
CoTNetLayer-16  [[1, 512, 7, 7]]  [1, 512, 7, 7]  0
BatchNorm2D-100  [[1, 512, 7, 7]]  [1, 512, 7, 7]  2,048
Conv2D-101  [[1, 512, 7, 7]]  [1, 2048, 7, 7]  1,048,576
BatchNorm2D-101  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  8,192
Bottleneck-16  [[1, 2048, 7, 7]]  [1, 2048, 7, 7]  0
AvgPool2D-4  [[1, 2048, 7, 7]]  [1, 2048, 1, 1]  0
Linear-1  [[1, 2048]]  [1, 1000]  2,049,000
===========================================================================
Total params: 33,855,464
Trainable params: 33,711,464
Non-trainable params: 144,000
---------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 426.00
Params size (MB): 129.15
Estimated Total Size (MB): 555.72
---------------------------------------------------------------------------
{'total_params': 33855464, 'trainable_params': 33711464}
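The per-layer counts in the summary can be reproduced by hand. The sketch below assumes the layer layout of the CoTNetLayer in this notebook (an ungrouped k x k key convolution, a reduction factor of 4, and a bias only on the last 1x1 convolution) and assumes the summary counts four values per BatchNorm channel: weight, bias, running mean, and running variance. The helper names are illustrative.

```python
def conv_params(c_in, c_out, k=1, bias=False):
    """Parameter count of an ungrouped k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn_params(channels):
    """Values counted per BatchNorm layer: weight, bias, running mean, running variance."""
    return 4 * channels


def cot_layer_params(dim, kernel_size=3, factor=4):
    """Parameter counts of the three sub-modules of the CoTNetLayer defined in this notebook."""
    hidden = 2 * dim // factor
    return {
        "key_embed": conv_params(dim, dim, kernel_size) + bn_params(dim),
        "value_embed": conv_params(dim, dim) + bn_params(dim),
        "attention_embed": (
            conv_params(2 * dim, hidden)
            + bn_params(hidden)
            + conv_params(hidden, kernel_size * kernel_size * dim, bias=True)
        ),
    }


counts = cot_layer_params(64)
print(counts, sum(counts.values()))
```

For dim=64 the three convolutions in the attention branch and key branch come to 4,096, 19,008, and 36,864 parameters, matching the Conv2D-6, Conv2D-7, and Conv2D-4 rows of the summary.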
In [7]
# 2. Data preprocessing
import paddle.vision.transforms as T

train_transforms = T.Compose([
    T.Resize(256, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train_dataset = ILSVRC2012('ILSVRC2012mini', transform=train_transforms, label_list='ILSVRC2012mini/train_list.txt')
val_transforms = T.Compose([
    T.Resize(256, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
val_dataset = ILSVRC2012('ILSVRC2012mini', transform=val_transforms, label_list='ILSVRC2012mini/val_list.txt')
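The arithmetic behind the crop and normalization steps can be sketched in plain Python. This is an illustration only: it assumes the shorter image side has already been resized to 256, uses floor division for the crop offsets (the library's own rounding may differ for odd margins), and the helper names are hypothetical.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb):
    """Scale 0-255 RGB values to [0, 1], then standardize each channel with MEAN and STD."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


print(center_crop_box(256, 256))
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

A black pixel maps to roughly -2.1 and a white pixel to roughly 2.2 to 2.6 depending on the channel, which is the value range the network sees at its input.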
In [ ]
EPOCHS = 100
BATCH_SIZE = 64


# Optimizer
def create_optim(parameters):
    step_each_epoch = 40000 // BATCH_SIZE
    lr = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=0.01, T_max=step_each_epoch * EPOCHS)
    return paddle.optimizer.Momentum(
        learning_rate=lr,
        parameters=parameters,
        # L2 regularization to improve accuracy; the notebook uses a coefficient of 1.0,
        # while values around 1e-4 are more typical for ImageNet training
        weight_decay=paddle.regularizer.L2Decay(1.0))


# Training configuration
model.prepare(create_optim(network.parameters()),
              paddle.nn.CrossEntropyLoss(),  # loss function
              paddle.metric.Accuracy(topk=(1, 5)))  # evaluation metric

# Training visualization
callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir')

# Launch the full training run
model.fit(train_dataset,  # training dataset
          val_dataset,  # evaluation dataset
          epochs=EPOCHS,  # total number of training epochs
          batch_size=BATCH_SIZE,  # number of samples per batch
          shuffle=True,  # whether to shuffle the samples
          verbose=1,  # log display format
          save_dir='./chk_points/',  # directory for the checkpoints saved during training
          callbacks=callback)  # callbacks to use
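To see what the cosine schedule configured above does to the learning rate, the closed-form curve can be evaluated in plain Python. This sketch assumes the scheduler is stepped once per training batch, a minimum learning rate of 0, and the same 40000-sample estimate used in `create_optim`; the function name is illustrative and the closed form only describes steps up to `t_max`.

```python
import math

EPOCHS = 100
BATCH_SIZE = 64
T_MAX = (40000 // BATCH_SIZE) * EPOCHS  # 625 steps per epoch * 100 epochs


def cosine_annealing_lr(step, base_lr=0.01, t_max=T_MAX, eta_min=0.0):
    """Closed-form cosine annealing, valid for 0 <= step <= t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


for step in (0, T_MAX // 4, T_MAX // 2, T_MAX):
    print(step, cosine_annealing_lr(step))
```

The rate starts at 0.01, passes through half of that at the midpoint, and reaches 0 at the final step, so most of the fine adjustment happens in the last epochs.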
2025-07-31
BatchNorm2D-43 [[1, 64, 28, 28]] [1, 64, 28, 28] 256
ReLU-21 [[1, 64, 28, 28]] [1, 64, 28, 28] 0
Conv2D-44 [[1, 64, 28, 28]] [1, 1152, 28, 28] 74,880
CoTNetLayer-7 [[1, 128, 28, 28]] [1, 128, 28, 28] 0
BatchNorm2D-44 [[1, 128, 28, 28]] [1, 128, 28, 28] 512
Conv2D-45 [[1, 128, 28, 28]] [1, 512, 28, 28] 65,536
BatchNorm2D-45 [[1, 512, 28, 28]] [1, 512, 28, 28] 2,048
Bottleneck-7 [[1, 512, 28, 28]] [1, 512, 28, 28] 0
Conv2D-47 [[1, 512, 28, 28]] [1, 256, 28, 28] 131,072
BatchNorm2D-47 [[1, 256, 28, 28]] [1, 256, 28, 28] 1,024
ReLU-25 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
AvgPool2D-2 [[1, 256, 28, 28]] [1, 256, 14, 14] 0
Conv2D-48 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-48 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-23 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-49 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-49 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-50 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-50 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-24 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-51 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-8 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-51 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-52 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-52 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Conv2D-46 [[1, 512, 28, 28]] [1, 1024, 14, 14] 524,288
BatchNorm2D-46 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-8 [[1, 512, 28, 28]] [1, 1024, 14, 14] 0
Conv2D-53 [[1, 1024, 14, 14]] [1, 256, 14, 14] 262,144
BatchNorm2D-53 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-28 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-54 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-54 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-26 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-55 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-55 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-56 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-56 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-27 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-57 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-9 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-57 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-58 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-58 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-9 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-59 [[1, 1024, 14, 14]] [1, 256, 14, 14] 262,144
BatchNorm2D-59 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-31 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-60 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-60 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-29 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-61 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-61 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-62 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-62 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-30 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-63 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-10 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-63 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-64 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-64 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-10 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-65 [[1, 1024, 14, 14]] [1, 256, 14, 14] 262,144
BatchNorm2D-65 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-34 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-66 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-66 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-32 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-67 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-67 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-68 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-68 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-33 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-69 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-11 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-69 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-70 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-70 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-11 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-71 [[1, 1024, 14, 14]] [1, 256, 14, 14] 262,144
BatchNorm2D-71 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-37 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-72 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-72 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-35 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-73 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-73 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-74 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-74 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-36 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-75 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-12 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-75 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-76 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-76 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-12 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-77 [[1, 1024, 14, 14]] [1, 256, 14, 14] 262,144
BatchNorm2D-77 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-40 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-78 [[1, 256, 14, 14]] [1, 256, 14, 14] 589,824
BatchNorm2D-78 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
ReLU-38 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-79 [[1, 256, 14, 14]] [1, 256, 14, 14] 65,536
BatchNorm2D-79 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-80 [[1, 512, 14, 14]] [1, 128, 14, 14] 65,536
BatchNorm2D-80 [[1, 128, 14, 14]] [1, 128, 14, 14] 512
ReLU-39 [[1, 128, 14, 14]] [1, 128, 14, 14] 0
Conv2D-81 [[1, 128, 14, 14]] [1, 2304, 14, 14] 297,216
CoTNetLayer-13 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BatchNorm2D-81 [[1, 256, 14, 14]] [1, 256, 14, 14] 1,024
Conv2D-82 [[1, 256, 14, 14]] [1, 1024, 14, 14] 262,144
BatchNorm2D-82 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 4,096
Bottleneck-13 [[1, 1024, 14, 14]] [1, 1024, 14, 14] 0
Conv2D-84 [[1, 1024, 14, 14]] [1, 512, 14, 14] 524,288
BatchNorm2D-84 [[1, 512, 14, 14]] [1, 512, 14, 14] 2,048
ReLU-43 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 0
AvgPool2D-3 [[1, 512, 14, 14]] [1, 512, 7, 7] 0
Conv2D-85 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,359,296
BatchNorm2D-85 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
ReLU-41 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
Conv2D-86 [[1, 512, 7, 7]] [1, 512, 7, 7] 262,144
BatchNorm2D-86 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-87 [[1, 1024, 7, 7]] [1, 256, 7, 7] 262,144
BatchNorm2D-87 [[1, 256, 7, 7]] [1, 256, 7, 7] 1,024
ReLU-42 [[1, 256, 7, 7]] [1, 256, 7, 7] 0
Conv2D-88 [[1, 256, 7, 7]] [1, 4608, 7, 7] 1,184,256
CoTNetLayer-14 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
BatchNorm2D-88 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-89 [[1, 512, 7, 7]] [1, 2048, 7, 7] 1,048,576
BatchNorm2D-89 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 8,192
Conv2D-83 [[1, 1024, 14, 14]] [1, 2048, 7, 7] 2,097,152
BatchNorm2D-83 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 8,192
Bottleneck-14 [[1, 1024, 14, 14]] [1, 2048, 7, 7] 0
Conv2D-90 [[1, 2048, 7, 7]] [1, 512, 7, 7] 1,048,576
BatchNorm2D-90 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
ReLU-46 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 0
Conv2D-91 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,359,296
BatchNorm2D-91 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
ReLU-44 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
Conv2D-92 [[1, 512, 7, 7]] [1, 512, 7, 7] 262,144
BatchNorm2D-92 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-93 [[1, 1024, 7, 7]] [1, 256, 7, 7] 262,144
BatchNorm2D-93 [[1, 256, 7, 7]] [1, 256, 7, 7] 1,024
ReLU-45 [[1, 256, 7, 7]] [1, 256, 7, 7] 0
Conv2D-94 [[1, 256, 7, 7]] [1, 4608, 7, 7] 1,184,256
CoTNetLayer-15 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
BatchNorm2D-94 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-95 [[1, 512, 7, 7]] [1, 2048, 7, 7] 1,048,576
BatchNorm2D-95 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 8,192
Bottleneck-15 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 0
Conv2D-96 [[1, 2048, 7, 7]] [1, 512, 7, 7] 1,048,576
BatchNorm2D-96 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
ReLU-49 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 0
Conv2D-97 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,359,296
BatchNorm2D-97 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
ReLU-47 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
Conv2D-98 [[1, 512, 7, 7]] [1, 512, 7, 7] 262,144
BatchNorm2D-98 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-99 [[1, 1024, 7, 7]] [1, 256, 7, 7] 262,144
BatchNorm2D-99 [[1, 256, 7, 7]] [1, 256, 7, 7] 1,024
ReLU-48 [[1, 256, 7, 7]] [1, 256, 7, 7] 0
Conv2D-100 [[1, 256, 7, 7]] [1, 4608, 7, 7] 1,184,256
CoTNetLayer-16 [[1, 512, 7, 7]] [1, 512, 7, 7] 0
BatchNorm2D-100 [[1, 512, 7, 7]] [1, 512, 7, 7] 2,048
Conv2D-101 [[1, 512, 7, 7]] [1, 2048, 7, 7] 1,048,576
BatchNorm2D-101 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 8,192
Bottleneck-16 [[1, 2048, 7, 7]] [1, 2048, 7, 7] 0
AvgPool2D-4 [[1, 2048, 7, 7]] [1, 2048, 1, 1] 0
Linear-1 [[1, 2048]] [1, 1000] 2,049,000
===========================================================================
Total params: 33,855,464
Trainable params: 33,711,464
Non-trainable params: 144,000
---------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 426.00
Params size (MB): 129.15
Estimated Total Size (MB): 555.72
---------------------------------------------------------------------------
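The parameter counts and size estimates in the summary above can be cross-checked by hand. The following is a minimal sketch in plain Python (no PaddlePaddle required), not part of the original model code. The helper names are hypothetical, and the kernel sizes, the bias flags and the 1x3x224x224 float32 input are assumptions inferred from the reported numbers rather than read from the implementation.

```python
def conv2d_params(in_ch, out_ch, kernel=1, groups=1, bias=False):
    """Number of weights (plus optional bias) in a 2D convolution."""
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """The reported counts match four per-channel vectors:
    scale, shift, running mean and running variance."""
    return 4 * channels


def linear_params(in_features, out_features):
    """Weight matrix plus bias vector of a fully connected layer."""
    return in_features * out_features + out_features


def size_mb(num_values, bytes_per_value=4):
    """Memory of `num_values` float32 numbers, in MB (1 MB = 1024**2 bytes)."""
    return num_values * bytes_per_value / 1024 ** 2


# (computed value, value reported in the summary table)
checks = {
    "Conv2D-16, assumed 3x3, 64 -> 64": (conv2d_params(64, 64, kernel=3), 36_864),
    "Conv2D-19, assumed 1x1 with bias, 32 -> 576": (conv2d_params(32, 576, bias=True), 19_008),
    "Conv2D-83, assumed 1x1, 1024 -> 2048": (conv2d_params(1024, 2048), 2_097_152),
    "Conv2D-88, assumed 1x1 with bias, 256 -> 4608": (conv2d_params(256, 4608, bias=True), 1_184_256),
    "BatchNorm2D-16, 64 channels": (batchnorm2d_params(64), 256),
    "Linear-1, 2048 -> 1000": (linear_params(2048, 1000), 2_049_000),
    "Trainable + non-trainable": (33_711_464 + 144_000, 33_855_464),
}

for name, (computed, reported) in checks.items():
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name}: computed {computed:,} / reported {reported:,} [{status}]")

# Size estimates, assuming float32 values and a 1x3x224x224 input.
input_mb = size_mb(1 * 3 * 224 * 224)
params_mb = size_mb(33_855_464)
forward_backward_mb = 426.00  # taken from the summary as-is
total_mb = input_mb + forward_backward_mb + params_mb

print(f"Input size (MB): {input_mb:.2f}")
print(f"Params size (MB): {params_mb:.2f}")
print(f"Estimated Total Size (MB): {total_mb:.2f}")
```

Under these assumptions every check agrees with the table. The 1x1 convolutions that expand to 576, 1152, 2304 and 4608 channels are the only convolutions listed here whose counts include a bias term.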