当前位置: 首页 > article >正文

OVD开放词汇检测 Detic 训练COCO数据集实践

0、引言

纯视觉检测当前研究基本比较饱和,继续创新提升空间很小,除非在CNN和transformer上提出更强基础建模方式。和文本结合是当前的一大趋势,也是计算机视觉和自然语言处理结合的未来趋势,目前和文本结合的目标检测工作还是有很大的研究空间的,毕竟两种不同模态的数据结合如何达到1+1>2的效果还是值得探索的,目前的结合方式效果也有限,实际还没有达到 >2 的预期效果。

visual grounding,VG,也叫视觉定位,当前有一些较成熟算法使文本和图像可以有效结合,而且文本数据处理比较方便,数据集不复杂,较容易上手,但是局限于提示一句话检测一个目标,缺乏目标检测对目标的通用检测功能。

open vocabulary object detection,OVD,也叫开放词汇目标检测,也是一种和文本结合的目标检测概念,和VG出现的时间都差不多,但是目前OVD的研究还不普遍,究其原因,其中数据集的复杂性是很多人却步的一点,其次网络结构较复杂,新人上手不太友好,所以发布的代码论文也不多。但是它有一个开放检测的优势,这是结合文本所产生的独特优势,并且可以通用检测,具有一定的实用性,可以说是比较理想的结合方式。

这里以Detic网络为例,来梳理如何跑起来一个OVD算法,可以进一步迁移到自制数据集中。

官网GitHub代码:https://github.com/facebookresearch/Detic/tree/main

具体的安装和数据集准备可以结合官网教程进行,主要介绍一些个人觉得官网不太详细的地方,其中我考虑了离线运行的方式,具体是数据集准备和代码框架部分。

1、代码框架安装:

除了直接下载官网代码外,还有子模块安装CenterNet2 Deformable DETR 官网是用git的方式和子模块一并安装,离线的方式就是直接从两个子模块官网下载代码,放到指定路径就行:

下载下来把main去了,改成如图文件夹名字即可,其中deformable DETR还有模块要编译:

进到图示文件夹,执行命令:

python setup.py build install

或者是(LINUX系统可以用下面命令):

cd third_party/Deformable-DETR/models/ops
./make.sh

其次就是预训练模型下载,涉及到 R-50.pklViT-B-32.pt等,直接搜索下载即可

2、数据集准备

个人觉得最恶心的就是数据集部分,挺复杂的。这里只对COCO数据集做说明,其他的可类比,整体流程可以按官网来,如下是定义的数据集:

数据集文件夹结构如下,metadata下载自带,都放于datasets文件夹下:

$Detic_ROOT/datasets/metadata/coco/

①下载,首先把下面的都给下了,官网coco2017数据集

coco/train2017/val2017/annotations/captions_train2017.jsoninstances_train2017.json instances_val2017.json

②类别划分,OVD的一大特点就是对训练集中没有的类别的进行检测,所以要把原有数据集进行划分,一部分类别用于训练,一部分类别用于验证,这个是OVR-CNN中的做法,Detic采用了该方式,后续很多研究也采用该方式,如下是我从OVR-CNN中获取的代码,这个Detic没有直接提供

"""
Prepare zero-shot split for COCO dataset.
Based on the paper: Bansal, Ankan, et al. "Zero-shot object detection." 
Proceedings of the European Conference on Computer Vision (ECCV). 2018.
"""import json
import numpy as np
import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.modeling.language_backbone.transformers import BERTdef load_coco_annotations():"""Load COCO annotations for training and validation sets."""with open('../datasets/coco/annotations/instances_train2017.json', 'r') as fin:coco_train_anno_all = json.load(fin)with open('../datasets/coco/annotations/instances_train2017.json', 'r') as fin:coco_train_anno_seen = json.load(fin)with open('../datasets/coco/annotations/instances_train2017.json', 'r') as fin:coco_train_anno_unseen = json.load(fin)with open('../datasets/coco/annotations/instances_val2017.json', 'r') as fin:coco_val_anno_all = json.load(fin)with open('../datasets/coco/annotations/instances_val2017.json', 'r') as fin:coco_val_anno_seen = json.load(fin)with open('../datasets/coco/annotations/instances_val2017.json', 'r') as fin:coco_val_anno_unseen = json.load(fin)return (coco_train_anno_all, coco_train_anno_seen, coco_train_anno_unseen,coco_val_anno_all, coco_val_anno_seen, coco_val_anno_unseen)def load_class_splits():"""Load seen and unseen class splits."""with open('../datasets/coco/zero-shot/mscoco_seen_classes.json', 'r') as fin:labels_seen = json.load(fin)with open('../datasets/coco/zero-shot/mscoco_unseen_classes.json', 'r') as fin:labels_unseen = json.load(fin)return labels_seen, labels_unseendef create_class_mappings(coco_val_anno_all, labels_seen, labels_unseen):"""Create mappings between class IDs and splits."""class_id_to_split = {}class_name_to_split = {}for item in coco_val_anno_all['categories']:if item['name'] in labels_seen:class_id_to_split[item['id']] = 'seen'class_name_to_split[item['name']] = 'seen'elif item['name'] in labels_unseen:class_id_to_split[item['id']] = 'unseen'class_name_to_split[item['name']] = 'unseen'return class_id_to_split, class_name_to_splitdef load_glove_embeddings(class_name_to_split):"""Load GloVe embeddings for classes."""class_name_to_glove = {}with open('../datasets/coco/zero-shot/glove.6B.300d.txt', 'r', encoding='utf-8') as fin:for row in fin:row_tk = row.split()if row_tk[0] in class_name_to_split:class_name_to_glove[row_tk[0]] = [float(num) for num in row_tk[1:]]return class_name_to_glovedef get_bert_embeddings(class_name_to_split):"""Get BERT embeddings for classes."""bert = BERT(cfg)bert = bert.to('cuda')class_name_to_bertemb = {}for c in class_name_to_split:if c not in bert.tokenizer.vocab:print(f'{c} not found')continuecid = bert.tokenizer.vocab[c]class_name_to_bertemb[c] = bert.embeddings[cid]class_list = list(class_name_to_split.keys())encoded_class_list = bert(class_list)mask = (1 - encoded_class_list['special_tokens_mask']).to(torch.float32)embeddings = (encoded_class_list['input_embeddings'] * mask[:, :, None]).sum(1) / mask.sum(1)[:, None]embeddings = embeddings.cpu().numpy()class_name_to_bertemb = {}for c, emb in zip(class_list, embeddings.tolist()):class_name_to_bertemb[c] = embreturn class_name_to_bertembdef filter_annotation(anno_dict, split_name_list, class_id_to_split, class_name_to_glove, class_name_to_bertemb):"""Filter annotations based on split names."""filtered_categories = []for item in anno_dict['categories']:if class_id_to_split.get(item['id']) in split_name_list:item['embedding'] = {}item['embedding']['GloVE'] = class_name_to_glove[item['name']]item['embedding']['BertEmb'] = class_name_to_bertemb[item['name']]item['split'] = class_id_to_split.get(item['id'])filtered_categories.append(item)anno_dict['categories'] = filtered_categoriesfiltered_images = []filtered_annotations = []useful_image_ids = set()for item in anno_dict['annotations']:if class_id_to_split.get(item['category_id']) in split_name_list:filtered_annotations.append(item)useful_image_ids.add(item['image_id'])for item in anno_dict['images']:if item['id'] in useful_image_ids:filtered_images.append(item)anno_dict['annotations'] = filtered_annotationsanno_dict['images'] = filtered_imagesdef save_filtered_annotations(coco_train_anno_seen, coco_train_anno_unseen, coco_train_anno_all,coco_val_anno_seen, coco_val_anno_unseen, coco_val_anno_all):"""Save filtered annotations to JSON files."""with open('../datasets/coco/zero-shot/instances_train2017_seen_2.json', 'w') as fout:json.dump(coco_train_anno_seen, fout)with open('../datasets/coco/zero-shot/instances_train2017_unseen_2.json', 'w') as fout:json.dump(coco_train_anno_unseen, fout)with open('../datasets/coco/zero-shot/instances_train2017_all_2.json', 'w') as fout:json.dump(coco_train_anno_all, fout)with open('../datasets/coco/zero-shot/instances_val2017_seen_2.json', 'w') as fout:json.dump(coco_val_anno_seen, fout)with open('../datasets/coco/zero-shot/instances_val2017_unseen_2.json', 'w') as fout:json.dump(coco_val_anno_unseen, fout)with open('../datasets/coco/zero-shot/instances_val2017_all_2.json', 'w') as fout:json.dump(coco_val_anno_all, fout)def main():# Load annotations(coco_train_anno_all, coco_train_anno_seen, coco_train_anno_unseen,coco_val_anno_all, coco_val_anno_seen, coco_val_anno_unseen) = load_coco_annotations()# Load class splitslabels_seen, labels_unseen = load_class_splits()# Create class mappingsclass_id_to_split, class_name_to_split = create_class_mappings(coco_val_anno_all, labels_seen, labels_unseen)# Load GloVe embeddingsclass_name_to_glove = load_glove_embeddings(class_name_to_split)# Get BERT embeddingsclass_name_to_bertemb = get_bert_embeddings(class_name_to_split)# Filter annotationsfilter_annotation(coco_train_anno_seen, ['seen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)filter_annotation(coco_train_anno_unseen, ['unseen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)filter_annotation(coco_train_anno_all, ['seen', 'unseen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)filter_annotation(coco_val_anno_seen, ['seen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)filter_annotation(coco_val_anno_unseen, ['unseen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)filter_annotation(coco_val_anno_all, ['seen', 'unseen'], class_id_to_split, class_name_to_glove, class_name_to_bertemb)# Save filtered annotationssave_filtered_annotations(coco_train_anno_seen, coco_train_anno_unseen, coco_train_anno_all,coco_val_anno_seen, coco_val_anno_unseen, coco_val_anno_all)if __name__ == '__main__':main()

把这个代码放到tools文件夹,直接python tools/xxx.py执行即可,

但是要提前准备好 glove.6B.300d.txt,mscoco_seen_classes.json,mscoco_unseen_classes.json几个文件,后面两个直接去GitHub的OVR-CNN官网去下载,glove.6B.300d.txt直接搜索自行下载,统一放到 coco/zero-shot文件夹下,执行后会生成6个新的文件:

剩下的数据集操作就和Detic官网一致了,

python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_train2017_seen_2.json
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_val2017_all_2.json

执行如上两条命令得到如下绿框的两个标签文件然后是上图最后一个标签文件

python tools/get_lvis_cat_info.py --ann datasets/coco/zero-shot/instances_train2017_seen_2_oriorder.py

如上就把coco/zero-shot所需要的标签都生成好了

③coco文本描述标签生成

python tools/get_cc_tags.py --cc_ann datasets/coco/annotations/captions_train2017.json --out_path datasets/coco/captions_train2017_tags_allcaps.json --allcaps --convert_caption --cat_path datasets/coco/annotations/instances_val2017.json

这会生成一个captions_train2017_tags_allcaps.json文件,需要放入coco/annotations文件夹下,至此,coco支持OVD训练的数据就都准备好了

④其他数据说明。从配置文件中可以看到,会使用metadata下面的 datasets/metadata/coco_clip_a+cname.npy 

coco_clip_a+cname.npy 文件是通过 tools/dump_clip_features.py 脚本生成的。生成过程如下:

脚本功能:

  • 读取数据集的类别信息(从 JSON 文件)
  • 使用 CLIP 模型生成每个类别的文本嵌入向量
  • 将生成的嵌入向量保存为 numpy 文件

命令如下:

python tools/dump_clip_features.py \--ann datasets/coco/annotations/instances_val2017.json \--out_path datasets/metadata/coco_clip_a+cname.npy

3、进行训练

单卡就可以训练,把batch size调小一点。根据提供的模型来看,是先进行有监督训练,如下红框的配置文件,在有监督的基础上得到预训练模型用来训绿框的配置,以此得到具有OVD功能的模型

单卡训练代码:

python train_net.py --num-gpus 1 --config-file .\configs\BoxSup_OVCOCO_CLIP_R50_1x.yaml

训练界面:

Command Line Args: Namespace(config_file='.\\configs\\BoxSup_OVCOCO_CLIP_R50_1x.yaml', dist_url='tcp://127.0.0.1:14167', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[06/01 18:32:26 detectron2]: Rank of current process: 0. World size: 1
[06/01 18:32:27 detectron2]: Environment info:
----------------------  -------------------------------------------------------------------------------------------------
sys.platform            win32
Python                  3.8.18 (default, Sep 11 2023, 13:39:12) [MSC v.1916 64 bit (AMD64)]
numpy                   1.23.2
detectron2              0.6 @f:\llava_grounding_main\detectron2\detectron2
Compiler                MSVC 193732825
CUDA compiler           CUDA 12.1
detectron2 arch flags   f:\llava_grounding_main\detectron2\detectron2\_C.cp38-win_amd64.pyd; cannot find cuobjdump
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 2.4.0+cu121 @C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   NVIDIA GeForce RTX 3060 (arch=8.6)
Driver version          560.94
CUDA_HOME               C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
Pillow                  9.5.0
torchvision             0.19.0+cu121 @C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torchvision
torchvision arch flags  C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore                  0.1.5.post20221221
iopath                  0.1.7
cv2                     4.8.1
----------------------  -------------------------------------------------------------------------------------------------
PyTorch built with:- C++ Version: 201703- MSVC 192930154- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications- Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)- OpenMP 2019- LAPACK is enabled (usually provided by MKL)- CPU capability usage: AVX2- CUDA Runtime 12.1- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90- CuDNN 90.1  (built against CUDA 12.4)- Magma 2.5.4- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,[06/01 18:32:27 detectron2]: Command line arguments: Namespace(config_file='.\\configs\\BoxSup_OVCOCO_CLIP_R50_1x.yaml', dist_url='tcp://127.0.0.1:14167', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[06/01 18:32:27 detectron2]: Contents of args.config_file=.\configs\BoxSup_OVCOCO_CLIP_R50_1x.yaml:
_BASE_: "Base_OVCOCO_C4_1x.yaml"[06/01 18:32:27 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:ASPECT_RATIO_GROUPING: trueDATASET_ANN:- box- boxDATASET_BS:- 8- 32DATASET_INPUT_SCALE:- &id001- 0.1- 2.0- - 0.5- 1.5DATASET_INPUT_SIZE:- 896- 384DATASET_MAX_SIZES:- 1333- 667DATASET_MIN_SIZES:- - 640- 800- - 320- 400DATASET_RATIO:- 1- 1FILTER_EMPTY_ANNOTATIONS: trueMULTI_DATASET_GROUPING: falseNUM_WORKERS: 4REPEAT_THRESHOLD: 0.0SAMPLER_TRAIN: TrainingSamplerTARFILE_PATH: datasets/imagenet/metadata-22k/tar_files.npyTAR_INDEX_DIR: datasets/imagenet/metadata-22k/tarindex_npyUSE_DIFF_BS_SIZE: falseUSE_RFS:- false- falseUSE_TAR_DATASET: false
DATASETS:PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000PROPOSAL_FILES_TEST: []PROPOSAL_FILES_TRAIN: []TEST:- coco_generalized_zeroshot_valTRAIN:- coco_zeroshot_train_oriorder
DEBUG: false
DEBUG_SHOW_NAME: false
EVAL_AP_FIX: false
EVAL_CAT_SPEC_AR: false
EVAL_PRED_AR: false
EVAL_PROPOSAL_AR: false
FIND_UNUSED_PARAM: true
FP16: true
GEN_PSEDO_LABELS: false
GLOBAL:HACK: 1.0
INPUT:CROP:ENABLED: falseSIZE:- 0.9- 0.9TYPE: relative_rangeCUSTOM_AUG: ''FORMAT: BGRMASK_FORMAT: polygonMAX_SIZE_TEST: 1333MAX_SIZE_TRAIN: 1333MIN_SIZE_TEST: 800MIN_SIZE_TRAIN:- 800MIN_SIZE_TRAIN_SAMPLING: choiceNOT_CLAMP_BOX: falseRANDOM_FLIP: horizontalSCALE_RANGE: *id001TEST_INPUT_TYPE: defaultTEST_SIZE: 640TRAIN_SIZE: 640
IS_DEBUG: false
MODEL:ANCHOR_GENERATOR:ANGLES:- - -90- 0- 90ASPECT_RATIOS:- - 0.5- 1.0- 2.0NAME: DefaultAnchorGeneratorOFFSET: 0.0SIZES:- - 32- 64- 128- 256- 512BACKBONE:FREEZE_AT: 2NAME: build_resnet_backboneBIFPN:NORM: GNNUM_BIFPN: 6NUM_LEVELS: 5OUT_CHANNELS: 160SEPARABLE_CONV: falseCAP_BATCH_RATIO: 4CENTERNET:AS_PROPOSAL: falseCENTER_NMS: falseFPN_STRIDES:- 8- 16- 32- 64- 128HM_FOCAL_ALPHA: 0.25HM_FOCAL_BETA: 4HM_MIN_OVERLAP: 0.8IGNORE_HIGH_FP: -1.0INFERENCE_TH: 0.05IN_FEATURES:- p3- p4- p5- p6- p7LOC_LOSS_TYPE: giouLOSS_GAMMA: 2.0MIN_RADIUS: 4MORE_POS: falseMORE_POS_THRESH: 0.2MORE_POS_TOPK: 9NEG_WEIGHT: 1.0NMS_TH_TEST: 0.6NMS_TH_TRAIN: 0.6NORM: GNNOT_NMS: falseNOT_NORM_REG: trueNO_REDUCE: falseNUM_BOX_CONVS: 4NUM_CLASSES: 80NUM_CLS_CONVS: 4NUM_SHARE_CONVS: 0ONLY_PROPOSAL: falsePOST_NMS_TOPK_TEST: 100POST_NMS_TOPK_TRAIN: 100POS_WEIGHT: 1.0PRE_NMS_TOPK_TEST: 1000PRE_NMS_TOPK_TRAIN: 1000PRIOR_PROB: 0.01REG_WEIGHT: 2.0SIGMOID_CLAMP: 0.0001SOI:- - 0- 80- - 64- 160- - 128- 320- - 256- 640- - 512- 10000000USE_DEFORMABLE: falseWITH_AGN_HM: falseDATASET_LOSS_WEIGHT: []DETR:CLS_WEIGHT: 2.0DEC_LAYERS: 6DEEP_SUPERVISION: trueDIM_FEEDFORWARD: 2048DROPOUT: 0.1ENC_LAYERS: 6FOCAL_ALPHA: 0.25FROZEN_WEIGHTS: ''GIOU_WEIGHT: 2.0HIDDEN_DIM: 256L1_WEIGHT: 5.0NHEADS: 8NO_OBJECT_WEIGHT: 0.1NUM_CLASSES: 80NUM_FEATURE_LEVELS: 4NUM_OBJECT_QUERIES: 100PRE_NORM: falseTWO_STAGE: falseUSE_FED_LOSS: falseWEAK_WEIGHT: 0.1WITH_BOX_REFINE: falseDEVICE: cudaDLA:DLAUP_IN_FEATURES:- dla3- dla4- dla5DLAUP_NODE: convMS_OUTPUT: falseNORM: BNNUM_LAYERS: 34OUT_FEATURES:- dla2USE_DLA_UP: trueDYNAMIC_CLASSIFIER: falseFPN:FUSE_TYPE: sumIN_FEATURES: []NORM: ''OUT_CHANNELS: 256KEYPOINT_ON: falseLOAD_PROPOSALS: falseMASK_ON: falseMETA_ARCHITECTURE: CustomRCNNNUM_SAMPLE_CATS: 50PANOPTIC_FPN:COMBINE:ENABLED: trueINSTANCES_CONFIDENCE_THRESH: 0.5OVERLAP_THRESH: 0.5STUFF_AREA_LIMIT: 4096INSTANCE_LOSS_WEIGHT: 1.0PIXEL_MEAN:- 103.53- 116.28- 123.675PIXEL_STD:- 1.0- 1.0- 1.0PROPOSAL_GENERATOR:MIN_SIZE: 0NAME: RPNRESET_CLS_TESTS: falseRESNETS:DEFORM_MODULATED: falseDEFORM_NUM_GROUPS: 1DEFORM_ON_PER_STAGE:- false- false- false- falseDEPTH: 50NORM: FrozenBNNUM_GROUPS: 1OUT_FEATURES:- res4RES2_OUT_CHANNELS: 256RES5_DILATION: 1STEM_OUT_CHANNELS: 64STRIDE_IN_1X1: trueWIDTH_PER_GROUP: 64RETINANET:BBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_WEIGHTS: &id003- 1.0- 1.0- 1.0- 1.0FOCAL_LOSS_ALPHA: 0.25FOCAL_LOSS_GAMMA: 2.0IN_FEATURES:- p3- p4- p5- p6- p7IOU_LABELS:- 0- -1- 1IOU_THRESHOLDS:- 0.4- 0.5NMS_THRESH_TEST: 0.5NORM: ''NUM_CLASSES: 80NUM_CONVS: 4PRIOR_PROB: 0.01SCORE_THRESH_TEST: 0.05SMOOTH_L1_LOSS_BETA: 0.1TOPK_CANDIDATES_TEST: 1000ROI_BOX_CASCADE_HEAD:BBOX_REG_WEIGHTS:- &id002- 10.0- 10.0- 5.0- 5.0- - 20.0- 20.0- 10.0- 10.0- - 30.0- 30.0- 15.0- 15.0IOUS:- 0.5- 0.6- 0.7ROI_BOX_HEAD:ADD_FEATURE_TO_PROP: falseADD_IMAGE_BOX: falseBBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_LOSS_WEIGHT: 1.0BBOX_REG_WEIGHTS: *id002CAPTION_WEIGHT: 1.0CAT_FREQ_PATH: datasets/coco/zero-shot/instances_train2017_seen_2_oriorder_cat_info.jsonCLS_AGNOSTIC_BBOX_REG: trueCONV_DIM: 256EQL_FREQ_CAT: 200FC_DIM: 1024FED_LOSS_FREQ_WEIGHT: 0.5FED_LOSS_NUM_CAT: 50IGNORE_ZERO_CATS: trueIMAGE_BOX_SIZE: 1.0IMAGE_LABEL_LOSS: max_sizeIMAGE_LOSS_WEIGHT: 0.1MULT_PROPOSAL_SCORE: falseNAME: ''NEG_CAP_WEIGHT: 0.125NORM: ''NORM_TEMP: 50.0NORM_WEIGHT: trueNUM_CONV: 0NUM_FC: 0POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2PRIOR_PROB: 0.01SMOOTH_L1_BETA: 0.0SOFTMAX_WEAK_LOSS: falseTRAIN_ON_PRED_BOXES: falseUSE_BIAS: 0.0USE_EQL_LOSS: falseUSE_FED_LOSS: falseUSE_SIGMOID_CE: trueUSE_ZEROSHOT_CLS: trueWITH_SOFTMAX_PROP: falseWS_NUM_PROPS: 128ZEROSHOT_WEIGHT_DIM: 512ZEROSHOT_WEIGHT_PATH: datasets/metadata/coco_clip_a+cname.npyROI_HEADS:BATCH_SIZE_PER_IMAGE: 512IN_FEATURES:- res4IOU_LABELS:- 0- 1IOU_THRESHOLDS:- 0.5MASK_WEIGHT: 1.0NAME: CustomRes5ROIHeadsNMS_THRESH_TEST: 0.5NUM_CLASSES: 80ONE_CLASS_PER_PROPOSAL: falsePOSITIVE_FRACTION: 0.25PROPOSAL_APPEND_GT: trueSCORE_THRESH_TEST: 0.05ROI_KEYPOINT_HEAD:CONV_DIMS:- 512- 512- 512- 512- 512- 512- 512- 512LOSS_WEIGHT: 1.0MIN_KEYPOINTS_PER_IMAGE: 1NAME: KRCNNConvDeconvUpsampleHeadNORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: trueNUM_KEYPOINTS: 17POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2ROI_MASK_HEAD:CLS_AGNOSTIC_MASK: falseCONV_DIM: 256NAME: MaskRCNNConvUpsampleHeadNORM: ''NUM_CONV: 0POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2RPN:BATCH_SIZE_PER_IMAGE: 256BBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_LOSS_WEIGHT: 1.0BBOX_REG_WEIGHTS: *id003BOUNDARY_THRESH: -1CONV_DIMS:- -1HEAD_NAME: StandardRPNHeadIN_FEATURES:- res4IOU_LABELS:- 0- -1- 1IOU_THRESHOLDS:- 0.3- 0.7LOSS_WEIGHT: 1.0NMS_THRESH: 0.7POSITIVE_FRACTION: 0.5POST_NMS_TOPK_TEST: 1000POST_NMS_TOPK_TRAIN: 2000PRE_NMS_TOPK_TEST: 6000PRE_NMS_TOPK_TRAIN: 12000SMOOTH_L1_BETA: 0.0SEM_SEG_HEAD:COMMON_STRIDE: 4CONVS_DIM: 128IGNORE_VALUE: 255IN_FEATURES:- p2- p3- p4- p5LOSS_WEIGHT: 1.0NAME: SemSegFPNHeadNORM: GNNUM_CLASSES: 54SWIN:OUT_FEATURES:- 1- 2- 3SIZE: TUSE_CHECKPOINT: falseSYNC_CAPTION_BATCH: falseTEST_CLASSIFIERS: []TEST_NUM_CLASSES: []TIMM:BASE_NAME: resnet50FREEZE_AT: 0NORM: FrozenBNOUT_LEVELS:- 3- 4- 5PRETRAINED: falseWEIGHTS: models/R-50.pklWITH_CAPTION: false
OUTPUT_DIR: output/Detic-COCO/BoxSup_OVCOCO_CLIP_R50_1x
QUICK_DEBUG: false
SAVE_DEBUG: false
SAVE_DEBUG_PATH: output/save_debug/
SAVE_PTH: false
SEED: -1
SOLVER:AMP:ENABLED: falseBACKBONE_MULTIPLIER: 1.0BASE_LR: 0.02BASE_LR_END: 0.0BIAS_LR_FACTOR: 1.0CHECKPOINT_PERIOD: 1000000000CLIP_GRADIENTS:CLIP_TYPE: valueCLIP_VALUE: 1.0ENABLED: falseNORM_TYPE: 2.0CUSTOM_MULTIPLIER: 1.0CUSTOM_MULTIPLIER_NAME: []GAMMA: 0.1IMS_PER_BATCH: 2LR_SCHEDULER_NAME: WarmupMultiStepLRMAX_ITER: 90000MOMENTUM: 0.9NESTEROV: falseOPTIMIZER: SGDREFERENCE_WORLD_SIZE: 0RESET_ITER: falseSTEPS:- 60000- 80000TRAIN_ITER: -1USE_CUSTOM_SOLVER: falseWARMUP_FACTOR: 0.001WARMUP_ITERS: 1000WARMUP_METHOD: linearWEIGHT_DECAY: 0.0001WEIGHT_DECAY_BIAS: nullWEIGHT_DECAY_NORM: 0.0
TEST:AUG:ENABLED: falseFLIP: trueMAX_SIZE: 4000MIN_SIZES:- 400- 500- 600- 700- 800- 900- 1000- 1100- 1200DETECTIONS_PER_IMAGE: 100EVAL_PERIOD: 0EXPECTED_RESULTS: []KEYPOINT_OKS_SIGMAS: []PRECISE_BN:ENABLED: falseNUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
VIS_THRESH: 0.3
WITH_IMAGE_LABELS: false[06/01 18:32:27 detectron2]: Full config saved to output/Detic-COCO/BoxSup_OVCOCO_CLIP_R50_1x\config.yaml
[06/01 18:32:28 d2.utils.env]: Using a generated random seed 28139170
[06/01 18:32:29 detectron2]: Model:
CustomRCNN((backbone): ResNet((stem): BasicStem((conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)))(res2): Sequential((0): BottleneckBlock((shortcut): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))))(res3): Sequential((0): BottleneckBlock((shortcut): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(3): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))))(res4): Sequential((0): BottleneckBlock((shortcut): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05))(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(3): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(4): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(5): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))))(proposal_generator): RPN((rpn_head): StandardRPNHead((conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)(activation): ReLU())(objectness_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))(anchor_deltas): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1)))(anchor_generator): DefaultAnchorGenerator((cell_anchors): BufferList()))(roi_heads): CustomRes5ROIHeads((pooler): ROIPooler((level_poolers): ModuleList((0): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)))(res5): Sequential((0): BottleneckBlock((shortcut): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05))(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05))))(box_predictor): DeticFastRCNNOutputLayers((cls_score): ZeroShotClassifier((linear): Linear(in_features=2048, out_features=512, bias=True))(bbox_pred): Sequential((0): Linear(in_features=2048, out_features=2048, bias=True)(1): ReLU(inplace=True)(2): Linear(in_features=2048, out_features=4, bias=True))))
)
[06/01 18:32:29 fvcore.common.checkpoint]: [Checkpointer] Loading from models/R-50.pkl ...
[06/01 18:32:29 d2.checkpoint.c2_model_loading]: Renaming Caffe2 weights ......
[06/01 18:32:29 d2.checkpoint.c2_model_loading]: Following weights matched with model:
| Names in Model              | Names in Checkpoint      | Shapes                                          |
|:----------------------------|:-------------------------|:------------------------------------------------|
| backbone.res2.0.conv1.*     | res2_0_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,1,1)             |
| backbone.res2.0.conv2.*     | res2_0_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| backbone.res2.0.conv3.*     | res2_0_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| backbone.res2.0.shortcut.*  | res2_0_branch1_{bn_*,w}  | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| backbone.res2.1.conv1.*     | res2_1_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,256,1,1)            |
| backbone.res2.1.conv2.*     | res2_1_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| backbone.res2.1.conv3.*     | res2_1_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| backbone.res2.2.conv1.*     | res2_2_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,256,1,1)            |
| backbone.res2.2.conv2.*     | res2_2_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| backbone.res2.2.conv3.*     | res2_2_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| backbone.res3.0.conv1.*     | res3_0_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,256,1,1)       |
| backbone.res3.0.conv2.*     | res3_0_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| backbone.res3.0.conv3.*     | res3_0_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| backbone.res3.0.shortcut.*  | res3_0_branch1_{bn_*,w}  | (512,) (512,) (512,) (512,) (512,256,1,1)       |
| backbone.res3.1.conv1.*     | res3_1_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| backbone.res3.1.conv2.*     | res3_1_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| backbone.res3.1.conv3.*     | res3_1_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| backbone.res3.2.conv1.*     | res3_2_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| backbone.res3.2.conv2.*     | res3_2_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| backbone.res3.2.conv3.*     | res3_2_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| backbone.res3.3.conv1.*     | res3_3_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| backbone.res3.3.conv2.*     | res3_3_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| backbone.res3.3.conv3.*     | res3_3_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| backbone.res4.0.conv1.*     | res4_0_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,512,1,1)       |
| backbone.res4.0.conv2.*     | res4_0_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.0.conv3.*     | res4_0_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.res4.0.shortcut.*  | res4_0_branch1_{bn_*,w}  | (1024,) (1024,) (1024,) (1024,) (1024,512,1,1)  |
| backbone.res4.1.conv1.*     | res4_1_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| backbone.res4.1.conv2.*     | res4_1_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.1.conv3.*     | res4_1_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.res4.2.conv1.*     | res4_2_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| backbone.res4.2.conv2.*     | res4_2_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.2.conv3.*     | res4_2_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.res4.3.conv1.*     | res4_3_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| backbone.res4.3.conv2.*     | res4_3_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.3.conv3.*     | res4_3_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.res4.4.conv1.*     | res4_4_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| backbone.res4.4.conv2.*     | res4_4_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.4.conv3.*     | res4_4_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.res4.5.conv1.*     | res4_5_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| backbone.res4.5.conv2.*     | res4_5_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| backbone.res4.5.conv3.*     | res4_5_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| backbone.stem.conv1.norm.*  | res_conv1_bn_*           | (64,) (64,) (64,) (64,)                         |
| backbone.stem.conv1.weight  | conv1_w                  | (64, 3, 7, 7)                                   |
| roi_heads.res5.0.conv1.*    | res5_0_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,1024,1,1)      |
| roi_heads.res5.0.conv2.*    | res5_0_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| roi_heads.res5.0.conv3.*    | res5_0_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
| roi_heads.res5.0.shortcut.* | res5_0_branch1_{bn_*,w}  | (2048,) (2048,) (2048,) (2048,) (2048,1024,1,1) |
| roi_heads.res5.1.conv1.*    | res5_1_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,2048,1,1)      |
| roi_heads.res5.1.conv2.*    | res5_1_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| roi_heads.res5.1.conv3.*    | res5_1_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
| roi_heads.res5.2.conv1.*    | res5_2_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,2048,1,1)      |
| roi_heads.res5.2.conv2.*    | res5_2_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| roi_heads.res5.2.conv3.*    | res5_2_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
WARNING [06/01 18:32:29 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_predictor.bbox_pred.0.{bias, weight}
roi_heads.box_predictor.bbox_pred.2.{bias, weight}
roi_heads.box_predictor.cls_score.linear.{bias, weight}
roi_heads.box_predictor.cls_score.zs_weight
roi_heads.box_predictor.freq_weight
WARNING [06/01 18:32:29 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:fc1000.{bias, weight}stem.conv1.bias
[06/01 18:32:29 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]
[06/01 18:32:45 d2.data.datasets.coco]: Loading datasets\coco/zero-shot/instances_train2017_seen_2_oriorder.json takes 16.23 seconds.
[06/01 18:32:46 d2.data.datasets.coco]: Loaded 107761 images in COCO format from datasets\coco/zero-shot/instances_train2017_seen_2_oriorder.json
[06/01 18:32:52 d2.data.build]: Removed 0 images with no usable annotations. 107761 images left.
[06/01 18:32:59 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 257253       |   bicycle    | 7056         |      car      | 43533        |
|  motorcycle   | 8654         |   airplane   | 0            |      bus      | 0            |
|     train     | 4570         |    truck     | 9970         |     boat      | 10576        |
| traffic light | 0            | fire hydrant | 0            |   stop sign   | 0            |
| parking meter | 0            |    bench     | 9820         |     bird      | 10542        |
|      cat      | 0            |     dog      | 0            |     horse     | 6567         |
|     sheep     | 9223         |     cow      | 0            |   elephant    | 0            |
|     bear      | 1294         |    zebra     | 5269         |    giraffe    | 5128         |
|   backpack    | 8714         |   umbrella   | 0            |    handbag    | 12342        |
|      tie      | 0            |   suitcase   | 6112         |    frisbee    | 2681         |
|     skis      | 6623         |  snowboard   | 0            |  sports ball  | 0            |
|     kite      | 8802         | baseball bat | 0            | baseball gl.. | 0            |
|  skateboard   | 0            |  surfboard   | 6095         | tennis racket | 0            |
|    bottle     | 24070        |  wine glass  | 0            |      cup      | 0            |
|     fork      | 5474         |    knife     | 0            |     spoon     | 6159         |
|     bowl      | 14323        |    banana    | 9195         |     apple     | 5776         |
|   sandwich    | 4356         |    orange    | 6302         |   broccoli    | 7261         |
|    carrot     | 7758         |   hot dog    | 0            |     pizza     | 5807         |
|     donut     | 7005         |     cake     | 0            |     chair     | 38073        |
|     couch     | 0            | potted plant | 0            |      bed      | 4192         |
| dining table  | 0            |    toilet    | 4149         |      tv       | 5803         |
|    laptop     | 4960         |    mouse     | 2261         |    remote     | 5700         |
|   keyboard    | 0            |  cell phone  | 0            |   microwave   | 1672         |
|     oven      | 3334         |   toaster    | 225          |     sink      | 0            |
| refrigerator  | 2634         |     book     | 24077        |     clock     | 6320         |
|     vase      | 6577         |   scissors   | 0            |  teddy bear   | 0            |
|  hair drier   | 0            |  toothbrush  | 1945         |               |              |
|     total     | 656232       |              |              |               |              |
[06/01 18:32:59 d2.data.build]: Using training sampler TrainingSampler
[06/01 18:32:59 d2.data.common]: Serializing 107761 elements to byte tensors and concatenating them all ...
[06/01 18:33:02 d2.data.common]: Serialized dataset takes 361.37 MiB
train_net.py:141: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.scaler = GradScaler()
[06/01 18:33:04 detectron2]: Starting training from iteration 0
F:\Detic-main\detic\modeling\meta_arch\custom_rcnn.py:133: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.with autocast():
C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch\functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3610.)return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch\optim\lr_scheduler.py:216: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-ratewarnings.warn(
[06/01 18:33:44 d2.utils.events]:  eta: 19:57:04  iter: 20  total_loss: 2.692  loss_cls: 1.654  loss_box_reg: 0.1041  loss_rpn_cls: 0.6569  loss_rpn_loc: 0.2909  time: 0.8569  data_time: 1.1773  lr: 0.00039962  max_mem: 4648M
[06/01 18:34:00 d2.utils.events]:  eta: 19:50:43  iter: 40  total_loss: 3.211  loss_cls: 1.362  loss_box_reg: 0.4589  loss_rpn_cls: 0.5044  loss_rpn_loc: 0.5539  time: 0.8242  data_time: 0.0033  lr: 0.00079922  max_mem: 4648M
[06/01 18:34:14 d2.utils.events]:  eta: 19:41:28  iter: 60  total_loss: 3.191  loss_cls: 2.113  loss_box_reg: 0.1936  loss_rpn_cls: 0.5298  loss_rpn_loc: 0.3124  time: 0.7772  data_time: 0.0035  lr: 0.0011988  max_mem: 4648M
[06/01 18:34:28 d2.utils.events]:  eta: 19:30:39  iter: 80  total_loss: 2.688  loss_cls: 1.888  loss_box_reg: 0.0985  loss_rpn_cls: 0.489  loss_rpn_loc: 0.2139  time: 0.7650  data_time: 0.0029  lr: 0.0015984  max_mem: 4648M
[06/01 18:34:43 d2.utils.events]:  eta: 19:26:52  iter: 100  total_loss: 2.677  loss_cls: 1.859  loss_box_reg: 0.1172  loss_rpn_cls: 0.4848  loss_rpn_loc: 0.1601  time: 0.7560  data_time: 0.0059  lr: 0.001998  max_mem: 4648M
[06/01 18:34:56 d2.utils.events]:  eta: 19:14:31  iter: 120  total_loss: 2.585  loss_cls: 1.628  loss_box_reg: 0.1362  loss_rpn_cls: 0.4333  loss_rpn_loc: 0.2356  time: 0.7354  data_time: 0.0089  lr: 0.0023976  max_mem: 4648M
[06/01 18:35:08 d2.utils.events]:  eta: 18:31:02  iter: 140  total_loss: 2.278  loss_cls: 1.44  loss_box_reg: 0.1201  loss_rpn_cls: 0.4139  loss_rpn_loc: 0.199  time: 0.7191  data_time: 0.0030  lr: 0.0027972  max_mem: 4648M
[06/01 18:35:22 d2.utils.events]:  eta: 17:49:20  iter: 160  total_loss: 1.838  loss_cls: 1.102  loss_box_reg: 0.1764  loss_rpn_cls: 0.4233  loss_rpn_loc: 0.1694  time: 0.7117  data_time: 0.0030  lr: 0.0031968  max_mem: 4648M
[06/01 18:35:36 d2.utils.events]:  eta: 18:02:41  iter: 180  total_loss: 2.179  loss_cls: 0.8355  loss_box_reg: 0.212  loss_rpn_cls: 0.4705  loss_rpn_loc: 0.3198  time: 0.7143  data_time: 0.0083  lr: 0.0035964  max_mem: 4648M
[06/01 18:35:52 d2.utils.events]:  eta: 18:30:17  iter: 200  total_loss: 2.071  loss_cls: 0.7756  loss_box_reg: 0.1469  loss_rpn_cls: 0.4746  loss_rpn_loc: 0.297  time: 0.7191  data_time: 0.0031  lr: 0.003996  max_mem: 4648M
[06/01 18:36:06 d2.utils.events]:  eta: 18:20:54  iter: 220  total_loss: 1.531  loss_cls: 0.4182  loss_box_reg: 0.214  loss_rpn_cls: 0.4799  loss_rpn_loc: 0.2744  time: 0.7186  data_time: 0.0031  lr: 0.0043956  max_mem: 4648M
[06/01 18:36:18 d2.utils.events]:  eta: 17:59:07  iter: 240  total_loss: 1.333  loss_cls: 0.4441  loss_box_reg: 0.1751  loss_rpn_cls: 0.4943  loss_rpn_loc: 0.2628  time: 0.7098  data_time: 0.0034  lr: 0.0047952  max_mem: 4648M
[06/01 18:36:31 d2.utils.events]:  eta: 17:45:09  iter: 260  total_loss: 1.232  loss_cls: 0.3432  loss_box_reg: 0.1424  loss_rpn_cls: 0.4351  loss_rpn_loc: 0.2283  time: 0.7027  data_time: 0.0027  lr: 0.0051948  max_mem: 4648M
[06/01 18:36:44 d2.utils.events]:  eta: 17:41:25  iter: 280  total_loss: 1.024  loss_cls: 0.2882  loss_box_reg: 0.09724  loss_rpn_cls: 0.4378  loss_rpn_loc: 0.2105  time: 0.7011  data_time: 0.0028  lr: 0.0055944  max_mem: 4648M
[06/01 18:36:58 d2.utils.events]:  eta: 17:39:48  iter: 300  total_loss: 1.087  loss_cls: 0.2558  loss_box_reg: 0.08751  loss_rpn_cls: 0.4812  loss_rpn_loc: 0.2395  time: 0.6986  data_time: 0.0033  lr: 0.005994  max_mem: 4648M
[06/01 18:37:11 d2.utils.events]:  eta: 17:19:01  iter: 320  total_loss: 1.097  loss_cls: 0.2909  loss_box_reg: 0.1351  loss_rpn_cls: 0.4473  loss_rpn_loc: 0.212  time: 0.6949  data_time: 0.0046  lr: 0.0063936  max_mem: 4648M
[06/01 18:37:26 d2.utils.events]:  eta: 17:38:27  iter: 340  total_loss: 0.9854  loss_cls: 0.2784  loss_box_reg: 0.1124  loss_rpn_cls: 0.4079  loss_rpn_loc: 0.1789  time: 0.6995  data_time: 0.0034  lr: 0.0067932  max_mem: 4648M
[06/01 18:37:38 d2.utils.events]:  eta: 17:16:29  iter: 360  total_loss: 0.9864  loss_cls: 0.2833  loss_box_reg: 0.09979  loss_rpn_cls: 0.3913  loss_rpn_loc: 0.16  time: 0.6935  data_time: 0.0042  lr: 0.0071928  max_mem: 4648M
[06/01 18:37:52 d2.utils.events]:  eta: 17:14:32  iter: 380  total_loss: 1.31  loss_cls: 0.3687  loss_box_reg: 0.1613  loss_rpn_cls: 0.4713  loss_rpn_loc: 0.2502  time: 0.6926  data_time: 0.0035  lr: 0.0075924  max_mem: 4648M
[06/01 18:38:05 d2.utils.events]:  eta: 17:10:56  iter: 400  total_loss: 1.053  loss_cls: 0.2638  loss_box_reg: 0.1124  loss_rpn_cls: 0.4462  loss_rpn_loc: 0.1738  time: 0.6917  data_time: 0.0028  lr: 0.007992  max_mem: 4648M
[06/01 18:38:17 d2.utils.events]:  eta: 16:59:19  iter: 420  total_loss: 1.155  loss_cls: 0.3765  loss_box_reg: 0.103  loss_rpn_cls: 0.4749  loss_rpn_loc: 0.2382  time: 0.6855  data_time: 0.0028  lr: 0.0083916  max_mem: 4648M
[06/01 18:38:33 d2.utils.events]:  eta: 17:13:50  iter: 440  total_loss: 1.132  loss_cls: 0.2836  loss_box_reg: 0.1157  loss_rpn_cls: 0.4713  loss_rpn_loc: 0.2449  time: 0.6905  data_time: 0.0034  lr: 0.0087912  max_mem: 4648M
[06/01 18:38:46 d2.utils.events]:  eta: 17:11:50  iter: 460  total_loss: 1.144  loss_cls: 0.2744  loss_box_reg: 0.1068  loss_rpn_cls: 0.445  loss_rpn_loc: 0.2166  time: 0.6892  data_time: 0.0036  lr: 0.0091908  max_mem: 4648M
[06/01 18:39:01 d2.utils.events]:  eta: 17:14:07  iter: 480  total_loss: 0.8858  loss_cls: 0.268  loss_box_reg: 0.08814  loss_rpn_cls: 0.4031  loss_rpn_loc: 0.143  time: 0.6918  data_time: 0.0028  lr: 0.0095904  max_mem: 4648M
[06/01 18:39:17 d2.utils.events]:  eta: 17:23:01  iter: 500  total_loss: 1.236  loss_cls: 0.3258  loss_box_reg: 0.1566  loss_rpn_cls: 0.4449  loss_rpn_loc: 0.2385  time: 0.6957  data_time: 0.0038  lr: 0.00999  max_mem: 4648M

绿框的:

python train_net.py --num-gpus 1 --config-file .\configs\Detic_OVCOCO_CLIP_R50_1x_max-size_caption.yaml

训练界面:

Command Line Args: Namespace(config_file='.\\configs\\Detic_OVCOCO_CLIP_R50_1x_max-size_caption.yaml', dist_url='tcp://127.0.0.1:20594', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[06/01 18:10:49 detectron2]: Rank of current process: 0. World size: 1
[06/01 18:10:50 detectron2]: Environment info:
----------------------  -------------------------------------------------------------------------------------------------
sys.platform            win32
Python                  3.8.18 (default, Sep 11 2023, 13:39:12) [MSC v.1916 64 bit (AMD64)]
numpy                   1.23.2
detectron2              0.6 @f:\llava_grounding_main\detectron2\detectron2
Compiler                MSVC 193732825
CUDA compiler           CUDA 12.1
detectron2 arch flags   f:\llava_grounding_main\detectron2\detectron2\_C.cp38-win_amd64.pyd; cannot find cuobjdump
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 2.4.0+cu121 @C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   NVIDIA GeForce RTX 3060 (arch=8.6)
Driver version          560.94
CUDA_HOME               C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
Pillow                  9.5.0
torchvision             0.19.0+cu121 @C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torchvision
torchvision arch flags  C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore                  0.1.5.post20221221
iopath                  0.1.7
cv2                     4.8.1
----------------------  -------------------------------------------------------------------------------------------------
PyTorch built with:- C++ Version: 201703- MSVC 192930154- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications- Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)- OpenMP 2019- LAPACK is enabled (usually provided by MKL)- CPU capability usage: AVX2- CUDA Runtime 12.1- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90- CuDNN 90.1  (built against CUDA 12.4)- Magma 2.5.4- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,[06/01 18:10:50 detectron2]: Command line arguments: Namespace(config_file='.\\configs\\Detic_OVCOCO_CLIP_R50_1x_max-size_caption.yaml', dist_url='tcp://127.0.0.1:20594', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[06/01 18:10:50 detectron2]: Contents of args.config_file=.\configs\Detic_OVCOCO_CLIP_R50_1x_max-size_caption.yaml:
_BASE_: "Base_OVCOCO_C4_1x.yaml"
MODEL:WEIGHTS: "models/BoxSup_OVCOCO_CLIP_R50_1x.pth"WITH_CAPTION: TrueSYNC_CAPTION_BATCH: TrueROI_BOX_HEAD:WS_NUM_PROPS: 32ADD_IMAGE_BOX: True # caption loss is added to the image-boxIMAGE_LABEL_LOSS: 'max_size'NEG_CAP_WEIGHT: 1.0
SOLVER:IMS_PER_BATCH: 16BASE_LR: 0.02STEPS: (60000, 80000)MAX_ITER: 90000
DATASETS:TRAIN: ("coco_zeroshot_train_oriorder", "coco_caption_train_tags")
INPUT:CUSTOM_AUG: ResizeShortestEdgeMIN_SIZE_TRAIN_SAMPLING: rangeMIN_SIZE_TRAIN: (800, 800)
DATALOADER:SAMPLER_TRAIN: "MultiDatasetSampler"DATASET_RATIO: [1, 4]USE_DIFF_BS_SIZE: TrueDATASET_BS: [2, 8]USE_RFS: [False, False]DATASET_MIN_SIZES: [[800, 800], [400, 400]]DATASET_MAX_SIZES: [1333, 667]FILTER_EMPTY_ANNOTATIONS: FalseMULTI_DATASET_GROUPING: TrueDATASET_ANN: ['box', 'captiontag']NUM_WORKERS: 8
WITH_IMAGE_LABELS: True[06/01 18:10:50 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:ASPECT_RATIO_GROUPING: trueDATASET_ANN:- box- captiontagDATASET_BS:- 2- 8DATASET_INPUT_SCALE:- &id001- 0.1- 2.0- - 0.5- 1.5DATASET_INPUT_SIZE:- 896- 384DATASET_MAX_SIZES:- 1333- 667DATASET_MIN_SIZES:- - 800- 800- - 400- 400DATASET_RATIO:- 1- 4FILTER_EMPTY_ANNOTATIONS: falseMULTI_DATASET_GROUPING: trueNUM_WORKERS: 8REPEAT_THRESHOLD: 0.0SAMPLER_TRAIN: MultiDatasetSamplerTARFILE_PATH: datasets/imagenet/metadata-22k/tar_files.npyTAR_INDEX_DIR: datasets/imagenet/metadata-22k/tarindex_npyUSE_DIFF_BS_SIZE: trueUSE_RFS:- false- falseUSE_TAR_DATASET: false
DATASETS:PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000PROPOSAL_FILES_TEST: []PROPOSAL_FILES_TRAIN: []TEST:- coco_generalized_zeroshot_valTRAIN:- coco_zeroshot_train_oriorder- coco_caption_train_tags
DEBUG: false
DEBUG_SHOW_NAME: false
EVAL_AP_FIX: false
EVAL_CAT_SPEC_AR: false
EVAL_PRED_AR: false
EVAL_PROPOSAL_AR: false
FIND_UNUSED_PARAM: true
FP16: true
GEN_PSEDO_LABELS: false
GLOBAL:HACK: 1.0
INPUT:CROP:ENABLED: falseSIZE:- 0.9- 0.9TYPE: relative_rangeCUSTOM_AUG: ResizeShortestEdgeFORMAT: BGRMASK_FORMAT: polygonMAX_SIZE_TEST: 1333MAX_SIZE_TRAIN: 1333MIN_SIZE_TEST: 800MIN_SIZE_TRAIN:- 800- 800MIN_SIZE_TRAIN_SAMPLING: rangeNOT_CLAMP_BOX: falseRANDOM_FLIP: horizontalSCALE_RANGE: *id001TEST_INPUT_TYPE: defaultTEST_SIZE: 640TRAIN_SIZE: 640
IS_DEBUG: false
MODEL:ANCHOR_GENERATOR:ANGLES:- - -90- 0- 90ASPECT_RATIOS:- - 0.5- 1.0- 2.0NAME: DefaultAnchorGeneratorOFFSET: 0.0SIZES:- - 32- 64- 128- 256- 512BACKBONE:FREEZE_AT: 2NAME: build_resnet_backboneBIFPN:NORM: GNNUM_BIFPN: 6NUM_LEVELS: 5OUT_CHANNELS: 160SEPARABLE_CONV: falseCAP_BATCH_RATIO: 4CENTERNET:AS_PROPOSAL: falseCENTER_NMS: falseFPN_STRIDES:- 8- 16- 32- 64- 128HM_FOCAL_ALPHA: 0.25HM_FOCAL_BETA: 4HM_MIN_OVERLAP: 0.8IGNORE_HIGH_FP: -1.0INFERENCE_TH: 0.05IN_FEATURES:- p3- p4- p5- p6- p7LOC_LOSS_TYPE: giouLOSS_GAMMA: 2.0MIN_RADIUS: 4MORE_POS: falseMORE_POS_THRESH: 0.2MORE_POS_TOPK: 9NEG_WEIGHT: 1.0NMS_TH_TEST: 0.6NMS_TH_TRAIN: 0.6NORM: GNNOT_NMS: falseNOT_NORM_REG: trueNO_REDUCE: falseNUM_BOX_CONVS: 4NUM_CLASSES: 80NUM_CLS_CONVS: 4NUM_SHARE_CONVS: 0ONLY_PROPOSAL: falsePOST_NMS_TOPK_TEST: 100POST_NMS_TOPK_TRAIN: 100POS_WEIGHT: 1.0PRE_NMS_TOPK_TEST: 1000PRE_NMS_TOPK_TRAIN: 1000PRIOR_PROB: 0.01REG_WEIGHT: 2.0SIGMOID_CLAMP: 0.0001SOI:- - 0- 80- - 64- 160- - 128- 320- - 256- 640- - 512- 10000000USE_DEFORMABLE: falseWITH_AGN_HM: falseDATASET_LOSS_WEIGHT: []DETR:CLS_WEIGHT: 2.0DEC_LAYERS: 6DEEP_SUPERVISION: trueDIM_FEEDFORWARD: 2048DROPOUT: 0.1ENC_LAYERS: 6FOCAL_ALPHA: 0.25FROZEN_WEIGHTS: ''GIOU_WEIGHT: 2.0HIDDEN_DIM: 256L1_WEIGHT: 5.0NHEADS: 8NO_OBJECT_WEIGHT: 0.1NUM_CLASSES: 80NUM_FEATURE_LEVELS: 4NUM_OBJECT_QUERIES: 100PRE_NORM: falseTWO_STAGE: falseUSE_FED_LOSS: falseWEAK_WEIGHT: 0.1WITH_BOX_REFINE: falseDEVICE: cudaDLA:DLAUP_IN_FEATURES:- dla3- dla4- dla5DLAUP_NODE: convMS_OUTPUT: falseNORM: BNNUM_LAYERS: 34OUT_FEATURES:- dla2USE_DLA_UP: trueDYNAMIC_CLASSIFIER: falseFPN:FUSE_TYPE: sumIN_FEATURES: []NORM: ''OUT_CHANNELS: 256KEYPOINT_ON: falseLOAD_PROPOSALS: falseMASK_ON: falseMETA_ARCHITECTURE: CustomRCNNNUM_SAMPLE_CATS: 50PANOPTIC_FPN:COMBINE:ENABLED: trueINSTANCES_CONFIDENCE_THRESH: 0.5OVERLAP_THRESH: 0.5STUFF_AREA_LIMIT: 4096INSTANCE_LOSS_WEIGHT: 1.0PIXEL_MEAN:- 103.53- 116.28- 123.675PIXEL_STD:- 1.0- 1.0- 1.0PROPOSAL_GENERATOR:MIN_SIZE: 0NAME: RPNRESET_CLS_TESTS: falseRESNETS:DEFORM_MODULATED: falseDEFORM_NUM_GROUPS: 1DEFORM_ON_PER_STAGE:- false- false- false- falseDEPTH: 50NORM: FrozenBNNUM_GROUPS: 1OUT_FEATURES:- res4RES2_OUT_CHANNELS: 256RES5_DILATION: 1STEM_OUT_CHANNELS: 64STRIDE_IN_1X1: trueWIDTH_PER_GROUP: 64RETINANET:BBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_WEIGHTS: &id003- 1.0- 1.0- 1.0- 1.0FOCAL_LOSS_ALPHA: 0.25FOCAL_LOSS_GAMMA: 2.0IN_FEATURES:- p3- p4- p5- p6- p7IOU_LABELS:- 0- -1- 1IOU_THRESHOLDS:- 0.4- 0.5NMS_THRESH_TEST: 0.5NORM: ''NUM_CLASSES: 80NUM_CONVS: 4PRIOR_PROB: 0.01SCORE_THRESH_TEST: 0.05SMOOTH_L1_LOSS_BETA: 0.1TOPK_CANDIDATES_TEST: 1000ROI_BOX_CASCADE_HEAD:BBOX_REG_WEIGHTS:- &id002- 10.0- 10.0- 5.0- 5.0- - 20.0- 20.0- 10.0- 10.0- - 30.0- 30.0- 15.0- 15.0IOUS:- 0.5- 0.6- 0.7ROI_BOX_HEAD:ADD_FEATURE_TO_PROP: falseADD_IMAGE_BOX: trueBBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_LOSS_WEIGHT: 1.0BBOX_REG_WEIGHTS: *id002CAPTION_WEIGHT: 1.0CAT_FREQ_PATH: datasets/coco/zero-shot/instances_train2017_seen_2_oriorder_cat_info.jsonCLS_AGNOSTIC_BBOX_REG: trueCONV_DIM: 256EQL_FREQ_CAT: 200FC_DIM: 1024FED_LOSS_FREQ_WEIGHT: 0.5FED_LOSS_NUM_CAT: 50IGNORE_ZERO_CATS: trueIMAGE_BOX_SIZE: 1.0IMAGE_LABEL_LOSS: max_sizeIMAGE_LOSS_WEIGHT: 0.1MULT_PROPOSAL_SCORE: falseNAME: ''NEG_CAP_WEIGHT: 1.0NORM: ''NORM_TEMP: 50.0NORM_WEIGHT: trueNUM_CONV: 0NUM_FC: 0POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2PRIOR_PROB: 0.01SMOOTH_L1_BETA: 0.0SOFTMAX_WEAK_LOSS: falseTRAIN_ON_PRED_BOXES: falseUSE_BIAS: 0.0USE_EQL_LOSS: falseUSE_FED_LOSS: falseUSE_SIGMOID_CE: trueUSE_ZEROSHOT_CLS: trueWITH_SOFTMAX_PROP: falseWS_NUM_PROPS: 32ZEROSHOT_WEIGHT_DIM: 512ZEROSHOT_WEIGHT_PATH: datasets/metadata/coco_clip_a+cname.npyROI_HEADS:BATCH_SIZE_PER_IMAGE: 512IN_FEATURES:- res4IOU_LABELS:- 0- 1IOU_THRESHOLDS:- 0.5MASK_WEIGHT: 1.0NAME: CustomRes5ROIHeadsNMS_THRESH_TEST: 0.5NUM_CLASSES: 80ONE_CLASS_PER_PROPOSAL: falsePOSITIVE_FRACTION: 0.25PROPOSAL_APPEND_GT: trueSCORE_THRESH_TEST: 0.05ROI_KEYPOINT_HEAD:CONV_DIMS:- 512- 512- 512- 512- 512- 512- 512- 512LOSS_WEIGHT: 1.0MIN_KEYPOINTS_PER_IMAGE: 1NAME: KRCNNConvDeconvUpsampleHeadNORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: trueNUM_KEYPOINTS: 17POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2ROI_MASK_HEAD:CLS_AGNOSTIC_MASK: falseCONV_DIM: 256NAME: MaskRCNNConvUpsampleHeadNORM: ''NUM_CONV: 0POOLER_RESOLUTION: 14POOLER_SAMPLING_RATIO: 0POOLER_TYPE: ROIAlignV2RPN:BATCH_SIZE_PER_IMAGE: 256BBOX_REG_LOSS_TYPE: smooth_l1BBOX_REG_LOSS_WEIGHT: 1.0BBOX_REG_WEIGHTS: *id003BOUNDARY_THRESH: -1CONV_DIMS:- -1HEAD_NAME: StandardRPNHeadIN_FEATURES:- res4IOU_LABELS:- 0- -1- 1IOU_THRESHOLDS:- 0.3- 0.7LOSS_WEIGHT: 1.0NMS_THRESH: 0.7POSITIVE_FRACTION: 0.5POST_NMS_TOPK_TEST: 1000POST_NMS_TOPK_TRAIN: 2000PRE_NMS_TOPK_TEST: 6000PRE_NMS_TOPK_TRAIN: 12000SMOOTH_L1_BETA: 0.0SEM_SEG_HEAD:COMMON_STRIDE: 4CONVS_DIM: 128IGNORE_VALUE: 255IN_FEATURES:- p2- p3- p4- p5LOSS_WEIGHT: 1.0NAME: SemSegFPNHeadNORM: GNNUM_CLASSES: 54SWIN:OUT_FEATURES:- 1- 2- 3SIZE: TUSE_CHECKPOINT: falseSYNC_CAPTION_BATCH: trueTEST_CLASSIFIERS: []TEST_NUM_CLASSES: []TIMM:BASE_NAME: resnet50FREEZE_AT: 0NORM: FrozenBNOUT_LEVELS:- 3- 4- 5PRETRAINED: falseWEIGHTS: models/BoxSup_OVCOCO_CLIP_R50_1x.pthWITH_CAPTION: true
OUTPUT_DIR: output/Detic-COCO/Detic_OVCOCO_CLIP_R50_1x_max-size_caption
QUICK_DEBUG: false
SAVE_DEBUG: false
SAVE_DEBUG_PATH: output/save_debug/
SAVE_PTH: false
SEED: -1
SOLVER:AMP:ENABLED: falseBACKBONE_MULTIPLIER: 1.0BASE_LR: 0.02BASE_LR_END: 0.0BIAS_LR_FACTOR: 1.0CHECKPOINT_PERIOD: 1000000000CLIP_GRADIENTS:CLIP_TYPE: valueCLIP_VALUE: 1.0ENABLED: falseNORM_TYPE: 2.0CUSTOM_MULTIPLIER: 1.0CUSTOM_MULTIPLIER_NAME: []GAMMA: 0.1IMS_PER_BATCH: 16LR_SCHEDULER_NAME: WarmupMultiStepLRMAX_ITER: 90000MOMENTUM: 0.9NESTEROV: falseOPTIMIZER: SGDREFERENCE_WORLD_SIZE: 0RESET_ITER: falseSTEPS:- 60000- 80000TRAIN_ITER: -1USE_CUSTOM_SOLVER: falseWARMUP_FACTOR: 0.001WARMUP_ITERS: 1000WARMUP_METHOD: linearWEIGHT_DECAY: 0.0001WEIGHT_DECAY_BIAS: nullWEIGHT_DECAY_NORM: 0.0
TEST:AUG:ENABLED: falseFLIP: trueMAX_SIZE: 4000MIN_SIZES:- 400- 500- 600- 700- 800- 900- 1000- 1100- 1200DETECTIONS_PER_IMAGE: 100EVAL_PERIOD: 0EXPECTED_RESULTS: []KEYPOINT_OKS_SIGMAS: []PRECISE_BN:ENABLED: falseNUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
VIS_THRESH: 0.3
WITH_IMAGE_LABELS: true[06/01 18:10:50 detectron2]: Full config saved to output/Detic-COCO/Detic_OVCOCO_CLIP_R50_1x_max-size_caption\config.yaml
[06/01 18:10:50 d2.utils.env]: Using a generated random seed 50341023
Loading pretrained CLIP
[06/01 18:10:56 detectron2]: Model:
CustomRCNN((backbone): ResNet((stem): BasicStem((conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)))(res2): Sequential((0): BottleneckBlock((shortcut): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05))(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))))(res3): Sequential((0): BottleneckBlock((shortcut): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)))(3): BottleneckBlock((conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05))(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))))(res4): Sequential((0): BottleneckBlock((shortcut): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05))(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(3): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(4): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))(5): BottleneckBlock((conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05))(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)))))(proposal_generator): RPN((rpn_head): StandardRPNHead((conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)(activation): ReLU())(objectness_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))(anchor_deltas): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1)))(anchor_generator): DefaultAnchorGenerator((cell_anchors): BufferList()))(roi_heads): CustomRes5ROIHeads((pooler): ROIPooler((level_poolers): ModuleList((0): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)))(res5): Sequential((0): BottleneckBlock((shortcut): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05))(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)))(1): BottleneckBlock((conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)))(2): BottleneckBlock((conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05))(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05))))(box_predictor): DeticFastRCNNOutputLayers((cls_score): ZeroShotClassifier((linear): Linear(in_features=2048, out_features=512, bias=True))(bbox_pred): Sequential((0): Linear(in_features=2048, out_features=2048, bias=True)(1): ReLU(inplace=True)(2): Linear(in_features=2048, out_features=4, bias=True))))(text_encoder): CLIPTEXT((transformer): Transformer((resblocks): Sequential((0): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(1): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(2): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(3): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(4): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(5): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(6): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(7): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(8): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(9): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(10): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(11): ResidualAttentionBlock((attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True))(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(mlp): Sequential((c_fc): Linear(in_features=512, out_features=2048, bias=True)(gelu): QuickGELU()(c_proj): Linear(in_features=2048, out_features=512, bias=True))(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True))))(token_embedding): Embedding(49408, 512)(ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True))
)
[06/01 18:10:56 fvcore.common.checkpoint]: [Checkpointer] Loading from models/BoxSup_OVCOCO_CLIP_R50_1x.pth ...
C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\fvcore\common\checkpoint.py:252: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.return torch.load(f, map_location=torch.device("cpu"))
WARNING [06/01 18:10:56 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
roi_heads.box_predictor.freq_weight
text_encoder.ln_final.{bias, weight}
text_encoder.token_embedding.weight
text_encoder.transformer.resblocks.0.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.0.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.0.ln_1.{bias, weight}
text_encoder.transformer.resblocks.0.ln_2.{bias, weight}
text_encoder.transformer.resblocks.0.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.0.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.1.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.1.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.1.ln_1.{bias, weight}
text_encoder.transformer.resblocks.1.ln_2.{bias, weight}
text_encoder.transformer.resblocks.1.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.1.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.10.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.10.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.10.ln_1.{bias, weight}
text_encoder.transformer.resblocks.10.ln_2.{bias, weight}
text_encoder.transformer.resblocks.10.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.10.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.11.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.11.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.11.ln_1.{bias, weight}
text_encoder.transformer.resblocks.11.ln_2.{bias, weight}
text_encoder.transformer.resblocks.11.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.11.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.2.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.2.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.2.ln_1.{bias, weight}
text_encoder.transformer.resblocks.2.ln_2.{bias, weight}
text_encoder.transformer.resblocks.2.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.2.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.3.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.3.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.3.ln_1.{bias, weight}
text_encoder.transformer.resblocks.3.ln_2.{bias, weight}
text_encoder.transformer.resblocks.3.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.3.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.4.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.4.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.4.ln_1.{bias, weight}
text_encoder.transformer.resblocks.4.ln_2.{bias, weight}
text_encoder.transformer.resblocks.4.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.4.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.5.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.5.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.5.ln_1.{bias, weight}
text_encoder.transformer.resblocks.5.ln_2.{bias, weight}
text_encoder.transformer.resblocks.5.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.5.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.6.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.6.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.6.ln_1.{bias, weight}
text_encoder.transformer.resblocks.6.ln_2.{bias, weight}
text_encoder.transformer.resblocks.6.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.6.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.7.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.7.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.7.ln_1.{bias, weight}
text_encoder.transformer.resblocks.7.ln_2.{bias, weight}
text_encoder.transformer.resblocks.7.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.7.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.8.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.8.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.8.ln_1.{bias, weight}
text_encoder.transformer.resblocks.8.ln_2.{bias, weight}
text_encoder.transformer.resblocks.8.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.8.mlp.c_proj.{bias, weight}
text_encoder.transformer.resblocks.9.attn.out_proj.{bias, weight}
text_encoder.transformer.resblocks.9.attn.{in_proj_bias, in_proj_weight}
text_encoder.transformer.resblocks.9.ln_1.{bias, weight}
text_encoder.transformer.resblocks.9.ln_2.{bias, weight}
text_encoder.transformer.resblocks.9.mlp.c_fc.{bias, weight}
text_encoder.transformer.resblocks.9.mlp.c_proj.{bias, weight}
text_encoder.{positional_embedding, text_projection}
[06/01 18:10:57 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='range'), RandomFlip()]
[06/01 18:11:13 d2.data.datasets.coco]: Loading datasets\coco/zero-shot/instances_train2017_seen_2_oriorder.json takes 16.05 seconds.
[06/01 18:11:14 d2.data.datasets.coco]: Loaded 107761 images in COCO format from datasets\coco/zero-shot/instances_train2017_seen_2_oriorder.json
[06/01 18:11:21 detic.data.datasets.lvis_v1]: Loaded 100646 images in the LVIS v1 format from datasets\coco/annotations/captions_train2017_tags_allcaps.json
dataset sizes [107761, 100646]
[06/01 18:11:24 d2.data.common]: Serializing 208407 elements to byte tensors and concatenating them all ...
[06/01 18:11:28 d2.data.common]: Serialized dataset takes 412.78 MiB
train_net.py:141: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.scaler = GradScaler()
[06/01 18:11:29 detectron2]: Starting training from iteration 0
F:\Detic-main\detic\modeling\meta_arch\custom_rcnn.py:133: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.with autocast():
C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch\functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3610.)return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
C:\ProgramData\miniconda3\envs\llavag\lib\site-packages\torch\nn\functional.py:5560: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
[06/01 18:12:47 d2.utils.events]:  eta: 13:30:02  iter: 20  total_loss: 0.9656  loss_cls: 0  loss_box_reg: 0  image_loss: 0.9482  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.7746  data_time: 3.1242  lr: 0.00039962  max_mem: 4891M
[06/01 18:13:00 d2.utils.events]:  eta: 14:16:59  iter: 40  total_loss: 0.8001  loss_cls: 0.1149  loss_box_reg: 0.07416  image_loss: 0  loss_rpn_cls: 0.0408  loss_rpn_loc: 0.04454  time: 0.7132  data_time: 0.0090  lr: 0.00079922  max_mem: 4891M
[06/01 18:13:13 d2.utils.events]:  eta: 14:16:47  iter: 60  total_loss: 0.7527  loss_cls: 0.03435  loss_box_reg: 0.02981  image_loss: 0.3297  loss_rpn_cls: 0.01289  loss_rpn_loc: 0.01264  time: 0.6837  data_time: 0.0076  lr: 0.0011988  max_mem: 4892M
[06/01 18:13:26 d2.utils.events]:  eta: 17:05:51  iter: 80  total_loss: 0.6891  loss_cls: 0.03277  loss_box_reg: 0.03636  image_loss: 0  loss_rpn_cls: 0.02024  loss_rpn_loc: 0.01448  time: 0.6751  data_time: 0.0089  lr: 0.0015984  max_mem: 4892M
[06/01 18:13:39 d2.utils.events]:  eta: 17:05:37  iter: 100  total_loss: 0.7607  loss_cls: 0.02319  loss_box_reg: 0.03272  image_loss: 0.3596  loss_rpn_cls: 0.01089  loss_rpn_loc: 0.008463  time: 0.6667  data_time: 0.0054  lr: 0.001998  max_mem: 4892M
[06/01 18:13:51 d2.utils.events]:  eta: 13:29:08  iter: 120  total_loss: 0.6448  loss_cls: 0  loss_box_reg: 0  image_loss: 0.546  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6546  data_time: 0.0066  lr: 0.0023976  max_mem: 4892M
[06/01 18:14:03 d2.utils.events]:  eta: 12:56:32  iter: 140  total_loss: 0.6852  loss_cls: 0  loss_box_reg: 0  image_loss: 0.5059  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6474  data_time: 0.0054  lr: 0.0027972  max_mem: 4893M
[06/01 18:14:17 d2.utils.events]:  eta: 14:15:50  iter: 160  total_loss: 0.6721  loss_cls: 0.07292  loss_box_reg: 0.05914  image_loss: 0  loss_rpn_cls: 0.04195  loss_rpn_loc: 0.02563  time: 0.6494  data_time: 0.0055  lr: 0.0031968  max_mem: 4893M
[06/01 18:14:30 d2.utils.events]:  eta: 19:33:53  iter: 180  total_loss: 0.8056  loss_cls: 0.0753  loss_box_reg: 0.08548  image_loss: 0  loss_rpn_cls: 0.03067  loss_rpn_loc: 0.02844  time: 0.6500  data_time: 0.0056  lr: 0.0035964  max_mem: 4893M
[06/01 18:14:41 d2.utils.events]:  eta: 12:56:01  iter: 200  total_loss: 0.6465  loss_cls: 0  loss_box_reg: 0  image_loss: 0.5738  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6403  data_time: 0.0066  lr: 0.003996  max_mem: 4893M
[06/01 18:14:54 d2.utils.events]:  eta: 13:28:14  iter: 220  total_loss: 0.6347  loss_cls: 0.09051  loss_box_reg: 0.1167  image_loss: 0  loss_rpn_cls: 0.03716  loss_rpn_loc: 0.01069  time: 0.6408  data_time: 0.0051  lr: 0.0043956  max_mem: 4893M
[06/01 18:15:07 d2.utils.events]:  eta: 14:15:04  iter: 240  total_loss: 0.6625  loss_cls: 0.1059  loss_box_reg: 0.09115  image_loss: 0  loss_rpn_cls: 0.07941  loss_rpn_loc: 0.06764  time: 0.6406  data_time: 0.0055  lr: 0.0047952  max_mem: 4893M
[06/01 18:15:20 d2.utils.events]:  eta: 14:14:53  iter: 260  total_loss: 0.7402  loss_cls: 0.05906  loss_box_reg: 0.05109  image_loss: 0.3339  loss_rpn_cls: 0.03409  loss_rpn_loc: 0.02727  time: 0.6397  data_time: 0.0051  lr: 0.0051948  max_mem: 4894M
[06/01 18:15:32 d2.utils.events]:  eta: 14:14:41  iter: 280  total_loss: 0.7019  loss_cls: 0.04317  loss_box_reg: 0.03996  image_loss: 0.2476  loss_rpn_cls: 0.02012  loss_rpn_loc: 0.01788  time: 0.6384  data_time: 0.0060  lr: 0.0055944  max_mem: 4895M
[06/01 18:15:45 d2.utils.events]:  eta: 14:14:30  iter: 300  total_loss: 0.6237  loss_cls: 0.05511  loss_box_reg: 0.04056  image_loss: 0.2471  loss_rpn_cls: 0.01661  loss_rpn_loc: 0.004652  time: 0.6382  data_time: 0.0059  lr: 0.005994  max_mem: 4895M
[06/01 18:15:58 d2.utils.events]:  eta: 17:01:22  iter: 320  total_loss: 0.6549  loss_cls: 0.05796  loss_box_reg: 0.04952  image_loss: 0  loss_rpn_cls: 0.04849  loss_rpn_loc: 0.01719  time: 0.6385  data_time: 0.0053  lr: 0.0063936  max_mem: 4895M
[06/01 18:16:11 d2.utils.events]:  eta: 17:01:09  iter: 340  total_loss: 0.6248  loss_cls: 0.04142  loss_box_reg: 0.05109  image_loss: 0.2425  loss_rpn_cls: 0.02078  loss_rpn_loc: 0.008507  time: 0.6378  data_time: 0.0057  lr: 0.0067932  max_mem: 4895M
[06/01 18:16:24 d2.utils.events]:  eta: 19:29:57  iter: 360  total_loss: 0.908  loss_cls: 0.0986  loss_box_reg: 0.0627  image_loss: 0  loss_rpn_cls: 0.08474  loss_rpn_loc: 0.01765  time: 0.6386  data_time: 0.0060  lr: 0.0071928  max_mem: 4895M
[06/01 18:16:36 d2.utils.events]:  eta: 17:00:41  iter: 380  total_loss: 0.861  loss_cls: 0  loss_box_reg: 0  image_loss: 0.6764  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6375  data_time: 0.0059  lr: 0.0075924  max_mem: 4895M
[06/01 18:16:50 d2.utils.events]:  eta: 19:30:29  iter: 400  total_loss: 0.7996  loss_cls: 0.2128  loss_box_reg: 0.1527  image_loss: 0  loss_rpn_cls: 0.0739  loss_rpn_loc: 0.06048  time: 0.6394  data_time: 0.0064  lr: 0.007992  max_mem: 4895M
[06/01 18:17:02 d2.utils.events]:  eta: 19:27:32  iter: 420  total_loss: 0.8131  loss_cls: 0  loss_box_reg: 0  image_loss: 0.6404  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6384  data_time: 0.0086  lr: 0.0083916  max_mem: 4895M
[06/01 18:17:15 d2.utils.events]:  eta: 16:59:12  iter: 440  total_loss: 0.7116  loss_cls: 0  loss_box_reg: 0  image_loss: 0.5867  loss_rpn_cls: 0  loss_rpn_loc: 0  time: 0.6374  data_time: 0.0052  lr: 0.0087912  max_mem: 4895M
[06/01 18:17:28 d2.utils.events]:  eta:

至此完成了如上两个网络配置的训练所需做的准备,走通了也便于开展数据集迁移研究。

相关文章:

OVD开放词汇检测 Detic 训练COCO数据集实践

0、引言 纯视觉检测当前研究基本比较饱和&#xff0c;继续创新提升空间很小&#xff0c;除非在CNN和transformer上提出更强基础建模方式。和文本结合是当前的一大趋势&#xff0c;也是计算机视觉和自然语言处理结合的未来趋势&#xff0c;目前和文本结合的目标检测工作还是有很…...

docker、ctr、crictl命令简介与使用

概述 在使用k3s过程中&#xff0c;经常需要使用ctr和crictl两个命令&#xff0c;本文记录一下。 ctr 类似docker命令是docker-shim容器运行时的客户端工具&#xff0c;ctr是Containerd的客户端工具。一个简单的CLI接口&#xff0c;用作Containerd本身的一些调试用途&#xf…...

WEB安全--SQL注入--bypass技巧2

继之前文章的补充&#xff1a; WEB安全--SQL注入--bypass技巧_sql注入过滤空格-CSDN博客 Q1&#xff1a;发现sql注入的时间盲注时&#xff0c;如果时间盲注的函数都被过滤了&#xff0c;怎么办&#xff1f; 除了找其他函数替换、编码等方式&#xff0c;还有以下方式绕过&…...

【强化学习哲学 Day 1】Q-Learning - 在不确定中寻找确定

&#x1f3ad; 故事&#xff1a;那些选择的时刻 你还记得那些站在十字路口的时刻吗&#xff1f; 也许是刚进实验室&#xff0c;面对满墙的研究方向海报&#xff0c;不知道哪条路通向你想要的未来&#xff1b;也许是第一份工作的选择&#xff0c;大厂的螺丝钉还是小公司的多面…...

WEB3——什么是ABI

怎么获得ABI&#xff1f; 在编译完合约后&#xff0c;可以在左边下面点击复制ABI ABI&#xff08;Application Binary Interface&#xff0c;应用二进制接口&#xff09;是用来让前端或服务端 JavaScript 代码与智能合约进行交互的桥梁&#xff0c;它描述了合约的函数、事件和…...

嵌入式软件--stm32 DAY 8.5 基础复习总结

1.时钟树 在数据手册里面&#xff0c;有一张密密麻麻的图&#xff0c;正是时钟系统里的时钟树。 对于时钟&#xff0c;我们注意有两点。一个是系统时钟SYSCLK,一个是依赖外部晶振生成的RTC. RTC以外部低速晶振作为时钟源或者外部高速晶振128分频后作为时钟源&#xff0c;又或者…...

MMRL: Multi-Modal Representation Learning for Vision-Language Models(多模态表示学习)

摘要 预训练的VLMs,对于跨任务的迁移学习至关重要&#xff0c;然而&#xff0c;在few-shot数据集上微调会导致过拟合&#xff0c;降低在新任务上的性能。为解决这个问题&#xff0c;提出一种新的多模态表征学习框架&#xff08;MMRL&#xff09;,该框架引入了一个共享、可学习…...

贪心算法求解汽车加油问题

一、问题描述 一辆汽车加满油后可以行驶 n km。在前往目的地的途中&#xff0c;有多个加油站。我们的目标是设计一个有效的算法&#xff0c;确定汽车应该在哪些加油站停靠加油&#xff0c;以使得沿途的加油次数最少。 二、输入输出形式 算法的输入包括两部分&#xff1a;第一…...

JVM Full GC 频繁问题排查、优化及解决方案

引言 在Java应用程序中&#xff0c;JVM&#xff08;Java虚拟机&#xff09;通过垃圾回收机制自动管理内存&#xff0c;确保不再使用的对象能够被及时清理和释放。虽然垃圾回收在大多数情况下运行顺利&#xff0c;但当Full GC频繁发生时&#xff0c;它会严重影响应用性能&#x…...

rsync服务的搭建

目录 一、rsync介绍 rsync的安装 二、rsync的语法 三、rsync命令使用 1. 本机同步 2. 远程同步 四、rsync作为服务使用 1、尝试启动rsync程序 2、rsync的配置文件介绍 注意事项&#xff1a; 3. rsyncinotify实时同步 3.依赖服务托管xinetd&#xff08;CentOS 6中rs…...

JDK21深度解密 Day 8:Spring Boot 3与虚拟线程整合

【JDK21深度解密 Day 8】Spring Boot 3与虚拟线程整合 引言:Spring Boot 3遇上JDK21虚拟线程 在本系列的第8天,我们将聚焦于Spring Boot 3与JDK21虚拟线程的整合实践。作为全网首套完整的JDK21特性解析,我们不仅会探讨虚拟线程如何颠覆传统Java并发模型,还会通过完整的Sp…...

vscode 配置 QtCreat Cmake项目

1.vscode安装CmakeTool插件并配置QT中cmake的路径&#xff0c;不止这一处 2.cmake生成器使用Ninja&#xff08;Ninja在安装QT时需要勾选&#xff09;&#xff0c;可以解决[build] cc1plus.exe: error: too many filenames given; type ‘cc1plus.exe --help’ for usage 编译时…...

排序算法-归并排序与快速排序

归并排序与快速排序 快速排序是利用的递归思想&#xff1a;选取一个基准数&#xff0c;把小于基准数的放左边 大于的放右边直到整个序列有序 。快排分割函数 O(lognn), 空间 :没有额外开辟新的数组但是递归树调用函数会占用栈内存 O(logn) 。 归并排序&#xff1a;在递归返回的…...

HTML实现端午节主题网站:龙舟争渡,凭吊祭江诵君赋。

名人说&#xff1a;龙舟争渡&#xff0c;助威呐喊&#xff0c;凭吊祭江诵君赋。——苏轼《六幺令天中节》 创作者&#xff1a;Code_流苏(CSDN)&#xff08;一个喜欢古诗词和编程的Coder&#x1f60a;&#xff09; 目录 一、项目概览&#xff1a;传统与现代的技术碰撞1. 核心特…...

uniapp uni-id 如果是正式项目,需自行实现发送邮件的相关功能

(3) 使用云对象sendEmailCode 发送邮箱验证码&#xff0c;报错送邮箱验证码失败 Error: 已启动测试模式&#xff0c;直接使用&#xff1a;123456作为邮箱验证码即可。 如果是正式项目&#xff0c;需自行实现发送邮件的相关功能 - DCloud问答 uni-id 没有实现邮箱验证码逻辑&am…...

Spring boot 策略模式

public abstract class Node {/*** 执行** param a* param b* return*/public abstract Integer execute(int a, int b); }package my.node;import org.springframework.stereotype.Component;Component("exec") public class ExecNode extends Node {Overridepublic…...

websocket在vue中的使用步骤,以及实现聊天

一、WebSocket集成步骤 ‌连接初始化‌ 在Vue组件中创建WebSocket实例&#xff0c;建议在mounted生命周期中执行&#xff1a; data() {return {socket: null,messages: []} }, mounted() {this.socket new WebSocket(wss://your-server-endpoint); }‌事件监听配置 ‌连接成…...

C++学习-入门到精通【12】文件处理

C学习-入门到精通【12】文件处理 目录 C学习-入门到精通【12】文件处理一、文件和流二、创建顺序文件三、从顺序文件读取数据文件定位指针对之前的程序进行修改&#xff1a;贷款查询程序 四、更新顺序文件五、随机存取文件1.创建随机存取文件2.修改程序&#xff1a;贷款处理程序…...

第十一篇:MySQL 在分布式系统中的一致性保障与中间件实践

随着微服务和分布式架构的发展&#xff0c;单点数据库早已无法满足系统的横向扩展需求。本篇聚焦 MySQL 在分布式系统中的一致性保障机制&#xff0c;以及相关中间件的使用策略与实战经验。 一、一致性问题的由来 在 单机 MySQL 环境 中&#xff0c;事务具有原子性、隔离性&am…...

Java中如何枚举正则表达式捕获组的名字

在使用正则表达式在匹配文本时&#xff0c;除了可以通过表达式捕获命中的文本串外&#xff0c;还可以对捕获的文本串进行命名。尤其是在解析日志的场景中&#xff0c;经常会被用到。表达式如下&#xff1a; \<(?<pri>\d)\>(?<time>.*) (?<host>\S)…...

matlab实现图像压缩编码

一、基于DCT的JPEG压缩&#xff08;有损&#xff09; 1. 核心步骤 图像分块&#xff1a;将图像划分为88的小块。离散余弦变换&#xff08;DCT&#xff09;&#xff1a;对每个块进行DCT变换。量化&#xff1a;对DCT系数进行量化以减少高频信息。熵编码&#xff1a;使用哈夫曼或…...

如何排查Redis单个Key命中率骤降?

问题现象 Redis整体命中率98%&#xff0c;但监控发现特定Key&#xff08;如user:1000:profile&#xff09;的命中率从99%骤降至40%&#xff0c;引发服务延迟上升。 排查步骤 1. 确认现象与定位Key // 通过Redis监控工具获取Key指标 public void monitorKey(String key) {Je…...

记一次 Starrocks be 内存异常宕机

突发性 be 内存飙高&#xff0c;直至被系统 kill 掉&#xff0c;be 内存如下&#xff1a;其中 starrocks_be_update_mem_bytes 指标打满&#xff0c;重启也是如此 [rootlocalhost bin]# curl -XGET -s http://192.168.1.49:8040/metrics | grep "^starrocks_be_.*_mem_b…...

Spring Boot 读取.env文件获取配置

Spring Boot 读取.env文件获取配置 在Resouce 目录下创建.env文件 # DEEP SEEK TOKEN DEEP_SEEK_TOKENyour_deep_seek_key # 阿里云百炼 TOKEN ALI_BAILIAN_TOKENyour_ali_bailian_keyyml引入.env文件 spring:config:import: optional:classpath:.env[.properties]使用.env文…...

LangChain-结合GLM+SQL+函数调用实现数据库查询(一)

业务流程 实现步骤 1. 加载数据库配置 在项目的根目录下创建.env 文件&#xff0c;设置文件内容&#xff1a; DB_HOSTxxx DB_PORT3306 DB_USERxxx DB_PASSWORDxxx DB_NAMExxx DB_CHARSETutf8mb4 加载环境变量&#xff0c;从 .env 文件中读取数据库配置信息 使用 os.getenv…...

python训练营打卡第41天

简单CNN 知识回顾 数据增强卷积神经网络定义的写法batch归一化&#xff1a;调整一个批次的分布&#xff0c;常用与图像数据特征图&#xff1a;只有卷积操作输出的才叫特征图调度器&#xff1a;直接修改基础学习率 卷积操作常见流程如下&#xff1a; 1. 输入 → 卷积层 → Batch…...

1.3HarmonyOS NEXT统一开发范式与跨端适配:开启高效跨设备应用开发新时代

HarmonyOS NEXT统一开发范式与跨端适配&#xff1a;开启高效跨设备应用开发新时代 在HarmonyOS NEXT的技术体系中&#xff0c;统一开发范式与跨端适配是两大关键特性&#xff0c;它们为开发者打破了设备边界&#xff0c;极大地提升了开发效率与应用体验。本章节将深入探讨方舟…...

麒麟v10,arm64架构,编译安装Qt5.12.8

Window和麒麟x86_64架构&#xff0c;官网提供安装包&#xff0c;麒麟arm64架构的&#xff0c;只能自己用编码编译安装。 注意&#xff0c;“桌面”路径是中文&#xff0c;所以不要把源码放在桌面上编译。 1. 下载源码 从官网下载源码&#xff1a;https://download.qt.io/arc…...

ArcGIS Pro 3.4 二次开发 - 布局

环境:ArcGIS Pro SDK 3.4 + .NET 8 文章目录 布局1 布局工程项1.1 引用布局工程项及其关联的布局1.2 在新视图中打开布局工程项1.3 激活已打开的布局视图1.4 引用活动布局视图1.5 将 pagx 导入工程1.6 移除布局工程项1.7 创建并打开一个新的基本布局1.8 使用修改后的CIM创建新…...

基于随机函数链接神经网络(RVFL)的锂电池健康状态(SOH)预测

基于随机函数链接神经网络(RVFL)的锂电池健康状态(SOH)预测 一、RVFL网络的基本原理与结构 随机向量功能链接(Random Vector Functional Link, RVFL)网络是一种单隐藏层前馈神经网络的随机化版本,其核心特征在于输入层到隐藏层的权重随机生成且固定,输出层权重通过最…...