A Step-by-Step Tutorial: Converting the UAVDT Dataset to YOLO Format with a Python Script (Full Code Included)


张建站

2026/5/21 4:41:14

10 min read


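Object Detection from a Drone's Perspective: A Full Walkthrough of Converting UAVDT to YOLO Format

Vehicle detection in aerial drone video has long been a popular application area in computer vision. UAVDT is one of the larger drone-based detection benchmarks, with roughly 80,000 annotated high-resolution frames. For researchers new to the field, though, the mismatch between its raw annotation format and what mainstream detection frameworks expect is often the first hurdle.

1. Environment Setup and Understanding the Data

Before converting anything, set up the development environment and get familiar with how UAVDT is organized. The step looks trivial, but a sane environment and a clear picture of the data prevent most of the path and format errors that show up later.

1.1 Basic Environment

Python 3.8 is recommended. The main dependencies are:

```
pip install numpy opencv-python tqdm pillow scikit-learn matplotlib
```

scikit-learn and matplotlib are needed by the dataset split and visualization scripts later in this tutorial. For large-scale processing, a machine with at least 16GB of RAM is advisable. The extracted dataset takes about 45GB on disk, so make sure enough storage is available.

1.2 UAVDT Dataset Structure

This tutorial assumes the downloaded dataset is organized as follows:

```
UAV-benchmark-M/
├── Mxxxx/                # one subdirectory per scene, e.g. M0201, M1102
│   ├── img1/             # image sequence frames
│   └── gt/               # annotation files
│       └── gt_whole.txt  # annotations for the whole sequence
└── annotations/          # additional annotations provided officially
```

Pay particular attention to the format of gt_whole.txt. Each line holds 9 comma-separated fields:

```
frame_index,target_id,bbox_left,bbox_top,bbox_width,bbox_height,out-of-view,occlusion,category
```

The last three fields deserve a closer look:

- out-of-view: whether part of the target lies outside the frame (1 means yes)
- occlusion: degree of occlusion, 0 to 3, where a larger value means heavier occlusion
- category: 1 = car, 2 = truck, 3 = bus

Confirm the encoding of the out-of-view and occlusion flags against the README shipped with your copy of the dataset before filtering on them.

2. Parsing and Regrouping the Raw Annotations

The first step of the conversion is splitting the whole-sequence annotation file into one text file per frame. This stage also has to cope with special cases and outliers in the raw data.

2.1 Splitting the Annotation File

The following Python script splits gt_whole.txt into a separate txt file for each frame:

```python
import os
from tqdm import tqdm


def split_annotations(dataset_root, seq_name):
    img_dir = os.path.join(dataset_root, seq_name, "img1")
    gt_file = os.path.join(dataset_root, seq_name, "gt", "gt_whole.txt")
    output_dir = os.path.join(dataset_root, seq_name, "gt_per_frame")
    os.makedirs(output_dir, exist_ok=True)

    # Count the total number of frames
    frame_count = len([f for f in os.listdir(img_dir) if f.endswith(".jpg")])

    # Initialize an annotation container for every frame,
    # so frames without objects still get an (empty) file
    frame_annots = {i: [] for i in range(1, frame_count + 1)}

    with open(gt_file, "r") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            parts = line.strip().split(",")
            frame_idx = int(parts[0])
            if 1 <= frame_idx <= frame_count:
                frame_annots[frame_idx].append(line.rstrip("\n") + "\n")

    # Write one annotation file per frame
    for frame_idx, annots in tqdm(frame_annots.items(), desc="Saving per-frame annotations"):
        output_path = os.path.join(output_dir, f"{frame_idx:06d}.txt")
        with open(output_path, "w") as f:
            f.writelines(annots)


# Usage example
split_annotations("/path/to/UAV-benchmark-M", "M1102")
```

Note: some targets in the raw annotations are flagged as out-of-view. Whether to keep them depends on the application. For drone detection it is usually worth keeping them, since they help the model recognize partially visible targets.

2.2 Handling the Special Fields

The occlusion and out-of-view flags in UAVDT are useful for filtering and for later data augmentation. They can be carried through the conversion for downstream use:

```python
def process_annotation_line(line):
    parts = line.strip().split(",")

    # Basic fields
    frame_idx, obj_id = map(int, parts[:2])
    bbox = list(map(float, parts[2:6]))  # left, top, width, height

    # Special flags
    out_of_view = int(parts[6])
    occlusion = int(parts[7])
    obj_class = int(parts[8])

    # Filter annotations as needed. The thresholds follow the flag
    # encoding described in section 1.2; adjust them to your data.
    if out_of_view > 0 and occlusion > 2:
        return None  # skip objects that are both out of view and heavily occluded

    return {
        "bbox": bbox,
        "class": obj_class,
        "occlusion": occlusion,
        "out_of_view": out_of_view,
    }
```

3. Core Logic of the YOLO Conversion

Converting the parsed annotations to YOLO format requires understanding how the two annotation schemes differ, and getting coordinate normalization and class mapping right.

3.1 Coordinate Conversion

UAVDT stores each box as a top-left corner plus width and height in pixels. YOLO uses a normalized center point plus width and height. The conversion is:

```
x_center = (bbox_left + bbox_width / 2) / image_width
y_center = (bbox_top + bbox_height / 2) / image_height
width    = bbox_width / image_width
height   = bbox_height / image_height
```

An example implementation:

```python
def uavdt_to_yolo_bbox(bbox, img_w, img_h):
    """Convert a raw UAVDT bbox (left, top, width, height) to YOLO format."""
    x_min, y_min, w, h = bbox

    # Normalized center coordinates
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h

    # Normalized width and height
    norm_w = w / img_w
    norm_h = h / img_h

    return [x_center, y_center, norm_w, norm_h]
```

3.2 Class Mapping and File Generation

YOLO requires class indices to be consecutive integers starting at 0, so the original UAVDT categories have to be remapped:

```python
import os

CLASS_MAPPING = {
    1: 0,  # car → 0
    2: 1,  # truck → 1
    3: 2,  # bus → 2
}


def generate_yolo_label(frame_file, img_size, output_dir):
    img_w, img_h = img_size
    frame_id = os.path.splitext(os.path.basename(frame_file))[0]

    with open(frame_file, "r") as f:
        lines = f.readlines()

    yolo_lines = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        annot = process_annotation_line(line)
        if annot is None or annot["class"] not in CLASS_MAPPING:
            continue  # filtered out, or an unknown category

        yolo_bbox = uavdt_to_yolo_bbox(annot["bbox"], img_w, img_h)
        class_id = CLASS_MAPPING[annot["class"]]
        coords = " ".join(f"{v:.6f}" for v in yolo_bbox)
        yolo_lines.append(f"{class_id} {coords}")

    # Write the YOLO label file
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, f"{frame_id}.txt")
    with open(output_path, "w") as f:
        f.write("\n".join(yolo_lines))
```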
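Because out-of-view targets are kept, some boxes may extend past the image border, and the formulas in section 3.1 would then produce values outside [0, 1]. The sketch below shows one possible way to handle that: clip the box to the image first, then normalize. The function name `clip_and_normalize` and the choice to drop boxes that fall entirely outside the frame are assumptions made for illustration, not part of the original scripts.

```python
def clip_and_normalize(bbox, img_w, img_h):
    """Clip a (left, top, width, height) pixel box to the image, then normalize.

    Hypothetical helper: returns [x_center, y_center, width, height] with
    every value in [0, 1], or None when no part of the box is inside the image.
    """
    left, top, w, h = bbox

    # Corner coordinates, clamped to the image rectangle
    x1 = max(0.0, left)
    y1 = max(0.0, top)
    x2 = min(float(img_w), left + w)
    y2 = min(float(img_h), top + h)

    # Nothing left after clipping: the box is entirely outside the frame
    if x2 <= x1 or y2 <= y1:
        return None

    return [
        (x1 + x2) / 2 / img_w,  # normalized center x
        (y1 + y2) / 2 / img_h,  # normalized center y
        (x2 - x1) / img_w,      # normalized width
        (y2 - y1) / img_h,      # normalized height
    ]
```

For a box fully inside the image this gives the same result as the plain conversion, so it could be swapped in for `uavdt_to_yolo_bbox` if clipped boxes are acceptable for your task.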
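4. Dataset Organization and Validation

A sensible directory layout is the foundation for smooth training. This tutorial uses the layout recommended for YOLOv5 and adds the necessary validation steps.

4.1 Standard Directory Structure

```
UAVDT_YOLO/
├── images/
│   ├── train/   # training images
│   ├── val/     # validation images
│   └── test/    # test images
└── labels/
    ├── train/   # training labels
    ├── val/     # validation labels
    └── test/    # test labels
```

An example script for splitting the dataset:

```python
import os
import shutil
from sklearn.model_selection import train_test_split


def organize_dataset(image_dir, label_dir, output_root, test_ratio=0.2, val_ratio=0.1):
    # Collect image file names without the extension
    image_files = [os.path.splitext(f)[0] for f in os.listdir(image_dir) if f.endswith(".jpg")]

    # Split into train/val/test. val_ratio is a fraction of the whole
    # dataset, so it is rescaled for the second split.
    train_val, test = train_test_split(image_files, test_size=test_ratio, random_state=42)
    train, val = train_test_split(
        train_val, test_size=val_ratio / (1 - test_ratio), random_state=42
    )

    # Create the directory structure
    for kind in ("images", "labels"):
        for split in ("train", "val", "test"):
            os.makedirs(os.path.join(output_root, kind, split), exist_ok=True)

    # Copy files into the matching split directory
    for split, files in [("train", train), ("val", val), ("test", test)]:
        for file_id in files:
            # Copy the image
            src_img = os.path.join(image_dir, f"{file_id}.jpg")
            dst_img = os.path.join(output_root, "images", split, f"{file_id}.jpg")
            shutil.copy(src_img, dst_img)

            # Copy the label if one exists
            src_label = os.path.join(label_dir, f"{file_id}.txt")
            if os.path.exists(src_label):
                dst_label = os.path.join(output_root, "labels", split, f"{file_id}.txt")
                shutil.copy(src_label, dst_label)
```

4.2 Validation and Visualization

After the conversion, visual inspection is strongly recommended to confirm that the annotations are correct:

```python
import cv2
import matplotlib.pyplot as plt


def visualize_annotation(image_path, label_path, class_names=("car", "truck", "bus")):
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]

    with open(label_path, "r") as f:
        lines = f.readlines()

    plt.figure(figsize=(12, 8))
    plt.imshow(img)
    ax = plt.gca()

    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        class_id, xc, yc, bw, bh = map(float, line.strip().split())

        # Convert back to absolute pixel coordinates
        x1 = int((xc - bw / 2) * w)
        y1 = int((yc - bh / 2) * h)
        x2 = int((xc + bw / 2) * w)
        y2 = int((yc + bh / 2) * h)

        rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1,
                             fill=False, color="red", linewidth=2)
        ax.add_patch(rect)
        plt.text(x1, y1 - 5, class_names[int(class_id)],
                 color="white", fontsize=12,
                 bbox=dict(facecolor="red", alpha=0.7))

    plt.axis("off")
    plt.show()
```

5. Advanced Processing Tips

In practice, a few further techniques can improve data quality and training results.

5.1 Handling Different Image Sizes

Image sizes in UAVDT are not necessarily uniform; 1024×540 and 1920×1080 are the two sizes mentioned most often. Reading the size from each file avoids hard-coding either one:

```python
import os
from PIL import Image


def get_image_size(image_path):
    with Image.open(image_path) as img:
        return img.size  # (width, height)


def batch_process_with_varying_sizes(image_dir, label_dir, output_dir):
    for img_file in os.listdir(image_dir):
        if not img_file.endswith(".jpg"):
            continue

        img_path = os.path.join(image_dir, img_file)
        label_path = os.path.join(label_dir, os.path.splitext(img_file)[0] + ".txt")
        if not os.path.exists(label_path):
            continue

        img_w, img_h = get_image_size(img_path)
        generate_yolo_label(label_path, (img_w, img_h), output_dir)
```

5.2 Data Augmentation Suggestions

Given the characteristics of the drone viewpoint, the following augmentation strategies are recommended:

- Small-object enhancement: targets in aerial images are usually small, so use mosaic augmentation, reduce the anchor sizes accordingly, and consider adding a detection layer for small objects
- Viewpoint changes: moderate rotation (±15 degrees) and perspective transforms to simulate different shooting angles
- Lighting adaptation: random brightness and contrast adjustments to simulate different weather conditions

```yaml
# Example augmentation settings for UAVDT (YOLOv5 hyperparameter format)
hsv_h: 0.015         # hue augmentation
hsv_s: 0.7           # saturation augmentation
hsv_v: 0.4           # brightness (value) augmentation
degrees: 15          # rotation angle
translate: 0.1       # translation
scale: 0.5           # scaling
shear: 0.0           # shear
perspective: 0.0001  # perspective transform
flipud: 0.0          # vertical flip probability
fliplr: 0.5          # horizontal flip probability
mosaic: 1.0          # mosaic augmentation probability
mixup: 0.1           # MixUp augmentation probability
```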
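Before committing to the small-object settings in section 5.2, it can help to measure how many targets really are small in the converted labels. The sketch below is one rough way to do that. The function name `small_object_ratio` is hypothetical, and the default threshold of 32×32 pixels is borrowed from the COCO convention for "small" objects as an assumption, not something UAVDT itself defines.

```python
def small_object_ratio(label_lines, img_w, img_h, max_area=32 * 32):
    """Return the fraction of YOLO boxes whose pixel area is below max_area.

    label_lines: lines of a YOLO label file ("class xc yc w h", normalized).
    max_area: area threshold in pixels (assumed default: COCO's 32x32).
    """
    total = 0
    small = 0
    for line in label_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines

        # Convert normalized width/height back to pixels
        box_w = float(fields[3]) * img_w
        box_h = float(fields[4]) * img_h

        total += 1
        if box_w * box_h < max_area:
            small += 1

    # An empty label file has no objects, so report 0.0 rather than dividing by zero
    return small / total if total else 0.0
```

Running this over a sample of label files, with each file's real image size, gives a quick indication of whether extra small-object measures are worth the added training cost.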
6. Complete Pipeline Script and Usage Guide

This section combines the steps above into a single end-to-end converter class with basic error handling.

6.1 Complete Conversion Script

```python
import os
import shutil

import cv2
from tqdm import tqdm


class UAVDT2YOLOConverter:
    def __init__(self, dataset_root, output_dir):
        self.dataset_root = dataset_root
        self.output_dir = output_dir
        self.class_mapping = {1: 0, 2: 1, 3: 2}  # UAVDT → YOLO class mapping

    def convert_sequence(self, seq_name, img_size=None):
        """Process a single scene sequence."""
        # Paths
        img_dir = os.path.join(self.dataset_root, seq_name, "img1")
        gt_file = os.path.join(self.dataset_root, seq_name, "gt", "gt_whole.txt")
        temp_label_dir = os.path.join(self.output_dir, "temp_labels", seq_name)
        os.makedirs(temp_label_dir, exist_ok=True)

        # Read the image size from a sample frame if it was not given
        if img_size is None:
            sample_img = next(
                (f for f in os.listdir(img_dir) if f.endswith(".jpg")), None
            )
            if sample_img is None:
                raise FileNotFoundError(f"No .jpg images found in {img_dir}")
            img = cv2.imread(os.path.join(img_dir, sample_img))
            if img is None:
                raise ValueError(f"Could not read image {sample_img}")
            img_size = (img.shape[1], img.shape[0])  # (width, height)

        # Parse the annotation file
        frame_annots = self._parse_annotations(gt_file)

        # Generate YOLO label files
        for frame_idx, annots in tqdm(frame_annots.items(), desc=f"Processing {seq_name}"):
            self._generate_yolo_label(
                annots,
                frame_idx,
                img_size,
                os.path.join(temp_label_dir, f"{frame_idx:06d}.txt"),
            )

        # Copy images to the temporary directory
        temp_img_dir = os.path.join(self.output_dir, "temp_images", seq_name)
        os.makedirs(temp_img_dir, exist_ok=True)
        for img_file in tqdm(os.listdir(img_dir), desc="Copying images"):
            if img_file.endswith(".jpg"):
                src = os.path.join(img_dir, img_file)
                dst = os.path.join(temp_img_dir, img_file)
                shutil.copy(src, dst)

        return temp_img_dir, temp_label_dir

    def _parse_annotations(self, gt_file):
        """Parse the raw annotation file into a {frame_idx: [lines]} dict."""
        frame_annots = {}
        with open(gt_file, "r") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                parts = line.strip().split(",")
                frame_idx = int(parts[0])
                if frame_idx not in frame_annots:
                    frame_annots[frame_idx] = []
                frame_annots[frame_idx].append(line)
        return frame_annots

    def _generate_yolo_label(self, annot_lines, frame_idx, img_size, output_path):
        """Write the YOLO label file for one frame."""
        img_w, img_h = img_size
        yolo_lines = []

        for line in annot_lines:
            parts = line.strip().split(",")
            bbox = list(map(float, parts[2:6]))  # left, top, width, height
            class_id = int(parts[8])

            # Skip annotations with an unknown category
            if class_id not in self.class_mapping:
                continue

            # Coordinate conversion
            xc = (bbox[0] + bbox[2] / 2) / img_w
            yc = (bbox[1] + bbox[3] / 2) / img_h
            bw = bbox[2] / img_w
            bh = bbox[3] / img_h

            yolo_lines.append(
                f"{self.class_mapping[class_id]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"
            )

        # Write the file
        with open(output_path, "w") as f:
            f.write("\n".join(yolo_lines))


# Usage example
converter = UAVDT2YOLOConverter("/path/to/UAV-benchmark-M", "/path/to/output")
sequences = ["M0201", "M0202", "M1101", "M1102"]  # sequences to process
for seq in sequences:
    converter.convert_sequence(seq)
```

6.2 Troubleshooting Common Problems

The following problems come up most often in practice, along with their fixes:

- Path errors: build every path with os.path.join, and check that the UAVDT dataset was downloaded completely
- Out of memory: process large sequences in batches, and use del to release large objects that are no longer needed
- Annotation drift: verify that the image size is read correctly, and check that all YOLO coordinates fall within [0, 1]
- Class mapping errors: confirm that CLASS_MAPPING matches the categories actually present in the dataset, and count the class distribution in the raw annotations before processing

```python
def check_class_distribution(gt_file):
    """Print how many annotations each category has."""
    class_counts = {1: 0, 2: 0, 3: 0}
    with open(gt_file, "r") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            class_id = int(line.strip().split(",")[8])
            if class_id in class_counts:
                class_counts[class_id] += 1
    print(f"Class distribution: {class_counts}")


# Usage example
check_class_distribution("/path/to/UAV-benchmark-M/M1102/gt/gt_whole.txt")
```
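The "annotation drift" and "class mapping" checks listed in section 6.2 can also be automated. The following is a minimal sketch of such a check over the lines of one YOLO label file. The function name `find_label_problems`, the message strings, and the default of three classes are assumptions chosen for this example.

```python
def find_label_problems(label_lines, num_classes=3):
    """Return a list of (line_number, message) for suspicious YOLO label lines.

    Hypothetical helper: checks field count, numeric parsing, class id range,
    and that all coordinates are within [0, 1] with positive width/height.
    """
    problems = []
    for line_no, line in enumerate(label_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are harmless
        if len(fields) != 5:
            problems.append((line_no, "expected 5 fields"))
            continue

        try:
            class_id = int(fields[0])
            xc, yc, bw, bh = map(float, fields[1:])
        except ValueError:
            problems.append((line_no, "non-numeric value"))
            continue

        if not 0 <= class_id < num_classes:
            problems.append((line_no, "class id out of range"))

        if not all(0.0 <= v <= 1.0 for v in (xc, yc, bw, bh)):
            problems.append((line_no, "coordinate outside [0, 1]"))
        elif bw <= 0 or bh <= 0:
            problems.append((line_no, "non-positive width or height"))

    return problems
```

An empty result means no problem was detected by these particular checks. It does not prove the boxes line up with the objects, so the visual inspection from section 4.2 is still worth doing.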
