yolort.models

Models structure

The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images but the behaviour varies depending on the model. Check the constructor of the models for more information.

yolort.models.YOLOv5(arch: str | None = None, model: Module | None = None, num_classes: int = 80, pretrained: bool = False, progress: bool = True, size: Tuple[int, int] = (640, 640), size_divisible: int = 32, fixed_shape: Tuple[int, int] | None = None, fill_color: int = 114, **kwargs: Any) → None

Wraps the pre-processing (LetterBox) into the YOLO models.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, with values in the 0-1 range. Different images can have different sizes, but each is resized to a fixed size that maintains the aspect ratio before being passed to the backbone.
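The aspect-ratio-preserving resize described above can be illustrated with a short sketch. This is not yolort's implementation: the helper name letterbox_scale is hypothetical, and it assumes size is given as (height, width) and that the image is scaled by the largest factor that still fits inside size.

```python
from typing import Tuple


def letterbox_scale(
    height: int, width: int, size: Tuple[int, int] = (640, 640)
) -> Tuple[int, int, float]:
    """Return the resized (height, width) and the scale factor.

    Hypothetical helper: assumes `size` is (height, width) and picks the
    largest scale that keeps the whole image inside `size`.
    """
    target_h, target_w = size
    # The tighter of the two ratios guarantees both edges fit.
    scale = min(target_h / height, target_w / width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    return new_h, new_w, scale


# A portrait image of 1080 x 810 pixels (height x width).
new_h, new_w, scale = letterbox_scale(1080, 810)
print(new_h, new_w)  # 640 480
```

Both edges are multiplied by the same factor, so the 4:3 ratio of the example image is unchanged after resizing.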

The behavior of the model depends on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and the targets (a list of dictionaries), each containing:

  • boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • labels (Int64Tensor[N]): the class label for each ground-truth box

The model returns a Dict[Tensor] during training, containing the classification and regression losses.
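As an illustration of the target format above, the following sketch checks one target dictionary against the stated constraints. The helper check_target is hypothetical and not part of yolort; it uses plain Python lists in place of tensors (a tensor can be converted with .tolist()).

```python
from typing import Dict, Sequence


def check_target(target: Dict[str, Sequence], height: int, width: int) -> None:
    """Raise ValueError if a target breaks the documented box constraints.

    Hypothetical helper: `target` holds "boxes" as [x1, y1, x2, y2] rows
    and "labels" as one class index per box.
    """
    boxes, labels = target["boxes"], target["labels"]
    if len(boxes) != len(labels):
        raise ValueError("boxes and labels must have the same length N")
    for x1, y1, x2, y2 in boxes:
        # 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            raise ValueError(f"invalid box: {[x1, y1, x2, y2]}")


# One image of 480 x 640 pixels (height x width) with two ground-truth boxes.
target = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [300.0, 50.0, 640.0, 480.0]],
    "labels": [0, 5],
}
check_target(target, height=480, width=640)  # passes silently
```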

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detections:

  • boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • labels (Int64Tensor[N]): the predicted labels for each detection

  • scores (Tensor[N]): the scores for each detection
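The prediction fields listed above can be consumed as in the sketch below. It is a minimal illustration rather than yolort code: the helper summarize is hypothetical, plain lists stand in for tensors (convert with .tolist()), and labels are assumed to be zero-based indices into the list of class names.

```python
from typing import Dict, List, Sequence, Tuple


def summarize(
    prediction: Dict[str, Sequence],
    class_names: Sequence[str],
    min_score: float = 0.5,
) -> List[Tuple[str, float, Sequence[float]]]:
    """Return (class name, score, box) for detections above `min_score`.

    Hypothetical helper: assumes labels index directly into `class_names`.
    """
    rows = []
    for box, label, score in zip(
        prediction["boxes"], prediction["labels"], prediction["scores"]
    ):
        if score >= min_score:
            rows.append((class_names[label], score, box))
    return rows


# Stand-in for one entry of the returned List[Dict[Tensor]].
prediction = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0]],
    "labels": [0, 5],
    "scores": [0.91, 0.30],
}
names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus"]
print(summarize(prediction, names))
# [('person', 0.91, [10.0, 20.0, 110.0, 220.0])]
```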

Example

Demo pipeline for YOLOv5 Inference.

from yolort.models import YOLOv5

# Load the yolov5s version 6.0 models
arch = 'yolov5_darknet_pan_s_r60'
model = YOLOv5(arch=arch, pretrained=True, score_thresh=0.35)
model = model.eval()

# Perform inference on an image file
predictions = model.predict('bus.jpg')
# Perform inference on a list of image files
predictions2 = model.predict(['bus.jpg', 'zidane.jpg'])

We also support loading custom checkpoints trained with ultralytics/yolov5:

from yolort.models import YOLOv5

# Your trained checkpoint from ultralytics
checkpoint_path = 'yolov5n.pt'
model = YOLOv5.load_from_yolov5(checkpoint_path, score_thresh=0.35)
model = model.eval()

# Perform inference on an image file
predictions = model.predict('bus.jpg')
Parameters:
  • arch (string) – YOLO model architecture. Default: None

  • model (nn.Module) – YOLO model. Default: None

  • num_classes (int) – number of output classes of the model (not including the background). Default: 80

  • pretrained (bool) – If True, returns a model pre-trained on COCO train2017. Default: False

  • progress (bool) – If True, displays a progress bar of the download to stderr. Default: True

  • size (Tuple[int, int]) – the minimum and maximum size of the image to be rescaled. Default: (640, 640)

  • size_divisible (int) – stride of the models. Default: 32

  • fixed_shape (Tuple[int, int], optional) – Padding mode for letterboxing. If specified, the image is padded to the shape fixed_shape. If not specified, the image is padded to the minimum rectangle that matches min_size / max_size and whose edges are each divisible by size_divisible. Default: None

  • fill_color (int) – fill value for padding. Default: 114
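The two padding modes controlled by fixed_shape and size_divisible can be sketched as follows. This is an illustration under stated assumptions, not yolort's implementation: the helper letterbox_padding is hypothetical, shapes are (height, width), and the minimum rectangle is taken to be the next multiple of size_divisible on each edge.

```python
from typing import Optional, Tuple


def letterbox_padding(
    resized_hw: Tuple[int, int],
    size_divisible: int = 32,
    fixed_shape: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int]:
    """Return the total (vertical, horizontal) padding in pixels.

    Hypothetical helper: `resized_hw` is the image shape after the
    aspect-ratio-preserving resize.
    """
    h, w = resized_hw
    if fixed_shape is not None:
        # Fixed mode: pad up to the requested output shape.
        return fixed_shape[0] - h, fixed_shape[1] - w
    # Minimum-rectangle mode: pad each edge to the next multiple of the stride.
    return (-h) % size_divisible, (-w) % size_divisible


print(letterbox_padding((427, 640)))                          # (21, 0)
print(letterbox_padding((640, 480), fixed_shape=(640, 640)))  # (0, 160)
```

In the first call the height grows from 427 to 448, the next multiple of 32. The padded pixels would then be filled with fill_color.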

Pre-trained weights

The pre-trained models return the predictions of the following classes:

COCO_INSTANCE_CATEGORY_NAMES = [
   'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
   'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
   'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
   'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
   'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
   'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
   'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
   'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
   'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',
   'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
   'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
   'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
yolort.models.yolov5n(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5n6(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5s(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r3.1”, “r4.0”, “r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5s6(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5m(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r3.1”, “r4.0”, “r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5m6(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5l(upstream_version: str = 'r6.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r3.1”, “r4.0”, “r6.0”]. Default: “r6.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.

yolort.models.yolov5ts(upstream_version: str = 'r4.0', export_friendly: bool = False, **kwargs: Any)
Parameters:
  • upstream_version (str) – upstream YOLOv5 release of the model. Possible values are [“r4.0”]. Default: “r4.0”.

  • export_friendly (bool) – Whether to use the (ONNX/TVM) export-friendly mode. Default: False.