Models

Detectools models inherit from the abstract class detectools.BaseModel, which defines the functions needed during training and inference (through the Trainer and Predictor classes). Each model also inherits from a class of the package it comes from (e.g. Ultralytics or Hugging Face), for use in development.
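The double inheritance described above can be sketched as follows. This is a minimal, dependency-free illustration and not the detectools source: SketchBaseModel, ThirdPartyBackbone and SketchModel are hypothetical names, and the abstract method set is only assumed from the method lists on this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchBaseModel(ABC):
    """Hypothetical stand-in for detectools.BaseModel (assumed interface)."""

    @abstractmethod
    def prepare(self, images: List[Any]) -> Any:
        """Convert a batch into the format the backbone expects."""

    @abstractmethod
    def get_predictions(self, images: List[Any]) -> List[Dict[str, Any]]:
        """Run prepare, the forward pass, and result building."""


class ThirdPartyBackbone:
    """Hypothetical stand-in for a library class (e.g. an Ultralytics model)."""

    def forward(self, batch: List[Any]) -> List[float]:
        # Dummy forward pass: one score per image.
        return [float(len(str(item))) for item in batch]


class SketchModel(ThirdPartyBackbone, SketchBaseModel):
    """Gets the forward pass from the library and the contract from the base."""

    def prepare(self, images: List[Any]) -> List[Any]:
        return list(images)

    def get_predictions(self, images: List[Any]) -> List[Dict[str, Any]]:
        scores = self.forward(self.prepare(images))
        return [{"score": score} for score in scores]


model = SketchModel()
predictions = model.get_predictions(["img_a", "img_bb"])
```

Because SketchBaseModel is abstract, instantiating it directly raises TypeError; only subclasses that implement every abstract method can be used by training or inference code.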

class detectools.models.Mask2Former(num_classes: int = 1, pretrain: Literal['large', 'medium', 'small', 'tiny'] = 'tiny', overlap_mask_thr: float = 0.8)

Mask2Former model class in detectools. This class inherits from Mask2FormerForUniversalSegmentation (Hugging Face transformers) and BaseModel (detectools). It builds a Mask2Former model from the Hugging Face transformers model architectures.

Parameters:
  • num_classes (int, optional) – Number of classes. Defaults to 1.

  • pretrain (Literal['large', 'medium', 'small', 'tiny'], optional) – Size of the pretrained model. Defaults to “tiny”.

  • overlap_mask_thr (float, optional) – Mask threshold to merge masks from Mask2FormerOutput. Defaults to 0.8.

Attributes:

confidence_thr

Confidence score threshold above which an object is considered a true prediction.

Type:

float

max_detection

Maximum number of objects to predict on one image.

Type:

int

nms_threshold

IoU threshold above which two boxes are considered overlapping by the Non-Maximum Suppression algorithm.

Type:

float

num_classes

Number of classes.

Type:

int

size_configs

Dict of existing depth configurations for Mask2Former.

Type:

Dict[str, str]

Methods:

build_boxes(masks: Tensor) → Tensor

Build boxes from segmentation masks.

Parameters:

masks (Tensor) – Segmentation mask.

Returns:

  • Boxes (N, 4).

Return type:

Tensor
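As an illustration of what building boxes from masks involves, here is a minimal sketch that uses nested lists in place of tensors. The function name boxes_from_masks, the (x_min, y_min, x_max, y_max) layout with inclusive maxima, and the all-zero box for an empty mask are assumptions made for this example, not documented behaviour of build_boxes.

```python
from typing import List, Tuple

Mask = List[List[int]]  # one binary mask: H rows of W values (0 or 1)


def boxes_from_masks(masks: List[Mask]) -> List[Tuple[int, int, int, int]]:
    """Return one (x_min, y_min, x_max, y_max) box per binary mask."""
    boxes = []
    for mask in masks:
        # Row indices that contain at least one foreground pixel.
        ys = [y for y, row in enumerate(mask) if any(row)]
        # Column indices of every foreground pixel.
        xs = [x for row in mask for x, value in enumerate(row) if value]
        if not ys:
            # Assumption: an empty mask yields a degenerate all-zero box.
            boxes.append((0, 0, 0, 0))
            continue
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


masks = [
    [[0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 0]],
    [[0, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]],
]
boxes = boxes_from_masks(masks)  # [(2, 1, 3, 2), (1, 0, 1, 1)]
```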

build_results(raw_outputs: Mask2FormerForUniversalSegmentationOutput, spatial_size: Tuple[int, int]) → BatchedFormats

Transform model outputs into BatchedFormats for results.

Parameters:
  • raw_outputs (Mask2FormerForUniversalSegmentationOutput) – Mask2Former output.

  • spatial_size (Tuple[int, int]) – Size of original image (H, W).

Returns:

  • Model output as BatchedFormats.

Return type:

BatchedFormats

get_predictions(images: Tensor) → BatchedFormats

Prepare images, apply the model forward pass and build results.

Parameters:

images (Tensor) – RGB images Tensor.

Returns:

  • Predictions for images as BatchedFormats.

Return type:

BatchedFormats

inputs_to_device(input: Any, device: Literal['cpu', 'cuda'])

Send Mask2Former inputs to device.

prepare(images: Tensor, targets: BatchedFormats | None = None) → Dict[str, Tensor | Dict[Any, Any]]

Transform images and targets into Mask2Former specific format for prediction & loss computation.

Parameters:
  • images (Tensor) – Batch images.

  • targets (BatchedFormats, optional) – Batched targets from DetectionDataset.

Returns:

  • Images data prepared for Mask2Former.

  • If targets: images + targets prepared for Mask2Former.

Return type:

Union[Any, Tuple[Any]]

prepare_target(target: SegmentationFormat) → Tuple[Tensor, Dict[int, int]]

Prepare targets for Mask2Former model.

Parameters:

target (SegmentationFormat) – Target.

Returns:

  • Segmentation map.

  • Dict of correspondence {object_id: object_label}.

Return type:

Tuple[Tensor, Dict[int, int]]
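The pair returned here (a segmentation map plus an {object_id: object_label} dict) can be pictured with the sketch below. It is a guess at the general idea rather than the detectools implementation: build_instance_map is a hypothetical name, nested lists replace tensors, id 0 is assumed to be background, ids are assumed to start at 1, and later objects are assumed to overwrite earlier ones where masks overlap.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # H rows of W integer values


def build_instance_map(masks: List[Mask], labels: List[int]) -> Tuple[Mask, Dict[int, int]]:
    """Merge per-object binary masks into one id map and record each id's label.

    Requires at least one mask, since the map size is read from the first one.
    """
    height, width = len(masks[0]), len(masks[0][0])
    segmentation_map = [[0] * width for _ in range(height)]  # 0 = background
    id_to_label: Dict[int, int] = {}
    for object_id, (mask, label) in enumerate(zip(masks, labels), start=1):
        id_to_label[object_id] = label
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # Assumption: later objects win on overlapping pixels.
                    segmentation_map[y][x] = object_id
    return segmentation_map, id_to_label


masks = [
    [[1, 1, 0],
     [0, 0, 0]],
    [[0, 1, 1],
     [0, 0, 1]],
]
segmentation_map, id_to_label = build_instance_map(masks, labels=[7, 2])
# segmentation_map == [[1, 2, 2], [0, 0, 2]]
# id_to_label == {1: 7, 2: 2}
```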

run_forward(images: Tensor, targets: BatchedFormats, predict: bool = False) → Dict[str, Tensor] | Tuple[Dict[str, Tensor], BatchedFormats]

Compute the loss from images and targets. If predict is True, return both the loss dict and the predictions.

Parameters:
  • images (Tensor) – Batch RGB images.

  • targets (BatchedFormats) – Batch targets.

  • predict (bool, optional) – To return predictions or not. Defaults to False.

Returns:

  • Loss dict.

  • If predict: Predictions.

Return type:

Union[Dict[str, Tensor], Tuple[Dict[str, Tensor], BatchedFormats]]
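Since the return value is either a loss dict or a (loss dict, predictions) tuple, calling code has to handle both shapes. The sketch below shows one way to normalise them. fake_run_forward is a hypothetical stand-in that only mimics the documented return shapes, and the sub-loss names are invented for illustration.

```python
from typing import Any, Dict, List, Tuple, Union

LossDict = Dict[str, float]
ForwardOutput = Union[LossDict, Tuple[LossDict, List[Any]]]


def fake_run_forward(predict: bool = False) -> ForwardOutput:
    """Hypothetical stand-in that mimics the two documented return shapes."""
    losses = {"loss": 0.75, "loss_mask": 0.5, "loss_class": 0.25}  # invented keys
    if predict:
        return losses, ["prediction_0", "prediction_1"]
    return losses


def split_forward_output(output: ForwardOutput) -> Tuple[LossDict, List[Any]]:
    """Normalise both return shapes to (loss_dict, predictions)."""
    if isinstance(output, tuple):
        losses, predictions = output
        return losses, list(predictions)
    return output, []


train_losses, train_predictions = split_forward_output(fake_run_forward())
val_losses, val_predictions = split_forward_output(fake_run_forward(predict=True))
```

A training step would typically call with predict=False and only backpropagate the loss, while a validation step would pass predict=True so that metrics can be computed from the predictions.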

to_device(device: Literal['cpu', 'cuda'])

Send model to device.

Parameters:

device (Literal['cpu', 'cuda']) – Device to send model on.

class detectools.models.YoloDetection(architecture: str = 'yolov8m', num_classes: int = 1, pretrained=True, confidence_thr: float = 0.5, max_detection: int = 300, nms_threshold: float = 0.45, *args, **kwargs)

YOLO detection model class in detectools. This class inherits from DetectionModel (Ultralytics) and BaseModel (detectools). It loads the YOLO architecture from the Ultralytics repository. If pretrained is True, it loads a pretrained model from Ultralytics.

Parameters:
  • architecture (str, optional) – Architecture used to build the YOLO model. Check the architectures available in Ultralytics. Defaults to “yolov8m”.

  • num_classes (int, optional) – Number of classes in the task. Defaults to 1.

  • pretrained (bool, optional) – Whether to use pretrained weights. Defaults to True.

  • confidence_thr (float, optional) – Confidence score threshold above which an object is considered a true prediction. Defaults to 0.5.

  • max_detection (int, optional) – Maximum number of objects to predict on one image. Defaults to 300.

  • nms_threshold (float, optional) – IoU threshold above which two boxes are considered overlapping by the Non-Maximum Suppression algorithm. Defaults to 0.45.

Attributes:

confidence_thr

Confidence score threshold above which an object is considered a true prediction.

Type:

float

max_detection

Maximum number of objects to predict on one image.

Type:

int

nms_threshold

IoU threshold above which two boxes are considered overlapping by the Non-Maximum Suppression algorithm.

Type:

float

num_classes

Number of classes.

Type:

int
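To make the interplay of confidence_thr, nms_threshold and max_detection concrete, here is a minimal greedy Non-Maximum Suppression sketch in plain Python. It is a generic, class-agnostic illustration under stated assumptions (XYXY boxes, scores kept when greater than or equal to the threshold), not the post-processing code that detectools or Ultralytics actually runs; filter_detections and iou are names chosen for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two XYXY boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    confidence_thr: float = 0.5,
    nms_threshold: float = 0.45,
    max_detection: int = 300,
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # 1. Drop low-confidence candidates (assumption: threshold is inclusive).
    candidates = [i for i, score in enumerate(scores) if score >= confidence_thr]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for index in candidates:
        # 3. Stop once the detection budget is reached.
        if len(kept) >= max_detection:
            break
        # 2. Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[index], boxes[k]) <= nms_threshold for k in kept):
            kept.append(index)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_detections(boxes, scores)  # [0, 2]
```

In this example box 1 is suppressed because its IoU with box 0 is 81/119 (about 0.68), which exceeds 0.45, and box 3 is dropped because its score is below 0.5.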

Methods:

build_results(raw_outputs: List[Tensor], prebuild_outputs: Tensor) → BatchedFormats

Transform model outputs into Batch DetectionFormat for results.

Parameters:
  • raw_outputs (List[Tensor]) – Model outputs.

  • prebuild_outputs (Tensor) – Extracted boxes from YOLO raw outputs.

Returns:

  • Batched predictions.

Return type:

BatchedFormats

compute_loss(raw_outputs: Tensor, targets: Dict[str, Tensor]) → Dict[str, Tensor]

Compute loss with predictions & targets.

Parameters:
  • raw_outputs (Tensor) – Raw output of the model.

  • targets (Dict[str, Tensor]) – Targets in YOLO format.

Returns:

  • Loss dict with total loss (key: “loss”) & sublosses.

Return type:

Dict[str, Tensor]

get_predictions(images: Tensor) → BatchedFormats

Prepare images, apply the YOLO forward pass and build results.

Parameters:

images (Tensor) – RGB images Tensor.

Returns:

  • Predictions for images as BatchedFormats.

Return type:

BatchedFormats

prepare(images: Tensor, targets: BatchedFormats | None = None) → Tensor | Tuple[Tensor, Dict[str, Tensor]]

Transform images and targets into YOLO specific format for prediction & loss computation.

Parameters:
  • images (Tensor) – Batch images.

  • targets (BatchedFormats, optional) – Batched targets from DetectionDataset.

Returns:

  • Images data prepared for YOLO.

  • If targets: images + targets prepared for YOLO.

Return type:

Union[Tensor, Tuple[Tensor, Dict[str, Tensor]]]

prepare_image(images: Tensor) → Tuple[Tensor, Tuple[int]]

Pad images if needed & return padding values.

Parameters:

images (Tensor) – Batch of images.

Returns:

  • Padded images.

  • Padding values.

Return type:

Tuple[Tensor, Tuple[int]]

prepare_target(targets: BatchedFormats) → Dict[str, Tensor]

Transform DetectionFormat targets into the YOLO target format.

Parameters:

targets (BatchedFormats) – Batch targets.

Returns:

  • Targets in YOLO format.

Return type:

Dict[str, Tensor]
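A sketch of such a conversion is given below. The keys batch_idx, cls and bboxes, and the normalised (cx, cy, w, h) box layout, follow the batch dictionary read by the Ultralytics YOLOv8 detection loss; that detectools produces exactly this dictionary is an assumption, as are the function name to_yolo_targets and the use of XYXY pixel boxes as input. Lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # XYXY in pixels


def to_yolo_targets(
    batch_boxes: List[List[Box]],
    batch_labels: List[List[int]],
    image_size: Tuple[int, int],
) -> Dict[str, list]:
    """Flatten per-image targets into one dict of parallel lists."""
    height, width = image_size
    targets: Dict[str, list] = {"batch_idx": [], "cls": [], "bboxes": []}
    for image_index, (boxes, labels) in enumerate(zip(batch_boxes, batch_labels)):
        for (x1, y1, x2, y2), label in zip(boxes, labels):
            # batch_idx records which image of the batch each object belongs to.
            targets["batch_idx"].append(image_index)
            targets["cls"].append(label)
            # Normalised centre-x, centre-y, width, height.
            targets["bboxes"].append((
                (x1 + x2) / 2 / width,
                (y1 + y2) / 2 / height,
                (x2 - x1) / width,
                (y2 - y1) / height,
            ))
    return targets


targets = to_yolo_targets(
    batch_boxes=[[(20, 10, 60, 50)], [], [(0, 0, 200, 100)]],
    batch_labels=[[1], [], [0]],
    image_size=(100, 200),  # (H, W)
)
```

The second image has no objects, so it contributes nothing and batch_idx ends up as [0, 2]; this flat layout is what lets images with different object counts share one batch.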

retrieve_spatial_size(raw_outputs: List[Tensor]) → Tuple[int]

Retrieve image shape from raw_outputs and stride values.

Parameters:

raw_outputs (List[Tensor]) – Raw outputs from the YOLO model.

Returns:

  • Size of input image (H, W).

Return type:

Tuple[int]
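The idea of recovering the input size from output shapes and strides can be sketched as below. The sketch assumes that each raw output is a feature map whose height and width equal the input size divided by that head's stride, and that the strides are (8, 16, 32) as in the standard YOLOv8 heads; the function name and the shape-only input are simplifications for illustration.

```python
from typing import Sequence, Tuple

ASSUMED_STRIDES = (8, 16, 32)  # assumption: default YOLOv8 head strides


def spatial_size_from_feature_maps(
    feature_shapes: Sequence[Tuple[int, int]],
    strides: Sequence[int] = ASSUMED_STRIDES,
) -> Tuple[int, int]:
    """Recover the (H, W) of the padded input from per-head feature map sizes."""
    sizes = {(h * stride, w * stride) for (h, w), stride in zip(feature_shapes, strides)}
    if len(sizes) != 1:
        # Every head should agree on the input size.
        raise ValueError(f"Inconsistent sizes across heads: {sorted(sizes)}")
    return sizes.pop()


size = spatial_size_from_feature_maps([(80, 60), (40, 30), (20, 15)])  # (640, 480)
```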

run_forward(images: Tensor, targets: BatchedFormats, predict: bool = False) → Dict[str, Tensor] | Tuple[Dict[str, Tensor], BatchedFormats]

Compute the loss from images and targets. If predict is True, return both the loss dict and the predictions.

Parameters:
  • images (Tensor) – Batch RGB images.

  • targets (BatchedFormats) – Batch targets.

  • predict (bool, optional) – To return predictions or not. Defaults to False.

Returns:

  • Loss dict.

  • If predict: predictions.

Return type:

Union[Dict[str, Tensor], Tuple[Dict[str, Tensor], BatchedFormats]]

to_device(device: Literal['cpu', 'cuda'])

Send model & criterion to device.

Parameters:

device (Literal['cpu', 'cuda']) – Device to send model on.

yolo_pad_requirements(input_object: Tensor | DetectionFormat) → List[int]

Return values for padding to fit ‘divisible by 32’ requirement.

Parameters:

input_object (Union[Tensor, DetectionFormat]) – Input to pad (image or DetectionFormat).

Returns:

  • Padding values.

Return type:

List[int]
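The "divisible by 32" requirement comes down to simple modular arithmetic, sketched below. The (left, top, right, bottom) ordering and the choice to pad only on the right and bottom are assumptions made for the example; the actual method may distribute the padding differently.

```python
from typing import Tuple


def pad_to_multiple(height: int, width: int, multiple: int = 32) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) padding so H and W divide by `multiple`."""
    pad_h = (-height) % multiple  # rows missing to reach the next multiple
    pad_w = (-width) % multiple   # columns missing to reach the next multiple
    # Assumption: all padding goes on the right and bottom edges.
    return (0, 0, pad_w, pad_h)


padding = pad_to_multiple(600, 800)      # (0, 0, 0, 8): 600 + 8 = 608 = 19 * 32
no_padding = pad_to_multiple(640, 640)   # (0, 0, 0, 0): already divisible
```

Padding on the right and bottom keeps the top-left origin unchanged, so box coordinates in the target need no shift; padding on other sides would require offsetting them.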

class detectools.models.Yolov8Segmentation(architecture: str = 'yolov8n-seg', pretrained=True, confidence_thr: float = 0.5, max_detection: int = 300, nms_threshold: float = 0.45, num_classes: int = 1, *args, **kwargs)

YOLO segmentation model class in detectools. This class inherits from SegmentationModel (Ultralytics) and BaseModel (detectools). It loads the YOLO architecture from the Ultralytics repository. If pretrained is True, it loads a pretrained model from Ultralytics.

Parameters:
  • architecture (str, optional) – Architecture used to build the YOLO model. Check the architectures available in Ultralytics. Defaults to “yolov8n-seg”.

  • num_classes (int, optional) – Number of classes in the task. Defaults to 1.

  • pretrained (bool, optional) – Whether to use pretrained weights. Defaults to True.

  • confidence_thr (float, optional) – Confidence score threshold above which an object is considered a true prediction. Defaults to 0.5.

  • max_detection (int, optional) – Maximum number of objects to predict on one image. Defaults to 300.

  • nms_threshold (float, optional) – IoU threshold above which two boxes are considered overlapping by the Non-Maximum Suppression algorithm. Defaults to 0.45.

Attributes:

confidence_thr

Confidence score threshold above which an object is considered a true prediction.

Type:

float

max_detection

Maximum number of objects to predict on one image.

Type:

int

nms_threshold

IoU threshold above which two boxes are considered overlapping by the Non-Maximum Suppression algorithm.

Type:

float

num_classes

Number of classes.

Type:

int

Methods:

build_results(raw_output: Tuple[Tensor, ...]) → BatchedFormats

Transform model outputs into Batch SegmentationFormat for results.

Parameters:

raw_output (Tuple[Tensor, ...]) – Model outputs.

Returns:

  • Batched predictions.

Return type:

BatchedFormats

compute_loss(predictions: Tuple, target: Dict) → Dict[str, Tensor]

Compute loss with predictions & targets.

Parameters:
  • predictions (Tuple) – Raw output of the model.

  • target (Dict) – Targets in YOLO format.

Returns:

  • Loss dict with total loss (key: “loss”) & sublosses.

Return type:

Dict[str, Tensor]

get_predictions(images: Tensor) → BatchedFormats

Prepare images, apply the YOLO forward pass and build results.

Parameters:

images (Tensor) – RGB images Tensor.

Returns:

  • Predictions for images as BatchedFormats.

Return type:

BatchedFormats

mask2yolo(mask: Tensor) → Tensor

Convert a stacked binary mask to a YOLO mask, i.e. a (1, h, w) tensor with values in [0, …, Nobjs]. This shape is suitable for the YOLOv8 loss.

Parameters:

mask (Tensor) – Stacked binary mask (N, H, W).

Returns:

  • YOLO segmentation mask.

Return type:

Tensor

prebuild_output(raw_output: Tuple[Tensor, ...]) → Tuple[Tensor, ...]

Unpack Yolov8-seg (eval mode) raw results.

Parameters:

raw_output (Tuple[Tensor, ...]) – Yolov8 raw eval mode results.

Returns:

  • boxes (N_batch, N_obj, cxcywh).

  • cls_scores (N_batch, N_cls).

  • mask_weights (N_batch, N_obj, 32).

  • protos (N_batch, protos).

Return type:

Tuple[Tensor, ...]

prepare(images: Tensor, targets: BatchedFormats | None = None) → Any | Tuple[Any]

Transform images and targets into YOLO specific format for prediction & loss computation.

Parameters:
  • images (Tensor) – Batch images.

  • targets (BatchedFormats, optional) – Batched targets from DetectionDataset.

Returns:

  • Images data prepared for YOLO.

  • If targets: images + targets prepared for YOLO.

Return type:

Union[Tensor, Tuple[Tensor, Dict[str, Tensor]]]

prepare_image(images: Tensor) → Tuple[Tensor, int]

Pad images if needed & return padding values.

Parameters:

images (Tensor) – Batch of images.

Returns:

  • Padded images.

  • Padding values.

Return type:

Tuple[Tensor, Tuple[int]]

prepare_target(target: BatchedFormats) → Dict[str, Tensor]

Transform SegmentationFormat targets into the YOLO-seg target format.

Parameters:

target (BatchedFormats) – Batch targets.

Returns:

  • Targets in YOLO format.

Return type:

Dict[str, Tensor]

proto2mask(protos: Tensor, weights: Tensor, boxes: Tensor, shape: Tuple[int]) → Tensor

Combine protos and weights to get masks, then crop each instance to its box (useful for predictions).

Parameters:
  • protos (Tensor) – Sub masks (32, …).

  • weights (Tensor) – YOLO mask weights (32, …).

  • boxes (Tensor) – Boxes (N, 4) in XYXY format.

  • shape (Tuple[int]) – Original image size (H, W).

Returns:

  • YOLO segmentation mask.

Return type:

Tensor
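The prototype-and-weights mechanism can be illustrated with a small, dependency-free sketch. Each instance mask is a weighted sum of the prototype maps, binarised and then zeroed outside the instance's box. The sketch uses 2 prototypes instead of 32, treats box maxima as exclusive, binarises at logit > 0 (equivalent to sigmoid > 0.5), and skips the resizing to the original image shape; all of these are assumptions for illustration, and combine_prototypes is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one prototype map: H rows of W values


def combine_prototypes(
    protos: List[Grid],
    weights: List[List[float]],
    boxes: List[Tuple[int, int, int, int]],
) -> List[List[List[int]]]:
    """protos: K maps (H x W); weights: N rows of K coefficients; boxes: N XYXY boxes."""
    height, width = len(protos[0]), len(protos[0][0])
    masks = []
    for coeffs, (x1, y1, x2, y2) in zip(weights, boxes):
        mask = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                # Crop: pixels outside the box stay at 0 (maxima exclusive here).
                if not (x1 <= x < x2 and y1 <= y < y2):
                    continue
                # Linear combination of the prototypes at this pixel.
                logit = sum(c * proto[y][x] for c, proto in zip(coeffs, protos))
                # sigmoid(logit) > 0.5 is equivalent to logit > 0.
                mask[y][x] = int(logit > 0.0)
        masks.append(mask)
    return masks


protos = [
    [[1.0, 1.0, -1.0], [1.0, -1.0, -1.0]],
    [[-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]
boxes = [(0, 0, 3, 2), (2, 0, 3, 2)]
instance_masks = combine_prototypes(protos, weights, boxes)
# instance_masks[0] == [[1, 1, 0], [1, 0, 0]]
# instance_masks[1] == [[0, 0, 1], [0, 0, 1]]  (pixel (x=1, y=0) removed by the crop)
```

Cropping to the box removes spurious activations that a prototype combination may produce far from the detected object.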

retrieve_spatial_size(raw_outputs: List[Tensor]) → Tuple[int, int]

Retrieve image shape from raw_outputs and stride values.

Parameters:

raw_outputs (List[Tensor]) – Raw outputs from the YOLO model.

Returns:

  • Size of input image (H, W).

Return type:

Tuple[int, int]

run_forward(images: Tensor, targets: BatchedFormats, predict: bool = False) → Dict[str, Tensor] | Tuple[Dict[str, Tensor], BatchedFormats]

Compute the loss from images and targets. If predict is True, return both the loss dict and the predictions.

Parameters:
  • images (Tensor) – Batch RGB images.

  • targets (BatchedFormats) – Batch targets.

  • predict (bool, optional) – To return predictions or not. Defaults to False.

Returns:

  • Loss dict.

  • If predict: predictions.

Return type:

Union[Dict[str, Tensor], Tuple[Dict[str, Tensor], BatchedFormats]]

to_device(device: Literal['cpu', 'cuda'])

Send model & criterion to device.

Parameters:

device (Literal['cpu', 'cuda']) – Device to send model on.

yolo_pad_requirements(input_object: Tensor | SegmentationFormat) → Tuple[int, ...]

Return values for padding to fit ‘divisible by 32’ requirement.

Parameters:

input_object (Union[Tensor, SegmentationFormat]) – Input to pad (image or SegmentationFormat).

Returns:

  • Padding values.

Return type:

Tuple[int, ...]