Tutorial 0 : getting started

This tutorial presents the basic usage of deepvisiontools. Each item is introduced step by step, and a fully functional example is given in the section Putting everything together.

deepvisiontools is a library built on top of PyTorch that provides high- and low-level functionalities for training deep learning detection models (bounding boxes and/or instance segmentation).

To install deepvisiontools, first create a new Python virtual environment. deepvisiontools has mostly been tested under Python 3.11 and 3.12.9 and should therefore work in this version range.

In your freshly created environment, run

pip install deepvisiontools

Alternatively you can clone the git repo from https://forgemia.inra.fr/ue-apc/librairies/python/deepvisiontools

Let’s get started with deepvisiontools !

Library configuration

deepvisiontools relies on a global configuration. You can set parameters that control model training, metrics computation, etc.

[1]:

from deepvisiontools import Configuration

config = Configuration(device="cuda", data_type="bbox", num_classes=1)
config.model_confidence_threshold = 0.5
config.model_max_detection = 200


As you can see, you can declare your configuration when you first initialize the Configuration class. Later on, if you want to modify the configuration, change the corresponding attributes/properties of the class (as in the second and third lines of the example).

The three most important parameters are the ones provided in the first line :

- device : either “cuda” or “cpu”, to run on your GPU or on your CPU.
- data_type : either “bbox” or “instance_mask”. Here we want to work with bounding boxes. Additional data types might be included in the future. Note that everything that works with bbox also works with instance masks (for example, you can use “bbox” on your instance mask dataset), but the converse is not true: you can always infer bounding boxes from masks, but not the other way around. Using instance masks as data_type even when training a bounding box detection model can be useful, in particular for handling augmentations such as rotations.
- num_classes : the number of distinct object classes.
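The one-way relation between masks and boxes can be illustrated with a minimal sketch in plain Python. The helper name mask_to_xywh and the nested-list mask are assumptions made for this illustration, not part of the deepvisiontools API; the box follows the XYWH convention (top-left corner, width, height).

```python
def mask_to_xywh(mask):
    """Return the tight (x, y, w, h) box around the non-zero pixels of a 2D mask.

    Hypothetical helper for illustration only (not a deepvisiontools function).
    """
    # Indices of the rows and columns that contain at least one non-zero pixel
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: no object, hence no box
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x_min, x_max = cols[0], cols[-1]
    y_min, y_max = rows[0], rows[-1]
    return (x_min, y_min, x_max - x_min + 1, y_max - y_min + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(mask_to_xywh(mask))  # (1, 1, 3, 3)
```

The box (1, 1, 3, 3) is recovered from the diamond-shaped mask, but the diamond shape cannot be recovered from the box alone.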

Finally, note that all Configuration parameters are detailed in the documentation.
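The names model_confidence_threshold and model_max_detection suggest the usual post-processing of detector outputs: keep the predictions whose score reaches a threshold, then cap the number of detections. The sketch below illustrates that generic technique on plain Python lists. It is an assumption about what these two parameters control, not the library's implementation, and filter_detections is a hypothetical helper.

```python
def filter_detections(boxes, scores, confidence_threshold=0.5, max_detection=200):
    """Keep the highest-scoring detections above a confidence threshold.

    Generic post-processing sketch (hypothetical helper, not deepvisiontools code).
    """
    # Pair each box with its score and drop low-confidence predictions
    kept = [(s, b) for s, b in zip(scores, boxes) if s >= confidence_threshold]
    # Sort by decreasing score and cap the number of detections
    kept.sort(key=lambda pair: pair[0], reverse=True)
    kept = kept[:max_detection]
    return [b for _, b in kept], [s for s, _ in kept]


boxes = [(0, 0, 67, 65), (93, 0, 105, 64), (182, 0, 115, 75)]
scores = [0.91, 0.32, 0.77]
kept_boxes, kept_scores = filter_detections(boxes, scores, 0.5, 2)
print(kept_boxes)   # [(0, 0, 67, 65), (182, 0, 115, 75)]
print(kept_scores)  # [0.91, 0.77]
```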

Format, Dataset, Reader and Dataloader

To train a model on your dataset, you need a way to load your images and annotations and provide them to the model. In PyTorch this is typically done by implementing your own Dataset class. Here we provide a ready-made DeepVisionDataset class. The default structure recognized for reading your data is as follows :

. Dataset Name
        ├── images
            ├── name01.png
            ├── name02.png
            └── ...
        └── coco_annotations.json

where coco_annotations.json is a JSON file that contains your annotations in COCO format (a dict with 3 keys : images, annotations, categories. Each key maps to a list, and each element of the list is a dict with various information). If you wish to adapt to your own dataset structure, you can create your own Reader class and pass it as an argument to DeepVisionDataset, but this requires some Python knowledge (a dedicated tutorial is available).
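As an illustration of that layout, here is a minimal sketch of a COCO-style annotation dict for a single image with a single box. The field names are the standard COCO ones; the file name, category name and ids are illustrative, and which fields deepvisiontools actually requires is not specified here.

```python
import json

coco = {
    "images": [
        {"id": 1, "file_name": "name01.png", "width": 768, "height": 768},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,             # refers to the "id" of an entry in "images"
            "category_id": 0,          # refers to the "id" of an entry in "categories"
            "bbox": [93, 0, 105, 64],  # COCO boxes are [x, y, width, height]
            "area": 105 * 64,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 0, "name": "object"},   # illustrative category
    ],
}

text = json.dumps(coco, indent=2)  # the text that would be written to coco_annotations.json
reloaded = json.loads(text)
print(sorted(reloaded.keys()))  # ['annotations', 'categories', 'images']
```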

To create your dataset :

[ ]:

from deepvisiontools import DeepVisionDataset


path_to_dataset = "path/to/dataset"
dataset = DeepVisionDataset(path_to_dataset)

for image, target, image_name in dataset:
    print(type(image), type(target), image_name)
    break

<class 'torch.Tensor'> <class 'deepvisiontools.formats.formats.BboxFormat'> Crop__20TO_IPHARD__20-07-2020__001_1_1_DSC011525__.png

You can see that the dataset returns a triplet for each index: the image as a torch Tensor, the target as a deepvisiontools format, and the name of the associated image. We will look at deepvisiontools formats a bit later; for now, let’s just say that the target contains all the information of your annotation (here, bounding boxes).

It’s important to note that DeepVisionDataset preprocesses images by default (normalization with the ImageNet statistics). You can change this, either by implementing your own preprocessing or by setting it to None, when declaring your dataset or afterwards by modifying the corresponding attribute.
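For reference, ImageNet normalization subtracts a per-channel mean and divides by a per-channel standard deviation. The sketch below applies the commonly used ImageNet statistics to a single RGB pixel in plain Python; it assumes pixel values scaled to [0, 1] and is only an illustration of the operation, not the preprocessing code of deepvisiontools.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # per-channel mean (R, G, B)
IMAGENET_STD = (0.229, 0.224, 0.225)   # per-channel standard deviation


def normalize_pixel(rgb):
    """Normalize one RGB pixel given as floats in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# A pixel equal to the mean is mapped to zero on every channel
print(normalize_pixel(IMAGENET_MEAN))  # (0.0, 0.0, 0.0)

# A pure white pixel ends up at a few standard deviations above zero
r, g, b = normalize_pixel((1.0, 1.0, 1.0))
print(round(r, 3), round(g, 3), round(b, 3))  # 2.249 2.429 2.64
```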

DeepVisionDataset has its own useful methods. One of the most useful is the ability to randomly split the dataset :

[3]:

print("size of original dataset : ", len(dataset))
train_set, val_set, test_set = dataset.split((0.6, 0.2, 0.2))
print("size of the newly splitted datasets", len(train_set), len(val_set), len(test_set))

size of original dataset :  425
size of the newly splitted datasets 255 85 85
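The sizes above follow directly from the fractions: 60%, 20% and 20% of 425 items. A minimal sketch of such a random split on plain indices is shown below. The function random_split_indices, the seed and the rounding rule (the last subset takes the remainder) are assumptions for illustration, not the implementation used by DeepVisionDataset.split.

```python
import random


def random_split_indices(n_items, fractions, seed=0):
    """Shuffle the indices 0..n_items-1 and cut them into consecutive chunks."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Round every chunk size except the last one, which takes the remainder
    sizes = [round(f * n_items) for f in fractions[:-1]]
    sizes.append(n_items - sum(sizes))
    chunks, start = [], 0
    for size in sizes:
        chunks.append(indices[start:start + size])
        start += size
    return chunks


train_idx, val_idx, test_idx = random_split_indices(425, (0.6, 0.2, 0.2))
print(len(train_idx), len(val_idx), len(test_idx))  # 255 85 85
```

Every index ends up in exactly one subset, so the three subsets never share an image.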

DeepVisionDataset can also be exported to the default structure (a folder containing an image folder and a coco_annotations.json file) by doing

[ ]:

test_set.preprocessing = None
test_set.export_dataset("path/to/export/dataset")    # Careful to not give the same name as original dataset

This function also creates visualizations of your exported dataset by default (you can choose the number of visualizations through the number_visu parameter of the function).

[Figure: example visualization generated for the exported dataset]

Now let’s have a look at the Format class in deepvisiontools. The key idea is that all data are handled by dedicated classes that take care of augmentation, padding, cropping and label management.

[5]:

img, target, _ = dataset[13]    # Load the item at index 13
print("target type : ", type(target))
print("target nb of objects : ", target.nb_object)
print("target canvas_size (size of associated image) : ", target.canvas_size)
print("target data type : ", type(target.data))
print("target data value : ", target.data.value)
print("target objects class labels: ", target.labels)
print("target object scores : ", target.scores)

target type :  <class 'deepvisiontools.formats.formats.BboxFormat'>
target nb of objects :  24
target canvas_size (size of associated image) :  (768, 768)
target data type :  <class 'deepvisiontools.formats.base_data.BboxData'>
target data value :  BoundingBoxes([[  0,   0,  67,  65],
               [ 93,   0, 105,  64],
               [182,   0, 115,  75],
               [309,   0, 120,  69],
               [547,   0,  81,  46],
               [676,   0,  46,  32],
               [711, 190,  56,  77],
               [605, 213,  53,  57],
               [466, 185, 137, 114],
               [360, 187, 120, 124],
               [273, 191,  99, 121],
               [145, 200, 104, 103],
               [  0, 209,  78, 134],
               [107, 443, 113, 113],
               [202, 436, 134, 129],
               [326, 435, 120, 130],
               [453, 447, 103, 101],
               [751, 475,  16,  13],
               [703, 703,  64,  59],
               [589, 705,  77,  62],
               [449, 677, 120,  90],
               [369, 697,  80,  70],
               [135, 708, 104,  59],
               [ 54, 707,  77,  60]], device='cuda:0', format=BoundingBoxFormat.XYWH, canvas_size=(768, 768))
target objects class labels:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
target object scores :  None

As you can see, the BboxFormat class has several attributes. More generally, every format inherits from a BaseFormat class and therefore exposes the same attributes and methods :

- nb_object : the number of objects present in the format.
- canvas_size : the size of the associated image.
- data : the data, stored as a dedicated class that is a child class of BaseData. To access its actual tensor value, use format.data.value. Here you can see that the value is a torch.Tensor, more specifically a BoundingBoxes tensor from torchvision.
- labels : a 1D torch.Tensor that gives the class of each object.
- scores : a 1D torch.Tensor that gives the prediction score (confidence) of each object. It is None for a target and is set only for a prediction (output of a model).
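The printed BoundingBoxes use format=BoundingBoxFormat.XYWH, meaning each row is (x, y, width, height) with (x, y) the top-left corner. To read the values, it can help to convert a few of them to corner coordinates. The sketch below does this in plain Python on three rows copied from the output above; xywh_to_xyxy is a hypothetical helper written for this illustration.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


boxes_xywh = [(0, 0, 67, 65), (93, 0, 105, 64), (711, 190, 56, 77)]
for box in boxes_xywh:
    print(box, "->", xywh_to_xyxy(box))
# (0, 0, 67, 65) -> (0, 0, 67, 65)
# (93, 0, 105, 64) -> (93, 0, 198, 64)
# (711, 190, 56, 77) -> (711, 190, 767, 267)
```

The last box ends at x = 767, which is consistent with the canvas_size of (768, 768).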

A Format encapsulates many objects and is usually associated with a given image. Formats are used both as targets for the models and as predictions of the models.

Let’s now have a look at the DeepVisionLoader class. In PyTorch, you create a dataset that loads each image/target pair from your data, then use a dataloader that builds batches from your dataset. deepvisiontools does the same, but the loader is already defined so that it fits the formats described above.

[6]:

from deepvisiontools import DeepVisionLoader

train_loader = DeepVisionLoader(train_set, batch_size = 4)
val_loader = DeepVisionLoader(val_set, batch_size=4)
for imgs, targs, names in train_loader:
    print(imgs.shape)
    print(type(targs))
    print(targs.formats)
    print(names)
    break

torch.Size([4, 3, 768, 768])
<class 'deepvisiontools.formats.formats.BatchedFormat'>
[<deepvisiontools.formats.formats.BboxFormat object at 0x710c0f2320d0>, <deepvisiontools.formats.formats.BboxFormat object at 0x710c0f232310>, <deepvisiontools.formats.formats.BboxFormat object at 0x710c0f232690>, <deepvisiontools.formats.formats.BboxFormat object at 0x710c0f2327d0>]
{0: 'Crop__21TE42__28-07-2021__Y02X11_DSC02087__.png', 1: 'Crop__23TE43__25-05-2023__Y02X003_DJI_202305251437150646__.png', 2: 'Crop__21TE42__03-08-2021__Y03X25_DSC03342__.png', 3: 'Crop__21TE42__28-07-2021__Y02X34_DSC02059__.png'}

As you can see, each batch of the DeepVisionLoader contains a set of stacked images (the first dimension is 4, which corresponds to the batch_size we chose), a BatchedFormat that holds a list of formats in its formats attribute, and a batch of names as a dictionary whose keys are the indices in the batch.
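The reason targets are kept as a list rather than stacked is that each image can hold a different number of objects. The sketch below mimics the batch structure described above with plain Python objects: nested lists stand in for image tensors and lists of boxes stand in for formats. The collate function is hypothetical and only illustrates the idea; it is not the collate function of DeepVisionLoader.

```python
def collate(samples):
    """Group (image, target, name) samples into one batch.

    Hypothetical sketch: images are nested lists standing in for tensors.
    """
    images = [image for image, _, _ in samples]     # would be stacked into one tensor
    targets = [target for _, target, _ in samples]  # one target per image, lengths may differ
    names = {i: name for i, (_, _, name) in enumerate(samples)}  # index in batch -> name
    return images, targets, names


samples = [
    ([[0.0]], [(0, 0, 67, 65)], "name01.png"),
    ([[1.0]], [(93, 0, 105, 64), (182, 0, 115, 75)], "name02.png"),
]
images, targets, names = collate(samples)
print(len(images), [len(t) for t in targets])  # 2 [1, 2]
print(names)  # {0: 'name01.png', 1: 'name02.png'}
```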

Data augmentation

Data augmentation in deepvisiontools is handled directly in the dataset. You simply need to provide a list of torchvision.transforms.v2.Transform objects to your dataset. You can also include any of the additional augmentations provided by deepvisiontools.

[ ]:

import torchvision.transforms.v2 as T
from deepvisiontools.data.additional_augmentations import RandomPadAndResize

augment = [T.RandomHorizontalFlip(), T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2), RandomPadAndResize((300, 300, 300, 300), (768, 768), p=0.3)]

augmented_dataset = DeepVisionDataset(path_to_dataset, preprocessing=None, augmentation=augment)
augmented_train, _, _ = augmented_dataset.split((0.1, 0.9, 0.0))
augmented_train.export_dataset("path/to/export/dataset")

Exporting dataset : 100%|██████████| 42/42 [00:44<00:00,  1.05s/it]
Grouping jsons : 100%|██████████| 78/78 [00:00<00:00, 16548.09it/s]

[Figures: two example visualizations from the exported augmented dataset]

If you wish to change your augmentation after dataset creation you can simply change the corresponding attribute

[ ]:

augmented_train.augmentation = [T.RandomRotation(45, expand=True), T.Resize((768, 768))]
augmented_train.export_dataset("/new/exported/dataset")

Exporting dataset : 100%|██████████| 42/42 [00:33<00:00,  1.24it/s]
Grouping jsons : 100%|██████████| 78/78 [00:00<00:00, 16951.07it/s]

[Figure: example visualization from the dataset exported with the rotation augmentation]

Important : you can see that rotations do not work well on boxes: they are now larger than the actual objects. This is expected. Non-oriented bounding boxes are not invariant under rotation, because each box is defined by only two corner points and must stay axis-aligned: after a rotation, the box is replaced by the axis-aligned box that encloses the rotated corners. We therefore recommend not using rotations on bounding boxes. Rotations are, however, perfectly fine when dealing with instance masks (see the data_type parameter of the configuration).
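The growth of the boxes can be quantified with a little trigonometry: the axis-aligned box enclosing a w x h box rotated by an angle a has width |w cos a| + |h sin a| and height |w sin a| + |h cos a|. The sketch below evaluates this for a 100 x 100 box rotated by 45 degrees; rotated_enclosing_size is a hypothetical helper written for this illustration.

```python
import math


def rotated_enclosing_size(width, height, angle_deg):
    """Size of the axis-aligned box enclosing a width x height box rotated by angle_deg."""
    a = math.radians(angle_deg)
    new_w = abs(width * math.cos(a)) + abs(height * math.sin(a))
    new_h = abs(width * math.sin(a)) + abs(height * math.cos(a))
    return new_w, new_h


w, h = rotated_enclosing_size(100, 100, 45)
print(round(w, 2), round(h, 2))         # 141.42 141.42
print(round((w * h) / (100 * 100), 2))  # 2.0
```

In this worst case the box area doubles while the object itself keeps its size, which is why the rotated boxes look too large.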

Models, Trainer, metrics and monitoring

We can now reach the core of deepvisiontools functionality: training and evaluating your models. Most of these aspects are handled by the Trainer class. First, however, you need to choose the deepvisiontools model you wish to train and your favourite PyTorch optimizer.

[9]:

from deepvisiontools.models import Yolo
from torch.optim import Adam

model = Yolo("yolox")
optim = Adam(model.parameters(), lr=1e-4)   # Adam optimizes the model parameters with a learning rate of 1e-4

Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments
  0                  -1  1      2320  ultralytics.nn.modules.conv.Conv             [3, 80, 3, 2]
  1                  -1  1    115520  ultralytics.nn.modules.conv.Conv             [80, 160, 3, 2]
  2                  -1  3    436800  ultralytics.nn.modules.block.C2f             [160, 160, 3, True]
  3                  -1  1    461440  ultralytics.nn.modules.conv.Conv             [160, 320, 3, 2]
  4                  -1  6   3281920  ultralytics.nn.modules.block.C2f             [320, 320, 6, True]
  5                  -1  1   1844480  ultralytics.nn.modules.conv.Conv             [320, 640, 3, 2]
  6                  -1  6  13117440  ultralytics.nn.modules.block.C2f             [640, 640, 6, True]
  7                  -1  1   3687680  ultralytics.nn.modules.conv.Conv             [640, 640, 3, 2]
  8                  -1  3   6969600  ultralytics.nn.modules.block.C2f             [640, 640, 3, True]
  9                  -1  1   1025920  ultralytics.nn.modules.block.SPPF            [640, 640, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  3   7379200  ultralytics.nn.modules.block.C2f             [1280, 640, 3]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  3   1948800  ultralytics.nn.modules.block.C2f             [960, 320, 3]
 16                  -1  1    922240  ultralytics.nn.modules.conv.Conv             [320, 320, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  3   7174400  ultralytics.nn.modules.block.C2f             [960, 640, 3]
 19                  -1  1   3687680  ultralytics.nn.modules.conv.Conv             [640, 640, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  3   7379200  ultralytics.nn.modules.block.C2f             [1280, 640, 3]
 22        [15, 18, 21]  1   8718931  ultralytics.nn.modules.head.Detect           [1, [320, 640, 640]]
YOLOv8x summary: 365 layers, 68,153,571 parameters, 68,153,555 gradients, 258.1 GFLOPs

Transferred 589/595 items from pretrained weights

Be careful: some models cannot be used with certain data_type values chosen in Configuration (for example, YoloSeg cannot be used on bounding boxes). deepvisiontools did not create the models you are going to use. Please respect the developers of the different models and cite them properly if you publish results based on them. Details on every model can be found in the documentation. Specific models have their own tutorials, as they rely on tricky configurations that require some knowledge of model architectures.

We can now choose specific metrics to monitor our training. You can choose among the metrics provided by deepvisiontools or create your own, but the latter requires a bit of work (there is a dedicated tutorial for that).

Once that’s done, you can instantiate your trainer :

[ ]:

from deepvisiontools import Trainer
from deepvisiontools.metrics import DetectF1score, DetectPrecision, DetectRecall

metrics = [DetectF1score(), DetectPrecision(), DetectRecall()]
trainer = Trainer(model, optim, metrics=metrics, log_dir="path/to/logdir")

The log_dir parameter is the name of the directory that will be created and in which the monitoring data will be saved. We use TensorBoard to monitor the training. If you are using VS Code, you can install the TensorBoard extension. Otherwise, in a terminal with the Python environment where deepvisiontools is installed activated, go to the location of the log_dir (after starting the training) and simply run

tensorboard --logdir=log_dir

It will print a localhost link (http://localhost:6006/ by default) that you can open to monitor your training (refresh the page to update it during the epochs).

Now we have to write the training loop. To do so, let’s first ensure that :

- we have a training set with preprocessing and augmentation switched on;
- we have a validation set with preprocessing on and augmentation off.

Then we can simply train our model using :

[11]:

train_set, valid_set, _ = DeepVisionDataset(path_to_dataset, augmentation=augment).split((0.5, 0.2, 0.3))
Nb_epoch = 3   # Number of epochs

valid_set.augmentation = None

train_loader = DeepVisionLoader(train_set, batch_size = 4)
valid_loader = DeepVisionLoader(valid_set, batch_size = 4)

for e in range(Nb_epoch):
    trainer.train_epoch(train_loader, e)
    trainer.valid_epoch(valid_loader, e)

Epoch 0/Train: 100%|██████████| 53/53 [00:23<00:00,  2.26it/s, loss : 2.706 loss_box : 0.207 loss_cls : 0.213 loss_dfl : 0.256 ]
Epoch 0/Valid: 100%|██████████| 22/22 [00:06<00:00,  3.62it/s, loss : 2.145 loss_box : 0.174 loss_cls : 0.136 loss_dfl : 0.244 DetectF1score : 0.927 DetectPrecision : 0.946 DetectRecall : 0.909 ]
Epoch 1/Train: 100%|██████████| 53/53 [00:22<00:00,  2.33it/s, loss : 2.044 loss_box : 0.169 loss_cls : 0.11 loss_dfl : 0.232 ]
Epoch 1/Valid: 100%|██████████| 22/22 [00:05<00:00,  3.72it/s, loss : 2.028 loss_box : 0.172 loss_cls : 0.106 loss_dfl : 0.246 DetectF1score : 0.928 DetectPrecision : 0.968 DetectRecall : 0.89 ]
Epoch 2/Train: 100%|██████████| 53/53 [00:22<00:00,  2.33it/s, loss : 1.964 loss_box : 0.164 loss_cls : 0.096 loss_dfl : 0.231 ]
Epoch 2/Valid: 100%|██████████| 22/22 [00:06<00:00,  3.66it/s, loss : 1.959 loss_box : 0.162 loss_cls : 0.104 loss_dfl : 0.24 DetectF1score : 0.935 DetectPrecision : 0.959 DetectRecall : 0.912 ]
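To read the logged metrics: precision is the fraction of predicted objects that are correct, recall is the fraction of ground-truth objects that are found, and the F1 score is their harmonic mean. The sketch below shows these standard definitions in plain Python. How deepvisiontools matches predictions to targets and aggregates the metrics over batches is not covered here, so the logged F1 need not be exactly the harmonic mean of the logged precision and recall.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall logged at epoch 2 of the validation above
print(round(f1_score(0.959, 0.912), 3))  # 0.935

# Hypothetical counts: 90 correct detections, 10 spurious ones, 30 missed objects
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r, round(f1_score(p, r), 3))  # 0.9 0.75 0.818
```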

Putting everything together and saving your model

Let’s now put all the pieces together.

[ ]:

from deepvisiontools import DeepVisionDataset, DeepVisionLoader, Trainer, Configuration
from deepvisiontools.models import Yolo
from deepvisiontools.data.additional_augmentations import RandomCropAndResize, RandomPadAndResize
from deepvisiontools.metrics import DetectF1score, DetectRecall, DetectPrecision
import torchvision.transforms.v2 as T
from torch.optim import Adam
import torch
from torch.optim.lr_scheduler import ExponentialLR

config = Configuration(device="cuda", data_type="bbox", num_classes=1)  # the variable name is up to you; what matters is that the class is instantiated at the beginning

data_path = "path/to/dataset"    # path to your dataset

augment = [
    T.RandomHorizontalFlip(),  # randomly flip the image horizontally
    T.RandomVerticalFlip(),  # randomly flip the image vertically
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),  # randomly change the colorimetry
    T.RandomGrayscale(p=0.05),  # convert to grayscale 5% of the time
    RandomCropAndResize((650, 650), (768, 768), p=0.1),  # randomly crop, then resize to the original image size (768, 768)
    RandomPadAndResize(150, (768, 768), p=0.1),  # randomly pad, then resize
]

train_set, valid_set, test_set = DeepVisionDataset(data_path, augmentation = augment).split((0.6, 0.2, 0.2))    # Create your train, valid, test sets
valid_set.augmentation = None   # switch off augmentation on valid set

train_loader = DeepVisionLoader(train_set, batch_size = 4)  # create the loaders
valid_loader = DeepVisionLoader(valid_set, batch_size = 4)

model = Yolo("yolon")   # choose a model and a specific architecture

optim = Adam(model.parameters(), lr = 1e-4) # create a weight optimizer with learning rate (lr) = 1e-4

scheduler = ExponentialLR(optim, gamma = 0.95) # Exponential decrease of lr per epoch

metrics = [DetectF1score(), DetectRecall(), DetectPrecision()]  # create a metric list to be used

trainer = Trainer(model, optim, metrics = metrics, log_dir="path/to/logdir/training_model")    # create the trainer

# ======== Training loop ========
N_epoch = 5  # number of epochs
best_loss = None  # lowest validation loss seen so far, used to save the best model
for e in range(N_epoch):
    trainer.train_epoch(train_loader, e)  # train epoch e
    epoch_dict = trainer.valid_epoch(valid_loader, e)  # validate epoch e -> returns a dict containing metrics and losses
    scheduler.step()  # apply the learning-rate decay
    loss = epoch_dict["loss"]  # extract the validation loss from the dict
    if best_loss is None or loss < best_loss:  # first epoch, or the loss improved
        best_loss = loss  # remember the new best loss
        torch.save(model, "/path/to/save/model/best_model.pth")  # save the current best model


Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
YOLOv8n summary: 225 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolon.pt to 'yolon.pt'...

100%|██████████| 6.25M/6.25M [00:00<00:00, 11.0MB/s]

Transferred 319/355 items from pretrained weights

Epoch 0/Train: 100%|██████████| 64/64 [00:13<00:00,  4.86it/s, loss : 3.699 loss_box : 0.287 loss_cls : 0.353 loss_dfl : 0.287 ]
Epoch 0/Valid: 100%|██████████| 22/22 [00:03<00:00,  7.05it/s, loss : 3.805 loss_box : 0.244 loss_cls : 0.494 loss_dfl : 0.245 DetectF1score : 0.098 DetectRecall : 0.052 DetectPrecision : 1.0 ]
Epoch 1/Train: 100%|██████████| 64/64 [00:11<00:00,  5.39it/s, loss : 2.715 loss_box : 0.227 loss_cls : 0.21 loss_dfl : 0.245 ]
Epoch 1/Valid: 100%|██████████| 22/22 [00:03<00:00,  5.56it/s, loss : 2.76 loss_box : 0.233 loss_cls : 0.235 loss_dfl : 0.245 DetectF1score : 0.774 DetectRecall : 0.635 DetectPrecision : 0.991 ]
Epoch 2/Train: 100%|██████████| 64/64 [00:12<00:00,  5.23it/s, loss : 2.547 loss_box : 0.215 loss_cls : 0.186 loss_dfl : 0.239 ]
Epoch 2/Valid: 100%|██████████| 22/22 [00:04<00:00,  5.21it/s, loss : 2.553 loss_box : 0.231 loss_cls : 0.184 loss_dfl : 0.244 DetectF1score : 0.883 DetectRecall : 0.801 DetectPrecision : 0.984 ]
Epoch 3/Train: 100%|██████████| 64/64 [00:11<00:00,  5.37it/s, loss : 2.459 loss_box : 0.209 loss_cls : 0.172 loss_dfl : 0.237 ]
Epoch 3/Valid: 100%|██████████| 22/22 [00:04<00:00,  5.32it/s, loss : 2.449 loss_box : 0.223 loss_cls : 0.17 loss_dfl : 0.24 DetectF1score : 0.892 DetectRecall : 0.818 DetectPrecision : 0.981 ]
Epoch 4/Train: 100%|██████████| 64/64 [00:11<00:00,  5.41it/s, loss : 2.403 loss_box : 0.207 loss_cls : 0.163 loss_dfl : 0.234 ]
Epoch 4/Valid: 100%|██████████| 22/22 [00:04<00:00,  5.28it/s, loss : 2.347 loss_box : 0.21 loss_cls : 0.159 loss_dfl : 0.237 DetectF1score : 0.904 DetectRecall : 0.844 DetectPrecision : 0.974 ]
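The effect of the ExponentialLR scheduler used in the script can be checked by hand: each call to scheduler.step() multiplies the learning rate by gamma, so after e epochs the learning rate is lr * gamma ** e. The sketch below reproduces this in plain Python for lr = 1e-4 and gamma = 0.95, without PyTorch; exponential_lr is a hypothetical helper written for this illustration.

```python
def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` scheduler steps under exponential decay."""
    return initial_lr * gamma ** epoch


for epoch in range(5):
    print(epoch, f"{exponential_lr(1e-4, 0.95, epoch):.3e}")
# 0 1.000e-04
# 1 9.500e-05
# 2 9.025e-05
# 3 8.574e-05
# 4 8.145e-05
```

With only 5 epochs the decay is mild (about 19% in total); it matters more for longer trainings.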

Don’t forget that you can monitor your training via tensorboard (the following is a saved image, numbers might not match …)

[Figure: TensorBoard screenshot of the training monitoring]

Inference

We can now use our model to make predictions with deepvisiontools. To do so, we are going to use the Predictor class. Note that the Predictor applies a preprocessing by default, so you need to set it up to match the one used for your training set (by default, the deepvisiontools predictor and dataset use the same preprocessing). In our little example, we use the test_set from the previous piece of code and switch off its preprocessing, so that the image is preprocessed only once, by the predictor.

[ ]:

from deepvisiontools import Predictor

test_set.preprocessing = None   # switch off the dataset preprocessing: the predictor applies its own
test_set.augmentation = None
image, target, _ = test_set[15] # load image and target

model = torch.load("path/to/model/best_model.pth", weights_only=False)   # load the full pickled model (recent PyTorch versions default to weights_only=True)
predictor = Predictor(model)    # create the predictor; it applies its default preprocessing to the raw image

result = predictor.predict(image)   # result is a format object (as discussed in the section on formats)

print("number of predicted objects : ", result.nb_object) # number of predicted objects

# let's create a visualization !

_ = predictor.predict(image, visu_path="/path/to/visu/visu.png")

number of predicted objects :  16

Here is the resulting visualization :

[Figure: visualization of the predictions on the test image]

As you can see, the model still makes mistakes. Of course, you can now use larger models, more data, better augmentation and play with other hyper-parameters ! Have fun !