Personal Project

Anomaly Detection Model for Novel Antarctic Species Discovery

Overview

I am fascinated by embedding models, so for this project I used the PatchCore method to develop an unsupervised anomaly detection tool, with the aim of identifying potentially new species in subsea towed-camera imagery. The work builds on research conducted by the British Antarctic Survey (BAS) using imagery from the Weddell Sea Benthic Dataset, collected during expedition PS118 aboard the RV Polarstern.

The model was later tested on imagery from other transects collected during the same cruise, where it successfully highlighted several unusual organisms either not present or not prevalent in the training set. Whether any of these are genuinely undescribed species, I have no idea.

How It Works

PatchCore learns the visual characteristics of normal imagery without requiring large labelled datasets of anomalies or defects. I implemented the model using Anomalib, an open-source library for unsupervised anomaly detection and localisation. A pretrained neural network extracts feature embeddings from image patches, in this case using layers 2 and 3 of a pretrained ResNet backbone. A subset of these embeddings is stored in a memory bank as a representation of normal appearance.
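As a rough illustration of how a subset of embeddings can be chosen, the PatchCore paper describes greedy coreset subsampling: repeatedly keep the embedding that is farthest from everything kept so far. The sketch below is a toy version of that idea, not the Anomalib implementation. The function name `greedy_coreset` and the 2-D points standing in for ResNet patch features are assumptions made for the example.

```python
import math
import random

def greedy_coreset(embeddings, n_keep, seed=0):
    """Pick n_keep embeddings by farthest-point (k-center greedy) sampling."""
    if not 0 < n_keep <= len(embeddings):
        raise ValueError("n_keep must be between 1 and len(embeddings)")
    rng = random.Random(seed)
    first = rng.randrange(len(embeddings))
    selected = [first]
    # Distance from every embedding to its nearest selected embedding so far.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < n_keep:
        # Add the embedding that is worst covered by the current selection.
        idx = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(idx)
        min_dist = [min(d, math.dist(e, embeddings[idx]))
                    for d, e in zip(min_dist, embeddings)]
    return [embeddings[i] for i in selected]

# Toy stand-in for patch features: two tight clusters and one isolated point.
patches = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
memory_bank = greedy_coreset(patches, n_keep=3)
```

With three slots, the selection keeps one point from each cluster plus the isolated point, so near-duplicate patches are thinned out while unusual ones are retained.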

During inference, new image patches are compared against this memory bank, and regions that differ significantly are assigned higher anomaly scores. These scores are then reconstructed into pixel-level anomaly heatmaps, allowing unusual features to be localised within an image.
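The scoring step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Anomalib code: each patch is scored by its Euclidean distance to the nearest memory-bank embedding, and the patch-level grid is expanded to pixel resolution by plain repetition, whereas real pipelines typically use smoother interpolation. The helper names `patch_scores` and `upsample_nearest` and the 2-D embeddings are hypothetical.

```python
import math

def patch_scores(patch_grid, memory_bank):
    """Score each patch by distance to its nearest memory-bank embedding."""
    return [[min(math.dist(p, m) for m in memory_bank) for p in row]
            for row in patch_grid]

def upsample_nearest(score_grid, factor):
    """Expand a patch-level score grid to pixel resolution by repetition."""
    heatmap = []
    for row in score_grid:
        pixel_row = [s for s in row for _ in range(factor)]
        heatmap.extend(list(pixel_row) for _ in range(factor))
    return heatmap

memory_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# 2x2 grid of patch embeddings; the bottom-right patch is far from the bank.
patch_grid = [[(0.1, 0.0), (0.9, 0.1)],
              [(0.0, 0.9), (4.0, 4.0)]]
scores = patch_scores(patch_grid, memory_bank)
heatmap = upsample_nearest(scores, factor=4)    # 8x8 "pixel" heatmap
image_score = max(max(row) for row in scores)   # simple image-level score
```

Here the bottom-right quadrant of the heatmap carries the highest score, which is how an unusual region ends up localised within the image.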

Training Set

Researchers at BAS previously used imagery from expedition PS118 to train an object detection model for large-scale biodiversity monitoring. The resulting dataset comprises 100 high-resolution OFOBS images, selected for their ecological diversity, with annotations standardised into 25 morphotypes. A team of expert ecologists spent over 800 hours labelling these images.

My assumption was that if genuinely novel organisms existed within these images, they would likely already have been identified. I therefore used this labelled dataset as the model's normal training set.

Examples from the normal training set used to build the PatchCore memory bank.

Testing

As a sanity check, I first ran the training imagery back through the anomaly detection pipeline. Surprisingly, several strong anomalies were still identified. The most visually striking was this purple elasipodid. There is only a single example of an organism like this within the training data.

Purple elasipodid highlighted by the anomaly detection pipeline.

Testing On New Transects

To test the method further, I applied it to imagery that was not used for training, collected during cruise sections 69-1 & 6-9. Initially, many high-scoring anomalies turned out to be blurry imagery, poorly illuminated frames, or images taken too far from the seabed to contain useful ecological detail.

I refined the test dataset using the image metadata, filtering for depths and geographic locations similar to those used in training. This increased the proportion of in-distribution imagery used during inference. Given my fairly modest GPU resources, I only tested around 300 images, but there were still some interesting detections.
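A metadata filter of this kind might look like the sketch below. It is only an illustration: the field names (`depth_m`, `lat`, `lon`), the margins, and all the values are made up for the example and do not come from the PS118 metadata. The idea is to keep candidate images whose depth and position fall inside the range spanned by the training images, plus a small margin.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageMeta:
    # Hypothetical field names; real metadata columns may differ.
    name: str
    depth_m: float
    lat: float
    lon: float

def bounds(values, margin):
    """Return the (low, high) range of values, widened by margin."""
    return min(values) - margin, max(values) + margin

def filter_in_distribution(train, candidates, depth_margin_m=25.0, deg_margin=0.05):
    """Keep candidates inside the training depth range and bounding box."""
    d_lo, d_hi = bounds([m.depth_m for m in train], depth_margin_m)
    lat_lo, lat_hi = bounds([m.lat for m in train], deg_margin)
    lon_lo, lon_hi = bounds([m.lon for m in train], deg_margin)
    return [c for c in candidates
            if d_lo <= c.depth_m <= d_hi
            and lat_lo <= c.lat <= lat_hi
            and lon_lo <= c.lon <= lon_hi]

# Made-up values for illustration only.
train = [ImageMeta("train_1", 400.0, -63.90, -56.00),
         ImageMeta("train_2", 450.0, -63.80, -55.90)]
candidates = [ImageMeta("frame_a", 420.0, -63.85, -55.95),   # similar depth and place
              ImageMeta("frame_b", 1200.0, -63.85, -55.95),  # much deeper
              ImageMeta("frame_c", 430.0, -62.00, -55.95)]   # far to the north
kept = filter_in_distribution(train, candidates)
```

A bounding box is the crudest possible notion of "similar location"; a distance to the nearest training transect would be a natural refinement.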

Suspected worm highlighted by the anomaly detection pipeline.

The strongest result was the identification of what I believe to be a worm, as I could not find anything visually similar within the training set. Worm tubes were present in the training imagery, but the organisms themselves were not visible.

Other organisms identified as anomalies.

The model also identified a less colourful elasipodid, a ring-shaped anemone, and what I think is a sea pen. These were less anomalous overall because related examples existed within the training data, but only in very small numbers. Out of roughly 31,000 organisms used during training, there were fewer than 20 similar anemones, around 10 sea pens, and only two comparable elasipodids.
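To put those counts in perspective, the short calculation below converts them into shares of the training organisms. The figures are the approximate ones quoted above ("fewer than 20" is taken as 20 and "around 10" as 10), so the results are rough upper or approximate values rather than exact statistics.

```python
# Approximate counts quoted in the text above.
total_organisms = 31_000
rare_counts = {"anemone": 20, "sea pen": 10, "elasipodid": 2}

# Share of all training organisms represented by each rare group.
shares = {name: count / total_organisms for name, count in rare_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.4%} of training organisms")
```

Each group makes up well under 0.1% of the training organisms, which is consistent with these organisms still standing out against the memory bank.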

Next Steps

I am currently exploring several ways to improve the method:

  • Balancing the training data so that equal numbers of patches capture seabed and benthic organisms.
  • Augmenting patches containing underrepresented organisms to improve their representation within the memory bank.
  • Training on much larger volumes of unfiltered cruise imagery rather than expert pre-screened datasets. Rare organisms still tend to be flagged as anomalous even when present in training, so this may still allow genuinely unusual species to be identified.
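The first of these ideas, balancing seabed and organism patches, could be prototyped along the following lines. This is a sketch under assumptions rather than a description of work already done: it presumes each patch has already been tagged as "seabed" or "organism" (for example, from overlap with the labelled bounding boxes), and the function name `balance_patches` is hypothetical. The majority class is randomly downsampled to the size of the minority class.

```python
import random

def balance_patches(patches, seed=0):
    """Downsample so seabed and organism patches are equally represented.

    patches is a list of (patch_id, label) pairs with label in
    {"seabed", "organism"}.
    """
    rng = random.Random(seed)
    by_label = {"seabed": [], "organism": []}
    for patch_id, label in patches:
        by_label[label].append(patch_id)
    # The smaller class sets the number of patches kept per class.
    n = min(len(ids) for ids in by_label.values())
    balanced = []
    for label, ids in by_label.items():
        balanced.extend((pid, label) for pid in rng.sample(ids, n))
    return balanced

# Toy example: eight seabed patches and two organism patches.
patches = [(i, "seabed") for i in range(8)] + [(8, "organism"), (9, "organism")]
balanced = balance_patches(patches)
```

Downsampling discards seabed patches, so in practice it would probably be combined with the augmentation idea above rather than used on its own.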

References

  1. K. Roth, L. Pemula, J. Zepeda, B. Schoelkopf, T. Brox, and P. V. Gehler, "Towards Total Recall in Industrial Anomaly Detection," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 14298-14308. doi: 10.1109/CVPR52688.2022.01392.
  2. S. Akcay, D. Ameln, A. Vaidya, B. Lakshmanan, N. Ahuja, and U. Genc, "Anomalib: A Deep Learning Library for Anomaly Detection," arXiv:2202.08341, 2022. doi: 10.48550/arXiv.2202.08341.
  3. C. Trotter, H. J. Griffiths, T. M. Khan, and R. J. Whittle, "Automated Detection of Antarctic Benthic Organisms in High-Resolution In Situ Imagery to Aid Biodiversity Monitoring," in 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2025, pp. 2066-2076.
  4. C. Trotter, H. J. Griffiths, T. M. Khan, A. Purser, and R. J. Whittle, The Weddell Sea Benthic Dataset: A computer vision-ready object detection dataset for in situ benthic biodiversity monitoring model development, Version 1.0, NERC EDS UK Polar Data Centre, 2025. doi: 10.5285/1ba97e4b-efb7-460b-9f2d-90437e33ce09.
  5. A. Purser, L. Hehemann, S. Dreutter, B. Dorschel, and A. Nordhausen, OFOBS seafloor images from the Antarctic Peninsula and Powell Basin, collected during RV POLARSTERN cruise PS118, PANGAEA, 2020. doi: 10.1594/PANGAEA.911904.