Title: DEVELOPING A DEEP LEARNING MODEL FOR SCENE INTERPRETATION IN TOTAL LAPAROSCOPIC HYSTERECTOMY USING A SMALL VIDEO DATASET
e-poster Number: EP 116
Category: Endoscopy and Gynaecologic Surgery
Author Name: Dr. Naoki Ito
Institute:
Co-Author Name:
Abstract:
Introduction: Task automation is an emerging need to help busy surgeons focus on their essential duties. Reviewing surgical videos is a time-consuming task that can benefit from automation.

Aims and Objectives: Our goal was to develop a deep learning model for surgical scene understanding to enable scene extraction or highlighting in surgical videos, facilitating efficient review by surgeons. We focused on total laparoscopic hysterectomy (TLH) as a proof of concept.

Materials and Methods: We collected 36 TLH surgery videos from Inselspital, Switzerland, and divided them into 21 training, 7 evaluation, and 8 test videos. Board-certified gynecologists classified all frames into seven categories: "Preparation," "Adhesiolysis, dissection, and mobilization," "Colpotomy," "Specimen removal," "Washing and hemostasis," "Vaginal vault closure," and "Other." We trained a deep learning model combining a Vision Transformer and a temporal convolutional network on the training data, optimized hyperparameters using the evaluation data, and assessed performance on the test data.

Results: The model achieved a top-1 scene label prediction accuracy of 75.3% (95% CI: 74.9%–75.8%). Sensitivity exceeded 90% for "Preparation" (95%) and "Vaginal vault closure" (93%), while "Colpotomy" had a sensitivity of 50%, with 44% of "Colpotomy" frames misclassified as "Adhesiolysis, dissection, and mobilization." The area under the receiver operating characteristic curve (AUROC) was over 0.90 for all categories except "Other." Qualitative analysis indicated that mispredictions often occurred when short-duration scene labels were inserted in the ground truth or predictions.

Conclusions: We developed and evaluated a deep learning model for scene understanding in TLH using a small dataset. Despite the limited data, the model demonstrated promising performance. We anticipate that increasing the dataset size will significantly enhance the model's accuracy and enable application to other surgical procedures.
Improved scene understanding can streamline surgical video reviews and contribute to better surgical outcomes for patients.
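The qualitative finding above (mispredictions tied to spurious short-duration scene labels) is the kind of error that frame-level classifiers commonly exhibit, and it can often be reduced with simple temporal post-processing. The sketch below is illustrative only and not part of the poster's method: a sliding-window mode filter over a sequence of per-frame labels; the window size and the example label sequence are assumptions.

```python
from collections import Counter

def smooth_labels(frame_labels, window=5):
    """Replace each frame label with the most common label in a
    centered window, suppressing isolated short-duration labels.
    Illustrative sketch; window size is an assumed hyperparameter."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - half)
        hi = min(len(frame_labels), i + half + 1)
        counts = Counter(frame_labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# A spurious single-frame "Colpotomy" inside a run of
# "Adhesiolysis" frames is removed by the filter:
preds = ["Adhesiolysis"] * 4 + ["Colpotomy"] + ["Adhesiolysis"] * 4
print(smooth_labels(preds))  # all nine frames become "Adhesiolysis"
```

Note that a genuine scene transition (a long run of one label followed by a long run of another) is preserved by this filter, since the window mode flips only at the boundary; only labels shorter than roughly half the window are suppressed.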