Displaying FastSAM Masks on Video – 流山おおたかの森Techブログ

I found samples that feed video into FastSAM and draw bounding boxes, but none that display masks the way the still-image examples do, so I tried it myself.


Installation on Windows

Following the official steps produced errors in several places, so I installed it with the steps below.

git clone https://github.com/CASIA-IVA-Lab/FastSAM.git
conda create -n FastSAM python=3.9

Install PyTorch with administrator privileges

(Reference: 金子邦彦研究室 (Kaneko Kunihiko Laboratory), "FastSAM のインストールと動作確認" (FastSAM installation and operation check))

conda activate FastSAM
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 cudnn -c pytorch -c nvidia
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If the python command is not recognized, add Anaconda to your PATH:

(C:\Users\<your user name>\anaconda3;C:\Users\<your user name>\anaconda3\condabin)

Editing requirements.txt

Remove the packages that are already installed (torch, torchvision). The edited file:

# Base-----------------------------------
matplotlib>=3.2.2
opencv-python>=4.6.0
Pillow>=7.1.2
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
tqdm>=4.64.0

pandas>=1.1.4
seaborn>=0.11.0

gradio==3.35.2

# Ultralytics-----------------------------------
ultralytics == 8.0.120
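Instead of editing by hand, the torch and torchvision lines can also be dropped with a short script. This is only a sketch: the file name strip_requirements.py and the helper strip_requirements are hypothetical, and it assumes each requirement line starts with the package name, as in FastSAM's requirements.txt. Comments, blank lines, and all other packages are kept unchanged.

```python
import re
import sys
from pathlib import Path

# Packages already installed with conda in the previous step
ALREADY_INSTALLED = frozenset({"torch", "torchvision"})

# Package name at the start of a requirement line, e.g. "torch" in "torch>=1.7.0"
NAME_RE = re.compile(r"\s*([A-Za-z0-9._-]+)")


def strip_requirements(text, skip=ALREADY_INSTALLED):
    """Return the requirements text without the lines for packages in `skip`."""
    kept = []
    for line in text.splitlines():
        match = NAME_RE.match(line)  # comment and blank lines do not match
        if match and match.group(1).lower() in skip:
            continue  # drop e.g. "torch>=1.7.0"
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python strip_requirements.py requirements.txt
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(strip_requirements(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Running it from the FastSAM folder rewrites requirements.txt in place, so keep a copy if you may need the original list.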

Back to the official steps

conda activate FastSAM
cd FastSAM
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git

Downloading the models

mkdir weights

From the model download links, download FastSAM-s.pt (the speed-oriented model) and FastSAM-x.pt (the accuracy-oriented model) and put them in the weights folder.
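Before running anything, it can help to confirm that both files ended up where the later steps look for them (./weights/FastSAM-x.pt and so on). A minimal sketch; the helper name missing_weights is hypothetical, and the default folder is the weights directory created above, relative to the FastSAM repository root.

```python
from pathlib import Path

# Model files downloaded in this step
EXPECTED_WEIGHTS = ("FastSAM-s.pt", "FastSAM-x.pt")


def missing_weights(weights_dir="weights", expected=EXPECTED_WEIGHTS):
    """Return the expected model files that are not present in weights_dir."""
    folder = Path(weights_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found")
```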

Running

If Inference.py runs without errors, the installation succeeded.

python Inference.py --model_path ./weights/FastSAM-x.pt --img_path ./images/dogs.jpg

Troubleshooting

Error | If you get cannot import name 'COMMON_SAFE_ASCII_CHARACTERS'

Install chardet. (Reference for this pandas-related error: "PythonでImportError: cannot import name となる場合の対応" (how to handle ImportError: cannot import name in Python))

pip install chardet

Error | If you get OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.

As a temporary workaround, allow the OpenMP runtime to be loaded more than once. (Reference for this scipy-related error: spaCy)

Add the following at the very top of the script you run (Inference.py):

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

Displaying the mask results as video

Here is the main code.

FastSAMPrompt, which draws the masks, is instantiated again every time a video frame is read, so as a precaution I assign None to it once it has been used.
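In CPython an object is freed as soon as its last reference disappears, so the previous frame's FastSAMPrompt would also be released when prompt_process is rebound on the next iteration; assigning None releases it a little earlier (before waitKey and before the next instance is built). The toy sketch below shows this with a hypothetical stand-in class and weakref; it does not touch FastSAM itself.

```python
import weakref


class DummyPrompt:
    """Hypothetical stand-in for FastSAMPrompt (no FastSAM code involved)."""

    def __init__(self, frame_index):
        self.frame_index = frame_index


def release_demo():
    """Check when CPython frees the previous per-frame instance."""
    prompt = DummyPrompt(0)
    first = weakref.ref(prompt)
    prompt = DummyPrompt(1)  # next frame: rebinding drops the last reference to frame 0
    freed_on_rebind = first() is None

    second = weakref.ref(prompt)
    prompt = None  # what the loop does explicitly after displaying
    freed_on_none = second() is None
    return freed_on_rebind, freed_on_none


if __name__ == "__main__":
    print(release_demo())  # (True, True) on CPython
```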

import time

import cv2
import torch
from fastsam import FastSAM, FastSAMPrompt


def main():
    # Load the model
    FAST_SAM_CHECKPOINT = "./weights/FastSAM-x.pt"
    print("FAST_SAM_CHECKPOINT:{}".format(FAST_SAM_CHECKPOINT))
    model = FastSAM(FAST_SAM_CHECKPOINT)
    # Plain device string, as in the FastSAM README; also passed to FastSAMPrompt
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    print("DEVICE:{}".format(DEVICE))

    # Open the video
    cap = cv2.VideoCapture("./videos/drone720.mp4")
    prev_frame_time = 0.0

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break  # end of the video or a read error

        new_frame_time = time.time()

        # Run segmentation ("everything" mode) on the current frame
        everything_results = model(
            frame,
            device=DEVICE,
            retina_masks=True,
            imgsz=1024,
            conf=0.4,
            iou=0.9,
        )

        inference_frame = frame.copy()

        # FastSAMPrompt draws the masks; a new instance is created for every frame
        prompt_process = FastSAMPrompt(inference_frame, everything_results, device=DEVICE)
        ann = prompt_process.everything_prompt()

        if len(ann) > 0:
            result = prompt_process.plot_to_result(
                annotations=ann,
                bboxes=None,
                points=None,
                point_label=None,
                withContours=False,
                better_quality=False,
            )
        else:
            result = inference_frame  # no masks: show the frame as-is

        # FPS from the interval between loop iterations (includes display and waitKey)
        elapsed = new_frame_time - prev_frame_time
        current_fps = int(1 / elapsed) if prev_frame_time > 0 and elapsed > 0 else 0
        prev_frame_time = new_frame_time
        cv2.putText(result, "FPS : " + str(current_fps), (380, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (100, 255, 0), 3, cv2.LINE_AA)

        # Show the input and the result
        cv2.imshow("input", frame)
        cv2.imshow("Inference", result)

        prompt_process = None  # drop the reference to this frame's prompt
        if cv2.waitKey(25) & 0xFF == ord("q"):
            break

    # OpenCV cleanup
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
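The FPS overlay above is computed from the wall-clock interval between loop iterations, so it includes drawing, imshow and the 25 ms waitKey, and it jumps around from frame to frame. If a steadier readout is wanted, a small helper could smooth it with an exponential moving average. This is a sketch: FpsMeter is a hypothetical name, and time.perf_counter() is used because it is a monotonic, high-resolution clock.

```python
import time


class FpsMeter:
    """Hypothetical helper: FPS smoothed with an exponential moving average."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight of the newest sample (0 < alpha <= 1)
        self.prev = None  # timestamp of the previous tick
        self.fps = 0.0
        self._has_sample = False

    def tick(self, now=None):
        """Call once per frame; returns the smoothed FPS so far."""
        now = time.perf_counter() if now is None else now
        if self.prev is not None and now > self.prev:
            instant = 1.0 / (now - self.prev)
            if self._has_sample:
                self.fps = (1 - self.alpha) * self.fps + self.alpha * instant
            else:
                self.fps = instant  # the first interval seeds the average
                self._has_sample = True
        self.prev = now
        return self.fps
```

In the loop, meter = FpsMeter() would be created once before the while, and current_fps = int(meter.tick()) would replace the manual time.time() arithmetic. A smaller alpha gives a calmer but slower-reacting number.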

Results

VRAM usage depends on the resolution of the input video; for reference, here are the results for 1280×720 input.

Environment

Windows 11

RAM: 64 GB

GPU: RTX 4090 Laptop

FastSAM-x.pt | 25 to 30 ms per frame (VRAM 8 GB / RAM about 2 GB)

FastSAM-s.pt | 5 to 25 ms per frame (VRAM 10 GB / RAM about 2 GB)
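To compare these per-frame times with the FPS overlay, they convert to frames per second as 1000 / ms. Assuming the figures are the full per-frame processing time, FastSAM-x.pt corresponds to roughly 33 to 40 FPS and FastSAM-s.pt to 40 to 200 FPS. The display loop also calls cv2.waitKey(25), which waits up to 25 ms per frame, so the overlay can read lower; the hypothetical helper fps_with_wait gives the ceiling when that wait is added and the drawing and display cost is ignored.

```python
# Per-frame times reported above (ms), as (fastest, slowest)
REPORTED_MS = {
    "FastSAM-x.pt": (25, 30),
    "FastSAM-s.pt": (5, 25),
}


def ms_to_fps(ms_per_frame):
    """Frames per second for a given per-frame time in milliseconds."""
    return 1000.0 / ms_per_frame


def fps_with_wait(ms_per_frame, wait_ms=25):
    """Ceiling when the loop also waits wait_ms per frame (cv2.waitKey)."""
    return 1000.0 / (ms_per_frame + wait_ms)


if __name__ == "__main__":
    for name, (fast, slow) in REPORTED_MS.items():
        print(f"{name}: {ms_to_fps(slow):.0f} to {ms_to_fps(fast):.0f} FPS")
```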

The masks are colored randomly, so the colors flicker from frame to frame, but as a video the output looks like this.
