How to install DeepSpeed [v0.11.2] on Windows

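This post shows how to build and install DeepSpeed v0.11.2.

If you would rather not build it yourself, a prebuilt whl file for v0.11.1 is available; download it and install it with pip. [deepspeed-0.11.1+e9503fe-cp311-cp311-win_amd64.rar.zip]

(Reference: Installing DeepSpeed on Windows)

(Installation instructions for v0.8.3 are here)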

Test environment

OS : Windows 10

CUDA : 11.8

Visual C++ build tools : Visual Studio 2022 community

Python : 3.11.7 (reportedly also works with 3.10.x; run under Anaconda)

PyTorch : 2.1.1

Note: The CUDA environment variables (CUDA_HOME / CUDA_PATH) are assumed to be set already.
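Before starting, it may help to confirm that the CUDA variables mentioned above are actually visible to the shell you build from. The following is a minimal sketch; the helper name `missing_cuda_vars` is hypothetical and it only checks that the variables are non-empty, not that they point to a valid CUDA 11.8 install.

```python
import os

# Variables the article assumes are set before building.
REQUIRED_VARS = ("CUDA_HOME", "CUDA_PATH")


def missing_cuda_vars(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_cuda_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("CUDA_HOME and CUDA_PATH are both set")
```

If either name is reported as not set, set it and open a new Command Prompt before running the build.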

1 : Clone DeepSpeed

git clone --branch v0.11.2 https://github.com/microsoft/DeepSpeed.git

2 : Edit build_win.bat

set DS_BUILD_EVOFORMER_ATTN=0

Add the line above immediately after "set DS_BUILD_SPARSE_ATTN=0".
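If you prefer to script this edit rather than make it by hand, the sketch below inserts the new line after the anchor line. The function name `add_evoformer_flag` and the sample batch text are illustrative assumptions and not the real contents of build_win.bat; only the two `set` lines come from this article.

```python
NEW_LINE = "set DS_BUILD_EVOFORMER_ATTN=0"
ANCHOR = "set DS_BUILD_SPARSE_ATTN=0"


def add_evoformer_flag(bat_text):
    """Insert NEW_LINE right after the ANCHOR line; do nothing if already present."""
    lines = bat_text.splitlines()
    if any(line.strip() == NEW_LINE for line in lines):
        return bat_text  # already edited, keep the text unchanged
    out = []
    inserted = False
    for line in lines:
        out.append(line)
        if not inserted and line.strip() == ANCHOR:
            out.append(NEW_LINE)
            inserted = True
    if not inserted:
        raise ValueError(f"anchor line not found: {ANCHOR}")
    return "\n".join(out) + "\n"


# Illustrative sample only; the real file contains more lines than this.
sample = "@echo off\nset DS_BUILD_SPARSE_ATTN=0\npython setup.py bdist_wheel\n"
print(add_evoformer_flag(sample))
```

To apply it to the real file, read build_win.bat as text, pass it through the function, and write the result back.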

3 : Fix the code

・DeepSpeed/csrc/quantization/pt_binding.cpp

Change lines 244-250 to the following

    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // num of GPU per nodes
    at::IntArrayRef sz(sz_vector);
    auto output = torch::empty(sz, output_options);

    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
    const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
    const int elems_per_out_group = elems_per_in_tensor / out_groups;
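To see what the replacement lines compute, the size arithmetic can be mirrored in Python. This is only a sketch: the function name `quantized_reduction_sizes` and the example values are hypothetical, and it assumes all quantities are positive so that `//` matches C++ integer division.

```python
from math import prod


def quantized_reduction_sizes(input_shape, devices_per_node, in_groups, out_groups):
    """Mirror the shape and element-count arithmetic of the C++ lines above."""
    # The output keeps every dimension except the last, which is divided
    # by the number of GPUs per node.
    out_shape = list(input_shape)
    out_shape[-1] = out_shape[-1] // devices_per_node

    numel = prod(input_shape)  # total element count, like at::numel(input_vals)
    elems_per_in_tensor = numel // devices_per_node
    elems_per_in_group = elems_per_in_tensor // (in_groups // devices_per_node)
    elems_per_out_group = elems_per_in_tensor // out_groups
    return tuple(out_shape), elems_per_in_tensor, elems_per_in_group, elems_per_out_group


# Hypothetical example: a 4 x 1024 input, 2 GPUs per node, 8 input groups, 4 output groups.
print(quantized_reduction_sizes((4, 1024), 2, 8, 4))
# -> ((4, 512), 2048, 512, 512)
```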

・DeepSpeed/csrc/transformer/inference/csrc/pt_binding.cpp

Change lines 541-542 to the following

									 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
									  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

Change lines 550-551 to the following

						 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
						  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

Change line 1581 to the following

		at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);

・DeepSpeed/deepspeed/env_report.py

Add the following at line 10

import psutil

Replace the function at lines 83-100 with the following (a line was added at line 10, so watch for the shifted line numbers)

def get_shm_size():
    try:
        temp_dir = os.getenv('TEMP') or os.getenv('TMP') or os.path.join(os.path.expanduser('~'), 'tmp')
        shm_stats = psutil.disk_usage(temp_dir)
        shm_size = shm_stats.total
        shm_hbytes = human_readable_size(shm_size)
        warn = []
        if shm_size < 512 * 1024**2:
            warn.append(
                f" {YELLOW} [WARNING] Shared memory size might be too small, consider increasing it. {END}"
            )
            # Add additional warnings specific to your use case if needed.
        return shm_hbytes, warn
    except Exception as e:
        return "UNKNOWN", [f"Error getting shared memory size: {e}"]
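The replacement above reports the total capacity of the drive that holds the temp directory, in place of a Linux-style shared memory size. To try the same idea outside DeepSpeed, here is a standalone sketch. It swaps in the standard library's `shutil.disk_usage` for `psutil.disk_usage`, and `human_readable` and `temp_dir_capacity` are hypothetical stand-ins, not the helpers defined in env_report.py.

```python
import os
import shutil
import tempfile


def human_readable(num_bytes):
    """Format a byte count with binary units (stand-in for the helper in env_report.py)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.2f} {unit}"
        size /= 1024


def temp_dir_capacity(path=None):
    """Return (readable total size, warnings) for the drive holding the temp directory."""
    path = path or os.getenv("TEMP") or os.getenv("TMP") or tempfile.gettempdir()
    total = shutil.disk_usage(path).total  # total bytes on that drive, not free space
    warnings = []
    if total < 512 * 1024**2:  # same 512 MB threshold as the function above
        warnings.append("[WARNING] Size might be too small, consider increasing it.")
    return human_readable(total), warnings


print(temp_dir_capacity(tempfile.gettempdir()))
```

Note that both versions measure the total size of the drive rather than the free space, so the warning rarely triggers on a typical Windows machine.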

4 : Run the build

Open a Command Prompt with administrator privileges and run build_win.bat. (This takes quite a while.)

5 : Install

If the build succeeds, a whl file is generated in the dist folder; install it with pip.

cd dist

pip install deepspeed-0.11.2+f0604078-cp311-cp311-win_amd64.whl

Note: The .whl file name differs depending on your environment.
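Because the wheel name changes with the commit hash and Python version, it can be convenient to look the file up instead of typing it. The sketch below assumes the wheel name starts with `deepspeed-` as in the example above; `find_wheel` and `install_wheel` are hypothetical helper names, and the demo at the end uses a placeholder file in a temporary folder rather than a real build.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def find_wheel(dist_dir="dist"):
    """Return the most recently modified deepspeed wheel in dist_dir."""
    wheels = list(Path(dist_dir).glob("deepspeed-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no deepspeed wheel found in {dist_dir}")
    return max(wheels, key=lambda p: p.stat().st_mtime)


def install_wheel(wheel_path):
    """Install the wheel with the pip that belongs to the current interpreter."""
    subprocess.run([sys.executable, "-m", "pip", "install", str(wheel_path)], check=True)


# Demo with a placeholder file; in a real build, call find_wheel("dist") instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "deepspeed-0.0.0-demo.whl").touch()
    print(find_wheel(tmp).name)
```

After a real build, `install_wheel(find_wheel("dist"))` run from the DeepSpeed folder would do the same job as the two commands above.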

sam
