Serving TensorFlow models with TFServing

Author: Dimitre Oliveira
Date created: 2023/01/02
Last modified: 2023/01/02
Description: How to serve TensorFlow models with TensorFlow Serving.

ⓘ This example uses Keras 3

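Introduction

Once you build a machine learning model, the next step is to serve it. You may want to do that by exposing your model as an endpoint service. Many frameworks can do this, but the TensorFlow ecosystem has its own solution, called TensorFlow Serving.

From the TensorFlow Serving GitHub page:

"TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data."

Some noteworthy features: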

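It can serve multiple models, or multiple versions of the same model, simultaneously
It exposes both gRPC and HTTP inference endpoints
It allows deployment of new model versions without changing any client code
It supports canarying new versions and A/B testing experimental models
It adds minimal latency to inference time thanks to an efficient, low-overhead implementation
It features a scheduler that groups individual inference requests into batches for joint execution on GPU, with configurable latency controls
It supports many servables: TensorFlow models, embeddings, vocabularies, feature transformations, and even non-TensorFlow-based machine learning models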

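This guide creates a simple MobileNet model using the Keras applications API, and then serves it with TensorFlow Serving. The focus is on TensorFlow Serving, rather than on modeling and training in TensorFlow.

Note: you can find a Colab notebook with the full working code at this link.

Dependencies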

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import json
import shutil
import requests
import numpy as np
import tensorflow as tf
import keras
import matplotlib.pyplot as plt

Model

Here we load a pre-trained MobileNet from Keras applications. This is the model that we are going to serve.

model = keras.applications.MobileNet()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf.h5
 17225924/17225924 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

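Preprocessing

Most models don't work out of the box on raw data. They usually require some kind of preprocessing step to adjust the data to the model's requirements. In the case of this MobileNet, its API page shows that three basic steps are required for its input images:

Pixel values normalized to the [0, 1] range
Pixel values scaled to the [-1, 1] range
Images with the shape of (224, 224, 3), meaning (height, width, channels)

We can do all of that with the following function: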

def preprocess(image, mean=0.5, std=0.5, shape=(224, 224)):
    """Scale, normalize, and resize images."""
    image = image / 255.0  # Scale
    image = (image - mean) / std  # Normalize
    image = tf.image.resize(image, shape)  # Resize
    return image
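To make the first two steps concrete, here is a minimal, framework-free sketch of the same arithmetic applied to single pixel values. The helper name scale_and_normalize is hypothetical and is not part of the guide's code; it only mirrors the scaling and normalization lines of preprocess, without the resize.

```python
def scale_and_normalize(pixel, mean=0.5, std=0.5):
    """Map one raw pixel value in [0, 255] to the [-1, 1] range."""
    scaled = pixel / 255.0        # step 1: [0, 255] -> [0, 1]
    return (scaled - mean) / std  # step 2: [0, 1]   -> [-1, 1]


# The darkest, middle, and brightest possible pixel values.
for raw in (0, 127.5, 255):
    print(raw, "->", scale_and_normalize(raw))
```

The darkest pixel maps to -1.0, the brightest to 1.0, and the midpoint to 0.0, which matches the pixel range printed for the preprocessed banana image later in the guide.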

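A note regarding preprocessing and postprocessing using the "keras.applications" API

All models available through the Keras applications API also provide preprocess_input and decode_predictions functions. These functions are responsible for the preprocessing and postprocessing of each model, respectively, and already contain all the logic necessary for those steps. That is the recommended way to process inputs and outputs when using Keras applications models. In this guide we do not use them, in order to show the advantages of custom signatures more clearly.

Postprocessing

In the same vein, most models output values that need extra processing to meet user requirements. For instance, the user does not want to know the probability of each class for a given image; the user wants to know which class it belongs to. For our model, this translates to the following transformations on top of the model outputs:

Get the index of the class with the highest prediction
Get the name of the class from that index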
# Download human-readable labels for ImageNet.
imagenet_labels_url = (
    "https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt"
)
response = requests.get(imagenet_labels_url)
# Skipping background class
labels = [x for x in response.text.split("\n") if x != ""][1:]
# Convert the labels to the TensorFlow data format
tf_labels = tf.constant(labels, dtype=tf.string)


def postprocess(prediction, labels=tf_labels):
    """Convert from probs to labels."""
    indices = tf.argmax(prediction, axis=-1)  # Index with highest prediction
    label = tf.gather(params=labels, indices=indices)  # Class name
    return label
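The two postprocessing steps can also be sketched without TensorFlow, which may help when checking responses on a client that has no TensorFlow installed. This is only an illustration: postprocess_py, toy_labels, and toy_scores are hypothetical names and made-up data, not part of the guide.

```python
def postprocess_py(scores, labels):
    """Return the label with the highest score for each row of `scores`."""
    results = []
    for row in scores:
        # Step 1: index of the highest prediction in this row.
        best_index = max(range(len(row)), key=row.__getitem__)
        # Step 2: class name stored at that index.
        results.append(labels[best_index])
    return results


toy_labels = ["apple", "banana", "cherry"]        # made-up labels
toy_scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]   # made-up scores, two samples
print(postprocess_py(toy_scores, toy_labels))
```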

Now let's download a banana picture and see how everything comes together.

response = requests.get("https://i.imgur.com/j9xCCzn.jpeg", stream=True)

with open("banana.jpeg", "wb") as f:
    shutil.copyfileobj(response.raw, f)

sample_img = plt.imread("./banana.jpeg")
print(f"Original image shape: {sample_img.shape}")
print(f"Original image pixel range: ({sample_img.min()}, {sample_img.max()})")
plt.imshow(sample_img)
plt.show()

preprocess_img = preprocess(sample_img)
print(f"Preprocessed image shape: {preprocess_img.shape}")
print(
    f"Preprocessed image pixel range: ({preprocess_img.numpy().min()},",
    f"{preprocess_img.numpy().max()})",
)

batched_img = tf.expand_dims(preprocess_img, axis=0)
batched_img = tf.cast(batched_img, tf.float32)
print(f"Batched image shape: {batched_img.shape}")

model_outputs = model(batched_img)
print(f"Model output shape: {model_outputs.shape}")
print(f"Predicted class: {postprocess(model_outputs)}")

Original image shape: (540, 960, 3)
Original image pixel range: (0, 255)

Preprocessed image shape: (224, 224, 3)
Preprocessed image pixel range: (-1.0, 1.0)
Batched image shape: (1, 224, 224, 3)
Model output shape: (1, 1000)
Predicted class: [b'banana']

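Save the model

To load our trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This creates a protobuf file in a well-defined directory hierarchy that includes a version number. TensorFlow Serving allows us to select which version of a model, or "servable", we want to use when we make inference requests. Each version is exported to a different sub-directory under the given path.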

model_dir = "./model"
model_version = 1
model_export_path = f"{model_dir}/{model_version}"

tf.saved_model.save(
    model,
    export_dir=model_export_path,
)

print(f"SavedModel files: {os.listdir(model_export_path)}")

INFO:tensorflow:Assets written to: ./model/1/assets

SavedModel files: ['variables', 'saved_model.pb', 'assets', 'fingerprint.pb']
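The versioned layout described above (one sub-directory per version under a base path) can be inspected with a few lines of standard-library Python. The sketch below is an assumption-laden illustration: list_versions is a hypothetical helper, the temporary directory stands in for ./model, and it assumes, as in this guide, that versions are numerically named sub-directories.

```python
import os
import tempfile


def list_versions(base_path):
    """Return the numerically named sub-directories of base_path, sorted."""
    versions = []
    for name in os.listdir(base_path):
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name)):
            versions.append(int(name))
    return sorted(versions)


# Build a throwaway directory tree that imitates a model base path.
with tempfile.TemporaryDirectory() as base:
    for name in ("1", "2", "notes"):  # "notes" is not a version directory
        os.makedirs(os.path.join(base, name))
    found_versions = list_versions(base)

print(found_versions)
```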

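Examine your saved model

We'll use the command line utility saved_model_cli to look at the MetaGraphDefs (the models) and SignatureDefs (the methods you can call) in our SavedModel. See the discussion of the SavedModel CLI in the TensorFlow Guide.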

!saved_model_cli show --dir {model_export_path} --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
      name: serving_default_inputs:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

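That tells us a lot about our model! For instance, its inputs have a 4D shape (-1, 224, 224, 3), which means (batch_size, height, width, channels). The model also requires a specific image shape (224, 224, 3), so we might need to resize our images before sending them to the model. The outputs have a (-1, 1000) shape: one score for each of the 1000 classes of the ImageNet dataset (softmax probabilities rather than raw logits, since keras.applications.MobileNet applies a softmax classifier activation by default).

This information doesn't tell us everything, such as the fact that the pixel values need to be in the [-1, 1] range, but it's a great start.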
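The -1 entries in the signature act as wildcards for dimensions of unknown size. As a rough sketch of how a client could check a batch against the signature before sending it, the hypothetical helper below (shape_matches is not a TensorFlow or TensorFlow Serving function) compares two shapes dimension by dimension.

```python
def shape_matches(signature_shape, actual_shape):
    """True if actual_shape fits signature_shape; -1 matches any size."""
    if len(signature_shape) != len(actual_shape):
        return False
    return all(s == -1 or s == a for s, a in zip(signature_shape, actual_shape))


signature = (-1, 224, 224, 3)  # shape reported by saved_model_cli above

print(shape_matches(signature, (1, 224, 224, 3)))  # one resized image
print(shape_matches(signature, (1, 540, 960, 3)))  # raw banana image, not resized
```

The second call returns False, which is why the raw 540x960 image has to be resized before it can be sent to this signature.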

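Serve your model with TensorFlow Serving

Install TFServing

Since this Colab runs in a Debian environment, we install TensorFlow Serving from a Debian package. The commands below download the tensorflow-model-server-universal package and install it with dpkg. Note that we're running as root.

Note: This example runs TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started with TensorFlow Serving.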

wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-universal-2.8.0/t/tensorflow-model-server-universal/tensorflow-model-server-universal_2.8.0_all.deb'
dpkg -i tensorflow-model-server-universal_2.8.0_all.deb

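Start running TensorFlow Serving

This is where we start running TensorFlow Serving and load our model. After it loads, we can start making inference requests using REST. There are some important parameters:

port: The port that you'll use for gRPC requests.
rest_api_port: The port that you'll use for REST requests.
model_name: You'll use this in the URL of REST requests. It can be anything.
model_base_path: The path to the directory where you've saved your model.

Check the TFServing API reference for all the available parameters.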

# Environment variable with the path to the model
os.environ["MODEL_DIR"] = f"{model_dir}"

%%bash --bg
nohup tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=model \
  --model_base_path=$MODEL_DIR >server.log 2>&1
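Outside a notebook, the same invocation could be assembled in Python and handed to subprocess. The sketch below only builds and prints the argument list; it does not start the server. build_server_command is a hypothetical helper, and the flags are exactly the four described above.

```python
import shlex


def build_server_command(model_base_path, model_name="model",
                         port=8500, rest_api_port=8501):
    """Build the tensorflow_model_server argument list used in this guide."""
    return [
        "tensorflow_model_server",
        f"--port={port}",                    # gRPC port
        f"--rest_api_port={rest_api_port}",  # REST port
        f"--model_name={model_name}",        # name used in request URLs
        f"--model_base_path={model_base_path}",
    ]


command = build_server_command("./model")
print(shlex.join(command))
# The list could then be passed to subprocess.Popen(command, ...) if desired.
```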

# Check the server logs to help with troubleshooting
!cat server.log

Output

[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

# Now we can check if tensorflow is in the active services
!sudo lsof -i -P -n | grep LISTEN

Output

node         7 root   21u  IPv6  19100      0t0  TCP *:8080 (LISTEN)
kernel_ma   34 root    7u  IPv4  18874      0t0  TCP 172.28.0.12:6000 (LISTEN)
colab-fil   63 root    5u  IPv4  17975      0t0  TCP *:3453 (LISTEN)
colab-fil   63 root    6u  IPv6  17976      0t0  TCP *:3453 (LISTEN)
jupyter-n   81 root    6u  IPv4  18092      0t0  TCP 172.28.0.12:9000 (LISTEN)
python3    101 root   23u  IPv4  18252      0t0  TCP 127.0.0.1:44915 (LISTEN)
python3    132 root    3u  IPv4  20548      0t0  TCP 127.0.0.1:15264 (LISTEN)
python3    132 root    4u  IPv4  20549      0t0  TCP 127.0.0.1:37977 (LISTEN)
python3    132 root    9u  IPv4  20662      0t0  TCP 127.0.0.1:40689 (LISTEN)
tensorflo 1101 root    5u  IPv4  35543      0t0  TCP *:8500 (LISTEN)
tensorflo 1101 root   12u  IPv4  35548      0t0  TCP *:8501 (LISTEN)
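The lsof listing confirms that ports 8500 and 8501 are open, but not that the model itself finished loading. TensorFlow Serving's REST API also exposes a model status endpoint (a GET request to /v1/models/model on the REST port). The sketch below shows one way a client might interpret such a response. The helper available_versions is hypothetical, and example_body is a hand-written illustration of the response shape, not output captured from a running server.

```python
import json


def available_versions(status_body):
    """Return the versions whose state is AVAILABLE in a model status response."""
    status = json.loads(status_body)
    return [
        int(entry["version"])
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


# Hand-written example body (an assumption about a typical response).
example_body = json.dumps(
    {
        "model_version_status": [
            {
                "version": "1",
                "state": "AVAILABLE",
                "status": {"error_code": "OK", "error_message": ""},
            }
        ]
    }
)

print(available_versions(example_body))
```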

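Make a request to your model in TensorFlow Serving

Now let's create the JSON object for an inference request, and see how well our model classifies it:

REST API

Newest version of the servable

We'll send a predict request as a POST to our server's REST endpoint, and pass it our preprocessed image batch as the example. We'll ask our server to give us the latest version of our servable by not specifying a particular version.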

data = json.dumps(
    {
        "signature_name": "serving_default",
        "instances": batched_img.numpy().tolist(),
    }
)
# The REST port serves plain HTTP here (no TLS was configured for the server)
url = "http://127.0.0.1:8501/v1/models/model:predict"


def predict_rest(json_data, url):
    json_response = requests.post(url, data=json_data)
    response = json.loads(json_response.text)
    rest_outputs = np.array(response["predictions"])
    return rest_outputs

rest_outputs = predict_rest(data, url)

print(f"REST output shape: {rest_outputs.shape}")
print(f"Predicted class: {postprocess(rest_outputs)}")

Output

REST output shape: (1, 1000)
Predicted class: [b'banana']
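The predict URL follows a fixed pattern: the model name goes after /v1/models/, an optional /versions/N segment pins a version, and :predict selects the method. A small sketch of a URL builder is shown below. predict_url is a hypothetical helper, and the host and port defaults are simply the values used in this guide.

```python
def predict_url(model_name, version=None, host="127.0.0.1", port=8501):
    """Build a TensorFlow Serving REST predict URL, optionally pinning a version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base + ":predict"


print(predict_url("model"))             # latest version served
print(predict_url("model", version=2))  # a specific version
```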

gRPC API

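gRPC is based on the Remote Procedure Call (RPC) model and is a technology for implementing RPC APIs that uses HTTP/2 as its underlying transport protocol. gRPC is usually preferred for low-latency, highly scalable, and distributed systems. If you want to know more about the REST vs gRPC tradeoffs, check out this article.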

import grpc

# Create a channel that will be connected to the gRPC port of the container
channel = grpc.insecure_channel("localhost:8500")

pip install -q tensorflow_serving_api

from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Create a stub made for prediction
# This stub will be used to send the gRPC request to the TF Server
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Get the serving_input key
loaded_model = tf.saved_model.load(model_export_path)
input_name = list(
    loaded_model.signatures["serving_default"].structured_input_signature[1].keys()
)[0]

def predict_grpc(data, input_name, stub):
    # Create a gRPC request made for prediction
    request = predict_pb2.PredictRequest()

    # Set the name of the model, for this use case it is "model"
    request.model_spec.name = "model"

    # Set which signature is used to format the gRPC query
    # here the default one "serving_default"
    request.model_spec.signature_name = "serving_default"

    # Set the input as the data
    # tf.make_tensor_proto turns a TensorFlow tensor into a Protobuf tensor
    request.inputs[input_name].CopyFrom(tf.make_tensor_proto(data.numpy().tolist()))

    # Send the gRPC request to the TF Server
    result = stub.Predict(request)
    return result


grpc_outputs = predict_grpc(batched_img, input_name, stub)
# The key must match the signature's output name ("output_0" in the CLI listing above)
grpc_outputs = np.array([grpc_outputs.outputs["output_0"].float_val])

print(f"gRPC output shape: {grpc_outputs.shape}")
print(f"Predicted class: {postprocess(grpc_outputs)}")

Output

gRPC output shape: (1, 1000)
Predicted class: [b'banana']
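One detail worth noting: float_val holds the response values as a flat sequence, so wrapping it in a list gives the expected (1, 1000) shape only because the batch contains a single image. For larger batches the values would need to be split per sample. The sketch below illustrates that idea with toy data; reshape_flat is a hypothetical helper, and in practice tf.make_ndarray can rebuild the array from the returned tensor proto, including its shape.

```python
def reshape_flat(values, batch_size):
    """Split a flat list of scores into batch_size equally sized rows."""
    if batch_size <= 0 or len(values) % batch_size != 0:
        raise ValueError("values cannot be split evenly into batch_size rows")
    row_length = len(values) // batch_size
    return [
        values[i * row_length:(i + 1) * row_length] for i in range(batch_size)
    ]


flat = [0.1, 0.9, 0.8, 0.2]  # toy data: two images, two classes each
print(reshape_flat(flat, batch_size=2))
```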

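Custom signature

Note that for this model we always need to preprocess and postprocess all samples to get the desired output. This can get quite tricky if you are maintaining and serving several models developed by a large team, each of which might require different processing logic.

TensorFlow allows us to customize the model graph to embed all of that processing logic, which makes model serving much easier. There are multiple ways to achieve this, but since we are going to serve the model with TFServing, we can customize the model graph straight into the serving signature.

We can use the following code to export the same model with the preprocessing and postprocessing logic already included as the default signature. This allows the model to make predictions directly on raw data.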

def export_model(model, labels):
    @tf.function(input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)])
    def serving_fn(image):
        processed_img = preprocess(image)
        probs = model(processed_img)
        label = postprocess(probs)
        return {"label": label}

    return serving_fn


model_sig_version = 2
model_sig_export_path = f"{model_dir}/{model_sig_version}"

tf.saved_model.save(
    model,
    export_dir=model_sig_export_path,
    signatures={"serving_default": export_model(model, labels)},
)

!saved_model_cli show --dir {model_sig_export_path} --tag_set serve --signature_def serving_default

INFO:tensorflow:Assets written to: ./model/2/assets

The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_image:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['label'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

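Note that this model has a different signature. Its input is still 4D, but now has a (-1, -1, -1, 3) shape, which means that it supports images of any height and width. Its output also has a different shape: it no longer returns the 1000 class scores, but a single string label per image.

We can test the model's prediction using a specific signature with this API: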

batched_raw_img = tf.expand_dims(sample_img, axis=0)
batched_raw_img = tf.cast(batched_raw_img, tf.float32)

loaded_model = tf.saved_model.load(model_sig_export_path)
loaded_model.signatures["serving_default"](**{"image": batched_raw_img})

{'label': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'banana'], dtype=object)>}
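Conceptually, the custom signature is just a composition of three functions: preprocess, then the model, then postprocess. The framework-free sketch below illustrates that composition with toy stand-ins. Every name in it (make_serving_fn and the toy_* functions) is hypothetical and has nothing to do with MobileNet; it only mirrors the structure of serving_fn.

```python
def make_serving_fn(preprocess_fn, model_fn, postprocess_fn):
    """Chain the three stages into one function that accepts raw input."""
    def serving_fn(raw_batch):
        return {"label": postprocess_fn(model_fn(preprocess_fn(raw_batch)))}
    return serving_fn


def toy_preprocess(batch):
    # Scale raw values from [0, 255] to [0, 1].
    return [[value / 255.0 for value in row] for row in batch]


def toy_model(batch):
    # Toy "model": scores for two classes based on mean brightness.
    means = [sum(row) / len(row) for row in batch]
    return [[1.0 - mean, mean] for mean in means]


def toy_postprocess(scores):
    names = ("dark", "bright")
    return [names[row.index(max(row))] for row in scores]


serve = make_serving_fn(toy_preprocess, toy_model, toy_postprocess)
print(serve([[0, 10, 20], [250, 240, 255]]))
```

The caller passes raw values and receives labels, without needing to know which processing steps sit in between. That is the property the exported signature gives to TensorFlow Serving clients.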

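Prediction using a particular version of the servable

Now let's specify a particular version of our servable. Note that when we saved our model with a custom signature we used a different folder: the first model was saved in folder /1 (version 1), and the one with a custom signature in folder /2 (version 2). TFServing picks up the versions saved under the same base folder. Note that with its default version policy it serves only the highest-numbered version (here, version 2) unless it is configured otherwise.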

REST API

data = json.dumps(
    {
        "signature_name": "serving_default",
        "instances": batched_raw_img.numpy().tolist(),
    }
)
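url_sig = "http://127.0.0.1:8501/v1/models/model/versions/2:predict"

# Send the raw (unprocessed) image batch to version 2 of the servable
rest_outputs = predict_rest(data, url_sig)

print(f"REST output shape: {rest_outputs.shape}")
print(f"Predicted class: {rest_outputs}")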

Output

REST output shape: (1,)
Predicted class: ['banana']
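When a request fails (for example, because the requested version is not loaded), TensorFlow Serving's REST API responds with a JSON object that carries an "error" key instead of "predictions", and predict_rest as written would then fail with a KeyError. A slightly more defensive parsing step might look like the sketch below. parse_predictions is a hypothetical helper and the error message is a made-up example.

```python
import json


def parse_predictions(body):
    """Return the predictions from a response body, or raise on a server error."""
    payload = json.loads(body)
    if "error" in payload:
        raise RuntimeError(f"TensorFlow Serving returned an error: {payload['error']}")
    return payload["predictions"]


print(parse_predictions('{"predictions": ["banana"]}'))

try:
    parse_predictions('{"error": "example error message"}')  # made-up error body
except RuntimeError as exc:
    print(exc)
```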

gRPC API

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

input_name = list(
    loaded_model.signatures["serving_default"].structured_input_signature[1].keys()
)[0]

grpc_outputs = predict_grpc(batched_raw_img, input_name, stub)
grpc_outputs = np.array([grpc_outputs.outputs['label'].string_val])

print(f"gRPC output shape: {grpc_outputs.shape}")
print(f"Predicted class: {grpc_outputs}")

Output

gRPC output shape: (1, 1)
Predicted class: [[b'banana']]

Additional resources