
Semantic Similarity with KerasNLP

Author: Anshuman Mishra
Date created: 2023/02/25
Last modified: 2023/02/25
Description: Use pretrained models from KerasNLP for the semantic similarity task.

ⓘ This example uses Keras 3

Introduction

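Semantic similarity is the task of determining how similar two sentences are in meaning. An earlier example showed how to predict sentence semantic similarity on the SNLI (Stanford Natural Language Inference) corpus with the HuggingFace Transformers library. In this tutorial we will learn how to use KerasNLP, an extension of the core Keras API, for the same task. We will also see how KerasNLP reduces boilerplate code and simplifies the process of building and using models. For more information on KerasNLP, please refer to KerasNLP's official documentation.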

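This guide is broken down into the following parts:

1. Setup, task definition, and establishing a baseline.
2. Establishing baseline with BERT.
3. Saving and reloading the model.
4. Performing inference with the model.
5. Improving accuracy with RoBERTa.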

Setup

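The following guide uses Keras 3 (originally released as Keras Core) to work in any of tensorflow, jax or torch. Support for multiple backends is built into KerasNLP; simply change the KERAS_BACKEND environment variable below to select the backend you would like to use. The variable must be set before keras is imported. We select the jax backend, which gives us a particularly fast train step.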

!pip install -q --upgrade keras-nlp
!pip install -q --upgrade keras  # Upgrade to Keras 3.

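import os

os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"; set before importing keras

import numpy as np
import tensorflow as tf
import keras
import keras_nlp
import tensorflow_datasets as tfds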

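To load the SNLI dataset, we use the tensorflow-datasets library. The dataset contains over 550,000 samples in total. However, to ensure that this example runs quickly, we use only 20% of the training samples.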

Overview of the SNLI Dataset

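Every sample in the dataset contains three components: hypothesis, premise, and label. The premise is the original caption provided to the author of the pair, while the hypothesis is the caption that the author of the pair wrote in response. The label is assigned by annotators to indicate the similarity between the two sentences.

The dataset contains three possible similarity label values: Contradiction, Entailment, and Neutral. Contradiction represents completely dissimilar sentences, while Entailment denotes sentences with similar meaning. Lastly, Neutral refers to pairs for which no clear similarity or dissimilarity can be established.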

snli_train = tfds.load("snli", split="train[:20%]")
snli_val = tfds.load("snli", split="validation")
snli_test = tfds.load("snli", split="test")

# Here's what our samples look like; we take the first batch of four samples
# from the test split:
sample = snli_test.batch(4).take(1).get_single_element()
sample

{'hypothesis': <tf.Tensor: shape=(4,), dtype=string, numpy=
 array([b'A girl is entertaining on stage',
        b'A group of people posing in front of a body of water.',
        b"The group of people aren't inide of the building.",
        b'The people are taking a carriage ride.'], dtype=object)>,
 'label': <tf.Tensor: shape=(4,), dtype=int64, numpy=array([0, 0, 0, 0])>,
 'premise': <tf.Tensor: shape=(4,), dtype=string, numpy=
 array([b'A girl in a blue leotard hula hoops on a stage with balloon shapes in the background.',
        b'A group of people taking pictures on a walkway in front of a large body of water.',
        b'Many people standing outside of a place talking to each other in front of a building that has a sign that says "HI-POINTE."',
        b'Three people are riding a carriage pulled by four horses.'],
       dtype=object)>}

Preprocessing

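In our dataset, some samples have missing or incorrectly labeled data, which is denoted by a label value of -1. To keep the model accurate and reliable, we simply filter these samples out of the dataset.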

def filter_labels(sample):
    return sample["label"] >= 0
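The same filtering idea can be sketched on plain Python dictionaries, which also makes it easy to print readable label names. This is only an illustration: the samples below are hypothetical, keep_labeled is a made-up helper, and the label order in LABEL_NAMES is an assumption about how the TFDS snli dataset encodes its classes (it is not stated in this tutorial, although it is consistent with the label 0 pairs shown above).

```python
# Assumed label order of the TFDS "snli" dataset (an assumption, not from this tutorial).
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def keep_labeled(samples):
    """Drop samples whose label is -1 (missing or unusable label)."""
    return [s for s in samples if s["label"] >= 0]


# Hypothetical in-memory samples, for illustration only.
samples = [
    {
        "premise": "Three people are riding a carriage pulled by four horses.",
        "hypothesis": "The people are taking a carriage ride.",
        "label": 0,
    },
    {"premise": "A man is sleeping.", "hypothesis": "A man is running.", "label": 2},
    {"premise": "A dog runs.", "hypothesis": "The dog is brown.", "label": -1},
]

for s in keep_labeled(samples):
    print(LABEL_NAMES[s["label"]], "|", s["hypothesis"])
```

In the tutorial itself the equivalent step is applied lazily through the dataset's filter() method, so no samples need to be loaded into memory.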

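Here's a utility function that splits each example into an (x, y) tuple suitable for model.fit(). By default, keras_nlp.models.BertClassifier tokenizes the raw strings and packs them together using a "[SEP]" token during training. Therefore, this label splitting is all the data preparation that we need to perform.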

def split_labels(sample):
    x = (sample["hypothesis"], sample["premise"])
    y = sample["label"]
    return x, y


train_ds = (
    snli_train.filter(filter_labels)
    .map(split_labels, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(16)
)
val_ds = (
    snli_val.filter(filter_labels)
    .map(split_labels, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(16)
)
test_ds = (
    snli_test.filter(filter_labels)
    .map(split_labels, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(16)
)

Establishing baseline with BERT.

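We use the BERT model from KerasNLP to establish a baseline for our semantic similarity task. The keras_nlp.models.BertClassifier class attaches a classification head to the BERT backbone, mapping the backbone outputs to logits suitable for a classification task. This significantly reduces the need for custom code.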

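KerasNLP models have built-in tokenization that handles tokenization by default, based on the selected model. However, users can also apply custom preprocessing to suit their specific needs. If we pass a tuple as input, the model tokenizes all the strings and concatenates them with a "[SEP]" separator.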
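To make the packing step concrete, here is a minimal sketch of how a BERT-style input is assembled from two segments. It is illustrative only: pack_segments is a hypothetical helper, and the whitespace split stands in for the WordPiece tokenizer that the real preprocessor uses. The real preprocessor also converts tokens to integer IDs and pads or truncates to a fixed sequence length, which this sketch leaves out.

```python
def pack_segments(first, second, cls="[CLS]", sep="[SEP]"):
    """Pack two token lists BERT-style and return (tokens, segment_ids)."""
    tokens = [cls] + first + [sep] + second + [sep]
    # Segment 0 covers [CLS], the first sentence and its [SEP];
    # segment 1 covers the second sentence and the final [SEP].
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids


# Toy whitespace "tokenizer" (hypothetical; the real model uses WordPiece).
hypothesis = "the people are taking a carriage ride".split()
premise = "three people are riding a carriage".split()

tokens, segment_ids = pack_segments(hypothesis, premise)
print(tokens)
print(segment_ids)
```

The segment IDs let the model tell the two sentences apart even though they arrive as a single token sequence.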

We use this model with pretrained weights; the from_preset() method also lets us supply our own preprocessor. For the SNLI dataset, we set num_classes to 3.

bert_classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased", num_classes=3
)

Please note that the BERT Tiny model has only 4,386,307 trainable parameters.

KerasNLP task models come with compilation defaults. We can now train the model we just instantiated by calling the fit() method.

bert_classifier.fit(train_ds, validation_data=val_ds, epochs=1)

 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 61s 8ms/step - loss: 0.8732 - sparse_categorical_accuracy: 0.5864 - val_loss: 0.5900 - val_sparse_categorical_accuracy: 0.7602

<keras.src.callbacks.history.History at 0x7f4660171fc0>

Our BERT classifier achieved an accuracy of around 76% on the validation split. Now, let's evaluate its performance on the test split.

Evaluate the performance of the trained model on test data.

bert_classifier.evaluate(test_ds)

 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 0.5815 - sparse_categorical_accuracy: 0.7628

[0.5895748734474182, 0.7618078589439392]

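Our baseline BERT model achieved a similar accuracy of around 76% on the test split. Now, let's try to improve its performance by recompiling the model with a slightly higher learning rate.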

bert_classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased", num_classes=3
)
bert_classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=["accuracy"],
)

bert_classifier.fit(train_ds, validation_data=val_ds, epochs=1)
bert_classifier.evaluate(test_ds)

 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 59s 8ms/step - accuracy: 0.6007 - loss: 0.8636 - val_accuracy: 0.7648 - val_loss: 0.5800
 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7700 - loss: 0.5692

[0.578984260559082, 0.7686278820037842]

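Tweaking the learning rate alone was not enough to boost performance: test accuracy moved only from about 76.2% to about 76.9%. Let's try again, but this time with keras.optimizers.AdamW and a learning rate schedule.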

class TriangularSchedule(keras.optimizers.schedules.LearningRateSchedule):
    """Linear ramp up for `warmup` steps, then linear decay to zero at `total` steps."""

    def __init__(self, rate, warmup, total):
        self.rate = rate
        self.warmup = warmup
        self.total = total

    def get_config(self):
        config = {"rate": self.rate, "warmup": self.warmup, "total": self.total}
        return config

    def __call__(self, step):
        step = keras.ops.cast(step, dtype="float32")
        rate = keras.ops.cast(self.rate, dtype="float32")
        warmup = keras.ops.cast(self.warmup, dtype="float32")
        total = keras.ops.cast(self.total, dtype="float32")

        warmup_rate = rate * step / warmup
        cooldown_rate = rate * (total - step) / (total - warmup)
        triangular_rate = keras.ops.minimum(warmup_rate, cooldown_rate)
        return keras.ops.maximum(triangular_rate, 0.0)


bert_classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased", num_classes=3
)

# Get the total count of training batches.
# This requires walking the dataset to filter all -1 labels.
epochs = 3
total_steps = sum(1 for _ in train_ds.as_numpy_iterator()) * epochs
warmup_steps = int(total_steps * 0.2)

bert_classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(
        TriangularSchedule(1e-4, warmup_steps, total_steps)
    ),
    metrics=["accuracy"],
)

bert_classifier.fit(train_ds, validation_data=val_ds, epochs=epochs)

Epoch 1/3
 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 59s 8ms/step - accuracy: 0.5457 - loss: 0.9317 - val_accuracy: 0.7633 - val_loss: 0.5825
Epoch 2/3
 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 55s 8ms/step - accuracy: 0.7291 - loss: 0.6515 - val_accuracy: 0.7809 - val_loss: 0.5399
Epoch 3/3
 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 55s 8ms/step - accuracy: 0.7708 - loss: 0.5695 - val_accuracy: 0.7918 - val_loss: 0.5214

<keras.src.callbacks.history.History at 0x7f45645b3370>

It worked! With a learning rate scheduler and the AdamW optimizer, our validation accuracy improved to around 79%.
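To see what the triangular schedule does to the learning rate, its formula can be evaluated in plain Python at a few steps. This is a sketch rather than the class used for training: triangular_rate is a hypothetical stand-alone function, and the 6867 steps per epoch are taken from the training log above.

```python
def triangular_rate(step, rate, warmup, total):
    """Linear ramp to `rate` over `warmup` steps, then linear decay to 0 at `total`."""
    warmup_rate = rate * step / warmup
    cooldown_rate = rate * (total - step) / (total - warmup)
    return max(min(warmup_rate, cooldown_rate), 0.0)


steps_per_epoch = 6867  # batches per epoch, as shown in the training log
epochs = 3
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * 0.2)

# Sample the schedule at the start, mid-warmup, the peak, mid-run and the end.
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    lr = triangular_rate(step, 1e-4, warmup_steps, total_steps)
    print(f"step {step:>6}: learning rate = {lr:.2e}")
```

The rate starts at zero, peaks at 1e-4 when the warmup ends (20% of the way through training), and then decays linearly back to zero by the final step.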

Now, let's evaluate our final model on the test set and see how it performs.

bert_classifier.evaluate(test_ds)

 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7956 - loss: 0.5128

[0.5245093703269958, 0.7890879511833191]

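With a learning rate scheduler, our BERT Tiny model reached an accuracy of approximately 79% on the test set, up from roughly 76 to 77% in the earlier runs. Fine-tuning a pretrained BERT model can be a powerful tool in natural language processing tasks, and even a small model like BERT Tiny can achieve impressive results.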

Let's save our model for now and move on to learning how to perform inference with it.

Save and Reload the model

bert_classifier.save("bert_classifier.keras")
restored_model = keras.models.load_model("bert_classifier.keras")
restored_model.evaluate(test_ds)

 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 0.5128 - sparse_categorical_accuracy: 0.7956

[0.5245093703269958, 0.7890879511833191]

Performing inference with the model.

Let's see how to perform inference with KerasNLP models.

# Convert to Hypothesis-Premise pair, for forward pass through model
sample = (sample["hypothesis"], sample["premise"])
sample

(<tf.Tensor: shape=(4,), dtype=string, numpy=
 array([b'A girl is entertaining on stage',
        b'A group of people posing in front of a body of water.',
        b"The group of people aren't inide of the building.",
        b'The people are taking a carriage ride.'], dtype=object)>,
 <tf.Tensor: shape=(4,), dtype=string, numpy=
 array([b'A girl in a blue leotard hula hoops on a stage with balloon shapes in the background.',
        b'A group of people taking pictures on a walkway in front of a large body of water.',
        b'Many people standing outside of a place talking to each other in front of a building that has a sign that says "HI-POINTE."',
        b'Three people are riding a carriage pulled by four horses.'],
       dtype=object)>)

The default preprocessor in KerasNLP models handles input tokenization automatically, so we don't need to perform tokenization explicitly.

predictions = bert_classifier.predict(sample)


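def softmax(x):
    # Normalize over the class axis (last axis) so that each row sums to 1.
    # Subtracting the row maximum first keeps np.exp from overflowing.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / exp_x.sum(axis=-1, keepdims=True)


# Convert the logits into per-class probabilities for each sentence pair
predictions = softmax(predictions)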

 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 711ms/step
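The classifier returns one row of logits per sentence pair, so the softmax has to normalize across the three classes of each row, and the predicted class is the index of the largest probability. A minimal sketch in plain Python follows; the logits are hypothetical values rather than real model output, and softmax_row and argmax are illustrative helpers.

```python
import math


def softmax_row(logits):
    """Numerically stable softmax over one row of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits for two sentence pairs (not real model output).
batch_logits = [[2.0, 0.5, -1.0], [-0.5, 0.2, 1.5]]

for logits in batch_logits:
    probs = softmax_row(logits)
    print("class", argmax(probs), "probabilities", [round(p, 3) for p in probs])
```

Since softmax preserves the ordering of the logits, taking the argmax of the raw logits gives the same class; the probabilities are mainly useful as a confidence estimate.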

Improving accuracy with RoBERTa

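Now that we have established a baseline, we can attempt to improve our results by experimenting with different models. Thanks to KerasNLP, fine-tuning a RoBERTa checkpoint on the same dataset takes just a few lines of code.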

# Initializing a RoBERTa classifier from a preset
roberta_classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "roberta_base_en", num_classes=3
)

roberta_classifier.fit(train_ds, validation_data=val_ds, epochs=1)

roberta_classifier.evaluate(test_ds)

 6867/6867 ━━━━━━━━━━━━━━━━━━━━ 2049s 297ms/step - loss: 0.5509 - sparse_categorical_accuracy: 0.7740 - val_loss: 0.3292 - val_sparse_categorical_accuracy: 0.8789
 614/614 ━━━━━━━━━━━━━━━━━━━━ 56s 88ms/step - loss: 0.3307 - sparse_categorical_accuracy: 0.8784

[0.33771008253097534, 0.874796450138092]

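The RoBERTa base model has far more trainable parameters than the BERT Tiny model: 124,645,635, about 28 times as many. As a result, it took approximately 1.5 hours to train on a P100 GPU; the log shown above records a faster run, at 2049 s (about 34 minutes) for the epoch. The performance improvement was significant, however, with accuracy rising to about 88% on the validation split and about 87.5% on the test split. With RoBERTa, the largest batch size we could fit on the P100 GPU was 16.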
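The cost comparison can be checked with a few lines of arithmetic. The numbers below are copied from the parameter counts and training logs in this tutorial; the variable names are only illustrative.

```python
bert_tiny_params = 4_386_307
roberta_base_params = 124_645_635
param_ratio = roberta_base_params / bert_tiny_params

bert_ms_per_step = 8  # from the BERT Tiny training log
roberta_ms_per_step = 297  # from the RoBERTa training log
step_ratio = roberta_ms_per_step / bert_ms_per_step

roberta_epoch_minutes = 2049 / 60  # logged duration of the RoBERTa epoch

print(f"parameters: {param_ratio:.1f}x")
print(f"time per training step: {step_ratio:.1f}x")
print(f"logged RoBERTa epoch: {roberta_epoch_minutes:.1f} min")
```

In the logged runs, the time per training step grows somewhat faster than the parameter count, which is worth keeping in mind when budgeting for more than one epoch.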

Despite using a different model, the steps to perform inference with RoBERTa are the same as with BERT!

predictions = roberta_classifier.predict(sample)
print(tf.math.argmax(predictions, axis=1).numpy())

 1/1 ━━━━━━━━━━━━━━━━━━━━ 4s 4s/step
[0 0 0 0]

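We hope this tutorial has helped demonstrate the ease and effectiveness of using KerasNLP and BERT for semantic similarity tasks.

Throughout this tutorial, we showed how to use a pretrained BERT model to establish a baseline, and how to improve performance by training a larger RoBERTa model with just a few lines of code.

The KerasNLP toolbox provides a range of modular building blocks for preprocessing text, including pretrained state-of-the-art models and low-level Transformer Encoder layers. We believe that this makes experimenting with natural language solutions more accessible and efficient.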
