開發人員指南 / 超參數調整 / 在 KerasTuner 中處理失敗的試驗

在 KerasTuner 中處理失敗的試驗

作者:Haifeng Jin
建立日期 2023/02/28
上次修改日期 2023/02/28
說明:KerasTuner 中容錯設定的基本知識。

在 Colab 中檢視 GitHub 原始碼


簡介

KerasTuner 程式可能會需要很長時間才能執行完畢,因為每個模型可能都需要很長時間才能訓練完成。我們不希望程式只因為某些試驗隨機失敗就失敗。

在本指南中,我們將說明如何處理 KerasTuner 中失敗的試驗,包括:

  • 如何在搜尋期間容忍失敗的試驗
  • 如何在建置和評估模型期間將試驗標記為失敗
  • 如何透過引發 `FatalError` 來終止搜尋

設定

!pip install keras-tuner -q
import keras
from keras import layers
import keras_tuner
import numpy as np

容忍失敗的試驗

我們將在初始化調節器時使用 max_retries_per_trialmax_consecutive_failed_trials 參數。

max_retries_per_trial 控制如果試驗持續失敗,將執行的最大重試次數。例如,如果設定為 3,則試驗可能會運行 4 次(1 次失敗的運行 + 3 次失敗的重試),然後才會最終標記為失敗。max_retries_per_trial 的預設值為 0。

max_consecutive_failed_trials 控制在終止搜尋之前允許出現多少次連續失敗的試驗(此處的失敗試驗是指所有重試均失敗的試驗)。例如,如果設定為 3,並且試驗 2、試驗 3 和試驗 4 全部失敗,則搜尋將會終止。但是,如果設定為 3,並且只有試驗 2、試驗 3、試驗 5 和試驗 6 失敗,則搜尋不會終止,因為失敗的試驗不是連續的。max_consecutive_failed_trials 的預設值為 3。

以下程式碼顯示了這兩個參數如何運作。

  • 我們定義了一個具有 2 個超參數的搜尋空間,用於 2 個密集層中的單元數。
  • 當它們的乘積大於 800 時,我們會針對模型過大引發 ValueError
def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        raise ValueError(f"Model too large! It contains {num_params} params.")
    return model

我們按如下方式設定調節器。

  • 我們設定 max_retries_per_trial=3
  • 我們設定 max_consecutive_failed_trials=8
  • 我們使用 GridSearch 列舉所有超參數值組合。
tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()
Trial 12 Complete [00h 00m 00s]
Best val_loss So Far: 0.12375041842460632
Total elapsed time: 00h 00m 08s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")
Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.12375041842460632
Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.12741881608963013
Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.13982832431793213
Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.1433391124010086
Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.14747518301010132
Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.15092280507087708
Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.21962997317314148
Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1271 params.
Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1591 params.
Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1261 params.

將試驗標記為失敗

當模型過大時,我們不需要重試。無論我們使用相同的超參數嘗試多少次,它總是過大。

我們可以設定 max_retries_per_trial=0 來做到這一點。但是,無論引發什麼錯誤,它都不會重試,而我們可能仍然希望針對其他非預期錯誤重試。是否有更好的方法來處理這種情況?

我們可以引發 FailedTrialError 來跳過重試。每當引發此錯誤時,都不會重試該試驗。當發生其他錯誤時,重試仍會執行。範例如下所示。

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, it skips the retries.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()
Trial 12 Complete [00h 00m 00s]
Best val_loss So Far: 0.08265472948551178
Total elapsed time: 00h 00m 05s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")
Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.08265472948551178
Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.11731438338756561
Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.13600358366966248
Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.1465979516506195
Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.15967626869678497
Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.1646396517753601
Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.1696309596300125
Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1271 params.
Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1591 params.
Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1261 params.

以程式設計方式終止搜尋

當程式碼中存在錯誤時,我們應該立即終止搜尋並修復錯誤。您可以在滿足您定義的條件時以程式設計方式終止搜尋。引發 FatalError(或其子類別 FatalValueErrorFatalTypeErrorFatalRuntimeError)將會終止搜尋,而不受 max_consecutive_failed_trials 參數的影響。

以下是在模型過大時終止搜尋的範例。

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the search is terminated.
        raise keras_tuner.errors.FatalError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

try:
    # Use random data to train the model.
    tuner.search(
        x=np.random.rand(100, 20),
        y=np.random.rand(100, 1),
        validation_data=(
            np.random.rand(100, 20),
            np.random.rand(100, 1),
        ),
        epochs=10,
    )
except keras_tuner.errors.FatalError:
    print("The search is terminated.")
Trial 7 Complete [00h 00m 01s]
val_loss: 0.14219732582569122
Best val_loss So Far: 0.09755773097276688
Total elapsed time: 00h 00m 04s
Search: Running Trial #8
Value             |Best Value So Far |Hyperparameter
30                |10                |units_1
20                |20                |units_2
The search is terminated.

重點摘要

在本指南中,您學習了如何在 KerasTuner 中處理失敗的試驗

  • 使用 max_retries_per_trial 指定失敗試驗的重試次數。
  • 使用 max_consecutive_failed_trials 指定要容忍的最大連續失敗試驗次數。
  • 引發 FailedTrialError 以將試驗直接標記為失敗並跳過重試。
  • 引發 FatalErrorFatalValueErrorFatalTypeErrorFatalRuntimeError 以立即終止搜尋。