How to Manually Optimize Machine Learning Model Hyperparameters

_December 19, 2022_ NEPIRITY

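Machine learning algorithms have hyperparameters that allow the algorithm to be tailored to a specific dataset.

Although the effect of a hyperparameter is generally understood, its specific effect on a given dataset and its interactions with other hyperparameters during learning may not be known. It is therefore important to tune the values of algorithm hyperparameters as part of a machine learning project.

It is common to tune hyperparameters with naive optimization algorithms such as grid search and random search. An alternative approach is to use a stochastic optimization algorithm, such as a stochastic hill climbing algorithm.

In this tutorial, you will discover how to manually optimize the hyperparameters of machine learning algorithms.

After completing this tutorial, you will know:

Stochastic optimization algorithms can be used instead of grid and random search for hyperparameter optimization.
How to use a stochastic hill climbing algorithm to tune the hyperparameters of the Perceptron algorithm.
How to manually optimize the hyperparameters of the XGBoost gradient boosting algorithm.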

Tutorial Overview

This tutorial is divided into three parts:

Manual Hyperparameter Optimization
Perceptron Hyperparameter Optimization
XGBoost Hyperparameter Optimization

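Manual Hyperparameter Optimization

Machine learning models have hyperparameters that must be set in order to customize the model to a dataset.

The general effect of a hyperparameter on a model is often known, but choosing the best value for a hyperparameter, and for combinations of interacting hyperparameters, on a given dataset is challenging.

A better approach is to objectively search different values for the model hyperparameters and choose the subset that yields the model with the best performance on the given dataset. This is called hyperparameter optimization, or hyperparameter tuning.

A range of optimization algorithms may be used, although the two simplest and most common methods are random search and grid search.

Random Search. Define the search space as a bounded domain of hyperparameter values and randomly sample points in that domain.
Grid Search. Define the search space as a grid of hyperparameter values and evaluate every position in the grid.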
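As a rough illustration of the difference between the two strategies, the sketch below generates candidate configurations for two hyperparameters using only the Python standard library. The bounds, the grid values, and the function names grid_candidates() and random_candidates() are assumptions made for this example; scikit-learn provides its own tooling for both strategies.

```python
import itertools
import random

# Hypothetical bounds for two hyperparameters, each searched in [0.0, 1.0].
ETA_BOUNDS = (0.0, 1.0)
ALPHA_BOUNDS = (0.0, 1.0)

def grid_candidates(values_per_axis):
    """Grid search: every combination of the listed values."""
    return list(itertools.product(values_per_axis, values_per_axis))

def random_candidates(n_points, seed=1):
    """Random search: n_points drawn uniformly from the bounded domain."""
    rng = random.Random(seed)
    return [
        (rng.uniform(*ETA_BOUNDS), rng.uniform(*ALPHA_BOUNDS))
        for _ in range(n_points)
    ]

grid = grid_candidates([0.0, 0.5, 1.0])   # 3 x 3 fixed positions
sampled = random_candidates(9)            # same budget, positions anywhere in the domain
print(len(grid), len(sampled))
```

With the same budget of nine evaluations, the grid only ever tries three distinct values per hyperparameter, while the random sample tries nine.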

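Grid search is well suited to spot-checking combinations that are known to perform well in general. Random search is well suited to discovering hyperparameter combinations that you would not have guessed intuitively, although it often requires more time to execute.

For more on grid and random search for hyperparameter tuning, see the tutorial:

Hyperparameter Optimization With Random Search and Grid Search

Grid and random search are primitive optimization algorithms, and any optimization algorithm can be used to tune the performance of a machine learning algorithm. For example, it is possible to use stochastic optimization algorithms. This may be desirable when good or great performance is required and sufficient resources are available to tune the model.

Next, let's look at how to use a stochastic hill climbing algorithm to tune the performance of the Perceptron algorithm.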

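Perceptron Hyperparameter Optimization

The Perceptron algorithm is the simplest type of artificial neural network.

It is a model of a single neuron that can be used for two-class classification problems, and it provides the foundation for later developing much larger networks.

In this section, we will explore how to manually optimize the hyperparameters of the Perceptron model.

First, let's define a synthetic binary classification problem that we can use as the focus of optimizing the model.

We can use the make_classification() function to define a binary classification problem with 1,000 rows and five input variables.

The example below creates the dataset and summarizes the shape of the data.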

# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

Running the example prints the shape of the created dataset, confirming our expectations.

(1000, 5) (1000,)

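The scikit-learn library provides an implementation of the Perceptron model via the Perceptron class.

Before we tune the hyperparameters of the model, we can establish a baseline in performance using the default hyperparameters.

We will evaluate the model using the good practice of repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class.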
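To make the idea of repeated stratified k-fold splitting more concrete, here is a simplified, standard-library-only sketch. It is not how RepeatedStratifiedKFold is implemented; the stratified_folds() helper and the balanced toy labels are assumptions for illustration only. It deals the rows of each class to the folds in turn, so every fold keeps the overall class ratio, and repeats the partition with a different shuffle each time.

```python
import random
from collections import Counter

def stratified_folds(labels, n_splits, seed):
    """Assign each row index to a fold so that every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        # Deal the rows of this class to the folds in round-robin order.
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds

# Toy labels: 1,000 rows, balanced between two classes.
labels = [0] * 500 + [1] * 500
n_splits, n_repeats = 10, 3

all_folds = []
for repeat in range(n_repeats):
    # A different seed per repeat gives a different partition each time.
    all_folds.extend(stratified_folds(labels, n_splits, seed=repeat))

print(len(all_folds))                                   # one held-out fold per score
print(Counter(labels[i] for i in all_folds[0]))         # class counts in the first fold
```

Each held-out fold produces one accuracy score, so 10 splits repeated 3 times give 30 scores to average.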

The complete example of evaluating the Perceptron model with default hyperparameters on our synthetic binary classification dataset is listed below.

# perceptron default hyperparameters for binary classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define model
model = Perceptron()
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

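Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 78.6%.

We would hope to achieve better performance than this with optimized hyperparameters.
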
Mean Accuracy: 0.786 (0.069)

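Next, we can optimize the hyperparameters of the Perceptron model using a stochastic hill climbing algorithm.

There are many hyperparameters that we could optimize, although we will focus on the two that perhaps have the most impact on the learning behavior of the model. They are:

Learning rate (eta0).
Regularization (alpha).

The learning rate controls how much the model is updated based on prediction errors, and therefore the speed of learning. The default value of eta0 is 1.0. Reasonable values are larger than zero (e.g. larger than 1e-8 or 1e-10) and probably less than 1.0.

By default, the Perceptron does not use any regularization, but we will enable "elastic net" regularization, which applies both L1 and L2 regularization during learning. This encourages the model to seek small model weights and, in turn, often better performance.

We will tune the "alpha" hyperparameter that controls the weighting of the regularization, i.e. the amount it impacts the learning. If set to 0.0, it is as though no regularization is being used. Reasonable values are between 0.0 and 1.0.

First, we need to define the objective function for the optimization algorithm. We will evaluate a configuration using the mean classification accuracy from repeated stratified k-fold cross-validation, and we will seek to maximize that accuracy.

The objective() function below implements this, taking the dataset and a list of config values. The config values (learning rate and regularization weighting) are unpacked and used to configure the model, which is then evaluated, and the mean accuracy is returned.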

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

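Next, we need a function to take a step in the search space.

The search space is defined by two variables (eta and alpha). A step in the search space must have some relationship to the previous values and must be bound to sensible values (e.g. between 0 and 1).

We will use a "step size" hyperparameter that controls how far the algorithm is allowed to move from the existing configuration. A new configuration is chosen probabilistically using a Gaussian distribution with the current value as the mean of the distribution and the step size as the standard deviation of the distribution.

We can use the randn() NumPy function to generate random numbers with a Gaussian distribution.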
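The following minimal sketch checks the claim that scaling a standard normal draw by the step size and adding the current value gives samples centered on the current value with a standard deviation equal to the step size. It uses random.gauss() from the Python standard library as a stand-in for NumPy's randn(); the values 0.5 and 0.1 are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(1)
current, step_size = 0.5, 0.1

# Scale a standard normal draw by the step size and shift it by the current value.
samples = [current + rng.gauss(0.0, 1.0) * step_size for _ in range(10000)]

sample_mean = statistics.mean(samples)
sample_std = statistics.stdev(samples)
print('mean=%.3f std=%.3f' % (sample_mean, sample_std))
```

The printed mean and standard deviation should land close to 0.5 and 0.1 respectively.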

The step() function below implements this. It takes a step in the search space and generates a new configuration from the existing configuration.

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]
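To see how the bounds checks behave without any randomness, the sketch below uses a hypothetical variant, step_with_noise(), that applies the same update rule but receives the two Gaussian draws as an argument. This variant is an illustration only and is not part of the tutorial's code.

```python
def step_with_noise(cfg, step_size, noise):
    """Same update rule as step(), but with the Gaussian draws passed in."""
    eta, alpha = cfg
    eta_noise, alpha_noise = noise
    new_eta = eta + eta_noise * step_size
    # eta must stay strictly positive
    if new_eta <= 0.0:
        new_eta = 1e-8
    new_alpha = alpha + alpha_noise * step_size
    # alpha may be exactly zero (no regularization)
    if new_alpha < 0.0:
        new_alpha = 0.0
    return [new_eta, new_alpha]

# A negative draw pushes both values below zero, so both are clamped.
clamped = step_with_noise([0.05, 0.05], 0.1, (-1.0, -1.0))
# A positive draw is left alone: this rule has no upper bound.
unclamped = step_with_noise([0.95, 0.95], 0.1, (1.0, 1.0))
print(clamped, unclamped)
```

Because only the lower bounds are enforced, the search is free to move either value above 1.0.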

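Next, we need to implement the stochastic hill climbing algorithm, which will call our objective() function to evaluate candidate solutions and our step() function to take a step in the search space.

The search first generates a random initial solution, in this case with eta and alpha values in the range 0 to 1. The initial solution is then evaluated and taken as the current best working solution.
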
...
# starting point for the search
solution = [rand(), rand()]
# evaluate the initial point
solution_eval = objective(X, y, solution)

Next, the algorithm iterates for a fixed number of iterations, provided as a hyperparameter to the search. Each iteration involves taking a step and evaluating the new candidate solution.

...
# take a step
candidate = step(solution, step_size)
# evaluate candidate point
candidate_eval = objective(X, y, candidate)

If the new solution is at least as good as the current working solution, it is taken as the new current working solution.

...
# check if we should keep the new point
if candidate_eval >= solution_eval:
    # store the new point
    solution, solution_eval = candidate, candidate_eval
    # report progress
    print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))

At the end of the search, the best solution and its performance are returned.

Tying this together, the hillclimbing() function below implements the stochastic hill climbing algorithm for tuning the Perceptron algorithm, taking the dataset, objective function, number of iterations, and step size as arguments.

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]
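Because each call to objective() runs a full cross-validation, it can help to check the search logic on something cheap first. The sketch below applies the same accept-if-not-worse loop to a made-up two-variable score that peaks at a known point. The names toy_objective(), toy_step(), and toy_hillclimbing(), the target point, and the upper clamp at 1.0 are all assumptions for this illustration; they are not part of the tutorial.

```python
import random

def toy_objective(cfg):
    """Hypothetical score that peaks at 1.0 when cfg == [0.3, 0.7]."""
    x, y = cfg
    return 1.0 - ((x - 0.3) ** 2 + (y - 0.7) ** 2)

def toy_step(cfg, step_size, rng):
    """Gaussian perturbation of each value, kept inside [0, 1]."""
    return [min(1.0, max(0.0, v + rng.gauss(0.0, 1.0) * step_size)) for v in cfg]

def toy_hillclimbing(n_iter, step_size, seed=1):
    rng = random.Random(seed)
    # random starting point in [0, 1) for both variables
    solution = [rng.random(), rng.random()]
    solution_eval = toy_objective(solution)
    history = [solution_eval]
    for _ in range(n_iter):
        candidate = toy_step(solution, step_size, rng)
        candidate_eval = toy_objective(candidate)
        # Accept ties as well as improvements, so the search can cross plateaus.
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
        history.append(solution_eval)
    return solution, solution_eval, history

best, best_eval, history = toy_hillclimbing(n_iter=200, step_size=0.1)
print('best=%s score=%.5f' % (best, best_eval))
```

The recorded history never decreases, which is the defining property of hill climbing: a candidate replaces the working solution only when it scores at least as well.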

We can then call the algorithm and report the results of the search.

In this case, we will run the algorithm for 100 iterations and use a step size of 0.1, chosen after a little trial and error.

...
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
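Before running the search, it is worth estimating its cost. The short calculation below counts the model fits implied by these settings, assuming one fit per cross-validation fold and one objective evaluation for the initial point plus one per iteration.

```python
n_splits, n_repeats = 10, 3      # settings used inside objective()
n_iter = 100                     # hill climbing iterations

fits_per_evaluation = n_splits * n_repeats   # one model fit per held-out fold
evaluations = 1 + n_iter                     # initial point plus one candidate per iteration
total_fits = fits_per_evaluation * evaluations
print(fits_per_evaluation, evaluations, total_fits)   # 30 101 3030
```

A Perceptron fits quickly on 1,000 rows, so about three thousand fits is manageable, but the same count matters much more for slower models.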

Tying this together, the complete example of manually tuning the Perceptron algorithm is listed below.

# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))

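Running the example reports the configuration and result each time an improvement is seen during the search. At the end of the run, the best configuration and result are reported.

In this case, we can see that the best result used a learning rate slightly above 1, at 1.004, and a regularization weight of about 0.002. It achieved a mean accuracy of about 79.1%, better than the default configuration, which achieved an accuracy of about 78.6%.
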
>0, cfg=[0.5827274503894747, 0.260872709578015] 0.70533
>4, cfg=[0.5449820307807399, 0.3017271170801444] 0.70567
>6, cfg=[0.6286475606495414, 0.17499090243915086] 0.71933
>7, cfg=[0.5956196828965779, 0.0] 0.78633
>8, cfg=[0.5878361167354715, 0.0] 0.78633
>10, cfg=[0.6353507984485595, 0.0] 0.78633
>13, cfg=[0.5690530537610675, 0.0] 0.78633
>17, cfg=[0.6650936023999641, 0.0] 0.78633
>22, cfg=[0.9070451625704087, 0.0] 0.78633
>23, cfg=[0.9253366187387938, 0.0] 0.78633
>26, cfg=[0.9966143540220266, 0.0] 0.78633
>31, cfg=[1.0048613895650054, 0.002162219228449132] 0.79133
Done!
cfg=[1.0048613895650054, 0.002162219228449132]: Mean Accuracy: 0.791333
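One rough way to put this gain in context, offered as a quick sanity check rather than a statistical test, is to compare it with the spread of the baseline's cross-validation scores. The figures below are copied from the outputs reported in this section.

```python
baseline_mean, baseline_std = 0.786, 0.069   # default Perceptron, reported earlier
tuned_mean = 0.791333                         # best configuration from the run above

gain = tuned_mean - baseline_mean
gain_in_stds = gain / baseline_std
print('gain=%.3f (%.2f standard deviations)' % (gain, gain_in_stds))
```

The gain is well under a tenth of the fold-to-fold standard deviation, so it may be worth repeating the search before treating the tuned configuration as clearly better.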

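Now that we are familiar with using a stochastic hill climbing algorithm to tune the hyperparameters of a simple machine learning algorithm, let's look at tuning a more advanced algorithm, such as XGBoost.

XGBoost Hyperparameter Optimization

XGBoost is short for Extreme Gradient Boosting and is an efficient implementation of the stochastic gradient boosting machine learning algorithm.

The stochastic gradient boosting algorithm, also called gradient boosting machines or tree boosting, is a powerful machine learning technique that performs well, or even best, on a wide range of challenging machine learning problems.

First, the XGBoost library must be installed.

You can install it using pip, as follows:
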
sudo pip install xgboost

Once installed, you can confirm that the installation succeeded and that you are using a recent version by running the following code:

# xgboost
import xgboost
print("xgboost", xgboost.__version__)

Running the code, you should see the following version number or higher.

xgboost 1.0.1

Although the XGBoost library has its own Python API, we can use XGBoost models with the scikit-learn API via the XGBClassifier wrapper class.

An instance of the model can be instantiated and used just like any other scikit-learn class for model evaluation. For example:

...
# define model
model = XGBClassifier()

Before we tune the hyperparameters of XGBoost, we can establish a baseline in performance using the default hyperparameters.

We will use the same synthetic binary classification dataset from the previous section and the same test harness of repeated stratified k-fold cross-validation.

The complete example of evaluating the performance of XGBoost using default hyperparameters is listed below.

# xgboost with default hyperparameters for binary classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define model
model = XGBClassifier()
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

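Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.

In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 84.9%.

We would hope to achieve better performance than this with optimized hyperparameters.
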
Mean Accuracy: 0.849 (0.040)

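Next, we can adapt the stochastic hill climbing optimization algorithm to tune the hyperparameters of the XGBoost model.

There are many hyperparameters that we may want to optimize for the XGBoost model.

For an overview of how to tune the XGBoost model, see the tutorial:

How to Configure the Gradient Boosting Algorithm

We will focus on four key hyperparameters:

Learning Rate (learning_rate)
Number of Trees (n_estimators)
Subsample Percentage (subsample)
Tree Depth (max_depth)

The learning rate controls the contribution of each tree to the ensemble. Sensible values are less than 1.0 and slightly above 0.0 (e.g. 1e-8).

The number of trees controls the size of the ensemble, and often more trees is better, up to a point of diminishing returns. Sensible values are between 1 tree and hundreds or thousands of trees.

The subsample percentage defines the size of the random sample used to train each tree, expressed as a fraction of the size of the original dataset. Values range from slightly above 0.0 (e.g. 1e-8) to 1.0.

The tree depth is the number of levels in each tree. Deeper trees are more specific to the training dataset and may overfit. Shorter trees often generalize better. Sensible values are between 1 and 10 or 20.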
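The ranges described above can be written down as a small lookup table, which makes it easy to check whether a candidate configuration is sensible. In the sketch below, the BOUNDS table and the in_bounds() helper are hypothetical; the upper limits of 1,000 trees and a depth of 20 are assumptions picked from the loose ranges in the text, not limits imposed by XGBoost.

```python
# Hypothetical bounds taken from the ranges described above; not XGBoost limits.
BOUNDS = {
    'learning_rate': (1e-8, 1.0),
    'n_estimators': (1, 1000),
    'subsample': (1e-8, 1.0),
    'max_depth': (1, 20),
}

def in_bounds(cfg):
    """Check a [learning_rate, n_estimators, subsample, max_depth] list against BOUNDS."""
    return all(low <= value <= high for value, (low, high) in zip(cfg, BOUNDS.values()))

print(in_bounds([0.1, 100, 1.0, 7]))    # a typical starting configuration
print(in_bounds([0.1, 100, 1.0, 53]))   # a depth beyond the assumed upper limit
```

A check like this can be used to inspect configurations produced by a search, or to reject candidates before spending time evaluating them.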

First, we must update the objective() function to unpack the hyperparameters of the XGBoost model, configure it, and then evaluate the mean classification accuracy.

# objective function
def objective(X, y, cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # define model
    model = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

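Next, we need to define the step() function used to take a step in the search space.

Each hyperparameter has quite a different range, so we will define the step size (the standard deviation of the distribution) separately for each hyperparameter. To keep things simple, we will also define the step sizes inline rather than as arguments to the function.

The number of trees and the depth are integers, so the stepped values are rounded.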
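A quick check of how Python's built-in round() behaves is useful here, since the rounded values are passed to the model as integer hyperparameters. The numbers below are arbitrary examples.

```python
# round() on a float with no second argument returns an int.
n_tree = round(100 + 0.8 * 50)
print(n_tree, type(n_tree).__name__)

# Exact halves go to the nearest even integer, not always up.
print(round(118.5), round(119.5))
```

The half-to-even behavior rarely matters for a random search, but it explains why a stepped value of 118.5 becomes 118 rather than 119.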

The step sizes are arbitrary and were chosen after a little trial and error.

The updated step() function is listed below.

# take a step in the search space
def step(cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # learning rate
    lrate = lrate + randn() * 0.01
    if lrate <= 0.0:
        lrate = 1e-8
    if lrate > 1:
        lrate = 1.0
    # number of trees
    n_tree = round(n_tree + randn() * 50)
    if n_tree <= 0.0:
        n_tree = 1
    # subsample percentage
    subsam = subsam + randn() * 0.1
    if subsam <= 0.0:
        subsam = 1e-8
    if subsam > 1:
        subsam = 1.0
    # max tree depth
    depth = round(depth + randn() * 7)
    if depth <= 1:
        depth = 1
    # return new config
    return [lrate, n_tree, subsam, depth]

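Finally, the hillclimbing() algorithm must be updated to define an initial solution with appropriate values.

In this case, we will define the initial solution with sensible defaults that match, or are close to, the default hyperparameters.
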
...
# starting point for the search
solution = step([0.1, 100, 1.0, 7])

Tying this together, the complete example of manually tuning the hyperparameters of the XGBoost algorithm using a stochastic hill climbing algorithm is listed below.

# xgboost manual hyperparameter optimization for binary classification
from numpy import mean
from numpy.random import randn
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier

# objective function
def objective(X, y, cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # define model
    model = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # learning rate
    lrate = lrate + randn() * 0.01
    if lrate <= 0.0:
        lrate = 1e-8
    if lrate > 1:
        lrate = 1.0
    # number of trees
    n_tree = round(n_tree + randn() * 50)
    if n_tree <= 0.0:
        n_tree = 1
    # subsample percentage
    subsam = subsam + randn() * 0.1
    if subsam <= 0.0:
        subsam = 1e-8
    if subsam > 1:
        subsam = 1.0
    # max tree depth
    depth = round(depth + randn() * 7)
    if depth <= 1:
        depth = 1
    # return new config
    return [lrate, n_tree, subsam, depth]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter):
    # starting point for the search
    solution = step([0.1, 100, 1.0, 7])
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=[%s] %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 200
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter)
print('Done!')
print('cfg=[%s]: Mean Accuracy: %f' % (cfg, score))

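Running the example reports the configuration and result each time an improvement is seen during the search. At the end of the run, the best configuration and result are reported.

In this case, we can see that the best result used a learning rate of about 0.02, 52 trees, a subsample rate of about 50%, and a large depth of 53 levels.

This configuration achieved a mean accuracy of about 87.4%, better than the default configuration, which achieved an accuracy of about 84.9%.
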
>0, cfg=[[0.1058242692126418, 67, 0.9228490731610172, 12]] 0.85933
>1, cfg=[[0.11060813799692253, 51, 0.859353656735739, 13]] 0.86100
>4, cfg=[[0.11890247679234153, 58, 0.7135275461723894, 12]] 0.86167
>5, cfg=[[0.10226257987735601, 61, 0.6086462443373852, 17]] 0.86400
>15, cfg=[[0.11176962034280596, 106, 0.5592742266405146, 13]] 0.86500
>19, cfg=[[0.09493587069112454, 153, 0.5049124222437619, 34]] 0.86533
>23, cfg=[[0.08516531024154426, 88, 0.5895201311518876, 31]] 0.86733
>46, cfg=[[0.10092590898175327, 32, 0.5982811365027455, 30]] 0.86867
>75, cfg=[[0.099469211050998, 20, 0.36372573610040404, 32]] 0.86900
>96, cfg=[[0.09021536590375884, 38, 0.4725379807796971, 20]] 0.86900
>100, cfg=[[0.08979482274655906, 65, 0.3697395430835758, 14]] 0.87000
>110, cfg=[[0.06792737273465625, 89, 0.33827505722318224, 17]] 0.87000
>118, cfg=[[0.05544969684589669, 72, 0.2989721608535262, 23]] 0.87200
>122, cfg=[[0.050102976159097, 128, 0.2043203965148931, 24]] 0.87200
>123, cfg=[[0.031493266763680444, 120, 0.2998819062922256, 30]] 0.87333
>128, cfg=[[0.023324201169625292, 84, 0.4017169945431015, 42]] 0.87333
>140, cfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]] 0.87367
Done!
cfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]]: Mean Accuracy: 0.873667
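The final depth of 53 is well beyond the range of 1 to 10 or 20 described earlier, which is possible because the depth is only clamped from below. As one possible refinement, not part of the tutorial's code, the sketch below adds an optional upper limit to the depth update. The step_depth() helper and the limit of 20 are assumptions, and the loop is a plain random walk in which every step is accepted, not a hill climb.

```python
import random

def step_depth(depth, rng, max_depth=None):
    """Gaussian step for tree depth (std 7), clamped below at 1 and optionally above."""
    depth = round(depth + rng.gauss(0.0, 1.0) * 7)
    if depth <= 1:
        depth = 1
    if max_depth is not None and depth > max_depth:
        depth = max_depth
    return depth

rng = random.Random(1)
unbounded, bounded = 7, 7
unbounded_path, bounded_path = [], []
for _ in range(200):
    # every step is kept here, to show how far the walk can drift
    unbounded = step_depth(unbounded, rng)
    bounded = step_depth(bounded, rng, max_depth=20)
    unbounded_path.append(unbounded)
    bounded_path.append(bounded)

print(max(unbounded_path), max(bounded_path))
```

Whether an upper limit is desirable depends on the problem; the search above found its best score with a deep tree, so a cap trades some search freedom for configurations that stay within the expected range.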

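Summary

In this tutorial, you discovered how to manually optimize the hyperparameters of machine learning algorithms.

Specifically, you learned:

Stochastic optimization algorithms can be used instead of grid and random search for hyperparameter optimization.
How to use a stochastic hill climbing algorithm to tune the hyperparameters of the Perceptron algorithm.
How to manually optimize the hyperparameters of the XGBoost gradient boosting algorithm.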
