How to Manually Optimize Neural Network Models

_December 22, 2022_ NEPIRITY

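Deep learning neural network models are fit on training data using the stochastic gradient descent optimization algorithm.

Updates to the weights of the model are made using the backpropagation of error algorithm. This combination of optimization algorithm and weight update algorithm was carefully chosen and is generally regarded as the most efficient known approach for fitting neural networks.

Nevertheless, it is possible to use alternate optimization algorithms to fit a neural network model to a training dataset. This can be a useful exercise for learning more about how neural networks function and about the central role of optimization in applied machine learning. It may also be required for neural networks with unconventional model architectures or non-differentiable transfer functions.

In this tutorial, you will discover how to manually optimize the weights of neural network models.

After completing this tutorial, you will know:

How to develop the forward inference pass for neural network models from scratch.
How to optimize the weights of a Perceptron model for binary classification.
How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.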

Tutorial Overview

This tutorial is divided into three parts; they are:

Optimize Neural Networks
Optimize a Perceptron Model
Optimize a Multilayer Perceptron

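Optimize Neural Networks

Deep learning, or neural networks, is a flexible type of machine learning.

Neural networks are models composed of nodes and layers, inspired by the structure and function of the brain. A neural network model works by propagating a given input vector through one or more layers to produce a numeric output that can be interpreted for classification or regression predictive modeling.

Models are trained by repeatedly exposing the model to examples of input and output and adjusting the weights to minimize the error of the model's output compared to the expected output. This is called the stochastic gradient descent optimization algorithm. The weights of the model are adjusted using a specific rule from calculus that assigns error proportionally to each weight in the network. This is called the backpropagation algorithm.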
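To make the idea of a gradient-based weight update concrete, here is a minimal sketch of a single stochastic gradient descent step for one sigmoid neuron. It is not part of the method developed in this tutorial; the function name sgd_step, the choice of cross-entropy loss, and the learning rate are assumptions made for illustration only.

```python
from math import exp

def sigmoid(z):
    # logistic function, maps any real value into (0, 1)
    return 1.0 / (1.0 + exp(-z))

def sgd_step(weights, row, target, learning_rate):
    # hypothetical helper: one gradient step for a single sigmoid neuron
    # forward pass: bias (last weight) plus the weighted inputs
    z = weights[-1]
    for w, x in zip(weights[:-1], row):
        z += w * x
    p = sigmoid(z)
    # for cross-entropy loss, the gradient with respect to z is (p - target)
    error = p - target
    # move each input weight against its share of the error
    new_weights = [w - learning_rate * error * x for w, x in zip(weights[:-1], row)]
    # the bias behaves like a weight on a constant input of 1
    new_weights.append(weights[-1] - learning_rate * error)
    return new_weights

weights = [0.0, 0.0, 0.0]
row, target = [1.0, 2.0], 1
updated = sgd_step(weights, row, target, learning_rate=0.1)
print(updated)
```

The update uses the slope of the error to decide the direction of change for every weight, which is exactly the information that the search-based approach in the rest of this tutorial does without.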

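The stochastic gradient descent optimization algorithm, with weight updates made using backpropagation, is generally the most efficient known way to train neural network models. However, it is not the only way to train a neural network.

It is possible to use any arbitrary optimization algorithm to train a neural network model.

That is, we can define a neural network model architecture and use a given optimization algorithm to find a set of weights for the model that results in the minimum prediction error or the maximum classification accuracy.

Using alternate optimization algorithms is expected to be less efficient on average than using stochastic gradient descent with backpropagation. Nevertheless, it may be more efficient in some specific cases, such as non-standard network architectures or non-differentiable transfer functions.

It can also be an interesting exercise that demonstrates the central role of optimization in training machine learning algorithms, and neural networks in particular.

Next, let's explore how to train a simple one-node neural network, called a Perceptron model, using stochastic hill climbing.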

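Optimize a Perceptron Model

The Perceptron algorithm is the simplest type of artificial neural network.

It is a model of a single neuron that can be used for two-class classification problems, and it provides the foundation for later developing much larger networks.

In this section, we will optimize the weights of a Perceptron neural network model.

First, let's define a synthetic binary classification problem that we can use as the focus of optimizing the model.

We can use the make_classification() function to define a binary classification problem with 1,000 rows and five input variables.

The example below creates the dataset and summarizes the shape of the data.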

# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

Running the example prints the shape of the created dataset, confirming our expectations.

(1000, 5) (1000,)

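Next, we need to define a Perceptron model.

The Perceptron model has a single node with one input weight for each column in the dataset.

Each input is multiplied by its corresponding weight to give a weighted sum, and a bias weight is then added, like an intercept coefficient in a regression model. This weighted sum is called the activation. Finally, the activation is interpreted and used to predict the class label: 1 for a positive activation and 0 for a negative activation.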
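As a small worked example of this calculation, the sketch below computes the activation and class label for one row by hand. The input values, weights, and bias are made up for illustration, and treating an activation of exactly zero as class 1 is an assumption of this sketch.

```python
# made-up values for illustration only
inputs = [2.0, -1.0]
weights = [0.5, 1.5]
bias = 0.25

# weighted sum of the inputs plus the bias: 0.25 + (0.5 * 2.0) + (1.5 * -1.0)
activation = bias + sum(w * x for w, x in zip(weights, inputs))

# interpret the activation as a class label (zero is treated as positive here)
label = 1 if activation >= 0.0 else 0

print(activation, label)
```

The activation works out to 0.25 + 1.0 - 1.5 = -0.25, which is negative, so the predicted class label is 0.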

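Before we optimize the model weights, we must develop the model and our confidence in how it works.

Let's start by defining a function for interpreting the activation of the model.

This is called the activation function, or the transfer function; the latter name is more traditional and is my preference.

The transfer() function below takes the activation of the model and returns a class label: class=1 for a positive or zero activation and class=0 for a negative activation. This is called a step transfer function.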

# transfer function
def transfer(activation):
    # step function: class 1 for a zero or positive activation, otherwise class 0
    if activation >= 0.0:
        return 1
    return 0

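Next, we can develop a function that calculates the activation of the model for a given input row of data from the dataset.

This function takes the row of data and the weights for the model and calculates the weighted sum of the input with the addition of the bias weight. The activate() function below implements this.

Note: We are intentionally using simple Python lists and an imperative programming style, instead of NumPy arrays or list comprehensions, to make the code more readable for Python beginners. Feel free to optimize it and post your code in the comments below.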

# activation function
def activate(row, weights):
    # start with the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation
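For readers who prefer NumPy, the same weighted sum can be written as a dot product. The sketch below is one possible vectorized equivalent, not the version used in the rest of this tutorial; the name activate_vectorized and the sample values are assumptions for illustration.

```python
import numpy as np

def activate(row, weights):
    # loop-based version from this tutorial: bias is the last weight
    activation = weights[-1]
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

def activate_vectorized(row, weights):
    # hypothetical NumPy variant: dot product of inputs and input weights, plus bias
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights[:-1], row) + weights[-1])

# made-up row and weights (three inputs plus one bias weight)
row = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25, 0.1]

print(activate(row, weights), activate_vectorized(row, weights))
```

Both functions should agree up to floating-point rounding; the vectorized form mainly pays off when there are many inputs.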

Next, we can use the activate() and transfer() functions together to generate a prediction for a given row of data. The predict_row() function below implements this.

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

Next, we can call the predict_row() function for each row in a given dataset. The predict_dataset() function below implements this.

Again, we are intentionally using a simple imperative coding style for readability instead of list comprehensions.

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats       

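Finally, we can use the model to make predictions on our synthetic dataset to confirm that it is all working correctly.

We can generate a random set of model weights using the rand() function from numpy.random.

Recall that we need one weight for each input (five inputs in this dataset) plus an extra weight for the bias.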

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)

We can then use these weights with the dataset to make predictions.

...
# generate predictions for dataset
yhat = predict_dataset(X, weights)

We can evaluate the classification accuracy of these predictions.

...
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

We can tie all of this together and demonstrate our simple Perceptron model for classification. The complete example is listed below.

# simple perceptron model for binary classification
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

# activation function
def activate(row, weights):
    # start with the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)
# generate predictions for dataset
yhat = predict_dataset(X, weights)
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

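Running the example generates a prediction for each example in the training dataset, then prints the classification accuracy for the predictions.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.

0.548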

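We can now optimize the weights of the model to achieve good accuracy on this dataset.

First, we need to split the dataset into train and test sets. It is important to hold back some data that is not used in optimizing the model, so that we can prepare a reasonable estimate of the model's performance when it is used to make predictions on new data.

We will use 67 percent of the data for training and the remaining 33 percent as a test set for evaluating the performance of the model.

...
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)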
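To illustrate what a shuffled train/test split does, here is a minimal sketch using only the Python standard library. It is not how train_test_split() is implemented; the helper name split_rows, the seed argument, and the tiny demo dataset are assumptions for illustration.

```python
import random

def split_rows(X, y, test_fraction, seed):
    # hypothetical helper: shuffle the row indices reproducibly
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # the first chunk of shuffled indices becomes the test set
    n_test = round(len(X) * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    # rows and labels are selected with the same indices so they stay paired
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# made-up dataset of 8 rows, where the label is the parity of the first column
X_demo = [[float(i), float(i) * 2.0] for i in range(8)]
y_demo = [i % 2 for i in range(8)]
X_tr, X_te, y_tr, y_te = split_rows(X_demo, y_demo, test_fraction=0.25, seed=1)
print(len(X_tr), len(X_te))
```

With 8 rows and a test fraction of 0.25, this prints 6 2. In scikit-learn, passing a random_state argument to train_test_split() plays a similar role to the seed here and makes the split repeatable between runs.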

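Next, we can develop a stochastic hill climbing algorithm.

The optimization algorithm requires an objective function to optimize. It must take a set of weights and return a score that is to be minimized or maximized, corresponding to a better model.

In this case, we will evaluate the accuracy of the model with a given set of weights and return the classification accuracy, which must be maximized.

The objective() function below implements this: given the dataset and a set of weights, it returns the accuracy of the model.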

# objective function
def objective(X, y, weights):
    # generate predictions for dataset
    yhat = predict_dataset(X, weights)
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score
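Classification accuracy is simply the fraction of predictions that match the expected labels. The sketch below shows that calculation in plain Python; the function name accuracy and the sample labels are assumptions for illustration, and the tutorial itself continues to use accuracy_score() from scikit-learn.

```python
def accuracy(expected, predicted):
    # hypothetical helper: count the positions where the prediction matches the label
    correct = 0
    for e, p in zip(expected, predicted):
        if e == p:
            correct += 1
    # fraction of correct predictions, between 0.0 and 1.0
    return correct / len(expected)

# made-up labels: three of the four predictions are correct
demo_score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(demo_score)
```

Because accuracy only changes when a prediction flips from wrong to right (or the reverse), small changes to the weights often leave the score unchanged, which is one reason the hill climbing search accepts candidates that are equal to the current solution.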

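Next, we can define the stochastic hill climbing algorithm.

The algorithm requires an initial solution (e.g. random weights) and iteratively makes small changes to the solution, checking whether each change results in a better performing model. The amount of change made to the current solution is controlled by a step_size hyperparameter. This process continues for a fixed number of iterations, which is also provided as a hyperparameter.

The hillclimbing() function below implements this, taking the dataset, objective function, initial solution, and hyperparameters as arguments, and returns the best set of weights found and the estimated performance.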

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step by adding scaled Gaussian noise to every weight
        candidate = solution + randn(len(solution)) * step_size
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # keep the new point if it is at least as good as the current one
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %.5f' % (i, solution_eval))
    return [solution, solution_eval]
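The same accept-if-not-worse loop can be seen in isolation on a toy problem. The sketch below is a simplified illustration, not the tutorial's code: it searches over a single number instead of a weight vector, and the objective toy_objective, the function hill_climb_1d, and the seed are all made up for the example.

```python
import random

def toy_objective(x):
    # made-up objective with a single peak at x = 2.0; larger is better
    return -(x - 2.0) ** 2

def hill_climb_1d(objective, start, n_iter, step_size, seed):
    # hypothetical one-dimensional version of the search loop
    rng = random.Random(seed)
    solution, solution_eval = start, objective(start)
    for _ in range(n_iter):
        # propose a nearby point using scaled Gaussian noise
        candidate = solution + rng.gauss(0.0, 1.0) * step_size
        candidate_eval = objective(candidate)
        # keep the candidate only if it is at least as good
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

best_x, best_eval = hill_climb_1d(toy_objective, start=-5.0, n_iter=2000, step_size=0.1, seed=1)
print(best_x, best_eval)
```

Since a candidate is never accepted when it scores worse, the returned score can only match or exceed the score of the starting point. The step size trades off speed against precision: larger steps cover ground faster but make fine adjustments near the peak less likely to be accepted.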

We can then call this function, passing in a set of weights as the initial solution and the training dataset as the dataset to optimize the model against.

...
# define the total iterations
n_iter = 1000
# define the step size (scale of the random perturbation)
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('f(%s) = %f' % (weights, score))

Finally, we can evaluate the best model on the test dataset and report the performance.

...
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

Tying this together, the complete example of optimizing the weights of a Perceptron model on the synthetic binary classification dataset is listed below.

# hill climbing to optimize weights of a perceptron model for classification
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

# activation function
def activate(row, weights):
    # start with the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats

# objective function
def objective(X, y, weights):
    # generate predictions for dataset
    yhat = predict_dataset(X, weights)
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step by adding scaled Gaussian noise to every weight
        candidate = solution + randn(len(solution)) * step_size
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # keep the new point if it is at least as good as the current one
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %.5f' % (i, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the step size (scale of the random perturbation)
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('f(%s) = %f' % (weights, score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

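Running the example reports the iteration number and classification accuracy each time a candidate matches or improves on the current model.

At the end of the search, the performance of the best set of weights on the training dataset is reported, and the performance of the same model on the test dataset is calculated and reported.

In this case, we can see that the optimization algorithm found a set of weights that achieved about 88.5 percent accuracy on the training dataset and about 81.8 percent accuracy on the test dataset.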

…
>111 0.88060
>119 0.88060
>126 0.88209
>134 0.88209
>205 0.88209
>262 0.88209
>280 0.88209
>293 0.88209
>297 0.88209
>336 0.88209
>373 0.88209
>437 0.88358
>463 0.88507
>630 0.88507
>701 0.88507
Done!
f([ 0.0097317 0.13818088 1.17634326 -0.04296336 0.00485813 -0.14767616]) = 0.885075
Test Accuracy: 81.81818

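Now that we are familiar with how to manually optimize the weights of a Perceptron model, let's look at how we can extend the example to optimize the weights of a Multilayer Perceptron (MLP) model.

Optimize a Multilayer Perceptron

A Multilayer Perceptron (MLP) model is a neural network with one or more layers, where each layer has one or more nodes.

It is an extension of the Perceptron model and is perhaps the most widely used neural network (deep learning) model.

In this section, we will build on what we learned in the previous section to optimize the weights of MLP models with an arbitrary number of layers and nodes per layer.

First, we will develop the model and test it with random weights, then use stochastic hill climbing to optimize the model weights.

When using MLPs for binary classification, it is common to use a sigmoid transfer function (also called the logistic function) instead of the step transfer function used in the Perceptron.

This function outputs a real value between 0 and 1 that represents a binomial probability distribution, e.g. the probability that an example belongs to class=1. The transfer() function below implements this.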

# transfer function
from math import exp

def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))
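One practical caveat with this form of the sigmoid is that math.exp() raises an OverflowError for very large arguments, which can happen when the activation is a large negative number. The sketch below shows one common way to guard against this; the name stable_sigmoid is an assumption, and the tutorial's examples keep the simpler version.

```python
from math import exp

def naive_sigmoid(activation):
    # same form as the transfer() function in this tutorial
    return 1.0 / (1.0 + exp(-activation))

def stable_sigmoid(activation):
    # hypothetical variant that only ever exponentiates a non-positive number
    if activation >= 0.0:
        return 1.0 / (1.0 + exp(-activation))
    z = exp(activation)
    return z / (1.0 + z)

# check how the naive version behaves on an extreme activation
try:
    naive_sigmoid(-1000.0)
    naive_failed = False
except OverflowError:
    naive_failed = True

print(naive_failed, stable_sigmoid(-1000.0), stable_sigmoid(1000.0))
```

For moderate activations the two functions return the same values, so this only matters if the search drifts toward very large weights.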

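We can use the same activate() function from the previous section. Here, we will use it to calculate the activation for each node in a given layer.

The predict_row() function must be replaced with a more elaborate version.

The function takes a row of data and the network and returns the output of the network.

We will define our network as a list of lists. Each layer will be a list of nodes, and each node will be a list or array of weights.

To calculate the prediction of the network, we enumerate the layers, then enumerate the nodes in each layer, and calculate the activation and transfer output for each node. In this case, we will use the same transfer function for all nodes in the network, although this does not have to be the case.

For networks with more than one layer, the output from the previous layer is used as input to each node in the next layer. The output from the final layer in the network is then returned.

The predict_row() function below implements this.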

# forward pass (prediction) for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]
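To build confidence in the forward pass, it can help to run it on a tiny network and compare the result against the same calculation written out by hand. The sketch below does this for a network with two hidden nodes and one output node; all weight and input values are made up for illustration.

```python
from math import exp

def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

def activate(row, weights):
    # bias (last weight) plus the weighted inputs
    activation = weights[-1]
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

def predict_row(row, network):
    inputs = row
    for layer in network:
        new_inputs = list()
        for node in layer:
            new_inputs.append(transfer(activate(inputs, node)))
        # the outputs of this layer feed the next layer
        inputs = new_inputs
    return inputs[0]

# made-up network: each node is [weight per input..., bias]
hidden = [[0.1, 0.2, 0.0], [-0.3, 0.4, 0.5]]
output = [[1.0, -1.0, 0.0]]
network = [hidden, output]
row = [1.0, 2.0]

# the same calculation written out node by node
h1 = transfer(0.1 * 1.0 + 0.2 * 2.0 + 0.0)
h2 = transfer(-0.3 * 1.0 + 0.4 * 2.0 + 0.5)
by_hand = transfer(1.0 * h1 - 1.0 * h2 + 0.0)

prediction = predict_row(row, network)
print(prediction, by_hand)
```

The two printed values should agree up to floating-point rounding, and both lie strictly between 0 and 1 because the final node uses the sigmoid.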

Finally, we need to define a network to use.

For example, we can define an MLP with a single hidden layer containing a single node as follows:

...
# create a one node network
node = rand(n_inputs + 1)
layer = [node]
network = [layer]

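This is practically a Perceptron model, although with a sigmoid transfer function.

Let's define an MLP with one hidden layer and one output layer. The first hidden layer will have 10 nodes, and each node will take the input pattern from the dataset (e.g. five inputs). The output layer will have a single node that takes inputs from the outputs of the first hidden layer and then outputs a prediction.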

...
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
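It is worth knowing how many weights this network contains, because that is the number of dimensions the hill climbing search has to explore. The sketch below builds the same structure with plain Python lists (rather than NumPy arrays) and counts the weights; the helper name count_weights is an assumption for illustration.

```python
import random

n_inputs = 5
n_hidden = 10

# same layout as in the tutorial, but with plain lists of random floats
hidden1 = [[random.random() for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
output1 = [[random.random() for _ in range(n_hidden + 1)]]
network = [hidden1, output1]

def count_weights(network):
    # hypothetical helper: total number of weights, including one bias per node
    return sum(len(node) for layer in network for node in layer)

total = count_weights(network)
print(total)
```

The hidden layer contributes 10 * (5 + 1) = 60 weights and the output node contributes 10 + 1 = 11, for a total of 71, compared with 6 for the Perceptron in the previous section.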

We can then use the model to make predictions on the dataset.

...
# generate predictions for dataset
yhat = predict_dataset(X, network)

Before we calculate the classification accuracy, we must round the predictions to class labels 0 and 1.

...
# round the predictions to class labels
yhat = [round(p) for p in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)
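A small detail about rounding: Python's built-in round() rounds exact halves to the nearest even integer, so a predicted probability of exactly 0.5 becomes class 0. If you would rather treat 0.5 as class 1, an explicit threshold makes the rule visible. The sketch below compares the two; the helper name to_label and the sample probabilities are assumptions for illustration.

```python
def to_label(probability, threshold=0.5):
    # hypothetical helper: class 1 when the probability reaches the threshold
    return 1 if probability >= threshold else 0

# made-up predicted probabilities, including an exact tie at 0.5
probabilities = [0.2, 0.5, 0.8]

rounded = [round(p) for p in probabilities]
thresholded = [to_label(p) for p in probabilities]

print(rounded)      # [0, 0, 1]
print(thresholded)  # [0, 1, 1]
```

In practice a sigmoid output of exactly 0.5 is rare with random real-valued weights, so the two rules will almost always give the same accuracy.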

Tying this all together, the complete example of evaluating an MLP with random initial weights on our synthetic binary classification dataset is listed below.

# develop an mlp model for classification
from math import exp
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
    # start with the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# forward pass (prediction) for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# generate predictions for dataset
yhat = predict_dataset(X, network)
# round the predictions to class labels
yhat = [round(p) for p in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

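Running the example generates a prediction for each example in the training dataset, then prints the classification accuracy for the predictions.

Again, we would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.

0.499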

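Next, we can apply the stochastic hill climbing algorithm to the dataset.

It is very much the same as applying hill climbing to the Perceptron model, except that in this case a step requires a modification to all weights in the network.

For this, we will develop a new function that creates a copy of the network and mutates each weight in the network while making the copy.

The step() function below implements this.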

# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate a copy of the node so the original is left unchanged
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net

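Modifying all weights in the network is aggressive.

A less aggressive step in the search space might be to make a small change to a subset of the weights in the model, perhaps controlled by a hyperparameter. This is left as an extension.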
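As one possible starting point for that extension, the sketch below perturbs each weight only with a given probability. This is an illustration rather than the tutorial's method: the function name step_subset, the p_change hyperparameter, and the use of plain Python lists instead of NumPy arrays are all assumptions.

```python
import random

def step_subset(network, step_size, p_change, rng):
    # hypothetical variant of step(): copy the network, changing only some weights
    new_net = list()
    for layer in network:
        new_layer = list()
        for node in layer:
            new_node = list()
            for weight in node:
                # perturb this weight with probability p_change, otherwise copy it
                if rng.random() < p_change:
                    weight = weight + rng.gauss(0.0, 1.0) * step_size
                new_node.append(weight)
            new_layer.append(new_node)
        new_net.append(new_layer)
    return new_net

# made-up network: two hidden nodes and one output node
rng = random.Random(1)
network = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [[0.7, 0.8, 0.9]]]
candidate = step_subset(network, step_size=0.1, p_change=0.3, rng=rng)
unchanged = step_subset(network, step_size=0.1, p_change=0.0, rng=rng)
print(candidate)
```

Setting p_change to 1.0 changes every weight, much like the original step() function, while smaller values change only a few weights per step, which may make it easier for the search to keep what already works.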

We can then call the new step() function from the hillclimbing() function.

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # keep the new point if it is at least as good as the current one
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]

Tying this together, the complete example of applying stochastic hill climbing to optimize the weights of an MLP model for binary classification is listed below.

# stochastic hill climbing to optimize a multilayer perceptron for classification
from math import exp
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
    # start with the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# forward pass (prediction) for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

# objective function
def objective(X, y, network):
    # generate predictions for dataset
    yhat = predict_dataset(X, network)
    # round the predictions to class labels
    yhat = [round(p) for p in yhat]
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate a copy of the node so the original is left unchanged
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # keep the new point if it is at least as good as the current one
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the step size (scale of the random perturbation)
step_size = 0.1
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# perform the hill climbing search
network, score = hillclimbing(X_train, y_train, objective, network, n_iter, step_size)
print('Done!')
print('Best: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, network)
# round the predictions to class labels
yhat = [round(p) for p in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

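Running the example reports the iteration number and classification accuracy each time a candidate matches or improves on the current model.

At the end of the search, the performance of the best set of weights on the training dataset is reported, and the performance of the same model on the test dataset is calculated and reported.

In this case, we can see that the optimization algorithm found a set of weights that achieved about 87.3 percent accuracy on the training dataset and about 85.2 percent accuracy on the test dataset.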

…
>55 0.755224
>56 0.765672
>59 0.794030
>66 0.805970
>77 0.835821
>120 0.838806
>165 0.840299
>188 0.841791
>218 0.846269
>232 0.852239
>237 0.852239
>239 0.855224
>292 0.867164
>368 0.868657
>823 0.868657
>852 0.871642
>889 0.871642
>892 0.871642
>992 0.873134
Done!
Best: 0.873134
Test Accuracy: 85.15152

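Summary

In this tutorial, you discovered how to manually optimize the weights of neural network models.

Specifically, you learned:

How to develop the forward inference pass for neural network models from scratch.
How to optimize the weights of a Perceptron model for binary classification.
How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.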
