AdaBoost Binary Classification Algorithm


달리는치타 2021. 9. 18. 21:58

These are my notes on a lecture by Professor Pilsung Kang of the DSBA Lab, School of Industrial and Management Engineering, Korea University.

AdaBoosting

. A weak model can be boosted into an arbitrarily accurate strong model.

. New classifiers should focus on difficult cases.

Here, a weak model is one that performs only slightly better than random guessing.

Repeat T times, where T is the ensemble size:

Get some rule of thumb (= weak model)

Reweight the examples in the training set so that the next rule concentrates on the hard cases the previous rule got wrong

Derive the next rule of thumb

=> Build a single, accurate predictor by combining the rules of thumb

The following pseudocode applies AdaBoosting to binary classification of S, a dataset of 2D coordinates.

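input : T - ensemble size (number of iterations)
input : S - dataset of 2D coordinates
input : D1(i) - probability that the i-th data point is selected in the first iteration / initialized to a uniform distribution

for t = 1 to T do
    The goal of each iteration is to use the distribution Dt to build ht, the t-th rule of thumb.
    epsilon_t is the misclassification rate of ht. If it exceeds 0.5
    (i.e., ht performs no better than random guessing on a binary problem), the iteration is skipped.

    alpha_t, the weight ht receives in the final model, is determined from epsilon_t (the weak model's misclassification rate).
    D_(t+1) is updated according to the update rule.
end for
The final model H(x), which combines the weak models, is determined by each weak model ht weighted by its alpha_t.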
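The pseudocode can be sketched in plain Python as follows. This is only a minimal illustration under my own assumptions, not the lecture's implementation: the weak model is assumed to be a decision stump (a threshold on one coordinate), the final model is assumed to be the standard H(x) = sign(sum of alpha_t * ht(x)), alpha_t uses the standard formula 0.5 * ln((1 - epsilon_t) / epsilon_t), and the names `stump_predict`, `best_stump`, `adaboost`, and `predict` as well as the toy dataset are made up for this example.

```python
import math

def stump_predict(stump, x):
    """Predict +1 or -1 for a 2D point x using a (feature, threshold, polarity) stump."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def best_stump(S, y, D):
    """Return the stump with the lowest D-weighted error, together with that error."""
    best, best_err = None, float("inf")
    for feature in (0, 1):
        values = sorted({x[feature] for x in S})
        # Candidate thresholds: one below the minimum, plus midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(d for x, label, d in zip(S, y, D)
                          if stump_predict(stump, x) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(S, y, T):
    """Train up to T weighted stumps; labels in y must be +1 or -1."""
    n = len(S)
    D = [1.0 / n] * n                      # D1: uniform distribution
    ensemble = []                          # list of (alpha_t, h_t)
    for _ in range(T):
        h, eps = best_stump(S, y, D)
        if eps > 0.5:                      # worse than random guessing: skip this round
            continue
        eps = max(eps, 1e-10)              # assumption: clamp to avoid division by zero
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        ensemble.append((alpha, h))
        # Lower the weight of correctly classified points, raise it for misclassified ones.
        D = [d * math.exp(-alpha * label * stump_predict(h, x))
             for x, label, d in zip(S, y, D)]
        Z = sum(D)
        D = [d / Z for d in D]             # renormalize so D sums to 1
    return ensemble

def predict(ensemble, x):
    """Final model H(x): sign of the alpha-weighted vote of the weak models."""
    score = sum(alpha * stump_predict(h, x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump separates the +, -, + pattern along the diagonal.
S = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0), (6.0, 6.0)]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(S, y, T=3)
print([predict(ensemble, x) for x in S])   # prints [1, 1, -1, -1, 1, 1]
```

On this toy dataset every individual stump misclassifies at least two points, but the weighted vote of three stumps reproduces all six labels.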

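Looking at the formula that maps the misclassification rate to this weight, the closer a weak model is to random guessing (the closer its misclassification rate is to 0.5), the closer alpha_t is to 0. Conversely, the closer the misclassification rate is to 0 (the more accurately the model classifies), the more heavily the model is weighted.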
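To make this relationship concrete, here is a small sketch that evaluates the weight for a few misclassification rates. It assumes the standard AdaBoost formula alpha_t = 0.5 * ln((1 - epsilon_t) / epsilon_t), which is not written out in these notes; the function name `alpha_from_error` is hypothetical.

```python
import math

def alpha_from_error(eps):
    """Weight of a weak model whose misclassification rate is eps (0 < eps < 1)."""
    return 0.5 * math.log((1.0 - eps) / eps)

# alpha is 0 at eps = 0.5 and grows as eps approaches 0.
for eps in (0.5, 0.4, 0.25, 0.1, 0.01):
    print(f"eps={eps:<5} alpha={alpha_from_error(eps):.4f}")
```

Under this formula a model with epsilon_t = 0.5 contributes nothing to the vote, while alpha_t grows without bound as epsilon_t approaches 0.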

We can see that D, the data selection probability distribution, is updated so that

. the selection probability of correctly classified data decreases

. the selection probability of misclassified data increases
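The update formula itself is not written out in these notes. Assuming the lecture uses the standard AdaBoost update, it would read as follows, where the normalizer Z_t is a symbol I am introducing here so that D_(t+1) sums to 1:

```latex
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t},
\qquad
Z_t = \sum_{j} D_t(j)\,\exp\!\left(-\alpha_t\, y_j\, h_t(x_j)\right)
```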

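In the numerator of the update formula, the product of y_i and h_t(x_i) is always 1 or -1, since both the label and the prediction take values in {-1, +1}.

Here 1 corresponds to a correct classification and -1 to a misclassification.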
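A small worked example may help here. The sketch below applies one update step to four equally weighted points, one of which is misclassified. It assumes the standard exponential update D_(t+1)(i) proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) and the standard alpha_t formula; the function name `update_distribution` and the numbers are made up for illustration.

```python
import math

def update_distribution(D, y, preds, alpha):
    """One reweighting step: scale each weight by exp(-alpha * y_i * h_t(x_i)), then normalize."""
    unnormalized = [d * math.exp(-alpha * yi * pi) for d, yi, pi in zip(D, y, preds)]
    Z = sum(unnormalized)
    return [w / Z for w in unnormalized]

D = [0.25, 0.25, 0.25, 0.25]     # uniform starting distribution
y = [1, 1, -1, -1]               # true labels
preds = [1, 1, -1, 1]            # weak model output: only the last point is wrong

eps = sum(d for d, yi, pi in zip(D, y, preds) if yi != pi)   # weighted error = 0.25
alpha = 0.5 * math.log((1 - eps) / eps)
D_next = update_distribution(D, y, preds, alpha)
print([round(d, 4) for d in D_next])   # prints [0.1667, 0.1667, 0.1667, 0.5]
```

The three correctly classified points drop from 0.25 to about 0.1667 each, while the single misclassified point rises from 0.25 to 0.5.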
