[AI Regular Study Week 3 Assignment] Checking for Overfitting

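Before You Begin

Great work getting through the Week 3 session!

The Week 3 assignment builds on the code we practiced in class.

Rather than just copying the code, make sure you understand which stage of the machine learning workflow each piece of code belongs to.

First, run the code written in class as-is to review the overall flow. Then continue below the provided code and write your own code to complete the assignment.

If anything is unclear or you have questions, ask in the Discord channel #정규스터디-질문방!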
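As a rough map of the workflow stages mentioned above, here is a minimal sketch that labels each stage in comments. It uses made-up toy data (an exact line, `y = 3x + 2`) rather than the perch dataset, and the names `x`, `y`, and `model` are chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1) Prepare data: toy values invented for this sketch (y = 3x + 2 exactly).
x = np.arange(20, dtype=float)
y = 3.0 * x + 2.0

# 2) Split into training and test sets (same call style as the class code).
train_input, test_input, train_target, test_target = train_test_split(
    x, y, random_state=42)

# 3) Reshape: scikit-learn expects 2-D inputs of shape (n_samples, n_features).
train_input = train_input.reshape(-1, 1)
test_input = test_input.reshape(-1, 1)

# 4) Train the model on the training set only.
model = LinearRegression()
model.fit(train_input, train_target)

# 5) Evaluate: compare the R^2 score on both sets.
train_score = model.score(train_input, train_target)
test_score = model.score(test_input, test_target)
print(train_input.shape, test_input.shape)
print(train_score, test_score)
```

When you reread the class code, try to tag each cell with one of these five stages.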

Week 3 Study Video and Lecture Notes

Lecture recording

YouTube: [GDG Hongik] AI Regular Study Week 3 (2026-1)

Lecture notes

3주차 정규스터디.pdf

79.5 MiB

Assignment

Goals

• Get hands-on practice with the material covered in class.

• Regularize an overfitted model.

Files to Submit

AI_정규_3주차.ipynb & wil.md

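• AI_정규_3주차.ipynb - code (in-class practice + assignment)

• wil.md - what you learned and your impressions (at least 300 characters)

Assignment Spec - AI_정규_3주차.ipynb

Write both the in-class practice code and the assignment in a single file, AI_정규_3주차.ipynb, and submit it.

In-Class Practice Code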

import numpy as np

perch_length = np.array(
    [8.4, 13.7, 15.0, 16.2, 17.4, 18.0, 18.7, 19.0, 19.6, 20.0,
     21.0, 21.0, 21.0, 21.3, 22.0, 22.0, 22.0, 22.0, 22.0, 22.5,
     22.5, 22.7, 23.0, 23.5, 24.0, 24.0, 24.6, 25.0, 25.6, 26.5,
     27.3, 27.5, 27.5, 27.5, 28.0, 28.7, 30.0, 32.8, 34.5, 35.0,
     36.5, 36.0, 37.0, 37.0, 39.0, 39.0, 39.0, 40.0, 40.0, 40.0,
     40.0, 42.0, 43.0, 43.0, 43.5, 44.0]
     )
perch_weight = np.array(
    [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
     110.0, 115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0,
     130.0, 150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0,
     197.0, 218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0,
     514.0, 556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0,
     820.0, 850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0,
     1000.0, 1000.0]
     )

import matplotlib.pyplot as plt

plt.scatter(perch_length, perch_weight)
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(
    perch_length, perch_weight, random_state=42)

print(train_input.shape, test_input.shape)

test_array = np.array([1,2,3,4])
print(test_array.shape)

test_array = test_array.reshape(2, 2)
print(test_array.shape)

train_input = train_input.reshape(-1, 1)
test_input = test_input.reshape(-1, 1)
print(train_input.shape, test_input.shape)

from sklearn.neighbors import KNeighborsRegressor

knr = KNeighborsRegressor()
knr.fit(train_input, train_target)

knr.score(test_input, test_target)

from sklearn.metrics import mean_absolute_error

test_prediction = knr.predict(test_input)
mae = mean_absolute_error(test_target, test_prediction)
print(mae)

print(knr.score(train_input, train_target))

knr.n_neighbors = 3

knr.fit(train_input, train_target)
print(knr.score(train_input, train_target))

print(knr.score(test_input, test_target))

import numpy as np

perch_length = np.array(
    [8.4, 13.7, 15.0, 16.2, 17.4, 18.0, 18.7, 19.0, 19.6, 20.0,
     21.0, 21.0, 21.0, 21.3, 22.0, 22.0, 22.0, 22.0, 22.0, 22.5,
     22.5, 22.7, 23.0, 23.5, 24.0, 24.0, 24.6, 25.0, 25.6, 26.5,
     27.3, 27.5, 27.5, 27.5, 28.0, 28.7, 30.0, 32.8, 34.5, 35.0,
     36.5, 36.0, 37.0, 37.0, 39.0, 39.0, 39.0, 40.0, 40.0, 40.0,
     40.0, 42.0, 43.0, 43.0, 43.5, 44.0]
     )
perch_weight = np.array(
    [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
     110.0, 115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0,
     130.0, 150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0,
     197.0, 218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0,
     514.0, 556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0,
     820.0, 850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0,
     1000.0, 1000.0]
     )

from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(
    perch_length, perch_weight, random_state=42)

train_input = train_input.reshape(-1, 1)
test_input = test_input.reshape(-1, 1)

from sklearn.neighbors import KNeighborsRegressor

knr = KNeighborsRegressor(n_neighbors=3)
knr.fit(train_input, train_target)

print(knr.predict([[50]]))

import matplotlib.pyplot as plt

distances, indexes = knr.kneighbors([[50]])

plt.scatter(train_input, train_target)
plt.scatter(train_input[indexes], train_target[indexes], marker='D')
plt.scatter(50, 1033, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

print(np.mean(train_target[indexes]))

print(knr.predict([[100]]))

# Find the nearest neighbors of a 100 cm perch
distances, indexes = knr.kneighbors([[100]])

plt.scatter(train_input, train_target)
plt.scatter(train_input[indexes], train_target[indexes], marker='D')
plt.scatter(100, 1033, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

Linear Regression

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(train_input, train_target)

print(lr.predict([[50]]))

print(lr.coef_, lr.intercept_)

plt.scatter(train_input, train_target)
plt.plot([15, 50], [15*lr.coef_+lr.intercept_, 50*lr.coef_+lr.intercept_])
plt.scatter(50, 1241.8, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

print(lr.score(train_input, train_target))
print(lr.score(test_input, test_target))

Polynomial Regression

train_poly = np.column_stack((train_input ** 2, train_input))
test_poly = np.column_stack((test_input ** 2, test_input))

print(train_poly.shape, test_poly.shape)

lr = LinearRegression()
lr.fit(train_poly, train_target)

print(lr.predict([[50**2, 50]]))

print(lr.coef_, lr.intercept_)

point = np.arange(15, 50)
plt.scatter(train_input, train_target)
plt.plot(point, 1.01*point**2 - 21.6*point + 116.05)
plt.scatter([50], [1574], marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

print(lr.score(train_poly, train_target))
print(lr.score(test_poly, test_target))

Multiple Regression

import pandas as pd

df = pd.read_csv('https://bit.ly/perch_csv_data')
perch_full = df.to_numpy()
print(perch_full)

import numpy as np

perch_weight = np.array(
    [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
     110.0, 115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0,
     130.0, 150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0,
     197.0, 218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0,
     514.0, 556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0,
     820.0, 850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0,
     1000.0, 1000.0]
     )

from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(perch_full, perch_weight, random_state=42)

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures()
poly.fit([[2, 3]])
print(poly.transform([[2, 3]]))

poly = PolynomialFeatures(include_bias=False)
poly.fit([[2, 3]])
print(poly.transform([[2, 3]]))

poly = PolynomialFeatures(include_bias=False)

poly.fit(train_input)
train_poly = poly.transform(train_input)

print(train_poly.shape)

poly.get_feature_names_out()

test_poly = poly.transform(test_input)

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(train_poly, train_target)
print(lr.score(train_poly, train_target))

print(lr.score(test_poly, test_target))

poly = PolynomialFeatures(degree=5, include_bias=False)

poly.fit(train_input)
train_poly = poly.transform(train_input)
test_poly = poly.transform(test_input)

print(train_poly.shape)

lr.fit(train_poly, train_target)
print(lr.score(train_poly, train_target))
print(lr.score(test_poly, test_target))

Regularization

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(train_poly)

train_scaled = ss.transform(train_poly)
test_scaled = ss.transform(test_poly)

from sklearn.linear_model import Ridge

ridge = Ridge()
ridge.fit(train_scaled, train_target)
print(ridge.score(train_scaled, train_target))
print(ridge.score(test_scaled, test_target))

from sklearn.linear_model import Lasso

lasso = Lasso()
lasso.fit(train_scaled, train_target)
print(lasso.score(train_scaled, train_target))
print(lasso.score(test_scaled, test_target))

Everything up to this point is the code practiced in class.

Everything below is the assignment, so add new cells and write the code yourself.

Assignment

# Set degree=15 and train a multiple regression model.
# Then print the train/test scores to check whether the model overfits.
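To make the overfitting check above concrete, here is a minimal sketch on invented toy data, not the perch dataset and not the assignment solution. It assumes `degree=5` on two made-up features, which already yields more polynomial features than training samples; the assignment itself asks for `degree=15` on the class data. The names `rng`, `X`, `y`, and `gap` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Toy data invented for this sketch: 2 features, 20 samples, noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(20, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.3, size=20)

train_input, test_input, train_target, test_target = train_test_split(
    X, y, random_state=42)

# degree=5 on 2 features gives 20 polynomial features, more than the
# 15 training samples, so the model can fit the training set almost exactly.
poly = PolynomialFeatures(degree=5, include_bias=False)
train_poly = poly.fit_transform(train_input)
test_poly = poly.transform(test_input)

lr = LinearRegression()
lr.fit(train_poly, train_target)

train_score = lr.score(train_poly, train_target)
test_score = lr.score(test_poly, test_target)
print(train_poly.shape)
print(train_score, test_score)

# A high train score paired with a much lower test score signals overfitting.
gap = train_score - test_score
print(gap)
```

The same pattern (transform, fit, then compare the two scores) carries over to the class data once the degree is raised.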

# Scale the data with StandardScaler, then apply Ridge or Lasso regularization.
# Then print the train/test scores after regularization to check whether overfitting is reduced.
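The scale-then-regularize step above can be sketched as follows, again on invented toy data rather than the perch dataset, so this is an illustration and not the assignment solution. It assumes Ridge with `alpha=1.0` (scikit-learn's default); `Lasso` from the class code can be swapped in the same way. The names `rng`, `X`, `y`, `lr_norm`, and `ridge_norm` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data invented for this sketch (not the perch dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(20, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.3, size=20)

train_input, test_input, train_target, test_target = train_test_split(
    X, y, random_state=42)

poly = PolynomialFeatures(degree=5, include_bias=False)
train_poly = poly.fit_transform(train_input)
test_poly = poly.transform(test_input)

# Fit the scaler on the training set only, then reuse it for the test set.
ss = StandardScaler()
train_scaled = ss.fit_transform(train_poly)
test_scaled = ss.transform(test_poly)

# Baseline without regularization, on the same scaled features.
lr = LinearRegression().fit(train_scaled, train_target)

# Ridge adds an L2 penalty on the coefficients.
ridge = Ridge(alpha=1.0).fit(train_scaled, train_target)

for name, model in [("no regularization", lr), ("ridge", ridge)]:
    print(name,
          model.score(train_scaled, train_target),
          model.score(test_scaled, test_target))

# The penalty shrinks the coefficients, which is how it restrains overfitting.
lr_norm = np.linalg.norm(lr.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(lr_norm, ridge_norm)
```

Scaling comes first because the penalty treats all coefficients alike, so features on very different scales would otherwise be penalized unevenly. Whether the test score actually improves depends on the data and on `alpha`, so it is worth printing both scores each time.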

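Assignment Spec - wil.md

• Write what you learned or how you felt about the Week 3 session.

Deadline

By 23:59 on Thursday, April 9

How to Submit

Create a weekn folder in your own repository and submit the .ipynb file and wil.md there. For more details, see the link below.

WOW CLASS Guidelines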