
[Machine Learning] Pipeline

byunghyun23 2022. 9. 13. 16:57

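scikit-learn provides Pipeline for working with machine learning models.
A pipeline lets you run the stages of a machine learning workflow, such as data preprocessing and model prediction, in sequence.
Using a pipeline keeps the code concise and makes it easier to read.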
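To make "in sequence" concrete, here is a rough sketch of what a pipeline does when you call fit and predict. MiniPipeline is a hypothetical, heavily simplified stand-in written for this post, not scikit-learn's actual implementation, and the toy data and LinearRegression model are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


class MiniPipeline:
    """Hypothetical, simplified stand-in for sklearn.pipeline.Pipeline."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def fit(self, X, y):
        # Fit and apply every transformer in order, then fit the final estimator.
        for _, transformer in self.steps[:-1]:
            X = transformer.fit_transform(X, y)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already fitted transformers (transform only), then predict.
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


# Toy, noise-free linear data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

mini = MiniPipeline([("scaler", StandardScaler()), ("model", LinearRegression())])
pred = mini.fit(X[:80], y[:80]).predict(X[80:])
print(pred.shape)
```

The point to notice is that the scaler is fitted only inside fit, and predict reuses it with transform alone, which is the same discipline you otherwise have to follow by hand.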

 

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

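# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs scikit-learn < 1.2.
raw_boston = datasets.load_boston()

X = raw_boston.data
y = raw_boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Fit the scaler on the training set only, then apply it to both sets.
std_scaler = StandardScaler()
X_train = std_scaler.fit_transform(X_train)
X_test = std_scaler.transform(X_test)

# No random_state is set, so the score changes slightly from run to run.
rfg = RandomForestRegressor()
rfg.fit(X_train, y_train)

y_pred = rfg.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
print('MAE', mae)  # MAE 2.31131496062992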

Let's rewrite the code above using Pipeline from sklearn.pipeline.

from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

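# load_boston was removed in scikit-learn 1.2; this example needs an older version.
raw_boston = datasets.load_boston()

X = raw_boston.data
y = raw_boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Chain scaling and the model as named steps; they run in the listed order.
rfg_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('RFG', RandomForestRegressor())
])

# fit() fits the scaler on X_train, transforms it, then fits the forest.
rfg_pipeline.fit(X_train, y_train)

# predict() applies the fitted scaler to X_test before predicting.
y_pred = rfg_pipeline.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
# The forest is unseeded, so this MAE need not match an earlier run exactly.
print('MAE', mae)  # MAE 2.354173228346456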
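The two MAE values in this post differ only because RandomForestRegressor is not seeded. As a rough check, the sketch below fixes random_state on both models and compares the manual and Pipeline versions directly. The synthetic data from make_regression, the n_estimators=20 setting, and the seed values are all assumptions made so the example runs on any recent scikit-learn version; it does not use the Boston data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption, standing in for the Boston dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Manual version: scale, then fit and predict with a seeded forest.
scaler = StandardScaler()
manual_model = RandomForestRegressor(n_estimators=20, random_state=0)
manual_model.fit(scaler.fit_transform(X_train), y_train)
manual_pred = manual_model.predict(scaler.transform(X_test))

# Pipeline version with the same steps and the same seed.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("RFG", RandomForestRegressor(n_estimators=20, random_state=0)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# With identical seeds, both approaches should give the same predictions.
same = np.allclose(manual_pred, pipe_pred)
print("identical predictions:", same)

# Fitted steps remain accessible by the names given in the step list.
scaler_means = pipe.named_steps["scaler"].mean_
print(scaler_means.shape)
```

Looking up a step through named_steps is handy when you want to inspect what the scaler or the model learned after fitting the whole pipeline.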