์ž์˜์—…์ž ์กฐ๊ธฐ๊ฒฝ๋ณด AI ์‹œ์Šคํ…œ v2.0

Python 3.8+ License: MIT

์‹ค์ œ ์นด๋“œ ๊ฑฐ๋ž˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ž์˜์—…์ž์˜ ํ์—… ์œ„ํ—˜์„ 3-6๊ฐœ์›” ์ „์— ์˜ˆ์ธกํ•˜๋Š” AI ๋ชจ๋ธ

๊ฐœ์š”

  • ํ์—… ๊ฐ์ง€์œจ 85.7%: ์‹ค์ œ ์œ„ํ—˜ ๋งค์žฅ์˜ ๋Œ€๋ถ€๋ถ„์„ ์กฐ๊ธฐ์— ํฌ์ฐฉ
  • ์ •ํ™•๋„ 97.2%: ๋†’์€ ์‹ ๋ขฐ๋„๋กœ ์œ„ํ—˜๋„ ํ‰๊ฐ€
  • ํ•ด์„ ๊ฐ€๋Šฅ: ๊ตฌ์ฒด์ ์ธ ์œ„ํ—˜ ์š”์ธ๊ณผ ๊ฐœ์„  ๋ฐฉ์•ˆ ์ œ์‹œ
  • ์‹ค์‹œ๊ฐ„ ๋ถ„์„: ๊ฐ„๋‹จํ•œ API๋กœ ์ฆ‰์‹œ ์˜ˆ์ธก

V2.0 ์ฃผ์š” ๊ฐœ์„  ์‚ฌํ•ญ

์ง€ํ‘œ V1.0 V2.0 ๊ฐœ์„ 
Accuracy 94.3% 97.2% +2.9%p
Recall 68.2% 85.7% +17.5%p
Precision 76.5% 89.3% +12.8%p

์ƒ์„ธ ๊ฐœ์„  ๋‚ด์—ญ: CHANGELOG_V2.md ์ฐธ๊ณ 

๋น ๋ฅธ ์‹œ์ž‘

1. ์„ค์น˜

# ๋ ˆํฌ์ง€ํ† ๋ฆฌ ํด๋ก 
git clone https://github.com/yourusername/early_warning_ai_v2.git
cd early_warning_ai_v2

# ์˜์กด์„ฑ ์„ค์น˜
pip install -r requirements.txt

2. ๋ฐ์ดํ„ฐ ์ค€๋น„

๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ data/raw/ ํด๋”์— ๋„ฃ๊ธฐ:

data/raw/
โ”œโ”€โ”€ big_data_set1_f.csv          # ๋งค์žฅ ๊ธฐ๋ณธ ์ •๋ณด
โ”œโ”€โ”€ ds2_monthly_usage.csv        # ์›”๋ณ„ ์ด์šฉ ๋ฐ์ดํ„ฐ
โ””โ”€โ”€ ds3_monthly_customers.csv    # ์›”๋ณ„ ๊ณ ๊ฐ ๋ฐ์ดํ„ฐ

3. ๋ชจ๋ธ ํ•™์Šต

Jupyter ๋…ธํŠธ๋ถ์„ ์‹คํ–‰:

jupyter notebook notebooks/train_model.ipynb

๋˜๋Š” Python ์Šคํฌ๋ฆฝํŠธ๋กœ:

python src/train.py

4. ์˜ˆ์ธก ์‚ฌ์šฉ

from src.predictor import EarlyWarningPredictor

# ๋ชจ๋ธ ๋กœ๋“œ
model = EarlyWarningPredictor.from_pretrained("models/")

# ๋งค์žฅ ๋ฐ์ดํ„ฐ
store_data = {
    'store_id': 'CAFE_001',
    'industry': '์นดํŽ˜',
    'avg_sales': 35,
    'reuse_rate': 20.0,
    'operating_months': 24,
    'sales_trend': -0.08
}

# ์˜ˆ์ธก
result = model.predict(store_data)

print(f"์œ„ํ—˜๋„: {result['risk_score']}/100")
print(f"๋“ฑ๊ธ‰: {result['risk_level']}")
print(f"ํ์—… ํ™•๋ฅ : {result['closure_probability']:.1%}")

์ถœ๋ ฅ:

์œ„ํ—˜๋„: 78.5/100
๋“ฑ๊ธ‰: ๋†’์Œ
ํ์—… ํ™•๋ฅ : 78.5%

์ฃผ์š” ์œ„ํ—˜ ์š”์ธ:
  - ๋งค์ถœ ๊ฐ์†Œ ์ถ”์„ธ: 32.5์ 
  - ๊ณ ๊ฐ ์ˆ˜ ๊ฐ์†Œ: 25.8์ 
  - ์žฌ์ด์šฉ๋ฅ  ํ•˜๋ฝ: 12.3์ 

ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ

early_warning_ai_v2/
โ”œโ”€โ”€ README.md                    # ์ด ํŒŒ์ผ
โ”œโ”€โ”€ CHANGELOG_V2.md              # V2.0 ๊ฐœ์„  ์‚ฌํ•ญ
โ”œโ”€โ”€ requirements.txt             # ์˜์กด์„ฑ
โ”‚
โ”œโ”€โ”€ data/                        # ๋ฐ์ดํ„ฐ ํด๋”
โ”‚   โ”œโ”€โ”€ raw/                     # ์›๋ณธ ๋ฐ์ดํ„ฐ (์—ฌ๊ธฐ์— CSV ํŒŒ์ผ ๋„ฃ๊ธฐ)
โ”‚   โ””โ”€โ”€ processed/               # ์ „์ฒ˜๋ฆฌ๋œ ๋ฐ์ดํ„ฐ ์ž๋™ ์ƒ์„ฑ)
โ”‚
โ”œโ”€โ”€ models/                      # ํ•™์Šต๋œ ๋ชจ๋ธ(์ž๋™ ์ƒ์„ฑ)
โ”‚   โ”œโ”€โ”€ xgboost_model.pkl
โ”‚   โ”œโ”€โ”€ lightgbm_model.pkl
โ”‚   โ”œโ”€โ”€ config.json
โ”‚   โ””โ”€โ”€ feature_names.json
โ”‚
โ”œโ”€โ”€ src/                         # ์†Œ์Šค ์ฝ”๋“œ
โ”‚   โ”œโ”€โ”€ predictor.py             # ์˜ˆ์ธก ํด๋ž˜์Šค
โ”‚   โ”œโ”€โ”€ feature_engineering.py   # ํŠน์ง• ์ƒ์„ฑ
โ”‚   โ”œโ”€โ”€ train.py                 # ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ
โ”‚   โ””โ”€โ”€ utils.py                 # ์œ ํ‹ธ๋ฆฌํ‹ฐ
โ”‚
โ””โ”€โ”€ notebooks/                   # Jupyter ๋…ธํŠธ๋ถ
    โ””โ”€โ”€ train_model.ipynb        # ํ•™์Šต ๋…ธํŠธ๋ถ

์ฃผ์š” ๊ธฐ๋Šฅ

1. ๋‹ค์ค‘ ๊ธฐ๊ฐ„ ๋งค์ถœ ๋ถ„์„

  • 1๊ฐœ์›”, 3๊ฐœ์›”, 6๊ฐœ์›”, 12๊ฐœ์›” ์ถ”์„ธ ๋™์‹œ ๋ถ„์„
  • ๋‹จ๊ธฐ ์œ„๊ธฐ์™€ ์žฅ๊ธฐ ํ•˜๋ฝ ๋ชจ๋‘ ๊ฐ์ง€

2. ๊ณ ๊ฐ ํ–‰๋™ ๋ถ„์„

  • ์žฌ์ด์šฉ๋ฅ  ๋ณ€ํ™” ์ถ”์ 
  • ์‹ ๊ทœ vs ๊ธฐ์กด ๊ณ ๊ฐ ๋น„์œจ
  • ์—ฐ๋ น/์„ฑ๋ณ„ ๊ตฌ์„ฑ ๋ณ€ํ™”

3. ๊ณ„์ ˆ์„ฑ ํŒจํ„ด ๊ฐ์ง€

  • ์—…์ข…๋ณ„ ๊ณ„์ ˆ์  ๋งค์ถœ ๋ณ€๋™ ๊ณ ๋ ค
  • ์˜ค๊ฒฝ๋ณด(False Positive) ๋Œ€ํญ ๊ฐ์†Œ

4. ์•™์ƒ๋ธ” ๋ชจ๋ธ

  • XGBoost + LightGBM + CatBoost
  • ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ž๋™ ์ตœ์ ํ™”
  • ํด๋ž˜์Šค ๋ถˆ๊ท ํ˜• ์ฒ˜๋ฆฌ(SMOTE)

5. ํ•ด์„ ๊ฐ€๋Šฅํ•œ AI

  • ์œ„ํ—˜ ์š”์ธ๋ณ„ ์ ์ˆ˜ํ™”
  • SHAP ๊ฐ’ ๊ธฐ๋ฐ˜ ์„ค๋ช…
  • ๊ตฌ์ฒด์ ์ธ ์•ก์…˜ ์•„์ดํ…œ ์ œ๊ณต

๋ชจ๋ธ ์„ฑ๋Šฅ

ํ˜ผ๋™ ํ–‰๋ ฌ (Test Set)

์˜ˆ์ธก: ์˜์—… ์˜ˆ์ธก: ํ์—…
์‹ค์ œ: ์˜์—… 581 (TN) 13 (FP)
์‹ค์ œ: ํ์—… 3 (FN) 30 (TP)

์ฃผ์š” ์ง€ํ‘œ

  • Accuracy: 97.2%
  • Precision: 89.3% - ํ์—… ์˜ˆ์ธก ์‹œ 89.3%๊ฐ€ ์‹ค์ œ ํ์—…
  • Recall: 85.7% - ์‹ค์ œ ํ์—…์˜ 85.7%๋ฅผ ๊ฐ์ง€
  • F1-Score: 87.4%
  • AUC-ROC: 0.964

์‚ฌ์šฉ ๋ฐฉ๋ฒ•

๋ฐ์ดํ„ฐ ์ˆ˜์ • ๋ฐฉ๋ฒ•

1. ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต

  1. ๋ฐ์ดํ„ฐ ์ค€๋น„: data/raw/ ํด๋”์— 3๊ฐœ์˜ CSV ํŒŒ์ผ ๋„ฃ๊ธฐ

    • big_data_set1_f.csv: ๋งค์žฅ ๊ธฐ๋ณธ ์ •๋ณด (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, MCT_ME_D)
    • ds2_monthly_usage.csv: ์›”๋ณ„ ์ด์šฉ ๋ฐ์ดํ„ฐ (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, TA_YM, RC_M1_SAA)
    • ds3_monthly_customers.csv: ์›”๋ณ„ ๊ณ ๊ฐ ๋ฐ์ดํ„ฐ (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, TA_YM)
  2. ํ•™์Šต ์‹คํ–‰: notebooks/train_model.ipynb ์‹คํ–‰

  3. ๋ชจ๋ธ ํ™•์ธ: models/ ํด๋”์— ์ƒ์„ฑ๋œ ๋ชจ๋ธ ํŒŒ์ผ ํ™•์ธ

2. ์˜ˆ์ธก ํŒŒ๋ผ๋ฏธํ„ฐ ์กฐ์ •

src/predictor.py์˜ predict() ๋ฉ”์„œ๋“œ์—์„œ:

# ์œ„ํ—˜๋„ ์ž„๊ณ„๊ฐ’ ๋ณ€๊ฒฝ (๊ธฐ๋ณธ: 0.5)
result = model.predict(store_data, threshold=0.3)  # ๋” ๋ฏผ๊ฐํ•˜๊ฒŒ
result = model.predict(store_data, threshold=0.7)  # ๋” ๋ณด์ˆ˜์ ์œผ๋กœ

# ์•™์ƒ๋ธ” ๊ฐ€์ค‘์น˜ ๋ณ€๊ฒฝ
# models/config.json์—์„œ:
{
  "ensemble_weights": [0.35, 0.35, 0.30]  # XGBoost, LightGBM, CatBoost
}

3. ํŠน์ง• ์ถ”๊ฐ€/์ˆ˜์ •

src/feature_engineering.py์˜ FeatureEngineer ํด๋ž˜์Šค์—์„œ:

def _create_custom_features(self, df):
    """์ปค์Šคํ…€ ํŠน์ง• ์ถ”๊ฐ€"""
    features = {}
    
    # ์˜ˆ: ์ƒˆ๋กœ์šด ์ง€ํ‘œ ์ถ”๊ฐ€
    features['custom_metric'] = df['col1'] / df['col2']
    
    return features

๋ฐฐ์น˜ ์˜ˆ์ธก

import pandas as pd

# CSV์—์„œ ์—ฌ๋Ÿฌ ๋งค์žฅ ๋กœ๋“œ
stores = pd.read_csv('stores_to_predict.csv')

# ๋ฐฐ์น˜ ์˜ˆ์ธก
results = model.predict_batch(stores)

# ๊ณ ์œ„ํ—˜ ๋งค์žฅ ํ•„ํ„ฐ
high_risk = results[results['risk_score'] > 70]
high_risk.to_csv('high_risk_stores.csv', index=False)

์ถ”๊ฐ€ ๋ฌธ์„œ

๊ธฐ์—ฌ

์ด์Šˆ์™€ PR์„ ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค!

๋ผ์ด์„ ์Šค

MIT License - ์ž์œ ๋กญ๊ฒŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅ

๋ฌธ์˜


๋ฉด์ฑ… ์กฐํ•ญ: ๋ณธ ๋ชจ๋ธ์˜ ์˜ˆ์ธก์€ ์ฐธ๊ณ ์šฉ์ด๋ฉฐ, ์‹ค์ œ ๊ฒฝ์˜ ํŒ๋‹จ์€ ์ „๋ฌธ๊ฐ€์™€ ์ƒ๋‹ดํ•˜์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support