Learning Spoons class notes

 

 

< Previous post >

https://silvercoding.tistory.com/69

 

[Machine Learning] Feature importance, SHAP value


 

 


After manager Alex Ferguson retired in 2013, Manchester United went into decline. When Solskjær took charge of the team, the club signed two players in the winter transfer window of the 2019/2020 season and, as of March 2020, managed to reverse that decline.

What would the outcome be if the release and signing decisions were made through an analysis of player data?

Data: FIFA data (provided by the Learning Spoons course)


1. Loading the data

import pandas as pd
import warnings 

warnings.filterwarnings(action='ignore')  # suppress warnings
data = pd.read_csv("./data/FIFA_data.csv")
pd.set_option('display.max_columns', 80)

When a DataFrame has many columns, the output is truncated with ..., so the display limit is set to 80, the number of columns in the data.

data.head()

Now every column is visible.

 

 

2. Inspecting the data and planning the analysis

Meaning of each column

ID: unique player number
Name: name
Age: age
Overall: current ability
Potential: potential ability
Club: current team
Value: expected transfer fee (euros)
Wage: weekly wage (euros)
Preferred Foot: stronger foot
Weak Foot: weaker foot
Skill Moves: individual skill
Position: position
Jersey Number: shirt number
Joined: date the player joined the current team
Contract Valid Until: contract period
Height: height (feet)
Weight: weight (pounds)
LS ~ RB: rating for each position
Crossing ~ GKReflexes: detailed ability ratings
Release Clause: buyout clause

 

Analysis plan

1. Analyze the Manchester United players (which players are in the squad?)

2. Compare them with the players of local rival Manchester City

3. Choose the two weakest positions

4. Choose two players to sign from other teams (considering finances, feasibility, and transfer policy)

 

 

 

 

 


3. Analyzing the Manchester United players

(1) EDA

- Extract the Manchester United players

mu = data[data['Club'] == 'Manchester United']
mu.head()

Select only the rows whose Club is Manchester United and store them in mu.

mu['Club'].unique()

Checking with unique() confirms that only Manchester United was selected.

 

 

 

- Print a brief summary of the Manchester United players

print(f"Players: {mu.shape[0]}")
print(f"Positions of Manchester United players: {mu['Position'].unique()}")
print(f"Mean Overall: {mu['Overall'].mean()}")
print(f"Mean Potential: {mu['Potential'].mean()}")

 

 

- Visualization

import seaborn as sns
sns.countplot(x=mu['Age'])

This is the age distribution of the players. Age 19 is the most common, followed by 25, 28, and 22.

sns.countplot(x=mu['Position'])

The most common positions are CM and CB.

sns.boxplot(data=mu, x='Position', y='Overall')

A boxplot of Overall by Position reveals an outlier in the CB position.

 

 

* Handling outliers & missing values

Outliers

  • Values that fall far outside the normal range
  • Including them in the analysis can distort the results

Missing values

  • Absent or empty values
  • Values that were not recorded, or were lost, when the data was collected

Ways to handle outliers and missing values

  • Removal: drop the rows or columns that contain outliers or missing values. (A last resort, because every data point is valuable.)
  • Replacement: replace the outliers or missing values with the column's maximum, mean, median, and so on. (Not a recommended method.)
  • Prediction: fill in a predicted value that takes the characteristics of the affected column into account. (Recommended.)
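An outlier is described above as a value far outside the normal range. One common way to make that concrete is the 1.5 x IQR rule that boxplots use by default. The sketch below applies it to made-up ratings; the helper name `iqr_outliers` and the numbers are illustrative assumptions, not part of the FIFA data or the course material.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical Overall ratings for one position; 175 stands in for a data-entry error
overall = pd.Series([75, 78, 80, 82, 175], index=[10, 11, 12, 13, 14])
mask = iqr_outliers(overall)
print(overall[mask])
```

A mask like this only flags candidates. Whether a flagged value is really an error still has to be judged from the data, as done below for Overall.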

mu[mu['Overall']>100]

Check the rows whose Overall is greater than 100.

 

 

Handling the outlier - using prediction

mu[mu['Position'] == 'CB'][['Position', 'Overall', 'CB']]

Compare players in the same position. Players with similar CB ratings have the same Overall. The player with the outlier has the same CB rating as the player at index 11081, so the Overall can be predicted as 75.

mu.loc[11422, 'Overall'] = 75

Set the Overall of the player at index 11422 to 75.
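The replacement value was read off the table by eye. The same idea can be written as a lookup: for the broken row, take the Overall of valid players who share its CB rating. This is a minimal sketch on a made-up frame (the `players` data and the choice of the median are assumptions for illustration), not the method used in the course.

```python
import pandas as pd

# Hypothetical centre-backs: the row at index 3 has an impossible Overall
players = pd.DataFrame(
    {
        "Position": ["CB", "CB", "CB", "CB"],
        "Overall": [80, 78, 75, 175],
        "CB": [79, 77, 74, 74],
    },
    index=[0, 1, 2, 3],
)

bad = players["Overall"] > 100  # rows that need repair
# Typical Overall among valid rows, keyed by CB rating
typical = players[~bad].groupby("CB")["Overall"].median()
# Look up the replacement through the broken row's own CB rating
players.loc[bad, "Overall"] = players.loc[bad, "CB"].map(typical)
print(players)
```

If no valid player shares the CB rating, the lookup yields a missing value, so a real dataset would need a fallback such as the nearest rating.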

sns.boxplot(data=mu, x='Position', y='Overall')

Drawing the boxplot again shows no outliers.

sns.boxplot(data=mu, x='Position', y='Potential')

Draw a boxplot for Potential as well. Potential has no outliers.

 

 

 

mu.info()

mu has 33 rows in total, and columns 19 to 44 each have 3 missing values.

mu[mu.isnull()['LS']]

Only players whose Position is GK appear to have missing values. GK means goalkeeper, and goalkeepers do not need ratings for the other positions, so those ratings were presumably left empty.

mu = mu.fillna(-1)

Fill the missing values with -1. (An arbitrary value chosen to mean "cannot be measured"; another sentinel value would also work.)
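Calling fillna(-1) on the whole frame also overwrites missing values in unrelated columns. If that is not wanted, the fill can be limited to the position-rating columns. A small sketch with a made-up `squad` frame (column subset and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical squad: the goalkeeper has no outfield ratings and no join date
squad = pd.DataFrame(
    {
        "Name": ["keeper", "defender"],
        "Position": ["GK", "CB"],
        "LS": [np.nan, 60.0],
        "RB": [np.nan, 75.0],
        "Joined": [None, "2018-07-01"],
    }
)

rating_cols = ["LS", "RB"]  # in the real data this would be the LS ~ RB range
squad[rating_cols] = squad[rating_cols].fillna(-1)
print(squad)
```

Here the ratings become -1 while the missing Joined value stays missing, so it can still be found with isnull() later.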

mu.info()

All missing values have been filled.

 

 

 

 

 


4. Manchester United vs Manchester City 

(1) Preprocessing

df = data[(data['Club'] == 'Manchester United') | (data['Club']=='Manchester City')]

Select only Manchester United and Manchester City and store them in df.

df['Club'].unique()

df['Value'].head()

The transfer fee in Value is written with symbols, so remove the symbols and the decimal point.

df['Value'] = df['Value'].str.replace('M', '000000')
df['Value'] = df['Value'].str.replace('K', '000')

Where there is an M, append six zeros; where there is a K, append three zeros.

df['Value']

df['Value'] = df['Value'].str.slice(1,)

Then use str.slice to drop the leading currency symbol.

df['Value'].iloc[3]

'64.5000000'

Some values contain a decimal point like this, so for those values remove the point and drop one trailing 0.

has_dot = df['Value'].str.contains('.', regex=False)  # rows that contain a decimal point
# Remove the point and one trailing 0 in those rows only (assumes a single decimal digit)
df.loc[has_dot, 'Value'] = df.loc[has_dot, 'Value'].str.replace('.', '', regex=False).str.slice(0, -1)
df['Value']

The values are now converted correctly.

df['Value'] = df['Value'].astype('int')

Now change the data type from object to int.
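The string edits above depend on every decimal value having exactly one digit after the point. As an alternative, the number and the suffix can be parsed separately. The helper below is a sketch under the assumption that raw values look like '€64.5M' or '€500K' (the euro sign is inferred from the column description, and `parse_value` is a hypothetical name, not from the course).

```python
def parse_value(text: str) -> int:
    """Convert a string such as '€64.5M' or '€500K' to an integer number of euros."""
    multipliers = {"M": 1_000_000, "K": 1_000}
    number = text.lstrip("€")
    suffix = number[-1] if number and number[-1] in multipliers else ""
    if suffix:
        number = number[:-1]
    # round() guards against floating-point artifacts after the multiplication
    return round(float(number) * multipliers.get(suffix, 1))


print(parse_value("€64.5M"))
print(parse_value("€500K"))
print(parse_value("€0"))
```

Applied to the untouched column, this would be something like df['Value'].map(parse_value), replacing the replace/slice steps in one pass.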

df.head()

 

 

 

- Split the mu and mc players

mu = df[df['Club'] == "Manchester United"]
mc = df[df['Club'] == "Manchester City"]

Separate the Manchester United and Manchester City players from df.

mc.head()

df['Position'].unique()

Group the positions above into four categories (goalkeeper, defender, midfielder, forward) for the analysis. The grouping is as follows.

  • Goalkeeper list GK = GK (goalkeeper)
  • Defender list CB = CB (centre-back), LB (left-back), RB (right-back), RCB (right centre-back), LCB (left centre-back)
  • Midfielder list MF = RCM (right central midfielder), LCM (left central midfielder), RDM (right defensive midfielder), CDM (central defensive midfielder), CM (central midfielder), RM (right midfielder), CAM (central attacking midfielder)
  • Forward list ST = ST (striker), LW (left winger), RW (right winger)

* Starting lineup: GK (goalkeeper): 1, CB (defenders): 4, MF (midfielders): 4, ST (forwards): 2

-> Selection is based on current ability (the Overall column)

 

gk_list = ['GK']
cb_list = ['CB', 'LCB', 'RCB', 'RB', 'LB']
mf_list = ['RCM', 'LCM', 'RDM', 'CDM', 'CM', 'RM', 'CAM']
st_list = ['ST', 'LW', 'RW']

Write the lists according to the position grouping.

 

gk_count = 1
cb_count = 4
mf_count = 4
st_count = 2



mu_id = []

for index in mu.index:
    position = mu.loc[index, 'Position']
    if position in gk_list:
        if gk_count != 0:
            mu_id.append(mu.loc[index, 'ID'])
            gk_count -= 1
    elif position in cb_list:
        if cb_count != 0:
            mu.loc[index, 'Position'] = 'CB'  # relabel with the group name
            mu_id.append(mu.loc[index, 'ID'])
            cb_count -= 1
    elif position in mf_list:
        if mf_count != 0:
            mu.loc[index, 'Position'] = 'MF'
            mu_id.append(mu.loc[index, 'ID'])
            mf_count -= 1
    else:  # any position not listed above is counted as a forward
        if st_count != 0:
            mu.loc[index, 'Position'] = 'ST'
            mu_id.append(mu.loc[index, 'ID'])
            st_count -= 1

The data is sorted by Overall in descending order, so walking through it in order and collecting IDs picks the top players of each position group.

mu[mu['ID'].isin(mu_id)]

The 11 players come out as expected.

mu = mu[mu['ID'].isin(mu_id)]

Keep only the 11 selected players in mu.

 

 

 

Repeat the same procedure for Manchester City.

gk_count = 1
cb_count = 4
mf_count = 4
st_count = 2


mc_id = []

for index in mc.index:
    position = mc.loc[index, 'Position']
    if position in gk_list:
        if gk_count != 0:
            mc_id.append(mc.loc[index, 'ID'])
            gk_count -= 1
    elif position in cb_list:
        if cb_count != 0:
            mc.loc[index, 'Position'] = 'CB'  # relabel with the group name
            mc_id.append(mc.loc[index, 'ID'])
            cb_count -= 1
    elif position in mf_list:
        if mf_count != 0:
            mc.loc[index, 'Position'] = 'MF'
            mc_id.append(mc.loc[index, 'ID'])
            mf_count -= 1
    else:  # any position not listed above is counted as a forward
        if st_count != 0:
            mc.loc[index, 'Position'] = 'ST'
            mc_id.append(mc.loc[index, 'ID'])
            st_count -= 1
mc = mc[mc['ID'].isin(mc_id)]
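The selection loop is written out twice, once per club. One way to avoid the duplication is a small helper that maps each position to its group and takes the top players per group. This is only a sketch: `GROUPS`, `QUOTA`, and `pick_starters` are hypothetical names, the demo frame is made up, and unlike the loop it relabels every player, not just the selected ones.

```python
import pandas as pd

GROUPS = {
    "GK": ["GK"],
    "CB": ["CB", "LCB", "RCB", "RB", "LB"],
    "MF": ["RCM", "LCM", "RDM", "CDM", "CM", "RM", "CAM"],
    "ST": ["ST", "LW", "RW"],
}
QUOTA = {"GK": 1, "CB": 4, "MF": 4, "ST": 2}


def pick_starters(team: pd.DataFrame) -> pd.DataFrame:
    """Pick the top-rated players per position group according to QUOTA."""
    lookup = {pos: group for group, members in GROUPS.items() for pos in members}
    out = team.copy()
    # Unlisted positions fall back to 'ST', mirroring the else branch of the loop
    out["Position"] = out["Position"].map(lookup).fillna("ST")
    out = out.sort_values("Overall", ascending=False, kind="stable")
    picked = [out[out["Position"] == group].head(n) for group, n in QUOTA.items()]
    return pd.concat(picked).sort_values("Overall", ascending=False, kind="stable")


# Hypothetical mini squad: the second goalkeeper should be left out
team = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6],
        "Position": ["GK", "GK", "LCB", "RM", "ST", "LM"],
        "Overall": [90, 85, 84, 83, 82, 70],
    }
)
starters = pick_starters(team)
print(starters)
```

With such a helper, both clubs could be handled by calling it once on each club's rows.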

 


concat vs merge

merge: joins side by side (adds columns); concat: stacks top to bottom (adds rows)

df = pd.concat([mu, mc])

Combine the selected mu and mc players and store them in df.
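To make the concat/merge difference concrete, here is a tiny example on made-up frames (`mu_demo`, `mc_demo`, and `wages` are invented for illustration):

```python
import pandas as pd

mu_demo = pd.DataFrame({"ID": [1, 2], "Club": ["Manchester United"] * 2})
mc_demo = pd.DataFrame({"ID": [3, 4], "Club": ["Manchester City"] * 2})
wages = pd.DataFrame({"ID": [1, 2, 3, 4], "Wage": [100, 200, 300, 400]})

# concat appends rows: two 2-row frames become one 4-row frame with the same columns
stacked = pd.concat([mu_demo, mc_demo], ignore_index=True)
# merge adds columns by matching on a key: the Wage column is attached via ID
joined = stacked.merge(wages, on="ID", how="left")

print(stacked.shape, joined.shape)
```

Since mu and mc share the same columns and hold different players, concat is the natural fit here.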

 

 

(2) EDA 

- mu vs mc: current ability (Overall) of the starters by position

sns.boxplot(data=df, x='Position', y='Overall', hue='Club')

Except for the goalkeeper, Manchester United is lower in every position.

 

 

- mu vs mc: expected transfer fee (Value) of the starters by position

sns.boxplot(data=df, x='Position', y='Value', hue='Club')

Apart from the goalkeeper, Manchester United's transfer values are about the same or even higher.

Comparing the two teams with the boxplots above, MF and CB were judged to be the positions where ability falls short relative to transfer value, so the analysis turns to which players to sign for these two positions.

 

 

 


5. Which players should Manchester United sign?

(1) EDA

* Choosing the players to release

Build a formula from join date, ability, potential, and age. (Joined is not part of the formula itself; it is shown in the tables below for reference.)

 Point = (Overall * 2 + Potential) / Age

Higher ability (weighted double) and potential, and lower age, give a better score.

mu['Point'] = (mu['Overall'] * 2 + mu['Potential']) / mu['Age']
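A quick worked example shows how the formula rewards youth. The two players below are invented for illustration, and `point` is a hypothetical helper that just restates the formula:

```python
def point(overall: int, potential: int, age: int) -> float:
    """Point = (Overall * 2 + Potential) / Age, as defined above."""
    return (overall * 2 + potential) / age


young = point(80, 88, 22)    # (160 + 88) / 22
veteran = point(82, 82, 30)  # (164 + 82) / 30
print(round(young, 2), round(veteran, 2))  # 11.27 8.2
```

The veteran has the higher Overall, yet dividing by age puts the younger player clearly ahead, which is why older starters end up with the lowest Point.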

 

- MF position

mu[mu['Position'] == 'MF'][['Name', 'Overall', 'Potential', 'Age', 'Joined', 'Point']]

The lowest Point belongs to the player at index 211.

- CB position

mu[mu['Position'] == 'CB'][['Name', 'Overall', 'Potential', 'Age', 'Joined', 'Point']]

The lowest Point belongs to the player at index 377.

Release the two players, Mata and Smalling, and sign one MF and one CB.

 

 

(2) Visualization

Visualizing all players - choosing signings according to the transfer policy

Manchester United transfer policy (manager Solskjær)

- The younger the player, the better

- Players who can start right away, rather than players with potential

market = data[(data['Position']=='RM') | (data['Position']=='CB')]

For the positions, choose RM and CB, the specific positions of the two players selected for release.

market.head()

import matplotlib.pyplot as plt
f, ax = plt.subplots(2, 4, figsize=(20, 10))

vs_list = ['Age', 'Overall', 'Potential', 'Weak Foot']

# Row 0: top 13 CB candidates, row 1: top 13 RM candidates
for row, pos in enumerate(['CB', 'RM']):
    top = market[market['Position'] == pos][:13]
    for col, feature in enumerate(vs_list):
        mean = top[feature].mean()
        # Bars above the group mean are drawn in red, the rest in gray
        colors = ['firebrick' if x > mean else 'gray' for x in top[feature]]
        sns.barplot(x=feature, y='Name', data=top, ax=ax[row, col], palette=colors)
        ax[row, col].axvline(mean, ls='--', color='k')  # dashed line at the mean

Setting everything else aside and judging only by age, current ability, and potential, the players to sign under the transfer policy were judged to be S. Umtiti and K. Mbappé.
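The choice above was made by reading the bar charts. One rough way to express the same policy in code is to keep candidates who are younger than the group mean and have an Overall above it. This is only a sketch on invented candidates (names and numbers are made up), and the thresholds are an assumption rather than the rule used in the course.

```python
import pandas as pd

# Hypothetical shortlist for one position
candidates = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [21, 24, 30, 33],
        "Overall": [88, 84, 89, 83],
    }
)

younger = candidates["Age"] < candidates["Age"].mean()              # mean age is 27
stronger = candidates["Overall"] > candidates["Overall"].mean()     # mean Overall is 86
fits = candidates[younger & stronger]
print(fits)
```

Only player A passes both conditions here. On the real shortlist the same filter would be applied to the top 13 players of each position.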
