Learning Spoons class notes

 

 

< Previous post >

https://silvercoding.tistory.com/54

 

[Visualization Analysis Project] 1. best baseball player analysis


 

 

 


 Loading the data
import pandas as pd
file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()

We load the same data used in the previous post.

 

 


 Is there a season when baseball players get stronger?

idea : extract only the month from the '일자' (date) column and visualize on-base percentage (OBP) by month.

 

1. Organizing records by month

- Extracting the month from the '일자' column

month_list = []
for monthdate in raw['일자']:
    # each value is split on '-' into (month, day); only the month is kept
    month, date = monthdate.split('-')
    month_list.append(month)
raw['월'] = month_list  # '월' = month
raw.head()
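
The same month extraction can also be written without an explicit loop. This is only a minimal sketch, assuming the '일자' values are strings of the form 'MM-DD'; the toy values below are made up and are not taken from the KBO file.

```python
import pandas as pd

# Toy stand-in for raw['일자']; the real column is assumed to hold 'MM-DD' strings.
dates = pd.Series(['03-23', '04-02', '10-01'], name='일자')

# Split each string on '-' and keep the first piece (the month).
months = dates.str.split('-').str[0]
print(months.tolist())
```

Because the month is kept as a string, it stays zero-padded ('03', '04', ...), which matches the month column names used later in the post.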

 

- Selecting the columns needed for the analysis

# team, name, birthday, date, opponent, at-bats, hits, home runs, total bases, RBI, walks, hit-by-pitch, sacrifice flies, month
columns_select = ['팀', '이름', '생일','일자', '상대','타수','안타','홈런', '루타', '타점','볼넷', '사구', '희비', '월']

data = raw[columns_select]
data.head()

 

- Aggregating monthly stats

data_player_month = data.pivot_table(index = ['팀','이름','생일', '월'], 
                               values = ['타수','안타','홈런','루타','타점','볼넷','사구','희비'], 
                              aggfunc = 'sum', fill_value = 0
                                )
data_player_month

Players are identified by team, name, and birthday, and we build a pivot table that sums their stats by month. Empty cells are filled with 0.

data_player_month = data_player_month.reset_index()
data_player_month

reset_index converts the MultiIndex back into regular columns.
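
To make the two steps above concrete, here is a small sketch on made-up game-level rows (the player names 'A' and 'B' and all numbers are hypothetical). It sums the counting stats per player and month, then flattens the resulting MultiIndex.

```python
import pandas as pd

# Made-up game-level rows: player 'A' appears twice in month '03'.
games = pd.DataFrame({
    '이름': ['A', 'A', 'A', 'B'],
    '월':   ['03', '03', '04', '03'],
    '타수': [4, 3, 5, 4],
    '안타': [1, 2, 0, 2],
})

# Sum the counting stats per (player, month); the result has a MultiIndex.
monthly = games.pivot_table(index=['이름', '월'],
                            values=['타수', '안타'],
                            aggfunc='sum', fill_value=0)

# reset_index turns the index levels back into ordinary columns.
monthly = monthly.reset_index()
print(monthly)
```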

 

 

- Adding batting average, OBP, slugging percentage, and OPS columns (the main performance metrics)

def cal_hit(df):
    '''
    - 타율 (batting average): rate of reaching base on a hit --> hits / at-bats
    - 출루율 (on-base percentage, OBP): rate of reaching base -->  (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)
    - 장타율 (slugging percentage): batting average weighted by bases gained -->   total bases / at-bats
    - OPS: 출루율 + 장타율
    '''
    
    df['타율'] = df['안타'] / df['타수']
    df['출루율'] = (df['안타'] + df['볼넷'] + df['사구']) / (df['타수'] + df['볼넷'] + df['사구'] + df['희비'])
    df['장타율'] = df['루타'] / df['타수']
    df['OPS'] = df['출루율'] + df['장타율']
    return df
player_month_stat = cal_hit(data_player_month)
player_month_stat.head()

 

player_month_stat.info()

We can see that there are missing values.
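
One likely source of these missing values is a month in which a player has no at-bats, so a ratio becomes 0 / 0. The sketch below uses made-up numbers to show that pandas returns NaN in that case instead of raising an error, and that dropna() then removes the row.

```python
import pandas as pd

# Made-up monthly totals: the second row has no at-bats at all.
toy = pd.DataFrame({'안타': [3, 0], '타수': [10, 0]})

# 0 / 0 on numeric pandas columns yields NaN rather than raising an error.
toy['타율'] = toy['안타'] / toy['타수']
print(toy)

# dropna() removes the row whose ratio could not be computed.
cleaned = toy.dropna()
print(len(cleaned))
```

Note that a non-zero numerator over a zero denominator gives inf rather than NaN, and dropna() does not remove inf values.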

 

player_month_stat = player_month_stat.dropna()
player_month_stat.head()

We use dropna() to remove the rows that contain missing values.

 

player_month_stat.info()

No missing values remain: the rows that contained them have been dropped.

 

- Monthly OBP

month_pivot = player_month_stat.pivot_table(index = ['팀','이름','생일'],
                             columns = '월',
                             values = '출루율')
month_pivot = month_pivot.reset_index()
month_pivot

As shown above, we have created a pivot table of each player's monthly OBP.
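
The long-to-wide reshaping above can be sketched on a tiny made-up table (hypothetical players 'A' and 'B'). Each month becomes its own column, and a month in which a player has no row shows up as NaN.

```python
import pandas as pd

# Made-up long-format rows: one row per (player, month).
long_df = pd.DataFrame({
    '이름':   ['A', 'A', 'B'],
    '월':     ['03', '04', '03'],
    '출루율': [0.400, 0.350, 0.300],
})

# One row per player, one column per month; missing months become NaN.
wide = long_df.pivot_table(index='이름', columns='월', values='출루율')
wide = wide.reset_index()
print(wide)
```

When no aggfunc is given, pivot_table averages duplicate entries. Here each (player, month) pair occurs once, so the values pass through unchanged.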

 

 

2. Result: checking the monthly OBP of the KBO's top OBP hitters

- Loading the data on the KBO's top OBP hitters

file = './data/player_stat.csv'
player_stat = pd.read_csv(file, encoding = 'cp949')
player_stat.head(20)

This data is sorted by OBP, which was the topic of the previous post. Let's merge it with the monthly OBP data built above and look at the monthly OBP records of the top 50 players by OBP.

 

 

- Merging the data

df = pd.merge(player_stat, month_pivot, how = 'left', on = ['팀','이름','생일'])
# left_on = ['팀','이름','생일'], right_on = ['팀','이름','생일']
df.head(10)

df_sort = df.sort_values(by = '출루율', ascending = False).head(50)
df_sort

We sort by OBP once more and store the top 50 rows in df_sort.
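
The merge and top-N selection above can be sketched on toy data. The two small tables below are made up for illustration. The point is that how='left' keeps every row of the season table and fills NaN where the monthly table has no match.

```python
import pandas as pd

# Made-up season table (left) and monthly table (right).
season = pd.DataFrame({'팀': ['X', 'Y'], '이름': ['A', 'B'], '출루율': [0.350, 0.400]})
monthly = pd.DataFrame({'팀': ['X'], '이름': ['A'], '03': [0.380]})

# how='left' keeps every season row; players missing from `monthly` get NaN.
merged = pd.merge(season, monthly, how='left', on=['팀', '이름'])

# Sort by season OBP, highest first, and keep the top rows (top 1 here; the post uses 50).
top = merged.sort_values(by='출루율', ascending=False).head(1)
print(top)
```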

df_selected = df_sort[['팀', '이름', '출루율', '03', '04', '05', '06', '07', '08', '09', '10']]
df_selected

Then we select only the OBP-related columns from df_sort and assign them to df_selected.

As raw numbers, it is hard to get a feel for how the 50 players' monthly records look, so let's check them through visualization.

 

 

 

- Seeing it at a glance with a visualization

df_selected = df_selected.set_index(['팀','이름'])
df_selected

We will use a heatmap for the visualization. For this, the values of the DataFrame must all be numeric, so we move the 팀 (team) and 이름 (name) columns into the index.
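
A quick way to confirm that only numeric values remain after set_index is to check the column dtypes. This is a minimal sketch on a made-up table, using pandas' is_numeric_dtype helper.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up table mixing label columns and numeric columns.
table = pd.DataFrame({
    '팀': ['X', 'Y'],
    '이름': ['A', 'B'],
    '출루율': [0.350, 0.400],
    '03': [0.380, 0.390],
})

# Before: the label columns are part of the values, so not everything is numeric.
before_ok = all(is_numeric_dtype(table[c]) for c in table.columns)

# After: labels live in the index and only numeric columns remain as values.
indexed = table.set_index(['팀', '이름'])
after_ok = all(is_numeric_dtype(indexed[c]) for c in indexed.columns)

print(before_ok, after_ok)
```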

import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# Font setup so Korean text renders in figures
if platform.system() == 'Windows':  # on Windows, use Malgun Gothic
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # on Mac, use AppleGothic
    rc('font', family='AppleGothic')

# Setting so that the minus sign renders correctly in plots
matplotlib.rcParams['axes.unicode_minus'] = False
sns.heatmap(df_selected)

Let's add a few more settings to make it easier to read.

fig, ax = plt.subplots( figsize=(15,15) )
sns.heatmap(data = df_selected, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds'
           )

The darker the red, the higher the OBP. It is still hard to see at a glance whether a player did better or worse in a given month, so let's compute the difference from the full-season OBP.

# columns[0] is the season '출루율'; every later column is a month
for col in df_selected.columns[1:]:
    df_selected[col] = df_selected[col] - df_selected['출루율'] 
df_selected['출루율'] = 0.0

After computing the difference between each monthly OBP and the season OBP, we set the season OBP column itself to 0 so that it does not affect the color scale.
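
The per-column subtraction can also be expressed in one step with DataFrame.sub. The sketch below uses made-up values for two hypothetical players and is meant only to illustrate the row-wise alignment.

```python
import pandas as pd

# Made-up values: season OBP first, then two monthly columns.
toy = pd.DataFrame({'출루율': [0.400, 0.300],
                    '03':     [0.450, 0.250],
                    '04':     [0.380, 0.300]},
                   index=['A', 'B'])

# Subtract each player's season value from every monthly column (axis=0 aligns on the row index).
month_cols = toy.columns[1:]
diff = toy[month_cols].sub(toy['출루율'], axis=0)
print(diff.round(3))
```

A positive entry means the player was above their own season OBP in that month, and a negative entry means below.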

fig, ax = plt.subplots( figsize=(10,10) )

sns.heatmap(data = df_selected.head(50), 
            annot = True, fmt = '.3f', 
            cmap = 'RdBu_r',
            center = 0  # pin the neutral color to "equal to season OBP"
           )

Deep red means a monthly OBP higher than the season OBP, and deep blue means a monthly OBP lower than the season OBP. For example, Hanwha's 최재훈 had a strong OBP in March, and KIA's 유민상 shows a huge jump in OBP in July compared with June. I was curious whether the season has an effect (for example, numbers dropping in the hot summer months). It differs from player to player, but there clearly are players whose OBP rises in particular months.
