Learning Spoons (러닝스푼즈) class notes
< Previous post >
https://silvercoding.tistory.com/55
[Visualization Analysis Project] 1-2. Is there a season when baseball players get stronger?
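Are there players who are strong against a particular team?
* idea: use the '상대' (opponent) column to identify the opposing team in each game, then compute each player's on-base percentage (OBP) against each opponent.
- Loading the data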
import pandas as pd
file = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()
- Organizing records by opposing team
raw['상대'].unique()
First, use unique() to check the values in the '상대' (opponent) column. A leading @ marks an away game; without it, the game was played at home. Add a '홈어웨이' (home/away) column to record this distinction, and put only the team name in a '상대팀' (opposing team) column.
opp_list = []
home_away_list = []
for opp in raw['상대']:
    if "@" in opp:
        # '@' marks an away game; strip it so only the team name remains
        home_away = '원정'  # away
        opp = opp.replace('@', '')
    else:
        home_away = '홈'  # home
    home_away_list.append(home_away)
    opp_list.append(opp)
raw['홈어웨이'] = home_away_list
raw['상대팀'] = opp_list
raw.head()
The for loop separates the home/away flag from the opposing team name, and the two results are added as new columns.
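The same split can also be written without an explicit loop. The following is a minimal sketch using pandas string methods, assuming the '@' marker only ever appears at the start of the value. The `sample` frame is made-up data for illustration, not rows from the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the real '상대' column comes from the CSV
sample = pd.DataFrame({'상대': ['@두산', '롯데', '@KIA', 'LG']})

# True where the opponent string starts with '@' (away game)
is_away = sample['상대'].str.startswith('@')

# '원정' = away, '홈' = home
sample['홈어웨이'] = np.where(is_away, '원정', '홈')

# Remove the literal '@' so only the team name remains
sample['상대팀'] = sample['상대'].str.replace('@', '', regex=False)

print(sample)
```

On a large frame this avoids building two Python lists by hand, although the loop version is arguably easier to read.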
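# at-bats, hits, home runs, total bases, RBIs, walks, hit-by-pitch, sacrifice flies
factors = ['타수', '안타', '홈런', '루타', '타점', '볼넷', '사구', '희비']
# index: team, name, birthday, opposing team
data = raw.pivot_table(index = ['팀', '이름', '생일', '상대팀'],
                       values = factors,
                       aggfunc = 'sum')
data.head()
Create a pivot table to aggregate each player's stats by opposing team.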
cond = data['타수'] > 0
data = data[cond]
data.head()
Player-versus-opponent rows with no at-bats are excluded from the DataFrame.
data = data.reset_index()
data.head()
reset_index converts all index levels back into regular columns.
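For comparison, the aggregate, filter, and flatten steps can be sketched with groupby and as_index=False, which keeps the grouping keys as ordinary columns from the start. The `toy` frame and its values are assumptions made up for illustration, and only two of the stat columns are included.

```python
import pandas as pd

# Hypothetical game-level rows (one row per player per game)
toy = pd.DataFrame({
    '이름': ['A', 'A', 'A', 'B'],
    '상대팀': ['두산', '두산', '롯데', '두산'],
    '타수': [4, 3, 5, 0],   # at-bats
    '안타': [2, 1, 0, 0],   # hits
})

# Sum the stats per (player, opponent); keys stay as regular columns
flat = toy.groupby(['이름', '상대팀'], as_index=False)[['타수', '안타']].sum()

# Drop player/opponent pairs with no at-bats
flat = flat[flat['타수'] > 0]

print(flat)
```

Player B has no at-bats against 두산 in this toy data, so that row is dropped, leaving only the two rows for player A.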
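- Computing batter vs. opposing team stats
def cal_hit(df):
    '''
    - Batting average (타율): share of at-bats that end in a hit --> hits / at-bats
    - On-base percentage (출루율): share of plate appearances that reach base --> (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)
    - Slugging percentage (장타율): batting average weighted by bases gained --> total bases / at-bats
    '''
    df['타율'] = df['안타'] / df['타수']
    # The denominator includes walks ('볼넷') so that it matches the formula in the docstring
    df['출루율'] = (df['안타'] + df['볼넷'] + df['사구']) / (df['타수'] + df['볼넷'] + df['사구'] + df['희비'])
    df['장타율'] = df['루타'] / df['타수']
    df['OPS'] = df['출루율'] + df['장타율']
    return df
This is the function used in the previous post; it computes the stats and adds them as columns.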
player_stats_opp = cal_hit(data)
player_stats_opp
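To make the formulas concrete, here is a small worked example on a single stat line. The helper name `batting_rates` and the numbers are hypothetical and used only for illustration; the formulas are the ones listed in the cal_hit docstring.

```python
def batting_rates(ab, hits, walks, hbp, sf, total_bases):
    """Return (AVG, OBP, SLG, OPS) for one stat line."""
    avg = hits / ab                                       # batting average
    obp = (hits + walks + hbp) / (ab + walks + hbp + sf)  # on-base percentage
    slg = total_bases / ab                                # slugging percentage
    return avg, obp, slg, obp + slg

# Hypothetical line: 20 at-bats, 6 hits, 3 walks, 1 hit-by-pitch,
# 1 sacrifice fly, 10 total bases
avg, obp, slg, ops = batting_rates(20, 6, 3, 1, 1, 10)
print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f} OPS={ops:.3f}")
# AVG=0.300 OBP=0.400 SLG=0.500 OPS=0.900
```

Here OBP is 10 / 25 = 0.400: the three walks count in both the numerator and the denominator.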
- Viewing the results: DataFrame
(1) Players strong against Doosan (두산)? - Top 10
team = '두산'
cond = (player_stats_opp['상대팀'] == team) & (player_stats_opp['타수'] > 20)
player_stats_opp[cond].sort_values(by = '출루율', ascending = False).head(10)
Select the players whose opposing team is Doosan and who have more than 20 at-bats, sort them by OBP, and print the top 10.
The names of the 10 players with the highest OBP against Doosan are as follows.
player_stats_opp[cond].sort_values(by = '출루율', ascending = False)['이름'].head(10)
(2) Players strong against Lotte (롯데)? - Top 10
team = '롯데'
cond = (player_stats_opp['상대팀'] == team) & (player_stats_opp['타수'] > 20)
player_stats_opp[cond].sort_values(by = '출루율', ascending = False).head(10)
Repeat the same steps.
The names of the 10 players with the highest OBP against Lotte are as follows.
player_stats_opp[cond].sort_values(by = '출루율', ascending = False)['이름'].head(10)
(3) Checking the top 5 batters by OBP against every KBO team
top_dfs = []
for team in player_stats_opp['상대팀'].unique():
    print(team)
    cond = (player_stats_opp['상대팀'] == team) & (player_stats_opp['타수'] > 20)
    df = player_stats_opp[cond].sort_values(by = '출루율', ascending = False).head(5)
    top_dfs.append(df)
# DataFrame.append was removed in pandas 2.0, so the pieces are combined with pd.concat
hitter_df = pd.concat(top_dfs)
hitter_df
List of batters who appear in the OBP top 5 against at least one team (duplicates removed)
hitter_df['이름'].unique()
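The per-team loop can also be expressed as a single sort followed by a grouped head. This is a minimal sketch on made-up data (the `toy` frame is an assumption, and it takes the top 2 rather than the top 5 to keep the example small).

```python
import pandas as pd

# Hypothetical player-vs-opponent rows
toy = pd.DataFrame({
    '이름': ['A', 'B', 'C', 'D', 'E', 'F'],
    '상대팀': ['두산', '두산', '두산', '롯데', '롯데', '롯데'],
    '타수': [30, 25, 40, 22, 15, 28],                   # at-bats
    '출루율': [0.41, 0.35, 0.45, 0.39, 0.60, 0.33],     # OBP
})

# Keep only rows with more than 20 at-bats
eligible = toy[toy['타수'] > 20]

# Sort by OBP once, then take the first 2 rows within each opposing team
top = (eligible.sort_values('출루율', ascending=False)
               .groupby('상대팀')
               .head(2))

print(top)
```

Player E has the highest OBP in the toy data but is excluded by the at-bat threshold, which shows why the filter matters for small samples.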
- Viewing the results: Heatmap (visualization)
cond = player_stats_opp['이름'].isin(hitter_df['이름'].unique())
top_df = player_stats_opp[cond]
top_pivot = top_df.pivot_table(index = ['팀','이름'], values = '출루율', columns = '상대팀', aggfunc = 'sum')
top_pivot
From player_stats_opp, keep only the players whose names appeared in the OBP top 5 against some team (collected above), then build a pivot_table of those players' OBP by opposing team.
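As a small illustration of what this wide table looks like, the sketch below pivots a made-up frame (the `toy` data is an assumption). Because a player does not necessarily have a row for every opponent, missing matchups appear as NaN in the result.

```python
import pandas as pd

# Hypothetical long-format rows: one OBP value per player/opponent pair
toy = pd.DataFrame({
    '팀': ['NC', 'NC', '한화'],
    '이름': ['A', 'A', 'B'],
    '상대팀': ['KIA', 'LG', 'LG'],
    '출루율': [0.5, 0.3, 0.45],
})

# Rows: (team, name); columns: opposing team; cells: OBP
wide = toy.pivot_table(index=['팀', '이름'], columns='상대팀',
                       values='출루율', aggfunc='sum')

print(wide)
```

Since each player/opponent pair occurs once here, aggfunc='sum' simply passes the value through. It would add values together if two different players shared the same team and name.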
import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns
# Font setup so that Korean text renders in figures
if platform.system() == 'Windows':  # Windows: Malgun Gothic
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:  # Mac: AppleGothic
    rc('font', family='AppleGothic')
# Setting so that the minus sign is displayed correctly in plots
matplotlib.rcParams['axes.unicode_minus'] = False
fig, ax = plt.subplots( figsize=(15,15) )
sns.heatmap(data = top_pivot,
            annot = True, fmt = '.3f',
            cmap = 'Reds',
            center = 0.4  # midpoint of the colormap
            )
The darker the color, the higher the OBP. For example, NC's Yang Eui-ji (양의지) had a high OBP in games against KIA that season, and Hanwha's Jeong Keun-woo (정근우) was strong in games against LG.
sns.heatmap(data = top_pivot,
            annot = True, fmt = '.3f',
            cmap = 'Reds',
            center = 0.6  # midpoint of the colormap
            )
To examine relative magnitudes, change center and compare the resulting plots.