Learning Spoons class notes

< Previous post >

https://silvercoding.tistory.com/53

[python visualization] 2. Building a map of Seoul shelter status, map visualization (folium library)


 

 


 Background knowledge 
  • Batting average (타율): how often a batter reaches base on a hit 

= successful hits / batting opportunities 

= hits / at-bats 

  • On-base percentage (출루율): how often a batter reaches base by any means 

= times on base / opportunities to reach base 

= (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies) 

  • Slugging percentage (장타율): how far a batter advances on hits 

= bases gained on hits / batting opportunities 

= total bases / at-bats 

(batting average with a distance weight added: a double counts as 2 singles) 

  • OPS: how often and how far a batter reaches base 

= on-base percentage + slugging percentage 
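
As a quick check of the four formulas, here is a minimal sketch in plain Python. The season totals are made-up numbers for a hypothetical batter, and the function and argument names are assumptions chosen only for this illustration.

```python
def batting_rates(at_bats, hits, total_bases, walks, hit_by_pitch, sac_flies):
    """Return (AVG, OBP, SLG, OPS) computed from season totals."""
    avg = hits / at_bats
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    slg = total_bases / at_bats
    return avg, obp, slg, obp + slg


# Hypothetical season: 150 hits in 500 at-bats, 250 total bases,
# 60 walks, 5 hit-by-pitch, 5 sacrifice flies.
avg, obp, slg, ops = batting_rates(500, 150, 250, 60, 5, 5)
print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f} OPS={ops:.3f}")
# prints: AVG=0.300 OBP=0.377 SLG=0.500 OPS=0.877
```

The on-base percentage comes out higher than the batting average here because walks and hit-by-pitch count as reaching base even though they are not hits.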

 

 

 

*** This basic visualization project picks the best players by on-base percentage, slugging percentage, OPS, and batting average, in that order of priority. ***

 

 

 

 


 Loading & inspecting the data 
import pandas as pd
file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')

Data: KBO 2019 season batter game logs (provided by Learning Spoons) 

raw.head()

Each row records one batter in one game. 

raw.info()

raw.columns

Look over all the columns to decide which ones to use. 

columns_select = ['팀', '이름', '생일','일자', '상대','타수','안타','홈런', '루타', '타점','볼넷', '사구', '희비']
data = raw[columns_select]
data.head()

Only the needed columns (team, name, birthday, date, opponent, at-bats, hits, home runs, total bases, RBI, walks, hit-by-pitch, sacrifice flies) are selected into a new DataFrame, which is assigned to data. 

 

 

 

 

 


 Analyzing the KBO best players 

- Aggregating records per player 

data_player = data.pivot_table(index = ['팀','이름','생일'], 
                               values = ['타수','안타','홈런','루타','타점','볼넷','사구','희비'], 
                              aggfunc = 'sum')

data_player

pivot_table groups the rows by team, name, and birthday and sums at-bats, hits, home runs, total bases, RBI, walks, hit-by-pitch, and sacrifice flies for each player. 

data_player['타수'].hist()

Series.hist(), which is built into pandas, shows the distribution of at-bats. It is used here to decide how few at-bats is too few to keep a player. Judging from the histogram, excluding players with 50 or fewer at-bats looks reasonable. 

cond = data_player['타수'] > 50
data_player = data_player[cond].reset_index()    # move the MultiIndex levels back into columns
data_player

Only players with more than 50 at-bats are kept, and reset_index() turns the MultiIndex into ordinary columns so the frame is easier to work with. 

def cal_hit(df):
    '''
    - Batting average (타율): share of at-bats that end in a hit --> hits / at-bats
    - On-base percentage (출루율): share of chances in which the batter reaches base --> (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)
    - Slugging percentage (장타율): batting average weighted by bases gained --> total bases / at-bats
    - OPS: on-base percentage + slugging percentage
    Note: the new columns are added to df in place, and the same df is returned.
    '''
    
    df['타율'] = df['안타'] / df['타수']
    df['출루율'] = (df['안타'] + df['볼넷'] + df['사구']) / (df['타수'] + df['볼넷'] + df['사구'] + df['희비'])
    df['장타율'] = df['루타'] / df['타수']
    df['OPS'] = df['출루율'] + df['장타율']
    return df

Given a DataFrame, this function computes and adds the 타율, 출루율, 장타율, and OPS columns, which are the criteria for choosing the best players. 

player_stat = cal_hit(data_player)
player_stat
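
cal_hit writes the new columns into the frame it receives, so data_player and player_stat refer to the same object afterwards. If the input should stay untouched, one option is DataFrame.assign, which returns a new frame. The sketch below is a variant and not the class's code; the function name, the English column names, and the numbers are assumptions.

```python
import pandas as pd


def with_batting_rates(df):
    """Return a copy of df with avg, obp, slg and ops columns added."""
    return df.assign(
        avg=df["hits"] / df["at_bats"],
        obp=(df["hits"] + df["walks"] + df["hbp"])
        / (df["at_bats"] + df["walks"] + df["hbp"] + df["sac_flies"]),
        slg=df["total_bases"] / df["at_bats"],
        ops=lambda d: d["obp"] + d["slg"],  # d already contains obp and slg
    )


# Hypothetical season totals for two players.
totals = pd.DataFrame({
    "name": ["Kim", "Park"],
    "at_bats": [400, 500],
    "hits": [120, 140],
    "total_bases": [200, 210],
    "walks": [50, 30],
    "hbp": [5, 5],
    "sac_flies": [5, 5],
})

stats = with_batting_rates(totals)
print(stats[["name", "avg", "obp", "slg", "ops"]].round(3))
print("ops" in totals.columns)  # the input frame has no new columns
```

Either style works for this analysis; the non-mutating version mainly helps when the aggregated table is reused later without the rate columns.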

 

- Sorting by on-base percentage -> slugging percentage -> OPS -> batting average to pick the 10 best players 

player_stat = player_stat.sort_values(by = ['출루율','장타율','OPS', '타율'], ascending = False)
player_stat = player_stat.reset_index(drop = True)
player_stat.head(10)

Sorting leaves the index values out of order, so reset_index(drop = True) renumbers the rows from 0. Because 출루율 is the first sort key, the later keys only break ties between players with the same on-base percentage. 
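
To see how a multi-key sort and reset_index(drop=True) behave, here is a minimal sketch on a toy table. The players, column names, and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stats: A and B are tied on OBP, C has the best SLG but a lower OBP.
stats = pd.DataFrame({
    "name": ["A", "B", "C"],
    "obp": [0.400, 0.400, 0.380],
    "slg": [0.450, 0.500, 0.600],
})

ranked = stats.sort_values(by=["obp", "slg"], ascending=False)
print(ranked.index.tolist())   # [1, 0, 2]: the original row labels, now out of order

ranked = ranked.reset_index(drop=True)   # renumber the rows 0, 1, 2
print(ranked["name"].tolist())           # ['B', 'A', 'C']
```

C stays last despite the highest slugging percentage, because the second key is consulted only for the A and B tie on the first key.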

 

In conclusion, the best players of the KBO 2019 season, ranked by on-base percentage, are the following.

player_stat['이름'][:10]

 

 

 

 


 Distribution of on-base percentage by team 
import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# font setup so Korean text renders in figures
if platform.system() == 'Windows':  # Windows: Malgun Gothic
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # otherwise assumed to be macOS: AppleGothic
    rc('font', family='AppleGothic')

# draw the minus sign as an ASCII hyphen so it still renders with fonts that lack the Unicode minus glyph
matplotlib.rcParams['axes.unicode_minus'] = False

First, set a font so that Korean text does not render as broken glyphs in the plots, and configure how the minus sign is drawn. 

 

 

- On-base percentage distribution by team ( boxplot & swarmplot ) 

sns.boxplot(data = player_stat, x = '팀', y = '출루율')

To also see how many players each team has, add a swarmplot. 

sns.swarmplot(data = player_stat, x = '팀', y = '출루율')
sns.boxplot(data = player_stat, x = '팀', y = '출루율')

The colors of the two plots overlap, so pass a few options to boxplot to make the overlay easier to read. 

sns.swarmplot(data = player_stat, x = '팀', y = '출루율')
sns.boxplot(data = player_stat, x = '팀', y = '출루율',
            showcaps=False,             # hide the horizontal caps at the whisker ends
            whiskerprops={'linewidth':0}, # hide the vertical whisker lines
            showfliers=False,           # do not draw outliers beyond the whiskers
            boxprops={'facecolor':'None'}, # remove the box fill color
           )

This shows the distribution of players' on-base percentage for each team. At a glance, NC has the player with the highest on-base percentage, LG's players appear to be the most tightly clustered, and Doosan has the highest median on-base percentage. 
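
What is read off the plot can be cross-checked numerically with groupby. The sketch below uses a made-up table with English column names (assumptions); on the real data the same call would use the '팀' and '출루율' columns of player_stat.

```python
import pandas as pd

# Hypothetical per-player OBP values for three teams.
players = pd.DataFrame({
    "team": ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"],
    "obp":  [0.30, 0.35, 0.44, 0.33, 0.34, 0.35, 0.36, 0.38, 0.40],
})

summary = players.groupby("team")["obp"].agg(["count", "median", "max", "std"])
print(summary.round(3))

best_single = summary["max"].idxmax()      # team with the single best player
best_median = summary["median"].idxmax()   # team with the highest median
tightest = summary["std"].idxmin()         # team whose players are most tightly clustered
print(best_single, best_median, tightest)
```

Using the standard deviation as the measure of "tightly clustered" is one possible choice; the interquartile range would match the box height more directly.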

 

 

 

 


 Wrap-up: saving to a CSV file 
file = './data/player_stat.csv'
player_stat.to_csv(file, encoding = 'cp949', index = False)

This saves the player_stat DataFrame used for the analysis in this post to a CSV file. 


 

 

Learning Spoons class notes

< Previous post >

https://silvercoding.tistory.com/52

[python visualization] 1. seaborn library (distplot, relplot, jointplot, pairplot, boxplot, swarmplot, heatmap)


 

 

 

 

1. Trying out the folium library 


 folium library 
!pip install folium

Install the folium library. 

import folium

Import the folium library. 


 How to visualize maps with folium 

Map visualization

  • Create a map
    • m = folium.Map(location = [latitude, longitude], zoom_start = zoom level)
  • Add information
    • Add a marker
      • folium.Marker([latitude, longitude]).add_to(m)
    • Add a circle
      • folium.CircleMarker([latitude, longitude], radius = circle size).add_to(m)
    • Extra options:
      • tooltip="text shown on mouse hover"
      • popup="text shown on click"
    • Other) folium.ClickForMarker('check').add_to(m) adds a marker wherever the map is clicked

 Visualizing a map of Seoul Station
m = folium.Map(location=[37.5536067,126.9674308],  
               zoom_start=12)
m

 

- Adding a marker 

folium.Marker([37.5536067,126.9674308],    # location of Seoul Station
              tooltip="Seoul Station (shown on hover)",
              popup="Seoul Station (shown on click)",
              ).add_to(m)
m

With tooltip and popup, a message appears when the mouse hovers over the marker or clicks it. 

 

- Adding a circle marker 

folium.CircleMarker([37.5536067,126.9674308],
                    radius=20,
                    tooltip = 'shown on hover'
               ).add_to(m)
m

 

- Adding a minimap (MiniMap) 

from folium.plugins import MiniMap

# create the map
m = folium.Map(location=[37.5536067,126.9674308],   # reference coordinates: Seoul Station
               zoom_start=12)

# add the minimap
minimap = MiniMap() 
minimap.add_to(m)


# add a marker
folium.Marker([37.5536067,126.9674308],    # location of Seoul Station
              tooltip="Seoul Station (shown on hover)",
              popup="Seoul Station (shown on click)",
              ).add_to(m)
m

Create a MiniMap and attach it to the map with add_to. 

 

 

 - Adding markers by clicking 

# create the map
m = folium.Map(location=[37.5536067,126.9674308],   # reference coordinates: Seoul Station
               zoom_start=12)

# add a circle marker
folium.CircleMarker([37.5536067,126.9674308],
                    radius=20,
                    tooltip = 'shown on hover'
               ).add_to(m)
folium.ClickForMarker('check added').add_to(m)    
m

Markers can also be added directly by clicking on the map (using ClickForMarker). 


 

 

 

 

2. Application: building a map of Seoul shelter status 


< Data source > 

http://data.seoul.go.kr/dataList/OA-2189/S/1/datasetView.do

Seoul Open Data Plaza


import pandas as pd

pandas is imported because the data will be handled as a DataFrame. 

file = './data/서울시 대피소 방재시설 현황 (좌표계_ WGS1984).csv'
raw = pd.read_csv(file, encoding = 'cp949')   # encoding = 'cp949' for files saved by MS programs; otherwise encoding = 'utf-8' (the default)
raw.head()

raw.info()

 

The class covered adding markers at each latitude and longitude and showing the shelter name when the mouse hovers over a marker. Going one step further, this map also shows the maximum capacity when a marker is clicked. 

i = 7

lat = raw.loc[i, '위도']
long = raw.loc[i, '경도']
name = raw.loc[i,'대피소명칭']
num = raw.loc[i, '최대수용인원']

print(lat, long, name, num)

37.5365343 126.9650202 남정초등학교 300

First, pull out the needed fields (latitude, longitude, shelter name, maximum capacity) for the row at index 7. 

for i in range(len(raw)):
    lat = raw.loc[i, '위도']
    long = raw.loc[i, '경도']
    name = raw.loc[i,'대피소명칭']
    num = raw.loc[i, '최대수용인원']


    print(name, lat, long, num)

This should print 694 rows in total. 

# create the map
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)

# add a marker for every shelter
for i in range(len(raw)):
    lat = raw.loc[i, '위도']
    long = raw.loc[i, '경도']
    name = raw.loc[i,'대피소명칭']
    num = raw.loc[i, '최대수용인원']


    folium.Marker([lat, long],tooltip= name, popup='%d people'%num).add_to(m)    
m

A for loop places a marker at every shelter. 
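
The loop above looks up each field with raw.loc[i, ...], and '%d' formatting raises an error if a capacity value happens to be missing. As a hedged alternative, the sketch below prepares the marker arguments with itertuples and handles missing values. The table, the English column names, and the helper marker_specs are all assumptions for illustration; whether the real file contains missing values is not known here.

```python
import pandas as pd

# Hypothetical shelter table; the real file uses Korean column names.
shelters = pd.DataFrame({
    "lat": [37.5365, 37.5512, 37.5601],
    "lon": [126.9650, 126.9882, 126.9723],
    "name": ["School A", "Community Center B", "Gym C"],
    "capacity": [300, None, 120],
})


def marker_specs(df):
    """Yield (location, tooltip, popup) per row, skipping rows without coordinates."""
    for row in df.dropna(subset=["lat", "lon"]).itertuples(index=False):
        if pd.isna(row.capacity):
            popup = "capacity unknown"
        else:
            popup = f"{int(row.capacity)} people"
        yield [row.lat, row.lon], row.name, popup


specs = list(marker_specs(shelters))
center = [shelters["lat"].mean(), shelters["lon"].mean()]  # data-driven map center

for location, tooltip, popup in specs:
    print(location, tooltip, popup)
```

Each tuple can then be passed on as folium.Marker(location, tooltip=tooltip, popup=popup).add_to(m), and center can replace the hard-coded Seoul Station coordinates in folium.Map if the map should be centered on the data.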

 


* Problem: there are so many markers that the map is hard to read 

* Solution: use MarkerCluster to group nearby markers together. 


 Creating groups with MarkerCluster 
from folium.plugins import MarkerCluster

# create the map
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)  # add the cluster layer

# add shelter markers to the cluster
for i in range(len(raw)):
    lat = raw.loc[i, '위도']
    long = raw.loc[i, '경도']
    name = raw.loc[i,'대피소명칭']
    num = raw.loc[i, '최대수용인원']


    folium.Marker([lat, long],tooltip= name, popup='%d people'%num).add_to(marker_cluster)    
m

Clicking a group zooms in until the individual markers become visible, which makes the map much easier to read. 

 

 

- Adding a minimap 

from folium.plugins import MarkerCluster, MiniMap

# create the map
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)  # add the cluster layer


# add the minimap
minimap = MiniMap()
m.add_child(minimap) 


# add shelter markers to the cluster
for i in range(len(raw)):
    lat = raw.loc[i, '위도']
    long = raw.loc[i, '경도']
    name = raw.loc[i,'대피소명칭']
    num = raw.loc[i, '최대수용인원']


    folium.Marker([lat, long],tooltip= name, popup='%d people'%num).add_to(marker_cluster)    
m


 Wrap-up: saving the map (html) 
m.save('./data/Sheltermap.html')

Saving the map as HTML makes it possible to open it on a large screen at any time. 

 


 

Learning Spoons class notes 

Numeric x numeric: scatterplot, lmplot, jointplot 

Numeric x categorical: boxplot, violinplot, barplot, heatmap 

Numeric x location: folium library 

This post uses distplot, relplot, jointplot, pairplot, boxplot, swarmplot, and heatmap from the seaborn library. 

 

 


 Loading & inspecting the data 
import seaborn as sns

Import seaborn under its usual alias, sns. 

raw = sns.load_dataset('tips')

load_dataset loads one of seaborn's example datasets. The data comes back as a DataFrame, so it can be analyzed with pandas. 

raw.head()

total_bill (total bill), tip (tip), sex (sex), smoker (smoker or not), day (day of the week), time (meal time), size (party size) 

raw.info()

info() is used to check the number of rows and columns, the data types, and missing values. 

 


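 Looking at a distribution (numeric) 
  • sns.distplot( df[ 'column name' ] )
raw['total_bill']

Look at the distribution of total_bill, a numeric column. 

sns.distplot(raw['total_bill'])

Note that distplot has been deprecated since seaborn 0.11; in newer versions sns.histplot(raw['total_bill'], kde = True) draws a comparable figure.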

 

 

 

 Looking at a distribution ( numeric vs numeric ) 
  • relplot( data=df, x=, y=, hue=, kind='scatter' ) 

: shows how two numeric variables are distributed together 

- kind = 'scatter' (default)

- kind = 'line' 

sns.relplot(x = 'tip', y = 'total_bill', data = raw)  # kind defaults to "scatter" when omitted

sns.relplot(x = 'tip', y = 'total_bill', data = raw, kind = 'line')

sns.relplot(x = 'tip', y = 'total_bill', data = raw, hue = 'sex')

 

Setting hue to sex colors the points by sex so the two groups can be compared. 

 


 Looking at relationships (numeric vs numeric) 
  • jointplot ( data=df, x=, y=, kind='scatter' ) 

- kind = 'scatter' : points (default) 

- kind = 'reg' : points + regression line

- kind = 'kde' : kernel density estimate, drawn as contour lines like a topographic map

 

sns.jointplot(data = raw, x = 'tip', y = 'total_bill')  # kind defaults to 'scatter' when omitted

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'kde')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'reg')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'hex')

kind='hex' is similar to kde, but the density is shown as non-overlapping regular hexagons. 

 

 

 

  • pairplot( data=df ) 

: visualizes the pairwise relationships between all numeric columns of df 

 

sns.pairplot(data = raw)

Where a column is paired with itself, a histogram is drawn; every other cell shows the relationship between two numeric columns as a scatter plot. 

sns.pairplot(data = raw, hue = 'sex')

Setting hue to sex shows how the distributions differ between the sexes. 

 

 


 Looking at a distribution (numeric vs categorical) 
  • boxplot( data = df, x = , y = , hue = )
sns.boxplot(data = raw, x = 'day', y = 'tip')

This shows the distribution of tips for each day of the week. 

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')

Setting hue to smoker splits each day's distribution by whether the customer smokes. 

 

 

 

  • swarmplot( data=, x=, y=, hue=, dodge= )

swarmplot is similar to boxplot. A boxplot makes it easy to see the range of the data within one category, but it does not show how many data points there are, so comparing categories with different numbers of observations can be misleading. swarmplot is used in that case because it draws every point. 
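
The point about counts can be seen in numbers as well. In this sketch the two days have the same median, which is what a box would show, but very different numbers of observations. The data is made up for illustration and is not taken from the tips dataset.

```python
import pandas as pd

# Hypothetical tips: Thursday has only 2 bills, Saturday has 6.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sat", "Sat", "Sat", "Sat", "Sat", "Sat"],
    "tip": [2.0, 4.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
})

summary = tips.groupby("day")["tip"].agg(["count", "median"])
print(summary)
```

Both days have a median tip of 3.0, but Thursday's figure rests on two bills only. A swarmplot makes that difference visible because each bill is drawn as its own point.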

sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

With dodge=False, the two hue colors are drawn on top of each other. 

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')
sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

boxplot and swarmplot can also be drawn on top of each other. 

sns.boxplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

 

 

 

 

  • barplot( data=df, x=, y=, hue= ) 
sns.barplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

 

 

 

 Looking at the data (numeric vs categorical vs categorical)
  • heatmap( data = df )
df = raw.pivot_table(index = 'day', columns = 'size', values = 'tip', aggfunc='mean')
df

First, build a table of the mean tip by day and party size, with day as the index, size as the columns, and tip as the values. 
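
To make the shape of that table concrete, here is a minimal sketch with five made-up bills. The numbers are assumptions; only the pivot_table call mirrors the one above.

```python
import pandas as pd

# Hypothetical bills: two days, party sizes 2 and 4.
bills = pd.DataFrame({
    "day":  ["Sat", "Sat", "Sat", "Sun", "Sun"],
    "size": [2, 2, 4, 2, 4],
    "tip":  [2.0, 3.0, 5.0, 4.0, 6.0],
})

table = bills.pivot_table(index="day", columns="size", values="tip", aggfunc="mean")
print(table)
```

The two Saturday bills for a party of 2 are averaged into a single cell (2.5), and every cell of the resulting day-by-size grid is what the heatmap colors.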

sns.heatmap(data = df)

A heatmap expresses the relationship between two categorical variables through a numeric value. 

 

Several options can be set, as follows. 

sns.heatmap(data = df, 
           annot = True, fmt = '.2f')

 

sns.heatmap(data = df, 
           annot = True, fmt = '0.2f',
           cmap = 'Pastel1')

* annot : show the numbers, fmt : number format, cmap : color map 

* recommended cmap values : Reds, Blues, vlag, Pastel1, RdBu_r


 

 
