Notes from a Learning Spoons class

 

 

< Previous post >

https://silvercoding.tistory.com/58

 

[Visualization Analysis Project] 2-2 Which days have the most subway passengers?


 


 Loading and exploring the data
import pandas as pd
raw = pd.read_excel('./data/subway_raw.xlsx')

Load the January to June 2019 subway data that was merged in post 2-1.

raw.info()

There are 99,342 rows in total, and the output confirms that no column contains null values.
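`raw.info()` shows the row count and the non-null count per column. As a minimal sketch on a made-up three-row frame (the values are hypothetical, not taken from the real dataset), the same check can be made explicit with `isnull().sum()`:

```python
import pandas as pd

# Toy stand-in for the subway data; the values are made up for illustration.
toy = pd.DataFrame({
    '역명': ['서울역', '시청', '종각'],
    '승차총승객수': [100, 50, None],
})

# Count missing values per column; a column without nulls reports 0.
null_counts = toy.isnull().sum()
print(null_counts)
```

On the real data, `raw.isnull().sum()` should report 0 for every column if the `info()` output above is read correctly.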

 

 

 


 At which stations, and when, do people ride the subway the most?

1. Stations with the most boarding passengers

data_station = raw.pivot_table(index = '역명', values = '승차총승객수', aggfunc='sum')
data_station = data_station.sort_values(by = '승차총승객수', ascending = False)
data_station.head(10)  # top 10 stations by total boarding passengers

This pivot table sums the total boarding passengers (승차총승객수) per station and sorts the result in descending order. The top 10 stations are printed, and they show that Jamsil Station had the most passengers in the first half of 2019.
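A minimal sketch of the same aggregate-then-sort pattern on made-up numbers (the station totals below are hypothetical). It also shows the equivalent `groupby` form, which returns a Series instead of a one-column DataFrame:

```python
import pandas as pd

# Toy data; the counts are invented for illustration only.
toy = pd.DataFrame({
    '역명': ['잠실', '강남', '잠실', '강남', '시청'],
    '승차총승객수': [30, 20, 40, 25, 10],
})

# Sum boardings per station, then sort from largest to smallest.
by_station = toy.pivot_table(index='역명', values='승차총승객수', aggfunc='sum')
by_station = by_station.sort_values(by='승차총승객수', ascending=False)

# groupby produces the same totals as a Series.
by_station_gb = toy.groupby('역명')['승차총승객수'].sum().sort_values(ascending=False)

print(by_station.head(2))
```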

 

 

 

 

2. Visualization: comparing passenger counts by station and day of week for each line (Lines 1-9)


* Heatmap

  • sns.heatmap(data, annot = True, fmt = '.0f', cmap = "RdBu_r")
    • annot : if True, each cell's value is drawn on the plot
    • fmt : format string for the displayed values
      • ex) 'f' : fixed-point notation (six decimal places by default)
      • ex) '.0f' : fixed-point with zero decimal places (rounded to a whole number)
      • ex) '.1f' : fixed-point with one decimal place
      • ex) '.1%' : percentage with one decimal place
    • cmap : colormap. Names ending in _r are the same colormap with the color order reversed (see the list below)
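The `fmt` strings above are ordinary Python format specifications, so their effect can be checked without drawing a plot. A small sketch with made-up numbers (`value` and `ratio` are hypothetical):

```python
value = 3.14159
ratio = 0.256

as_f = format(value, 'f')       # fixed-point, six decimal places by default
as_0f = format(value, '.0f')    # rounded to a whole number
as_1f = format(value, '.1f')    # one decimal place
as_pct = format(ratio, '.1%')   # multiplied by 100 and suffixed with %

print(as_f, as_0f, as_1f, as_pct)  # 3.141590 3 3.1 25.6%
```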

* Available cmap names

Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r, magma, magma_r, mako, mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, rocket, rocket_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r, viridis, viridis_r, vlag, vlag_r, winter, winter_r
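The names available in a given environment can be listed directly instead of relying on a fixed list. This is a sketch assuming a reasonably recent matplotlib; names such as `rocket`, `mako`, `icefire`, and `vlag` come from seaborn and should only appear after seaborn has been imported.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

names = plt.colormaps()            # names of the registered colormaps
reds = plt.get_cmap("Reds")
reds_r = plt.get_cmap("Reds_r")

print(len(names), "colormaps registered")
# The low end of the reversed map should match the high end of the original
# (up to floating-point rounding).
print(reds(1.0), reds_r(0.0))
```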


- Visualizing Line 1 only

line = '1ํ˜ธ์„ '
data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]

# ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”: ๋…ธ์„ ์˜ ์—ญ ์ˆœ์„œ์— ๋งž์ถฐ ์ •๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์—ญID๋„ ์ธ๋ฑ์Šค์— ํฌํ•จ
df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum') 
df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]   # ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์š”์ผ์— ๋งž๊ฒŒ ์ •๋ฆฌ
df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ „์ฒด๋ฅผ 1๋งŒ์œผ๋กœ ๋‚˜๋ˆ„๊ธฐ
df_pivot

This pivot table aggregates the total boarding passengers by station and day of week. The day-of-week columns come out in a jumbled order, so they are selected again in calendar order.
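The jumbled order comes from `pivot_table` sorting the column labels as strings. A minimal sketch on made-up counts (the numbers are hypothetical) shows the default order and the reordering step:

```python
import pandas as pd

weekday_order = ['월', '화', '수', '목', '금', '토', '일']

# Toy data: two stations, one made-up count per day of week.
toy = pd.DataFrame({
    '역명': ['서울역'] * 7 + ['시청'] * 7,
    '요일': weekday_order * 2,
    '승차총승객수': [10, 20, 30, 40, 50, 60, 70, 1, 2, 3, 4, 5, 6, 7],
})

pivot = toy.pivot_table(index='역명', columns='요일', values='승차총승객수', aggfunc='sum')
default_order = list(pivot.columns)   # sorted as strings, not by calendar order

# Selecting the columns with an explicit list restores Monday-to-Sunday order.
pivot = pivot[weekday_order]
print(default_order)
print(list(pivot.columns))
```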

import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import font_manager, rc
import platform 

# Use a font that can render Korean text
if platform.system() == 'Windows': 
    path = 'c:/Windows/Fonts/malgun.ttf'
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
elif platform.system() == 'Darwin':
    rc('font', family='AppleGothic')
fig, ax = plt.subplots( figsize=(6,5) )   # set the figure size
plt.title(f"{line} 역별/요일별 승객수", fontsize = 20) # title: passengers by station and day of week
sns.heatmap(df_pivot, cmap = "Reds", 
           annot = True, fmt = '.0f')

This is the heatmap of Line 1 passenger counts by station and day of week. Seoul Station and Jonggak Station appear to have the most passengers, and Seoul Station on Fridays stands out as the highest of all.

 

 

Let's visualize every line the same way and compare them.

 

- Visualizing and comparing Lines 1 through 9

raw['노선명'].unique()

The data contains many lines, but this post visualizes only Lines 1 through 9.

line_seoul_list = [ ]
for line in raw['노선명'].unique():
    if line[1:] == '호선':    # keep names of the form "x호선" (one character followed by 호선)
        line_seoul_list.append(line)
line_seoul_list
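The slice check above accepts any single leading character. As an alternative sketch, a regular expression can require that character to be a digit from 1 to 9. The line names below are hypothetical stand-ins for the values returned by `raw['노선명'].unique()`:

```python
import re

# Hypothetical line names; the real ones come from raw['노선명'].unique().
line_names = ['1호선', '2호선', '9호선', '경의선', '분당선', '9호선2~3단계']

# fullmatch requires the whole name to be one digit followed by 호선.
seoul_lines = [name for name in line_names if re.fullmatch(r'[1-9]호선', name)]
print(seoul_lines)  # ['1호선', '2호선', '9호선']
```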

for line in sorted(line_seoul_list):
    
    # ๋ฐ์ดํ„ฐ ์ •๋ฆฌํ•˜๊ธฐ
    data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]
    df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum')
    df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]
    df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ์ˆ˜์ •
    
    
    # ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
    fig, ax = plt.subplots( figsize=(6,len(df_pivot)/3 ) )   # ๊ทธ๋ž˜ํ”„ ์‚ฌ์ด์ฆˆ๋ฅผ ์กฐ์ •ํ•˜์—ฌ, ์—ญ ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ๋Š” ์„ธ๋กœ๋ฅผ ๊ธธ๊ฒŒ ํ‘œํ˜„
    plt.title(f"{line} ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜", fontsize = 20) # for title
    sns.heatmap(df_pivot, cmap = "Reds", 
               annot = True, fmt = '.0f')

 

The darker the red, the more passengers. Stations with many passengers tend to be dark on every day of the week. The previous post showed that, overall, Fridays have the most passengers and counts drop on weekends, but the per-station view reveals some stations where weekends are busier (e.g. Itaewon, Express Bus Terminal, Hongik Univ.).
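Stations where weekends are busier can also be picked out numerically rather than by eye. A minimal sketch on a made-up pivot table (the station names `역A` and `역B` and all of the counts are hypothetical), comparing the mean of the weekday columns with the mean of the weekend columns:

```python
import pandas as pd

days = ['월', '화', '수', '목', '금', '토', '일']

# Made-up counts in units of 10,000 riders; not taken from the real data.
toy_pivot = pd.DataFrame(
    [[10, 10, 10, 10, 12, 6, 5],   # 역A: busier on weekdays
     [4, 4, 4, 4, 6, 9, 8]],       # 역B: busier on weekends
    index=['역A', '역B'], columns=days,
)

weekday_mean = toy_pivot[['월', '화', '수', '목', '금']].mean(axis=1)
weekend_mean = toy_pivot[['토', '일']].mean(axis=1)

# Keep only the stations whose weekend average exceeds the weekday average.
weekend_heavy = weekend_mean[weekend_mean > weekday_mean].index.tolist()
print(weekend_heavy)  # ['역B']
```

Applied to the real `df_pivot`, this compares six-month totals per day of week, so it assumes each day of the week occurs roughly equally often in the period.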
