์‚ฌ์šฉ ๋ฐ์ดํ„ฐ์…‹ 

https://www.data.go.kr/dataset/3035522/fileData.do

ํ˜„์žฌ ์ด ๋ฐ์ดํ„ฐ์…‹์€ ํ๊ธฐ ๋˜์—ˆ๋‹ค๊ณ  ๋‚˜์˜จ๋‹ค. 

 

์œ„ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ธํ”„๋Ÿฐ๊ฐ•์˜ (๊ณต๊ณต๋ฐ์ดํ„ฐ๋กœ ํŒŒ์ด์ฌ ๋ฐ์ดํ„ฐ ๋ถ„์„) (https://bit.ly/3sISk6Z) ์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ์ „์ฒ˜๋ฆฌํ•œ ๋ฐ์ดํ„ฐ๋กœ ์‹œ๊ฐํ™” ์ •๋ฆฌ ์ง„ํ–‰ํ•œ๋‹ค. 

์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ

 

cf1) figure, axes ์ƒ์„ฑ

fig=plt.figure(figsize=(10,3), dpi=100)
ax1=fig.subplots()

 

cf2) ๋ชจ๋“  x tick ํ‘œํ˜„ํ•˜๊ธฐ 

_=plt.xticks(ticks=np.arange(len(df)), labels=df.index)

 

cf3) x์ถ• ์†Œ์ˆ˜์  ์ œ๊ฑฐ

from matplotlib.ticker import MaxNLocator
ax1.xaxis.set_major_locator(MaxNLocator(integer=True))

 

(cf4) ๊ทธ๋ž˜ํ”„์˜ ๋ฐ–์— Legend ํ‘œ์‹œํ•˜๋„๋ก ์„ค์ •

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

 

 

 

 lineplot 


1. pandas plot

(1) pandas plot์˜ ๊ธฐ๋ณธ plot - lineplot 

- df์˜ index ๋˜๋Š” column ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ ค์ง 

df.plot(figsize=(10,3))

์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ , gropuby ์‚ฌ์šฉ

cf) ๋ชจ๋“  x tick ํ‘œํ˜„ํ•˜๊ธฐ 

_=plt.xticks(ticks=np.arange(len(g)), labels=g.index)

- df ์˜ column์ด ์—ฌ๋Ÿฌ ๊ฐœ ์กด์žฌํ•  ๋•Œ  (df์˜ column์ด seaborn์˜ hue์—ญํ• )

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

2. seaborn plot 

sns.lineplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ง€์—ญ๋ช…", ci=None, ax=ax1)
ax1.legend(bbox_to_anchor=(1.02, 1), loc=2)

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 pointplot 

sns.pointplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ง€์—ญ๋ช…", ci=None, ax=ax2)
ax2.legend(bbox_to_anchor=(1.02, 1), loc=2)

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 

 barplot 


1. pandas plot 

(1) df.plot(kind='bar')

- df์˜ index ๋˜๋Š” columm ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ ค์ง

df.plot.bar(rot=0, figsize=(10, 3))
# or
df.plot(kind='bar',rot=0, ax=ax1)

์ง€์—ญ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ , groupby ์‚ฌ์šฉ

(2) df.plot.bar()

df.plot.bar(color='g',rot=0, figsize=(10,3)) # cmap='Pastel1' ๋˜ํ•œ ๊ฐ€๋Šฅ

์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

- column ์—ฌ๋Ÿฌ๊ฐœ์ผ ๋•Œ ( df์˜ column์ด seaborn์˜ hue์™€ ๊ฐ™์€ ์—ญํ• )

ax=df2.plot.bar(figsize=(10,3), rot=0)
ax.set_ylabel('ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')

์ง€์—ญ๋ณ„ ์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

2. seaborn plot 

sns.barplot(data=df, x="์ง€์—ญ๋ช…", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")
# estimator default: mean
# color changable
# palette (https://seaborn.pydata.org/tutorial/color_palettes.html)
# ci: bootstrap resampling (with replacement), sorted means

palette ์ƒ‰ ๋ชจ์Œ ๋งํฌ

์ง€์—ญ๋ณ„ ํ‰๋‹น ๋ถ„์–‘๊ฐ€๊ฒฉ (ํ™•์‹คํžˆ seaborn์ด ๋” ์˜ˆ์˜๊ธด ํ•˜๋‹ค)

 

- hue ์ง€์ • 

sns.barplot(data=df, x="์ง€์—ญ๋ช…", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue='์—ฐ๋„', ci=None)

 

 

 histplot 


1. pandas plot

(1) df.plot(kind='hist') or df.plot.hist()

df.plot(kind='hist', figsize=(10, 3), title='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')
# or
ax=df.plot(kind='hist', figsize=(10, 3))
ax.set_title('ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')

ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ๋ถ„ํฌ

df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].plot.hist(bins=50)

 

 

(2) df.hist(bins=)

df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].hist(bins=50)

axs=df.hist(bins=50, figsize=(10,10))
ax1,ax2,ax3,ax4=axs.flatten()
ax2.set_title('ax๋ณ„ ์ œ๋ชฉ ์ง€์ • ๊ฐ€๋Šฅ')

 

 

 

 

 

2. seaborn plot 

sns.histplot(df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"], kde=True)

 

 

 

 kdeplot 


1. seaborn plot 

sns.kdeplot(data=df['ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ'])

sns.kdeplot(data=df[['ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ','๋ถ„์–‘๊ฐ€๊ฒฉ']])

 

 

 

 

 

 boxplot 


1. pandas plot

(1) df.plot(kind='box')

df.plot(kind='box', figsize=(5, 5))

 

(2) df.plot.box()

- df ์˜ column์ด x์ถ• 

df.plot.box(fontsize=15)

 

์›”๋ณ„ ์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- ์ด์ค‘ column์ผ ๊ฒฝ์šฐ 

df.plot.box(figsize=(15, 3), rot=30)

์›”๋ณ„ ์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

(3) df.boxplot(column='', by='')

- by: x์ถ• 

df.boxplot(column='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ',by='์—ฐ๋„', figsize=(5,3), rot=30)

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- by๊ฐ€ ๋ฆฌ์ŠคํŠธ์ผ ๋•Œ 

df.boxplot(column='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ',by=['์—ฐ๋„','์ „์šฉ๋ฉด์ '], figsize=(20,3), rot=30)

์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

2. seaborn plot 

sns.boxplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

- hue ์ง€์ •

plt.figure(figsize=(12, 3))
sns.boxplot(data=df_last, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ „์šฉ๋ฉด์ ")

 

 

 violinplot 

1. seaborn plot 

sns.violinplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- hue ์ง€์ •

plt.figure(figsize=(12, 3))
sns.violinplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ „์šฉ๋ฉด์ ")

์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 heatmap 

1. seaborn plot 

plt.figure(figsize=(15, 7), dpi=100)
ax=sns.heatmap(df, cmap="Blues", annot=True, fmt=".0f")

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ, pivot_table๋กœ ์ „์ฒ˜๋ฆฌ ํ•ด์ค€ df์— ์ ์šฉํ•ด์•ผ ํ•จ

 

 

2. matplotlib pcolor  

fig=plt.figure(figsize=(15,5), dpi=100)
ax=fig.subplots()

t2=t.iloc[::-1]
t2
hm1=ax.pcolor(t2, cmap="Blues")
_=fig.colorbar(hm1, ax=ax)

col_len=len(t2.columns)
row_len=len(t2.index)
for r in range(row_len):
    for c in range(col_len):
        _=ax.text(c+0.5, r+0.5, int(t2.iloc[r, c]),ha="center", va="center", color="k", fontsize=11)

_=ax.set_xticks(np.arange(col_len)+0.5)
_=ax.set_xticklabels(t2.columns)

_=ax.set_yticks(np.arange(row_len)+0.5)
_=ax.set_yticklabels(t2.index)

 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/60?category=965020 

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 3-1 open API ์‹ ์ฒญ & ํ™œ์šฉ (์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ)

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/59 https://silvercoding.tistory.com/58 https://silvercoding.tistory.com/57 https://silvercoding.tistory.com/56 https://silvercoding...

silvercoding.tistory.com

 

 


์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ์ด์ „ ๊ธ€์—์„œ ์ƒ์„ฑํ•œ ํŒŒ์ผ๊ณผ folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์„œ์šธ์‹œ ๋”ฐ๋ฆ‰์ด ์ง€๋„๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. 

 

 

<folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๊ด€๋ จ ํฌ์ŠคํŒ…> 

https://silvercoding.tistory.com/53?category=965020 

 

[python ์‹œ๊ฐํ™”] 2. ์„œ์šธ์‹œ ๋Œ€ํ”ผ์†Œ ํ˜„ํ™ฉ ์ง€๋„ ๋งŒ๋“ค๊ธฐ , ์ง€๋„ ์‹œ๊ฐํ™” ( folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ )

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot, jointplot, pairplot, boxplot, swarmplot, heatmap) ๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ..

silvercoding.tistory.com


๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

import pandas as pd
data = pd.read_excel('./data/bicycle.xlsx')
data.head()

 

 

 

 

 

 


์„œ์šธ์‹œ ๋”ฐ๋ฆ‰์ด ์ง€๋„ ์‹œ๊ฐํ™” 

import folium

 

- ์ง€๋„ ์ƒ์„ฑ 

m = folium.Map(location = ['37.5536067','126.9674308'], zoom_start = 13)   # ์„œ์šธ์—ญ ์ค‘์‹ฌ
m

์„œ์šธ์—ญ์˜ ์œ„๋„, ๊ฒฝ๋„๋ฅผ ์ด์šฉํ•˜์—ฌ ์ง€๋„๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. zoom_start ์ธ์ž๋ฅผ ์ด์šฉํ•˜์—ฌ ํ™•๋Œ€ ์ •๋„๋ฅผ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ์ง€๋„์— ํ‘œ์‹œํ•  ๋ฐ์ดํ„ฐ ํ™•์ธ 

for i in range(len(data)):
    name = data.loc[i, 'stationName']
    available = data.loc[i, 'parkingBikeTotCnt']
    total = data.loc[i, 'rackTotCnt']
    lat = data.loc[i, 'stationLatitude']
    long = data.loc[i, 'stationLongitude']
    print(name, available, total, lat, long)

์ง€๋„์ƒ์„ฑ์— ํ•„์š”ํ•œ ์ปฌ๋Ÿผ์˜ ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœํ•˜์—ฌ ์ถœ๋ ฅํ•œ ๊ฒƒ์ด๋‹ค.  ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋งˆ์ปค๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์ง€๋„๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. 

# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location = ['37.5536067','126.9674308'], zoom_start = 13)

# ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
for i in range(len(data)):
    lat = data.loc[i, 'stationLatitude']
    long = data.loc[i, 'stationLongitude']
    name = data.loc[i, 'stationName']
    available = int(data.loc[i, 'parkingBikeTotCnt'])
    total = int(data.loc[i, 'rackTotCnt'])
    
    # ์ž์ „๊ฑฐ ์ˆ˜๋Ÿ‰์— ๋Œ€ํ•ด ์ƒ‰์ƒ์œผ๋กœ ํ‘œ์‹œ
    ##  ์ž์ „๊ฑฐ ๋ณด์œ ์œจ์ด 50% ์ดˆ๊ณผ์ผ ๊ฒฝ์šฐ --> ํŒŒ๋ž€์ƒ‰
    ##  ํ˜„์žฌ ์ž์ „๊ฑฐ๊ฐ€ 2๋Œ€ ๋ณด๋‹ค ์ ์„ ๊ฒฝ์šฐ --> ๋นจ๊ฐ„์ƒ‰
    ##  ๊ทธ ์™ธ์˜ ๊ฒฝ์šฐ(์ž์ „๊ฑฐ 2๋Œ€ ์ด์ƒ ์ด๋ฉด์„œ, ์ž์ „๊ฑฐ ๋ณด์œ ์œจ 50% ๋ฏธ๋งŒ) --> ์ดˆ๋ก์ƒ‰
    if available/total > 0.5:
        color = 'blue'
    elif available < 2 :
        color = 'red'
    else:
        color = 'green'
    icon=folium.Icon(color=color, icon='info-sign')
    folium.Marker(location = [lat, long],
                 tooltip = f"{name} : {available}", 
                  icon = icon
             ).add_to(m)
m

ํ˜„์žฌ ์ž์ „๊ฑฐ ์ด์šฉ ํ˜„ํ™ฉ์„ ๋” ์ง๊ด€์ ์œผ๋กœ ๋ณด๊ธฐ ์œ„ํ•ด ์ƒ‰๊น” ์„ค์ •์„ ํ•ด์ค€๋‹ค. ํŒŒ๋ž€์ƒ‰์€ ๋Œ€์—ฌ ๊ฐ€๋Šฅ ์ž์ „๊ฑฐ 50% ์ด์ƒ, ๋นจ๊ฐ„์ƒ‰์€ 2๊ฐœ ๋ฏธ๋งŒ์ผ ๋•Œ, ์ดˆ๋ก์ƒ‰์€ ๊ทธ ์ด์™ธ์˜ ์ค‘๊ฐ„ ์ƒํƒœ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค. ๋˜ํ•œ, tooltip์„ ์‚ฌ์šฉํ•˜์—ฌ ๋งˆ์šฐ์Šค๋ฅผ ๊ฐ–๋‹ค๋Œ€๋ฉด ์ž์ „๊ฑฐ ๋Œ€์—ฌ์†Œ ์ด๋ฆ„๊ณผ ์ด์šฉ๊ฐ€๋Šฅํ•œ ์ž์ „๊ฑฐ ์ˆ˜๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. 

 

 

 

 


๋ฌธ์ œ์  : ๋ฐ์ดํ„ฐ๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์•„ ์ง€๋„๋ฅผ ๋ณด๋Š” ๋ฐ ์–ด๋ ค์›€์ด ์žˆ๋‹ค. 

ํ•ด๊ฒฐ๋ฐฉ์•ˆ : ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ทผ์ ‘ํ•œ ๋งˆ์ปค๋“ค๋ผ๋ฆฌ ์„œ๋กœ ๋ฌถ์–ด์ค€๋‹ค. 


ํด๋Ÿฌ์Šคํ„ฐ & ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ํ•˜์—ฌ ์ง€๋„ ์‹œ๊ฐํ™” 

from folium.plugins import MiniMap, MarkerCluster
# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m_ver2 = folium.Map(location = ['37.5536067','126.9674308'], zoom_start = 13)

# ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ํ•˜๊ธฐ
minimap = MiniMap() 
m_ver2.add_child(minimap)

# ๋งˆ์ปค ํด๋Ÿฌ์Šคํ„ฐ ๋งŒ๋“ค๊ธฐ
marker_cluster_ver2 = MarkerCluster().add_to(m_ver2)  # ํด๋Ÿฌ์Šคํ„ฐ ์ถ”๊ฐ€ํ•˜๊ธฐ

# ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
for i in range(len(data)):
    lat = data.loc[i, 'stationLatitude']
    long = data.loc[i, 'stationLongitude']
    name = data.loc[i, 'stationName']
    available = int(data.loc[i, 'parkingBikeTotCnt'])
    total = int(data.loc[i, 'rackTotCnt'])
    
    # ์ž์ „๊ฑฐ ์ˆ˜๋Ÿ‰์— ๋Œ€ํ•ด ์ƒ‰์ƒ์œผ๋กœ ํ‘œ์‹œ
    ##  ์ž์ „๊ฑฐ ๋ณด์œ ์œจ์ด 50% ์ดˆ๊ณผ์ผ ๊ฒฝ์šฐ --> ํŒŒ๋ž€์ƒ‰
    ##  ํ˜„์žฌ ์ž์ „๊ฑฐ๊ฐ€ 2๋Œ€ ๋ณด๋‹ค ์ ์„ ๊ฒฝ์šฐ --> ๋นจ๊ฐ„์ƒ‰
    ##  ๊ทธ ์™ธ์˜ ๊ฒฝ์šฐ(์ž์ „๊ฑฐ 2๋Œ€ ์ด์ƒ ์ด๋ฉด์„œ, ์ž์ „๊ฑฐ ๋ณด์œ ์œจ 50% ๋ฏธ๋งŒ) --> ์ดˆ๋ก์ƒ‰
    if available/total > 0.5:
        color = 'blue'
    elif available < 2 :
        color = 'red'
    else:
        color = 'green'
    icon=folium.Icon(color=color, icon='info-sign')
#     print(name, available, total, lat, long)
    folium.Marker(location = [lat, long],
                 tooltip = f"{name} : {available}", 
                  icon = icon
             ).add_to(marker_cluster_ver2)
m_ver2

ํด๋Ÿฌ์Šคํ„ฐ์™€ ๋ฏธ๋‹ˆ๋งต์„ ์ถ”๊ฐ€ํ•œ ์ง€๋„์ด๋‹ค. ์ˆซ์ž๋ฅผ ํด๋ฆญํ•˜๋ฉด ํ•ด๋‹น ์ง€์—ญ์œผ๋กœ ํ™•๋Œ€๋˜์–ด ๋” ํŽธ๋ฆฌํ•˜๊ณ  ์ง๊ด€์ ์ธ ์ง€๋„๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

m_ver2.save('./map/bicycle_clustermap.html')

์ง€๋„๋Š” html๋กœ ์ €์žฅํ•˜์—ฌ ์–ธ์ œ๋“  ๊บผ๋‚ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/59

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-3 ์Šน์ฐจ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์ง€ํ•˜์ฒ  ์—ญ ๋ถ„์„

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/58 https://silvercoding.tistory.com/57 https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding...

silvercoding.tistory.com

 

 


 API ์‹ ์ฒญํ•˜๊ธฐ 

1. ํšŒ์›๊ฐ€์ž…, ๋กœ๊ทธ์ธ ํ•˜๊ธฐ 

http://data.seoul.go.kr

 

์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ

๋ชจ๋“  ์„œ์šธ์‹œ๋ฏผ์„ ์œ„ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ์—์„œ ์„œ์šธ์‹œ์™€ ์—ฐ๊ณ„ ๊ธฐ๊ด€์ด ๊ณต๊ฐœํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„œ์šธ์‹œ์™€ ๊ด€๋ จ๋œ ๋‹ค์–‘ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•ด ๋ณด์„ธ์š”.

data.seoul.go.kr

 

 

2. ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ์˜ ์ธ์ฆํ‚ค ์‹ ์ฒญ ํด๋ฆญ 

๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” ์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ์˜ ๊ณต๊ณต์ž์ „๊ฑฐ ์‹ค์‹œ๊ฐ„ ๋Œ€์—ฌ์ •๋ณด API ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

 

๋‹ค์Œ์˜ ๋งํฌ์—์„œ API๋ฅผ ์‹ ์ฒญํ•  ์ˆ˜ ์žˆ๋‹ค. 

http://data.seoul.go.kr/dataList/OA-15493/A/1/datasetView.do

์ธ์ฆํ‚ค ์‹ ์ฒญ์„ ํด๋ฆญํ•œ๋‹ค. 

 

 

3. ๊ฐ€์ž… ์‹ ์ฒญ์„œ ์ž‘์„ฑ 

ํ™œ์šฉํ•  ๋ชฉ์ ์— ๋งž๊ฒŒ ์‹ ์ฒญ์„œ๋ฅผ ์ž‘์„ฑํ•ด ์ค€๋‹ค. ํ˜„์žฌ ๋ฐ์ดํ„ฐ๋ถ„์„ ํ•™์Šต์— ์‚ฌ์šฉํ•  ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ url์ด ์—†๋‹ค. ์ด ์ฒ˜๋Ÿผ ์‚ฌ์šฉurl์ด ์—†์„ ๊ฒฝ์šฐ์—๋Š” localhost๋ฅผ ์ž‘์„ฑํ•ด์ค€๋‹ค. 

 

 

4. ์ƒ๋‹จ [๋‚˜์˜ํ™”๋ฉด - ์ธ์ฆํ‚ค ๊ด€๋ฆฌ] 

๋‚˜์˜ํ™”๋ฉด - ์ธ์ฆํ‚ค ๊ด€๋ฆฌ์— ๋“ค์–ด๊ฐ„๋‹ค. 

 

 

5. ์ธ์ฆํ‚ค ๋ณต์‚ฌ 

์ธ์ฆํ‚ค ๋ณต์‚ฌ๋ฅผ ํ•œ๋‹ค. ์ด์ œ API์‹ ์ฒญ์ด ์™„๋ฃŒ๋˜์—ˆ๊ณ , ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์™€๋ณด์ž. 

 

 

 

 

 

 

 


 ๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

1. ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ import

import requests  # API๋ฅผ ์ด์šฉํ•ด ์ž๋ฃŒ๋ฅผ ๋ฐ›์•„์˜ค๊ธฐ ์œ„ํ•ด
import pandas as pd   # ์ž๋ฃŒ ์ €์žฅ, ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด

 

 

2. ์„œ์šธํŠน๋ณ„์‹œ ๊ณต๊ณต์ž์ „๊ฑฐ ์‹ค์‹œ๊ฐ„ ๋Œ€์—ฌ์ •๋ณด ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ 

- ํ˜ธ์ถœ

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

ํ•œ๋ฒˆ์— ์ตœ๋Œ€ 1000๊ฑด์„ ํ˜ธ์ถœํ•  ์ˆ˜ ์—†์œผ๋ฏ€๋กœ 2๋ฒˆ์— ๊ฑธ์ณ ํ˜ธ์ถœํ•œ ํ›„, ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ์น˜๋Š” ์ž‘์—…์„ ํ•ด์•ผํ•œ๋‹ค. 

 

 

- ์˜ˆ์‹œ

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

์œ„์˜ ์ƒ˜ํ”Œ URL์„ ์‚ฌ์šฉํ•˜์—ฌ API๋ฅผ ํ˜ธ์ถœํ•  ์ˆ˜ ์žˆ๋‹ค. (์ธ์ฆํ‚ค) ๋ถ€ํ„ฐ ๊ฐ’์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. 

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

์ธ์ฆํ‚ค, ์š”์ฒญํŒŒ์ผ ํƒ€์ž…, ์„œ๋น„์Šค๋ช…, ์š”์ฒญ์‹œ์ž‘์œ„์น˜, ์š”์ฒญ์ข…๋ฃŒ์œ„์น˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์š”์ฒญ์‹œ์ž‘์œ„์น˜์™€ ์š”์ฒญ์ข…๋ฃŒ์œ„์น˜๋ฅผ ์กฐ์ ˆํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํ˜ธ์ถœํ•˜๋ฉด ๋œ๋‹ค. 

 

 

- ์ถœ๋ ฅ๊ฐ’ 

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ ์ปฌ๋Ÿผ์ด ๋  ๋ณ€์ˆ˜๋“ค์ด๋‹ค. ๊ฑฐ์น˜๋Œ€๊ฐœ์ˆ˜, ๋Œ€์—ฌ์†Œ์ด๋ฆ„, ์ž์ „๊ฑฐ์ฃผ์ฐจ์ด๊ฑด์ˆ˜, ๊ฑฐ์น˜์œจ, ์œ„๋„, ๊ฒฝ๋„, ๋Œ€์—ฌ์†ŒID ๋กœ ์ด 7๊ฐœ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. 

 

 

3. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

- ์ธ์ฆํ‚ค ์ž…๋ ฅ 

apikey = ' '  # ๋ฐ›์€ key ๊ฐ’ ์ž…๋ ฅ

๋ณธ์ธ์˜ api ๊ฐ’์„ ๋„ฃ์–ด์ค€๋‹ค. 

 

 

- API ์š”์ฒญํ•˜๊ธฐ 

startnum = 1
endnum = 1000
url1 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
requests.get(url1)
# requests.get(url1).text
# requests.get(url1).content

requests.get() ์„ ์‚ฌ์šฉํ•˜๋ฉด ์‘๋‹ต์ด ์˜จ๋‹ค. ์ด ์™ธ์—๋„ text, content๋กœ ์•ˆ์˜ ๋‚ด์šฉ์„ ์‹คํ–‰ํ•ด๋ณผ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

- ์ž๋ฃŒ ์š”์ฒญ 

json1 = requests.get(url1).json()
json1

์ด๋ ‡๊ฒŒ json ํ˜•ํƒœ๋กœ ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ 1000๊ฐœ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. 

 

 

- ํ•„์š”ํ•œ ์ •๋ณด(ํ‚ค) ์„ ํƒ 

json1['rentBikeStatus'].keys()

ํ˜„์žฌ ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์œ„ํ•ด ํ•„์š”ํ•œ ์ •๋ณด๋Š” rentBikeStatus ์•ˆ์— ์žˆ๋Š” row ํ‚ค์˜ ๋ฐ์ดํ„ฐ์ด์ง€๋งŒ,  ๋ชจ๋“  ํ‚ค๋ฅผ ๊บผ๋‚ด๋ณธ๋‹ค.

json1['rentBikeStatus']['list_total_count']  # ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜

json1['rentBikeStatus']['RESULT']  # ์˜ค๋ฅ˜ ์—ฌ๋ถ€

json1['rentBikeStatus']['row']  # ์ž์ „๊ฑฐ ์ •๋ฅ˜์žฅ๋ณ„ ์ž์ „๊ฑฐ ํ˜„ํ™ฉ

 

 

 

- ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ ๋ณ€๊ฒฝ

raw1 = pd.DataFrame(json1['rentBikeStatus']['row'])
raw1.head()

 

 

- 1000 ~ 2000 & 2000 ~ 3000 ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ณ  ํ•ฉ๋ณ‘ํ•˜๊ธฐ 

startnum = 1001
endnum = 2000
url2 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
json2 = requests.get(url2).json()
raw2 = pd.DataFrame(json2['rentBikeStatus']['row'])
raw2.tail()

data_mid = raw1.append(raw2)
data_mid

 

๋ฐ์ดํ„ฐ๊ฐ€ 2000๊ฐœ๋ฅผ ๋” ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ ๊ฐ™์œผ๋‹ˆ 2000 ~ 3000 ๋ฐ์ดํ„ฐ๋„ ํ˜ธ์ถœํ•œ ํ›„ ์ถ”๊ฐ€ํ•ด์ค€๋‹ค. 

 

 

startnum = 2001
endnum = 3000
url3 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
json3 = requests.get(url3).json()
raw3 = pd.DataFrame(json3['rentBikeStatus']['row'])
raw3.tail()

data = data_mid.append(raw3)
data

๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€๋‹ค. ์ด 2540๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์ธ๋ฑ์Šค๊ฐ€ ์กฐ๊ธˆ ์ด์ƒํ•ด์„œ ์ธ๋ฑ์Šค๋ฅผ ์ดˆ๊ธฐํ™” ํ•ด์ค€ ํ›„, ๋ฐ์ดํ„ฐ๋ฅผ ์—‘์…€ ํŒŒ์ผ๋กœ ์ €์žฅํ•ด์ค€๋‹ค. 

 

 

 

- ์ตœ์ข… ๋ฐ์ดํ„ฐ 

data.reset_index(drop=True)

data.info()

์ด 2450๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ , null ๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

- ์—‘์…€ํŒŒ์ผ๋กœ ์ €์žฅ 

data.to_excel('./data/bicycle.xlsx', index = False)

์—‘์…€ํŒŒ์ผ๋กœ ์ €์žฅํ•ด์ค€๋‹ค. ๋‹ค์Œ์‹œ๊ฐ„์—๋Š” ์ด๋Ÿฌํ•œ ๊ณผ์ •์œผ๋กœ ๋ถˆ๋Ÿฌ์˜จ ์ž์ „๊ฑฐ ๋ฐ์ดํ„ฐ์…‹์„ folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์„œ์šธ์‹œ ๋”ฐ๋ฆ‰์ด ํ˜„ํ™ฉ ์ง€๋„๋ฅผ ์ƒ์„ฑํ•ด ๋ณผ ๊ฒƒ์ด๋‹ค. 

 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/58

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-2 ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๋‚ ?

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/57 https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding...

silvercoding.tistory.com

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
raw = pd.read_excel('./data/subway_raw.xlsx')

2-1 ํฌ์ŠคํŒ…์—์„œ ํ•ฉ๋ณ‘ํ•ด ๋†“์€ 2019๋…„ 1์›” - 6์›” ์ง€ํ•˜์ฒ  ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜จ๋‹ค. 

raw.info()

์ด 99342 ๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ  , null ๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. 

 

 

 


 ์–ด๋Š ์—ญ์—์„œ , ์–ธ์ œ ์ง€ํ•˜์ฒ ์„ ๊ฐ€์žฅ ๋งŽ์ด ํƒˆ๊นŒ ? 

1. ์Šน๊ฐ์ด ๊ฐ€์žฅ ๋งŽ์ด ํƒ€๋Š” ์—ญ

data_station = raw.pivot_table(index = '์—ญ๋ช…', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_station = data_station.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending = False)
data_station.head(10)  # ์Šน์ฐจ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์—ญ ์ƒ์œ„ 10๊ฐœ

์—ญ๋ณ„๋กœ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌํ•œ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์ด๋‹ค. ์ƒ์œ„ 10๊ฐœ ์—ญ์„ ์ถœ๋ ฅํ•˜์˜€๊ณ  , 2019๋…„ ์ƒ๋ฐ˜๊ธฐ ๊ฐ€์žฅ ๋งŽ์€ ์Šน๊ฐ์ˆ˜๊ฐ€ ์žˆ์—ˆ๋˜ ์—ญ์€ ์ž ์‹ค์—ญ์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

2. ์‹œ๊ฐํ™” : ๋…ธ์„  ๋ณ„ (1-9ํ˜ธ์„ ) ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜ ๋น„๊ตํ•ด ๋ณด๊ธฐ


* ํžˆํŠธ๋งต 

  • sns.heatmap(data, annot = True, fmt = '.0f', cmap = "RdBu_r")
    • annot : True ์ผ๊ฒฝ์šฐ ๊ฐ’์„ ๊ทธ๋ž˜ํ”„์— ํ‘œ์‹œ
    • fmt : ๊ฐ’ ํ‘œ์‹œ ํ˜•ํƒœ.
      • ex) 'f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„(default ๋กœ ๊ฐ’์ด ์žˆ๋Š” ์†Œ์ˆ˜ ์ž๋ฆฌ๊นŒ์ง€ ํ‘œ์‹œ๋จ)
      • ex) '.0f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„ํ•ด๋‹ฌ๋ผ (์†Œ์ˆ˜ 0๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€๋งŒ == ์ •์ˆ˜์ž๋ฆฌ๋งŒ )
      • ex) '.1f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„ํ•ด๋‹ฌ๋ผ (์†Œ์ˆ˜ 1๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€๋งŒ)
      • ex) .1% ๋Š” ํผ์„ผํŠธ(์†Œ์ˆ˜ ์ฒซ๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€ ํ‘œ์‹œ)
    • cmap : ์ƒ‰์ƒ ์ฐจํŠธ. _r ์œผ๋กœ ๋๋‚˜๋Š” ์ฐจํŠธ๋Š” ์ƒ‰์ƒ ๋ฐฉํ–ฅ ๋ฐ˜๋Œ€๋กœ ๋˜์–ด์žˆ๋Š” ๋ฒ„์ „์ž„(์•„๋ž˜ ์ปฌ๋Ÿฌ ๋ฆฌ์ŠคํŠธ ์ฐธ๊ณ )

* cmap ์ข…๋ฅ˜

Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r, magma, magma_r, mako, mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, rocket, rocket_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r, viridis, viridis_r, vlag, vlag_r, winter, winter_r


- 1ํ˜ธ์„ ๋งŒ ์‹œ๊ฐํ™” ํ•ด๋ณด๊ธฐ 

line = '1ํ˜ธ์„ '
data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]

# ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”: ๋…ธ์„ ์˜ ์—ญ ์ˆœ์„œ์— ๋งž์ถฐ ์ •๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์—ญID๋„ ์ธ๋ฑ์Šค์— ํฌํ•จ
df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum') 
df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]   # ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์š”์ผ์— ๋งž๊ฒŒ ์ •๋ฆฌ
df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ „์ฒด๋ฅผ 1๋งŒ์œผ๋กœ ๋‚˜๋ˆ„๊ธฐ
df_pivot

์—ญ๋ณ„ ์š”์ผ๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜๋ฅผ ์ง‘๊ณ„ํ•œ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์ด๋‹ค. ์š”์ผ์ด ๋’ค์ฃฝ๋ฐ•์ฃฝ ๋‚˜์˜ค๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์‹œ ์„ ํƒํ•˜์—ฌ ์ˆœ์„œ๋Œ€๋กœ ์ •๋ฆฌํ•ด ์ค€๋‹ค. 

import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import font_manager, rc
import platform 

# ํ•œ๊ธ€ ํฐํŠธ ์‚ฌ์šฉ
if platform.system() == 'Windows': 
    path = 'c:/Windows/Fonts/malgun.ttf'
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
elif platform.system() == 'Darwin':
    rc('font', family='AppleGothic')
fig, ax = plt.subplots( figsize=(6,5) )   # ๊ทธ๋ž˜ํ”„ ์‚ฌ์ด์ฆˆ ์ง€์ •
plt.title(f"{line} ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜", fontsize = 20) # for title
sns.heatmap(df_pivot, cmap = "Reds", 
           annot = True, fmt = '.0f')

1ํ˜ธ์„ ์˜ ์—ญ๋ณ„ ์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜ ํžˆํŠธ๋งต๋‹ˆ๋‹ค. ์„œ์šธ์—ญ๊ณผ ์ข…๊ฐ์—ญ์˜ ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์•„๋ณด์ด๊ณ , ๊ทธ์ค‘์—์„œ๋„ ์„œ์šธ์—ญ์˜ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

๋™์ผํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ชจ๋“  ๋…ธ์„ ์„ ์‹œ๊ฐํ™”ํ•˜์—ฌ ๋น„๊ต ํ•ด ๋ณด์ž. 

 

- 1ํ˜ธ์„ ~9ํ˜ธ์„  ์‹œ๊ฐํ™” ํ•˜์—ฌ ๋น„๊ต ํ•ด๋ณด๊ธฐ 

raw['๋…ธ์„ ๋ช…'].unique()

์ด๋ ‡๊ฒŒ ๋งŽ์€ ๋…ธ์„ ์ด ์žˆ๋Š”๋ฐ , ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” 1ํ˜ธ์„  ~ 9ํ˜ธ์„  ๋งŒ์„ ์‹œ๊ฐํ™” ํ•œ๋‹ค. 

line_seoul_list = [ ]
for line in raw['๋…ธ์„ ๋ช…'].unique():
    if line[1:] == 'ํ˜ธ์„ ':    # xํ˜ธ์„  ์ธ ๊ฒฝ์šฐ๋ฅผ ์„ ํƒ. 
        line_seoul_list.append(line)
line_seoul_list

for line in sorted(line_seoul_list):
    
    # ๋ฐ์ดํ„ฐ ์ •๋ฆฌํ•˜๊ธฐ
    data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]
    df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum')
    df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]
    df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ์ˆ˜์ •
    
    
    # ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
    fig, ax = plt.subplots( figsize=(6,len(df_pivot)/3 ) )   # ๊ทธ๋ž˜ํ”„ ์‚ฌ์ด์ฆˆ๋ฅผ ์กฐ์ •ํ•˜์—ฌ, ์—ญ ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ๋Š” ์„ธ๋กœ๋ฅผ ๊ธธ๊ฒŒ ํ‘œํ˜„
    plt.title(f"{line} ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜", fontsize = 20) # for title
    sns.heatmap(df_pivot, cmap = "Reds", 
               annot = True, fmt = '.0f')

 

์ง„ํ•œ ๋นจ๊ฐ„์ƒ‰์ผ ์ˆ˜๋ก ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ์—ญ์€ ๋ชจ๋“  ์š”์ผ์ด ๋Œ€์ฒด์ ์œผ๋กœ ์ƒ‰์ด ์ง„ํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด์ „ ํฌ์ŠคํŒ…์—์„œ ์ „์ฒด์ ์œผ๋กœ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๊ณ  , ์ฃผ๋ง์ด ๋˜๋ฉด ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์—ˆ๋Š”๋ฐ , ์—ญ ๋ณ„๋กœ ๋ณด๋‹ˆ ์ฃผ๋ง์ด ๋” ๋งŽ์€ ์—ญ๋„ ๋ฐœ๊ฒฌํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. (  ex, ์ดํƒœ์›, ๊ณ ์†ํ„ฐ๋ฏธ๋„, ํ™๋Œ€์ž…๊ตฌ )  

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/57

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-1 pandas๋กœ ์—ฌ๋Ÿฌ csv ํŒŒ์ผ ํ•ฉ์น˜๊ธฐ

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding...

silvercoding.tistory.com

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
raw = pd.read_excel('./data/subway_raw.xlsx')
raw.head()

์ด์ „ ๊ธ€์—์„œ ๋ณ‘ํ•ฉํ–ˆ๋˜ ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์™€ ์ค€๋‹ค. 

raw.info()

์ด 99342๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ  ,  null๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 

 


 ์Šน๊ฐ์ด ๋งŽ์•„์ง€๋Š” ๋‚ ์€ ์–ธ์ œ์ธ๊ฐ€ ? 

- ์ผ์ž , ์š”์ผ ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜ ํ™•์ธ ํ•ด ๋ณด๊ธฐ 

data_date = pd.pivot_table(raw, index = ['์‚ฌ์šฉ์ผ์ž', '์š”์ผ'], values= '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')
data_date.head()

 

- ์ผ์ž , ์š”์ผ ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ ์ˆ˜ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ 

data_date_sort = data_date.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_date_sort

 

์ผ์ž, ์š”์ผ ๋ณ„ ์Šน์ฐจ์Šน๊ฐ์ˆ˜๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌ ํ•ด ๋ณด์•˜๋‹ค. ์ด ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ ํ™•์ธํ•ด ๋ณด๋ฉด ๋‘๊ฐ€์ง€ ๊ฐ€์ •์„ ์„ธ์šธ ์ˆ˜ ์žˆ๋‹ค. 

 

1. 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค. 

2. ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค. 

 

 

(1) 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค 

- '์—ฐ์›”' ์ปฌ๋Ÿผ , '์›”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

yearmonth_list = []
monthday_list = []
for date in raw['์‚ฌ์šฉ์ผ์ž']:
    yearmonth = str(date)[:6]   # ์™ผ์ชฝ๋ถ€ํ„ฐ 6์ž๋ฆฌ ๋ฌธ์ž ์„ ํƒ
    yearmonth_list.append(yearmonth)
    monthday = str(date)[4:]    # ์™ผ์ชฝ์—์„œ 5๋ฒˆ์งธ ๋ฌธ์ž๋ถ€ํ„ฐ ๋๊นŒ์ง€ ์„ ํƒ
    monthday_list.append(monthday)
    
# ์—ฐ์›”/ ์›”์ผ ์ปฌ๋Ÿผ ์ถ”๊ฐ€ํ•˜๊ธฐ
raw['์—ฐ์›”'] = yearmonth_list
raw['์›”์ผ'] = monthday_list
raw.head()

์›” ๋ณ„ ์Šน๊ฐ์ˆ˜๋ฅผ ๋ณด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ์ผ์ž๋ฅผ ์—ฐ์›”, ์›”์ผ๋กœ ๋‚˜๋ˆ„์–ด ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜์˜€๋‹ค. 

data_month = pd.pivot_table(raw, index = '์—ฐ์›”', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_month = data_month.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_month

์›”๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜ ํ”ผ๋ฒ—๋ฐ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  , ์Šน์ฐจ์ข…์Šน๊ฐ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•ด๋ณด๋‹ˆ , 5์›”์— ๊ฐ€์žฅ ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

๊ฒฐ๋ก  -> 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค : True

 

 

 

(2) ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค 

data_week = pd.pivot_table(raw, index = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_week = data_week.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_week

์š”์ผ๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  , ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•ด๋ณด๋‹ˆ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

๊ฒฐ๋ก  -> ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค : True

 

 

 

 

 


* ์›”๋ณ„ / ์ผ์ž๋ณ„ ์Šน๊ฐ์ˆ˜ ๊ทธ๋ž˜ํ”„ ์‹œ๊ฐํ™” 

- 2019๋…„ 1์›” ๋ฐ์ดํ„ฐ๋กœ ํ…Œ์ŠคํŠธ ํ•ด๋ณด๊ธฐ 

df_selected = raw[ raw['์—ฐ์›”'] == '201901']
df_selected.head()

df_pivot = pd.pivot_table(df_selected, index = ['์›”์ผ','์š”์ผ'], values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')
df_pivot = df_pivot.reset_index()
df_pivot

2019๋…„ 1์›” ๋‚ ์งœ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ์ง‘๊ณ„ํ•˜๋Š” ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•œ๋‹ค. ์ด๋ฅผ pointplot์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ฐํ™” ํ•ด๋ณด์ž. 

import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import font_manager, rc
import platform 

# ํ•œ๊ธ€ ํฐํŠธ ์‚ฌ์šฉ
if platform.system() == 'Windows': 
    path = 'c:/Windows/Fonts/malgun.ttf'
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
elif platform.system() == 'Darwin':
    rc('font', family='AppleGothic')
fig, ax = plt.subplots( figsize=(20,6) )
# ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
sns.pointplot(data = df_pivot, x = '์›”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

fig, ax = plt.subplots( figsize=(20,6) )
# ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
sns.pointplot(data = df_pivot, x = '์š”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

1์›” ๊ทธ๋ž˜ํ”„๋งŒ ๊ทธ๋ ค๋ณด์•˜๋”๋‹ˆ , ๊ธˆ์š”์ผ์— ๊ฐ€์žฅ ๋งŽ์€ ์Šน๊ฐ์ˆ˜๋ฅผ ์ฐ๊ณ , ๊ทธ ์ดํ›„๋กœ ์ค„์–ด๋“ค์–ด ์ผ์š”์ผ์—๋Š” ์Šน๊ฐ์ˆ˜๊ฐ€ ๋Œ€ํญ ์ค„์–ด๋“œ๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. ๋งค ๋‹ฌ๋งˆ๋‹ค ํŽธ์ฐจ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด 1-6์›”์˜ ๋‚ ์งœ๋ณ„ ์Šน๊ฐ์ˆ˜ ์‹œ๊ฐํ™”๋ฅผ ํ•ด๋ณธ๋‹ค. 

 

 

 

 

- ์ƒ๋ฐ˜๊ธฐ ๋‚ ์งœ๋ณ„ ์Šน๊ฐ์ˆ˜ ์‹œ๊ฐํ™” 

raw['์—ฐ์›”'].unique()

for yearmonth in raw['์—ฐ์›”'].unique():
    df_selected = raw[ raw['์—ฐ์›”'] == yearmonth]  # ํ•ด๋‹น ์—ฐ์›” ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ
    df_pivot = pd.pivot_table(df_selected, index = ['์›”์ผ','์š”์ผ'], values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')# ์ผ์ž๋ณ„ ์Šน๊ฐ์ˆ˜ ๊ณ„
    df_pivot = df_pivot.reset_index()
    
    fig, ax = plt.subplots( figsize=(20,6) )
    
    ax.set_title(f'์ผ์ž๋ณ„ ์ง€ํ•˜์ฒ ์Šน๊ฐ์ˆ˜({yearmonth})')  # ๊ทธ๋ž˜ํ”„ ์ œ๋ชฉ ์ถ”๊ฐ€ํ•˜๊ธฐ
    sns.pointplot(data = df_pivot, x = '์›”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

 

ํ•ด์„ :  ์ „์ฒด์ ์œผ๋กœ ํ˜•ํƒœ๊ฐ€ ๋น„์Šทํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๊ธˆ์š”์ผ ์ดํ›„ ์ฃผ๋ง์—๋Š” ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๊ณ  , ์ผ์š”์ผ์— ๋Œ€ํญ ํ•˜๋ฝํ•˜๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. 2์›”๋‹ฌ์„ ๋ณด๋ฉด ์„ค์—ฐํœด๋กœ ์ธํ•ด์„œ 2์›” ์ดˆ ์ฃผ์ค‘์—๋„ ์Šน๊ฐ์ˆ˜์˜ ์ˆ˜๊ฐ€ ๋‚ฎ์œผ๋ฉฐ , 6์›” 6์ผ ํ˜„์ถฉ์ผ์—๋„ ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๋Š” ํ˜„์ƒ ๋˜ํ•œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๋Š” ๊ณตํœด์ผ์˜ ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค๊ณ  ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/56

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1-3. ํŠน์ • ํŒ€์— ๊ฐ•ํ•œ ์•ผ๊ตฌ์„ ์ˆ˜ ๋ถ„์„ํ•˜๊ธฐ

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. se..

silvercoding.tistory.com

 

 


import pandas as pd

* pandas๋กœ ํŒŒ์ผ ์ฝ์–ด์˜ค๊ธฐ 

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

< ๊ด€๋ จ ์ฐธ๊ณ  ๋ฌธ์„œ > 

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

 

pandas.read_excel — pandas 1.3.1 documentation

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IN

pandas.pydata.org

 

 

 

 


 pandas ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ ๋ณ‘ํ•ฉํ•˜๊ธฐ 

(1) ์—‘์…€ํŒŒ์ผ ํ•œ๊ฐœ ๋ถˆ๋Ÿฌ์™€์„œ ์‚ดํŽด๋ณด๊ธฐ 

* ํŒŒ์ผ : ์ผ์ž / ๋…ธ์„  / ์ง€ํ•˜์ฒ ์—ญ ๋ณ„ ์Šนํ•˜์ฐจ ๊ณ ๊ฐ ์ˆ˜ ( 2019๋…„ ์ƒ๋ฐ˜๊ธฐ - 1์›” ~ 6์›”, ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ œ๊ณต ) 

file = './rawfiles/CARD_SUBWAY_MONTH_201901.csv'
raw = pd.read_csv(file)
raw.head()

raw.info()

์ด 18334 ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๊ณ , ๊ฒฐ์ธก๊ฐ’์€ ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

(2) ๋‘๊ฐœ ์—‘์…€ํŒŒ์ผ ํ•ฉ์ณ๋ณด๊ธฐ  

raw = pd.DataFrame()  # ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ƒ์„ฑ 
raw.head()

# ์ฒซ ๋ฒˆ์งธ ํŒŒ์ผ
file = './rawfiles/CARD_SUBWAY_MONTH_201901.csv'
temp = pd.read_csv(file)
# temp.head()
raw = raw.append(temp)  # ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ถ”๊ฐ€ 
# raw.head()

# ๋‘ ๋ฒˆ์งธ ํŒŒ์ผ
file = './rawfiles/CARD_SUBWAY_MONTH_201902.csv'
temp = pd.read_csv(file)
raw = raw.append(temp)  # ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ถ”๊ฐ€ 

raw.info()

์šฐ์„  , ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์„ ์ƒ์„ฑํ•œ ํ›„, ํŒŒ์ผ์„ ์ฝ์–ด ์™€์„œ append๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. ์—ฌ๋Ÿฌ๊ฐœ์˜ ํŒŒ์ผ์„ ํ•ฉ์น˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ด๋Ÿฌํ•œ ๋ฐ˜๋ณต๋˜๋Š” ์ž‘์—…์„ for๋ฌธ์œผ๋กœ ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋œ๋‹ค. 

 

 

 

(3) ํด๋”์— ์žˆ๋Š” ๋ชจ๋“  ์—‘์…€ํŒŒ์ผ ๋ณ‘ํ•ฉํ•˜๊ธฐ

- ํด๋” ์•ˆ์˜ ํŒŒ์ผ ์ด๋ฆ„ ๊ฐ€์ ธ์˜ค๊ธฐ 

# ํด๋” , ํŒŒ์ผ์„ ๊ด€๋ฆฌํ•˜๋Š” os ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ 
import os
os.listdir()

os.listdir() ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ˜„์žฌ ํด๋”์— ์žˆ๋Š” ํด๋”์™€ ํŒŒ์ผ๋“ค์˜ ์ด๋ฆ„์„ ๊ทธ๋Œ€๋กœ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

dirpath = './rawfiles/'
files = os.listdir(dirpath)
files

๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณธ ํฌ์ŠคํŒ…์—์„œ ์‚ฌ์šฉํ•  ํ˜„์žฌ ํด๋” ์•ˆ์˜ rawfiles์•ˆ์˜ ํŒŒ์ผ๋“ค์„ ๋ถˆ๋Ÿฌ ์˜จ๋‹ค. 

 

์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ์ผ์ผ์ด ํŒŒ์ผ ์ด๋ฆ„์„ ๋ณต๋ถ™ํ•˜์ง€ ์•Š์•„๋„ ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

 

- ๋ณ‘ํ•ฉํ•˜๊ธฐ 

# ๋นˆ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ค€๋น„
raw = pd.DataFrame() 

# ๋ณ‘ํ•ฉ
for file in os.listdir('./rawfiles'):
#     print(file)
    fpath = './rawfiles/'+file
    print(fpath)
    temp = pd.read_csv(fpath)
    raw = raw.append(temp, ignore_index = True)   #ignore_index = True  --> ๊ธฐ์กด ์ธ๋ฑ์Šค๋Š” ๋ฌด์‹œํ•˜๋ผ.

์œ„์™€ ๊ฐ™์ด ๋นˆ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์— ํŒŒ์ผ์„ ํ•˜๋‚˜์”ฉ ์ถ”๊ฐ€ํ•œ๋‹ค. ignore_index=True๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ์กด ์ธ๋ฑ์Šค๋ฅผ ๋ฌด์‹œํ•˜์ง€ ์•Š์œผ๋ฉด ์›๋ž˜ ์ž์‹ ์˜ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ธ๋ฑ์Šค๊ฐ€ ๋’ค์ฃฝ๋ฐ•์ฃฝ์œผ๋กœ ์„ค์ •๋˜์–ด ์žˆ๊ฒŒ๋œ๋‹ค. ์ˆœ์„œ๋Œ€๋กœ ํ•ด์ฃผ๊ธฐ ์œ„ํ•จ์ด๋‹ค. 

 

 

 


 ๋ณ‘ํ•ฉํ•œ ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ 
raw.head()

raw.tail()

raw.info()

์ด 99342 ๊ฐœ์˜ row๋กœ , raw.tail()์„ ํ–ˆ์„ ๋•Œ index์˜ ๊ฒฐ๊ณผ์™€ ๊ฐ™์œผ๋ฏ€๋กœ ์ •์ƒ์ ์œผ๋กœ ๋ณ‘ํ•ฉ๋œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 


 '์š”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ ํ•ด๋ณด๊ธฐ - datetime ์‚ฌ์šฉ 
from datetime import datetime

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

 

- datetime.strptime()

date_str = str(20190601)     # ์ˆซ์ž๊ฐ€ ์•„๋‹Œ ๋ฌธ์ž๋กœ ์ž…๋ ฅ๋˜์–ด์•ผ ํ•จ
date = datetime.strptime(date_str, "%Y%m%d")
date

datetime.datetime(2019, 6, 1, 0, 0)

# ์›”์š”์ผ : 0 ~ ์ผ์š”์ผ : 6
weekday = date.weekday()
weekday

5

์ง์ ‘ ๋‹ฌ๋ ฅ์„ ์ฐพ์•„๋ณด์ง€ ์•Š๊ณ ๋„ , datetime ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ด๋ ‡๊ฒŒ ์š”์ผ์„ ๋ฐ˜ํ™˜ํ•ด ์ค€๋‹ค. 

 

- '์š”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

weekday_dict = [ '์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']
weekday_list = []

for date_str in raw['์‚ฌ์šฉ์ผ์ž']:
    date = datetime.strptime(str(date_str), "%Y%m%d")
    weekday_index  = date.weekday()
    weekday = weekday_dict[weekday_index]
    weekday_list.append(weekday)

weekday_list์— ๊ฐ row์˜ ์š”์ผ์„ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. 

raw['์š”์ผ'] = weekday_list
raw.sample(5)

'์š”์ผ' ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ฃผ๊ณ  , sample์„ ์ด์šฉํ•˜์—ฌ ๋žœ๋ค์œผ๋กœ row๋ฅผ ๋ถˆ๋Ÿฌ์™€์„œ ํ™•์ธํ•ด ๋ณธ๋‹ค. 

raw.columns

new_columns = ['์‚ฌ์šฉ์ผ์ž',  '์š”์ผ', '๋…ธ์„ ๋ช…', '์—ญID', '์—ญ๋ช…', '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', 'ํ•˜์ฐจ์ด์Šน๊ฐ์ˆ˜', '๋“ฑ๋ก์ผ์ž']
raw = raw[ new_columns ]
raw.head()

์š”์ผ์„ ์•ž์— ๋‘๊ธฐ ์œ„ํ•ด ์ปฌ๋Ÿผ์˜ ์œ„์น˜๋ฅผ ๋ฐ”๊พธ์–ด ์„ ํƒํ•ด์ค€๋‹ค. 

 

 

 

 


 ๋ฐ์ดํ„ฐ ์ €์žฅ 
raw.to_excel('./data/subway_raw.xlsx', index = False)

to_excel() ์„ ์ด์šฉํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ๋กœ ์ €์žฅํ•ด ์ค€๋‹ค. 


 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/55

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1-2. ์•ผ๊ตฌ์„ ์ˆ˜๊ฐ€ ๊ฐ•ํ•ด์ง€๋Š” ๊ณ„์ ˆ์ด ์žˆ์„๊นŒ?

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot,..

silvercoding.tistory.com

 

 


 ํŠน์ • ํŒ€์— ๊ฐ•ํ•œ ์„ ์ˆ˜๊ฐ€ ์žˆ์„๊นŒ? 

* idea : '์ƒ๋Œ€' ์ปฌ๋Ÿผ์œผ๋กœ ๊ฒจ๋ฃจ์—ˆ๋˜ ์ƒ๋Œ€ ํŒ€์„ ์ถ”์ถœํ•˜์—ฌ ๊ฐ๊ฐ์˜ ์ƒ๋Œ€ ํŒ€๊ณผ์˜ ๊ฒฝ๊ธฐ์—์„œ์˜ ์ถœ๋ฃจ์œจ์„ ๊ณ„์‚ฐํ•œ๋‹ค.  

 

- ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

import pandas as pd

file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()

 

 

- ์ƒ๋Œ€ ํŒ€๋ณ„ ๊ธฐ๋ก ์ •๋ฆฌ 

raw['์ƒ๋Œ€'].unique()

์šฐ์„  unique() ๋ฅผ ์ด์šฉํ•˜์—ฌ '์ƒ๋Œ€' ์ปฌ๋Ÿผ์— ์žˆ๋Š” ๊ฐ’๋“ค์„ ํ™•์ธํ•ด ์ค€๋‹ค. ์•ž์— @๊ฐ€ ๋ถ™์€ ๊ฒฝ์šฐ๋Š” ์›์ •๊ฒฝ๊ธฐ, ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ํ™ˆ ๊ฒฝ๊ธฐ๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์ด๋ฅผ 'ํ™ˆ์–ด์›จ์ด' ๋ผ๋Š” ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ตฌ๋ถ„ํ•˜๋„๋ก ํ•˜๊ณ  , '์ƒ๋Œ€ํŒ€' ์ปฌ๋Ÿผ์—๋Š” ํŒ€๋ช…๋งŒ์„ ๋„ฃ์–ด์ค€๋‹ค.  

opp_list = [ ]
home_away_list = [ ]

for opp in raw['์ƒ๋Œ€']:
    if "@" in opp:
        home_away = '์›์ •'
        opp = opp.replace('@', '')
    else:
        home_away = 'ํ™ˆ'
    home_away_list.append(home_away)
    opp_list.append(opp)

raw['ํ™ˆ์–ด์›จ์ด'] = home_away_list
raw['์ƒ๋Œ€ํŒ€'] = opp_list
raw.head()

for๋ฌธ์„ ์ด์šฉํ•˜์—ฌ ํ™ˆ์–ด์›จ์ด์™€ ์ƒ๋Œ€ํŒ€์„ ๊ตฌ๋ถ„ํ•˜๊ณ  ,  ๋‘๊ฐœ์˜ ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ฃผ์—ˆ๋‹ค. 

factors = ['ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ', '๋ฃจํƒ€', 'ํƒ€์ ','๋ณผ๋„ท', '์‚ฌ๊ตฌ', 'ํฌ๋น„']
data = raw.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ', '์ƒ๋Œ€ํŒ€'],
                      values = factors,
                       aggfunc = 'sum')
data.head()

์„ ์ˆ˜๋“ค์˜ ์ƒํƒœํŒ€ ๋ณ„ ์‹ค์ ์„ ์ง‘๊ณ„ํ•˜๊ธฐ ์œ„ํ•ด ํ”ผ๋ฒ— ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•œ๋‹ค. 

cond = data['ํƒ€์ˆ˜'] > 0 
data = data[ cond ]
data.head()

ํƒ€์ˆ˜๊ฐ€ ์—†๋Š” ์„ ์ˆ˜๋“ค์€ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ์ œ์™ธ์‹œํ‚จ๋‹ค. 

data = data.reset_index()
data.head()

reset_index๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ index๋ฅผ ๋ชจ๋‘ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

 

 

 

 

- ํƒ€์ž vs ์ƒ๋Œ€ํŒ€ ๋ณ„ ์‹ค์  ๊ณ„์‚ฐ 

def cal_hit(df):
    '''
    - ํƒ€์œจ : ๊ณต์„ ์ณ์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ --> ์•ˆํƒ€ / ํƒ€์ˆ˜
    - ์ถœ๋ฃจ์œจ: ์ง„๋ฃจํ•ด์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ -->  (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ)/(ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ+ํฌ์ƒํ”Œ๋ผ์ด)
    - ์žฅํƒ€์œจ : ํƒ€์œจ์— ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ -->   ๋ฃจํƒ€ / ํƒ€์ˆ˜
    '''
    
    df['ํƒ€์œจ'] = df['์•ˆํƒ€'] / df['ํƒ€์ˆ˜']
    df['์ถœ๋ฃจ์œจ'] = (df['์•ˆํƒ€'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ']) / (df['ํƒ€์ˆ˜'] + df['์‚ฌ๊ตฌ'] + df['ํฌ๋น„'])
    df['์žฅํƒ€์œจ'] = df['๋ฃจํƒ€'] / df['ํƒ€์ˆ˜']
    df['OPS'] = df['์ถœ๋ฃจ์œจ'] + df['์žฅํƒ€์œจ']
    return df

์ด์ „ ๊ธ€์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ ํ•จ์ˆ˜ , ์‹ค์  ๊ณ„์‚ฐ์„ ํ•˜๊ณ  ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. 

player_stats_opp = cal_hit(data)
player_stats_opp

 

 

 

 

- ๊ฒฐ๊ณผ ๋ณด๊ธฐ : DataFrame 

(1) ๋‘์‚ฐ์— ๊ฐ•ํ•œ ์„ ์ˆ˜ ? - ์ƒ์œ„ 10๋ช…

team = '๋‘์‚ฐ'
cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 10)
player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(10)

์ƒ๋Œ€ํŒ€์ด ๋‘์‚ฐ์ด๋ฉด์„œ , ํƒ€์ˆ˜๊ฐ€ 20๋ณด๋‹ค ํฐ ์„ ์ˆ˜๋“ค์„ ๋ฝ‘์•„ ๋‚ด๊ณ , ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•˜์—ฌ 10๋ช…์˜ ์„ ์ˆ˜๋ฅผ ์ถœ๋ ฅํ•ด ๋ณธ๋‹ค. 

 

๋”ฐ๋ผ์„œ ์ƒ๋Œ€ํŒ€์ด '๋‘์‚ฐ' ์ผ ๋•Œ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 10๋ช…์˜ ์ด๋ฆ„์„ ๋ฝ‘์•„๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค

player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False)['์ด๋ฆ„'].head(10)

 

(2) ๋กฏ๋ฐ์— ๊ฐ•ํ•œ ์„ ์ˆ˜ ? - ์ƒ์œ„ 10๋ช…

team = '๋กฏ๋ฐ'
cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 20)
player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(10)

๋™์ผํ•˜๊ฒŒ ์‹œํ–‰ ํ•ด ์ค€๋‹ค. 

 

์ƒ๋Œ€ํŒ€์ด '๋‘์‚ฐ' ์ผ ๋•Œ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 10๋ช…์˜ ์ด๋ฆ„์„ ๋ฝ‘์•„๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค

player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False)['์ด๋ฆ„'].head(10)

 

 

(3) KBO ์ „์ฒด ํŒ€์„ ์ƒ๋Œ€๋กœ ํŒ€ ๋ณ„ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 5์ธ ํƒ€์ž๋“ค ํ™•์ธํ•ด ๋ณด๊ธฐ 

hitter_df = pd.DataFrame()

for team in player_stats_opp['์ƒ๋Œ€ํŒ€'].unique():
    print(team)
    cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 20)
    df = player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(5)
    hitter_df = hitter_df.append(df)

hitter_df

ํŠน์ • ํŒ€ ์ƒ๋Œ€ ์ถœ๋ฃจ์œจ Top5 ์•ˆ์— ๋“ค์–ด ์žˆ๋Š” ํƒ€์ž ๋ฆฌ์ŠคํŠธ (์ค‘๋ณต ์ œ๊ฑฐ) 

hitter_df['์ด๋ฆ„'].unique()

 

 

 

- ๊ฒฐ๊ณผ๋ณด๊ธฐ : Heatmap (์‹œ๊ฐํ™”) 

cond = player_stats_opp['์ด๋ฆ„'].isin(hitter_df['์ด๋ฆ„'].unique())
top_df = player_stats_opp[cond]
top_pivot = top_df.pivot_table(index = ['ํŒ€','์ด๋ฆ„'], values = '์ถœ๋ฃจ์œจ', columns = '์ƒ๋Œ€ํŒ€', aggfunc = 'sum')
top_pivot

์œ„์—์„œ ๋งŒ๋“ค์–ด ๋†“์•˜๋˜ ํŠน์ • ํŒ€์„ ์ƒ๋Œ€๋กœ ์ถœ๋ฃจ์œจ top5 ์•ˆ์— ๋“ค์—ˆ๋˜ ์ด๋ฆ„๋“ค๋งŒ player_stats_opp์—์„œ ๋ฝ‘์•„์˜จ ํ›„ , ํ•ด๋‹น ์„ ์ˆ˜๋“ค์˜ ์ƒ๋Œ€ํŒ€ ๋ณ„ ์ถœ๋ฃจ์œจ pivot_table์„ ์ƒ์„ฑํ•œ๋‹ค.  

import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# ์ด๋ฏธ์ง€ ํ•œ๊ธ€ ํ‘œ์‹œ ์„ค์ •
if platform.system() == 'Windows':  # ์œˆ๋„์šฐ์ธ ๊ฒฝ์šฐ ๋ง‘์€๊ณ ๋”•
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # Mac ์ธ ๊ฒฝ์šฐ ์• ํ”Œ๊ณ ๋”•
    rc('font', family='AppleGothic')

#๊ทธ๋ž˜ํ”„์—์„œ ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ๊ฐ€ ํ‘œ์‹œ๋˜๋„๋ก ํ•˜๋Š” ์„ค์ •์ž…๋‹ˆ๋‹ค.
matplotlib.rcParams['axes.unicode_minus'] = False
fig, ax = plt.subplots( figsize=(15,15) )

sns.heatmap(data = top_pivot, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds',
            center= 0.4   # ์ปฌ๋Ÿฌ๋งต ์ค‘๊ฐ„๊ฐ’ ์ง€์ •
           )

์ƒ‰์ด ์ง„ํ• ์ˆ˜๋ก ์ถœ๋ฃจ์œจ์ด ๋†’์Œ์„ ์˜๋ฏธํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด , NC์˜ ์–‘์˜์ง€ ์„ ์ˆ˜๋Š” ํ•ด๋‹น ์‹œ์ฆŒ์—์„œ KIA๋ฅผ ์ƒ๋Œ€๋กœ ํ•œ ๊ฒฝ๊ธฐ์—์„œ ์ถœ๋ฃจ์œจ์ด ๋†’์•˜์œผ๋ฉฐ , ํ•œํ™”์˜ ์ •๊ทผ์šฐ ์„ ์ˆ˜๋Š” LG์™€์˜ ๊ฒฝ๊ธฐ์—์„œ ๊ฐ•ํ–ˆ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.  

sns.heatmap(data = top_pivot, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds',
            center= 0.6   # ์ปฌ๋Ÿฌ๋งต ์ค‘๊ฐ„๊ฐ’ ์ง€์ •
           )

์ƒ๋Œ€์ ์ธ ํฌ๊ธฐ๋ฅผ ์‚ดํŽด๋ณด๊ณ ์ž ํ•  ๋•Œ๋Š” center๋ฅผ ๋ณ€๊ฒฝํ•ด ๊ฐ€๋ฉฐ ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/54

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1. best baseball player ๋ถ„์„

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot, jointplot, pairplot, boxplot, swarm..

silvercoding.tistory.com

 

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 
import pandas as pd
file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()

์ด์ „ ํฌ์ŠคํŒ…์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋Œ€๋กœ ๋ถˆ๋Ÿฌ์˜จ๋‹ค. 

 

 


 ์•ผ๊ตฌ์„ ์ˆ˜๊ฐ€ ๊ฐ•ํ•ด์ง€๋Š” ๊ณ„์ ˆ์ด ์žˆ์„๊นŒ ? 

idea : '์ผ์ž' ์ปฌ๋Ÿผ์—์„œ ์›” ์ •๋ณด๋งŒ ์ถ”์ถœํ•ด ๋‚ด์–ด ์›” ๋ณ„ ์ถœ๋ฃจ์œจ์„ ์‹œ๊ฐํ™” ํ•ด๋ณธ๋‹ค. 

 

1. ์›” ๋ณ„ ๊ธฐ๋ก ์ •๋ฆฌํ•˜๊ธฐ 

- '์ผ์ž' ์ปฌ๋Ÿผ์—์„œ ์›” ์ถ”์ถœํ•˜๊ธฐ 

month_list = []
for monthdate in raw['์ผ์ž']:
    month, date = monthdate.split('-')
    month_list.append(month)
raw['์›”'] = month_list
raw.head()

 

- ๋ถ„์„์— ํ•„์š”ํ•œ ์ปฌ๋Ÿผ ์„ ํƒ 

columns_select = ['ํŒ€', '์ด๋ฆ„', '์ƒ์ผ','์ผ์ž', '์ƒ๋Œ€','ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ', '๋ฃจํƒ€', 'ํƒ€์ ','๋ณผ๋„ท', '์‚ฌ๊ตฌ', 'ํฌ๋น„', '์›”']

data = raw[columns_select]
data.head()

 

- ์›”๋ณ„ ์‹ค์  ์ง‘๊ณ„ 

data_player_month = data.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ', '์›”'], 
                               values = ['ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ','๋ฃจํƒ€','ํƒ€์ ','๋ณผ๋„ท','์‚ฌ๊ตฌ','ํฌ๋น„'], 
                              aggfunc = 'sum', fill_value = 0
                                )
data_player_month

ํŒ€, ์ด๋ฆ„, ์ƒ์ผ๋กœ ์„ ์ˆ˜๋ฅผ ๊ตฌ๋ถ„ํ•˜๊ณ , ์›”๋ณ„ ์‹ค์ ์„ ์ง‘๊ณ„ํ•˜๋Š” pivot table์„ ์ƒ์„ฑํ•œ๋‹ค. ๋น„์–ด์žˆ๋Š” ๊ณณ์€ 0์œผ๋กœ ๊ฐ’์„ ์ฑ„์›Œ ์ค€๋‹ค. 

data_player_month = data_player_month.reset_index()
data_player_month

reset_index๋กœ ๋ฉ€ํ‹ฐ์ธ๋ฑ์Šค๋ฅผ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

 

 

- ํƒ€์œจ, ์ถœ๋ฃจ์œจ, ์žฅํƒ€์œจ, OPS ( ์ฃผ์š” ์‹ค์  ๊ณ„์‚ฐ ) ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

def cal_hit(df):
    '''
    - ํƒ€์œจ : ๊ณต์„ ์ณ์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ --> ์•ˆํƒ€ / ํƒ€์ˆ˜
    - ์ถœ๋ฃจ์œจ: ์ง„๋ฃจํ•ด์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ -->  (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ)/(ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ+ํฌ์ƒํ”Œ๋ผ์ด)
    - ์žฅํƒ€์œจ : ํƒ€์œจ์— ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ -->   ๋ฃจํƒ€ / ํƒ€์ˆ˜
    '''
    
    df['ํƒ€์œจ'] = df['์•ˆํƒ€'] / df['ํƒ€์ˆ˜']
    df['์ถœ๋ฃจ์œจ'] = (df['์•ˆํƒ€'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ']) / (df['ํƒ€์ˆ˜'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ'] + df['ํฌ๋น„'])
    df['์žฅํƒ€์œจ'] = df['๋ฃจํƒ€'] / df['ํƒ€์ˆ˜']
    df['OPS'] = df['์ถœ๋ฃจ์œจ'] + df['์žฅํƒ€์œจ']
    return df
player_month_stat = cal_hit(data_player_month)
player_month_stat.head()

 

player_month_stat.info()

๊ฒฐ์ธก๊ฐ’์ด ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

player_month_stat = player_month_stat.dropna()
player_month_stat.head()

dropna()๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฒฐ์ธก๊ฐ’์ด ์žˆ๋Š” row๋ฅผ ์ œ๊ฑฐํ•ด ์ค€๋‹ค. 

 

player_month_stat.info()

๊ฒฐ์ธก๊ฐ’์ด ๋ชจ๋‘ ์ฑ„์›Œ์กŒ๋‹ค! 

 

- ์›”๋ณ„ ์ถœ๋ฃจ์œจ 

month_pivot = player_month_stat.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'],
                             columns = '์›”',
                             values = '์ถœ๋ฃจ์œจ')
month_pivot = month_pivot.reset_index()
month_pivot

์œ„์™€ ๊ฐ™์ด ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜์˜€๋‹ค. 

 

 

2. ๊ฒฐ๊ณผ : KBO ์ถœ๋ฃจ์œจ ์ตœ๊ณ ํƒ€์ž๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ํ™•์ธํ•ด ๋ณด๊ธฐ 

- KBO ์ถœ๋ฃจ์œจ ์ตœ๊ณ ํƒ€์ž ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

file = './data/player_stat.csv'
player_stat = pd.read_csv(file, encoding = 'cp949')
player_stat.head(20)

์ €๋ฒˆ ๊ธ€์˜ ์ฃผ์ œ์˜€๋˜ ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•ด ๋†“์€ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์ด ๋ฐ์ดํ„ฐ์™€ ์œ„์—์„œ ๋งŒ๋“ค์–ด ๋†“์€ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ๋ณ‘ํ•˜์—ฌ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 50๋ช…์˜ ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ๊ธฐ๋ก์„ ํ™•์ธํ•ด ๋ณด์ž. 

 

 

- ๋ฐ์ดํ„ฐ ํ•ฉ๋ณ‘ 

df = pd.merge(player_stat, month_pivot, how = 'left', on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'])
# left_on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'], right_on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ']
df.head(10)

df_sort = df.sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(50)
df_sort

๋‹ค์‹œํ•œ๋ฒˆ ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ ํ•˜๊ณ  , 50๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ df_sort ์— ๋„ฃ์–ด ์ค€๋‹ค. 

df_selected = df_sort[['ํŒ€', '์ด๋ฆ„', '์ถœ๋ฃจ์œจ', '03', '04', '05', '06', '07', '08', '09', '10']]
df_selected

๊ทธ๋ฆฌ๊ณ  df_sort์—์„œ ์ถœ๋ฃจ์œจ๊ณผ ๊ด€๋ จ๋œ ์ปฌ๋Ÿผ๋งŒ ๋ฝ‘์•„์„œ df_selectied ๋ณ€์ˆ˜๋ฅผ ์„ ์–ธํ•œ๋‹ค. 

์ˆซ์ž๋กœ ๋‚˜์™€์žˆ์œผ๋‹ˆ 50๋ช…์˜ ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ๊ธฐ๋ก์ด ์–ด๋–ค์ง€ ๊ฐ€๋Š ์ด ์•ˆ ๊ฐ„๋‹ค. ๋”ฐ๋ผ์„œ ์‹œ๊ฐํ™”๋ฅผ ํ†ตํ•˜์—ฌ ํ™•์ธํ•˜๋„๋ก ํ•œ๋‹ค. 

 

 

 

- ์‹œ๊ฐํ™”๋กœ ํ•œ๋ˆˆ์— ๋ณด๊ธฐ 

df_selected = df_selected.set_index(['ํŒ€','์ด๋ฆ„'])
df_selected

ํžˆํŠธ๋งต์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ฐํ™”๋ฅผ ํ•  ๊ฒƒ์ด๋‹ค. ์ด ๋•Œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ values ๋ถ€๋ถ„์€ ๋ชจ๋‘ ์ˆซ์žํ˜• ๋ฐ์ดํ„ฐ๋กœ๋งŒ ๋˜์–ด ์žˆ์–ด์•ผ ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ํŒ€, ์ด๋ฆ„ ์ปฌ๋Ÿผ์€ index๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# ์ด๋ฏธ์ง€ ํ•œ๊ธ€ ํ‘œ์‹œ ์„ค์ •
if platform.system() == 'Windows':  # ์œˆ๋„์šฐ์ธ ๊ฒฝ์šฐ ๋ง‘์€๊ณ ๋”•
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # Mac ์ธ ๊ฒฝ์šฐ ์• ํ”Œ๊ณ ๋”•
    rc('font', family='AppleGothic')

#๊ทธ๋ž˜ํ”„์—์„œ ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ๊ฐ€ ํ‘œ์‹œ๋˜๋„๋ก ํ•˜๋Š” ์„ค์ •์ž…๋‹ˆ๋‹ค.
matplotlib.rcParams['axes.unicode_minus'] = False
sns.heatmap(df_selected)

์กฐ๊ธˆ ๋” ์„ค์ •์„ ์ฃผ์–ด ๋ณด๊ธฐ ์‰ฝ๊ฒŒ ๋งŒ๋“ค์–ด ๋ณด์ž. 

fig, ax = plt.subplots( figsize=(15,15) )
sns.heatmap(data = df_selected, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds'
           )

์ง„ํ•œ ๋นจ๊ฐ„์ƒ‰์ผ ์ˆ˜๋ก ๋ณด๋‹ค ๋†’์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ€์ง„๋‹ค. ์•„์ง์€ ์›” ๋ณ„๋กœ ๋” ์ž˜ํ•œ ๊ฑด์ง€ ๋ชปํ•œ ๊ฑด์ง€ ํ•œ ๋ˆˆ์— ๋ณด๊ธฐ๋Š” ์–ด๋ ต๋‹ค. ๋”ฐ๋ผ์„œ ์‹œ์ฆŒ ์ „์ฒด ์ถœ๋ฃจ์œจ๊ณผ์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ  ํ™•์ธํ•ด ๋ณด์ž. 

for col in df_selected.columns[1:]:
    df_selected[col] = df_selected[col] - df_selected['์ถœ๋ฃจ์œจ'] 
df_selected['์ถœ๋ฃจ์œจ'] = 0.0

์‹œ์ฆŒ ์ถœ๋ฃจ์œจ๊ณผ ์›”๋ณ„ ์ถœ๋ฃจ์œจ์˜ ์ฐจ์ด๋ฅผ ๋ชจ๋‘ ๊ตฌํ•œ ํ›„, ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ์˜ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋‘ 0์œผ๋กœ ๋ฐ”๊พธ์–ด ์ค€๋‹ค. 

fig, ax = plt.subplots( figsize=(10,10) )

sns.heatmap(data = df_selected.head(50), 
            annot = True, fmt = '.3f', 
            cmap = 'RdBu_r'
           )

๋นจ๊ฐ„์ƒ‰์ด ์ง™์œผ๋ฉด ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ ๋ณด๋‹ค ๋†’์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ–๊ณ  ์žˆ์œผ๋ฉฐ, ํŒŒ๋ž€์ƒ‰์ด ์ง™๋‹ค๋ฉด , ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ ๋ณด๋‹ค ๋” ๋‚ฎ์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ–๊ณ  ์žˆ์Œ์„ ์˜๋ฏธ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ํ•œํ™” ์ตœ์žฌํ›ˆ ์„ ์ˆ˜๋Š” 3์›” ์ถœ๋ฃจ์œจ์ด ์šฐ์„ธํ–ˆ๊ณ , KIA ์œ ๋ฏผ์ƒ ์„ ์ˆ˜๋Š” 6์›”์— ๋น„ํ•ด 7์›”์— ์—„์ฒญ๋‚œ ์ถœ๋ฃจ์œจ ์ƒ์Šน์„ ๋ณด์ธ๋‹ค. ๊ณ„์ ˆ์— ๋”ฐ๋ฅธ ์˜ํ–ฅ์ด ์žˆ๋Š”์ง€ ๊ถ๊ธˆํ–ˆ๋Š”๋ฐ , (๋”์šด ์—ฌ๋ฆ„์ฒ ์—๋Š” ๊ธฐ๋ก์ด ์ค„์–ด๋“œ๋Š” ๋“ฑ ) ์„ ์ˆ˜ ๋งˆ๋‹ค ๋ชจ๋‘ ๋‹ค๋ฅด๊ณ , ํ™•์‹คํžˆ ํŠน์ • ๋‹ฌ์— ์ถœ๋ฃจ์œจ์ด ๋†’์•„์ง€๋Š” ์„ ์ˆ˜๋“ค์ด ์žˆ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ๋‹ค. 

+ Recent posts