์‚ฌ์šฉ ๋ฐ์ดํ„ฐ์…‹ 

https://www.data.go.kr/dataset/3035522/fileData.do

ํ˜„์žฌ ์ด ๋ฐ์ดํ„ฐ์…‹์€ ํ๊ธฐ ๋˜์—ˆ๋‹ค๊ณ  ๋‚˜์˜จ๋‹ค. 

 

์œ„ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ธํ”„๋Ÿฐ๊ฐ•์˜ (๊ณต๊ณต๋ฐ์ดํ„ฐ๋กœ ํŒŒ์ด์ฌ ๋ฐ์ดํ„ฐ ๋ถ„์„) (https://bit.ly/3sISk6Z) ์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ์ „์ฒ˜๋ฆฌํ•œ ๋ฐ์ดํ„ฐ๋กœ ์‹œ๊ฐํ™” ์ •๋ฆฌ ์ง„ํ–‰ํ•œ๋‹ค. 

์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ

 

cf1) figure, axes ์ƒ์„ฑ

fig=plt.figure(figsize=(10,3), dpi=100)
ax1=fig.subplots()

 

cf2) ๋ชจ๋“  x tick ํ‘œํ˜„ํ•˜๊ธฐ 

_=plt.xticks(ticks=np.arange(len(df)), labels=df.index)

 

cf3) x์ถ• ์†Œ์ˆ˜์  ์ œ๊ฑฐ

from matplotlib.ticker import MaxNLocator
ax1.xaxis.set_major_locator(MaxNLocator(integer=True))

 

(cf4) ๊ทธ๋ž˜ํ”„์˜ ๋ฐ–์— Legend ํ‘œ์‹œํ•˜๋„๋ก ์„ค์ •

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

 

 

 

 lineplot 


1. pandas plot

(1) pandas plot์˜ ๊ธฐ๋ณธ plot - lineplot 

- df์˜ index ๋˜๋Š” column ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ ค์ง 

df.plot(figsize=(10,3))

์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ , gropuby ์‚ฌ์šฉ

cf) ๋ชจ๋“  x tick ํ‘œํ˜„ํ•˜๊ธฐ 

_=plt.xticks(ticks=np.arange(len(g)), labels=g.index)

- df ์˜ column์ด ์—ฌ๋Ÿฌ ๊ฐœ ์กด์žฌํ•  ๋•Œ  (df์˜ column์ด seaborn์˜ hue์—ญํ• )

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

2. seaborn plot 

sns.lineplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ง€์—ญ๋ช…", ci=None, ax=ax1)
ax1.legend(bbox_to_anchor=(1.02, 1), loc=2)

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 pointplot 

sns.pointplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ง€์—ญ๋ช…", ci=None, ax=ax2)
ax2.legend(bbox_to_anchor=(1.02, 1), loc=2)

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 

 barplot 


1. pandas plot 

(1) df.plot(kind='bar')

- df์˜ index ๋˜๋Š” columm ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ ค์ง

df.plot.bar(rot=0, figsize=(10, 3))
# or
df.plot(kind='bar',rot=0, ax=ax1)

์ง€์—ญ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ , groupby ์‚ฌ์šฉ

(2) df.plot.bar()

df.plot.bar(color='g',rot=0, figsize=(10,3)) # cmap='Pastel1' ๋˜ํ•œ ๊ฐ€๋Šฅ

์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

- column ์—ฌ๋Ÿฌ๊ฐœ์ผ ๋•Œ ( df์˜ column์ด seaborn์˜ hue์™€ ๊ฐ™์€ ์—ญํ• )

ax=df2.plot.bar(figsize=(10,3), rot=0)
ax.set_ylabel('ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')

์ง€์—ญ๋ณ„ ์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

2. seaborn plot 

sns.barplot(data=df, x="์ง€์—ญ๋ช…", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")
# estimator default: mean
# color changable
# palette (https://seaborn.pydata.org/tutorial/color_palettes.html)
# ci: bootstrap resampling (with replacement), sorted means

palette ์ƒ‰ ๋ชจ์Œ ๋งํฌ

์ง€์—ญ๋ณ„ ํ‰๋‹น ๋ถ„์–‘๊ฐ€๊ฒฉ (ํ™•์‹คํžˆ seaborn์ด ๋” ์˜ˆ์˜๊ธด ํ•˜๋‹ค)

 

- hue ์ง€์ • 

sns.barplot(data=df, x="์ง€์—ญ๋ช…", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue='์—ฐ๋„', ci=None)

 

 

 histplot 


1. pandas plot

(1) df.plot(kind='hist') or df.plot.hist()

df.plot(kind='hist', figsize=(10, 3), title='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')
# or
ax=df.plot(kind='hist', figsize=(10, 3))
ax.set_title('ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ')

ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ์˜ ๋ถ„ํฌ

df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].plot.hist(bins=50)

 

 

(2) df.hist(bins=)

df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"].hist(bins=50)

axs=df.hist(bins=50, figsize=(10,10))
ax1,ax2,ax3,ax4=axs.flatten()
ax2.set_title('ax๋ณ„ ์ œ๋ชฉ ์ง€์ • ๊ฐ€๋Šฅ')

 

 

 

 

 

2. seaborn plot 

sns.histplot(df["ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ"], kde=True)

 

 

 

 kdeplot 


1. seaborn plot 

sns.kdeplot(data=df['ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ'])

sns.kdeplot(data=df[['ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ','๋ถ„์–‘๊ฐ€๊ฒฉ']])

 

 

 

 

 

 boxplot 


1. pandas plot

(1) df.plot(kind='box')

df.plot(kind='box', figsize=(5, 5))

 

(2) df.plot.box()

- df ์˜ column์ด x์ถ• 

df.plot.box(fontsize=15)

 

์›”๋ณ„ ์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- ์ด์ค‘ column์ผ ๊ฒฝ์šฐ 

df.plot.box(figsize=(15, 3), rot=30)

์›”๋ณ„ ์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

(3) df.boxplot(column='', by='')

- by: x์ถ• 

df.boxplot(column='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ',by='์—ฐ๋„', figsize=(5,3), rot=30)

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- by๊ฐ€ ๋ฆฌ์ŠคํŠธ์ผ ๋•Œ 

df.boxplot(column='ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ',by=['์—ฐ๋„','์ „์šฉ๋ฉด์ '], figsize=(20,3), rot=30)

์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

2. seaborn plot 

sns.boxplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

- hue ์ง€์ •

plt.figure(figsize=(12, 3))
sns.boxplot(data=df_last, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ „์šฉ๋ฉด์ ")

 

 

 violinplot 

1. seaborn plot 

sns.violinplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ")

์—ฐ๋„๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

- hue ์ง€์ •

plt.figure(figsize=(12, 3))
sns.violinplot(data=df, x="์—ฐ๋„", y="ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ", hue="์ „์šฉ๋ฉด์ ")

์—ฐ๋„๋ณ„ ์ „์šฉ๋ฉด์ ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ

 

 

 

 heatmap 

1. seaborn plot 

plt.figure(figsize=(15, 7), dpi=100)
ax=sns.heatmap(df, cmap="Blues", annot=True, fmt=".0f")

์—ฐ๋„๋ณ„ ์ง€์—ญ๋ณ„ ํ‰๋‹น๋ถ„์–‘๊ฐ€๊ฒฉ, pivot_table๋กœ ์ „์ฒ˜๋ฆฌ ํ•ด์ค€ df์— ์ ์šฉํ•ด์•ผ ํ•จ

 

 

2. matplotlib pcolor  

fig=plt.figure(figsize=(15,5), dpi=100)
ax=fig.subplots()

t2=t.iloc[::-1]
t2
hm1=ax.pcolor(t2, cmap="Blues")
_=fig.colorbar(hm1, ax=ax)

col_len=len(t2.columns)
row_len=len(t2.index)
for r in range(row_len):
    for c in range(col_len):
        _=ax.text(c+0.5, r+0.5, int(t2.iloc[r, c]),ha="center", va="center", color="k", fontsize=11)

_=ax.set_xticks(np.arange(col_len)+0.5)
_=ax.set_xticklabels(t2.columns)

_=ax.set_yticks(np.arange(row_len)+0.5)
_=ax.set_yticklabels(t2.index)

 

 

 

 

 

 

+ Recent posts