LearningSpoons class notes

 

 

Numeric x numeric: scatterplot, lmplot, jointplot

Numeric x categorical: boxplot, violinplot, barplot, heatmap

Numeric x location: use the folium library

 

 

This post uses the seaborn functions distplot, relplot, jointplot, pairplot, boxplot, swarmplot, barplot, and heatmap.

 

 


Loading & exploring the data
import seaborn as sns

Import seaborn under the conventional alias sns.

raw = sns.load_dataset('tips')

seaborn can load its example datasets by name (load_dataset fetches them from the online seaborn-data repository and caches them locally). The result is a pandas DataFrame, so it can be analyzed with pandas.

raw.head()

Columns: total_bill (total bill), tip, sex, smoker (smoking status), day (day of the week), time (meal time), size (party size)

raw.info()

info() shows the number of rows and columns, each column's data type, and the non-null count per column, which reveals missing values.
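Each fact that info() summarizes can also be pulled out on its own with pandas. A minimal sketch; the small DataFrame below is made up for illustration (only a few column names borrow from tips), so it is not output from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with tips-like column names; the values are made up
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, np.nan, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "day": ["Sun", "Sun", "Sat", "Sat"],
})

n_rows, n_cols = df.shape      # row and column counts
dtypes = df.dtypes             # data type of each column
missing = df.isna().sum()      # number of missing values per column

print(n_rows, n_cols)
print(dtypes)
print(missing)
df.info()                      # prints the same facts in one summary
```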

 


Examining a distribution (numeric)
  • sns.distplot( df[ 'column_name' ] )
raw['total_bill']

Look at the distribution of total_bill, a numeric column.

sns.distplot(raw['total_bill'])

 

 

 

Examining distributions (numeric vs numeric)
  • relplot( data=df, x=, y=, hue=, kind='scatter' ) 

: checks how two numeric variables are distributed together

- kind = 'scatter' (default)

- kind = 'line'

sns.relplot(x = 'tip', y = 'total_bill', data = raw)  # kind defaults to "scatter" when omitted

sns.relplot(x = 'tip', y = 'total_bill', data = raw, kind = 'line')
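When several rows share the same x value, as tip values often do, kind='line' does not connect every raw point: by default seaborn averages y at each x and shades a confidence interval around that mean. A pandas sketch of just the mean line, on made-up numbers; the confidence band is not reproduced.

```python
import pandas as pd

# Made-up rows: x = 1.0 and x = 2.0 each appear twice, so the line must aggregate them
df = pd.DataFrame({
    "tip": [1.0, 1.0, 2.0, 2.0, 3.0],
    "total_bill": [10.0, 14.0, 18.0, 22.0, 30.0],
})

# One mean y per distinct x, sorted by x: the points the default line passes through
line = df.groupby("tip")["total_bill"].mean().sort_index()
print(line)
```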

sns.relplot(x = 'tip', y = 'total_bill', data = raw, hue = 'sex')

 

Setting hue to sex colors the points by sex, so the two groups can be compared.

 


Examining relationships (numeric vs numeric)
  • jointplot ( data=df, x=, y=, kind='scatter' ) 

- kind = 'scatter' : points (default)

- kind = 'reg' : points + a fitted regression line

- kind = 'kde' : kernel density estimate, drawn as contour lines like a topographic map
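The regression line in kind='reg' is a least-squares straight-line fit (seaborn also shades a confidence band around it). A sketch of the fit alone using numpy.polyfit, on made-up points that lie exactly on y = 2x + 1 so the result is easy to check.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # made-up x values (think: tip)
y = 2.0 * x + 1.0                    # made-up y values on a known line

# A degree-1 polynomial fit is an ordinary least-squares straight line
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6), round(intercept, 6))
```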

 

sns.jointplot(data = raw, x = 'tip', y = 'total_bill')  # kind defaults to 'scatter' when omitted

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'kde')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'reg')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'hex')

kind='hex' is similar to kde in that it shows where points are concentrated, but it displays them as non-overlapping hexagons.
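For the hexagons, jointplot relies on matplotlib's hexbin, which counts how many points fall into each hexagonal cell and colors the cell by that count. A sketch with random points (gridsize=10 is an arbitrary choice) showing that every point is counted in exactly one cell.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=10)  # returns a PolyCollection of hexagons
counts = hb.get_array()            # one point count per hexagon

print(len(counts), counts.sum())
```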

 

 

 

  • pairplot( data=df ) 

: visualizes the relationship between every pair of numeric columns in df

 

sns.pairplot(data = raw)

Where a column meets itself (the diagonal), a histogram is drawn; every other cell shows a scatterplot of the two numeric columns.
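With vars left unset, pairplot uses the numeric columns, so the grid has one panel per ordered pair of them. A pandas sketch of which panels that produces, using a made-up frame that only mirrors the tips column names and kinds.

```python
import itertools
import pandas as pd

# Made-up rows; only the column names and kinds (numeric vs text) mirror tips
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "size": [2, 3, 3],
})

numeric_cols = df.select_dtypes("number").columns.tolist()
# Every ordered (row, column) pair of numeric columns is one panel of the grid
panels = list(itertools.product(numeric_cols, repeat=2))
diagonal = [(a, b) for a, b in panels if a == b]      # histogram panels
off_diagonal = [(a, b) for a, b in panels if a != b]  # scatterplot panels

print(numeric_cols)
print(len(panels), len(diagonal), len(off_diagonal))
```

The text column sex is left out of the grid; it can still be used through hue.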

sns.pairplot(data = raw, hue = 'sex')

Setting hue to sex lets you compare the distributions of the two groups. With hue set, the diagonal switches to KDE curves by default (seaborn 0.11 and later).

 

 


Examining distributions (numeric vs categorical)
  • boxplot( data = df, x = , y = , hue = )
sns.boxplot(data = raw, x = 'day', y = 'tip')

Check the distribution of tips by day.

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')

Setting hue to smoker splits each day's distribution by smoking status.
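Each box marks the first quartile, median, and third quartile; by default (whis=1.5) the whiskers reach the most extreme data points within 1.5 × IQR of the box, and anything further out is drawn as an individual outlier. A numpy sketch of those numbers on a made-up sample; it assumes numpy's default linear percentile method, so the quantile details may differ slightly from what the plotting library computes.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)  # made-up sample with one extreme value

# Box edges and center line: first quartile, median, third quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences
lower_whisker = values[values >= low_fence].min()
upper_whisker = values[values <= high_fence].max()
# Points beyond the fences are drawn individually as outliers
outliers = values[(values < low_fence) | (values > high_fence)]

print(q1, median, q3, lower_whisker, upper_whisker, outliers)
```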

 

 

 

  • swarmplot( data=, x=, y=, hue=, dodge= )

It is similar to boxplot. A boxplot makes it easy to see the spread of the data within each group, but because it does not show how many observations there are, it can be misleading when comparing groups of different sizes. In that case, use swarmplot, which draws every individual point.
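A made-up example of the problem: the two groups below have identical quartiles, so their boxes would look the same, yet group B has twice as many observations. A swarmplot would make the difference visible; in pandas it only shows up once you count.

```python
import pandas as pd

# Made-up groups with the same quartiles but different sizes
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 10,
    "tip": [1, 2, 3, 4, 5] + [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
})

# Quartiles per group: identical rows mean identical boxes
summary = df.groupby("group")["tip"].quantile([0.25, 0.5, 0.75]).unstack()
# Observation counts per group: the information a boxplot does not show
counts = df.groupby("group").size()

print(summary)
print(counts)
```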

sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

With dodge=False, both hue groups are drawn in the same column, so the two colors overlap.

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')
sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

You can also overlay a boxplot and a swarmplot on the same axes.

sns.boxplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

 

 

 

 

  • barplot( data=df, x=, y=, hue= ) 
sns.barplot(data = raw, x = 'size', y = 'tip', hue = 'sex')
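By default each bar's height is the mean of y within its group (barplot's estimator parameter defaults to the mean), and the line on each bar is an error bar around that estimate (a bootstrapped 95% confidence interval by default). A pandas sketch of the bar heights only, for made-up rows; the error bars are not reproduced.

```python
import pandas as pd

# Made-up rows using the tips column names
df = pd.DataFrame({
    "size": [2, 2, 3, 3, 3],
    "sex": ["Female", "Male", "Female", "Male", "Male"],
    "tip": [2.0, 3.0, 3.0, 4.0, 5.0],
})

# Bar heights for barplot(x='size', y='tip', hue='sex'): mean tip per (size, sex)
heights = df.groupby(["size", "sex"])["tip"].mean().unstack("sex")
print(heights)
```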

 

 

 

Examining data (numeric vs categorical vs categorical)
  • heatmap( data = df )
df = raw.pivot_table(index = 'day', columns = 'size', values = 'tip', aggfunc='mean')
df

First, build a table of tips by day and party size: day as the index, size as the columns, and the mean tip as the values.
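pivot_table with aggfunc='mean' amounts to grouping by both keys, averaging, and spreading one key into columns. A small check on made-up rows; a day/size combination that never occurs becomes NaN, and heatmap leaves such cells blank.

```python
import pandas as pd

# Made-up rows with tips-like column names (there is no Sat / size-3 row)
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
    "size": [2, 2, 2, 3, 3],
    "tip": [2.0, 4.0, 3.0, 5.0, 7.0],
})

pivot = toy.pivot_table(index="day", columns="size", values="tip", aggfunc="mean")
# Same table built by hand: group, average, then move 'size' into the columns
manual = toy.groupby(["day", "size"])["tip"].mean().unstack("size")

print(pivot)
```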

sns.heatmap(data = df)

This heatmap shows the relationship between two categorical variables (day and size) by coloring each cell with a numeric value (the mean tip).

 

Several options can be set, for example:

sns.heatmap(data = df, 
           annot = True, fmt = '.2f')

 

sns.heatmap(data = df, 
           annot = True, fmt = '0.2f',
           cmap = 'Pastel1')

* annot: write the value in each cell, fmt: number format, cmap: color map

* Suggested cmap values: Reds, Blues, vlag, Pastel1, RdBu_r (colormap names are case-sensitive, so vlag is lowercase)
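fmt is an ordinary Python format specification applied to each annotated value, so the two spellings used above, '.2f' and '0.2f', give the same text: the leading 0 only enables zero-padding, which has no effect without a width. A quick check with the built-in format on a few arbitrary numbers.

```python
values = [3.14159, 2.5, -1.25, 12.0]

# Format each value both ways, as heatmap would for annot=True
pairs = [(format(v, ".2f"), format(v, "0.2f")) for v in values]
for a, b in pairs:
    print(a, b)
```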


 

 
