๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/59

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-3 ์Šน์ฐจ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์ง€ํ•˜์ฒ  ์—ญ ๋ถ„์„

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/58 https://silvercoding.tistory.com/57 https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding...

silvercoding.tistory.com

 

 


 API ์‹ ์ฒญํ•˜๊ธฐ 

1. ํšŒ์›๊ฐ€์ž…, ๋กœ๊ทธ์ธ ํ•˜๊ธฐ 

http://data.seoul.go.kr

 

์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ

๋ชจ๋“  ์„œ์šธ์‹œ๋ฏผ์„ ์œ„ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ์—์„œ ์„œ์šธ์‹œ์™€ ์—ฐ๊ณ„ ๊ธฐ๊ด€์ด ๊ณต๊ฐœํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„œ์šธ์‹œ์™€ ๊ด€๋ จ๋œ ๋‹ค์–‘ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•ด ๋ณด์„ธ์š”.

data.seoul.go.kr

 

 

2. ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ์˜ ์ธ์ฆํ‚ค ์‹ ์ฒญ ํด๋ฆญ 

๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” ์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ์˜ ๊ณต๊ณต์ž์ „๊ฑฐ ์‹ค์‹œ๊ฐ„ ๋Œ€์—ฌ์ •๋ณด API ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

 

๋‹ค์Œ์˜ ๋งํฌ์—์„œ API๋ฅผ ์‹ ์ฒญํ•  ์ˆ˜ ์žˆ๋‹ค. 

http://data.seoul.go.kr/dataList/OA-15493/A/1/datasetView.do

์ธ์ฆํ‚ค ์‹ ์ฒญ์„ ํด๋ฆญํ•œ๋‹ค. 

 

 

3. ๊ฐ€์ž… ์‹ ์ฒญ์„œ ์ž‘์„ฑ 

ํ™œ์šฉํ•  ๋ชฉ์ ์— ๋งž๊ฒŒ ์‹ ์ฒญ์„œ๋ฅผ ์ž‘์„ฑํ•ด ์ค€๋‹ค. ํ˜„์žฌ ๋ฐ์ดํ„ฐ๋ถ„์„ ํ•™์Šต์— ์‚ฌ์šฉํ•  ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ url์ด ์—†๋‹ค. ์ด ์ฒ˜๋Ÿผ ์‚ฌ์šฉurl์ด ์—†์„ ๊ฒฝ์šฐ์—๋Š” localhost๋ฅผ ์ž‘์„ฑํ•ด์ค€๋‹ค. 

 

 

4. ์ƒ๋‹จ [๋‚˜์˜ํ™”๋ฉด - ์ธ์ฆํ‚ค ๊ด€๋ฆฌ] 

๋‚˜์˜ํ™”๋ฉด - ์ธ์ฆํ‚ค ๊ด€๋ฆฌ์— ๋“ค์–ด๊ฐ„๋‹ค. 

 

 

5. ์ธ์ฆํ‚ค ๋ณต์‚ฌ 

์ธ์ฆํ‚ค ๋ณต์‚ฌ๋ฅผ ํ•œ๋‹ค. ์ด์ œ API์‹ ์ฒญ์ด ์™„๋ฃŒ๋˜์—ˆ๊ณ , ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์™€๋ณด์ž. 

 

 

 

 

 

 

 


 ๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

1. ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ import

import requests  # API๋ฅผ ์ด์šฉํ•ด ์ž๋ฃŒ๋ฅผ ๋ฐ›์•„์˜ค๊ธฐ ์œ„ํ•ด
import pandas as pd   # ์ž๋ฃŒ ์ €์žฅ, ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด

 

 

2. ์„œ์šธํŠน๋ณ„์‹œ ๊ณต๊ณต์ž์ „๊ฑฐ ์‹ค์‹œ๊ฐ„ ๋Œ€์—ฌ์ •๋ณด ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ 

- ํ˜ธ์ถœ

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

ํ•œ๋ฒˆ์— ์ตœ๋Œ€ 1000๊ฑด์„ ํ˜ธ์ถœํ•  ์ˆ˜ ์—†์œผ๋ฏ€๋กœ 2๋ฒˆ์— ๊ฑธ์ณ ํ˜ธ์ถœํ•œ ํ›„, ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ์น˜๋Š” ์ž‘์—…์„ ํ•ด์•ผํ•œ๋‹ค. 

 

 

- ์˜ˆ์‹œ

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

์œ„์˜ ์ƒ˜ํ”Œ URL์„ ์‚ฌ์šฉํ•˜์—ฌ API๋ฅผ ํ˜ธ์ถœํ•  ์ˆ˜ ์žˆ๋‹ค. (์ธ์ฆํ‚ค) ๋ถ€ํ„ฐ ๊ฐ’์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. 

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

์ธ์ฆํ‚ค, ์š”์ฒญํŒŒ์ผ ํƒ€์ž…, ์„œ๋น„์Šค๋ช…, ์š”์ฒญ์‹œ์ž‘์œ„์น˜, ์š”์ฒญ์ข…๋ฃŒ์œ„์น˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์š”์ฒญ์‹œ์ž‘์œ„์น˜์™€ ์š”์ฒญ์ข…๋ฃŒ์œ„์น˜๋ฅผ ์กฐ์ ˆํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํ˜ธ์ถœํ•˜๋ฉด ๋œ๋‹ค. 

 

 

- ์ถœ๋ ฅ๊ฐ’ 

์ถœ์ฒ˜-์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ ๊ด‘์žฅ

๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ ์ปฌ๋Ÿผ์ด ๋  ๋ณ€์ˆ˜๋“ค์ด๋‹ค. ๊ฑฐ์น˜๋Œ€๊ฐœ์ˆ˜, ๋Œ€์—ฌ์†Œ์ด๋ฆ„, ์ž์ „๊ฑฐ์ฃผ์ฐจ์ด๊ฑด์ˆ˜, ๊ฑฐ์น˜์œจ, ์œ„๋„, ๊ฒฝ๋„, ๋Œ€์—ฌ์†ŒID ๋กœ ์ด 7๊ฐœ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. 

 

 

3. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

- ์ธ์ฆํ‚ค ์ž…๋ ฅ 

apikey = ' '  # ๋ฐ›์€ key ๊ฐ’ ์ž…๋ ฅ

๋ณธ์ธ์˜ api ๊ฐ’์„ ๋„ฃ์–ด์ค€๋‹ค. 

 

 

- API ์š”์ฒญํ•˜๊ธฐ 

startnum = 1
endnum = 1000
url1 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
requests.get(url1)
# requests.get(url1).text
# requests.get(url1).content

requests.get() ์„ ์‚ฌ์šฉํ•˜๋ฉด ์‘๋‹ต์ด ์˜จ๋‹ค. ์ด ์™ธ์—๋„ text, content๋กœ ์•ˆ์˜ ๋‚ด์šฉ์„ ์‹คํ–‰ํ•ด๋ณผ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

- ์ž๋ฃŒ ์š”์ฒญ 

json1 = requests.get(url1).json()
json1

์ด๋ ‡๊ฒŒ json ํ˜•ํƒœ๋กœ ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ 1000๊ฐœ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. 

 

 

- ํ•„์š”ํ•œ ์ •๋ณด(ํ‚ค) ์„ ํƒ 

json1['rentBikeStatus'].keys()

ํ˜„์žฌ ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์œ„ํ•ด ํ•„์š”ํ•œ ์ •๋ณด๋Š” rentBikeStatus ์•ˆ์— ์žˆ๋Š” row ํ‚ค์˜ ๋ฐ์ดํ„ฐ์ด์ง€๋งŒ,  ๋ชจ๋“  ํ‚ค๋ฅผ ๊บผ๋‚ด๋ณธ๋‹ค.

json1['rentBikeStatus']['list_total_count']  # ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜

json1['rentBikeStatus']['RESULT']  # ์˜ค๋ฅ˜ ์—ฌ๋ถ€

json1['rentBikeStatus']['row']  # ์ž์ „๊ฑฐ ์ •๋ฅ˜์žฅ๋ณ„ ์ž์ „๊ฑฐ ํ˜„ํ™ฉ

 

 

 

- ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ ๋ณ€๊ฒฝ

raw1 = pd.DataFrame(json1['rentBikeStatus']['row'])
raw1.head()

 

 

- 1000 ~ 2000 & 2000 ~ 3000 ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ณ  ํ•ฉ๋ณ‘ํ•˜๊ธฐ 

startnum = 1001
endnum = 2000
url2 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
json2 = requests.get(url2).json()
raw2 = pd.DataFrame(json2['rentBikeStatus']['row'])
raw2.tail()

data_mid = raw1.append(raw2)
data_mid

 

๋ฐ์ดํ„ฐ๊ฐ€ 2000๊ฐœ๋ฅผ ๋” ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ ๊ฐ™์œผ๋‹ˆ 2000 ~ 3000 ๋ฐ์ดํ„ฐ๋„ ํ˜ธ์ถœํ•œ ํ›„ ์ถ”๊ฐ€ํ•ด์ค€๋‹ค. 

 

 

startnum = 2001
endnum = 3000
url3 = f'http://openapi.seoul.go.kr:8088/{apikey}/json/bikeList/{startnum}/{endnum}/'
json3 = requests.get(url3).json()
raw3 = pd.DataFrame(json3['rentBikeStatus']['row'])
raw3.tail()

data = data_mid.append(raw3)
data

๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€๋‹ค. ์ด 2540๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์ธ๋ฑ์Šค๊ฐ€ ์กฐ๊ธˆ ์ด์ƒํ•ด์„œ ์ธ๋ฑ์Šค๋ฅผ ์ดˆ๊ธฐํ™” ํ•ด์ค€ ํ›„, ๋ฐ์ดํ„ฐ๋ฅผ ์—‘์…€ ํŒŒ์ผ๋กœ ์ €์žฅํ•ด์ค€๋‹ค. 

 

 

 

- ์ตœ์ข… ๋ฐ์ดํ„ฐ 

data.reset_index(drop=True)

data.info()

์ด 2450๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ , null ๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

- ์—‘์…€ํŒŒ์ผ๋กœ ์ €์žฅ 

data.to_excel('./data/bicycle.xlsx', index = False)

์—‘์…€ํŒŒ์ผ๋กœ ์ €์žฅํ•ด์ค€๋‹ค. ๋‹ค์Œ์‹œ๊ฐ„์—๋Š” ์ด๋Ÿฌํ•œ ๊ณผ์ •์œผ๋กœ ๋ถˆ๋Ÿฌ์˜จ ์ž์ „๊ฑฐ ๋ฐ์ดํ„ฐ์…‹์„ folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์„œ์šธ์‹œ ๋”ฐ๋ฆ‰์ด ํ˜„ํ™ฉ ์ง€๋„๋ฅผ ์ƒ์„ฑํ•ด ๋ณผ ๊ฒƒ์ด๋‹ค. 

 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/58

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-2 ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๋‚ ?

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/57 https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding...

silvercoding.tistory.com

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
raw = pd.read_excel('./data/subway_raw.xlsx')

2-1 ํฌ์ŠคํŒ…์—์„œ ํ•ฉ๋ณ‘ํ•ด ๋†“์€ 2019๋…„ 1์›” - 6์›” ์ง€ํ•˜์ฒ  ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜จ๋‹ค. 

raw.info()

์ด 99342 ๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ  , null ๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. 

 

 

 


 ์–ด๋Š ์—ญ์—์„œ , ์–ธ์ œ ์ง€ํ•˜์ฒ ์„ ๊ฐ€์žฅ ๋งŽ์ด ํƒˆ๊นŒ ? 

1. ์Šน๊ฐ์ด ๊ฐ€์žฅ ๋งŽ์ด ํƒ€๋Š” ์—ญ

data_station = raw.pivot_table(index = '์—ญ๋ช…', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_station = data_station.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending = False)
data_station.head(10)  # ์Šน์ฐจ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์—ญ ์ƒ์œ„ 10๊ฐœ

์—ญ๋ณ„๋กœ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌํ•œ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์ด๋‹ค. ์ƒ์œ„ 10๊ฐœ ์—ญ์„ ์ถœ๋ ฅํ•˜์˜€๊ณ  , 2019๋…„ ์ƒ๋ฐ˜๊ธฐ ๊ฐ€์žฅ ๋งŽ์€ ์Šน๊ฐ์ˆ˜๊ฐ€ ์žˆ์—ˆ๋˜ ์—ญ์€ ์ž ์‹ค์—ญ์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

2. ์‹œ๊ฐํ™” : ๋…ธ์„  ๋ณ„ (1-9ํ˜ธ์„ ) ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜ ๋น„๊ตํ•ด ๋ณด๊ธฐ


* ํžˆํŠธ๋งต 

  • sns.heatmap(data, annot = True, fmt = '.0f', cmap = "RdBu_r")
    • annot : True ์ผ๊ฒฝ์šฐ ๊ฐ’์„ ๊ทธ๋ž˜ํ”„์— ํ‘œ์‹œ
    • fmt : ๊ฐ’ ํ‘œ์‹œ ํ˜•ํƒœ.
      • ex) 'f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„(default ๋กœ ๊ฐ’์ด ์žˆ๋Š” ์†Œ์ˆ˜ ์ž๋ฆฌ๊นŒ์ง€ ํ‘œ์‹œ๋จ)
      • ex) '.0f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„ํ•ด๋‹ฌ๋ผ (์†Œ์ˆ˜ 0๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€๋งŒ == ์ •์ˆ˜์ž๋ฆฌ๋งŒ )
      • ex) '.1f' : ์‹ค์ˆ˜๋กœ ํ‘œํ˜„ํ•ด๋‹ฌ๋ผ (์†Œ์ˆ˜ 1๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€๋งŒ)
      • ex) .1% ๋Š” ํผ์„ผํŠธ(์†Œ์ˆ˜ ์ฒซ๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€ ํ‘œ์‹œ)
    • cmap : ์ƒ‰์ƒ ์ฐจํŠธ. _r ์œผ๋กœ ๋๋‚˜๋Š” ์ฐจํŠธ๋Š” ์ƒ‰์ƒ ๋ฐฉํ–ฅ ๋ฐ˜๋Œ€๋กœ ๋˜์–ด์žˆ๋Š” ๋ฒ„์ „์ž„(์•„๋ž˜ ์ปฌ๋Ÿฌ ๋ฆฌ์ŠคํŠธ ์ฐธ๊ณ )

* cmap ์ข…๋ฅ˜

Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r, magma, magma_r, mako, mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, rocket, rocket_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, twilight, twilight_r, twilight_shifted, twilight_shifted_r, viridis, viridis_r, vlag, vlag_r, winter, winter_r


- 1ํ˜ธ์„ ๋งŒ ์‹œ๊ฐํ™” ํ•ด๋ณด๊ธฐ 

line = '1ํ˜ธ์„ '
data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]

# ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”: ๋…ธ์„ ์˜ ์—ญ ์ˆœ์„œ์— ๋งž์ถฐ ์ •๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์—ญID๋„ ์ธ๋ฑ์Šค์— ํฌํ•จ
df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum') 
df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]   # ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์š”์ผ์— ๋งž๊ฒŒ ์ •๋ฆฌ
df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ „์ฒด๋ฅผ 1๋งŒ์œผ๋กœ ๋‚˜๋ˆ„๊ธฐ
df_pivot

์—ญ๋ณ„ ์š”์ผ๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜๋ฅผ ์ง‘๊ณ„ํ•œ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์ด๋‹ค. ์š”์ผ์ด ๋’ค์ฃฝ๋ฐ•์ฃฝ ๋‚˜์˜ค๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์‹œ ์„ ํƒํ•˜์—ฌ ์ˆœ์„œ๋Œ€๋กœ ์ •๋ฆฌํ•ด ์ค€๋‹ค. 

import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import font_manager, rc
import platform 

# ํ•œ๊ธ€ ํฐํŠธ ์‚ฌ์šฉ
if platform.system() == 'Windows': 
    path = 'c:/Windows/Fonts/malgun.ttf'
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
elif platform.system() == 'Darwin':
    rc('font', family='AppleGothic')
fig, ax = plt.subplots( figsize=(6,5) )   # ๊ทธ๋ž˜ํ”„ ์‚ฌ์ด์ฆˆ ์ง€์ •
plt.title(f"{line} ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜", fontsize = 20) # for title
sns.heatmap(df_pivot, cmap = "Reds", 
           annot = True, fmt = '.0f')

1ํ˜ธ์„ ์˜ ์—ญ๋ณ„ ์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜ ํžˆํŠธ๋งต๋‹ˆ๋‹ค. ์„œ์šธ์—ญ๊ณผ ์ข…๊ฐ์—ญ์˜ ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์•„๋ณด์ด๊ณ , ๊ทธ์ค‘์—์„œ๋„ ์„œ์šธ์—ญ์˜ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

๋™์ผํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ชจ๋“  ๋…ธ์„ ์„ ์‹œ๊ฐํ™”ํ•˜์—ฌ ๋น„๊ต ํ•ด ๋ณด์ž. 

 

- 1ํ˜ธ์„ ~9ํ˜ธ์„  ์‹œ๊ฐํ™” ํ•˜์—ฌ ๋น„๊ต ํ•ด๋ณด๊ธฐ 

raw['๋…ธ์„ ๋ช…'].unique()

์ด๋ ‡๊ฒŒ ๋งŽ์€ ๋…ธ์„ ์ด ์žˆ๋Š”๋ฐ , ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” 1ํ˜ธ์„  ~ 9ํ˜ธ์„  ๋งŒ์„ ์‹œ๊ฐํ™” ํ•œ๋‹ค. 

line_seoul_list = [ ]
for line in raw['๋…ธ์„ ๋ช…'].unique():
    if line[1:] == 'ํ˜ธ์„ ':    # xํ˜ธ์„  ์ธ ๊ฒฝ์šฐ๋ฅผ ์„ ํƒ. 
        line_seoul_list.append(line)
line_seoul_list

for line in sorted(line_seoul_list):
    
    # ๋ฐ์ดํ„ฐ ์ •๋ฆฌํ•˜๊ธฐ
    data_line = raw[raw['๋…ธ์„ ๋ช…'] == line]
    df_pivot = data_line.pivot_table(index = ['์—ญID', '์—ญ๋ช…'], columns = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜',aggfunc = 'sum')
    df_pivot = df_pivot[['์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']]
    df_pivot = df_pivot / 10000  # ๋งŒ๋ช…๋‹จ์œ„๋กœ ์ˆ˜์ •
    
    
    # ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
    fig, ax = plt.subplots( figsize=(6,len(df_pivot)/3 ) )   # ๊ทธ๋ž˜ํ”„ ์‚ฌ์ด์ฆˆ๋ฅผ ์กฐ์ •ํ•˜์—ฌ, ์—ญ ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ๋Š” ์„ธ๋กœ๋ฅผ ๊ธธ๊ฒŒ ํ‘œํ˜„
    plt.title(f"{line} ์—ญ๋ณ„/์š”์ผ๋ณ„ ์Šน๊ฐ์ˆ˜", fontsize = 20) # for title
    sns.heatmap(df_pivot, cmap = "Reds", 
               annot = True, fmt = '.0f')

 

์ง„ํ•œ ๋นจ๊ฐ„์ƒ‰์ผ ์ˆ˜๋ก ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ์—ญ์€ ๋ชจ๋“  ์š”์ผ์ด ๋Œ€์ฒด์ ์œผ๋กœ ์ƒ‰์ด ์ง„ํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด์ „ ํฌ์ŠคํŒ…์—์„œ ์ „์ฒด์ ์œผ๋กœ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๊ณ  , ์ฃผ๋ง์ด ๋˜๋ฉด ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์—ˆ๋Š”๋ฐ , ์—ญ ๋ณ„๋กœ ๋ณด๋‹ˆ ์ฃผ๋ง์ด ๋” ๋งŽ์€ ์—ญ๋„ ๋ฐœ๊ฒฌํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. (  ex, ์ดํƒœ์›, ๊ณ ์†ํ„ฐ๋ฏธ๋„, ํ™๋Œ€์ž…๊ตฌ )  

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/57

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 2-1 pandas๋กœ ์—ฌ๋Ÿฌ csv ํŒŒ์ผ ํ•ฉ์น˜๊ธฐ

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/56 https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding...

silvercoding.tistory.com

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
raw = pd.read_excel('./data/subway_raw.xlsx')
raw.head()

์ด์ „ ๊ธ€์—์„œ ๋ณ‘ํ•ฉํ–ˆ๋˜ ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์™€ ์ค€๋‹ค. 

raw.info()

์ด 99342๊ฐœ์˜ row๊ฐ€ ์กด์žฌํ•˜๊ณ  ,  null๊ฐ’์ด ์—†๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 

 


 ์Šน๊ฐ์ด ๋งŽ์•„์ง€๋Š” ๋‚ ์€ ์–ธ์ œ์ธ๊ฐ€ ? 

- ์ผ์ž , ์š”์ผ ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜ ํ™•์ธ ํ•ด ๋ณด๊ธฐ 

data_date = pd.pivot_table(raw, index = ['์‚ฌ์šฉ์ผ์ž', '์š”์ผ'], values= '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')
data_date.head()

 

- ์ผ์ž , ์š”์ผ ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ ์ˆ˜ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ 

data_date_sort = data_date.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_date_sort

 

์ผ์ž, ์š”์ผ ๋ณ„ ์Šน์ฐจ์Šน๊ฐ์ˆ˜๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌ ํ•ด ๋ณด์•˜๋‹ค. ์ด ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ ํ™•์ธํ•ด ๋ณด๋ฉด ๋‘๊ฐ€์ง€ ๊ฐ€์ •์„ ์„ธ์šธ ์ˆ˜ ์žˆ๋‹ค. 

 

1. 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค. 

2. ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค. 

 

 

(1) 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค 

- '์—ฐ์›”' ์ปฌ๋Ÿผ , '์›”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

yearmonth_list = []
monthday_list = []
for date in raw['์‚ฌ์šฉ์ผ์ž']:
    yearmonth = str(date)[:6]   # ์™ผ์ชฝ๋ถ€ํ„ฐ 6์ž๋ฆฌ ๋ฌธ์ž ์„ ํƒ
    yearmonth_list.append(yearmonth)
    monthday = str(date)[4:]    # ์™ผ์ชฝ์—์„œ 5๋ฒˆ์งธ ๋ฌธ์ž๋ถ€ํ„ฐ ๋๊นŒ์ง€ ์„ ํƒ
    monthday_list.append(monthday)
    
# ์—ฐ์›”/ ์›”์ผ ์ปฌ๋Ÿผ ์ถ”๊ฐ€ํ•˜๊ธฐ
raw['์—ฐ์›”'] = yearmonth_list
raw['์›”์ผ'] = monthday_list
raw.head()

์›” ๋ณ„ ์Šน๊ฐ์ˆ˜๋ฅผ ๋ณด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ์ผ์ž๋ฅผ ์—ฐ์›”, ์›”์ผ๋กœ ๋‚˜๋ˆ„์–ด ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜์˜€๋‹ค. 

data_month = pd.pivot_table(raw, index = '์—ฐ์›”', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_month = data_month.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_month

์›”๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜ ํ”ผ๋ฒ—๋ฐ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  , ์Šน์ฐจ์ข…์Šน๊ฐ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•ด๋ณด๋‹ˆ , 5์›”์— ๊ฐ€์žฅ ์Šน๊ฐ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

๊ฒฐ๋ก  -> 5์›”์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค : True

 

 

 

(2) ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค 

data_week = pd.pivot_table(raw, index = '์š”์ผ', values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc='sum')
data_week = data_week.sort_values(by = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', ascending= False)
data_week

์š”์ผ๋ณ„ ์Šน์ฐจ์ด๊ฐ์ˆ˜ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  , ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•ด๋ณด๋‹ˆ ๊ธˆ์š”์ผ์— ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

๊ฒฐ๋ก  -> ๊ธˆ์š”์ผ์— ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ๋‹ค : True

 

 

 

 

 


* ์›”๋ณ„ / ์ผ์ž๋ณ„ ์Šน๊ฐ์ˆ˜ ๊ทธ๋ž˜ํ”„ ์‹œ๊ฐํ™” 

- 2019๋…„ 1์›” ๋ฐ์ดํ„ฐ๋กœ ํ…Œ์ŠคํŠธ ํ•ด๋ณด๊ธฐ 

df_selected = raw[ raw['์—ฐ์›”'] == '201901']
df_selected.head()

df_pivot = pd.pivot_table(df_selected, index = ['์›”์ผ','์š”์ผ'], values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')
df_pivot = df_pivot.reset_index()
df_pivot

2019๋…„ 1์›” ๋‚ ์งœ๋ณ„ ์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜๋ฅผ ์ง‘๊ณ„ํ•˜๋Š” ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•œ๋‹ค. ์ด๋ฅผ pointplot์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ฐํ™” ํ•ด๋ณด์ž. 

import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import font_manager, rc
import platform 

# ํ•œ๊ธ€ ํฐํŠธ ์‚ฌ์šฉ
if platform.system() == 'Windows': 
    path = 'c:/Windows/Fonts/malgun.ttf'
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
elif platform.system() == 'Darwin':
    rc('font', family='AppleGothic')
fig, ax = plt.subplots( figsize=(20,6) )
# ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
sns.pointplot(data = df_pivot, x = '์›”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

fig, ax = plt.subplots( figsize=(20,6) )
# ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
sns.pointplot(data = df_pivot, x = '์š”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

1์›” ๊ทธ๋ž˜ํ”„๋งŒ ๊ทธ๋ ค๋ณด์•˜๋”๋‹ˆ , ๊ธˆ์š”์ผ์— ๊ฐ€์žฅ ๋งŽ์€ ์Šน๊ฐ์ˆ˜๋ฅผ ์ฐ๊ณ , ๊ทธ ์ดํ›„๋กœ ์ค„์–ด๋“ค์–ด ์ผ์š”์ผ์—๋Š” ์Šน๊ฐ์ˆ˜๊ฐ€ ๋Œ€ํญ ์ค„์–ด๋“œ๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. ๋งค ๋‹ฌ๋งˆ๋‹ค ํŽธ์ฐจ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด 1-6์›”์˜ ๋‚ ์งœ๋ณ„ ์Šน๊ฐ์ˆ˜ ์‹œ๊ฐํ™”๋ฅผ ํ•ด๋ณธ๋‹ค. 

 

 

 

 

- ์ƒ๋ฐ˜๊ธฐ ๋‚ ์งœ๋ณ„ ์Šน๊ฐ์ˆ˜ ์‹œ๊ฐํ™” 

raw['์—ฐ์›”'].unique()

for yearmonth in raw['์—ฐ์›”'].unique():
    df_selected = raw[ raw['์—ฐ์›”'] == yearmonth]  # ํ•ด๋‹น ์—ฐ์›” ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ
    df_pivot = pd.pivot_table(df_selected, index = ['์›”์ผ','์š”์ผ'], values = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', aggfunc= 'sum')# ์ผ์ž๋ณ„ ์Šน๊ฐ์ˆ˜ ๊ณ„
    df_pivot = df_pivot.reset_index()
    
    fig, ax = plt.subplots( figsize=(20,6) )
    
    ax.set_title(f'์ผ์ž๋ณ„ ์ง€ํ•˜์ฒ ์Šน๊ฐ์ˆ˜({yearmonth})')  # ๊ทธ๋ž˜ํ”„ ์ œ๋ชฉ ์ถ”๊ฐ€ํ•˜๊ธฐ
    sns.pointplot(data = df_pivot, x = '์›”์ผ', y = '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜')

 

ํ•ด์„ :  ์ „์ฒด์ ์œผ๋กœ ํ˜•ํƒœ๊ฐ€ ๋น„์Šทํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๊ธˆ์š”์ผ ์ดํ›„ ์ฃผ๋ง์—๋Š” ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๊ณ  , ์ผ์š”์ผ์— ๋Œ€ํญ ํ•˜๋ฝํ•˜๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. 2์›”๋‹ฌ์„ ๋ณด๋ฉด ์„ค์—ฐํœด๋กœ ์ธํ•ด์„œ 2์›” ์ดˆ ์ฃผ์ค‘์—๋„ ์Šน๊ฐ์ˆ˜์˜ ์ˆ˜๊ฐ€ ๋‚ฎ์œผ๋ฉฐ , 6์›” 6์ผ ํ˜„์ถฉ์ผ์—๋„ ์Šน๊ฐ์ˆ˜๊ฐ€ ๋–จ์–ด์ง€๋Š” ํ˜„์ƒ ๋˜ํ•œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ง€ํ•˜์ฒ  ์Šน๊ฐ์ˆ˜๋Š” ๊ณตํœด์ผ์˜ ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค๊ณ  ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/56

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1-3. ํŠน์ • ํŒ€์— ๊ฐ•ํ•œ ์•ผ๊ตฌ์„ ์ˆ˜ ๋ถ„์„ํ•˜๊ธฐ

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/55 https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. se..

silvercoding.tistory.com

 

 


import pandas as pd

* pandas๋กœ ํŒŒ์ผ ์ฝ์–ด์˜ค๊ธฐ 

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

< ๊ด€๋ จ ์ฐธ๊ณ  ๋ฌธ์„œ > 

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

 

pandas.read_excel — pandas 1.3.1 documentation

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IN

pandas.pydata.org

 

 

 

 


 pandas ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ ๋ณ‘ํ•ฉํ•˜๊ธฐ 

(1) ์—‘์…€ํŒŒ์ผ ํ•œ๊ฐœ ๋ถˆ๋Ÿฌ์™€์„œ ์‚ดํŽด๋ณด๊ธฐ 

* ํŒŒ์ผ : ์ผ์ž / ๋…ธ์„  / ์ง€ํ•˜์ฒ ์—ญ ๋ณ„ ์Šนํ•˜์ฐจ ๊ณ ๊ฐ ์ˆ˜ ( 2019๋…„ ์ƒ๋ฐ˜๊ธฐ - 1์›” ~ 6์›”, ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ œ๊ณต ) 

file = './rawfiles/CARD_SUBWAY_MONTH_201901.csv'
raw = pd.read_csv(file)
raw.head()

raw.info()

์ด 18334 ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๊ณ , ๊ฒฐ์ธก๊ฐ’์€ ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

(2) ๋‘๊ฐœ ์—‘์…€ํŒŒ์ผ ํ•ฉ์ณ๋ณด๊ธฐ  

raw = pd.DataFrame()  # ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ƒ์„ฑ 
raw.head()

# ์ฒซ ๋ฒˆ์งธ ํŒŒ์ผ
file = './rawfiles/CARD_SUBWAY_MONTH_201901.csv'
temp = pd.read_csv(file)
# temp.head()
raw = raw.append(temp)  # ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ถ”๊ฐ€ 
# raw.head()

# ๋‘ ๋ฒˆ์งธ ํŒŒ์ผ
file = './rawfiles/CARD_SUBWAY_MONTH_201902.csv'
temp = pd.read_csv(file)
raw = raw.append(temp)  # ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ถ”๊ฐ€ 

raw.info()

์šฐ์„  , ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์„ ์ƒ์„ฑํ•œ ํ›„, ํŒŒ์ผ์„ ์ฝ์–ด ์™€์„œ append๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋นˆ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. ์—ฌ๋Ÿฌ๊ฐœ์˜ ํŒŒ์ผ์„ ํ•ฉ์น˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ด๋Ÿฌํ•œ ๋ฐ˜๋ณต๋˜๋Š” ์ž‘์—…์„ for๋ฌธ์œผ๋กœ ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋œ๋‹ค. 

 

 

 

(3) ํด๋”์— ์žˆ๋Š” ๋ชจ๋“  ์—‘์…€ํŒŒ์ผ ๋ณ‘ํ•ฉํ•˜๊ธฐ

- ํด๋” ์•ˆ์˜ ํŒŒ์ผ ์ด๋ฆ„ ๊ฐ€์ ธ์˜ค๊ธฐ 

# ํด๋” , ํŒŒ์ผ์„ ๊ด€๋ฆฌํ•˜๋Š” os ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ 
import os
os.listdir()

os.listdir() ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ˜„์žฌ ํด๋”์— ์žˆ๋Š” ํด๋”์™€ ํŒŒ์ผ๋“ค์˜ ์ด๋ฆ„์„ ๊ทธ๋Œ€๋กœ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

dirpath = './rawfiles/'
files = os.listdir(dirpath)
files

๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณธ ํฌ์ŠคํŒ…์—์„œ ์‚ฌ์šฉํ•  ํ˜„์žฌ ํด๋” ์•ˆ์˜ rawfiles์•ˆ์˜ ํŒŒ์ผ๋“ค์„ ๋ถˆ๋Ÿฌ ์˜จ๋‹ค. 

 

์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ์ผ์ผ์ด ํŒŒ์ผ ์ด๋ฆ„์„ ๋ณต๋ถ™ํ•˜์ง€ ์•Š์•„๋„ ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

 

- ๋ณ‘ํ•ฉํ•˜๊ธฐ 

# ๋นˆ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ค€๋น„
raw = pd.DataFrame() 

# ๋ณ‘ํ•ฉ
for file in os.listdir('./rawfiles'):
#     print(file)
    fpath = './rawfiles/'+file
    print(fpath)
    temp = pd.read_csv(fpath)
    raw = raw.append(temp, ignore_index = True)   #ignore_index = True  --> ๊ธฐ์กด ์ธ๋ฑ์Šค๋Š” ๋ฌด์‹œํ•˜๋ผ.

์œ„์™€ ๊ฐ™์ด ๋นˆ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์— ํŒŒ์ผ์„ ํ•˜๋‚˜์”ฉ ์ถ”๊ฐ€ํ•œ๋‹ค. ignore_index=True๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ์กด ์ธ๋ฑ์Šค๋ฅผ ๋ฌด์‹œํ•˜์ง€ ์•Š์œผ๋ฉด ์›๋ž˜ ์ž์‹ ์˜ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ธ๋ฑ์Šค๊ฐ€ ๋’ค์ฃฝ๋ฐ•์ฃฝ์œผ๋กœ ์„ค์ •๋˜์–ด ์žˆ๊ฒŒ๋œ๋‹ค. ์ˆœ์„œ๋Œ€๋กœ ํ•ด์ฃผ๊ธฐ ์œ„ํ•จ์ด๋‹ค. 

 

 

 


 ๋ณ‘ํ•ฉํ•œ ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ 
raw.head()

raw.tail()

raw.info()

์ด 99342 ๊ฐœ์˜ row๋กœ , raw.tail()์„ ํ–ˆ์„ ๋•Œ index์˜ ๊ฒฐ๊ณผ์™€ ๊ฐ™์œผ๋ฏ€๋กœ ์ •์ƒ์ ์œผ๋กœ ๋ณ‘ํ•ฉ๋œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 


 '์š”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ ํ•ด๋ณด๊ธฐ - datetime ์‚ฌ์šฉ 
from datetime import datetime

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

 

- datetime.strptime()

date_str = str(20190601)     # ์ˆซ์ž๊ฐ€ ์•„๋‹Œ ๋ฌธ์ž๋กœ ์ž…๋ ฅ๋˜์–ด์•ผ ํ•จ
date = datetime.strptime(date_str, "%Y%m%d")
date

datetime.datetime(2019, 6, 1, 0, 0)

# ์›”์š”์ผ : 0 ~ ์ผ์š”์ผ : 6
weekday = date.weekday()
weekday

5

์ง์ ‘ ๋‹ฌ๋ ฅ์„ ์ฐพ์•„๋ณด์ง€ ์•Š๊ณ ๋„ , datetime ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ด๋ ‡๊ฒŒ ์š”์ผ์„ ๋ฐ˜ํ™˜ํ•ด ์ค€๋‹ค. 

 

- '์š”์ผ' ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

weekday_dict = [ '์›”','ํ™”','์ˆ˜','๋ชฉ','๊ธˆ','ํ† ','์ผ']
weekday_list = []

for date_str in raw['์‚ฌ์šฉ์ผ์ž']:
    date = datetime.strptime(str(date_str), "%Y%m%d")
    weekday_index  = date.weekday()
    weekday = weekday_dict[weekday_index]
    weekday_list.append(weekday)

weekday_list์— ๊ฐ row์˜ ์š”์ผ์„ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. 

raw['์š”์ผ'] = weekday_list
raw.sample(5)

'์š”์ผ' ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ฃผ๊ณ  , sample์„ ์ด์šฉํ•˜์—ฌ ๋žœ๋ค์œผ๋กœ row๋ฅผ ๋ถˆ๋Ÿฌ์™€์„œ ํ™•์ธํ•ด ๋ณธ๋‹ค. 

raw.columns

new_columns = ['์‚ฌ์šฉ์ผ์ž',  '์š”์ผ', '๋…ธ์„ ๋ช…', '์—ญID', '์—ญ๋ช…', '์Šน์ฐจ์ด์Šน๊ฐ์ˆ˜', 'ํ•˜์ฐจ์ด์Šน๊ฐ์ˆ˜', '๋“ฑ๋ก์ผ์ž']
raw = raw[ new_columns ]
raw.head()

์š”์ผ์„ ์•ž์— ๋‘๊ธฐ ์œ„ํ•ด ์ปฌ๋Ÿผ์˜ ์œ„์น˜๋ฅผ ๋ฐ”๊พธ์–ด ์„ ํƒํ•ด์ค€๋‹ค. 

 

 

 

 


 ๋ฐ์ดํ„ฐ ์ €์žฅ 
raw.to_excel('./data/subway_raw.xlsx', index = False)

to_excel() ์„ ์ด์šฉํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ๋กœ ์ €์žฅํ•ด ์ค€๋‹ค. 


 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/55

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1-2. ์•ผ๊ตฌ์„ ์ˆ˜๊ฐ€ ๊ฐ•ํ•ด์ง€๋Š” ๊ณ„์ ˆ์ด ์žˆ์„๊นŒ?

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/54 https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot,..

silvercoding.tistory.com

 

 


 ํŠน์ • ํŒ€์— ๊ฐ•ํ•œ ์„ ์ˆ˜๊ฐ€ ์žˆ์„๊นŒ? 

* idea : '์ƒ๋Œ€' ์ปฌ๋Ÿผ์œผ๋กœ ๊ฒจ๋ฃจ์—ˆ๋˜ ์ƒ๋Œ€ ํŒ€์„ ์ถ”์ถœํ•˜์—ฌ ๊ฐ๊ฐ์˜ ์ƒ๋Œ€ ํŒ€๊ณผ์˜ ๊ฒฝ๊ธฐ์—์„œ์˜ ์ถœ๋ฃจ์œจ์„ ๊ณ„์‚ฐํ•œ๋‹ค.  

 

- ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

import pandas as pd

file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()

 

 

- ์ƒ๋Œ€ ํŒ€๋ณ„ ๊ธฐ๋ก ์ •๋ฆฌ 

raw['์ƒ๋Œ€'].unique()

์šฐ์„  unique() ๋ฅผ ์ด์šฉํ•˜์—ฌ '์ƒ๋Œ€' ์ปฌ๋Ÿผ์— ์žˆ๋Š” ๊ฐ’๋“ค์„ ํ™•์ธํ•ด ์ค€๋‹ค. ์•ž์— @๊ฐ€ ๋ถ™์€ ๊ฒฝ์šฐ๋Š” ์›์ •๊ฒฝ๊ธฐ, ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ํ™ˆ ๊ฒฝ๊ธฐ๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์ด๋ฅผ 'ํ™ˆ์–ด์›จ์ด' ๋ผ๋Š” ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ตฌ๋ถ„ํ•˜๋„๋ก ํ•˜๊ณ  , '์ƒ๋Œ€ํŒ€' ์ปฌ๋Ÿผ์—๋Š” ํŒ€๋ช…๋งŒ์„ ๋„ฃ์–ด์ค€๋‹ค.  

opp_list = [ ]
home_away_list = [ ]

for opp in raw['์ƒ๋Œ€']:
    if "@" in opp:
        home_away = '์›์ •'
        opp = opp.replace('@', '')
    else:
        home_away = 'ํ™ˆ'
    home_away_list.append(home_away)
    opp_list.append(opp)

raw['ํ™ˆ์–ด์›จ์ด'] = home_away_list
raw['์ƒ๋Œ€ํŒ€'] = opp_list
raw.head()

for๋ฌธ์„ ์ด์šฉํ•˜์—ฌ ํ™ˆ์–ด์›จ์ด์™€ ์ƒ๋Œ€ํŒ€์„ ๊ตฌ๋ถ„ํ•˜๊ณ  ,  ๋‘๊ฐœ์˜ ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ฃผ์—ˆ๋‹ค. 

factors = ['ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ', '๋ฃจํƒ€', 'ํƒ€์ ','๋ณผ๋„ท', '์‚ฌ๊ตฌ', 'ํฌ๋น„']
data = raw.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ', '์ƒ๋Œ€ํŒ€'],
                      values = factors,
                       aggfunc = 'sum')
data.head()

์„ ์ˆ˜๋“ค์˜ ์ƒํƒœํŒ€ ๋ณ„ ์‹ค์ ์„ ์ง‘๊ณ„ํ•˜๊ธฐ ์œ„ํ•ด ํ”ผ๋ฒ— ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•œ๋‹ค. 

cond = data['ํƒ€์ˆ˜'] > 0 
data = data[ cond ]
data.head()

ํƒ€์ˆ˜๊ฐ€ ์—†๋Š” ์„ ์ˆ˜๋“ค์€ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ์ œ์™ธ์‹œํ‚จ๋‹ค. 

data = data.reset_index()
data.head()

reset_index๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ index๋ฅผ ๋ชจ๋‘ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

 

 

 

 

- ํƒ€์ž vs ์ƒ๋Œ€ํŒ€ ๋ณ„ ์‹ค์  ๊ณ„์‚ฐ 

def cal_hit(df):
    '''
    - ํƒ€์œจ : ๊ณต์„ ์ณ์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ --> ์•ˆํƒ€ / ํƒ€์ˆ˜
    - ์ถœ๋ฃจ์œจ: ์ง„๋ฃจํ•ด์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ -->  (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ)/(ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ+ํฌ์ƒํ”Œ๋ผ์ด)
    - ์žฅํƒ€์œจ : ํƒ€์œจ์— ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ -->   ๋ฃจํƒ€ / ํƒ€์ˆ˜
    '''
    
    df['ํƒ€์œจ'] = df['์•ˆํƒ€'] / df['ํƒ€์ˆ˜']
    df['์ถœ๋ฃจ์œจ'] = (df['์•ˆํƒ€'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ']) / (df['ํƒ€์ˆ˜'] + df['์‚ฌ๊ตฌ'] + df['ํฌ๋น„'])
    df['์žฅํƒ€์œจ'] = df['๋ฃจํƒ€'] / df['ํƒ€์ˆ˜']
    df['OPS'] = df['์ถœ๋ฃจ์œจ'] + df['์žฅํƒ€์œจ']
    return df

์ด์ „ ๊ธ€์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ ํ•จ์ˆ˜ , ์‹ค์  ๊ณ„์‚ฐ์„ ํ•˜๊ณ  ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. 

player_stats_opp = cal_hit(data)
player_stats_opp

 

 

 

 

- ๊ฒฐ๊ณผ ๋ณด๊ธฐ : DataFrame 

(1) ๋‘์‚ฐ์— ๊ฐ•ํ•œ ์„ ์ˆ˜ ? - ์ƒ์œ„ 10๋ช…

team = '๋‘์‚ฐ'
cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 10)
player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(10)

์ƒ๋Œ€ํŒ€์ด ๋‘์‚ฐ์ด๋ฉด์„œ , ํƒ€์ˆ˜๊ฐ€ 20๋ณด๋‹ค ํฐ ์„ ์ˆ˜๋“ค์„ ๋ฝ‘์•„ ๋‚ด๊ณ , ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•˜์—ฌ 10๋ช…์˜ ์„ ์ˆ˜๋ฅผ ์ถœ๋ ฅํ•ด ๋ณธ๋‹ค. 

 

๋”ฐ๋ผ์„œ ์ƒ๋Œ€ํŒ€์ด '๋‘์‚ฐ' ์ผ ๋•Œ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 10๋ช…์˜ ์ด๋ฆ„์„ ๋ฝ‘์•„๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค

player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False)['์ด๋ฆ„'].head(10)

 

(2) ๋กฏ๋ฐ์— ๊ฐ•ํ•œ ์„ ์ˆ˜ ? - ์ƒ์œ„ 10๋ช…

team = '๋กฏ๋ฐ'
cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 20)
player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(10)

๋™์ผํ•˜๊ฒŒ ์‹œํ–‰ ํ•ด ์ค€๋‹ค. 

 

์ƒ๋Œ€ํŒ€์ด '๋‘์‚ฐ' ์ผ ๋•Œ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 10๋ช…์˜ ์ด๋ฆ„์„ ๋ฝ‘์•„๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค

player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False)['์ด๋ฆ„'].head(10)

 

 

(3) KBO ์ „์ฒด ํŒ€์„ ์ƒ๋Œ€๋กœ ํŒ€ ๋ณ„ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 5์ธ ํƒ€์ž๋“ค ํ™•์ธํ•ด ๋ณด๊ธฐ 

hitter_df = pd.DataFrame()

for team in player_stats_opp['์ƒ๋Œ€ํŒ€'].unique():
    print(team)
    cond = (player_stats_opp['์ƒ๋Œ€ํŒ€'] == team) & (player_stats_opp['ํƒ€์ˆ˜'] > 20)
    df = player_stats_opp[cond].sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(5)
    hitter_df = hitter_df.append(df)

hitter_df

ํŠน์ • ํŒ€ ์ƒ๋Œ€ ์ถœ๋ฃจ์œจ Top5 ์•ˆ์— ๋“ค์–ด ์žˆ๋Š” ํƒ€์ž ๋ฆฌ์ŠคํŠธ (์ค‘๋ณต ์ œ๊ฑฐ) 

hitter_df['์ด๋ฆ„'].unique()

 

 

 

- ๊ฒฐ๊ณผ๋ณด๊ธฐ : Heatmap (์‹œ๊ฐํ™”) 

cond = player_stats_opp['์ด๋ฆ„'].isin(hitter_df['์ด๋ฆ„'].unique())
top_df = player_stats_opp[cond]
top_pivot = top_df.pivot_table(index = ['ํŒ€','์ด๋ฆ„'], values = '์ถœ๋ฃจ์œจ', columns = '์ƒ๋Œ€ํŒ€', aggfunc = 'sum')
top_pivot

์œ„์—์„œ ๋งŒ๋“ค์–ด ๋†“์•˜๋˜ ํŠน์ • ํŒ€์„ ์ƒ๋Œ€๋กœ ์ถœ๋ฃจ์œจ top5 ์•ˆ์— ๋“ค์—ˆ๋˜ ์ด๋ฆ„๋“ค๋งŒ player_stats_opp์—์„œ ๋ฝ‘์•„์˜จ ํ›„ , ํ•ด๋‹น ์„ ์ˆ˜๋“ค์˜ ์ƒ๋Œ€ํŒ€ ๋ณ„ ์ถœ๋ฃจ์œจ pivot_table์„ ์ƒ์„ฑํ•œ๋‹ค.  

import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# ์ด๋ฏธ์ง€ ํ•œ๊ธ€ ํ‘œ์‹œ ์„ค์ •
if platform.system() == 'Windows':  # ์œˆ๋„์šฐ์ธ ๊ฒฝ์šฐ ๋ง‘์€๊ณ ๋”•
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # Mac ์ธ ๊ฒฝ์šฐ ์• ํ”Œ๊ณ ๋”•
    rc('font', family='AppleGothic')

#๊ทธ๋ž˜ํ”„์—์„œ ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ๊ฐ€ ํ‘œ์‹œ๋˜๋„๋ก ํ•˜๋Š” ์„ค์ •์ž…๋‹ˆ๋‹ค.
matplotlib.rcParams['axes.unicode_minus'] = False
fig, ax = plt.subplots( figsize=(15,15) )

sns.heatmap(data = top_pivot, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds',
            center= 0.4   # ์ปฌ๋Ÿฌ๋งต ์ค‘๊ฐ„๊ฐ’ ์ง€์ •
           )

์ƒ‰์ด ์ง„ํ• ์ˆ˜๋ก ์ถœ๋ฃจ์œจ์ด ๋†’์Œ์„ ์˜๋ฏธํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด , NC์˜ ์–‘์˜์ง€ ์„ ์ˆ˜๋Š” ํ•ด๋‹น ์‹œ์ฆŒ์—์„œ KIA๋ฅผ ์ƒ๋Œ€๋กœ ํ•œ ๊ฒฝ๊ธฐ์—์„œ ์ถœ๋ฃจ์œจ์ด ๋†’์•˜์œผ๋ฉฐ , ํ•œํ™”์˜ ์ •๊ทผ์šฐ ์„ ์ˆ˜๋Š” LG์™€์˜ ๊ฒฝ๊ธฐ์—์„œ ๊ฐ•ํ–ˆ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.  

sns.heatmap(data = top_pivot, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds',
            center= 0.6   # ์ปฌ๋Ÿฌ๋งต ์ค‘๊ฐ„๊ฐ’ ์ง€์ •
           )

์ƒ๋Œ€์ ์ธ ํฌ๊ธฐ๋ฅผ ์‚ดํŽด๋ณด๊ณ ์ž ํ•  ๋•Œ๋Š” center๋ฅผ ๋ณ€๊ฒฝํ•ด ๊ฐ€๋ฉฐ ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/54

 

[์‹œ๊ฐํ™” ๋ถ„์„ ํ”„๋กœ์ ํŠธ] 1. best baseball player ๋ถ„์„

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/53 https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot, jointplot, pairplot, boxplot, swarm..

silvercoding.tistory.com

 

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 
import pandas as pd
file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')
raw.head()

์ด์ „ ํฌ์ŠคํŒ…์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋Œ€๋กœ ๋ถˆ๋Ÿฌ์˜จ๋‹ค. 

 

 


 ์•ผ๊ตฌ์„ ์ˆ˜๊ฐ€ ๊ฐ•ํ•ด์ง€๋Š” ๊ณ„์ ˆ์ด ์žˆ์„๊นŒ ? 

idea : '์ผ์ž' ์ปฌ๋Ÿผ์—์„œ ์›” ์ •๋ณด๋งŒ ์ถ”์ถœํ•ด ๋‚ด์–ด ์›” ๋ณ„ ์ถœ๋ฃจ์œจ์„ ์‹œ๊ฐํ™” ํ•ด๋ณธ๋‹ค. 

 

1. ์›” ๋ณ„ ๊ธฐ๋ก ์ •๋ฆฌํ•˜๊ธฐ 

- '์ผ์ž' ์ปฌ๋Ÿผ์—์„œ ์›” ์ถ”์ถœํ•˜๊ธฐ 

month_list = []
for monthdate in raw['์ผ์ž']:
    month, date = monthdate.split('-')
    month_list.append(month)
raw['์›”'] = month_list
raw.head()

 

- ๋ถ„์„์— ํ•„์š”ํ•œ ์ปฌ๋Ÿผ ์„ ํƒ 

columns_select = ['ํŒ€', '์ด๋ฆ„', '์ƒ์ผ','์ผ์ž', '์ƒ๋Œ€','ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ', '๋ฃจํƒ€', 'ํƒ€์ ','๋ณผ๋„ท', '์‚ฌ๊ตฌ', 'ํฌ๋น„', '์›”']

data = raw[columns_select]
data.head()

 

- ์›”๋ณ„ ์‹ค์  ์ง‘๊ณ„ 

data_player_month = data.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ', '์›”'], 
                               values = ['ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ','๋ฃจํƒ€','ํƒ€์ ','๋ณผ๋„ท','์‚ฌ๊ตฌ','ํฌ๋น„'], 
                              aggfunc = 'sum', fill_value = 0
                                )
data_player_month

ํŒ€, ์ด๋ฆ„, ์ƒ์ผ๋กœ ์„ ์ˆ˜๋ฅผ ๊ตฌ๋ถ„ํ•˜๊ณ , ์›”๋ณ„ ์‹ค์ ์„ ์ง‘๊ณ„ํ•˜๋Š” pivot table์„ ์ƒ์„ฑํ•œ๋‹ค. ๋น„์–ด์žˆ๋Š” ๊ณณ์€ 0์œผ๋กœ ๊ฐ’์„ ์ฑ„์›Œ ์ค€๋‹ค. 

data_player_month = data_player_month.reset_index()
data_player_month

reset_index๋กœ ๋ฉ€ํ‹ฐ์ธ๋ฑ์Šค๋ฅผ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

 

 

- ํƒ€์œจ, ์ถœ๋ฃจ์œจ, ์žฅํƒ€์œจ, OPS ( ์ฃผ์š” ์‹ค์  ๊ณ„์‚ฐ ) ์ปฌ๋Ÿผ ์ถ”๊ฐ€ 

def cal_hit(df):
    '''
    - ํƒ€์œจ : ๊ณต์„ ์ณ์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ --> ์•ˆํƒ€ / ํƒ€์ˆ˜
    - ์ถœ๋ฃจ์œจ: ์ง„๋ฃจํ•ด์„œ ๋‚˜๊ฐ€๋Š” ๋น„์œจ -->  (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ)/(ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ+ํฌ์ƒํ”Œ๋ผ์ด)
    - ์žฅํƒ€์œจ : ํƒ€์œจ์— ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ -->   ๋ฃจํƒ€ / ํƒ€์ˆ˜
    '''
    
    df['ํƒ€์œจ'] = df['์•ˆํƒ€'] / df['ํƒ€์ˆ˜']
    df['์ถœ๋ฃจ์œจ'] = (df['์•ˆํƒ€'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ']) / (df['ํƒ€์ˆ˜'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ'] + df['ํฌ๋น„'])
    df['์žฅํƒ€์œจ'] = df['๋ฃจํƒ€'] / df['ํƒ€์ˆ˜']
    df['OPS'] = df['์ถœ๋ฃจ์œจ'] + df['์žฅํƒ€์œจ']
    return df
player_month_stat = cal_hit(data_player_month)
player_month_stat.head()

 

player_month_stat.info()

๊ฒฐ์ธก๊ฐ’์ด ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

player_month_stat = player_month_stat.dropna()
player_month_stat.head()

dropna()๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฒฐ์ธก๊ฐ’์ด ์žˆ๋Š” row๋ฅผ ์ œ๊ฑฐํ•ด ์ค€๋‹ค. 

 

player_month_stat.info()

๊ฒฐ์ธก๊ฐ’์ด ๋ชจ๋‘ ์ฑ„์›Œ์กŒ๋‹ค! 

 

- ์›”๋ณ„ ์ถœ๋ฃจ์œจ 

month_pivot = player_month_stat.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'],
                             columns = '์›”',
                             values = '์ถœ๋ฃจ์œจ')
month_pivot = month_pivot.reset_index()
month_pivot

์œ„์™€ ๊ฐ™์ด ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜์˜€๋‹ค. 

 

 

2. ๊ฒฐ๊ณผ : KBO ์ถœ๋ฃจ์œจ ์ตœ๊ณ ํƒ€์ž๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ํ™•์ธํ•ด ๋ณด๊ธฐ 

- KBO ์ถœ๋ฃจ์œจ ์ตœ๊ณ ํƒ€์ž ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

file = './data/player_stat.csv'
player_stat = pd.read_csv(file, encoding = 'cp949')
player_stat.head(20)

์ €๋ฒˆ ๊ธ€์˜ ์ฃผ์ œ์˜€๋˜ ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•ด ๋†“์€ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์ด ๋ฐ์ดํ„ฐ์™€ ์œ„์—์„œ ๋งŒ๋“ค์–ด ๋†“์€ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ๋ณ‘ํ•˜์—ฌ ์ถœ๋ฃจ์œจ ์ƒ์œ„ 50๋ช…์˜ ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ์ถœ๋ฃจ์œจ ๊ธฐ๋ก์„ ํ™•์ธํ•ด ๋ณด์ž. 

 

 

- ๋ฐ์ดํ„ฐ ํ•ฉ๋ณ‘ 

df = pd.merge(player_stat, month_pivot, how = 'left', on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'])
# left_on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'], right_on = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ']
df.head(10)

df_sort = df.sort_values(by = '์ถœ๋ฃจ์œจ', ascending = False).head(50)
df_sort

๋‹ค์‹œํ•œ๋ฒˆ ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ ํ•˜๊ณ  , 50๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ df_sort ์— ๋„ฃ์–ด ์ค€๋‹ค. 

df_selected = df_sort[['ํŒ€', '์ด๋ฆ„', '์ถœ๋ฃจ์œจ', '03', '04', '05', '06', '07', '08', '09', '10']]
df_selected

๊ทธ๋ฆฌ๊ณ  df_sort์—์„œ ์ถœ๋ฃจ์œจ๊ณผ ๊ด€๋ จ๋œ ์ปฌ๋Ÿผ๋งŒ ๋ฝ‘์•„์„œ df_selectied ๋ณ€์ˆ˜๋ฅผ ์„ ์–ธํ•œ๋‹ค. 

์ˆซ์ž๋กœ ๋‚˜์™€์žˆ์œผ๋‹ˆ 50๋ช…์˜ ์„ ์ˆ˜๋“ค์˜ ์›” ๋ณ„ ๊ธฐ๋ก์ด ์–ด๋–ค์ง€ ๊ฐ€๋Š ์ด ์•ˆ ๊ฐ„๋‹ค. ๋”ฐ๋ผ์„œ ์‹œ๊ฐํ™”๋ฅผ ํ†ตํ•˜์—ฌ ํ™•์ธํ•˜๋„๋ก ํ•œ๋‹ค. 

 

 

 

- ์‹œ๊ฐํ™”๋กœ ํ•œ๋ˆˆ์— ๋ณด๊ธฐ 

df_selected = df_selected.set_index(['ํŒ€','์ด๋ฆ„'])
df_selected

ํžˆํŠธ๋งต์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ฐํ™”๋ฅผ ํ•  ๊ฒƒ์ด๋‹ค. ์ด ๋•Œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ values ๋ถ€๋ถ„์€ ๋ชจ๋‘ ์ˆซ์žํ˜• ๋ฐ์ดํ„ฐ๋กœ๋งŒ ๋˜์–ด ์žˆ์–ด์•ผ ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ํŒ€, ์ด๋ฆ„ ์ปฌ๋Ÿผ์€ index๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# ์ด๋ฏธ์ง€ ํ•œ๊ธ€ ํ‘œ์‹œ ์„ค์ •
if platform.system() == 'Windows':  # ์œˆ๋„์šฐ์ธ ๊ฒฝ์šฐ ๋ง‘์€๊ณ ๋”•
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # Mac ์ธ ๊ฒฝ์šฐ ์• ํ”Œ๊ณ ๋”•
    rc('font', family='AppleGothic')

#๊ทธ๋ž˜ํ”„์—์„œ ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ๊ฐ€ ํ‘œ์‹œ๋˜๋„๋ก ํ•˜๋Š” ์„ค์ •์ž…๋‹ˆ๋‹ค.
matplotlib.rcParams['axes.unicode_minus'] = False
sns.heatmap(df_selected)

์กฐ๊ธˆ ๋” ์„ค์ •์„ ์ฃผ์–ด ๋ณด๊ธฐ ์‰ฝ๊ฒŒ ๋งŒ๋“ค์–ด ๋ณด์ž. 

fig, ax = plt.subplots( figsize=(15,15) )
sns.heatmap(data = df_selected, 
            annot = True, fmt = '.3f', 
            cmap = 'Reds'
           )

์ง„ํ•œ ๋นจ๊ฐ„์ƒ‰์ผ ์ˆ˜๋ก ๋ณด๋‹ค ๋†’์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ€์ง„๋‹ค. ์•„์ง์€ ์›” ๋ณ„๋กœ ๋” ์ž˜ํ•œ ๊ฑด์ง€ ๋ชปํ•œ ๊ฑด์ง€ ํ•œ ๋ˆˆ์— ๋ณด๊ธฐ๋Š” ์–ด๋ ต๋‹ค. ๋”ฐ๋ผ์„œ ์‹œ์ฆŒ ์ „์ฒด ์ถœ๋ฃจ์œจ๊ณผ์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ  ํ™•์ธํ•ด ๋ณด์ž. 

for col in df_selected.columns[1:]:
    df_selected[col] = df_selected[col] - df_selected['์ถœ๋ฃจ์œจ'] 
df_selected['์ถœ๋ฃจ์œจ'] = 0.0

์‹œ์ฆŒ ์ถœ๋ฃจ์œจ๊ณผ ์›”๋ณ„ ์ถœ๋ฃจ์œจ์˜ ์ฐจ์ด๋ฅผ ๋ชจ๋‘ ๊ตฌํ•œ ํ›„, ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ์˜ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋‘ 0์œผ๋กœ ๋ฐ”๊พธ์–ด ์ค€๋‹ค. 

fig, ax = plt.subplots( figsize=(10,10) )

sns.heatmap(data = df_selected.head(50), 
            annot = True, fmt = '.3f', 
            cmap = 'RdBu_r'
           )

๋นจ๊ฐ„์ƒ‰์ด ์ง™์œผ๋ฉด ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ ๋ณด๋‹ค ๋†’์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ–๊ณ  ์žˆ์œผ๋ฉฐ, ํŒŒ๋ž€์ƒ‰์ด ์ง™๋‹ค๋ฉด , ์‹œ์ฆŒ ์ถœ๋ฃจ์œจ ๋ณด๋‹ค ๋” ๋‚ฎ์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ–๊ณ  ์žˆ์Œ์„ ์˜๋ฏธ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ํ•œํ™” ์ตœ์žฌํ›ˆ ์„ ์ˆ˜๋Š” 3์›” ์ถœ๋ฃจ์œจ์ด ์šฐ์„ธํ–ˆ๊ณ , KIA ์œ ๋ฏผ์ƒ ์„ ์ˆ˜๋Š” 6์›”์— ๋น„ํ•ด 7์›”์— ์—„์ฒญ๋‚œ ์ถœ๋ฃจ์œจ ์ƒ์Šน์„ ๋ณด์ธ๋‹ค. ๊ณ„์ ˆ์— ๋”ฐ๋ฅธ ์˜ํ–ฅ์ด ์žˆ๋Š”์ง€ ๊ถ๊ธˆํ–ˆ๋Š”๋ฐ , (๋”์šด ์—ฌ๋ฆ„์ฒ ์—๋Š” ๊ธฐ๋ก์ด ์ค„์–ด๋“œ๋Š” ๋“ฑ ) ์„ ์ˆ˜ ๋งˆ๋‹ค ๋ชจ๋‘ ๋‹ค๋ฅด๊ณ , ํ™•์‹คํžˆ ํŠน์ • ๋‹ฌ์— ์ถœ๋ฃจ์œจ์ด ๋†’์•„์ง€๋Š” ์„ ์ˆ˜๋“ค์ด ์žˆ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ๋‹ค. 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/53

 

[python ์‹œ๊ฐํ™”] 2. ์„œ์šธ์‹œ ๋Œ€ํ”ผ์†Œ ํ˜„ํ™ฉ ์ง€๋„ ๋งŒ๋“ค๊ธฐ , ์ง€๋„ ์‹œ๊ฐํ™” ( folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ )

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/52 [python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot, jointplot, pairplot, boxplot, swarmplot, heatmap) ๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ..

silvercoding.tistory.com

 

 


 ์‚ฌ์ „ ์ง€์‹ 
  • ํƒ€์œจ : ํƒ€๊ฒฉ์— ์„ฑ๊ณตํ•ด ์‚ด์•„๋‚˜๊ฐ€๋Š” ์ •๋„ 

= ํƒ€๊ฒฉ ์„ฑ๊ณต ํšŸ์ˆ˜ / ํƒ€๊ฒฉ ๊ธฐํšŒ ์ˆ˜ 

= ์•ˆํƒ€ ์ˆ˜ / ํƒ€์ˆ˜ 

  • ์ถœ๋ฃจ์œจ : ์‚ด์•„์„œ ๋‚˜๊ฐ€๋Š” ์ •๋„ 

= ์ง„๋ฃจ ์„ฑ๊ณต ํšŸ์ˆ˜ / ์ง„๋ฃจ ๊ธฐํšŒ ์ˆ˜ 

= (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์— ๋งž๋Š” ๋ณผ) / (ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์— ๋งž๋Š” ๋ณผ+ํฌ์ƒ ํ”Œ๋ผ์ด) 

  • ์žฅํƒ€์œจ : ํƒ€๊ฒฉ์— ์„ฑ๊ณตํ•ด ๋ฉ€๋ฆฌ ์‚ด์•„๋‚˜๊ฐ€๋Š” ์ •๋„ 

= ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ์ˆ˜ / ํƒ€๊ฒฉ ๊ธฐํšŒ ์ˆ˜ 

= ๋ฃจํƒ€ ์ˆ˜ / ํƒ€์ˆ˜ 

(ํƒ€์œจ์— ๊ฑฐ๋ฆฌ ๊ฐœ๋… ์ถ”๊ฐ€ : 2๋ฃจํƒ€ = 1๋ฃจํƒ€ x 2) 

  • OPS  : ์‚ด์•„์„œ ๋ฉ€๋ฆฌ ๋‚˜๊ฐ€๋Š” ์ •๋„   

= ์ถœ๋ฃจ์œจ + ์žฅํƒ€์œจ 

 

 

 

*** ๋ณธ ๊ธฐ์ดˆ ์‹œ๊ฐํ™” ํ”„๋กœ์ ํŠธ์—์„œ๋Š” ์ถœ๋ฃจ์œจ - ์žฅํƒ€์œจ - ops - ํƒ€์œจ ์„ ๊ธฐ์ค€์œผ๋กœ best player ๋ฅผ ๋ถ„์„ํ•œ๋‹ค. ***

 

 

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
file  = './data/KBO_2019_player_gamestats.csv'
raw = pd.read_csv(file, encoding = 'cp949')

๋ฐ์ดํ„ฐ : KBO 2019 ์‹œ์ฆŒ ํƒ€์ž ๊ธฐ๋ก์ง€ ๋ฐ์ดํ„ฐ (๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ œ๊ณต) 

raw.head()

ํƒ€์ž & ๊ฒŒ์ž„ ๋ณ„๋กœ ๊ธฐ๋ก๋˜์–ด ์žˆ๋‹ค. 

raw.info()

raw.columns

์‚ฌ์šฉํ•  ์ปฌ๋Ÿผ์„ ๋ฝ‘๊ธฐ ์œ„ํ•ด ์ „์ฒด ์ปฌ๋Ÿผ์„ ์‚ดํŽด ๋ณธ๋‹ค. 

columns_select = ['ํŒ€', '์ด๋ฆ„', '์ƒ์ผ','์ผ์ž', '์ƒ๋Œ€','ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ', '๋ฃจํƒ€', 'ํƒ€์ ','๋ณผ๋„ท', '์‚ฌ๊ตฌ', 'ํฌ๋น„']
data = raw[columns_select]
data.head()

์›ํ•˜๋Š” ์ปฌ๋Ÿผ๋งŒ ๋ฝ‘์•„์„œ DataFrame์„ ์ƒ์„ฑํ•˜๊ณ , data์— ๋„ฃ์–ด์ฃผ์—ˆ๋‹ค. 

 

 

 

 

 


 KBO best player ๋ถ„์„ํ•˜๊ธฐ 

- ์„ ์ˆ˜๋ณ„ ๊ธฐ๋ก ์ง‘๊ณ„ 

data_player = data.pivot_table(index = ['ํŒ€','์ด๋ฆ„','์ƒ์ผ'], 
                               values = ['ํƒ€์ˆ˜','์•ˆํƒ€','ํ™ˆ๋Ÿฐ','๋ฃจํƒ€','ํƒ€์ ','๋ณผ๋„ท','์‚ฌ๊ตฌ','ํฌ๋น„'], 
                              aggfunc = 'sum')

data_player

pivot_table ์„ ์‚ฌ์šฉํ•˜์—ฌ ํŒ€, ์ด๋ฆ„ , ์ƒ์ผ์„ ๊ธฐ์ค€์œผ๋กœ ํƒ€์ˆ˜, ์•ˆํƒ€, ํ™ˆ๋Ÿฐ, ๋ฃจํƒ€, ํƒ€์ , ๋ณผ๋„ท, ์‚ฌ๊ตฌ, ํฌ๋น„์˜ ์ด ํ•ฉ๊ณ„๋ฅผ ์ง‘๊ณ„ํ•œ๋‹ค. 

data_player['ํƒ€์ˆ˜'].hist()

ํŒ๋‹ค์Šค์—์„œ ๊ธฐ๋ณธ์œผ๋กœ ๋‚ด์žฅ๋˜์–ด ์žˆ๋Š” ์‹œ๋ฆฌ์ฆˆ.hist()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํƒ€์ˆ˜์˜ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ . ( ํƒ€์ˆ˜๊ฐ€ ์ ์€ ์„ ์ˆ˜๋ฅผ ์ œ์™ธํ•˜๊ธฐ ์œ„ํ•ด ์–ด๋Š ์ •๋„๊ฐ€ ์ ์€์ง€ ํŒ๋‹จํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ ) ๋ถ„ํฌ๋ฅผ ์‚ดํŽด๋ณด๋‹ˆ, ํƒ€์ˆ˜๊ฐ€ 50 ์ดํ•˜์ธ ์„ ์ˆ˜๋ฅผ ์ œ์™ธํ•˜๋ฉด ์ ์ ˆํ•  ๊ฒƒ ๊ฐ™๋‹ค. 

cond = data_player['ํƒ€์ˆ˜'] > 50
data_player = data_player[cond].reset_index()    # ๋‹ค์ค‘ ์ธ๋ฑ์Šค --> ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•˜๊ธฐ
data_player

ํƒ€์ˆ˜๊ฐ€ 50 ์ดˆ๊ณผ์ธ ์„ ์ˆ˜๋“ค๋งŒ ์ถ”๋ฆฌ๊ณ , ์ˆ˜์›”ํ•œ ์ปจํŠธ๋กค์„ ์œ„ํ•ด reset_index()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์ค‘ ์ธ๋ฑ์Šค๋ฅผ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•ด์ค€๋‹ค. 

def cal_hit(df):
    '''
    - ํƒ€์œจ : ํƒ€๊ฒฉ์— ์„ฑ๊ณตํ•ด์„œ ์ง„๋ฃจํ•˜๋Š” ๋น„์œจ --> ์•ˆํƒ€ / ํƒ€์ˆ˜
    - ์ถœ๋ฃจ์œจ: ์‚ด์•„์„œ ์ง„๋ฃจํ•˜๋Š” ๋น„์œจ -->  (์•ˆํƒ€+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ)/(ํƒ€์ˆ˜+๋ณผ๋„ท+๋ชธ์—๋งž๋Š”๋ณผ+ํฌ์ƒํ”Œ๋ผ์ด)
    - ์žฅํƒ€์œจ : ํƒ€์œจ์— ์ง„๋ฃจํ•œ ๋ฒ ์ด์Šค ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ -->   ๋ฃจํƒ€ / ํƒ€์ˆ˜
    - OPS : ์ถœ๋ฃจ์œจ + ์žฅํƒ€์œจ 
    '''
    
    df['ํƒ€์œจ'] = df['์•ˆํƒ€'] / df['ํƒ€์ˆ˜']
    df['์ถœ๋ฃจ์œจ'] = (df['์•ˆํƒ€'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ']) / (df['ํƒ€์ˆ˜'] + df['๋ณผ๋„ท'] + df['์‚ฌ๊ตฌ'] + df['ํฌ๋น„'])
    df['์žฅํƒ€์œจ'] = df['๋ฃจํƒ€'] / df['ํƒ€์ˆ˜']
    df['OPS'] = df['์ถœ๋ฃจ์œจ'] + df['์žฅํƒ€์œจ']
    return df

๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ ๋„ฃ์–ด ์ฃผ๋ฉด best player ๋ฅผ ์„ ์ •ํ•˜๋Š” ๊ธฐ์ค€์ธ ํƒ€์œจ, ์ถœ๋ฃจ์œจ, ์žฅํƒ€์œจ, OPS ์ปฌ๋Ÿผ์„ ๊ณ„์‚ฐํ•˜์—ฌ ์ƒ์„ฑํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. 

player_stat = cal_hit(data_player)
player_stat

 

- ์ถœ๋ฃจ์œจ -> ์žฅํƒ€์œจ -> OPS -> ํƒ€์œจ ์ˆœ์„œ๋Œ€๋กœ ์ •๋ ฌํ•˜์—ฌ best 10 player ๋ฝ‘์•„๋ณด๊ธฐ 

player_stat = player_stat.sort_values(by = ['์ถœ๋ฃจ์œจ','์žฅํƒ€์œจ','OPS', 'ํƒ€์œจ'], ascending = False)
player_stat = player_stat.reset_index(drop = True)
player_stat.head(10)

์ธ๋ฑ์Šค๊ฐ€ ๋’ค์ฃฝ๋ฐ•์ฃฝ์œผ๋กœ ์ •๋ ฌ๋˜๋ฏ€๋กœ, reset_index()๋ฅผ ์ด์šฉํ•˜์—ฌ ์ธ๋ฑ์Šค๋ฅผ ์ •๋ ฌํ•ด ์ค€๋‹ค. 

 

๊ฒฐ๋ก ์ ์œผ๋กœ ์ถœ๋ฃจ์œจ์„ ๊ธฐ์ค€์œผ๋กœ ํ•œ KBO 2019 ์‹œ์ฆŒ best player๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

player_stat['์ด๋ฆ„'][:10]

 

 

 

 


 ํŒ€๋ณ„ ์„ ์ˆ˜๋“ค์˜ ์ถœ๋ฃจ์œจ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ 
import matplotlib
from matplotlib import font_manager, rc
import platform
import matplotlib.pyplot as plt
import seaborn as sns

# ์ด๋ฏธ์ง€ ํ•œ๊ธ€ ํ‘œ์‹œ ์„ค์ •
if platform.system() == 'Windows':  # ์œˆ๋„์šฐ์ธ ๊ฒฝ์šฐ ๋ง‘์€๊ณ ๋”•
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:    # Mac ์ธ ๊ฒฝ์šฐ ์• ํ”Œ๊ณ ๋”•
    rc('font', family='AppleGothic')

#๊ทธ๋ž˜ํ”„์—์„œ ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ๊ฐ€ ํ‘œ์‹œ๋˜๋„๋ก ํ•˜๋Š” ์„ค์ •์ž…๋‹ˆ๋‹ค.
matplotlib.rcParams['axes.unicode_minus'] = False

์šฐ์„  ๊ทธ๋ž˜ํ”„๋ฅผ ๊ทธ๋ ธ์„ ๋•Œ ํ•œ๊ธ€์ด ๊นจ์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ํฐํŠธ์„ค์ •์„ ํ•˜๊ณ  , ๋งˆ์ด๋„ˆ์Šค ๊ธฐํ˜ธ ํ‘œ์‹œ ์—ฌ๋ถ€๋ฅผ ์„ค์ •ํ•œ๋‹ค. 

 

 

- ํŒ€ ๋ณ„ ์ถœ๋ฃจ์œจ ๋ถ„ํฌ ( boxplot & swarmplot ) 

sns.boxplot(data = player_stat, x = 'ํŒ€', y = '์ถœ๋ฃจ์œจ')

๊ฐœ์ˆ˜๋„ ์ง๊ด€์ ์œผ๋กœ ๋ณด๊ธฐ ์œ„ํ•ด swarmplot์„ ์ถ”๊ฐ€ ํ•œ๋‹ค. 

sns.swarmplot(data = player_stat, x = 'ํŒ€', y = '์ถœ๋ฃจ์œจ')
sns.boxplot(data = player_stat, x = 'ํŒ€', y = '์ถœ๋ฃจ์œจ')

์ƒ‰๊น”์ด ๊ฒน์น˜๊ธฐ ๋•Œ๋ฌธ์— boxplot์— ์—ฌ๋Ÿฌ ์˜ต์…˜์„ ์ฃผ์–ด ๋ณด๊ธฐ ํŽธํ•˜๊ฒŒ ๋งŒ๋“ค์–ด ์ค€๋‹ค. 

sns.swarmplot(data = player_stat, x = 'ํŒ€', y = '์ถœ๋ฃจ์œจ')
sns.boxplot(data = player_stat, x = 'ํŒ€', y = '์ถœ๋ฃจ์œจ',
            showcaps=False,             # ๋ฐ•์Šค ์ƒ๋‹จ ๊ฐ€๋กœ๋ผ์ธ ๋ณด์ด์ง€ ์•Š๊ธฐ
            whiskerprops={'linewidth':0}, # ๋ฐ•์Šค ์ƒ๋‹จ ์„ธ๋กœ ๋ผ์ธ ๋ณด์ด์ง€ ์•Š๊ธฐ 
            showfliers=False,           # ๋ฐ•์Šค ๋ฒ”์œ„ ๋ฒ—์–ด๋‚œ ์•„์›ƒ๋ผ์ด์–ด ํ‘œ์‹œํ•˜์ง€ ์•Š๊ธฐ
            boxprops={'facecolor':'None'}, # ๋ฐ•์Šค ์ƒ‰์ƒ ์ง€์šฐ๊ธฐ
           )

์ด๋ ‡๊ฒŒ ํŒ€๋ณ„ ์„ ์ˆ˜๋“ค์˜ ์ถœ๋ฃจ์œจ ๋ถ„ํฌ๋ฅผ ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. NC ํŒ€์ด ๊ฐ€์žฅ ๋†’์€ ์ถœ๋ฃจ์œจ์„ ๊ฐ€์ง„ ์„ ์ˆ˜๋ฅผ ๋ณด์œ ํ•˜๊ณ  ์žˆ๊ณ , ์ถœ๋ฃจ์œจ์ด ๊ฐ€์žฅ ๋งŽ์ด ๋ชฐ๋ ค ์žˆ๋Š” ํŒ€์€ LGํŒ€์ธ ๊ฒƒ์œผ๋กœ ๋ณด์—ฌ์ง€๋ฉฐ, ์ถœ๋ฃจ์œจ์˜ ์ค‘์œ„์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋†’์€ ํŒ€์€ ๋‘์‚ฐ์ธ ๊ฒƒ ๋“ฑ์„ ํ•œ ๋ˆˆ์— ์•Œ ์ˆ˜ ์žˆ๋‹ค.    

 

 

 

 


 ๋งˆ๋ฌด๋ฆฌ : csv ํŒŒ์ผ๋กœ ์ €์žฅํ•˜๊ธฐ 
file = './data/player_stat.csv'
player_stat.to_csv(file, encoding = 'cp949', index = False)

๋ณธ ํฌ์ŠคํŒ…์—์„œ ๋ถ„์„์— ์‚ฌ์šฉํ•œ player_stat์„ csv ํŒŒ์ผ๋กœ ์ €์žฅํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. 


 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/52

 

[python ์‹œ๊ฐํ™”] 1. seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (distplot, relplot, jointplot, pairplot, boxplot, swarmplot, heatmap)

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ ์ˆ˜์น˜ํ˜• x ์ˆ˜์น˜ํ˜• : scatterplot, lmplot, jointplot ์ˆ˜์น˜ํ˜• x ์นดํ…Œ๊ณ ๋ฆฌํ˜• : boxplot, violinplot, barplot, heatmap ์ˆ˜์น˜ํ˜• x ์œ„์น˜ํ˜• : folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ™œ์šฉ ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” seaborn ๋ผ์ด..

silvercoding.tistory.com

 

 

 

 

1. folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉํ•ด๋ณด๊ธฐ 


 folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ 
!pip install folium

folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜ํ•˜๊ธฐ 

import folium

folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ import ํ•ด์ค€๋‹ค. 


 folium ์ง€๋„ ์‹œ๊ฐํ™” ์‚ฌ์šฉ ๋ฒ• 

์ง€๋„ ์‹œ๊ฐํ™”

  • ์ง€๋„์ƒ์„ฑํ•˜๊ธฐ
    • m = folium.Map(location = [์œ„๋„, ๊ฒฝ๋„], zoom_start = ํ™•๋Œ€์ •๋„)
  • ์ •๋ณด ์ถ”๊ฐ€ํ•˜๊ธฐ
    • ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
      • folium.Marker([์œ„๋„, ๊ฒฝ๋„]).add_to(m)
    • ์› ์ถ”๊ฐ€ํ•˜๊ธฐ
      • folium.CircleMarker([์œ„๋„, ๊ฒฝ๋„], radius= ์›ํฌ๊ธฐ).add_to(m)
    • ์ถ”๊ฐ€์˜ต์…˜:
      • tooltip="๋งˆ์šฐ์Šค ์˜ฌ๋ฆฌ๋ฉด ๋ณด์—ฌ์งˆ ์ •๋ณด"
      • popup="ํด๋ฆญํ•˜๋ฉด ๋ณด์—ฌ์งˆ ์ •๋ณด"
    • ๊ธฐํƒ€) ClickForMarker('์ฒดํฌ').add_to(m) ์ง€๋„์—์„œ ํด๋ฆญํ•  ๊ฒฝ์šฐ ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ

 ์„œ์šธ์—ญ ์ง€๋„ ์‹œ๊ฐํ™”
m = folium.Map(location=[37.5536067,126.9674308],  
               zoom_start=12)
m

 

- ๋งˆ์ปค ์ถ”๊ฐ€ 

folium.Marker([37.5536067,126.9674308],    # ์„œ์šธ์—ญ์œ„์น˜
              tooltip="์„œ์šธ์—ญ(๋งˆ์šฐ์Šค์˜ฌ๋ฆฌ๋ฉด๋ณด์—ฌ์ง)",
              popup="์„œ์šธ์—ญ(ํด๋ฆญํ•˜๋ฉด ๋ณด์—ฌ์ง)",
              ).add_to(m)
m

tooltip๊ณผ popup์„ ์‚ฌ์šฉํ•˜์—ฌ ๋งˆ์šฐ์Šค๋ฅผ ์˜ฌ๋ฆฌ๊ฑฐ๋‚˜ , ํด๋ฆญํ•˜๋ฉด ๋ฉ”์‹œ์ง€๋ฅผ ๋„์šธ ์ˆ˜ ์žˆ๋‹ค. 

 

- ์„œํด๋งˆ์ปค ์ถ”๊ฐ€ 

folium.CircleMarker([37.5536067,126.9674308],
                    radius=20,
                    tooltip = '๋งˆ์šฐ์Šค์˜ฌ๋ฆด๊ฒฝ์šฐ'
               ).add_to(m)
m

 

- ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ MiniMap 

from folium.plugins import MiniMap

# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location=[37.5536067,126.9674308],   # ๊ธฐ์ค€์ขŒํ‘œ: ์„œ์šธ์—ญ
               zoom_start=12)

# ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ํ•˜๊ธฐ
minimap = MiniMap() 
minimap.add_to(m)


# ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
folium.Marker([37.5536067,126.9674308],    # ์„œ์šธ์—ญ์œ„์น˜
              tooltip="์„œ์šธ์—ญ(๋งˆ์šฐ์Šค์˜ฌ๋ฆฌ๋ฉด๋ณด์—ฌ์ง)",
              popup="์„œ์šธ์—ญ(ํด๋ฆญํ•˜๋ฉด ๋ณด์—ฌ์ง)",
              ).add_to(m)
m

๋ฏธ๋‹ˆ๋งต์„ ์ƒ์„ฑํ•˜๊ณ , add_to๋กœ ์ถ”๊ฐ€ํ•ด ์ค€๋‹ค. 

 

 

 - ํด๋ฆญํ•ด์„œ ๋งˆํฌ ์ถ”๊ฐ€ํ•˜๊ธฐ 

# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location=[37.5536067,126.9674308],   # ๊ธฐ์ค€์ขŒํ‘œ: ์„œ์šธ์—ญ
               zoom_start=12)

# ์จํด๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
folium.CircleMarker([37.5536067,126.9674308],
                    radius=20,
                    tooltip = '๋งˆ์šฐ์Šค์˜ฌ๋ฆด๊ฒฝ์šฐ'
               ).add_to(m)
folium.ClickForMarker('์ฒดํฌ์ถ”๊ฐ€').add_to(m)    
m

๋งˆ์šฐ์Šค๋กœ ํด๋ฆญํ•ด์„œ ์ง์ ‘ ๋งˆ์ปค๋ฅผ ์ถ”๊ฐ€ํ•ด์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. (ClickForMarker ์‚ฌ์šฉ) 


 

 

 

 

2. ์‘์šฉ : ์„œ์šธ ๋Œ€ํ”ผ์†Œ ํ˜„ํ™ฉ ์ง€๋„ ๋งŒ๋“ค๊ธฐ 


< ๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ > 

http://data.seoul.go.kr/dataList/OA-2189/S/1/datasetView.do

 

์„œ์šธ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ

๋ชจ๋“  ์„œ์šธ์‹œ๋ฏผ์„ ์œ„ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ ์—ด๋ฆฐ๋ฐ์ดํ„ฐ๊ด‘์žฅ์—์„œ ์„œ์šธ์‹œ์™€ ์—ฐ๊ณ„ ๊ธฐ๊ด€์ด ๊ณต๊ฐœํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„œ์šธ์‹œ์™€ ๊ด€๋ จ๋œ ๋‹ค์–‘ํ•œ ๊ณต๊ณต๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•ด ๋ณด์„ธ์š”.

data.seoul.go.kr


import pandas as pd

DataFrame์„ ์‚ฌ์šฉํ•  ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— pandas๋ฅผ import ํ•ด์ค€๋‹ค. 

file = './data/์„œ์šธ์‹œ ๋Œ€ํ”ผ์†Œ ๋ฐฉ์žฌ์‹œ์„ค ํ˜„ํ™ฉ (์ขŒํ‘œ๊ณ„_ WGS1984).csv'
raw = pd.read_csv(file, encoding = 'cp949')   # encoding = 'cp949' : MS ํ”„๋กœ๊ทธ๋žจ ์‚ฌ์šฉ์‹œ, ๊ทธ์™ธ์˜ ๊ฒฝ์šฐ encoding = 'utf-8'  (๊ธฐ๋ณธ๊ฐ’)
raw.head()

raw.info()

 

์ˆ˜์—…์—์„œ๋Š” ์œ„๋„, ๊ฒฝ๋„๋ฅผ ์„ค์ •ํ•˜์—ฌ ๋งˆ์ปค๋ฅผ ์ถ”๊ฐ€ํ•˜๊ณ  , ๋งˆ์šฐ์Šค๋ฅผ ๋งˆ์ปค ์œ„์— ์˜ฌ๋ฆฌ๋ฉด ๋Œ€ํ”ผ์†Œ ๋ช…์นญ์„ ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ ๊นŒ์ง€ ๋ฐฐ์› ๊ณ  , ๊ฑฐ๊ธฐ์— ๋”ํ•ด์„œ ํด๋ฆญํ•˜๋ฉด ์ตœ๋Œ€ ์ˆ˜์šฉ์ธ์›์ด ๋‚˜์˜ค๋„๋ก ํ•˜๋Š” ์ง€๋„๋ฅผ ๋งŒ๋“ค์–ด ๋ณธ๋‹ค. 

i = 7

lat = raw.loc[i, '์œ„๋„']
long = raw.loc[i, '๊ฒฝ๋„']
name = raw.loc[i,'๋Œ€ํ”ผ์†Œ๋ช…์นญ']
num = raw.loc[i, '์ตœ๋Œ€์ˆ˜์šฉ์ธ์›']

print(lat, long, name, num)

37.5365343 126.9650202 ๋‚จ์ •์ดˆ๋“ฑํ•™๊ต 300

์šฐ์„  ์ธ๋ฑ์Šค๊ฐ€ 7์ผ ๋•Œ์˜ ํ•„์š”ํ•œ ์ •๋ณด๋ฅผ ์–ป์–ด๋ณด๊ธฐ 

for i in range(len(raw)):
    lat = raw.loc[i, '์œ„๋„']
    long = raw.loc[i, '๊ฒฝ๋„']
    name = raw.loc[i,'๋Œ€ํ”ผ์†Œ๋ช…์นญ']
    num = raw.loc[i, '์ตœ๋Œ€์ˆ˜์šฉ์ธ์›']


    print(name, lat, long, num)

์ด 694๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๋‚˜์™”์„ ๊ฒƒ์ด๋‹ค. 

# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)
m
# ๋Œ€ํ”ผ์†Œ ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ

for i in range(len(raw)):
    lat = raw.loc[i, '์œ„๋„']
    long = raw.loc[i, '๊ฒฝ๋„']
    name = raw.loc[i,'๋Œ€ํ”ผ์†Œ๋ช…์นญ']
    num = raw.loc[i, '์ตœ๋Œ€์ˆ˜์šฉ์ธ์›']


    folium.Marker([lat, long],tooltip= name, popup='%d๋ช…'%num).add_to(m)    
m

for๋ฌธ์„ ์ด์šฉํ•˜์—ฌ ๋Œ€ํ”ผ์†Œ ๋ชจ๋“  ๊ณณ์— ๋งˆ์ปค๋ฅผ ์ฐ์–ด ์ค€๋‹ค. 

 


* ๋ฌธ์ œ์  : ๋งˆ์ปค๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์•„์„œ ์ง€๋„๋ฅผ ์‚ดํŽด๋ณด๊ธฐ์— ์–ด๋ ค์›€์ด ์ƒ๊ธด๋‹ค 

* ํ•ด๊ฒฐ : MarkerCluster ๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ทผ์ฒ˜์— ์žˆ๋Š” ๋งˆ์ปค๋“ค ๋ผ๋ฆฌ ๊ทธ๋ฃน์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค. 


 MarkerCluster ์‚ฌ์šฉํ•˜์—ฌ ๊ทธ๋ฃน ์ƒ์„ฑํ•˜๊ธฐ 
# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)  # ํด๋Ÿฌ์Šคํ„ฐ ์ถ”๊ฐ€ํ•˜๊ธฐ

# ๋Œ€ํ”ผ์†Œ ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ

for i in range(len(raw)):
    lat = raw.loc[i, '์œ„๋„']
    long = raw.loc[i, '๊ฒฝ๋„']
    name = raw.loc[i,'๋Œ€ํ”ผ์†Œ๋ช…์นญ']
    num = raw.loc[i, '์ตœ๋Œ€์ˆ˜์šฉ์ธ์›']


    folium.Marker([lat, long],tooltip= name, popup='%d๋ช…'%num).add_to(marker_cluster)    
m

๊ทธ๋ฃน์„ ํด๋ฆญํ•˜์—ฌ ์ž์„ธํžˆ ํ™•๋Œ€๊ฐ€ ๋˜๋ฉด, ๋งˆ์ปค๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๋ณด๋‹ค ์ง€๋„๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ™•์ธํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋˜์—ˆ๋‹ค. 

 

 

- ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ 

from folium.plugins import MiniMap

# ์ง€๋„ ์ƒ์„ฑํ•˜๊ธฐ
m = folium.Map(location=[37.5536067,126.9674308],
               zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)  # ํด๋Ÿฌ์Šคํ„ฐ ์ถ”๊ฐ€ํ•˜๊ธฐ


# ๋ฏธ๋‹ˆ๋งต ์ถ”๊ฐ€ 
minimap = MiniMap()
m.add_child(minimap) 


# ๋Œ€ํ”ผ์†Œ ๋งˆ์ปค ์ถ”๊ฐ€ํ•˜๊ธฐ
for i in range(len(raw)):
    lat = raw.loc[i, '์œ„๋„']
    long = raw.loc[i, '๊ฒฝ๋„']
    name = raw.loc[i,'๋Œ€ํ”ผ์†Œ๋ช…์นญ']
    num = raw.loc[i, '์ตœ๋Œ€์ˆ˜์šฉ์ธ์›']


    folium.Marker([lat, long],tooltip= name, popup='%d๋ช…'%num).add_to(marker_cluster)    
m


 ๋งˆ๋ฌด๋ฆฌ : ์ง€๋„ ์ €์žฅ (html) 
m.save('./data/Sheltermap.html')

html๋กœ ์ €์žฅํ•˜์—ฌ ํฐ ํ™”๋ฉด์œผ๋กœ ์–ธ์ œ๋“  ์ง€๋„๋ฅผ ๊บผ๋‚ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 


 

+ Recent posts