๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/48

 

[python pandas] pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (1)

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ * ํŒ๋‹ค์Šค ๊ธฐ๋ณธ ํ•จ์ˆ˜ ๋ฐ์ดํ„ฐ ํŒŒ์ผ ์ฝ๊ธฐ : read_excel(), read_csv() ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ : df.loc(), df.iloc() ์ธ๋ฑ์Šค/ ์ปฌ๋Ÿผ ๋ณ€๊ฒฝํ•˜๊ธฐ : columns/ index , reset_index()  pandas vs excel panda..

silvercoding.tistory.com

 

 


 

 1. pandas ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 
import pandas as pd

 

 2. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
fpath = './data/exam.xlsx' 
data = pd.read_excel(fpath, index_col = '๋ฒˆํ˜ธ')

index_col='๋ฒˆํ˜ธ' ๋กœ ์ง€์ •ํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

 

 

* head(), info(), describe() ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๋Š” ์Šต๊ด€ ๊ฐ–๊ธฐ 

data.head()

data.info()

 

data.describe()


 

 

 3. ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€ํ•˜๊ธฐ 

df[ '์ปฌ๋Ÿผ๋ช…' ] =  data ( df.์ปฌ๋Ÿผ๋ช… = data ํ˜•ํƒœ๋Š” ์‚ฌ์šฉ ๋ถˆ๊ฐ€๋Šฅ )  

- ํ•˜๋‚˜์˜ ๊ฐ’ ์ถ”๊ฐ€ : ์ „์ฒด ๋ชจ๋‘ ๋™์ผํ•œ ๊ฐ’์œผ๋กœ ์ถ”๊ฐ€๋จ 

- ๊ทธ๋ฃน ์ถ”๊ฐ€ : ๋ฆฌ์ŠคํŠธ, ํŒ๋‹ค์Šค์˜ ์‹œ๋ฆฌ์ฆˆ๋กœ ์ถ”๊ฐ€ 

 

data['์ˆ˜ํ•™']
data.์ˆ˜ํ•™

๋ฐ์ดํ„ฐ๋ฅผ ์„ ํƒํ•  ๋• ์œ„์™€ ๊ฐ™์€ ๋‘๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ž‘์„ฑํ•ด์ฃผ์—ˆ๋‹ค. 

 

- ํ•œ๊ฐœ ๊ฐ’ ์ถ”๊ฐ€ 

data['์Œ์•…'] = 90             
data.head()

๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•  ๋• data.์Œ์•… ์˜ ํ˜•ํƒœ๋กœ๋Š” ๋ถˆ๊ฐ€๋Šฅ ํ•˜๋‹ค. ํ•œ๊ฐœ์˜ ๊ฐ’์„ ์ถ”๊ฐ€ํ•˜๋ฉด ๋ชจ๋“  row์— ๊ฐ™์€ ๊ฐ’์ด ๋“ค์–ด๊ฐ€๊ฒŒ ๋œ๋‹ค. 

 

- ์—ฌ๋Ÿฌ ๊ฐ’ ์ถ”๊ฐ€ 

data['์ฒด์œก'] =  [100, 80, 60]
data.head()

๋ฆฌ์ŠคํŠธ๋กœ ์—ฌ๋Ÿฌ ๊ฐ’์„ ์ถ”๊ฐ€ํ•ด ์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. ์ด ๋•Œ ์ฃผ์˜ํ•  ์ ์€ ๋ฆฌ์ŠคํŠธ ์›์†Œ ๊ฐœ์ˆ˜์™€ row๊ฐœ์ˆ˜๊ฐ€ ๊ฐ™์•„์•ผ ํ•œ๋‹ค. 

 

data['๊ตญ์˜์ˆ˜'] =  (data['๊ตญ์–ด'] + data['์˜์–ด'] + data['์ˆ˜ํ•™'] ) / 3
data.head()

์ด๋ ‡๊ฒŒ ์ปฌ๋Ÿผ ๊ฐ„์˜ ์—ฐ์‚ฐ์„ ํ†ตํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ๋งŒ๋“ค์–ด ์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

 

 4. ๋ฐ์ดํ„ฐ ํ‘œ ๋ณ‘ํ•ฉํ•˜๊ธฐ 
fpath = './data/exam.xlsx'
A = pd.read_excel(fpath, index_col = '๋ฒˆํ˜ธ')
A.head()

ํŒŒ์ผ์„ ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์™€์„œ A ๋ณ€์ˆ˜์— ์ €์žฅํ•ด ์ค€๋‹ค. 

fpath2 = './data/exam_extra.xlsx'
B = pd.read_excel(fpath2, index_col = '๋ฒˆํ˜ธ')
B.head()

์ถ”๊ฐ€ ํ•  ์—‘์…€ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์™€ B ๋ณ€์ˆ˜์— ์ €์žฅํ•ด ์ค€๋‹ค.

 

 

- merge()

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

๋ณ‘ํ•ฉ ๊ธฐ์ค€์„ ์ธ์ž์— ๋„ฃ์–ด ์„ค์ •ํ•ด์ค„ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋•Œ, left_on ๊ณผ left_index ์ค‘ 1๊ฐœright_on ๊ณผ right_index ์ค‘ 1๊ฐœ๋ฅผ ์จ์•ผ ํ•˜๊ณ , ๋‘๊ฐ€์ง€๋ฅผ ํ•œ๋ฒˆ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์—†๋‹ค

total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total.head()

left์ผ ๊ฒฝ์šฐ A๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ๋ณ‘์ด ๋œ๋‹ค. 4๋ฒˆ, 5๋ฒˆ์€ ๋‚˜์˜ค์ง€ ์•Š๊ณ , B์˜ 3๋ฒˆ์€ NaN์œผ๋กœ ์ฑ„์›Œ์ง„๋‹ค. 

pd.merge(A, B, how = 'right', left_index = True, right_index = True)

์œ„์™€ ๊ฐ™์ด ์ž‘์„ฑ๋˜์—ˆ์„ ๋•Œ , B์— ๋งž์ถ”์–ด ํ•ฉ๋ณ‘๋œ๋‹ค. ๋”ฐ๋ผ์„œ 3๋ฒˆ์€ ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

pd.merge(A, B, how = 'inner', left_index = True, right_index = True)

inner๋ฅผ ์‚ฌ์šฉํ•˜์˜€์„ ๊ฒฝ์šฐ , A ์™€ B ๋ชจ๋‘ ์กด์žฌํ•˜๋Š” ์ธ๋ฑ์Šค์˜๋งŒ ํ•ฉ๋ณ‘ํ•ด์ค€๋‹ค. 

pd.merge(A, B, how = 'outer', left_index= True, right_index=True)

outer๋ฅผ ์‚ฌ์šฉํ•˜์˜€์„ ๊ฒฝ์šฐ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ๋ณ‘ํ•ด ์ค€๋‹ค. 

 

 

 

 5 . ์ €์žฅํ•˜๊ธฐ 
total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total

์ตœ์ข… ๋ชจ๋ธ์€ left, A๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ๋ณ‘ํ•œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ total ๋ณ€์ˆ˜๋ฅผ ์„ ์–ธํ•˜๊ณ  , ์ €์žฅ์„ ํ•ด๋ณด์ž ! 

total.to_excel('./data/exam_total.xlsx')

total.to_excel('./data/exam_total_withoutindex.xlsx', index = False)

 

index = False ์ธ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ '๋ฒˆํ˜ธ' ์ปฌ๋Ÿผ์„ ์ œ์™ธํ•˜๊ณ  ์ €์žฅํ•  ์ˆ˜ ์žˆ๋‹ค. 

 


 

 

 

+ Recent posts