LearningSpoons class notes

 

 

< Previous post >

https://silvercoding.tistory.com/50

 

[python pandas] 3. pandas basics (3) - aggregation, missing values, sorting


 

 


 Loading & inspecting the data
import pandas as pd
file = './data/babyNamesUS.csv'
raw = pd.read_csv(file)
raw.head()

raw.info()

 


 

 Common names used 'a lot' regardless of sex?

Idea: the smaller the gap between a name's female and male shares, the more gender-neutral the name is!

# Total count of each name, split by sex
name_df = raw.pivot_table(index = 'Name', columns = 'Sex', values = 'Number', aggfunc='sum')

# Fill missing values with 0
name_df = name_df.fillna(0)

# float -> int
name_df = name_df.astype(int)
name_df.head()

Everything up to this point was covered in the previous post.

name_df['Sum'] = name_df['M'] + name_df['F']
name_df.head()

Add the male and female counts to create a column named Sum.

# Female and male shares of each name's total
name_df['F_ratio'] = name_df['F'] / name_df['Sum']
name_df['M_ratio'] = name_df['M'] / name_df['Sum']

# Gap between the two shares
name_df['M_F_Gap'] = abs(name_df['F_ratio'] - name_df['M_ratio'])
name_df.head()

The raw difference ranges from -1 to 1. abs() (absolute value) maps it to the range 0 to 1. A gap of 0 means the name is perfectly balanced, and a gap of 1 means only one sex uses it.
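For one name with female count F and male count M, the gap reduces to a single fraction. The 45/55 split in the last line is a made-up illustration:

```latex
\begin{aligned}
\text{F\_ratio} &= \frac{F}{F+M}, \qquad \text{M\_ratio} = \frac{M}{F+M} \\
\text{M\_F\_Gap} &= \left| \frac{F}{F+M} - \frac{M}{F+M} \right| = \frac{|F - M|}{F + M} \\
F = 45,\ M = 55 &\;\Rightarrow\; \text{M\_F\_Gap} = \frac{|45 - 55|}{100} = 0.1
\end{aligned}
```

So the condition M_F_Gap < 0.1 used below means neither sex accounts for 55% or more of the name's total.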

# Sort by total count, descending
name_df = name_df.sort_values(by = 'Sum', ascending=False)
name_df.head(20)

We want names that are used a lot, so we first sort by the total (Sum) column.

cond = name_df['M_F_Gap'] < 0.1
name_df[cond].head(10)

Here a "small" gap is defined as anything below 0.1. We print the rows whose M_F_Gap is less than 0.1.

# Top 10 names used frequently regardless of sex
name_df[cond].head(10).index
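To check the whole pipeline end to end, here is a minimal sketch on a tiny hypothetical dataset. The names and counts are made up and do not come from babyNamesUS.csv. Only the Name, Sex and Number columns are assumed.

```python
import pandas as pd

# Made-up rows in the same shape as babyNamesUS.csv (only Name, Sex, Number are needed)
raw = pd.DataFrame({
    "Name":   ["Alex", "Alex", "Jordan", "Jordan", "Mary", "John"],
    "Sex":    ["M", "F", "M", "F", "F", "M"],
    "Number": [50, 50, 30, 70, 90, 80],
})

name_df = raw.pivot_table(index="Name", columns="Sex", values="Number", aggfunc="sum")
name_df = name_df.fillna(0).astype(int)      # a name never given to one sex counts as 0
name_df["Sum"] = name_df["M"] + name_df["F"]
name_df["F_ratio"] = name_df["F"] / name_df["Sum"]
name_df["M_ratio"] = name_df["M"] / name_df["Sum"]
name_df["M_F_Gap"] = (name_df["F_ratio"] - name_df["M_ratio"]).abs()

name_df = name_df.sort_values(by="Sum", ascending=False)
neutral = name_df[name_df["M_F_Gap"] < 0.1].index.tolist()
print(neutral)   # ['Alex']
```

Jordan is used just as often as Alex, but its 70/30 split gives a gap of 0.4, so the filter drops it.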

 

 

 

 

 

 The most representative American names? (recent trend)

Idea: split the birth years into generations. Names with a large share of their count in the recent generations (the 2020 and 1990 groups) should be the representative American names of the current trend!

raw.head()

# unique() shows which birth years the data covers
raw['YearOfBirth'].unique()

array([1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015], dtype=int64)

 

 

* Splitting into generations

One generation = 30 years, counting back from 2020 in 30-year steps:

  • 1930 and earlier
  • 1931 to 1960
  • 1961 to 1990
  • 1991 to 2020 (the data ends in 2015)
year_class_list = []

for year in raw['YearOfBirth']:
    if year <= 1930:
        year_class = '1930년이전'   # 1930 and earlier
    elif year <= 1960:
        year_class = '1960년이전'   # 1931 to 1960
    elif year <= 1990:
        year_class = '1990년이전'   # 1961 to 1990
    else:
        year_class = '2020년이전'   # 1991 to 2020
    year_class_list.append(year_class)

As shown above, a for loop with if/elif statements sorts each birth year into one of four generation groups.
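The same grouping can be sketched without an explicit loop by using pd.cut, which assigns each value to a bin. This is an alternative, not what the original notebook did. The bin edges -inf/1930/1960/1990/inf are an assumption, chosen to reproduce the loop's `<=` comparisons (pd.cut bins are right-closed by default). The sample years are made up.

```python
import numpy as np
import pandas as pd

labels = ["1930년이전", "1960년이전", "1990년이전", "2020년이전"]
bins = [-np.inf, 1930, 1960, 1990, np.inf]   # right-closed: (-inf, 1930], (1930, 1960], ...

def classify_loop(year):
    """Same if/elif logic as the loop above."""
    if year <= 1930:
        return labels[0]
    elif year <= 1960:
        return labels[1]
    elif year <= 1990:
        return labels[2]
    return labels[3]

years = pd.Series([1910, 1930, 1931, 1960, 1990, 1991, 2015])   # made-up sample, incl. bin edges
by_cut = pd.cut(years, bins=bins, labels=labels).astype(str)     # categorical -> plain strings
by_loop = years.map(classify_loop)

print(pd.DataFrame({"year": years, "cut": by_cut, "loop": by_loop}))
```

On the real data this would be `raw['year_class'] = pd.cut(raw['YearOfBirth'], bins=bins, labels=labels).astype(str)`.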

raw['year_class'] = year_class_list
raw.head()

The list of generation labels becomes the new year_class column.

name_period = raw.pivot_table(index = ['Name', 'Sex'], columns = 'year_class', values = 'Number', aggfunc='sum')
name_period = name_period.fillna(0)
name_period = name_period.astype(int)
name_period.head()

Name and Sex become a two-level index. Each year_class column holds the sum of Number for that generation.

name_period['sum'] = name_period.sum(axis = 1)
name_period.head()

sum(axis=1) gives each name's total count. With axis=1, the sum runs horizontally, across the columns of each row.
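A tiny made-up frame shows the difference between the two directions:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [10, 20]}, index=["x", "y"])

col_totals = df.sum(axis=0)   # down each column (the default): A=3, B=30
row_totals = df.sum(axis=1)   # across each row, like name_period['sum']: x=11, y=22
print(col_totals)
print(row_totals)
```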

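# Share of each generation in the name's total
for col in name_period.columns.drop('sum'):   # skip 'sum' itself (its share would always be 1)
    col_new = col + "비율"                      # e.g. '2020년이전' -> '2020년이전비율'
    name_period[col_new] = name_period[col] / name_period['sum']

name_period.head()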

This adds one share column per generation, for example 2020년이전비율 (the share born 1991 to 2020).

# Sort descending by total count, then by the 2020 share, then by the 1990 share
name_period = name_period.sort_values(by = ['sum', '2020년이전비율', '1990년이전비율'], ascending=False)
name_period

The sort keys are, in order: the total name count, the 2020년이전 share and the 1990년이전 share. The goal is to find representative American names that fit the latest trend. Because the totals rarely tie, the two share columns act mostly as tie-breakers here.

# With a multi-level index, selecting rows by index is cumbersome,
# so reset_index() moves Name and Sex from the index back into columns
name_period = name_period.reset_index()
name_period.head()

A multi-level index is hard to work with. Once the needed aggregations and calculations are done, reset_index turns the index levels back into ordinary columns.
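A minimal sketch of the difference, on a made-up frame with a (Name, Sex) index like name_period's. While Sex is an index level, picking one sex needs an index tool such as .xs(). After reset_index(), an ordinary boolean mask works.

```python
import pandas as pd

# Made-up frame with a two-level (Name, Sex) index, like name_period before reset_index()
idx = pd.MultiIndex.from_tuples([("Alex", "F"), ("Alex", "M"), ("Mary", "F")], names=["Name", "Sex"])
period = pd.DataFrame({"sum": [50, 50, 90]}, index=idx)

# While Sex is an index level, selecting by it needs .xs()
males_by_level = period.xs("M", level="Sex")

# After reset_index(), Name and Sex are plain columns and a boolean mask works
flat = period.reset_index()
males_by_mask = flat[flat["Sex"] == "M"]

print(males_by_level)
print(males_by_mask)
```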

# Male names only
cond = name_period['Sex'] == 'M'
name_period[cond].head(10)

These are the top 10 names whose Sex is male.

# Now the female names
cond = name_period['Sex'] == 'F'
name_period[cond].head(10)

And these are the top 10 names whose Sex is female.

 

But the result still looks odd. This is clearest in the female table. The first row has about 50% of its count in the 1960년이전 generation (born 1931 to 1960), so it is hardly a recent name. We therefore add the following condition.

cond_age = name_period['2020년이전비율'] > 0.3
cond_sex = name_period['Sex'] == 'M'
cond = cond_age & cond_sex
name_period[cond].head(5)

 

This selects rows whose 2020년이전비율 (share born 1991 to 2020) is greater than 0.3 and whose Sex is male.

Result: < Top 5 male names: Christopher, Daniel, Matthew, Anthony, Andrew >

cond_age = name_period['2020년이전비율'] > 0.3
cond_sex = name_period['Sex'] == 'F'
cond = cond_age & cond_sex
name_period[cond].head(5)

The same condition is applied to female names.

Result: < Top 5 female names: Jessica, Sarah, Ashley, Stephanie, Emily >
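The two cells above differ only in the sex code, so they could be wrapped in a small function. top_recent_names is a hypothetical helper, not part of pandas or the original notebook. It assumes the frame is already sorted and has Name and Sex columns, as it does after reset_index(). The demo rows are made up.

```python
import pandas as pd

def top_recent_names(df, sex, share_col, threshold=0.3, n=5):
    """Hypothetical helper: first n names of one sex whose share_col exceeds threshold.

    Keeps df's existing row order, so df should already be sorted (e.g. by 'sum').
    """
    cond = (df[share_col] > threshold) & (df["Sex"] == sex)
    return df.loc[cond, "Name"].head(n).tolist()

# Made-up frame, already sorted by 'sum' descending
demo = pd.DataFrame({
    "Name": ["Mary", "John", "Jessica", "Daniel"],
    "Sex": ["F", "M", "F", "M"],
    "sum": [400, 350, 300, 200],
    "2020년이전비율": [0.05, 0.10, 0.60, 0.50],
})

print(top_recent_names(demo, "F", "2020년이전비율"))   # ['Jessica']
print(top_recent_names(demo, "M", "2020년이전비율"))   # ['Daniel']
```

On the real frame, `top_recent_names(name_period, 'M', '2020년이전비율')` should return the same names as the male cell above, as a list.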

 


 

 

 

LearningSpoons class notes

 

 

< Previous post >

https://silvercoding.tistory.com/49

 

[python pandas] 2. pandas basics (2) - adding, merging, saving


 

 

 


 1. Loading & inspecting the data
import pandas as pd

Import pandas.

file = './data/babyNamesUS.csv'
raw = pd.read_csv(file)

Load the CSV file for today's lesson with pandas.

raw.head()

raw.info()

There are 1,048,575 rows and 5 columns, and no missing values.

[ Columns: state (StateCode), sex (Sex), year of birth (YearOfBirth), name (Name), name count (Number) ]

 


 

 2. Aggregating ( pivot_table )

df.pivot_table(index = 'column_name', columns = 'column_name', values = 'column_name', aggfunc = 'sum')

raw.pivot_table(index = 'Name', values = 'Number', aggfunc='sum')

This gives the total count per name. If columns is not set, the values column (Number) becomes the only column.

name_df = raw.pivot_table(index = 'Name', values = 'Number', columns = 'Sex', aggfunc='sum')
name_df.head()

This way the name counts are totaled separately for each sex.
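A minimal sketch with made-up rows shows both shapes and where the missing values come from:

```python
import pandas as pd

# Made-up rows: "Mary" never appears with Sex "M"
raw = pd.DataFrame({
    "Name": ["Alex", "Alex", "Mary", "Alex"],
    "Sex": ["M", "F", "F", "M"],
    "Number": [5, 3, 7, 2],
})

by_name = raw.pivot_table(index="Name", values="Number", aggfunc="sum")
by_name_sex = raw.pivot_table(index="Name", columns="Sex", values="Number", aggfunc="sum")

print(by_name)       # a single column, Number
print(by_name_sex)   # columns F and M; the (Mary, M) cell is NaN
```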

name_df.info()

info() on this pivot table shows that both F and M have quite a few missing values. A name that never appears for one sex has no Number to sum in that column, so the cell is NaN.

 

 

 

 

 3. Filling missing values
  • Fill with a common value (e.g. 0)
  • Fill with a chosen value (e.g. the mean, maximum, minimum, or a value next to the empty cell)
  • Exclude the missing data from the analysis
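Each strategy maps to a pandas call. Here is a minimal sketch on a made-up Series. ffill is used as one example of "a value next to the empty cell".

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])   # made-up values with gaps

filled_zero = s.fillna(0)           # a common value
filled_mean = s.fillna(s.mean())    # a statistic: mean of the non-missing values (2.0)
filled_prev = s.ffill()             # a neighboring value: the previous non-missing one
dropped = s.dropna()                # exclude the missing entries
```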

- fillna()

name_df = name_df.fillna(0)
name_df.head()

In this data, a missing count can be taken to mean the name was not used for that sex, so all missing values are filled with 0.

name_df.info()

All missing values are now filled.

 

 

 

 

 4. Sorting: sort_values(by='column_name', ascending=True)

- Finding the most used names for males and for females

name_df.sort_values(by = 'M')

This sorts by the M column. The default is ascending=True, so the order is ascending. Since we want the top 5, we switch to descending order.

name_df.sort_values(by = 'M', ascending = False)

name_df.sort_values(by = 'M', ascending = False).head().index

Index(['Michael', 'James', 'Robert', 'John', 'David'], dtype='object', name='Name')

Taking only the index gives just the top 5 names.
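For a single column, Series.nlargest is a shorter way to get the same top-n labels. Here is a sketch on made-up counts. When values tie, the two approaches may order the tied rows differently.

```python
import pandas as pd

# Made-up counts per name
name_df = pd.DataFrame(
    {"M": [10, 50, 30, 0], "F": [40, 0, 5, 60]},
    index=pd.Index(["A", "B", "C", "D"], name="Name"),
)

via_sort = name_df.sort_values(by="M", ascending=False).head(2).index.tolist()
via_nlargest = name_df["M"].nlargest(2).index.tolist()
print(via_sort, via_nlargest)   # ['B', 'C'] ['B', 'C']
```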

name_df.sort_values(by = 'F', ascending = False).head().index

Index(['Mary', 'Jennifer', 'Elizabeth', 'Patricia', 'Linda'], dtype='object', name='Name')

The top 5 female names are extracted the same way.

 

 

 

 

 5. Checking the kinds of values in a column

- unique(): list the distinct values

raw['StateCode'].unique()

- value_counts(): distinct values + how many times each occurs

raw['StateCode'].value_counts()

raw['YearOfBirth'].value_counts()

2007 has the most rows (recorded name entries)!
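Here is the difference between the two on a made-up column of state codes:

```python
import pandas as pd

codes = pd.Series(["CA", "NY", "CA", "TX", "CA", "NY"])  # made-up state codes

print(codes.unique())        # distinct values, in order of first appearance
print(codes.value_counts())  # each value with its count, most frequent first
```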


 

 

 

 

 

 

 

 

 

 

LearningSpoons class notes

 

 

< Previous post >

https://silvercoding.tistory.com/48

 

[python pandas] pandas basics (1)


 

 


 

 1. Importing pandas
import pandas as pd

 

 2. Loading & inspecting the data
fpath = './data/exam.xlsx'
data = pd.read_excel(fpath, index_col = '번호')

Load the Excel file with index_col='번호' so that the 번호 ("number") column becomes the index.

 

 

* Make a habit of inspecting the data with head(), info(), and describe()

data.head()

data.info()

 

data.describe()


 

 

 3. Adding data

df['column_name'] = data (the form df.column_name = data cannot be used to create a new column)

- Single value: every row gets the same value

- Multiple values: add a list or a pandas Series

 

data['수학']
data.수학

Either of the two forms above can be used to select an existing column.

 

- Adding a single value

data['음악'] = 90
data.head()

When adding a column, the form data.음악 = 90 does not work. Assigning a single value puts the same value in every row.
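Here is why the attribute form fails for new columns, sketched on a made-up frame. For a name that is not already a column, `df.art = 80` just sets an ordinary Python attribute on the object, so no column appears. pandas may warn about this when the value is list-like.

```python
import pandas as pd

df = pd.DataFrame({"math": [75, 55, 90]})   # made-up scores

df["music"] = 90   # bracket form: new column, the same value in every row
df.art = 80        # attribute form with a new name: a plain Python attribute, not a column

print(df.columns.tolist())   # ['math', 'music']
```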

 

- Adding multiple values

data['체육'] = [100, 80, 60]
data.head()

You can also add several values with a list. The number of list elements must equal the number of rows.
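If the lengths differ, pandas raises a ValueError instead of padding the list. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"math": [75, 55, 90]})   # made-up scores, 3 rows

df["pe"] = [100, 80, 60]     # 3 values for 3 rows: works

length_error = False
try:
    df["extra"] = [1, 2]     # 2 values for 3 rows
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```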

 

data['국영수'] = (data['국어'] + data['영어'] + data['수학']) / 3
data.head()

A new column can also be created from arithmetic between columns. Here it is the average of 국어, 영어 and 수학 (Korean, English, math).

 

 

 

 4. Merging tables
fpath = './data/exam.xlsx'
A = pd.read_excel(fpath, index_col = '번호')
A.head()

Reload the file and store it in A.

fpath2 = './data/exam_extra.xlsx'
B = pd.read_excel(fpath2, index_col = '번호')
B.head()

Load the Excel file to be merged in and store it in B.

 

 

- merge()

Source: LearningSpoons

The merge keys are set through arguments. Use one of left_on or left_index, and one of right_on or right_index. The two options in each pair cannot be used together.

total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total.head()

With how = 'left', the merge keeps A's rows. 4번 and 5번 (which exist only in B) do not appear, and B's columns for 3번 are filled with NaN.

pd.merge(A, B, how = 'right', left_index = True, right_index = True)

With how = 'right', as written above, the merge follows B's rows instead, so 3번 is missing.

pd.merge(A, B, how = 'inner', left_index = True, right_index = True)

With inner, only the index labels that exist in both A and B are kept.

pd.merge(A, B, how = 'outer', left_index= True, right_index=True)

With outer, every row from both tables is kept. Cells with no match on the other side are NaN.
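Here is a self-contained sketch of the four options. A and B are hypothetical stand-ins shaped like the description above: A holds 1번 to 3번, and B holds 1번, 2번, 4번 and 5번. The real exam files' columns and values are not reproduced.

```python
import pandas as pd

# Hypothetical stand-ins for A and B, both indexed by 번호
A = pd.DataFrame({"국어": [70, 60, 90]},
                 index=pd.Index(["1번", "2번", "3번"], name="번호"))
B = pd.DataFrame({"extra": [80, 85, 95, 75]},
                 index=pd.Index(["1번", "2번", "4번", "5번"], name="번호"))

results = {}
for how in ["left", "right", "inner", "outer"]:
    merged = pd.merge(A, B, how=how, left_index=True, right_index=True)
    results[how] = merged
    print(how, merged.index.tolist())
```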

 

 

 

 5. Saving
total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total

The final result is the left merge based on A. Store it in total and save it!

total.to_excel('./data/exam_total.xlsx')

total.to_excel('./data/exam_total_withoutindex.xlsx', index = False)

 

With index = False, the index (the 번호 column) is left out of the saved file.
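The effect of index=False can be checked without writing a file. This sketch writes with to_csv into an in-memory buffer instead of to_excel, so it runs without an Excel writer such as openpyxl installed. to_excel takes the same index parameter. The data is made up.

```python
import io
import pandas as pd

# Made-up merged result indexed by 번호
total = pd.DataFrame({"국어": [70, 60]}, index=pd.Index(["1번", "2번"], name="번호"))

with_index = io.StringIO()
total.to_csv(with_index)                   # index written as the first column
without_index = io.StringIO()
total.to_csv(without_index, index=False)   # index left out

print(with_index.getvalue())
print(without_index.getvalue())
```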

 


 

 

 

LearningSpoons class notes

 

* Basic pandas functions

Reading data files: read_excel(), read_csv()

Selecting data: df.loc[], df.iloc[]

Changing the index/columns: columns / index, reset_index()

 

 

 


 

 pandas vs excel 

pandas: lightweight and fast, so large files can be handled freely.

excel: all the data is visible on screen. With a lot of data, though, it can be hard to look through directly.

 

 

 pandas structure
  • DataFrame: a table

- index: like a key in a database. In Excel, this is the data usually placed in the first column (used with VLOOKUP and similar)

- columns: each column is a set of data sharing one attribute -> a DataFrame can be examined as the index plus one column at a time

  • Series: a set of data with one attribute (a single column of a DataFrame)
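Here is how the pieces relate, on a made-up two-row frame. Selecting one column gives a Series that carries the DataFrame's index along with it.

```python
import pandas as pd

df = pd.DataFrame({"국어": [70, 60], "영어": [80, 90]}, index=["1번", "2번"])  # made-up

col = df["국어"]                      # one column of a DataFrame is a Series
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True: the Series keeps the DataFrame's index
```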

 

 1. Importing pandas

- Installing pandas

!pip install pandas

- Importing pandas

import pandas as pd

 

 

 2. Loading & inspecting the data

* File paths

- Absolute path: "C:/folder1/folder2/.../filename.ext"

- Relative path: "./folder3/.../filename.ext", "../folder4/.../filename.ext". These are resolved from the current working directory, which in Jupyter is normally the folder containing the notebook file.

- ./ : current folder, ../ : parent folder
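os.path.abspath shows where a relative path actually points. The data/exam.xlsx path only needs to exist when you read the file, not for this check.

```python
import os

cwd = os.getcwd()   # relative paths are resolved from here

here = os.path.abspath("./data/exam.xlsx")     # <cwd>/data/exam.xlsx
parent = os.path.abspath("../data/exam.xlsx")  # <parent of cwd>/data/exam.xlsx
print(cwd)
print(here)
print(parent)
```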

 

* After loading data, make a habit of inspecting it with head(), info(), and describe()

 

temp = pd.read_excel('./data/exam.xlsx')
temp

temp.head(2)

head() takes the number of rows to show (the default is 5).

temp.tail()

 

- info(): shows the index plus the non-null count and dtype of each column

temp.info()

- describe(): summary statistics (count, mean, standard deviation, quartiles, etc.) for numeric columns (int, float)

temp.describe()
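Here is how the three calls behave on a made-up frame with one text column and one numeric column:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": [70, 80, 90, 60, 85, 75],
})  # made-up data

first_two = df.head(2)      # first 2 rows
last_rows = df.tail()       # last 5 rows (default n=5)
stats = df.describe()       # numeric columns only, so 'name' is skipped
print(stats)
```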

 

 

 

 2-1 Setting the index

- set_index(): make a column the index (column -> index)

data = temp.set_index('번호')
data.head()

set_index made the '번호' column the index.

 

 

- index_col: set the index while reading the Excel file

temp2 = pd.read_excel('./data/exam.xlsx', index_col = 0) # or index_col = '번호' (by column name)
temp2.head()
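Both routes should give the same DataFrame. To keep the sketch runnable without an Excel file, it uses read_csv on an in-memory string, which accepts the same index_col argument as read_excel. The two-row table is made up.

```python
import io
import pandas as pd

csv_text = "번호,국어,영어\n1번,70,80\n2번,60,90\n"   # made-up stand-in for exam.xlsx

a = pd.read_csv(io.StringIO(csv_text)).set_index("번호")   # read, then move the column
b = pd.read_csv(io.StringIO(csv_text), index_col=0)          # by position
c = pd.read_csv(io.StringIO(csv_text), index_col="번호")     # by column name

print(a)
```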

 

 

 

 3. Selecting data

- Selecting a single cell

df.iloc[row, column] : by integer position

df.loc[row, column] : by label (name)

data

data.iloc[1, 2]

55

data.loc['1번','수학']

75

print(data.loc['3번','영어'])
print(data.iloc[2, 1])

100

100

print(data.loc['1번', '국어'])
print(data.iloc[0, 0])

70

70

 

 

- Selecting multiple cells

: pass a list of labels ( [label1, label2, ... labelN] ) or a start:end range

data.loc['1번', ['국어', '영어']]

국어    70
영어    80
Name: 1번, dtype: int64

data.loc[ ['1번','2번'] , '수학']

번호
1번    75
2번    55
Name: 수학, dtype: int64

 

data.loc['1번', '영어': ]

영어    80
수학    75
Name: 1번, dtype: int64
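The '영어': range above relies on a detail worth knowing. With loc, a label slice includes its end label. With iloc, a position slice stops before its end. A sketch on a made-up frame with English column names:

```python
import pandas as pd

df = pd.DataFrame(
    {"Korean": [70, 60, 90], "English": [80, 90, 100], "Math": [75, 55, 85]},
    index=["1번", "2번", "3번"],
)  # made-up scores

rows_by_label = df.loc["1번":"2번"]               # label slice: includes "2번"
rows_by_pos = df.iloc[0:1]                        # position slice: stops before position 1
cols_by_label = df.loc[:, "Korean":"English"]     # includes "English"
cols_by_pos = df.iloc[:, 0:1]                     # only position 0 ("Korean")
```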

 

 

- Selecting a single column

: data.column_name or data['column_name']

data.loc[ : , '수학']

data['수학']

data['영어']

 

 

- Selecting multiple columns

: columns can be picked in any order

data[ ['수학','영어'] ]

data[ ['수학','영어','국어'] ]

The original order was 국어, 영어, 수학 (Korean, English, math). As shown above, the columns can be output in a different order.

 

 

- Selecting rows by a condition (one condition)

df[condition] : returns only the rows where the condition is True

-> condition : a list or Series of True / False values, one per row

data

cond = data['수학'] < 80
cond

A comparison on the 수학 column returns a Series of bool values, one per row.

data[ cond ]

Applying this condition to the DataFrame returns only the rows where it is True.

cond = [True, False, True]    # one True/False per row, in row order
data[cond]

You can also pass a list of bool values directly. The list length must equal the number of rows.

 

 

- Selecting rows by a condition (multiple conditions)

& : and, True only if all conditions hold

| : or, True if at least one condition holds
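Each comparison needs its own parentheses because & and | bind more tightly than > in Python. Without them, `data["English"] > 80 & data["Math"] > 80` is parsed as a chained comparison and fails. The keywords and/or cannot be used either, because they need a single True/False for a whole Series. A sketch on made-up scores:

```python
import pandas as pd

data = pd.DataFrame({"English": [80, 90, 100], "Math": [75, 55, 95]})  # made-up scores

both = data[(data["English"] > 80) & (data["Math"] > 80)]     # element-wise and
either = data[(data["English"] > 80) | (data["Math"] > 80)]   # element-wise or

ambiguous = False
try:
    data[(data["English"] > 80) and (data["Math"] > 80)]     # Python `and` needs one bool
except ValueError:
    ambiguous = True
```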

cond3 = (data['영어'] > 80)
cond4 = (data['수학'] > 80)

data[ cond3 | cond4 ]

cond3 = (data['영어'] > 80)
cond4 = (data['수학'] > 80)

cond = cond3 & cond4
data[ cond ]

cond = (data['영어'] >= 70) & (data['수학'] >= 70) & (data['수학'] < 90)
data[ cond ]

cond = (data['영어'] >= 70) \
    & (data['수학'] >= 70) \
    & (data['수학'] < 90)

data[ cond ]

To break a long line, end it with \ (backslash). This improves readability. Wrapping the whole expression in parentheses also lets it span several lines without backslashes.

cond_first  = ( data['국어'] > 80 )
cond_second = ( data['영어'] > 80 )

cond = cond_first & cond_second
data[cond]

cond_first  = ( data['국어'] > 80 )
cond_second = ( data['영어'] > 80 )

cond = cond_first | cond_second
data[cond]

 

 

 

 index & column 
data.index

Index(['1번', '2번', '3번'], dtype='object', name='번호')

 

data.index = ['가반', '나반', '다반']

The index can be replaced by assigning a list (here the class names 가반, 나반, 다반). The list length must match the number of rows.

data

The index has changed as set.

data.columns

Index(['국어', '영어', '수학'], dtype='object')

data.columns = ['Korean','English', 'Math']

The columns can be renamed the same way.

data

data.reset_index()

* reset_index: drop=False is the default. It moves the current index into a column and resets the index to 0, 1, 2, ...

drop = True: resets the index without turning the current index values into a column
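Here is a sketch of both modes on a made-up frame. Assigning a plain list to data.index, as above, leaves the index without a name, so reset_index() calls the new column 'index'.

```python
import pandas as pd

df = pd.DataFrame({"Math": [75, 55, 85]})        # made-up scores
df.index = ["가반", "나반", "다반"]               # plain list -> the index has no name

kept = df.reset_index()               # drop=False: old labels move into a column named 'index'
dropped = df.reset_index(drop=True)   # drop=True: old labels are discarded

print(kept)
print(dropped)
```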

 


 

 

 
