๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

์ˆ˜์น˜ํ˜• x ์ˆ˜์น˜ํ˜• : scatterplot, lmplot, jointplot 

์ˆ˜์น˜ํ˜• x ์นดํ…Œ๊ณ ๋ฆฌํ˜• : boxplot, violinplot, barplot, heatmap 

์ˆ˜์น˜ํ˜• x ์œ„์น˜ํ˜• : folium ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ™œ์šฉ 

 

 

๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ distplot, relplot, jointplot, pairplot, boxplot, swarmplot, heatmap ์„ ์‚ฌ์šฉํ•œ๋‹ค. 

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import seaborn as sns

seaborn ์„ import ํ•ด์ฃผ๊ณ , ์•ฝ์–ด๋Š” sns๋ฅผ ์“ด๋‹ค. 

raw = sns.load_dataset('tips')

seaborn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ์…‹์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. ๋ฐ์ดํ„ฐ๋Š” ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ํ˜•์‹์œผ๋กœ, pandas๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ถ„์„ํ•  ์ˆ˜ ์žˆ๋‹ค. 

raw.head()

total_bill ( ์ด ๊ธˆ์•ก ) , tip ( ํŒ ) , sex ( ์„ฑ๋ณ„ ) , smoker ( ํก์—ฐ ์—ฌ๋ถ€ ) , day ( ์š”์ผ ) , time ( ์‹์‚ฌ ์‹œ๊ฐ„ ) , size ( ์ธ์› ์ˆ˜ ) 

raw.info()

info()๋ฅผ ์ด์šฉํ•˜์—ฌ row, column์˜ ๊ฐœ์ˆ˜, data type, ๊ฒฐ์ธก๊ฐ’ ํ™•์ธ์„ ํ•ด ์ค€๋‹ค. 

 


 ๋ฐ์ดํ„ฐ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ (์ˆ˜์น˜ํ˜•) 
  • sns.distplot( df[ '์ปฌ๋Ÿผ๋ช…' ] )
raw['total_bill']

์ˆ˜์น˜ํ˜• ์ž๋ฃŒ์ธ total_bill ์ปฌ๋Ÿผ์˜ ๋ถ„ํฌ๋ฅผ ์‚ดํŽด๋ณธ๋‹ค. 

sns.distplot(raw['total_bill'])

 

 

 

 ๋ฐ์ดํ„ฐ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ ( ์ˆ˜์น˜ํ˜• vs ์ˆ˜์น˜ํ˜• ) 
  • relplot( data=df, x=, y=, hue=, kind='scatter' ) 

: ๋‘ ๊ฐœ์˜ ์ˆ˜์น˜ํ˜• ๋ณ€์ˆ˜ ๋ถ„ํฌ ํ™•์ธ 

- kind = 'scatter' (default

- kind = 'line' 

sns.relplot(x = 'tip', y = 'total_bill', data = raw)  # kind ์˜ต์…˜ ๊ฐ’ ๋ฏธ์ง€์ •์‹œ "scatter"

sns.relplot(x = 'tip', y = 'total_bill', data = raw, kind = 'line')

sns.relplot(x = 'tip', y = 'total_bill', data = raw, hue = 'sex')

 

hue๋ฅผ ์„ฑ๋ณ„๋กœ ์ง€์ •ํ•˜์—ฌ ์„ฑ๋ณ„์„ ๊ธฐ์ค€์œผ๋กœ ๋‚˜๋ˆ„์–ด ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 


 ๊ด€๊ณ„ ์‚ดํŽด๋ณด๊ธฐ (์ˆ˜์น˜ํ˜• vs ์ˆ˜์น˜ํ˜• ) 
  • jointplot ( data=df, x=, y=, kind='scatter' ) 

- kind = 'scatter' : (default / point) 

- kind = 'reg' : (point + regression)

- kind = 'kde' : ๋ˆ„์ ๋ถ„ํฌ์ฐจํŠธ like ์ง€๋„

 

sns.jointplot(data = raw, x = 'tip', y = 'total_bill')  # kind ๊ฐ’ ๋ฏธ ์ง€์ •์‹œ ๊ธฐ๋ณธ ๊ฐ’์€ kind = 'scatter'

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'kde')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'reg')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'hex')

kind='hex'๋Š” kde์™€ ๋น„์Šทํ•œ๋ฐ, ๊ฒน์น˜์ง€ ์•Š๋Š” ์ •์œก๊ฐํ˜•์œผ๋กœ ํ‘œ์‹œ๋œ๋‹ค. 

 

 

 

  • pairplot( data=df ) 

: df ์˜ ๋ชจ๋“  ์ˆ˜์น˜ํ˜•๋ฐ์ดํ„ฐ ์ปฌ๋Ÿผ์—์„œ ๋‘ ์ปฌ๋Ÿผ์”ฉ ๊ด€๊ณ„๋ฅผ ์‹œ๊ฐํ™” ํ•จ 

 

sns.pairplot(data = raw)

์„œ๋กœ ๊ฐ™์€ ์ปฌ๋Ÿผ์ด๋ฉด ํžˆ์Šคํ† ๊ทธ๋žจ์„ ๊ทธ๋ฆฌ๊ณ , ๋‹ค๋ฅธ ์ปฌ๋Ÿผ์€ ๋‘ ์ˆ˜์น˜ํ˜• ๋ฐ์ดํ„ฐ์˜ ๊ด€๊ณ„๋ฅผ ์‚ฐ์ ๋„๋กœ ํ‘œ์‹œํ•ด ์ค€๋‹ค. 

sns.pairplot(data = raw, hue = 'sex')

hue๋ฅผ ์„ฑ๋ณ„๋กœ ์ง€์ •ํ•˜์—ฌ ์„ฑ๋ณ„ ๊ฐ„ ๋ถ„ํฌ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 


 ๋ฐ์ดํ„ฐ ๋ถ„ํฌ ์‚ดํŽด๋ณด๊ธฐ (์ˆ˜์น˜ํ˜• vs ์นดํŽ˜๊ณ ๋ฆฌํ˜•) 
  • boxplot( data = df, x = , y = , hue = )
sns.boxplot(data = raw, x = 'day', y = 'tip')

์š”์ผ ๋ณ„ ํŒ์˜ ๋ถ„ํฌ๋ฅผ ํ™•์ธํ•œ๋‹ค. 

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')

hue๋ฅผ smoker๋กœ ์ง€์ •ํ•˜์—ฌ ํก์—ฐ์ž ์—ฌ๋ถ€๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ถ„ํฌ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

  • swarmplot( data=, x=, y=, hue=, dodge= )

boxplot๊ณผ ๋น„์Šทํ•˜๋‹ค. boxplot์€ ํ•˜๋‚˜์˜ ๊ธฐ์ค€์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ ๋ถ„ํฌ ๋ฒ”์œ„๋ฅผ ํŒŒ์•…ํ•˜๋Š” ๋ฐ ์šฉ์ดํ•˜์ง€๋งŒ , ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜๋ฅผ ํ‘œํ˜„ํ•˜์ง€๋Š” ์•Š๊ธฐ ๋•Œ๋ฌธ์— , ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ ๋‹ค๋ฅธ ๊ฐ’์„ ๋น„๊ตํ•˜๊ธฐ์—๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋•Œ swarmplot์„ ์ด์šฉํ•œ๋‹ค. 

sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

dodge=False๋กœ ํ•˜๋ฉด ๋‘ ์ƒ‰์ด ๊ฒน์ณ ๋‚˜์˜จ๋‹ค. 

sns.boxplot(data = raw, x = 'day', y = 'tip', hue = 'smoker')
sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge=True)

 

boxplot๊ณผ swarplot์„ ๊ฒน์ณ์…” ๊ทธ๋ ค๋ณผ ์ˆ˜๋„ ์žˆ๋‹ค. 

sns.boxplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

 

 

 

 

  • barplot( data=df, x=, y=, hue= ) 
sns.barplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

 

 

 

 ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ (์ˆ˜์น˜ํ˜• vs ์นดํ…Œ๊ณ ๋ฆฌํ˜• vs ์นดํ…Œ๊ณ ๋ฆฌํ˜•)
  • heatmap( data = df )
df = raw.pivot_table(index = 'day', columns = 'size', values = 'tip', aggfunc='mean')
df

์šฐ์„  ์ธ๋ฑ์Šค๋ฅผ day, column์„ size, value๋ฅผ tip์œผ๋กœ ์„ค์ •ํ•˜์—ฌ ์š”์ผ ๋ณ„ ์ธ์›์ˆ˜์— ๋”ฐ๋ฅธ tip ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•œ๋‹ค. 

sns.heatmap(data = df)

์นดํ…Œ๊ณ ๋ฆฌํ˜• - ์นดํ…Œ๊ณ ๋ฆฌํ˜•์˜ ๊ด€๊ณ„๋ฅผ ์ˆ˜์น˜ํ˜•์„ ์ด์šฉํ•˜์—ฌ ํ‘œํ˜„ํ•˜๋Š” heatmap์ด๋‹ค. 

 

๋‹ค์Œ๊ณผ ๊ฐ™์€ ์—ฌ๋Ÿฌ ์„ค์ •์„ ํ•  ์ˆ˜ ์žˆ๋‹ค. 

sns.heatmap(data = df, 
           annot = True, fmt = '.2f')

 

sns.heatmap(data = df, 
           annot = True, fmt = '0.2f',
           cmap = 'Pastel1')

* annot : ์ˆซ์ž ํ‘œ์‹œ, fmt : ์ˆซ์ž ํฌ๋งคํŒ… , cmap : ์ƒ‰๊น” 

* cmap ์ถ”์ฒœ ์ƒ‰ : Reds , Blues, Vlag, Pastel1, RdBu_r


 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/50

 

[python pandas] 3. pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (3) - ์ง‘๊ณ„, ๊ฒฐ์ธก๊ฐ’, ์ •๋ ฌ

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/49 https://silvercoding.tistory.com/48 [python pandas] pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (1) ๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ * ํŒ๋‹ค์Šค ๊ธฐ๋ณธ ํ•จ์ˆ˜ ๋ฐ์ดํ„ฐ ํŒŒ์ผ ์ฝ๊ธฐ :..

silvercoding.tistory.com

 

 


 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd
file = './data/babyNamesUS.csv'
raw = pd.read_csv(file)
raw.head()

raw.info()

 


 

 ๋‚จ๋…€ ๊ตฌ๋ถ„์—†์ด '๋งŽ์ด' ์‚ฌ์šฉ๋˜๋Š” ๊ณตํ†ต ์ด๋ฆ„ ? 

idea : ๋‚จ๋…€ ์ด๋ฆ„ ๊ฐœ์ˆ˜์˜ ๋น„์œจ ์ฐจ์ด๊ฐ€ ์ž‘์„์ˆ˜๋ก ์„ฑ๋ณ„ ๊ตฌ๋ถ„์ด ์—†๋Š” ์ด๋ฆ„์ผ ๊ฒƒ์ด๋‹ค !  

# ์„ฑ๋ณ„์— ๋”ฐ๋ฅธ ์ด๋ฆ„ ๊ฐœ์ˆ˜ ์ง‘๊ณ„
name_df = raw.pivot_table(index = 'Name', columns = 'Sex', values = 'Number', aggfunc='sum')

# ๊ฒฐ์ธก๊ฐ’ ์ฑ„์šฐ๊ธฐ (0) 
name_df = name_df.fillna(0)

# float -> int 
name_df = name_df.astype(int)
name_df.head()

์—ฌ๊ธฐ๊นŒ์ง€ ์ €๋ฒˆ ํฌ์ŠคํŒ…์—์„œ ํ–ˆ๋˜ ๋‚ด์šฉ์ด๋‹ค. 

name_df['Sum'] = name_df['M'] + name_df['F']
name_df.head()

๋‚จ๋…€ ์ด๋ฆ„ ๊ฐœ์ˆ˜๋ฅผ ๋ชจ๋‘ ๋”ํ•ด์„œ sum ์ด๋ผ๋Š” ์ปฌ๋Ÿผ์„ ์ƒ์„ฑํ•œ๋‹ค. 

# ๋‚จ, ๋…€ ๋น„์œจ ๊ณ„์‚ฐ 
name_df['F_ratio'] = name_df['F'] / name_df['Sum']
name_df['M_ratio'] = name_df['M'] / name_df['Sum']

# ๋‚จ, ๋…€ ๋น„์œจ ๊ฐ„ ์ฐจ์ด
name_df['M_F_Gap'] = abs(name_df['F_ratio'] - name_df['M_ratio'])
name_df.head()

-1 ~ 1 ์˜ ๋ฒ”์œ„๋ฅผ abs() (์ ˆ๋Œ“๊ฐ’) ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 0 ~ 1 ๋ฒ”์œ„๋กœ ๋ฐ”๊พธ์–ด ์ค€๋‹ค. 

# ์ด๋ฆ„ ์ด ๊ฐœ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ 
name_df = name_df.sort_values(by = 'Sum', ascending=False)
name_df.head(20)

๋งŽ์ด ์‚ฌ์šฉ๋œ ์ด๋ฆ„์„ ๋ฝ‘๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์šฐ์„  ์ด ํ•ฉ๊ณ„ ์ปฌ๋Ÿผ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•ด์ค€๋‹ค. 

cond = name_df['M_F_Gap'] < 0.1
name_df[cond].head(10)

์ด ๋•Œ ๋น„์œจ์ฐจ์ด๊ฐ€ ์ ์€ ๊ฒƒ์„ 0.1 ๋ฏธ๋งŒ์œผ๋กœ ๊ธฐ์ค€ ์žก๊ณ ,  M_F_Gap ์ปฌ๋Ÿผ์ด 0.1 ๋ณด๋‹ค ์ž‘์€ ํ–‰๋“ค์„ ์ถœ๋ ฅ์‹œํ‚จ๋‹ค.  

# ์„ฑ๋ณ„ ๊ตฌ๋ถ„์—†์ด ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ์ด๋ฆ„ Top 10 
name_df[cond].head(10).index

 

 

 

 

 

 ๊ฐ€์žฅ ๋Œ€ํ‘œ์ ์ธ ๋ฏธ๊ตญ์˜ ์ด๋ฆ„ ? ( ์ตœ๊ทผ ํŠธ๋ Œ๋“œ ) 

idea : ์„ธ๋Œ€๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ตœ๊ทผ ์„ธ๋Œ€(2020, 1990) ์ด๋ฆ„ ๊ฐœ์ˆ˜์˜ ๋น„์œจ์ด ํฐ ์ด๋ฆ„์ด ์ตœ๊ทผ ํŠธ๋ Œ๋“œ์— ๋งž๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ฏธ๊ตญ ์ด๋ฆ„์ผ ๊ฒƒ์ด๋‹ค ! 

raw.head()

# unique() ๋ฅผ ํ†ตํ•ด, ๊ธฐ๊ฐ„์— ๋“ค์–ด๊ฐ€๋Š” ๊ฐ’๋“ค์„ ์‚ดํŽด๋ด…๋‹ˆ๋‹ค. 
raw['YearOfBirth'].unique()

array([1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015], dtype=int64)

 

 

* ์„ธ๋Œ€ ๋‚˜๋ˆ„๊ธฐ 

ํ•œ ์„ธ๋Œ€ ๋‚˜๋ˆ„๋Š” ๊ธฐ์ค€ 30๋…„ : 2020๋…„ ๊ธฐ์ค€ 30๋…„์”ฉ ๊ตฌ๋ถ„

  • 1930๋…„๋Œ€ ์ด์ „
  • 1960๋…„๋Œ€ ์ด์ „
  • 1990๋…„๋Œ€ ์ด์ „
  • 2020๋…„ ์ด์ „
year_class_list = [ ]

for year in raw['YearOfBirth']:
    if year <= 1930: 
        year_class = '1930๋…„์ด์ „'
    elif year<= 1960: 
        year_class = '1960๋…„์ด์ „'
    elif year <= 1990:
        year_class = '1990๋…„์ด์ „'
    else:
        year_class = '2020๋…„์ด์ „'
    year_class_list.append(year_class)

์œ„์™€๊ฐ™์ด ๋ฐ˜๋ณต๋ฌธ๊ณผ if๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ถœ์ƒ๋…„๋„๋ฅผ 4๊ฐœ์˜ ์„ธ๋Œ€ ๊ทธ๋ฃน์œผ๋กœ ๋‚˜๋ˆ„์–ด ์ค€๋‹ค. 

raw['year_class'] = year_class_list
raw.head()

์„ธ๋Œ€ ๊ทธ๋ฃน์„ ์ €์žฅํ•œ ๋ฆฌ์ŠคํŠธ๋ฅผ ์ด์šฉํ•˜์—ฌ year_class ์ปฌ๋Ÿผ์„ ์ƒ์„ฑํ•œ๋‹ค. 

name_period = raw.pivot_table(index = ['Name', 'Sex'], columns = 'year_class', values = 'Number', aggfunc='sum')
name_period = name_period.fillna(0)
name_period = name_period.astype(int)
name_period.head()

์ด๋ฆ„๊ณผ ์„ฑ๋ณ„์„ ์ธ๋ฑ์Šค๋กœ ์„ค์ •ํ•˜๊ณ , year_class์— ๋”ฐ๋ฅธ number์˜ ํ•ฉ๊ณ„๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค. 

name_period['sum'] = name_period.sum(axis = 1)
name_period.head()

์ด๋ฆ„ ์ด ๊ฐœ์ˆ˜๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด sum(axis=1) ์„ ์‚ฌ์šฉํ•œ๋‹ค.  axis=1์ด๋ฉด ๊ฐ€๋กœ๋ฐฉํ–ฅ์œผ๋กœ ๊ณ„์‚ฐ์„ ํ•˜๊ฒŒ ๋œ๋‹ค. 

# ์„ธ๋Œ€ ๋ณ„ ๋น„์œจ ๊ณ„์‚ฐ 
for col in name_period.columns:
    col_new = col+"๋น„์œจ"
    name_period[col_new] = name_period[col] / name_period['sum']
    
name_period.head()

์„ธ๋Œ€ ๋ณ„ ๋น„์œจ์„ ๊ณ„์‚ฐํ•˜์—ฌ ๊ฐ ์ปฌ๋Ÿผ์„ ๋งŒ๋“ค์–ด ์ค€๋‹ค. 

# ์ด๋ฆ„ ์‚ฌ์šฉ์ˆ˜ ํ•ฉ๊ณ„, 2020๋…„ ์ด์ „ ๋น„์œจ, 1990๋…„์ด์ „ ๋น„์œจ ๊ธฐ์ค€ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ 
name_period = name_period.sort_values(by = ['sum', '2020๋…„์ด์ „๋น„์œจ','1990๋…„์ด์ „๋น„์œจ'], ascending=False)
name_period

1์ˆœ์œ„ ์ด๋ฆ„ ๊ฐœ์ˆ˜ ์ด ํ•ฉ , 2์ˆœ์œ„ 2020๋…„ ์ด์ „ ๋น„์œจ , 3์ˆœ์œ„ 1990๋…„ ์ด์ „ ๋น„์œจ ๋กœ ์ •๋ ฌ์„ ํ•˜์—ฌ ์ตœ์‹  ํŠธ๋ Œ๋“œ์— ๋งž๋Š” ๋ฏธ๊ตญ ๋Œ€ํ‘œ์ด๋ฆ„์„ ์•Œ์•„๋ณธ๋‹ค. 

# ์ธ๋ฑ์Šค๊ฐ€ ์—ฌ๋Ÿฌ ๋ ˆ๋ฒจ๋กœ ๋˜์–ด์žˆ์„ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค๋ฅผ ํ™œ์šฉํ•ด ์ปจํŠธ๋กค ํ•˜๋Š” ๊ฒƒ์€ ๋ณต์žก
# reset_index()๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ธ๋ฑ์Šค๋กœ ์„ค์ •๋œ ์ด๋ฆ„๊ณผ ์„ฑ๋ณ„์„ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝ
name_period = name_period.reset_index()
name_period.head()

์ธ๋ฑ์Šค๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ปจํŠธ๋กค์ด ์–ด๋ ค์šฐ๋ฏ€๋กœ ํ•„์š”ํ•œ ์ง‘๊ณ„, ์—ฐ์‚ฐ์ด ๋๋‚œ ๋’ค์—๋Š” reset_index๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ column์œผ๋กœ ๋ณ€๊ฒฝํ•ด ์ค€๋‹ค. 

# ๋‚จ์ž ์ด๋ฆ„๋งŒ ์„ ํƒ
cond = name_period['Sex'] =='M'
name_period[cond].head(10)

์„ฑ๋ณ„์ด ๋‚จ์„ฑ์ธ ์ด๋ฆ„๋“ค ์ค‘์—์„œ ์ƒ์œ„ 10๊ฐœ๋ฅผ ๋ฝ‘์•„๋ณด๋ฉด ์œ„์™€ ๊ฐ™๋‹ค. 

# ์ด๋ฒˆ์—๋Š” ์—ฌ์ž์ด๋ฆ„
cond = name_period['Sex'] =='F'
name_period[cond].head(10)

์ด๋ฒˆ์—” ์„ฑ๋ณ„์ด ์—ฌ์„ฑ์ธ ์ด๋ฆ„์˜ ์ƒ์œ„ 10๊ฐœ๋ฅผ ๋ฝ‘์€ ๊ฒƒ์ด๋‹ค. 

 

๊ทธ๋Ÿฐ๋ฐ ์•„์ง์€ ์ด์ƒํ•˜๋‹ค. ํŠนํžˆ ์„ฑ๋ณ„์ด ์—ฌ์ž์ธ ์ด๋ฆ„์˜ ํ‘œ์—์„œ ์ฒซ๋ฒˆ์งธ row๋Š” 1960๋…„ ์ด์ „ ์„ธ๋Œ€์—์„œ ์•ฝ 50%์˜ ๋น„์œจ์„ ์ฐจ์ง€ํ•˜๊ณ  ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์กฐ๊ฑด์„ ๊ฑด๋‹ค. 

cond_age = name_period['2020๋…„์ด์ „๋น„์œจ'] > 0.3
cond_sex = name_period['Sex'] == 'M'
cond = cond_age & cond_sex
name_period[cond].head(5)

 

2020๋…„ ์ด์ „๋น„์œจ์ด 0.3 ์ด์ƒ์ด๋ฉด์„œ ์„ฑ๋ณ„์ด ๋‚จ์„ฑ์ธ ์กฐ๊ฑด์ธ row๋ฅผ ์„ ํƒํ•œ๋‹ค. 

๊ฒฐ๊ณผ : < ๋‚จ์„ฑ Top 5 ์ด๋ฆ„ Christopher, Daniel, Matthew, Anthony, Andrew >

cond_age = name_period['2020๋…„์ด์ „๋น„์œจ'] > 0.3
cond_sex = name_period['Sex'] == 'F'
cond = cond_age & cond_sex
name_period[cond].head(5)

์—ฌ์„ฑ์˜ ๊ฒฝ์šฐ๋„ ๋™์ผํ•œ ์กฐ๊ฑด์œผ๋กœ ์ง„ํ–‰ํ•œ๋‹ค. 

๊ฒฐ๊ณผ : < ์—ฌ์„ฑ Top 5 ์ด๋ฆ„ Jessica, Sarah, Ashley, Stephanie, Emily >

 


 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/49

 

[python pandas] 2. pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (2) - ์ถ”๊ฐ€, ๋ณ‘ํ•ฉ, ์ €์žฅ

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ < ์ด์ „ ๊ธ€ > https://silvercoding.tistory.com/48 [python pandas] pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (1) ๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ * ํŒ๋‹ค์Šค ๊ธฐ๋ณธ ํ•จ์ˆ˜ ๋ฐ์ดํ„ฐ ํŒŒ์ผ ์ฝ๊ธฐ : read_excel(), read_csv() ๋ฐ์ดํ„ฐ ์„ ํƒ..

silvercoding.tistory.com

 

 

 


 1. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
import pandas as pd

pandas import ํ•ด์ฃผ๊ธฐ 

file = './data/babyNamesUS.csv'
raw = pd.read_csv(file)

์˜ค๋Š˜ ํ•™์Šตํ•  csv ํŒŒ์ผ์„ pandas๋กœ ๋ถˆ๋Ÿฌ์™€ ์ค€๋‹ค. 

raw.head()

raw.info()

1048575 ๊ฐœ์˜ row์™€ 5๊ฐœ์˜ column ์ด ์กด์žฌํ•˜๋ฉฐ, ๊ฒฐ์ธก๊ฐ’์„ ์—†๋‹ค. 

[ ์ปฌ๋Ÿผ ์ •๋ณด : ์ฃผ, ์„ฑ๋ณ„, ์ถœ์ƒ๋…„๋„, ์ด๋ฆ„, ์ด๋ฆ„ ๊ฐœ์ˆ˜ ]

 


 

 2. ์ง‘๊ณ„ํ•˜๊ธฐ ( pivot_table ) 

pd.pivot_table(index = '์ปฌ๋Ÿผ๋ช…', columns = '์ปฌ๋Ÿผ๋ช…', values = '์ปฌ๋Ÿผ๋ช…', aggfunc = 'sum')

raw.pivot_table(index = 'Name', values = 'Number', aggfunc='sum')

์ด๋ฆ„ ๋ณ„ ๋นˆ๋„์ˆ˜ ์ง‘๊ณ„ํ•ด์„œ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. columns๋ฅผ ๋”ฐ๋กœ ์„ค์ •ํ•˜์ง€ ์•Š์œผ๋ฉด values๊ฐ€ column์ด ๋œ๋‹ค. 

name_df = raw.pivot_table(index = 'Name', values = 'Number', columns = 'Sex', aggfunc='sum')
name_df.head()

์ด๋ ‡๊ฒŒ ์„ฑ๋ณ„์„ ๊ธฐ์ค€์œผ๋กœ ์ด๋ฆ„์˜ ๋นˆ๋„์ˆ˜๋ฅผ ์ง‘๊ณ„ํ•  ์ˆ˜ ์žˆ๋‹ค. 

name_df.info()

์œ„์˜ ํ”ผ๋ฒ—ํ…Œ์ด๋ธ” ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ info๋ฅผ ๋ณด๋ฉด F, M ๋ชจ๋‘ ๊ฒฐ์ธก๊ฐ’์ด ๊ฝค ์žˆ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 3. ๊ฒฐ์ธก๊ฐ’ ์ฑ„์šฐ๊ธฐ 
  • ๊ณตํ†ต๋œ ๊ฐ’์„ ์ž…๋ ฅ(ex 0)
  • ์ž„์˜์˜ ์ˆ˜๋ฅผ ์ž…๋ ฅ(ex ํ‰๊ท , ์ตœ๋Œ€๊ฐ’, ์ตœ์†Œ๊ฐ’, ๋น„์–ด์žˆ๋Š” ์ž๋ฆฌ ์ฃผ๋ณ€์˜ ๊ฐ’ ๋“ฑ)
  • ๋น„์–ด์žˆ๋Š” ๋ฐ์ดํ„ฐ๋Š” ๋ถ„์„์—์„œ ์ œ์™ธ

- fillna()

name_df = name_df.fillna(0)
name_df.head()

์ด ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ ๊ฐœ์ˆ˜๊ฐ€ ์ฑ„์›Œ์ ธ ์žˆ์ง€ ์•Š์€ ๊ฒƒ์€ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค๊ณ  ๊ฐ€์ •ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ 0์œผ๋กœ ๋ชจ๋“  ๊ฒฐ์ธก๊ฐ’์„ ์ฑ„์šด๋‹ค. 

name_df.info()

๊ฒฐ์ธก๊ฐ’์ด ๋ชจ๋‘ ์ฑ„์›Œ์ง„ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค 

 

 

 

 

 4. ์ •๋ ฌํ•˜๊ธฐ : sort_values(by='์ปฌ๋Ÿผ๋ช…', ascending=True)

- ๋‚จ์ž, ์—ฌ์ž ๊ฐ๊ฐ ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ์ด๋ฆ„ ์•Œ์•„๋ณด๊ธฐ 

name_df.sort_values(by = 'M')

๋‚จ์„ฑ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•˜์˜€๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ๊ธฐ๋ณธ์ด ascending=True ์ด๋ฏ€๋กœ ์˜ค๋ฆ„์ฐจ์ˆœ์œผ๋กœ ๋˜์–ด์žˆ๋‹ค. ์ƒ์œ„ 5๊ฐœ๋ฅผ ์•Œ์•„๋ณผ ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ๋ฐ”๊พธ์–ด ์ค€๋‹ค. 

name_df.sort_values(by = 'M', ascending = False)

name_df.sort_values(by = 'M', ascending = False).head().index

Index(['Michael', 'James', 'Robert', 'John', 'David'], dtype='object', name='Name')

index๋งŒ ์ถ”์ถœํ•ด์„œ ์ƒ์œ„ 5๊ฐœ์˜ ์ด๋ฆ„๋งŒ ๋ฝ‘์€ ๊ฒƒ์ด๋‹ค. 

name_df.sort_values(by = 'F', ascending = False).head().index

Index(['Mary', 'Jennifer', 'Elizabeth', 'Patricia', 'Linda'], dtype='object', name='Name')

์—ฌ์„ฑ์˜ ์ƒ์œ„ 5๊ฐœ ์ด๋ฆ„๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋ฝ‘์•„์ค€๋‹ค. 

 

 

 

 

 5. ์ปฌ๋Ÿผ๋ณ„ ๋ฐ์ดํ„ฐ ์ข…๋ฅ˜ ํ™•์ธํ•˜๊ธฐ 

- unique : ์ข…๋ฅ˜ ์•Œ์•„๋ณด๊ธฐ 

raw['StateCode'].unique()

- value_counts() : ์ข…๋ฅ˜ + ๊ฐœ์ˆ˜

raw['StateCode'].value_counts()

raw['YearOfBirth'].value_counts()

2007๋…„์— ๊ธฐ๋ก๋œ ์ด๋ฆ„์ด ๊ฐ€์žฅ ๋งŽ๋‹ค! 


 

 

 

 

 

 

 

 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ 

 

 

< ์ด์ „ ๊ธ€ > 

https://silvercoding.tistory.com/48

 

[python pandas] pandas ๊ธฐ์ดˆ ์‚ฌ์šฉ (1)

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ * ํŒ๋‹ค์Šค ๊ธฐ๋ณธ ํ•จ์ˆ˜ ๋ฐ์ดํ„ฐ ํŒŒ์ผ ์ฝ๊ธฐ : read_excel(), read_csv() ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ : df.loc(), df.iloc() ์ธ๋ฑ์Šค/ ์ปฌ๋Ÿผ ๋ณ€๊ฒฝํ•˜๊ธฐ : columns/ index , reset_index()  pandas vs excel panda..

silvercoding.tistory.com

 

 


 

 1. pandas ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 
import pandas as pd

 

 2. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ์‚ดํŽด๋ณด๊ธฐ 
fpath = './data/exam.xlsx' 
data = pd.read_excel(fpath, index_col = '๋ฒˆํ˜ธ')

index_col='๋ฒˆํ˜ธ' ๋กœ ์ง€์ •ํ•˜์—ฌ ์—‘์…€ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

 

 

* head(), info(), describe() ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๋Š” ์Šต๊ด€ ๊ฐ–๊ธฐ 

data.head()

data.info()

 

data.describe()


 

 

 3. ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€ํ•˜๊ธฐ 

df[ '์ปฌ๋Ÿผ๋ช…' ] =  data ( df.์ปฌ๋Ÿผ๋ช… = data ํ˜•ํƒœ๋Š” ์‚ฌ์šฉ ๋ถˆ๊ฐ€๋Šฅ )  

- ํ•˜๋‚˜์˜ ๊ฐ’ ์ถ”๊ฐ€ : ์ „์ฒด ๋ชจ๋‘ ๋™์ผํ•œ ๊ฐ’์œผ๋กœ ์ถ”๊ฐ€๋จ 

- ๊ทธ๋ฃน ์ถ”๊ฐ€ : ๋ฆฌ์ŠคํŠธ, ํŒ๋‹ค์Šค์˜ ์‹œ๋ฆฌ์ฆˆ๋กœ ์ถ”๊ฐ€ 

 

data['์ˆ˜ํ•™']
data.์ˆ˜ํ•™

๋ฐ์ดํ„ฐ๋ฅผ ์„ ํƒํ•  ๋• ์œ„์™€ ๊ฐ™์€ ๋‘๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ž‘์„ฑํ•ด์ฃผ์—ˆ๋‹ค. 

 

- ํ•œ๊ฐœ ๊ฐ’ ์ถ”๊ฐ€ 

data['์Œ์•…'] = 90             
data.head()

๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•  ๋• data.์Œ์•… ์˜ ํ˜•ํƒœ๋กœ๋Š” ๋ถˆ๊ฐ€๋Šฅ ํ•˜๋‹ค. ํ•œ๊ฐœ์˜ ๊ฐ’์„ ์ถ”๊ฐ€ํ•˜๋ฉด ๋ชจ๋“  row์— ๊ฐ™์€ ๊ฐ’์ด ๋“ค์–ด๊ฐ€๊ฒŒ ๋œ๋‹ค. 

 

- ์—ฌ๋Ÿฌ ๊ฐ’ ์ถ”๊ฐ€ 

data['์ฒด์œก'] =  [100, 80, 60]
data.head()

๋ฆฌ์ŠคํŠธ๋กœ ์—ฌ๋Ÿฌ ๊ฐ’์„ ์ถ”๊ฐ€ํ•ด ์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. ์ด ๋•Œ ์ฃผ์˜ํ•  ์ ์€ ๋ฆฌ์ŠคํŠธ ์›์†Œ ๊ฐœ์ˆ˜์™€ row๊ฐœ์ˆ˜๊ฐ€ ๊ฐ™์•„์•ผ ํ•œ๋‹ค. 

 

data['๊ตญ์˜์ˆ˜'] =  (data['๊ตญ์–ด'] + data['์˜์–ด'] + data['์ˆ˜ํ•™'] ) / 3
data.head()

์ด๋ ‡๊ฒŒ ์ปฌ๋Ÿผ ๊ฐ„์˜ ์—ฐ์‚ฐ์„ ํ†ตํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ๋งŒ๋“ค์–ด ์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

 

 4. ๋ฐ์ดํ„ฐ ํ‘œ ๋ณ‘ํ•ฉํ•˜๊ธฐ 
fpath = './data/exam.xlsx'
A = pd.read_excel(fpath, index_col = '๋ฒˆํ˜ธ')
A.head()

ํŒŒ์ผ์„ ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์™€์„œ A ๋ณ€์ˆ˜์— ์ €์žฅํ•ด ์ค€๋‹ค. 

fpath2 = './data/exam_extra.xlsx'
B = pd.read_excel(fpath2, index_col = '๋ฒˆํ˜ธ')
B.head()

์ถ”๊ฐ€ ํ•  ์—‘์…€ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์™€ B ๋ณ€์ˆ˜์— ์ €์žฅํ•ด ์ค€๋‹ค.

 

 

- merge()

์ถœ์ฒ˜ - ๋Ÿฌ๋‹์Šคํ‘ผ์ฆˆ

๋ณ‘ํ•ฉ ๊ธฐ์ค€์„ ์ธ์ž์— ๋„ฃ์–ด ์„ค์ •ํ•ด์ค„ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋•Œ, left_on ๊ณผ left_index ์ค‘ 1๊ฐœright_on ๊ณผ right_index ์ค‘ 1๊ฐœ๋ฅผ ์จ์•ผ ํ•˜๊ณ , ๋‘๊ฐ€์ง€๋ฅผ ํ•œ๋ฒˆ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์—†๋‹ค

total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total.head()

left์ผ ๊ฒฝ์šฐ A๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ๋ณ‘์ด ๋œ๋‹ค. 4๋ฒˆ, 5๋ฒˆ์€ ๋‚˜์˜ค์ง€ ์•Š๊ณ , B์˜ 3๋ฒˆ์€ NaN์œผ๋กœ ์ฑ„์›Œ์ง„๋‹ค. 

pd.merge(A, B, how = 'right', left_index = True, right_index = True)

์œ„์™€ ๊ฐ™์ด ์ž‘์„ฑ๋˜์—ˆ์„ ๋•Œ , B์— ๋งž์ถ”์–ด ํ•ฉ๋ณ‘๋œ๋‹ค. ๋”ฐ๋ผ์„œ 3๋ฒˆ์€ ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

pd.merge(A, B, how = 'inner', left_index = True, right_index = True)

inner๋ฅผ ์‚ฌ์šฉํ•˜์˜€์„ ๊ฒฝ์šฐ , A ์™€ B ๋ชจ๋‘ ์กด์žฌํ•˜๋Š” ์ธ๋ฑ์Šค์˜๋งŒ ํ•ฉ๋ณ‘ํ•ด์ค€๋‹ค. 

pd.merge(A, B, how = 'outer', left_index= True, right_index=True)

outer๋ฅผ ์‚ฌ์šฉํ•˜์˜€์„ ๊ฒฝ์šฐ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ๋ณ‘ํ•ด ์ค€๋‹ค. 

 

 

 

 5 . ์ €์žฅํ•˜๊ธฐ 
total = pd.merge(A, B, how = 'left', left_index = True, right_index = True)
total

์ตœ์ข… ๋ชจ๋ธ์€ left, A๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ๋ณ‘ํ•œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ total ๋ณ€์ˆ˜๋ฅผ ์„ ์–ธํ•˜๊ณ  , ์ €์žฅ์„ ํ•ด๋ณด์ž ! 

total.to_excel('./data/exam_total.xlsx')

total.to_excel('./data/exam_total_withoutindex.xlsx', index = False)

 

index = False ์ธ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ '๋ฒˆํ˜ธ' ์ปฌ๋Ÿผ์„ ์ œ์™ธํ•˜๊ณ  ์ €์žฅํ•  ์ˆ˜ ์žˆ๋‹ค. 

 


 

 

 

๋Ÿฌ๋‹์Šคํ‘ผ ์ˆ˜์—… ์ •๋ฆฌ 

 

* ํŒ๋‹ค์Šค ๊ธฐ๋ณธ ํ•จ์ˆ˜ 

๋ฐ์ดํ„ฐ ํŒŒ์ผ ์ฝ๊ธฐ : read_excel(), read_csv()

๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ : df.loc(), df.iloc()

์ธ๋ฑ์Šค/ ์ปฌ๋Ÿผ ๋ณ€๊ฒฝํ•˜๊ธฐ : columns/ index , reset_index() 

 

 

 


 

 pandas vs excel 

pandas : ๊ฐ€๋ณ๊ณ  ๋นจ๋ผ์„œ ๋Œ€์šฉ๋Ÿ‰ ํŒŒ์ผ ์ž‘์—…์„ ์ž์œ ๋กญ๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋‹ค. 

excel : ๋ชจ๋“  ๋ฐ์ดํ„ฐ๊ฐ€ ๋ˆˆ์— ๋ณด์ธ๋‹ค. (๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์•„ ์ง์ ‘ ๋ณด๊ธฐ ์–ด๋ ค์šธ ์ˆ˜ ์žˆ๋‹ค. ) 

 

 

 pandas ๊ตฌ์กฐ 
  • DataFrame : ํ‘œ ํ˜•ํƒœ 

- index : DB์˜ key ๊ฐœ๋… , ์—‘์…€์—์„œ๋Š” ๋ณดํ†ต ์ฒซ ๋ฒˆ์งธ ์—ด์— ๋ฐฐ์น˜ํ•˜๋Š” ๋ฐ์ดํ„ฐ (vlookup ๋“ฑ์— ํ™œ์šฉ) 

- columns : ํ•˜๋‚˜์˜ ์†์„ฑ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ์˜ ์ง‘ํ•ฉ -> index + column ํ•˜๋‚˜๋กœ ๋‚˜๋ˆ„์–ด ์‚ดํŽด ๋ณผ ์ˆ˜ ์žˆ์Œ 

  • Series : ํ•˜๋‚˜์˜ ์†์„ฑ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ ( DataFrame ์—์„œ ํ•˜๋‚˜์˜ ์—ด ๋ฐ์ดํ„ฐ ) 
  •  

 

 1. Pandas ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

- pandas ์„ค์น˜ 

!pip install pandas

- pandas ๋ถˆ๋Ÿฌ์˜ค๊ธฐ 

import pandas as pd

 

 

 2. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ & ๋ฐ์ดํ„ฐ ์‚ดํŽด๋ณด๊ธฐ 

* ํŒŒ์ผ์˜ ๊ฒฝ๋กœ 

- ์ ˆ๋Œ€๊ฒฝ๋กœ : "c:ํด๋”1/ํด๋”2/.../ํŒŒ์ผ๋ช….ํ™•์žฅ์ž" 

- ์ƒ๋Œ€๊ฒฝ๋กœ : "./ํด๋”3/.../ํŒŒ์ผ๋ช….ํ™•์žฅ์ž" , "../ํด๋”4/.../ํŒŒ์ผ๋ช….ํ™•์žฅ์ž" (์ฅฌํ”ผํ„ฐ ๋…ธํŠธ๋ถ ํŒŒ์ผ ์œ„์น˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ง€์ •) 

- ./ : ํ˜„์žฌ ์œ„์น˜ ../ : ๋ถ€๋ชจ ํด๋”

 

* ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ๋“ค์ธ ๋’ค์—๋Š” head(), info(), descrive() ๋ช…๋ น์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์‚ดํŽด๋ณด๋Š” ์Šต๊ด€ ๊ฐ–๊ธฐ 

 

temp = pd.read_excel('./data/exam.xlsx')
temp

temp.head(2)

head ์ธ์ž์— ๊ฐœ์ˆ˜๋ฅผ ์ง€์ •ํ•ด ์ค„ ์ˆ˜ ์žˆ๋‹ค. 

temp.tail()

 

- info () : ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ ์ธ๋ฑ์Šค, ์ปฌ๋Ÿผ์˜ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜์™€ ์ข…๋ฅ˜ ํ™•์ธ 

temp.info()

- describe() : ์ˆ˜์น˜ํ˜• ๋ฐ์ดํ„ฐ (inf, float) ๊ฐ€ ๋“ค์–ด์žˆ๋Š” ์ปฌ๋Ÿผ์˜ ๊ธฐ์ดˆํ†ต๊ณ„๋Ÿ‰ (๊ฐœ์ˆ˜,ํ‰๊ท ,ํ‘œ์ค€ํŽธ์ฐจ,์‚ฌ๋ถ„์œ„ ๋“ฑ) ํ™•์ธ

temp.describe()

 

 

 

 2-1 ์ธ๋ฑ์Šค ์ง€์ • 

- set_index() : ์ธ๋ฑ์Šค ์ปฌ๋Ÿผ ์ง€์ •ํ•˜๊ธฐ (์ปฌ๋Ÿผ -> ์ธ๋ฑ์Šค) 

data = temp.set_index('๋ฒˆํ˜ธ')
data.head()

set_index๋ฅผ ์ด์šฉํ•˜์—ฌ '๋ฒˆํ˜ธ' ์ปฌ๋Ÿผ์„ ์ธ๋ฑ์Šค๋กœ ์ง€์ •ํ•ด ์ฃผ์—ˆ๋‹ค. 

 

 

- index_col : ์—‘์…€ ํŒŒ์ผ ์ฝ์–ด์˜ฌ ๋•Œ ์ธ๋ฑ์Šค ์ง€์ • 

temp2 = pd.read_excel('./data/exam.xlsx', index_col = 0) # index_col = '๋ฒˆํ˜ธ' (์ปฌ๋Ÿผ๋ช… ํ™œ์šฉ)
temp2.head()

 

 

 

 3. ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ 

- ์…€ ์„ ํƒํ•˜๊ธฐ (1๊ฐœ)

df.iloc[row, column] : ์ธ๋ฑ์Šค ๋ฒˆํ˜ธ

df.lic[row, column] : ์ด๋ฆ„ 

data

data.iloc[1, 2]

55

data.loc['1๋ฒˆ','์ˆ˜ํ•™']

75

print(data.loc['3๋ฒˆ','์˜์–ด'])
print(data.iloc[2, 1])

100

100

print(data.loc['1๋ฒˆ', '๊ตญ์–ด'])
print(data.iloc[0, 0])

70

70

 

 

- ์…€ ์„ ํƒํ•˜๊ธฐ (๋ณต์ˆ˜) 

: ๋ฆฌ์ŠคํŠธ ( [์กฐ๊ฑด1, ์กฐ๊ฑด2, ... ์กฐ๊ฑดn] ) ํ˜น์€ ์‹œ์ž‘:์ข…๋ฃŒ ํ˜•ํƒœ๋กœ ๋ฒ”์œ„ ์ง€์ • 

data.loc['1๋ฒˆ', ['๊ตญ์–ด', '์˜์–ด']]

๊ตญ์–ด 70

์˜์–ด 80

Name: 1๋ฒˆ, dtype: int64

data.loc[ ['1๋ฒˆ','2๋ฒˆ'] , '์ˆ˜ํ•™']

๋ฒˆํ˜ธ

1๋ฒˆ 75

2๋ฒˆ 55

Name: ์ˆ˜ํ•™, dtype: int64

 

data.loc['1๋ฒˆ', '์˜์–ด': ]

์˜์–ด 80

์ˆ˜ํ•™ 75

Name: 1๋ฒˆ, dtype: int64

 

 

- ์ปฌ๋Ÿผ ์„ ํƒํ•˜๊ธฐ (1๊ฐœ)

: data.์ปฌ๋Ÿผ๋ช… or data.['์ปฌ๋Ÿผ๋ช…']

data.loc[ : , '์ˆ˜ํ•™']

data['์ˆ˜ํ•™']

data['์˜์–ด']

 

 

- ์ปฌ๋Ÿผ ์„ ํƒํ•˜๊ธฐ (๋ณต์ˆ˜)

: ์›ํ•˜๋Š” ์ˆœ์„œ๋Œ€๋กœ ์„ ํƒ ๊ฐ€๋Šฅ

data[ ['์ˆ˜ํ•™','์˜์–ด'] ]

data[  ['์ˆ˜ํ•™','์˜์–ด','๊ตญ์–ด']  ]

์›๋ž˜๋Š” ๊ตญ์–ด ์˜์–ด ์ˆ˜ํ•™ ์ˆœ์„œ์˜€๋Š”๋ฐ ์œ„์™€๊ฐ™์ด ์ˆœ์„œ๋ฅผ ๋‹ฌ๋ฆฌ ํ•˜์—ฌ ์ถœ๋ ฅํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ํŠน์ • ์กฐ๊ฑด ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ (ํ•œ๊ฐœ)

pd[condition] : True์ธ ๋ฐ์ดํ„ฐ๋งŒ ์ถœ๋ ฅ 

-> condition : True / False ๋กœ ๊ตฌ์„ฑ๋œ ๋ฆฌ์ŠคํŠธ or ์‹œ๋ฆฌ์ฆˆ 

data

cond = data['์ˆ˜ํ•™'] < 80
cond

์ด๋ ‡๊ฒŒ ์ˆ˜ํ•™ ์ปฌ๋Ÿผ์— ๋Œ€ํ•ด ์กฐ๊ฑด์„ ์ƒ์„ฑํ•˜๋ฉด boolํƒ€์ž…์„ ๋ฐ˜ํ™˜ํ•ด ์ค€๋‹ค. 

data[ cond ]

์œ„์˜ ์กฐ๊ฑด์„ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์— ์ ์šฉํ•˜๋ฉด True์ธ row๋“ค๋งŒ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. 

cond = [True, False, True]    # data['์˜์–ด'] >  80
data[cond]

๋ฆฌ์ŠคํŠธ์— ์ง์ ‘ boolํƒ€์ž…์„ ๋„ฃ์–ด ๋ฝ‘์•„์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. ์ด ๋•Œ ๋ฆฌ์ŠคํŠธ์˜ ๊ฐœ์ˆ˜์™€ row์˜ ๊ฐœ์ˆ˜๋Š” ๊ฐ™์•„์•ผ ํ•œ๋‹ค. 

 

 

- ํŠน์ • ์กฐ๊ฑด ๋ฐ์ดํ„ฐ ์„ ํƒํ•˜๊ธฐ (์—ฌ๋Ÿฌ๊ฐœ์˜ ์กฐ๊ฑด)

& : and , ๋ชจ๋“  ์กฐ๊ฑด ๋งŒ์กฑ True

| : or , ํ•œ ๊ฐœ๋ผ๋„ ๋งŒ์กฑํ•˜๋ฉด True

cond3 = (data['์˜์–ด'] > 80)
cond4 = (data['์ˆ˜ํ•™'] > 80)

data[ cond3 | cond4]

cond3 = (data['์˜์–ด'] > 80)
cond4 = (data['์ˆ˜ํ•™'] > 80)

cond = cond3 & cond4
data[ cond ]

cond = (data['์˜์–ด'] >= 70)  & (data['์ˆ˜ํ•™'] >= 70)  & (data['์ˆ˜ํ•™'] < 90) 
data[ cond ]

cond = (data['์˜์–ด'] >= 70) \
    & (data['์ˆ˜ํ•™'] >= 70) \
     & (data['์ˆ˜ํ•™'] < 90) 

data[ cond ]

์ค„์„ ๋ฐ”๊ฟ€ ๋• \(์—ญ์Šฌ๋ž˜์‰ฌ) ๋ฅผ ์‚ฌ์šฉํ•ด ์ค€๋‹ค. ๊ฐ€๋…์„ฑ์ด ์ข‹์•„์ง„๋‹ค. 

cond_first  =  ( data['๊ตญ์–ด']  > 80)
cond_second =  ( data['์˜์–ด']  > 80)

cond = cond_first   &   cond_second
data[cond]

 

cond_first  =  ( data['๊ตญ์–ด'] > 80 )
cond_second = ( data['์˜์–ด'] > 80 )


cond = cond_first     |   cond_second

data[cond]

 

 

 

 index & column 
data.index

Index(['1๋ฒˆ', '2๋ฒˆ', '3๋ฒˆ'], dtype='object', name='๋ฒˆํ˜ธ')

 

data.index = ['๊ฐ€๋ฐ˜', '๋‚˜๋ฐ˜', '๋‹ค๋ฐ˜']

์ธ๋ฑ์Šค๋ฅผ ๋ฆฌ์ŠคํŠธ๋กœ ์„ค์ •ํ•ด ์ค„ ์ˆ˜ ์žˆ๋‹ค. 

data

์„ค์ •ํ•œ ๋Œ€๋กœ ๋ฐ”๋€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

data.columns

Index(['๊ตญ์–ด', '์˜์–ด', '์ˆ˜ํ•™'], dtype='object')

data.columns = ['Korean','English', 'Math']

๋™์ผํ•˜๊ฒŒ ์ปฌ๋Ÿผ๋„ ๋ฐ”๊ฟ”์ค„ ์ˆ˜ ์žˆ๋‹ค. 

data

data.reset_index()

* reset_index : drop=False๊ฐ€ ๊ธฐ๋ณธ ๊ฐ’ ( ํ˜„์žฌ ์ธ๋ฑ์Šค๋ฅผ ์ปฌ๋Ÿผ์œผ๋กœ ์˜ฎ๊ฒจ ์ฃผ๊ณ  ์ธ๋ฑ์Šค๋ฅผ ๋ฆฌ์…‹) , 

drop = True ( ํ˜„์žฌ ์ธ๋ฑ์Šค์— ์žˆ๋Š” ๊ฐ’์„ ์ปฌ๋Ÿผ์œผ๋กœ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๊ณ  ์ธ๋ฑ์Šค ์ดˆ๊ธฐํ™” ) 

 


 

 

 

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ)

 

 

<์ด์ „ ๊ธ€>

https://silvercoding.tistory.com/46

 

[python ๊ธฐ์ดˆ] 14. SQLITE ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์—ฐ๋™ (2) - ํ…Œ์ด๋ธ” ์กฐํšŒ

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ) <์ด์ „ ๊ธ€> https://silvercoding.tistory.com/45 https://silvercoding.tistory.com/44 https://silvercoding.tistory.com/43 https://silvercoding.tistory.com/42..

silvercoding.tistory.com

 

- sqlite3 import 

import sqlite3

 

 

- DB ์—ฐ๊ฒฐ (์ƒ์„ฑ) / (ํŒŒ์ผ) 

conn = sqlite3.connect('./resource/database.db')

 

 

- Cursor ์—ฐ๊ฒฐ 

c = conn.cursor()

 

 

- ์ปค๋ฐ‹

conn.commit()

์ด๋ฒˆ์—” ์˜คํ† ์ปค๋ฐ‹ ์„ค์ •์„ ํ•˜์ง€ ์•Š์„ ๊ฒƒ์ด๋ฏ€๋กœ ์ฝ”๋“œ์— ๊ฐ€์žฅ ๋งˆ์ง€๋ง‰์—๋Š” ์ปค๋ฐ‹์„ ์„ ์–ธํ•ด ๋†“์•„์•ผ ํ•œ๋‹ค. ์ด ์œ„์— ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•˜์ž. 

 

 

- ๋ฐ์ดํ„ฐ ์ˆ˜์ • 

(1) ๋ฐฉ๋ฒ• 1 : ํŠœํ”Œ 

c.execute("UPDATE users SET username = ? WHERE id = ?", ('niceman', 2))

id = 2 ์ธ username์„ niceman์œผ๋กœ ๋ฐ”๊พธ๋ผ๋Š” ๋œป์˜ ์ฝ”๋“œ์ด๋‹ค.

 

 

(2) ๋ฐฉ๋ฒ• 2 : ๋”•์…”๋„ˆ๋ฆฌ 

c.execute("UPDATE users SET username = :name WHERE id = :id", {'name': 'goodgirl', 'id': 5})

๋”•์…”๋„ˆ๋ฆฌ๋กœ key๋ฅผ ์—ฐ๊ฒฐํ•˜์—ฌ ์ˆ˜์ •ํ•  ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

(3) ๋ฐฉ๋ฒ• 3 : format

c.execute("UPDATE users SET username = '%s' WHERE id ='%s'" % ('badboy', 3))

% ๋ฅผ ์ด์šฉํ•˜์—ฌ ์—ฐ๊ฒฐํ•ด ์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

* ์ค‘๊ฐ„ ๋ฐ์ดํ„ฐ ํ™•์ธ 1

for user in c.execute("SELECT * FROM users"):
    print(user)

 (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23') 
 (2, 'niceman', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23') 
 (3, 'badboy', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23') 
 (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23') 
 (5, 'goodgirl', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23') 

์ด๋ ‡๊ฒŒ ๋ฐ์ดํ„ฐ ์ˆœํšŒ๋ฅผ ํ†ตํ•ด ๋ฐ”๋€ ๊ฒƒ์„ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

- ๋ฐ์ดํ„ฐ ์‚ญ์ œ 

(1) Row Delete 

# ๋ฐฉ๋ฒ• 1
c.execute("DELETE FROM users WHERE id = ?", (2,))

# ๋ฐฉ๋ฒ• 2 
c.execute("DELETE FROM users WHERE id = :id", {'id' : 5})

# ๋ฐฉ๋ฒ• 3 
c.execute("DELETE FROM users WHERE id = '%s'" % 4)

์‚ญ์ œ๋„ ๊ฐ™์€ 3๊ฐ€์ง€์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ์›ํ•˜๋Š” ํ–‰์„ ์„ ํƒํ•˜์—ฌ ์‚ญ์ œํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

* ์ค‘๊ฐ„ ๋ฐ์ดํ„ฐ ํ™•์ธ 2 

for user in c.execute("SELECT * FROM users"):
    print(user)

 (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23')    
 (3, 'badboy', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23')   

 

 

(2) ํ…Œ์ด๋ธ” ์ „์ฒด ๋ฐ์ดํ„ฐ ์‚ญ์ œ 

print("users db deleted : ", conn.execute('DELETE FROM users').rowcount, " rows")

 users db deleted :  2  rows 

๋ชจ๋‘ ์‚ญ์ œ๋˜์—ˆ๋‹ค!

 

 

- ์ปค๋ฐ‹ & ์ ‘์† ํ•ด์ œ 

# ์ปค๋ฐ‹
conn.commit()

# ์ ‘์† ํ•ด์ œ 
conn.close()

์ปค๋ฐ‹์€ ๊ณ„์† ์„ ์–ธ์ด ๋˜์–ด ์žˆ์–ด์•ผ ํ•˜๊ณ , with๋ฌธ์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด ์ ‘์†ํ•ด์ œ๋ฅผ ๊ผญ ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค. 

 

 

 

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ)

 

 

<์ด์ „ ๊ธ€>

https://silvercoding.tistory.com/45

 

[python ๊ธฐ์ดˆ] 13. SQLITE ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์—ฐ๋™ (1) - ํ…Œ์ด๋ธ” ์ƒ์„ฑ ๋ฐ ์‚ฝ์ž… ์‚ญ์ œ

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ) <์ด์ „ ๊ธ€> https://silvercoding.tistory.com/44 https://silvercoding.tistory.com/43 https://silvercoding.tistory.com/42 https://silvercoding.tistory.com/41..

silvercoding.tistory.com

 

* sqlite3 import 

import sqlite3

 

 

* DB ํŒŒ์ผ ์กฐํšŒ (์—†์œผ๋ฉด ์ƒˆ๋กœ ์ƒ์„ฑ) 

conn = sqlite3.connect('./resource/database.db') # ๋ณธ์ธ DB ๊ฒฝ๋กœ

 

 

* ์ปค์„œ ๋ฐ”์ธ๋”ฉ

c = conn.cursor()

 

 

* ํ…Œ์ด๋ธ” ์กฐํšŒ 

(1) ์ „์ฒด ๋ฐ์ดํ„ฐ ์กฐํšŒ 

c.execute('SELECT * FROM users')

์ด๋ฅผ ์‹คํ–‰ํ•˜๋ฉด ์•„๋ฌด์ผ๋„ ์ผ์–ด๋‚˜์ง€ ์•Š๋Š”๋‹ค. ์ด์ œ ์ด ์ƒํƒœ์—์„œ ์ปค์„œ ์œ„์น˜๋ฅผ ๋ณ€๊ฒฝํ•ด๊ฐ€๋ฉฐ ๋ฐ์ดํ„ฐ๋ฅผ ์กฐํšŒํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

(2) ์ปค์„œ ์œ„์น˜ ๋ณ€๊ฒฝ 

- 1๊ฐœ ๋กœ์šฐ ์„ ํƒ 

print('One -> \n', c.fetchone())

 One -> (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23') 

fetchone()์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ํ•œ๊ฐœ๋ฅผ ์„ ํƒํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 

- ์ง€์ • ๋กœ์šฐ ์„ ํƒ 

print('Three -> \n', c.fetchmany(size=3))

 Three -> [(2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23'), 
(3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23'), (4, 
'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23')] 

fetchmany()์˜ ์ธ์ž์— size ๊ฐœ์ˆ˜๋ฅผ ์ •ํ•ด์ฃผ๋ฉด ์ปค์„œ์˜ ํ˜„์žฌ ์œ„์น˜๋ถ€ํ„ฐ size๊ฐœ์ˆ˜ ๋งŒํผ์„ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ์ „์ฒด ๋กœ์šฐ ์„ ํƒ 

print('All -> \n', c.fetchall())

 All -> [(5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23')]  

fetchall() ๋กœ ํ˜„์žฌ ์ปค์„œ์˜ ์œ„์น˜ ๋ถ€ํ„ฐ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

print('All -> \n', c.fetchall())

 All -> [] 

5๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋‘ ์กฐํšŒํ–ˆ์œผ๋ฏ€๋กœ ๋‚จ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๋‹ค! 

 

 

 

์ปค์„œ์˜ ์œ„์น˜๊ฐ€ ๋๋‚˜๋ฉด ์‹คํ–‰์ด ์•ˆ๋˜๋ฏ€๋กœ ์ฃผ์„์ฒ˜๋ฆฌ ํ•ด๊ฐ€๋ฉฐ ์‹คํ–‰ ํ•œ๋‹ค. 

(3) ์ˆœํšŒ 

- ์ˆœํšŒ 1 

rows = c.fetchall()
for row in rows: 
    print('retrieve1 > ', row)

 retrieve1 >  (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23') 
 retrieve1 >  (2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23') 
 retrieve1 >  (3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23') 
 retrieve1 >  (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23') 
 retrieve1 >  (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23') 

fetchall()์„ ์ด์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜๋ชฉ๋ฌธ์œผ๋กœ ์ˆœํšŒํ•ด ์ค„ ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ์ˆœํšŒ 2 

for row in c.fetchall():
    print('retrieve2 > ', row)

 retrieve2 >  (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23') 
 retrieve2 >  (2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23') 
 retrieve2 >  (3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23') 
 retrieve2 >  (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23') 
 retrieve2 >  (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23') 

์ˆœํšŒ 1 ๊ณผ ๊ฐ™์ง€๋งŒ ๋ณ€์ˆ˜๋ฅผ ๋งŒ๋“ค์–ด ์ฃผ์ง€ ์•Š์•˜์„ ๋ฟ์ด๋‹ค. 

 

 

- ์ˆœํšŒ 3 : ๊ฐ€๋…์„ฑ์ด ๋–จ์–ด์งˆ ์ˆ˜ ์žˆ์Œ 

for row in c.execute('SELECT * FROM users ORDER BY id desc'):
    print('retrieve3 > ', row)

 retrieve3 >  (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23') 
 retrieve3 >  (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23') 
 retrieve3 >  (3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23') 
 retrieve3 >  (2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23') 
 retrieve3 >  (1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', '2021-08-03 13:28:23') 

fetchall()์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  execute๋กœ ๋ฐ”๋กœ ์ˆœํšŒํ•  ์ˆ˜๋„ ์žˆ๋‹ค. ์ด๋ฒˆ์—” ORDER BY ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•˜์—ฌ ์ˆœํšŒํ–ˆ๋‹ค. 

 

 

 

(4) WHERE Retrieve 

- ๋ฐฉ๋ฒ• 1 : ํŠœํ”Œ

param1 = (3,)
c.execute('SELECT * FROM users WHERE id=?', param1)
print('param1', c.fetchone())
print('param1', c.fetchall()) # ๋ฐ์ดํ„ฐ ์—†์Œ

 param1 (3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23') 
 param1 [] 

WHERE์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ ธ์˜ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์„ ์ •ํ•  ์ˆ˜ ์žˆ๋‹ค. execute์˜ ์ธ์ž์— ํŠœํ”Œ๋กœ id๊ฐ’์„ ๋„ฃ์–ด์ฃผ๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. 

 

 

- ๋ฐฉ๋ฒ• 2 : format 

param2 = 4
c.execute('SELECT * FROM users WHERE id="%s"' % param2)  # %s, %f, %d
print('param2', c.fetchone())
print('param2', c.fetchall()) # ๋ฐ์ดํ„ฐ ์—†์Œ

 param2 (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23')

 param2 [] 

% ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ id๊ฐ’์„ ๋„ฃ์–ด์ค„ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

- ๋ฐฉ๋ฒ• 3 : dictionary 

c.execute('SELECT * FROM users WHERE id=:Id', {"Id":5})
print('param3', c.fetchone())
print('param3', c.fetchall()) # ๋ฐ์ดํ„ฐ ์—†์Œ

 param3 (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23') 
 param3 [] 

execute์˜ ์ธ์ž์— dictionary๋ฅผ ์ž‘์„ฑํ•˜๊ณ , key๊ฐ’์„ ์ผ์น˜์ง€์ผœ ์ค€๋‹ค. 

 

 

- ๋ฐฉ๋ฒ• 4 : IN & tuple

param4 = (3, 5)
c.execute('SELECT * FROM users WHERE id IN(?,?)', param4)
print('param4', c.fetchall())

 param4 [(3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08- 03 13:28:23'), (5, 'Yoo', 'Yoo@google.com',  '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23')] 

IN ๊ณผ ํŠœํ”Œ์„ ์‚ฌ์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ๋ฐฉ๋ฒ• 5 : IN & format

c.execute('SELECT * FROM users WHERE id IN("%d", "%d")' % (3, 4))
print('param5', c.fetchall())

 param5 [(3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', '2021-08-03 13:28:23'), (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', '2021-08-03 13:28:23')]  

IN๊ณผ % ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ๊ฐœ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜๋„ ์žˆ๋‹ค. 

 

 

- ๋ฐฉ๋ฒ• 6 : OR & dictionary

c.execute('SELECT * FROM users WHERE id=:id1 OR id=:id2', {'id1':2, 'id2':5})
print('param6', c.fetchall())

 param6 [(2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', '2021-08-03 13:28:23'), (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', '2021-08-03 13:28:23')] 

OR๊ณผ ์ธ์ž์— dictionary์— key๋กœ ์—ฐ๊ฒฐ์‹œ์ผœ ์ฃผ์–ด ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜จ๋‹ค. 

 

 

 

(5) Dump ์ถœ๋ ฅ ( ์ค‘์š”! ) 

with conn:
    with open('./resource/dump.sql', 'w') as f:
        for line in conn.iterdump():
            f.write('%s\n' % line)
        print('Dump print Complete')

 Dump print Complete 

์ด๋ ‡๊ฒŒ ํ…Œ์ด๋ธ” ์ƒ์„ฑ, ๊ฐ’ ์‚ฝ์ž… ์—ฐ์‚ฐ๋“ค์„ ํ•œ ๋ฒˆ์— ๋ณผ ์ˆ˜ ์žˆ๋Š” ํŒŒ์ผ์ด ์ƒ์„ฑ ๋œ๋‹ค. 

 

 

 

(6) ์—ฐ๊ฒฐ ํ•ด์ œ

* f.close(), conn.close() : with๋ฌธ์„ ์‚ฌ์šฉํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ž๋™ ํ˜ธ์ถœ ๋œ๋‹ค. 

 

 

 

 

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ)

 

 

<์ด์ „ ๊ธ€>

https://silvercoding.tistory.com/44

 

[python ๊ธฐ์ดˆ] 12. ์™ธ๋ถ€ ํŒŒ์ผ ์ฒ˜๋ฆฌ (Excel, CSV ํŒŒ์ผ ์ฝ๊ธฐ ๋ฐ ์“ฐ๊ธฐ)

์ฝ”๋“œ - ํŒจ์บ  ์ˆ˜์—… ์ฝ”๋“œ ์ฐธ๊ณ  (ํŒจ์บ  ์ˆ˜์—… ์ •๋ฆฌ) <์ด์ „ ๊ธ€> https://silvercoding.tistory.com/43 https://silvercoding.tistory.com/42 https://silvercoding.tistory.com/41 https://silvercoding.tistory.com/40..

silvercoding.tistory.com

 

import sqlite3
import datetime

์šฐ์„  ์‚ฌ์šฉํ•  ๊ฒƒ import ํ•ด์ฃผ๊ธฐ 

 

 

- SQliteDatabaseBrowserPortable.exe ๋ฅผ ์„ค์น˜ํ•ด ์ค€๋‹ค. 

์ด๊ณณ์—์„œ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ฅผ ์ง๊ด€์ ์œผ๋กœ ๊ด€๋ฆฌํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

- ์‚ฝ์ž… ๋‚ ์งœ ์ƒ์„ฑ 

now = datetime.datetime.now()
print('now: ', now)

nowDateTime = now.strftime('%Y-%m-%d %H:%M:%S')
print('nowDateTime: ', nowDateTime)

 now:  2021-08-03 11:43:31.850770 
 nowDateTime:  2021-08-03 11:43:31 

๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ๋„ฃ์–ด์ค„ ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์ „ ์ƒ์„ฑํ•œ๋‹ค.  

 

 

 

- sqlite3 ์‚ดํŽด๋ณด๊ธฐ 

print('sqlite3.version: ', sqlite3.version)
print('sqlite3.sqlite_version: ', sqlite3.sqlite_version)

 sqlite3.version:  2.6.0 
 sqlite3.sqlite_version:  3.33.0 

 

 

 

 

 

 

* sqlite ์‚ฌ์šฉ 

(1) DB ์ƒ์„ฑ & Auto Commit (Rollback) 

conn = sqlite3.connect('./resource/database.db', isolation_level=None)

DB๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์—ฐ๊ฒฐํ•ด ์ค€๋‹ค. DB์— ๋ช…๋ นํ•  ๋•Œ ์ปค๋ฐ‹(commit)์„ ๊ผญ ํ•ด์ฃผ์–ด์•ผ ํ•˜๋Š”๋ฐ , isolation_level=None์œผ๋กœ ์„ค์ •ํ•˜๋ฉด AutoCommit์ด ๋œ๋‹ค. 

 

๊ทธ๋ฆฌ๊ณ  ์•„๊นŒ ๋‹ค์šด๋ฐ›์•˜๋˜ SQliteDatabaseBrowserPortable.exe ์—์„œ [ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์—ด๊ธฐ - database.db ] ๋ฅผ ์‹คํ–‰ ํ•ด ์ค€๋‹ค. 

 

 

(2) Cursor 

# Cursor 
c = conn.cursor()
print('Cursor Type: ', type(c))

 Cursor Type:  <class 'sqlite3.Cursor'> 

cursor ์—ฐ๊ฒฐ์„ ํ•ด์ค€๋‹ค. ์ด ์ปค์„œ๋กœ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ๋ช…๋ น์„ ํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

(3) ํ…Œ์ด๋ธ” ์ƒ์„ฑ CREATE TABLE 

( Data type : TEXT, NUMERIC, INTEGER, REAL BLOB )

c.execute('CREATE TABLE IF NOT EXISTS users(id INTEGER PRIMARY KEY, username text, email text, phone text, website text, regdate text)')

 

ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด์„œ CREATE TABLE ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๊ทธ๋‹ค์Œ ํ…Œ์ด๋ธ”์ด๋ฆ„(์†์„ฑ ํƒ€์ž…) ์ด๋Ÿฐ ํ˜•ํƒœ๋กœ ๋„ฃ์–ด์ค€๋‹ค. id ์†์„ฑ์„ PRIMARY KEY๋กœ ์„ค์ •ํ•ด ์ฃผ์—ˆ๋‹ค. 

์ด๋ ‡๊ฒŒ user ํ…Œ์ด๋ธ”์ด ์ƒ์„ฑ๋˜์—ˆ๋‹ค! 

 

 

(4) ๋ฐ์ดํ„ฐ ์‚ฝ์ž… INSERT INTO

# ๋ฐฉ๋ฒ• 1 
c.execute("INSERT INTO users VALUES(1, 'Kim', 'silver@naver.com', '010-0000-0000', 'kim.com', ?)", (nowDateTime,))
# ๋ฐฉ๋ฒ• 2 
c.execute('INSERT INTO users(id, username, email, phone, website, regdate) VALUES (?, ?, ?, ?, ?, ?)', (2, 'Park', 'Park@daum.net', '010-1111-1111', 'Park.com', nowDateTime))

- INSERT INTO ํ…Œ์ด๋ธ”์ด๋ฆ„ VALUES() 

- INSERT INTO ํ…Œ์ด๋ธ”์ด๋ฆ„() VALUES() 

๋ฐฉ๋ฒ• 1์€ ๋ฐ”๋กœ VALUES์•ˆ์— ๊ฐ’์„ ๋„ฃ๊ณ , ๋ณ€์ˆ˜๋ฅผ ๋„ฃ์„ ๊ฒฝ์šฐ ?๋กœ ์ž‘์„ฑํ•œ ํ›„ execute์˜ ์ธ์ž์—์„œ ํŠœํ”Œ์•ˆ์— ๋ณ€์ˆ˜๋ฅผ ๋„ฃ์–ด์ค€๋‹ค.  ๋ฐฉ๋ฒ• 2๋Š” ํ…Œ์ด๋ธ”์ด๋ฆ„() ์•ˆ์— ์†์„ฑ ์ด๋ฆ„์„ ๋ช…์‹œ์ ์œผ๋กœ ๋‚˜์—ดํ•˜๊ณ  , VALUES() ์•ˆ์— ๋ชจ๋‘ ? ๋กœ ์ž‘์„ฑํ•ด ์ฃผ๊ณ , execute์˜ ์ธ์ž์—์„œ ํŠœํ”Œ ์•ˆ์— ๊ฐ’ ๋˜๋Š” ๋ณ€์ˆ˜๋ฅผ ๋„ฃ์–ด์ค€๋‹ค. 

์ƒˆ๋กœ๊ณ ์นจ์„ ํ•ด์ฃผ๋ฉด ์œ„์™€ ๊ฐ™์ด ๊ฐ’์ด ์ƒ์„ฑ๋˜์—ˆ๋‹ค! 

 

 

(5) Many ์‚ฝ์ž… ( ํŠœํ”Œ, ๋ฆฌ์ŠคํŠธ ) 

userList = (
    (3, 'Lee', 'Lee@naver.com', '010-2222-2222', 'Lee.com', nowDateTime),
    (4, 'Cho', 'Cho@daum.net', '010-3333-3333', 'Cho.com', nowDateTime),
    (5, 'Yoo', 'Yoo@google.com', '010-4444-4444', 'Yoo.net', nowDateTime)
)

c.executemany('INSERT INTO users(id, username, email, phone, website, regdate) VALUES(?, ?, ?, ?, ?, ?)', userList)

excutemany์˜ ๋‘๋ฒˆ์งธ ์ธ์ž์— ํŠœํ”Œ์ด๋‚˜ ๋ฆฌ์ŠคํŠธ๋ฅผ ๋„ฃ์–ด์ฃผ๋ฉด ํ•œ๊บผ๋ฒˆ์— ๋ฐ์ดํ„ฐ ์‚ฝ์ž…์ด ๊ฐ€๋Šฅํ•˜๋‹ค. 

 

 

(6) ํ…Œ์ด๋ธ” ๋ฐ์ดํ„ฐ ์‚ญ์ œ 

conn.execute('DELETE FROM users')

์ด๋ ‡๊ฒŒ ์ž‘์„ฑํ•ด ์ฃผ๋ฉด users ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ญ์ œ๋œ๋‹ค. 

 

 

- ์ง€์›Œ์ง„ ๊ฐœ์ˆ˜ ๋ฐ˜ํ™˜ํ•˜๊ณ  ์‚ญ์ œํ•˜๊ธฐ 

print('users db delete : ', conn.execute('DELETE FROM users').rowcount)

 users db delete :  5 

rowcount๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐœ์ˆ˜๋ฅผ ๋ฐ˜ํ™˜ํ•ด์ค„ ์ˆ˜ ์žˆ๋‹ค. 

 

 

(7) ์ปค๋ฐ‹ & ๋กค๋ฐฑ 

: isolation_level="None" ์ผ ๊ฒฝ์šฐ ์ž๋™ ๋ฐ˜์˜ 

# ์ปค๋ฐ‹
# conn.commit()

# ๋กค๋ฐฑ
# conn.rollback()

๋ณธ ํฌ์ŠคํŒ…์—์„  ์˜คํ† ์ปค๋ฐ‹ ์„ค์ •์„ ํ•ด๋†จ์œผ๋ฏ€๋กœ ์ฃผ์„์ฒ˜๋ฆฌ๋ฅผ ํ•ด ๋†“๋Š”๋‹ค. 

 

 

(8) ์ ‘์† ํ•ด์ œ 

conn.close()

๋งˆ๋ฌด๋ฆฌ ํ•  ๋•Œ๋Š” ์ ‘์†ํ•ด์ œ๋ฅผ ๊ผญ ํ•ด์ฃผ๋„๋ก ํ•œ๋‹ค! 

 

 

 

+ Recent posts