(The code in this project is based on the Fast Campus deep learning course.)

 

The data we look at today is the MNIST dataset, the well-known image dataset that almost every deep learning beginner starts with.

Source: https://ko.wikipedia.org/wiki/MNIST_%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B2%A0%EC%9D%B4%EC%8A%A4

As the picture shows, it is a large database of handwritten digits. For this first deep learning project, I will classify the MNIST data with an RNN.

 

 

 

<Downloading the MNIST data>

http://yann.lecun.com/exdb/mnist/

 

MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges

 


The data can be downloaded from here.

 

๋‚˜๋Š” ์‚ฌ์ดํŠธ์—์„œ ๋‹ค์šด๋ฐ›์ง€ ์•Š๊ณ , keras์˜ dataset์—์„œ ์ง€์›ํ•ด์ฃผ๋Š” ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ๋“ค ์ค‘ MNIST๋„ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ฝ”๋“œ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜ฌ ๊ฒƒ์ด๋‹ค. 

 

 

 

(1) Loading the MNIST data

from tensorflow import keras  # assumption: the Keras bundled with TensorFlow

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

 

 

 

(2) Checking the size of the MNIST data

print(f"train_images: {train_images.shape}")
print(f"train_labels: {train_labels.shape}")

print(f"test_images: {test_images.shape}")
print(f"test_labels: {test_labels.shape}")

The size of the data can be checked with shape. The output above shows that there are 60000 training images and 10000 test images in total.

An image array with shape (60000, 28, 28) means 60000 images, each 28x28 pixels.
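To make the reading of the shape tuple concrete, here is a minimal sketch. It assumes the shapes reported above and hard-codes them, so it runs without downloading MNIST; the helper name `describe` is hypothetical and only for illustration.

```python
# Shape tuples as reported by the print statements above
# (hard-coded assumption, so no download is needed).
train_shape = (60000, 28, 28)
test_shape = (10000, 28, 28)


def describe(shape):
    """Split an (N, H, W) shape into image count and pixels per image."""
    num_images, height, width = shape
    return num_images, height * width


train_count, pixels_per_image = describe(train_shape)
test_count, _ = describe(test_shape)

print(train_count, test_count)   # 60000 10000
print(pixels_per_image)          # 784
```

The first axis counts images and the remaining two are the height and width of each image, so one image holds 28 * 28 = 784 pixel values.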

 

 

 

(3) Displaying an image

import matplotlib.pyplot as plt

plt.figure()
plt.imshow(train_images[7777], cmap='gray')  # show the image in grayscale
plt.colorbar()  # add a colorbar on the right
plt.grid(True)  # draw a grid
plt.show()
print(train_labels[7777])  # print the label

Because I like the number 7, I displayed the image at index 7777.

This is the picture that came out. It looks like an 8, and the printed label was also 8.

 

 

 

(4) Checking the data type of the MNIST dataset

print(train_images.dtype)
print(train_labels.dtype)
print(test_images.dtype)
print(test_labels.dtype)

To find the type of the data, use dtype. The output above shows that all four arrays are uint8. I was curious what uint8 is, so I googled it: it is an unsigned 8-bit integer type, so it cannot represent negative numbers and holds integers from 0 to 255. That is 2^8 = 256 distinct values.
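The 0 to 255 range can be checked with a small sketch. As an assumption for illustration, it uses the standard library's `ctypes.c_uint8` as a stand-in for NumPy's uint8 (both are unsigned 8-bit integers), so it runs without NumPy installed.

```python
import ctypes

UINT8_BITS = 8
num_values = 2 ** UINT8_BITS              # 256 distinct values
min_value, max_value = 0, num_values - 1  # 0 .. 255

# ctypes integer types do no overflow checking, so values outside
# the range wrap around modulo 256.
wrapped_high = ctypes.c_uint8(max_value + 1).value  # 256 wraps to 0
wrapped_low = ctypes.c_uint8(-1).value              # -1 wraps to 255

print(num_values, min_value, max_value)      # 256 0 255
print(wrapped_high, wrapped_low)             # 0 255
print(ctypes.sizeof(ctypes.c_uint8))         # 1 (byte)
```

This is why a grayscale pixel fits in uint8: each intensity is one byte, with 0 as black and 255 as white under the 'gray' colormap.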

 

Reference on numpy data types

https://numpy.org/doc/stable/user/basics.types.html

 

Data types — NumPy v1.20 Manual


 

 

 

 

That covers the basics of the MNIST data. To summarize, it is a handwritten digit dataset with 60000 training images and 10000 test images, each 28x28 pixels. The default data type is uint8.
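Combining the summary's two facts (image counts and uint8) gives a rough estimate of how much memory the raw image arrays take. This is a back-of-the-envelope sketch, assuming one byte per pixel and ignoring labels and array overhead; `image_bytes` is a hypothetical helper.

```python
BYTES_PER_PIXEL = 1  # uint8 stores each pixel in one byte


def image_bytes(num_images, height=28, width=28):
    """Raw size in bytes of num_images grayscale uint8 images."""
    return num_images * height * width * BYTES_PER_PIXEL


train_bytes = image_bytes(60000)  # 47,040,000 bytes
test_bytes = image_bytes(10000)   # 7,840,000 bytes
total_mib = (train_bytes + test_bytes) / 2 ** 20

print(train_bytes, test_bytes)
print(f"{total_mib:.1f} MiB")     # 52.3 MiB
```

The whole dataset fits comfortably in memory, which is one reason it can be loaded in a single call.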

 

I will analyze several datasets over the course of these deep learning projects, but getting familiar with a dataset seems like an important step in its own right, so I wrote it up as a separate post. In the next post, I plan to look at the MNIST data in more detail and preprocess it so that it is ready for modeling.

 
