What would be the “Hello World” problem of AI? Some would say it is *Spam Filtering using a Naive Bayes Classifier*, but many would point to the MNIST dataset, which contains 60,000 images of the handwritten digits 0 through 9. Classifying these images is known as the MNIST classification problem.

To work on this problem with TensorFlow it would be enough to import just the TensorFlow library, but I also used `matplotlib`, which is perfect for drawing the images we are dealing with.

```
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
```

Note how we normalized the data by dividing by 255.0, scaling the pixel values from the 0–255 range down to 0–1.
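To see the effect of that division on a concrete example, here is a minimal NumPy sketch with a made-up 2x2 "image" of raw pixel intensities:

```python
import numpy as np

# A toy 2x2 "image" with raw pixel intensities in the 0-255 range
raw = np.array([[0, 128],
                [192, 255]], dtype=np.uint8)

# Dividing by 255.0 scales every pixel into [0.0, 1.0]
normalized = raw / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```

Keeping the inputs in a small, consistent range helps gradient-based training behave well.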

We can print the first image from the MNIST dataset and its label like this:

```
plt.imshow(x_train[0])
print(y_train[0]) # printing the label
plt.show() # printing the image representation 28x28px
```

In order for the NN to learn to recognize the digits, we create and compile the model:

```
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

Note how we used densely-connected NN layers. A Dense layer implements the operation `output = activation(dot(input, kernel) + bias)`, where `activation` is the element-wise activation function passed as the `activation` argument, `kernel` is a weights matrix created by the layer, and `bias` is a bias vector created by the layer (only applicable if `use_bias` is `True`).
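That operation can be sketched in plain NumPy. This is a hypothetical 4-unit layer with randomly initialized weights, not the actual Keras internals:

```python
import numpy as np

def relu(x):
    # Element-wise activation: max(x, 0)
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)

x = rng.standard_normal((1, 784))        # one flattened 28x28 image
kernel = rng.standard_normal((784, 4))   # weights matrix created by the layer
bias = np.zeros(4)                       # bias vector (used when use_bias=True)

# output = activation(dot(input, kernel) + bias)
output = relu(x @ kernel + bias)
print(output.shape)  # (1, 4)
```

The real `Dense(512, ...)` layer does the same thing with a 784x512 kernel, and Keras learns the kernel and bias values during training.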

```
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
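To get an intuition for what `sparse_categorical_crossentropy` measures, here is a minimal sketch with made-up numbers: for a single sample, the loss is the negative log of the probability the model assigned to the true class ("sparse" means the label is a plain integer, not a one-hot vector):

```python
import numpy as np

# Hypothetical softmax output for one image: 10 class probabilities summing to 1
probs = np.array([0.02, 0.02, 0.03, 0.60, 0.05,
                  0.05, 0.05, 0.05, 0.08, 0.05])
true_label = 3  # the integer label; no one-hot encoding needed

# Cross-entropy for one sample: -log(probability of the true class)
loss = -np.log(probs[true_label])
print(round(loss, 4))  # 0.5108
```

The loss approaches 0 as the model puts all its probability mass on the correct digit, and grows without bound as that probability shrinks.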

Then we train the model and evaluate its accuracy:

```
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```

That is all it takes to train the model; the resulting image classifier reaches ~98% accuracy on this dataset. Here is how we predict:

```
scores = model.predict(x_test[0:1])
print(np.argmax(scores))
```

The next problem to think about at this point: can our NN generalize if we rotate the input images by a certain angle, or if we shift the digits away from the center of the image?
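One way to probe this is to perturb the test images and re-evaluate. Here is a sketch of the shifting part using `np.roll` on a toy image (rotation would need something like `scipy.ndimage.rotate`):

```python
import numpy as np

# A toy 28x28 "digit": a bright blob in the center of the image
image = np.zeros((28, 28))
image[10:18, 10:18] = 1.0

# Shift the digit 5 pixels to the right; columns wrap around the edge
shifted = np.roll(image, shift=5, axis=1)

# The blob has moved: the old center pixel is dark, a pixel 5 to the right is lit
print(image[12, 12], shifted[12, 12], shifted[12, 17])  # 1.0 0.0 1.0
```

Feeding such shifted or rotated copies of `x_test` into `model.evaluate` would show how quickly the accuracy of this fully-connected model degrades, which is part of the motivation for convolutional architectures and data augmentation.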