AI/ML

Intro to Deep Learning with Python

Neural networks, TensorFlow, and PyTorch from the ground up

Feb 10, 2025


Deep learning powers image recognition, language models, recommendation systems, and much more. This guide walks you through the core concepts — from neurons to training loops — with hands-on examples in both TensorFlow and PyTorch.

1. What Is a Neural Network?

A neural network is a stack of layers. Each layer applies a linear transformation followed by a non-linear activation function. The network learns by adjusting its weights to minimize a loss function. The main pieces are:

Input layer — receives raw data (pixels, tokens, numbers)
Hidden layers — extract increasingly abstract features
Output layer — produces predictions (class probabilities, values)
Activation functions — ReLU, sigmoid, softmax add non-linearity
Loss function — measures how wrong the predictions are
Optimizer — adjusts weights to reduce the loss (SGD, Adam)
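
To make "a linear transformation followed by a non-linear activation" concrete, here is a minimal sketch of a single layer's forward pass in plain Python. The function names `dense` and `relu` and all the numbers are made up for illustration; real frameworks do the same thing with optimized tensor operations.

```python
def dense(inputs, weights, biases):
    """Linear transformation: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Non-linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


# Hypothetical toy layer: 3 inputs -> 2 neurons (values chosen by hand).
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.25],  # weights of neuron 0
    [1.0, 1.0, -0.5],   # weights of neuron 1
]
biases = [0.0, 0.5]

pre_activation = dense(inputs, weights, biases)
hidden = relu(pre_activation)
print(pre_activation)  # [-0.75, 2.0]
print(hidden)          # [0.0, 2.0]
```

Stacking several of these layers, each feeding its output to the next, gives you the "hidden layers" from the list above.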

2. Installation

bash
# TensorFlow
pip install tensorflow

# PyTorch (default build; see pytorch.org for the command matching your OS and CUDA version)
pip install torch torchvision

3. Your First Neural Network with TensorFlow/Keras

Let's classify handwritten digits from the MNIST dataset — the "Hello World" of deep learning:

python
import tensorflow as tf
from tensorflow import keras

# Load data (60k training images, 10k test images, 28x28 pixels)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test  = x_test  / 255.0

# Build the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),   # 784 inputs
    keras.layers.Dense(128, activation="relu"),    # hidden layer
    keras.layers.Dropout(0.2),                     # regularization
    keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])

# Compile
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

# Train
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate
loss, acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {acc:.4f}")  # ~98%

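Dropout randomly zeroes a fraction of activations (20% here) during training, which pushes the network to learn redundant representations instead of relying on any single neuron. This helps reduce overfitting, where the model memorizes training data but fails on new inputs. Dropout is active only during training; at inference it is disabled (Keras handles this automatically, PyTorch does so when you call model.eval()).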
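
As a rough illustration of what a dropout layer does, here is a small sketch of "inverted" dropout in plain Python. The helper name `dropout` and its arguments are hypothetical; it only mirrors the behaviour of the framework layers, which also rescale the surviving values by 1 / (1 - rate) so the expected activation stays the same.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout: zero each value with probability `rate` while training.

    Survivors are scaled by 1 / (1 - rate) so the expected activation is
    unchanged; at inference time the input passes through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0, 5.0]

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False)
print(train_out)  # some entries zeroed, the rest scaled up by 1.25
print(eval_out)   # identical to the input
```

Because a different random subset is dropped on every batch, no single neuron can be relied on, which is where the regularizing effect comes from.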

4. The Same Network in PyTorch

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data loading
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Define model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = x.view(-1, 784)     # flatten
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)       # raw logits (CrossEntropyLoss handles softmax)

model = Net()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1} done")
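
If the forward / backward / step cycle in that loop feels opaque, the sketch below walks through the same cycle by hand for a model with a single weight. It is an illustration under simplifying assumptions: toy data, a hand-derived gradient, and plain gradient descent rather than Adam. PyTorch's `loss.backward()` computes the gradient automatically, and `optimizer.step()` applies the update.

```python
# Toy data generated from y = 2 * x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0      # the single trainable weight, started at an arbitrary value
lr = 0.01    # learning rate (step size)

for step in range(100):
    # Forward pass: predictions from the current weight
    preds = [w * x for x in xs]
    # Loss: mean squared error between predictions and targets
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # Backward pass: d(loss)/dw, derived by hand for this model
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Optimizer step: move the weight against the gradient
    w -= lr * grad

print(f"w = {w:.4f}, loss = {loss:.6f}")
```

There is no equivalent of `optimizer.zero_grad()` here because the gradient is recomputed from scratch on every step; PyTorch accumulates gradients by default, which is why the loop above has to reset them.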

5. Key Concepts Explained

python
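import math

import torch
import torch.nn as nn
import torch.optim as optim

# Activation functions (scalar sketches; use nn.ReLU, torch.sigmoid, etc. in real models)
relu    = lambda x: max(0.0, x)                  # most common for hidden layers
sigmoid = lambda x: 1 / (1 + math.exp(-x))       # binary classification output
softmax = lambda t: torch.softmax(t, dim=-1)     # multi-class output (sums to 1)

# Loss functions
# Regression:
loss = nn.MSELoss()           # Mean Squared Error

# Binary classification:
loss = nn.BCEWithLogitsLoss()  # combines sigmoid + binary cross-entropy

# Multi-class classification:
loss = nn.CrossEntropyLoss()  # combines softmax + negative log likelihood

# Optimizers
params = list(nn.Linear(4, 2).parameters())  # placeholder parameters for illustration
optim.SGD(params, lr=0.01, momentum=0.9)  # classic stochastic gradient descent
optim.Adam(params, lr=1e-3)               # adaptive; a common default choice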
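
To see what "softmax + negative log likelihood" means in numbers, here is a small plain-Python sketch for a single example. The function names and the logits are made up for illustration; `nn.CrossEntropyLoss` performs the same computation on batches of tensors and averages over the batch by default.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(round(cross_entropy(logits, target=0), 3))  # correct class is likely: lower loss
print(round(cross_entropy(logits, target=2), 3))  # correct class is unlikely: higher loss
```

This is also why the PyTorch model returns raw logits: applying softmax yourself before the loss would effectively apply it twice.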

6. Convolutional Neural Networks (CNNs)

CNNs are designed for image data. Convolutional layers learn local patterns (edges, textures) regardless of their position in the image:

python
# Keras CNN for image classification
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
# ~99% accuracy on MNIST vs ~98% for dense-only
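
To illustrate how a convolutional filter picks up a local pattern wherever it appears, here is a minimal plain-Python sketch of the core operation (single channel, no padding, no bias). The `conv2d` helper and the hand-written edge filter are assumptions for illustration; a trained layer learns its own filter values.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of one channel with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-written vertical-edge filter: responds where brightness increases left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

# Two 4x5 images containing the same dark-to-bright edge at different columns.
edge_left = [[0, 1, 1, 1, 1] for _ in range(4)]
edge_right = [[0, 0, 0, 1, 1] for _ in range(4)]

for row in conv2d(edge_left, kernel):
    print(row)   # [2, 0, 0, 0] on every row
for row in conv2d(edge_right, kernel):
    print(row)   # [0, 0, 2, 0] on every row
```

The same four filter values produce the same response of 2 in both images; only the location of the response moves with the edge.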

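CNNs use weight sharing: the same filter is applied across the entire image, drastically reducing parameters compared to fully connected layers. For a single-channel 32×32 image, a convolutional layer with 64 filters of size 3×3 needs only 64×3×3 = 576 weights (plus 64 biases), whereas a fully connected layer with 64 units on the flattened image needs 32×32×64 = 65,536.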
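
The parameter comparison can be checked with a couple of lines of arithmetic. The helper names below are hypothetical, biases are ignored, and the sketch assumes a single-channel input, 3×3 filters, and a fully connected layer with 64 units as the point of comparison.

```python
def conv_weights(filters, kernel_h, kernel_w, in_channels=1):
    """Weights in a conv layer: one small kernel per filter, shared across positions."""
    return filters * kernel_h * kernel_w * in_channels


def dense_weights(num_inputs, units):
    """Weights in a fully connected layer: one per (input, unit) pair."""
    return num_inputs * units


conv = conv_weights(filters=64, kernel_h=3, kernel_w=3)  # single-channel input
dense = dense_weights(num_inputs=32 * 32, units=64)       # flattened 32x32 image

print(conv)           # 576
print(dense)          # 65536
print(dense // conv)  # 113, i.e. over a hundred times more weights
```

Note that the convolutional count does not depend on the image size at all, while the dense count grows with every extra pixel.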

7. Using Pre-trained Models (Transfer Learning)

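Don't train from scratch when a pre-trained model can do most of the work. A network trained on ImageNet has already learned general visual features, so you typically only need to train a small classification head on your own data: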

python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model

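# Load MobileNetV2 without the top classification layer.
# Note: MobileNetV2 expects inputs scaled to [-1, 1]; see
# tensorflow.keras.applications.mobilenet_v2.preprocess_input.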
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False   # freeze pre-trained weights

# Add your own classification head
x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(128, activation="relu")(x)
output = layers.Dense(5, activation="softmax")(x)   # 5 custom classes

model = Model(base_model.input, output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

What's Next?

Recurrent networks (LSTMs) for text and time series
Transformers and attention — the architecture behind GPT
Hugging Face for pre-trained NLP models
Deploying models with FastAPI or TensorFlow Serving
