AI/ML

Intro to Deep Learning with Python

Neural networks, TensorFlow, and PyTorch from the ground up

Feb 10, 2025

10 min read

Deep learning powers image recognition, language models, recommendation systems, and much more. This guide walks you through the core concepts — from neurons to training loops — with hands-on examples in both TensorFlow and PyTorch.

1. What Is a Neural Network?

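A neural network is a stack of layers. Each layer applies a linear transformation (a weighted sum of its inputs plus a bias) followed by a non-linear activation function. The network learns by adjusting its weights to minimize a loss function, typically by gradient descent: compute the gradient of the loss with respect to each weight, then nudge the weight a small step in the opposite direction.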
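To make the two ideas above concrete, here is a minimal pure-Python sketch (no framework required). The helper names `relu` and `dense_layer`, the hand-picked weights, and the toy data are illustrative assumptions, not part of TensorFlow or PyTorch.

```python
def relu(z):
    """Non-linear activation: pass positives through, clamp negatives to 0."""
    return max(0.0, z)

def dense_layer(inputs, weights, biases):
    """One layer: a linear transformation (W.x + b) followed by ReLU."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Forward pass through a 2-input, 2-unit layer with hand-picked weights
hidden = dense_layer([1.0, 2.0],
                     weights=[[0.5, -1.0], [1.0, 1.0]],
                     biases=[0.0, 0.5])
print(hidden)  # [0.0, 3.5]

# Learning as gradient descent: fit y = w * x to data generated with w = 3
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w, lr = 0.0, 0.05
for step in range(100):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step against the gradient
print(round(w, 3))  # 3.0
```

Frameworks automate exactly this loop: they compute the gradients for every weight (backpropagation) and apply the update through an optimizer.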

2. Installation

bash
# TensorFlow
pip install tensorflow

# PyTorch (CPU; for GPU, visit pytorch.org for the right command)
pip install torch torchvision
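After installing, a quick way to confirm that Python can see the packages is sketched below. It uses only the standard library, so it runs even when an install failed; the helper name `installed` is an assumption made for this example.

```python
import importlib.util

def installed(packages=("tensorflow", "torch", "torchvision")):
    """Map each package name to True if Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

status = installed()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If a package shows as missing, the usual culprit is running `pip` for a different Python interpreter or virtual environment than the one executing the script.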

3. Your First Neural Network with TensorFlow/Keras

Let's classify handwritten digits from the MNIST dataset — the "Hello World" of deep learning:

python
import tensorflow as tf
from tensorflow import keras

# Load data (60k training images, 10k test images, 28x28 pixels)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Build the model
model = keras.Sequential([
    keras.Input(shape=(28, 28)),                   # one 28x28 grayscale image
    keras.layers.Flatten(),                        # 784 inputs
    keras.layers.Dense(128, activation="relu"),    # hidden layer
    keras.layers.Dropout(0.2),                     # regularization
    keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])

# Compile (sparse_* loss because the labels are integers 0-9, not one-hot vectors)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train, holding out 10% of the training data for validation
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate on the untouched test set
loss, acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {acc:.4f}")  # ~98%

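Dropout randomly turns off neurons during training (here, 20% of the hidden units on each step), which stops the network from leaning on any single unit and pushes it to learn redundant representations. This reduces overfitting, where the model memorizes training data but fails on new inputs. Dropout is active only during training; at evaluation time every neuron is used.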
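The mechanism can be sketched in a few lines of plain Python. This is an illustrative version of "inverted" dropout, where surviving values are scaled by 1/(1-p) so their expected sum is unchanged; the function name and signature are assumptions for this example, not the Keras or PyTorch API.

```python
import random

def dropout(values, p=0.2, training=True, rng=random):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: pass everything through
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, p=0.2, rng=rng)

zero_fraction = sum(1 for v in dropped if v == 0.0) / len(dropped)
mean_value = sum(dropped) / len(dropped)
print(zero_fraction)  # close to 0.2
print(mean_value)     # close to 1.0, thanks to the rescaling
print(dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```

The rescaling is why no adjustment is needed at evaluation time: the average activation seen by the next layer is roughly the same in both modes.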

4. The Same Network in PyTorch

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data loading
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Define model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = x.view(-1, 784)  # flatten
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)  # raw logits (CrossEntropyLoss handles softmax)

model = Net()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop
model.train()  # training mode, so dropout is active
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()             # clear gradients from the previous step
        output = model(images)            # forward pass
        loss = criterion(output, labels)
        loss.backward()                   # backpropagation
        optimizer.step()                  # update weights
    print(f"Epoch {epoch+1} done")
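The two transforms in the data pipeline above are simple arithmetic, sketched here in plain Python for a single pixel. `ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then applies (value - mean) / std. The function names below are illustrative, not torchvision calls.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel from [0, 255] to [0, 1]."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel normalization formula."""
    return (value - mean) / std

for pixel in (0, 255):
    print(pixel, "->", normalize(to_unit_range(pixel)))
# 0 -> -1.0
# 255 -> 1.0
```

So the PyTorch version feeds the network values in [-1, 1], while the Keras version earlier uses [0, 1]. Both ranges work for MNIST; what matters is applying the same scaling at training and inference time.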

5. Key Concepts Explained

python
import math
import torch.nn as nn
import torch.optim as optim

# Activation functions (scalar versions, for illustration)
relu = lambda x: max(0.0, x)                # most common for hidden layers
sigmoid = lambda x: 1 / (1 + math.exp(-x))  # binary classification output
# softmax: multi-class output (probabilities sum to 1), e.g. nn.Softmax(dim=1)

# Loss functions
mse_loss = nn.MSELoss()            # regression: Mean Squared Error
bce_loss = nn.BCEWithLogitsLoss()  # binary classification (takes raw logits)
ce_loss = nn.CrossEntropyLoss()    # multi-class: combines softmax + negative log likelihood

# Optimizers (the Linear layer is a placeholder for any model's parameters)
model = nn.Linear(4, 2)
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # classic stochastic gradient descent
adam = optim.Adam(model.parameters(), lr=1e-3)              # adaptive; usually a good default choice
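To show what "combines softmax + negative log likelihood" means, here is a small pure-Python sketch for a single example. It illustrates the math only and is not how PyTorch implements it internally; the function names are assumptions for this example.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log of the probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])        # [0.659, 0.242, 0.099]
print(round(cross_entropy(logits, 0), 3))  # 0.417

# An untrained 10-class model that outputs equal scores has loss ln(10)
print(round(cross_entropy([0.0] * 10, 3), 3))  # 2.303
```

That last number is a handy sanity check: a fresh MNIST classifier should start with a loss near 2.3, and the loss should fall from there as training progresses.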

6. Convolutional Neural Networks (CNNs)

CNNs are designed for image data. Convolutional layers learn local patterns (edges, textures) regardless of their position in the image:

python
from tensorflow import keras

# Keras CNN for image classification
# Conv2D expects a channel axis: reshape MNIST from (N, 28, 28) to (N, 28, 28, 1),
# e.g. x_train = x_train[..., None]
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
# ~99% accuracy on MNIST vs ~98% for dense-only
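The claim that a convolutional filter responds to a pattern regardless of where it appears can be checked by hand. The sketch below slides one hand-picked vertical-edge filter over two toy images; a trained `Conv2D` layer would learn its own filter values, and the helper names here are illustrative.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding sliding-window product of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Hand-picked filter that fires on a dark-to-bright vertical edge
edge = [[-1, 1],
        [-1, 1]]

def image_with_edge(col, size=6):
    """Image that is 0 left of column `col` and 1 from `col` onward."""
    return [[1 if j >= col else 0 for j in range(size)] for _ in range(size)]

left = conv2d(image_with_edge(2), edge)
right = conv2d(image_with_edge(4), edge)
print(left[0])   # [0, 2, 0, 0, 0]
print(right[0])  # [0, 0, 0, 2, 0]
```

The same four weights detect the edge in both images; only the location of the response moves. A dense layer would need separate weights for each position.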

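CNNs use weight sharing: the same filter is applied across the entire image, drastically reducing parameters compared to fully connected layers. For a single-channel 32×32 image, a convolutional layer with 64 filters of size 3×3 needs only 64×3×3 = 576 weights (plus 64 biases), whereas a fully connected layer mapping the 1,024 pixels to 64 units needs 32×32×64 = 65,536.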
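The arithmetic above generalizes to any layer size. The two helper functions below are a sketch written for this article (they are not framework APIs) and reproduce the numbers from the comparison, assuming a single input channel.

```python
def conv2d_params(filters, kernel_h, kernel_w, in_channels, bias=True):
    """Each filter has kernel_h * kernel_w * in_channels weights, plus an optional bias."""
    return filters * (kernel_h * kernel_w * in_channels + (1 if bias else 0))

def dense_params(inputs, units, bias=True):
    """Every input connects to every unit, plus an optional bias per unit."""
    return units * (inputs + (1 if bias else 0))

# The comparison from the text: single-channel 32x32 image, weights only
print(conv2d_params(64, 3, 3, in_channels=1, bias=False))  # 576
print(dense_params(32 * 32, 64, bias=False))               # 65536

# With biases included, as a framework's model summary would count them
print(conv2d_params(64, 3, 3, in_channels=1))  # 640
print(dense_params(32 * 32, 64))               # 65600
```

Note that the convolutional count does not depend on the image size at all, while the dense count grows with every added pixel.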

7. Using Pre-trained Models (Transfer Learning)

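Don't train from scratch when a pre-trained model already knows useful visual features. Freezing a network trained on ImageNet and training only a small new classification head often reaches high accuracy on a modest dataset in a few epochs: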

python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model

# Load MobileNetV2 without the top classification layer
# (inputs should be scaled to [-1, 1], e.g. with
#  tensorflow.keras.applications.mobilenet_v2.preprocess_input)
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False  # freeze pre-trained weights

# Add your own classification head
x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(128, activation="relu")(x)
output = layers.Dense(5, activation="softmax")(x)  # 5 custom classes

model = Model(base_model.input, output)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
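Two details of the head above are easy to verify by hand: what global average pooling does, and how few weights are actually trained. The sketch below is plain Python with made-up feature values, and it assumes the frozen MobileNetV2 base emits 1280 feature channels (its default width).

```python
def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, leaving one value per channel."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

# Toy 2 x 2 feature map with 3 channels (made-up values)
fmap = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
        [[5.0, 0.0, 2.0], [7.0, 4.0, 2.0]]]
pooled = global_average_pool(fmap)
print(pooled)  # [4.0, 1.0, 2.0]

# Trainable parameters in the new head: Dense(128) then Dense(5), with biases
features = 1280  # assumed channel count of the frozen base's output
head_params = (features * 128 + 128) + (128 * 5 + 5)
print(head_params)  # 164613
```

Because the base is frozen, only these roughly 165k head parameters are updated during training, which is why transfer learning is fast even on a CPU.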

What's Next?

  • Recurrent networks (LSTMs) for text and time series
  • Transformers and attention — the architecture behind GPT
  • Hugging Face for pre-trained NLP models
  • Deploying models with FastAPI or TensorFlow Serving