PyTorch - Tensor

Tensors are the fundamental building block of machine learning in PyTorch — learn what they are, how to manipulate them, and how to use them on a GPU.

Goal of the lesson

By the end of this 3-hour session you should be able to:

  • explain what a tensor is and why machine learning frameworks are built around it,
  • create tensors from Python data, NumPy arrays, and built-in factories,
  • inspect and reshape tensors confidently,
  • write small numerical programs using broadcasting and matrix multiplication,
  • move computation to a GPU,
  • read and write [C, H, W] image tensors and apply simple filters to them.

The tensor is the only data structure deep learning really has. Every model input, every weight, every gradient, every output is a tensor. Spending three hours getting comfortable with them pays off in every chapter that follows.

Suggested timing

Block  | Topic
-------|------
30 min | Setup, what a tensor is, scalar/vector/matrix/n-dim
30 min | Attributes (shape, dtype, device), factory functions
45 min | Operations: arithmetic, broadcasting, matmul, reshape, indexing
30 min | NumPy interop and GPU
45 min | Capstone — image manipulation with pure tensor ops

Setup

This series targets Windows with uv as the Python project manager.

If you don’t have uv yet, install it from PowerShell:

PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Create the project:

Terminal window
uv init --python 3.12 tensor
cd tensor
uv add torch torchvision matplotlib pillow numpy
Note

On Windows the default torch and torchvision wheels from PyPI are CPU-only. That is fine for everything in this chapter.

If you have an NVIDIA GPU and want CUDA, see Using uv with PyTorch and the AI - CUDA page. You will need to point uv at the PyTorch CUDA wheel index.

Check the install:

main.py
import torch
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
Terminal window
uv run main.py

You should see the version number and cuda available: False (or True if you set up CUDA).

What is a tensor?

A tensor is a multi-dimensional container for numbers. You already know its low-dimensional cousins:

Math name       | Tensor name  | Example
----------------|--------------|--------
Number          | scalar (0-D) | 7
List of numbers | vector (1-D) | [1, 2, 3]
Table           | matrix (2-D) | [[1, 2], [3, 4]]
Cube of numbers | 3-D tensor   | RGB image, time-series of frames
                | n-D tensor   | Mini-batch of RGB images: [batch, channels, height, width]

Why does deep learning need them?

  • Hardware fits. GPUs are designed to execute the same operation on millions of numbers in parallel — exactly what tensor operations do.
  • Calculus fits. Backpropagation reduces to repeated matrix multiplications and element-wise functions. Tensors are the natural type for both.
  • Models fit. A neural network is essentially a stack of tensor operations. The “weights” of a layer are tensors and the data flowing through it is tensors.

Creating tensors

Scalar

main.py
import torch
scalar = torch.tensor(7)
print(scalar) # tensor(7)
print(scalar.ndim) # 0
print(scalar.item()) # 7 (back to Python int)

item() only works on a tensor with a single element. Try calling it on a vector — you’ll get an error.

Vector

main.py
import torch
vector = torch.tensor([7, 7])
print(vector) # tensor([7, 7])
print(vector.ndim) # 1
print(vector.shape) # torch.Size([2])

A quick trick to read dimensionality: count the number of opening square brackets [ on one side. [7, 7] has one — therefore one dimension.

Matrix

main.py
import torch
matrix = torch.tensor([[7, 8], [9, 10]])
print(matrix)
# tensor([[ 7,  8],
#         [ 9, 10]])
print(matrix.ndim) # 2
print(matrix.shape) # torch.Size([2, 2])

n-dimensional tensor

main.py
import torch
cube = torch.tensor([
    [[7, 8, 7], [9, 10, 6]],
    [[3, 4, 2], [1, 3, 2]],
    [[6, 4, 7], [3, 6, 2]],
    [[3, 6, 4], [6, 3, 1]],
])
print(cube.shape) # torch.Size([4, 2, 3])

Read the shape from the outside in: 4 blocks, each block has 2 rows, each row has 3 elements.

Try it — read shapes

For each tensor, predict the shape before running the code.

main.py
import torch
a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([[1], [2], [3]])
c = torch.tensor([[[1, 2]]])
print(a.shape)
print(b.shape)
print(c.shape)

The three attributes you’ll always check

Every tensor exposes three attributes you will look at constantly while debugging.

Attribute | Meaning
----------|--------
shape     | Size along each dimension
dtype     | Data type of the elements (torch.float32, torch.int64, …)
device    | Where the tensor lives (cpu or cuda:0)

main.py
import torch
x = torch.rand(3, 4)
print("shape :", x.shape)
print("dtype :", x.dtype)
print("device:", x.device)

Most bugs come from mismatches between these:

  • mixing float32 and float64 values in the same operation,
  • mixing tensors on cpu and cuda,
  • expecting a [B, C, H, W] shape and getting [C, H, W].

When something doesn’t work, print these three first.

Casting

main.py
import torch
x = torch.tensor([1, 2, 3])
print(x.dtype) # torch.int64
y = x.float() # cast to float32
print(y.dtype) # torch.float32
z = x.to(torch.float64) # explicit dtype
print(z.dtype) # torch.float64

Factory functions

Models start from random weights, masks need zeros, attention needs ones, ranges need arange. Memorise these; you will use them daily.

main.py
import torch
print(torch.zeros(2, 3)) # all zeros
print(torch.ones(2, 3)) # all ones
print(torch.full((2, 3), 7)) # filled with 7
print(torch.arange(0, 10, 2)) # [0, 2, 4, 6, 8]
print(torch.linspace(0, 1, 5)) # 5 equally spaced points 0..1
print(torch.eye(3)) # 3x3 identity matrix
print(torch.rand(2, 3)) # uniform [0, 1)
print(torch.randn(2, 3)) # normal mean 0 std 1
print(torch.randint(0, 10, (2, 3))) # integers in [0, 10)

Same shape as another tensor:

main.py
import torch
x = torch.rand(2, 3)
print(torch.zeros_like(x).shape) # torch.Size([2, 3])
print(torch.rand_like(x))

*_like functions copy the shape, dtype and device of an existing tensor — handy when you need a buffer.
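A quick check of that claim (a minimal sketch; the float64 source tensor here is just an example to make the copied dtype visible):

```python
import torch

x = torch.rand(2, 3, dtype=torch.float64)  # a float64 tensor on CPU
z = torch.zeros_like(x)                    # copies shape, dtype and device
print(z.shape)   # torch.Size([2, 3])
print(z.dtype)   # torch.float64, not the default float32
print(z.device)  # cpu
```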

Reproducibility

main.py
import torch
torch.manual_seed(42)
print(torch.rand(2, 3))
torch.manual_seed(42)
print(torch.rand(2, 3)) # same numbers

Set the seed at the top of every training script so your runs are comparable.

Operations

Element-wise arithmetic

main.py
import torch
x = torch.tensor([1, 2, 3])
print(x + 10) # tensor([11, 12, 13])
print(x * 2) # tensor([2, 4, 6])
print(x ** 2) # tensor([1, 4, 9])
print(torch.exp(x.float()))

In-place variants end with an underscore: x.add_(10) mutates x. Most of the time you should avoid them — non-mutating code is easier to reason about and plays better with autograd.
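To see the difference, here is a small sketch contrasting the two styles:

```python
import torch

x = torch.tensor([1, 2, 3])
y = x + 10   # non-mutating: x is unchanged, y is a new tensor
print(x)     # tensor([1, 2, 3])
x.add_(10)   # in-place: x itself is modified
print(x)     # tensor([11, 12, 13])
```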

Broadcasting

When shapes don’t match exactly, PyTorch tries to broadcast one tensor across the other. The rule, applied from the right:

Two dimensions are compatible when they are equal or one of them is 1.

main.py
import torch
a = torch.ones(3, 4) # shape (3, 4)
b = torch.tensor([1, 2, 3, 4]) # shape (4,)
print(a + b) # b is repeated for every row
c = torch.tensor([[10], [20], [30]]) # shape (3, 1)
print(a + c) # c is repeated across columns

If the rule fails, you get RuntimeError: The size of tensor a (...) must match the size of tensor b (...). The fix is almost always unsqueeze, view, or transpose so the shapes line up.
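For example, a (3,) vector will not broadcast column-wise against a (3, 4) matrix until you give it a trailing dimension of size 1. A sketch of the typical fix:

```python
import torch

a = torch.ones(3, 4)
b = torch.tensor([10, 20, 30])     # shape (3,)
# a + b                            # RuntimeError: 4 vs 3, shapes don't line up
print((a + b.unsqueeze(1)).shape)  # (3, 1) broadcasts with (3, 4) -> torch.Size([3, 4])
```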

Matrix multiplication

For dot product / matmul use @ or torch.matmul. The inner dimensions must match.

main.py
import torch
a = torch.rand(2, 3)
b = torch.rand(3, 4)
print((a @ b).shape) # torch.Size([2, 4])

Common mistake: passing two (N, M) matrices and expecting it to work. Transpose to align the inner dimensions:

main.py
import torch
a = torch.rand(2, 3)
b = torch.rand(2, 3)
print((a @ b.T).shape) # torch.Size([2, 2])

Aggregation

main.py
import torch
x = torch.arange(0, 100, 10, dtype=torch.float32)
print(x.min(), x.max()) # tensor(0.) tensor(90.)
print(x.mean(), x.sum()) # tensor(45.) tensor(450.)
print(x.argmin(), x.argmax()) # tensor(0) tensor(9)

mean() requires a floating dtype — cast with .float() first if you started with integers.
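A minimal demonstration of the dtype requirement:

```python
import torch

x = torch.arange(5)      # dtype torch.int64
try:
    x.mean()             # integer dtypes are not supported by mean()
except RuntimeError as e:
    print("error:", e)
print(x.float().mean())  # tensor(2.)
```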

You can also aggregate along a single axis:

main.py
import torch
x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=0)) # tensor([5., 7., 9.]) sum over rows -> per column
print(x.sum(dim=1)) # tensor([6., 15.]) sum over cols -> per row

A useful mnemonic: dim is the dimension that disappears.
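Relatedly, if you want the reduced dimension to stay around as size 1 (handy for broadcasting the result back), pass keepdim=True. A small sketch:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=1).shape)                # torch.Size([2])    - dim 1 disappears
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1]) - dim 1 kept as size 1
print(x - x.mean(dim=1, keepdim=True))   # centre each row via broadcasting
```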

Reshape, view, stack, squeeze, unsqueeze

main.py
import torch
x = torch.arange(1, 10)
print(x)
print(x.reshape(3, 3)) # change shape, same data
print(x.view(1, 9)) # alias of the same data
print(torch.stack([x, x], dim=0)) # add new outer dim
print(x.unsqueeze(dim=0).shape) # torch.Size([1, 9])
print(x.unsqueeze(dim=0).squeeze().shape) # torch.Size([9])
print(x.reshape(3, 3).T) # transpose
print(x.reshape(3, 3).permute(1, 0)) # equivalent for 2-D

view only works on contiguous memory; reshape always works (it copies if needed). When in doubt, use reshape.
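You can trigger the difference yourself: transposing makes a tensor non-contiguous, after which view fails but reshape silently copies (a sketch):

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.T                   # transpose: same data, non-contiguous strides
print(t.is_contiguous())  # False
try:
    t.view(6)             # view needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)
print(t.reshape(6))       # reshape copies when it has to
```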

Indexing

Tensors index like NumPy arrays:

main.py
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
print(x[0]) # first matrix
print(x[0, 1]) # second row of that matrix
print(x[0, 1, 2]) # the scalar at row 1, col 2
print(x[:, :, 0]) # first column of every matrix
print(x[0, :, ::2]) # every other column of the first matrix

Boolean masks are particularly useful:

main.py
import torch
x = torch.arange(10)
print(x[x > 5]) # tensor([6, 7, 8, 9])
x[x > 5] = 0 # zero out elements above 5
print(x)

Try it — broadcasting

For each pair, predict whether the shapes broadcast and if so, the result shape.

a.shape   | b.shape | Result
----------|---------|-------
(3, 1)    | (1, 4)  | ?
(2, 3)    | (3,)    | ?
(2, 3)    | (2,)    | ?
(5, 3, 4) | (4,)    | ?
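Once you have written down your predictions, you can check them with torch.broadcast_shapes, which applies the broadcasting rule to shapes without allocating any data:

```python
import torch

# Compatible pairs return the broadcast result shape.
print(torch.broadcast_shapes((3, 1), (1, 4)))
print(torch.broadcast_shapes((2, 3), (3,)))
print(torch.broadcast_shapes((5, 3, 4), (4,)))
# Incompatible pairs raise the same RuntimeError as the real operation would.
try:
    torch.broadcast_shapes((2, 3), (2,))
except RuntimeError as e:
    print("no broadcast:", e)
```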

NumPy interop

NumPy and PyTorch share memory layout for many dtypes and convert in O(1).

main.py
import numpy as np
import torch
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array) # numpy -> torch
back = tensor.numpy() # torch -> numpy
print(tensor)
print(back)

Two gotchas:

  • torch.from_numpy keeps the original dtype. NumPy floats are float64; PyTorch defaults to float32. Cast with .float() if the tensor is going into a model.
  • A tensor on the GPU cannot be converted to NumPy directly. Bring it back to CPU first: tensor.cpu().numpy().
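The O(1) conversion works because both objects share the same underlying buffer, which is a third gotcha worth seeing once: mutating one side mutates the other (a sketch):

```python
import numpy as np
import torch

array = np.arange(3.0)
tensor = torch.from_numpy(array)  # no copy: same underlying memory
array[0] = 99.0                   # change the NumPy side...
print(tensor)                     # ...and the tensor sees it: tensor([99., 1., 2.], dtype=torch.float64)
```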

Running on a GPU

main.py
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", device)
x = torch.tensor([1.0, 2.0, 3.0]).to(device)
print(x, x.device)

Two tensors must be on the same device to interact. The error message is loud and clear:

Terminal window
RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu!

The fix is .to(device) on the offender.

A typical pattern in training code:

device = "cuda" if torch.cuda.is_available() else "cpu"
model = MyModel().to(device)
for x, y in loader:
    x, y = x.to(device), y.to(device)
    ...

Exercises

Work through these in order. Each builds on the previous.

Warm-up

  1. Create a 3×3 tensor of ones, multiply it by 7, and check its dtype.
  2. Create a tensor with values 0, 0.1, 0.2, …, 1.0 (use linspace).
  3. Create a 5×5 identity matrix and confirm x @ x == x for it.
  4. Generate two random tensors of shape (3, 4) with the same seed and verify they are equal element-wise.

Shape gymnastics

  1. Create a random tensor of shape (7, 7). Print its shape, dtype and device.
  2. Multiply it by another random tensor of shape (1, 7) (element-wise, then matmul; for the matmul, transpose one operand so the inner dimensions match). What are the resulting shapes?
  3. Take a vector of length 12 and reshape it into (3, 4), then (2, 2, 3). Confirm the elements stay in row-major order.
  4. Given x = torch.arange(20).reshape(4, 5), extract the second row, the last column, and the bottom-right 2×2 block.

Aggregations

  1. For torch.arange(1, 101).float(), compute the mean, std, min, max, and the index of the maximum.
  2. Create a (3, 4) random tensor and compute the mean per row and per column. Use dim correctly.

Broadcasting

  1. Subtract the per-column mean from every row of a (10, 5) random tensor, so each column has mean ~0.
  2. Build a 5×5 multiplication table using broadcasting only — no Python loops.

GPU (optional, only if cuda is available)

  1. Move a random tensor to GPU, perform tensor + tensor, then move the result back to CPU and convert to a NumPy array.

Capstone — image manipulation with pure tensor ops

Now apply everything to a real image. We will:

  1. load an image and convert it to a tensor,
  2. inspect and reshape it,
  3. apply a few classic filters using only tensor operations,
  4. save the results.

Load an image as a tensor

torchvision reads images for us. The result is a [C, H, W] tensor of uint8 values in [0, 255].

capstone.py
import torch
import matplotlib.pyplot as plt
from torchvision.io import read_image
# Any small JPG/PNG works. You can use a photo of your own.
image = read_image("cat.jpg")
print(image.shape, image.dtype) # e.g. torch.Size([3, 300, 400]) torch.uint8
# matplotlib expects [H, W, C], so permute the axes.
plt.imshow(image.permute(1, 2, 0))
plt.axis("off")
plt.show()

If you don’t have an image handy:

import requests
from pathlib import Path
if not Path("cat.jpg").exists():
    url = "https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg"
    Path("cat.jpg").write_bytes(requests.get(url).content)

Step 1 — to grayscale

A common grayscale formula uses the weighted average of the RGB channels:

gray = 0.299 R + 0.587 G + 0.114 B

Build it with broadcasting and a sum:

capstone.py
weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
gray = (image.float() * weights).sum(dim=0)
print(gray.shape) # torch.Size([H, W])
plt.imshow(gray, cmap="gray")
plt.axis("off")
plt.show()

Notice the shapes: (3, 1, 1) * (3, H, W) broadcasts to (3, H, W), then sum(dim=0) collapses the channel dimension, leaving (H, W).

Step 2 — adjust brightness

Brightness is just adding a constant. Clamp the result to [0, 255] so it stays a valid image.

capstone.py
bright = (image.float() + 50).clamp(0, 255).to(torch.uint8)
plt.imshow(bright.permute(1, 2, 0))
plt.axis("off")
plt.show()

Step 3 — flip horizontally and crop

Flipping is flip on the width axis. Cropping is slicing.

capstone.py
flipped = image.flip(dims=[2])
plt.imshow(flipped.permute(1, 2, 0))
plt.show()
h, w = image.shape[1], image.shape[2]
crop = image[:, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
plt.imshow(crop.permute(1, 2, 0))
plt.show()

Step 4 — a 3×3 mean blur

A blur replaces every pixel with the average of its 3×3 neighborhood. We can implement that with a single torch.nn.functional.conv2d call. The kernel is a (out_channels, in_channels, kH, kW) tensor of 1/9 values.

capstone.py
import torch.nn.functional as F
kernel = torch.ones(1, 1, 3, 3) / 9.0
# conv2d expects [B, C, H, W] of floats; one channel at a time.
def blur(channel: torch.Tensor) -> torch.Tensor:
    x = channel.float().unsqueeze(0).unsqueeze(0)  # [1, 1, H, W]
    out = F.conv2d(x, kernel, padding=1)
    return out.squeeze().clamp(0, 255).to(torch.uint8)
blurred = torch.stack([blur(image[c]) for c in range(image.shape[0])], dim=0)
plt.imshow(blurred.permute(1, 2, 0))
plt.axis("off")
plt.show()

The trick is the shape juggling, not the math. Practice reading the comments — every line is a tensor reshape.

Step 5 — save the result

capstone.py
from torchvision.io import write_jpeg
write_jpeg(blurred, "blurred.jpg")
write_jpeg(bright, "bright.jpg")
write_jpeg(gray.to(torch.uint8).unsqueeze(0).repeat(3, 1, 1), "gray.jpg")

write_jpeg expects a [3, H, W] tensor, so the grayscale needs an unsqueeze + repeat to become a 3-channel image again.

Going further

If you finish early, try one of these:

  • Implement edge detection with a Sobel kernel
    sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]).float()
    sobel_y = sobel_x.T
    apply each, square, sum, take the square root.
  • Reduce the resolution by 2× using torch.nn.functional.avg_pool2d.
  • Solarize the image — invert pixels above a threshold: inverted = torch.where(image > 128, 255 - image, image).

Recap

  • A tensor is a multi-dimensional array. Its shape, dtype and device are the three things you debug with.
  • Factory functions (zeros, ones, rand, arange, linspace) cover most needs.
  • Broadcasting and matmul are the two operations that do most of the heavy lifting.
  • Reshape (view, reshape, unsqueeze, squeeze, permute) is what you’ll spend the most time on. When code doesn’t work, print shapes.
  • NumPy and tensors convert in O(1); GPU tensors must come back to CPU before NumPy.
  • Image processing is just tensor algebra. Even convolution can be lowered to a matrix multiplication under the hood.

The next chapter, Workflow, uses everything from this chapter to train an actual model.
