PyTorch - Tensor

Tensors are the fundamental building block of machine learning in PyTorch — learn what they are, how to manipulate them, and how to use them on a GPU.

Goal of the lesson

By the end of this 3-hour session you should be able to:

  • explain what a tensor is and why machine learning frameworks are built around it,
  • create tensors from Python data, NumPy arrays, and built-in factories,
  • inspect and reshape tensors confidently,
  • write small numerical programs using broadcasting and matrix multiplication,
  • move computation to a GPU,
  • read and write [C, H, W] image tensors and apply simple filters to them.

The tensor is the only data structure deep learning really has. Every model input, every weight, every gradient, every output is a tensor. Spending three hours getting comfortable with them pays off in every chapter that follows.

Suggested timing

Block  | Topic
-------|------
30 min | Setup, what a tensor is, scalar/vector/matrix/n-dim
30 min | Attributes (shape, dtype, device), factory functions
45 min | Operations: arithmetic, broadcasting, matmul, reshape, indexing
30 min | NumPy interop and GPU
45 min | Capstone — image manipulation with pure tensor ops

Setup

This series targets Windows with uv as the Python project manager.

If you don’t have uv yet, install it from PowerShell:

PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Create the project:

Terminal window
uv init --python 3.12 tensor
cd tensor
uv add torch torchvision matplotlib pillow numpy
Note

On Windows the default torch and torchvision wheels from PyPI are CPU-only. That is fine for everything in this chapter.

If you have an NVIDIA GPU and want CUDA, see Using uv with PyTorch and the AI - CUDA page. You will need to point uv at the PyTorch CUDA wheel index.

Check the install:

main.py
import torch
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
Terminal window
uv run main.py

You should see the version number and cuda available: False (or True if you set up CUDA).

What is a tensor?

A tensor is a multi-dimensional container for numbers. You already know its low-dimensional cousins:

Math name       | Tensor name  | Example
----------------|--------------|--------
Number          | scalar (0-D) | 7
List of numbers | vector (1-D) | [1, 2, 3]
Table           | matrix (2-D) | [[1, 2], [3, 4]]
Cube of numbers | 3-D tensor   | RGB image, time-series of frames
                | n-D tensor   | Mini-batch of RGB images: [batch, channels, height, width]

Why does deep learning need them?

  • Hardware fits. GPUs are designed to execute the same operation on millions of numbers in parallel — exactly what tensor operations do.
  • Calculus fits. Backpropagation reduces to repeated matrix multiplications and element-wise functions. Tensors are the natural type for both.
  • Models fit. A neural network is essentially a stack of tensor operations. The “weights” of a layer are tensors and the data flowing through it is tensors.

Creating tensors

Scalar

main.py
import torch
scalar = torch.tensor(7)
print(scalar) # tensor(7)
print(scalar.ndim) # 0
print(scalar.item()) # 7 (back to Python int)

item() only works on a tensor with a single element. Try calling it on a vector — you’ll get an error.

Vector

main.py
import torch
vector = torch.tensor([7, 7])
print(vector) # tensor([7, 7])
print(vector.ndim) # 1
print(vector.shape) # torch.Size([2])

A quick trick to read dimensionality: count the number of opening square brackets [ on one side. [7, 7] has one — therefore one dimension.

Matrix

main.py
import torch
matrix = torch.tensor([[7, 8], [9, 10]])
print(matrix)
# tensor([[ 7,  8],
#         [ 9, 10]])
print(matrix.ndim) # 2
print(matrix.shape) # torch.Size([2, 2])

n-dimensional tensor

main.py
import torch
cube = torch.tensor([
    [[7, 8, 7], [9, 10, 6]],
    [[3, 4, 2], [1, 3, 2]],
    [[6, 4, 7], [3, 6, 2]],
    [[3, 6, 4], [6, 3, 1]],
])
print(cube.shape) # torch.Size([4, 2, 3])

Read the shape from the outside in: 4 blocks, each block has 2 rows, each row has 3 elements.

Try it — read shapes

For each tensor, predict the shape before running the code.

main.py
import torch
a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([[1], [2], [3]])
c = torch.tensor([[[1, 2]]])
print(a.shape)
print(b.shape)
print(c.shape)

The three attributes you’ll always check

Every tensor exposes three attributes you will look at constantly while debugging.

Attribute | Meaning
----------|--------
shape     | Size along each dimension
dtype     | Data type of the elements (torch.float32, torch.int64, …)
device    | Where the tensor lives (cpu or cuda:0)

main.py
import torch
x = torch.rand(3, 4)
print("shape :", x.shape)
print("dtype :", x.dtype)
print("device:", x.device)

Most bugs come from mismatches between these:

  • mixing float32 and float64 values in the same operation,
  • mixing tensors on cpu and cuda,
  • expecting a [B, C, H, W] shape and getting [C, H, W].

When something doesn’t work, print these three first.

Casting

main.py
import torch
x = torch.tensor([1, 2, 3])
print(x.dtype) # torch.int64
y = x.float() # cast to float32
print(y.dtype) # torch.float32
z = x.to(torch.float64) # explicit dtype
print(z.dtype) # torch.float64

Factory functions

Models start from random weights, masks need zeros, attention needs ones, ranges need arange. Memorise these; you will use them daily.

main.py
import torch
print(torch.zeros(2, 3)) # all zeros
print(torch.ones(2, 3)) # all ones
print(torch.full((2, 3), 7)) # filled with 7
print(torch.arange(0, 10, 2)) # [0, 2, 4, 6, 8]
print(torch.linspace(0, 1, 5)) # 5 equally spaced points 0..1
print(torch.eye(3)) # 3x3 identity matrix
print(torch.rand(2, 3)) # uniform [0, 1)
print(torch.randn(2, 3)) # normal mean 0 std 1
print(torch.randint(0, 10, (2, 3))) # integers in [0, 10)

Same shape as another tensor:

main.py
import torch
x = torch.rand(2, 3)
print(torch.zeros_like(x).shape) # torch.Size([2, 3])
print(torch.rand_like(x))

*_like functions copy the shape, dtype and device of an existing tensor — handy when you need a buffer.
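A quick check of that claim (a minimal sketch; the float64 source tensor here is just an example to make the copied dtype visible):

```python
import torch

x = torch.rand(2, 3, dtype=torch.float64)  # a float64 tensor on CPU
z = torch.zeros_like(x)                    # copies shape, dtype and device
print(z.shape)   # torch.Size([2, 3])
print(z.dtype)   # torch.float64, not the default float32
print(z.device)  # cpu
```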

Reproducibility

main.py
import torch
torch.manual_seed(42)
print(torch.rand(2, 3))
torch.manual_seed(42)
print(torch.rand(2, 3)) # same numbers

Set the seed at the top of every training script so your runs are comparable.

Operations

Element-wise arithmetic

main.py
import torch
x = torch.tensor([1, 2, 3])
print(x + 10) # tensor([11, 12, 13])
print(x * 2) # tensor([2, 4, 6])
print(x ** 2) # tensor([1, 4, 9])
print(torch.exp(x.float()))

In-place variants end with an underscore: x.add_(10) mutates x. Most of the time you should avoid them — non-mutating code is easier to reason about and plays better with autograd.
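To see the difference, here is a small sketch contrasting the two styles:

```python
import torch

x = torch.tensor([1, 2, 3])
y = x + 10   # non-mutating: x is unchanged, y is a new tensor
print(x)     # tensor([1, 2, 3])
x.add_(10)   # in-place: x itself is modified
print(x)     # tensor([11, 12, 13])
```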

Broadcasting

When shapes don’t match exactly, PyTorch tries to broadcast one tensor across the other. The rule, applied from the right:

Two dimensions are compatible when they are equal or one of them is 1.

main.py
import torch
a = torch.ones(3, 4) # shape (3, 4)
b = torch.tensor([1, 2, 3, 4]) # shape (4,)
print(a + b) # b is repeated for every row
c = torch.tensor([[10], [20], [30]]) # shape (3, 1)
print(a + c) # c is repeated across columns

If the rule fails, you get RuntimeError: The size of tensor a (...) must match the size of tensor b (...). The fix is almost always unsqueeze, view, or transpose so the shapes line up.
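For example, a (3,) vector will not broadcast column-wise against a (3, 4) matrix until you give it a trailing dimension of size 1. A sketch of the typical fix:

```python
import torch

a = torch.ones(3, 4)
b = torch.tensor([10, 20, 30])     # shape (3,)
# a + b                            # RuntimeError: 4 vs 3, shapes don't line up
print((a + b.unsqueeze(1)).shape)  # (3, 1) broadcasts with (3, 4) -> torch.Size([3, 4])
```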

Matrix multiplication

For dot product / matmul use @ or torch.matmul. The inner dimensions must match.

main.py
import torch
a = torch.rand(2, 3)
b = torch.rand(3, 4)
print((a @ b).shape) # torch.Size([2, 4])

Common mistake: passing two (N, M) matrices and expecting it to work. Transpose to align the inner dimensions:

main.py
import torch
a = torch.rand(2, 3)
b = torch.rand(2, 3)
print((a @ b.T).shape) # torch.Size([2, 2])

Aggregation

main.py
import torch
x = torch.arange(0, 100, 10, dtype=torch.float32)
print(x.min(), x.max()) # tensor(0.) tensor(90.)
print(x.mean(), x.sum()) # tensor(45.) tensor(450.)
print(x.argmin(), x.argmax()) # tensor(0) tensor(9)

mean() requires a floating dtype — cast with .float() first if you started with integers.
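A minimal demonstration of the dtype requirement:

```python
import torch

x = torch.arange(5)      # dtype torch.int64
try:
    x.mean()             # integer dtypes are not supported by mean()
except RuntimeError as e:
    print("error:", e)
print(x.float().mean())  # tensor(2.)
```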

You can also aggregate along a single axis:

main.py
import torch
x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=0)) # tensor([5., 7., 9.]) sum over rows -> per column
print(x.sum(dim=1)) # tensor([6., 15.]) sum over cols -> per row

A useful mnemonic: dim is the dimension that disappears.
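Relatedly, if you want the reduced dimension to stay around as size 1 (handy for broadcasting the result back), pass keepdim=True. A small sketch:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=1).shape)                # torch.Size([2])    - dim 1 disappears
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1]) - dim 1 kept as size 1
print(x - x.mean(dim=1, keepdim=True))   # centre each row via broadcasting
```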

Reshape, view, stack, squeeze, unsqueeze

main.py
import torch
x = torch.arange(1, 10)
print(x)
print(x.reshape(3, 3)) # change shape, same data
print(x.view(1, 9)) # alias of the same data
print(torch.stack([x, x], dim=0)) # add new outer dim
print(x.unsqueeze(dim=0).shape) # torch.Size([1, 9])
print(x.unsqueeze(dim=0).squeeze().shape) # torch.Size([9])
print(x.reshape(3, 3).T) # transpose
print(x.reshape(3, 3).permute(1, 0)) # equivalent for 2-D

view only works on contiguous memory; reshape always works (it copies if needed). When in doubt, use reshape.
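You can trigger the difference yourself: transposing makes a tensor non-contiguous, after which view fails but reshape silently copies (a sketch):

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.T                   # transpose: same data, non-contiguous strides
print(t.is_contiguous())  # False
try:
    t.view(6)             # view needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)
print(t.reshape(6))       # reshape copies when it has to
```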

Indexing

Tensors index like NumPy arrays:

main.py
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
print(x[0]) # first matrix
print(x[0, 1]) # second row of that matrix
print(x[0, 1, 2]) # the scalar at row 1, col 2
print(x[:, :, 0]) # first column of every matrix
print(x[0, :, ::2]) # every other column of the first matrix

Boolean masks are particularly useful:

main.py
import torch
x = torch.arange(10)
print(x[x > 5]) # tensor([6, 7, 8, 9])
x[x > 5] = 0 # zero out elements above 5
print(x)

Try it — broadcasting

For each pair, predict whether the shapes broadcast and if so, the result shape.

a.shape   | b.shape | Result
----------|---------|-------
(3, 1)    | (1, 4)  | ?
(2, 3)    | (3,)    | ?
(2, 3)    | (2,)    | ?
(5, 3, 4) | (4,)    | ?
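Once you have written down your predictions, you can check them with torch.broadcast_shapes, which applies the broadcasting rule to shapes without allocating any data:

```python
import torch

# Compatible pairs return the broadcast result shape.
print(torch.broadcast_shapes((3, 1), (1, 4)))
print(torch.broadcast_shapes((2, 3), (3,)))
print(torch.broadcast_shapes((5, 3, 4), (4,)))
# Incompatible pairs raise the same RuntimeError as the real operation would.
try:
    torch.broadcast_shapes((2, 3), (2,))
except RuntimeError as e:
    print("no broadcast:", e)
```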

NumPy interop

NumPy and PyTorch share memory layout for many dtypes and convert in O(1).

main.py
import numpy as np
import torch
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array) # numpy -> torch
back = tensor.numpy() # torch -> numpy
print(tensor)
print(back)

Two gotchas:

  • torch.from_numpy keeps the original dtype. NumPy floats are float64; PyTorch defaults to float32. Cast with .float() if the tensor is going into a model.
  • A tensor on the GPU cannot be converted to NumPy directly. Bring it back to CPU first: tensor.cpu().numpy().
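The O(1) conversion works because both objects share the same underlying buffer, which is a third gotcha worth seeing once: mutating one side mutates the other (a sketch):

```python
import numpy as np
import torch

array = np.arange(3.0)
tensor = torch.from_numpy(array)  # no copy: same underlying memory
array[0] = 99.0                   # change the NumPy side...
print(tensor)                     # ...and the tensor sees it: tensor([99., 1., 2.], dtype=torch.float64)
```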

Running on a GPU

main.py
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", device)
x = torch.tensor([1.0, 2.0, 3.0]).to(device)
print(x, x.device)

Two tensors must be on the same device to interact. The error message is loud and clear:

Terminal window
RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu!

The fix is .to(device) on the offender.

A typical pattern in training code:

device = "cuda" if torch.cuda.is_available() else "cpu"
model = MyModel().to(device)
for x, y in loader:
    x, y = x.to(device), y.to(device)
    ...

Exercises

Work through these in order. Each builds on the previous.

Warm-up

  1. Create a 3×3 tensor of ones, multiply it by 7, and check its dtype.
  2. Create a tensor with values 0, 0.1, 0.2, …, 1.0 (use linspace).
  3. Create a 5×5 identity matrix and confirm x @ x == x for it.
  4. Generate two random tensors of shape (3, 4) with the same seed and verify they are equal element-wise.

Shape gymnastics

  1. Create a random tensor of shape (7, 7). Print its shape, dtype and device.
  2. Multiply it by another random tensor of shape (1, 7) (element-wise, then matmul; for the matmul, transpose one operand so the inner dimensions match). What are the resulting shapes?
  3. Take a vector of length 12 and reshape it into (3, 4), then (2, 2, 3). Confirm the elements stay in row-major order.
  4. Given x = torch.arange(20).reshape(4, 5), extract the second row, the last column, and the bottom-right 2×2 block.

Aggregations

  1. For torch.arange(1, 101).float(), compute the mean, std, min, max, and the index of the maximum.
  2. Create a (3, 4) random tensor and compute the mean per row and per column. Use dim correctly.

Broadcasting

  1. Subtract the per-column mean from every row of a (10, 5) random tensor, so each column has mean ~0.
  2. Build a 5×5 multiplication table using broadcasting only — no Python loops.

GPU (optional, only if cuda is available)

  1. Move a random tensor to GPU, perform tensor + tensor, then move the result back to CPU and convert to a NumPy array.

Capstone — image manipulation with pure tensor ops

Now apply everything to a real image. We will:

  1. load an image and convert it to a tensor,
  2. inspect and reshape it,
  3. apply a few classic filters using only tensor operations,
  4. save the results.

Load an image as a tensor

torchvision reads images for us. The result is a [C, H, W] tensor of uint8 values in [0, 255].

capstone.py
import torch
import matplotlib.pyplot as plt
from torchvision.io import read_image
# Any small JPG/PNG works. You can use a photo of your own.
image = read_image("cat.jpg")
print(image.shape, image.dtype) # e.g. torch.Size([3, 300, 400]) torch.uint8
# matplotlib expects [H, W, C], so permute the axes.
plt.imshow(image.permute(1, 2, 0))
plt.axis("off")
plt.show()

If you don’t have an image handy:

import requests
from pathlib import Path
if not Path("cat.jpg").exists():
    url = "https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg"
    Path("cat.jpg").write_bytes(requests.get(url).content)

Step 1 — to grayscale

A common grayscale formula uses the weighted average of the RGB channels:

gray = 0.299 R + 0.587 G + 0.114 B

Build it with broadcasting and a sum:

capstone.py
weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
gray = (image.float() * weights).sum(dim=0)
print(gray.shape) # torch.Size([H, W])
plt.imshow(gray, cmap="gray")
plt.axis("off")
plt.show()

Notice the shapes: (3, 1, 1) * (3, H, W) broadcasts to (3, H, W), then sum(dim=0) collapses the channel dimension, leaving (H, W).

Step 2 — adjust brightness

Brightness is just adding a constant. Clamp the result to [0, 255] so it stays a valid image.

capstone.py
bright = (image.float() + 50).clamp(0, 255).to(torch.uint8)
plt.imshow(bright.permute(1, 2, 0))
plt.axis("off")
plt.show()

Step 3 — flip horizontally and crop

Flipping is flip on the width axis. Cropping is slicing.

capstone.py
flipped = image.flip(dims=[2])
plt.imshow(flipped.permute(1, 2, 0))
plt.show()
h, w = image.shape[1], image.shape[2]
crop = image[:, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
plt.imshow(crop.permute(1, 2, 0))
plt.show()

Step 4 — a 3×3 mean blur

A blur replaces every pixel with the average of its 3×3 neighborhood. We can implement that with a single torch.nn.functional.conv2d call. The kernel is a (out_channels, in_channels, kH, kW) tensor of 1/9 values.

capstone.py
import torch.nn.functional as F
kernel = torch.ones(1, 1, 3, 3) / 9.0
# conv2d expects [B, C, H, W] of floats; one channel at a time.
def blur(channel: torch.Tensor) -> torch.Tensor:
    x = channel.float().unsqueeze(0).unsqueeze(0)  # [1, 1, H, W]
    out = F.conv2d(x, kernel, padding=1)
    return out.squeeze().clamp(0, 255).to(torch.uint8)
blurred = torch.stack([blur(image[c]) for c in range(image.shape[0])], dim=0)
plt.imshow(blurred.permute(1, 2, 0))
plt.axis("off")
plt.show()

The trick is the shape juggling, not the math. Practice reading the comments — every line is a tensor reshape.

Step 5 — save the result

capstone.py
from torchvision.io import write_jpeg
write_jpeg(blurred, "blurred.jpg")
write_jpeg(bright, "bright.jpg")
write_jpeg(gray.to(torch.uint8).unsqueeze(0).repeat(3, 1, 1), "gray.jpg")

write_jpeg expects a [3, H, W] tensor, so the grayscale needs an unsqueeze + repeat to become a 3-channel image again.

Going further

If you finish early, try one of these:

  • Implement edge detection with a Sobel kernel
    sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]).float()
    sobel_y = sobel_x.T
    apply each, square, sum, take the square root.
  • Reduce the resolution by 2× using torch.nn.functional.avg_pool2d.
  • Solarize the image — invert pixels above a threshold: inverted = torch.where(image > 128, 255 - image, image).

Recap

  • A tensor is a multi-dimensional array. Its shape, dtype and device are the three things you debug with.
  • Factory functions (zeros, ones, rand, arange, linspace) cover most needs.
  • Broadcasting and matmul are the two operations that do most of the heavy lifting.
  • Reshape (view, reshape, unsqueeze, squeeze, permute) is what you’ll spend the most time on. When code doesn’t work, print shapes.
  • NumPy and tensors convert in O(1); GPU tensors must come back to CPU before NumPy.
  • Image processing is just tensor algebra. Even convolution can be lowered to a matrix multiplication under the hood.

The next chapter, Workflow, uses everything from this chapter to train an actual model.
