Tensors are the fundamental building block of machine learning in PyTorch — learn what they are, how to manipulate them, and how to use them on a GPU.
## Goal of the lesson
By the end of this 3-hour session you should be able to:
- explain what a tensor is and why machine learning frameworks are built around it,
- create tensors from Python data, NumPy arrays, and built-in factories,
- inspect and reshape tensors confidently,
- write small numerical programs using broadcasting and matrix multiplication,
- move computation to a GPU,
- read and write `[C, H, W]` image tensors and apply simple filters to them.
The tensor is the only data structure deep learning really has. Every model input, every weight, every gradient, every output is a tensor. Spending three hours getting comfortable with them pays off in every chapter that follows.
### Suggested timing
| Block | Topic |
|---|---|
| 30 min | Setup, what a tensor is, scalar/vector/matrix/n-dim |
| 30 min | Attributes (shape, dtype, device), factory functions |
| 45 min | Operations: arithmetic, broadcasting, matmul, reshape, indexing |
| 30 min | NumPy interop and GPU |
| 45 min | Capstone — image manipulation with pure tensor ops |
## Setup
This series targets Windows with uv as the Python project manager.
If you don’t have uv yet, install it from PowerShell:
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Create the project:
```powershell
uv init --python 3.12 tensor
cd tensor
uv add torch torchvision matplotlib pillow numpy
```

On Windows the default `torch` and `torchvision` wheels from PyPI are CPU-only. That is fine for everything in this chapter.
If you have an NVIDIA GPU and want CUDA, see Using uv with PyTorch and the AI - CUDA page. You will need to point uv at the PyTorch CUDA wheel index.
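As a rough sketch of what that looks like in `pyproject.toml`, assuming the CUDA 12.4 wheel index (pick the variant that matches your driver; see the linked pages for details):

```toml
# Sketch only: route torch/torchvision installs to the PyTorch CUDA index.
# "cu124" is an assumption; match it to your installed CUDA version.
[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true

[tool.uv.sources]
torch = [{ index = "pytorch-cu124" }]
torchvision = [{ index = "pytorch-cu124" }]
```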
Check the install:
```python
# main.py
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
```

Run it:

```powershell
uv run main.py
```

You should see the version number and `cuda available: False` (or `True` if you set up CUDA).
## What is a tensor?
A tensor is a multi-dimensional container for numbers. You already know its low-dimensional cousins:
| Math name | Tensor name | Example |
|---|---|---|
| Number | scalar (0-D) | `7` |
| List of numbers | vector (1-D) | `[1, 2, 3]` |
| Table | matrix (2-D) | `[[1, 2], [3, 4]]` |
| Cube of numbers | 3-D tensor | RGB image, time-series of frames |
| … | n-D tensor | Mini-batch of RGB images: `[batch, channels, height, width]` |
Why does deep learning need them?
- Hardware fits. GPUs are designed to execute the same operation on millions of numbers in parallel — exactly what tensor operations do.
- Calculus fits. Backpropagation reduces to repeated matrix multiplications and element-wise functions. Tensors are the natural type for both.
- Models fit. A neural network is essentially a stack of tensor operations. The “weights” of a layer are tensors and the data flowing through it is tensors.
## Creating tensors
### Scalar
```python
import torch

scalar = torch.tensor(7)
print(scalar)         # tensor(7)
print(scalar.ndim)    # 0
print(scalar.item())  # 7 (back to Python int)
```

`item()` only works on a tensor with a single element. Try calling it on a vector — you’ll get an error.
### Vector
```python
import torch

vector = torch.tensor([7, 7])
print(vector)        # tensor([7, 7])
print(vector.ndim)   # 1
print(vector.shape)  # torch.Size([2])
```

A quick trick to read dimensionality: count the number of opening square brackets `[` on one side. `[7, 7]` has one — therefore one dimension.
### Matrix
```python
import torch

matrix = torch.tensor([[7, 8], [9, 10]])
print(matrix)
# tensor([[ 7,  8],
#         [ 9, 10]])
print(matrix.ndim)   # 2
print(matrix.shape)  # torch.Size([2, 2])
```

### n-dimensional tensor
```python
import torch

cube = torch.tensor([
    [[7, 8, 7], [9, 10, 6]],
    [[3, 4, 2], [1, 3, 2]],
    [[6, 4, 7], [3, 6, 2]],
    [[3, 6, 4], [6, 3, 1]],
])
print(cube.shape)  # torch.Size([4, 2, 3])
```

Read the shape from the outside in: 4 blocks, each block has 2 rows, each row has 3 elements.
### Try it — read shapes
For each tensor, predict the shape before running the code.
```python
import torch

a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([[1], [2], [3]])
c = torch.tensor([[[1, 2]]])

print(a.shape)
print(b.shape)
print(c.shape)
```

Expected output:

```text
torch.Size([4])
torch.Size([3, 1])
torch.Size([1, 1, 2])
```

## The three attributes you’ll always check
Every tensor exposes three attributes you will look at constantly while debugging.
| Attribute | Meaning |
|---|---|
| `shape` | Size along each dimension |
| `dtype` | Data type of the elements (`torch.float32`, `torch.int64`, …) |
| `device` | Where the tensor lives (`cpu` or `cuda:0`) |
```python
import torch

x = torch.rand(3, 4)
print("shape :", x.shape)
print("dtype :", x.dtype)
print("device:", x.device)
```

Most bugs come from mismatches between these:
- mixing `float32` and `float64` values in the same operation,
- mixing tensors on `cpu` and `cuda`,
- expecting a `[B, C, H, W]` shape and getting `[C, H, W]`.
When something doesn’t work, print these three first.
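A minimal sketch of the first kind of mismatch (the exact error text varies between PyTorch versions):

```python
import torch

a = torch.rand(2, 3)                       # float32, the default
b = torch.rand(3, 2, dtype=torch.float64)  # float64

try:
    a @ b  # matmul refuses to mix float32 and float64
except RuntimeError as e:
    print("matmul failed:", e)

print((a @ b.float()).shape)  # cast first, then multiply: torch.Size([2, 2])
```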
### Casting
```python
import torch

x = torch.tensor([1, 2, 3])
print(x.dtype)  # torch.int64

y = x.float()  # cast to float32
print(y.dtype)  # torch.float32

z = x.to(torch.float64)  # explicit dtype
print(z.dtype)  # torch.float64
```

## Factory functions
Models start from random weights, masks need zeros, attention needs ones, ranges need `arange`. Memorise these; you will use them daily.
```python
import torch

print(torch.zeros(2, 3))             # all zeros
print(torch.ones(2, 3))              # all ones
print(torch.full((2, 3), 7))         # filled with 7
print(torch.arange(0, 10, 2))        # [0, 2, 4, 6, 8]
print(torch.linspace(0, 1, 5))       # 5 equally spaced points 0..1
print(torch.eye(3))                  # 3x3 identity matrix
print(torch.rand(2, 3))              # uniform [0, 1)
print(torch.randn(2, 3))             # normal, mean 0, std 1
print(torch.randint(0, 10, (2, 3)))  # integers in [0, 10)
```

Same shape as another tensor:
```python
import torch

x = torch.rand(2, 3)
print(torch.zeros_like(x).shape)  # torch.Size([2, 3])
print(torch.rand_like(x))
```

The `*_like` functions copy the shape, dtype and device of an existing tensor — handy when you need a buffer.
### Reproducibility
```python
import torch

torch.manual_seed(42)
print(torch.rand(2, 3))

torch.manual_seed(42)
print(torch.rand(2, 3))  # same numbers
```

Set the seed at the top of every training script so your runs are comparable.
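If you prefer not to touch global state, a dedicated `torch.Generator` gives the same reproducibility per call; a minimal sketch:

```python
import torch

# A seeded generator keeps reproducibility local instead of global.
g = torch.Generator().manual_seed(42)
print(torch.rand(2, 3, generator=g))

g = torch.Generator().manual_seed(42)
print(torch.rand(2, 3, generator=g))  # same numbers again
```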
## Operations
### Element-wise arithmetic
```python
import torch

x = torch.tensor([1, 2, 3])
print(x + 10)   # tensor([11, 12, 13])
print(x * 2)    # tensor([2, 4, 6])
print(x ** 2)   # tensor([1, 4, 9])
print(torch.exp(x.float()))
```

In-place variants end with an underscore: `x.add_(10)` mutates `x`. Most of the time you should avoid them — non-mutating code is easier to reason about and plays better with autograd.
### Broadcasting
When shapes don’t match exactly, PyTorch tries to broadcast one tensor across the other. The rule, applied from the right:

> Two dimensions are compatible when they are equal or one of them is `1`.
```python
import torch

a = torch.ones(3, 4)                  # shape (3, 4)
b = torch.tensor([1, 2, 3, 4])        # shape (4,)
print(a + b)  # b is repeated for every row

c = torch.tensor([[10], [20], [30]])  # shape (3, 1)
print(a + c)  # c is repeated across columns
```

If the rule fails, you get `RuntimeError: The size of tensor a (...) must match the size of tensor b (...)`. The fix is almost always `unsqueeze`, `view`, or `transpose` so the shapes line up.
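A minimal sketch of that failure and the `unsqueeze` fix:

```python
import torch

a = torch.ones(2, 3)
b = torch.tensor([10, 20])  # shape (2,): trailing dims 3 vs 2 do not broadcast

try:
    a + b
except RuntimeError as e:
    print("broadcast failed:", e)

print((a + b.unsqueeze(1)).shape)  # (2, 1) against (2, 3) -> torch.Size([2, 3])
```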
### Matrix multiplication
For dot product / matmul use `@` or `torch.matmul`. The inner dimensions must match.
```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 4)
print((a @ b).shape)  # torch.Size([2, 4])
```

Common mistake: passing two `(N, M)` matrices and expecting it to work. Transpose to align the inner dimensions:
```python
import torch

a = torch.rand(2, 3)
b = torch.rand(2, 3)
print((a @ b.T).shape)  # torch.Size([2, 2])
```

### Aggregation
```python
import torch

x = torch.arange(0, 100, 10, dtype=torch.float32)
print(x.min(), x.max())        # tensor(0.) tensor(90.)
print(x.mean(), x.sum())       # tensor(45.) tensor(450.)
print(x.argmin(), x.argmax())  # tensor(0) tensor(9)
```

`mean()` requires a floating dtype — cast with `.float()` first if you started with integers.
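A quick demonstration of what happens if you forget (the exact message varies by version):

```python
import torch

x = torch.arange(5)  # int64
try:
    x.mean()  # integer tensors have no mean()
except RuntimeError as e:
    print("mean failed:", e)

print(x.float().mean())  # tensor(2.)
```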
You can also aggregate along a single axis:
```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=0))  # tensor([5., 7., 9.])  sum over rows -> per column
print(x.sum(dim=1))  # tensor([6., 15.])     sum over cols -> per row
```

A useful mnemonic: `dim` is the dimension that disappears.
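If you need that dimension to survive as size 1 (usually so the result broadcasts back against the input), pass `keepdim=True`; a small sketch:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=1).shape)                # torch.Size([2]): dim 1 disappears
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1]): kept as size 1
print(x / x.sum(dim=1, keepdim=True))    # rows normalised to sum to 1
```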
### Reshape, view, stack, squeeze, unsqueeze
```python
import torch

x = torch.arange(1, 10)
print(x)
print(x.reshape(3, 3))             # change shape, same data
print(x.view(1, 9))                # alias of the same data
print(torch.stack([x, x], dim=0))  # add new outer dim
print(x.unsqueeze(dim=0).shape)    # torch.Size([1, 9])
print(x.unsqueeze(dim=0).squeeze().shape)  # torch.Size([9])

print(x.reshape(3, 3).T)              # transpose
print(x.reshape(3, 3).permute(1, 0))  # equivalent for 2-D
```

`view` only works on contiguous memory; `reshape` always works (it copies if needed). When in doubt, use `reshape`.
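A minimal sketch of the difference: transposing produces a non-contiguous view, which `view` rejects but `reshape` handles by copying.

```python
import torch

t = torch.arange(9).reshape(3, 3).T  # transpose: a non-contiguous view
print(t.is_contiguous())             # False

try:
    t.view(9)  # view needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)

print(t.reshape(9))  # reshape copies when it has to
```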
### Indexing
Tensors index like NumPy arrays:
```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)
print(x[0])          # first matrix
print(x[0, 1])       # second row of that matrix
print(x[0, 1, 2])    # the scalar at row 1, col 2
print(x[:, :, 0])    # first column of every matrix
print(x[0, :, ::2])  # every other column of the first matrix
```

Boolean masks are particularly useful:
```python
import torch

x = torch.arange(10)
print(x[x > 5])  # tensor([6, 7, 8, 9])
x[x > 5] = 0     # zero out elements above 5
print(x)
```

### Try it — broadcasting
For each pair, predict whether the shapes broadcast and if so, the result shape.
| `a.shape` | `b.shape` | Result |
|---|---|---|
| `(3, 1)` | `(1, 4)` | ? |
| `(2, 3)` | `(3,)` | ? |
| `(2, 3)` | `(2,)` | ? |
| `(5, 3, 4)` | `(4,)` | ? |
1. `(3, 4)` — both axes broadcast (`1` paired with the other dimension).
2. `(2, 3)` — `(3,)` aligns with the last dimension of `(2, 3)`.
3. Fails — `(2,)` aligns with `3`, neither is `1`. Use `unsqueeze(-1)` to make `b` shape `(2, 1)` so it broadcasts to `(2, 3)`.
4. `(5, 3, 4)` — `(4,)` aligns with the last axis.
## NumPy interop
NumPy and PyTorch share memory layout for many dtypes and convert in O(1).
```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)  # numpy -> torch
back = tensor.numpy()             # torch -> numpy

print(tensor)
print(back)
```

Two gotchas:
- `torch.from_numpy` keeps the original dtype. NumPy floats are `float64`; PyTorch defaults to `float32`. Cast with `.float()` if the tensor is going into a model.
- A tensor on the GPU cannot be converted to NumPy directly. Bring it back to CPU first: `tensor.cpu().numpy()`.
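The O(1) conversion works because both sides share the same buffer, which means a mutation on one side is visible on the other; a quick demonstration:

```python
import numpy as np
import torch

array = np.zeros(3)
tensor = torch.from_numpy(array)  # shares memory with `array`, no copy

array[0] = 99.0
print(tensor)  # tensor([99., 0., 0.], dtype=torch.float64): same buffer
```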
## Running on a GPU
```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", device)

x = torch.tensor([1.0, 2.0, 3.0]).to(device)
print(x, x.device)
```

Two tensors must be on the same device to interact. The error message is loud and clear:
```text
RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu!
```

The fix is `.to(device)` on the offender.
A typical pattern in training code:
```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MyModel().to(device)

for x, y in loader:
    x, y = x.to(device), y.to(device)
    ...
```

## Exercises
Work through these in order. Each builds on the previous.
### Warm-up
1. Create a 3×3 tensor of ones, multiply it by 7, and check its dtype.
2. Create a tensor with values `0, 0.1, 0.2, …, 1.0` (use `linspace`).
3. Create a 5×5 identity matrix and confirm `x @ x == x` for it.
4. Generate two random tensors of shape `(3, 4)` with the same seed and verify they are equal element-wise.
### Shape gymnastics
5. Create a random tensor of shape `(7, 7)`. Print its shape, dtype and device.
6. Multiply it by another random tensor of shape `(1, 7)` (element-wise, then matmul — hint: the matmul needs a transpose). What are the resulting shapes?
7. Take a vector of length 12 and reshape it into `(3, 4)`, then `(2, 2, 3)`. Confirm the elements stay in row-major order.
8. Given `x = torch.arange(20).reshape(4, 5)`, extract the second row, the last column, and the bottom-right 2×2 block.
### Aggregations
9. For `torch.arange(1, 101).float()`, compute the mean, std, min, max, and the index of the maximum.
10. Create a `(3, 4)` random tensor and compute the mean per row and per column. Use `dim` correctly.
### Broadcasting
11. Subtract the per-column mean from every row of a `(10, 5)` random tensor, so each column has mean ~0.
12. Build a 5×5 multiplication table using broadcasting only — no Python loops.
### GPU (optional, only if CUDA is available)
13. Move a random tensor to GPU, perform `tensor + tensor`, then move the result back to CPU and convert to a NumPy array.
For exercise 11:
```python
import torch

x = torch.rand(10, 5)
mean = x.mean(dim=0, keepdim=True)  # shape (1, 5)
centered = x - mean
print(centered.mean(dim=0))  # close to zero
```

For exercise 12:
```python
import torch

a = torch.arange(1, 6).unsqueeze(0)  # shape (1, 5)
b = torch.arange(1, 6).unsqueeze(1)  # shape (5, 1)
print(a * b)  # shape (5, 5) multiplication table
```

## Capstone — image manipulation with pure tensor ops
Now apply everything to a real image. We will:
- load an image and convert it to a tensor,
- inspect and reshape it,
- apply a few classic filters using only tensor operations,
- save the results.
### Load an image as a tensor
`torchvision` reads images for us. The result is a `[C, H, W]` tensor of `uint8` values in `[0, 255]`.
```python
import torch
import matplotlib.pyplot as plt
from torchvision.io import read_image

# Any small JPG/PNG works. You can use a photo of your own.
image = read_image("cat.jpg")
print(image.shape, image.dtype)  # e.g. torch.Size([3, 300, 400]) torch.uint8

# matplotlib expects [H, W, C], so permute the axes.
plt.imshow(image.permute(1, 2, 0))
plt.axis("off")
plt.show()
```

If you don’t have an image handy, download one (this snippet uses `requests`, which you may need to add with `uv add requests`):
```python
import requests
from pathlib import Path

if not Path("cat.jpg").exists():
    url = "https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg"
    Path("cat.jpg").write_bytes(requests.get(url).content)
```

### Step 1 — to grayscale
A common grayscale formula uses a weighted average of the RGB channels: `gray = 0.299*R + 0.587*G + 0.114*B`. Build it with broadcasting and a sum:
```python
weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
gray = (image.float() * weights).sum(dim=0)
print(gray.shape)  # torch.Size([H, W])

plt.imshow(gray, cmap="gray")
plt.axis("off")
plt.show()
```

Notice the shapes: `(3, 1, 1) * (3, H, W)` broadcasts to `(3, H, W)`, then `sum(dim=0)` collapses the channel dimension, leaving `(H, W)`.
### Step 2 — adjust brightness
Brightness is just adding a constant. Clamp the result to `[0, 255]` so it stays a valid image.
```python
bright = (image.float() + 50).clamp(0, 255).to(torch.uint8)
plt.imshow(bright.permute(1, 2, 0))
plt.axis("off")
plt.show()
```

### Step 3 — flip horizontally and crop
Flipping is `flip` on the width axis. Cropping is slicing.
```python
flipped = image.flip(dims=[2])
plt.imshow(flipped.permute(1, 2, 0))
plt.show()

h, w = image.shape[1], image.shape[2]
crop = image[:, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
plt.imshow(crop.permute(1, 2, 0))
plt.show()
```

### Step 4 — a 3×3 mean blur
A blur replaces every pixel with the average of its 3×3 neighborhood. We can implement that with a single `torch.nn.functional.conv2d` call. The kernel is a `(out_channels, in_channels, kH, kW)` tensor of `1/9` values.
```python
import torch.nn.functional as F

kernel = torch.ones(1, 1, 3, 3) / 9.0

# conv2d expects [B, C, H, W] of floats; one channel at a time.
def blur(channel: torch.Tensor) -> torch.Tensor:
    x = channel.float().unsqueeze(0).unsqueeze(0)  # [1, 1, H, W]
    out = F.conv2d(x, kernel, padding=1)
    return out.squeeze().clamp(0, 255).to(torch.uint8)

blurred = torch.stack([blur(image[c]) for c in range(image.shape[0])], dim=0)

plt.imshow(blurred.permute(1, 2, 0))
plt.axis("off")
plt.show()
```

The trick is the shape juggling, not the math. Practice reading the comments — every line is a tensor reshape.
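As a variant, `conv2d` can also process all three channels in one call with `groups=3` (one independent kernel per channel); a sketch, assuming `image` from the loading step:

```python
import torch
import torch.nn.functional as F

# groups=3 applies one kernel per channel independently, so the weight
# tensor has shape (out_channels=3, in_channels/groups=1, 3, 3).
kernel3 = torch.ones(3, 1, 3, 3) / 9.0
x = image.float().unsqueeze(0)  # [1, 3, H, W]
out = F.conv2d(x, kernel3, padding=1, groups=3)
blurred_fast = out.squeeze(0).clamp(0, 255).to(torch.uint8)
print(blurred_fast.shape)  # torch.Size([3, H, W]), same as the loop version
```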
### Step 5 — save the result
```python
from torchvision.io import write_jpeg

write_jpeg(blurred, "blurred.jpg")
write_jpeg(bright, "bright.jpg")
write_jpeg(gray.to(torch.uint8).unsqueeze(0).repeat(3, 1, 1), "gray.jpg")
```

`write_jpeg` expects a `[3, H, W]` tensor, so the grayscale needs an `unsqueeze` + `repeat` to become a 3-channel image again.
## Going further
If you finish early, try one of these:
- Implement edge detection with Sobel kernels: apply each, square, sum, take the square root.

  ```python
  sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]).float()
  sobel_y = sobel_x.T
  ```
- Reduce the resolution by 2× using `torch.nn.functional.avg_pool2d`.
- Solarize the image — invert pixels above a threshold: `inverted = torch.where(image > 128, 255 - image, image)`.
## Recap
- A tensor is a multi-dimensional array. Its `shape`, `dtype` and `device` are the three things you debug with.
- Factory functions (`zeros`, `ones`, `rand`, `arange`, `linspace`) cover most needs.
- Broadcasting and matmul are the two operations that do most of the heavy lifting.
- Reshaping (`view`, `reshape`, `unsqueeze`, `squeeze`, `permute`) is what you’ll spend the most time on. When code doesn’t work, print shapes.
- NumPy and tensors convert in O(1); GPU tensors must come back to CPU before NumPy.
- Image processing is just tensor algebra. Convolution is one matmul under the hood.
The next chapter, Workflow, uses everything from this chapter to train an actual model.