Start Here

Where abstraction ends.

You have found The Software Frontier. Most people arrive through a single essay (a CUDA piece, a postmortem, something a colleague forwarded) and have no idea what else lives in the archive.

This page is the actual map.

Three threads run through the publication. Find the one closest to what you actually came for, and start there.

“I write CUDA. I want to understand what’s really happening under the kernel.”

This is the spine of the publication. The Mastering CUDA and High-Performance Computing series goes from the compiler frontend all the way down to SASS, the warp scheduler, the memory hierarchy, and software pipelining. Ten parts so far. Read in order.

Part I — LLVM internals, the toolchain, and what nvcc actually does to your kernel. The entry point.
Part II — execution model and the warp scheduler.
Part III — the submission pipeline. What happens between cudaLaunchKernel and the SM.
Part IV — the memory hierarchy in mechanical detail.
Part V — the instruction pipeline.
Part VI — cp.async and the asynchronous-copy execution model.
Part VII — software pipelining and multi-stage cp.async.
Part VIII — the roofline model in practice.
Part IX and Part X — the end of the series.

After all of this, if you want the complete reference (Hopper, Blackwell, the SM in mechanical detail, WMMA/WGMMA/UMMA, TMA, NCCL internals, CUTLASS 4, the full Nsight workflow), the guide is the natural next step.

→ CUDA Mastery 2026 — 27 chapters, 5 appendices, fact-checked against PTX ISA 8.7 and primary architecture whitepapers. €89.

“I run systems in production. I want to understand why they really fail.”

The strongest single entry point into the publication.

How Systems Really Fail, Part I opens with the November 2025 Cloudflare ClickHouse incident, threads through FLP impossibility, CAP, and Lamport on clocks, and argues that production failures at scale are structurally epistemic problems before they are technical ones.

Part II extends this into the observer problem: why dashboards lie, why aggregation destroys signal, and the gap between the system that exists and the system you can see.

For the kernel-level view on the same territory, the three-part Katran series traces how Meta turned the Linux kernel itself into a planet-scale load balancer:

How Meta turned the Linux Kernel into a planet-scale Load Balancer, Part I — why hyperscale traffic outgrew userspace proxies.
Part II — DSR, XDP, the offensively simple datapath.
Part III — stateless routing, eBPF maps, and statelessness as a failure-handling strategy.

“I want to read the ideas underneath.”

A smaller set of essays sits where the technical work meets the broader questions. Not the main thread, but the place readers go when they want to know what the rest of the publication is built on.

Thinking in Systems — what fifteen years of writing code does to how you see the world.
Why Everyone Should Learn Systems Thinking as an Engineer — the case for systems thinking as a primary literacy, not a specialist skill.
The Age of Synthetic Thought — what changes when machines become co-authors of cognition rather than instruments.
The Future of Computer Science — biology, optics, molecular computing, and the dissolving boundary between hardware and environment.
Can Time Be Computed, Part I and Part II — closed timelike curves, the Wheeler–DeWitt equation, and why the deepest limits in systems are ontological rather than computational.
How I Rewired My Brain for Creative Engineering — engineering as the discipline of slack.
Navigating the Techno-Future — agency, prudence, and what technological neutrality actually requires.

Free vs. paid

Free gives you most of what we publish, including the foundational pieces from each thread: the Mastering CUDA series, the Katran series, How Systems Really Fail, and the long-form essays linked above. If you only ever read the free tier, you will still get the core of the publication.

Paid funds the research time that makes the depth possible. Paid subscribers get the full archive, the discussion threads, and the deep-dive series that can take as long as two months to write.

Some essays, the ones that require the most original research, will be published for paid subscribers only. Many will not. We do not gate the entire publication; paid is a way to fund it, not unlock it.

→ Subscribe

Products

CUDA Mastery 2026 — The Definitive Engineer’s Reference for Hopper, Blackwell, and Beyond. 27 chapters, 5 appendices, fact-checked end-to-end against NVIDIA documentation, PTX ISA 8.7, and primary architecture whitepapers.

Covers CUDA Toolkit 13.0–13.2, compute capabilities 7.5–12.1, WMMA, WGMMA, UMMA, TMA, thread block clusters, Tensor Memory, CUTLASS 4, NCCL 2.30, and Nsight 2025.4. → Get it on Gumroad · €89

Who writes it

Lorenzo Bradanini is co-founder and lead author at The Software Frontier. Self-taught software engineer based near Como, Italy.

Focuses on the GPU microarchitecture and kernel-internals level: the LLVM backend (FunctionPassManager, GVN, SROA, LoopVectorization), SASS, NVIDIA architecture from Volta through Blackwell, and the CUDA toolchain end to end.

Also co-founder of CortexFlow on the systems-design and AI-infrastructure side, and author of the Software Architecture Patterns series (~500 reactions across four parts), Democratizing AI Compute, and Vector Databases under the CortexFlow technical publication.

Writes The Cognitive Layer, a companion Substack publication on the architecture of thought across AI, philosophy, mathematics, and neuroscience.

Author of “Il Silenzio Tra le Stelle”, a book on physics, mathematics, and the limits of computation, built around an original framework called the Principio Relazionale Fondamentale. Cross-posts shorter notes on dev.to and Medium.

Lorenzo Tettamanti is co-owner of The Software Frontier, co-author, and editor. Co-founder of CortexFlow, an open-source container networking and observability platform built in Rust and eBPF for Kubernetes: sidecarless, kernel-level packet interception via TC and XDP hooks, BPF-map-driven telemetry.

Physicist by training, long-term open-source contributor, with working knowledge across distributed systems, the Linux kernel, eBPF, system design, and data engineering. Comfortable in Rust, C, Python, and Java across the full stack: backend services, frontend, ML tooling (Keras, TensorFlow, pandas, Node.js).

Currently doing research in computational physics applied to LLMs. Co-authors the Mastering CUDA series, the Katran series, and the How Systems Really Fail work; reviews every technical long-form piece before it goes out.

What to do next

Pick a thread that inspires you. Read one essay end to end. If the publication is for you, that will be obvious by the time you finish.

→ Subscribe