CUDA Education
CUDA U has online courses to help you get started programming or teaching CUDA, as well as links to universities teaching CUDA.
CUDA U is organized into the following sections to get you started:
- Introductory CUDA technical courses
- A full-semester CUDA class from the University of Illinois that you can play on your iPod
- Universities teaching CUDA where you can apply to enroll
- CUDA Seminars from major events worldwide
- Bi-weekly CUDA and OpenCL live webinars
CUDAcasts - Downloadable CUDA Training Podcasts
- Introduction to GPU Computing
- CUDA Programming Model Overview
- CUDA Programming Basics - Part I
- CUDA Programming Basics - Part II
- Volume I: Introduction to CUDA Programming
- Exercises (for Linux and Mac)
- Visual Studio Exercises (for Windows)
- Instructions for Exercises
- Volume II: CUDA Case Studies
From University of Illinois: ECE 498AL
Taught by Professor Wen-mei W. Hwu and David Kirk, NVIDIA Chief Scientist.
- Introduction to GPU Computing (60.2 MB)
- CUDA Programming Model (75.3 MB)
- CUDA API (32.4 MB)
- Simple Matrix Multiplication in CUDA (46.0 MB)
- CUDA Memory Model (109 MB)
- Shared Memory Matrix Multiplication (81.4 MB)
- Additional CUDA API Features (22.4 MB)
- Useful Information on CUDA Tools (15.7 MB)
- Threading Hardware (140 MB)
- Memory Hardware (85.8 MB)
- Memory Bank Conflicts (115 MB)
- Parallel Thread Execution (32.6 MB)
- Control Flow (96.6 MB)
- Precision (137 MB)
Each of these classes is a downloadable CUDAcast, with video pre-scaled to be compatible with major players.
- SC07 Tutorial: High Performance Computing with CUDA
- NVISION 08 Tutorials
- Getting Started with CUDA (covers CUDA programming model, basics of CUDA programming, and BLAS and FFT libraries)
- Advanced CUDA Training (covers 10-series architecture and optimization techniques using particle simulation and finite difference case studies)
- ISC 2008 Case Study: Computational Fluid Dynamics (CFD)

- CUDA, Supercomputing for the Masses: Part 1
CUDA lets you work with familiar programming concepts while developing software that can run on a GPU
- CUDA, Supercomputing for the Masses: Part 2
A first kernel
- CUDA, Supercomputing for the Masses: Part 3
Error handling and global memory performance limitations
- CUDA, Supercomputing for the Masses: Part 4
Understanding and using shared memory (1)
- CUDA, Supercomputing for the Masses: Part 5
Understanding and using shared memory (2)
- CUDA, Supercomputing for the Masses: Part 6
Global memory and the CUDA profiler
- CUDA, Supercomputing for the Masses: Part 7
Double the fun with next-generation CUDA hardware
- CUDA, Supercomputing for the Masses: Part 8
Using libraries with CUDA
- CUDA, Supercomputing for the Masses: Part 9
Extending High-level Languages with CUDA
- CUDA, Supercomputing for the Masses: Part 10
CUDPP, a powerful data-parallel CUDA library
- CUDA, Supercomputing for the Masses: Part 11
Revisiting CUDA memory spaces
- CUDA, Supercomputing for the Masses: Part 12
CUDA 2.2 changes the data movement paradigm
- CUDA, Supercomputing for the Masses: Part 13
Using texture memory in CUDA
