Graeme Winter

Elementary DMA on the SAMD51

Like most other microcontrollers the SAMD51 has DMA hardware to do heavy lifting without using the CPU to keep doing ldr and str all the while. This is obviously a critical thing for e.g. writing data to a DAC at 1MHz while performing FFT’s of some other result.

Most elementary use of DMA is to apply it as a glorified memcpy - which is what is done here.

Code

# Simple DMA Example
#
# Copy one word from memory to 1,000 words of memory in a buffer
# (yes this is just memcpy)

from machine import mem32
from uctypes import addressof

NN = 1000

buffer = bytearray(4 * NN)
address = addressof(buffer)

# zero out

for j in range(NN):
    mem32[address + 4 * j] = 0

# set the first word

mem32[address] = 0x42424242

# get the DMA configured - depends on timer counter and global clock

DMAC_BASE = 0x4100A000
TC0_BASE = 0x40003800
GCLK_BASE = 0x40001C00

# allocate part of the SRAM for DMAC working memory - each DMAC needs
# 4 words, and in the examples I looked at they needed to be 16-byte
# aligned but I have no idea if this is important

dma_desc = bytearray(32 * 16)
dma_dwrb = bytearray(32 * 16)

DESC_BASE = addressof(dma_desc)
DWRB_BASE = addressof(dma_dwrb)

mem32[DMAC_BASE | 0x34] = DESC_BASE
mem32[DMAC_BASE | 0x38] = DWRB_BASE

# start actual DMA configuration - enable 0x2 and all priority?

mem32[DMAC_BASE | 0x0] = (0xf << 8) | 0x2

# select channel 0 - configure as 1 write per beat (default; 0x0)
# burst length of 1 (0x0) triggered as burst (0x2 << 20) and
# triggered by DMA overflow trigger (0x2c << 8) - N.B. not yet
# enabled by setting bit 1 

mem32[DMAC_BASE | 0x40] = (0x2 << 20) # | (0x2c << 8) - no triggers, should just work?

# channel zero configuration NN in top half, do not increment the source pointer
# do increment the destination pointer and move in 4 byte increments
mem32[DESC_BASE] = ((NN - 1) << 16) | (0x1 << 11) | (0x2 << 8) | 0x1
mem32[DESC_BASE | 0x4] = address
mem32[DESC_BASE | 0x8] = address + NN * 4
mem32[DESC_BASE | 0xc] = 0

# enable
mem32[DMAC_BASE | 0x40] |= 0x2

# send 999 triggers to channel 0
for j in range(999):
    mem32[DMAC_BASE | 0x10] = 0x1
    
for j in range(1000):
    print(hex(mem32[address + 4 * j]))

Create a 4,000 byte array, zero out (to make sure I am not looking at bytes from last time) then write a constant into the first word of the array. Then configure the DMA to copy this constant to the remaining 999 elements.

The DMA need a trigger event - in this case it is a simple write to the software trigger address. There are a bunch of different trigger sources, and the example I was using previously was a timer counter TC0.

Reading the data back shows it was copied. If fewer than 999 software triggers are sent, the tail of the array is full of zeros, which can be resolved one-at-a-time thus:

>>> mem32[DMAC_BASE | 0x10] = 0x1
>>> print(hex(mem32[address + 4 * 999]))
0x0
>>> mem32[DMAC_BASE | 0x10] = 0x1
>>> print(hex(mem32[address + 4 * 999]))
0x42424242

Block Transfer

Variation on a theme of the code above - set the channel configuration for channel 0 to NULL (software trigger, block transfer) then send one trigger - will copy whole array in one transaction:

# Simple DMA Example
#
# Copy one word from memory to 1,000 words of memory in a buffer
# (yes this is just memcpy)

from machine import mem32, mem16
from uctypes import addressof

NN = 1000

buffer = bytearray(4 * NN)
address = addressof(buffer)

# zero out

for j in range(NN):
    mem32[address + 4 * j] = 0

# set the first word

mem32[address] = 0x42424242

# get the DMA configured - depends on timer counter and global clock

DMAC_BASE = 0x4100A000
TC0_BASE = 0x40003800
GCLK_BASE = 0x40001C00

# SWRST of DMA controller
mem32[DMAC_BASE] = 0x1

# allocate part of the SRAM for DMAC working memory - each DMAC needs
# 4 words, and in the examples I looked at they needed to be 16-byte
# aligned but I have no idea if this is important
dma_desc = bytearray(32 * 16)
dma_dwrb = bytearray(32 * 16)

DESC_BASE = addressof(dma_desc)
DWRB_BASE = addressof(dma_dwrb)

mem32[DMAC_BASE | 0x34] = DESC_BASE
mem32[DMAC_BASE | 0x38] = DWRB_BASE

# start actual DMA configuration - enable 0x2 and all priority?
mem32[DMAC_BASE | 0x0] = (0xf << 8) | 0x2

# select channel 0 - configure as block write i.e. will send everything
# once the trigger arrives
mem32[DMAC_BASE | 0x40] = 0

# channel zero configuration NN in top half, do not increment the source pointer
# do increment the destination pointer and move in 4 byte increments
mem32[DESC_BASE] = ((NN - 1) << 16) | (0x1 << 11) | (0x2 << 8) | 0x1
mem32[DESC_BASE | 0x4] = address
mem32[DESC_BASE | 0x8] = address + NN * 4
mem32[DESC_BASE | 0xc] = 0

# enable
mem32[DMAC_BASE | 0x40] |= 0x2

# print last 5 values before transfer
for j in range(995, 1000):
    print(hex(mem32[address + 4 * j]))

# send 1 trigger to channel 0
mem32[DMAC_BASE | 0x10] = 0x1

# wait for complete - this is probably a no-op
while mem32[DMAC_BASE | 0x28] & 0x1:
    print("busy")

# print last 5 values after transfer
for j in range(995, 1000):
    print(hex(mem32[address + 4 * j]))

# disable
mem32[DMAC_BASE | 0x40] &= 0xfffffffc
mem32[DMAC_BASE | 0x0] = 0x0

N.B. a little more care taken here to reset the hardware before start, disable the DMAC at the end. Output:

0x0
0x0
0x0
0x0
0x0
busy
0x42424242
0x42424242
0x42424242
0x42424242
0x42424242

Discussion

Turns out the DMA does have a trigger called DAC EMPTY or similar, which I can use for my use case, but I don’t know how this is paced…