Following on from the question of “how long to µPython interrupts take” (around 15µs) the question came about can I add proper low-level C interrupts without changing the micropython source code? Answer: obviously no, because the interrupt vector table is stored in flash.
Less obvious answer, yes, you can, but it is a bit like changing the tyres on the car while you are driving along the highway at the speed limit. Here’s how I did it. Clue: much reading of data sheets was involved.
M4 IRQ handlers are in a vector which is pointed at by the VTOR
register - at boot this points to 0x4000
which is just inside the flash code, and consists of 16 registers belonging to ARM before the 138 registers for the IRQ handler (on SAMD51). The EIC registers for the external interrupt controller start at position 12 and count for a further 16 - I want to overwrite one of these with my own handler code.
Use D13 as output and an indicator of what is going on. Setting up D12 as input using the usual machine.Pin
and pin.irq()
shuffle, because (this is the cunning bit) I will just overwrite the IRQ function pointer. To make this work, need jumper from D12 to D13 (PA23->PA22 in AMTEL-speak).
This is a long one. Mostly C:
#include "py/dynruntime.h"
// defines
#define EIC_BASE 0x40002800
#define VTOR_ADDR 0xe000ed08
#define PORT_BASE 0x41008000
// memory structure - interrupt vector is length > 128 words -> 256 word alignment
// needed so just allocate 2 kB of memory, seek to point where this is aligned
// correctly
unsigned int *buffer = NULL;
unsigned int *irq_vector_copy = NULL;
unsigned int *VTOR_INIT = 0;
void irq_action(void) {
// clear PA22
*((unsigned int *) (PORT_BASE | 0x14)) = 0x1 << 22;
// clear IRQ
*(unsigned int *) (EIC_BASE | 0x14) = 0x1 << 7;
}
STATIC mp_obj_t drive_irq_init(void) {
// allocate buffer of 2 x size to give space to align with 1024 kB boundary
if (buffer == NULL) {
buffer = m_malloc(2 * 1024);
irq_vector_copy = (unsigned int *)(((unsigned int)buffer & (~1023)) + 1024);
VTOR_INIT = *(unsigned int **)VTOR_ADDR;
// Copy vector from current source location
for (int j = 0; j < (16 + 138); j++) {
irq_vector_copy[j] = VTOR_INIT[j];
}
// Register additional handler - for PA22 is on EIC7
irq_vector_copy[12 + 16 + 7] = (unsigned int) &irq_action;
// Update VTOR
*(unsigned int **)VTOR_ADDR = irq_vector_copy;
}
mp_int_t result = (mp_int_t)VTOR_INIT;
return mp_obj_new_int(result);
}
STATIC mp_obj_t drive_irq_deinit(void) {
if (buffer) {
// Revert VTOR
*(unsigned int **)VTOR_ADDR = VTOR_INIT;
m_free(buffer);
buffer = NULL;
irq_vector_copy = NULL;
}
mp_int_t result = (mp_int_t)irq_vector_copy;
return mp_obj_new_int(result);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_0(drive_irq_init_obj, drive_irq_init);
STATIC MP_DEFINE_CONST_FUN_OBJ_0(drive_irq_deinit_obj, drive_irq_deinit);
mp_obj_t mpy_init(mp_obj_fun_bc_t *self, size_t n_args, size_t n_kw,
mp_obj_t *args) {
MP_DYNRUNTIME_INIT_ENTRY
mp_store_global(MP_QSTR_init, MP_OBJ_FROM_PTR(&drive_irq_init_obj));
mp_store_global(MP_QSTR_deinit, MP_OBJ_FROM_PTR(&drive_irq_deinit_obj));
MP_DYNRUNTIME_INIT_EXIT
}
On drive_irq.init()
this copies and rewrites the IRQ vector to another place in RAM, rewrites EIC#7 and then updates the VTOR register. drive_irq.deinit()
undoes this. The IRQ function just writes a clear bit to the right bit in the PORT register for D13 / PA22.
µPython code with this embedded:
# Blink the LED with DMA
#
# i.e. write the correct bit to the TGL register a few million times
# N.B. will involve DMA chaining and DMA enable / disable.
from uctypes import addressof
from machine import mem32, mem8
import time
# base addresses
MCLK_BASE = 0x40000800
GCLK_BASE = 0x40001C00
PORT_BASE = 0x41008000
DMAC_BASE = 0x4100A000
TCC1_BASE = 0x41018000
# for Grand Central: PORT_BASE |= 0x80 and LED_BIT = 0x1 << 1 for PB01
LED_BIT = 0x1 << 22
# set up LED - D13 => PA22 - already configured above
mem32[PORT_BASE | 0x8] = LED_BIT
mem32[PORT_BASE | 0x1C] = LED_BIT
# hook up 48 MHz clock
mem32[GCLK_BASE | 0x80 | 100] = (0x1 << 6) | 0x4
mem32[GCLK_BASE | 0x20 | 16] = (0x1 << 16) | (0x1 << 8) | 0x6
# configure TCC1 - we don't much care about the CC value, only the overflow
# (though we could trigger off CC and use this to measure?)
mem32[MCLK_BASE | 0x18] |= 0x1 << 12
mem32[TCC1_BASE | 0x3C] = 2
mem32[TCC1_BASE | 0x40] = 95
mem32[TCC1_BASE | 0x4C] = 48
# configure data buffer and DMA
buffer = bytearray(4)
address = addressof(buffer)
mem32[address] = LED_BIT
# IRQ handler on D12 - toggle D13
def off(pin):
print("off")
from machine import Pin
led = Pin("D12", Pin.IN)
led.irq(off, Pin.IRQ_RISING, hard=True)
# get the DMA configured - depends on timer counter and global clock
# allocate part of the SRAM for DMAC working memory - each DMAC needs
# 4 words, and in the examples I looked at they needed to be 16-byte
# aligned but I have no idea if this is important - it turns out it
# probably is, so allocate extra space
dma_bfr = bytearray(2 * 16)
DESC_BASE = addressof(dma_bfr)
DWRB_BASE = addressof(dma_bfr) + 16
mem32[DMAC_BASE | 0x34] = DESC_BASE
mem32[DMAC_BASE | 0x38] = DWRB_BASE
# start actual DMA configuration - enable 0x2 and all priority?
mem32[DMAC_BASE | 0x0] = (0xF << 8) | 0x2
# select channel 0 - configure as burst with size = 1 triggered
# TCC1 OVF - chain will ensure that this keeps rolling
mem32[DMAC_BASE | 0x40] = (0x2 << 20) | (0x1D << 8)
# channel zero configuration NN in top half, do increment the source pointer
# or destination pointer and move 4 bytes, chain to #0
mem32[DESC_BASE] = (1000 << 16) | (0x2 << 8) | 0x1
mem32[DESC_BASE | 0x4] = address
mem32[DESC_BASE | 0x8] = PORT_BASE | 0x1C
mem32[DESC_BASE | 0xC] = DESC_BASE
# enable
mem32[DMAC_BASE | 0x40] |= 0x2
# now override the IRQ system
import drive_irq
drive_irq.init()
# trigger by enabling TCC1
mem32[TCC1_BASE] = 0x2
# should run for a while
time.sleep(60)
# disable
mem32[TCC1_BASE] = 0x0
mem32[DMAC_BASE | 0x40] &= 0xFFFFFFFC
mem32[DMAC_BASE | 0x0] = 0x0
mem32[PORT_BASE | 0x14] = LED_BIT
drive_irq.deinit()
N.B. off()
is never called: it is just a null pointer to use while everything else is set up in pin.irq
. Everything else is there to set up the 500 kHz clock on D13 / PA22 which triggers D12 and hence the IRQ.
Remembering that the µPython version took ~ 15µs, this is much better:
However that is not good enough because (i) ~300ns feels like a long time (40 ticks?) and (ii) there is a lot of jitter on that time, seems to vary quite a lot when the CPU should really not be doing anything but time.sleep()
. That said, the IRQ handler will also be dealing with USB ping packets etc. at ~ 1kHz too, and I am running it at 1,000,000 interrupts / second which is probably a bit unsporting.
The assembly for the irq_action
code looks exactly what I would write -
00000000 <irq_action>:
0: 4b04 ldr r3, [pc, #16] ; (14 <irq_action+0x14>)
2: f44f 0280 mov.w r2, #4194304 ; 0x400000
6: 615a str r2, [r3, #20]
8: 4b03 ldr r3, [pc, #12] ; (18 <irq_action+0x18>)
a: 2280 movs r2, #128 ; 0x80
c: f8c3 2814 str.w r2, [r3, #2068] ; 0x814
10: 4770 bx lr
12: bf00 nop
14: 41008000 mrsmi r8, (UNDEF: 0)
18: 40002000 andmi r2, r0, r0
So I am not sure where all those cycles are going (the two last records are address records.) Book says the interrupt should take 12 ticks to trigger, this is about another 6 before the bit meets the register so 🤷♂️ - answers on a postcard welcome.