This is a cache of https://discuss.96boards.org/t/how-to-make-nvidia-or-amd-gfx-card-work-on-arm-based-boards/10448. It is a snapshot of the page at 2024-09-20T01:02:11.849+0000.
How to make nvidia or AMD gfx card work on ARM based boards - General - 96Boards Forum

How to make nvidia or AMD gfx card work on ARM based boards

I am looking for the right place to ask this question and get an answer.
How can I make an NVIDIA or AMD graphics card work on ARM-based boards?
For example, on a HiKey 960/970, a Khadas board, or any other board with an on-board PCIe/M.2 connector.
I know I will get just one PCIe lane, but still.

M.2 SSDs, network cards, and many other devices "just work".

I don’t think there should be any difference. Make sure that the kernel is built with the appropriate drivers for the device you are connecting, and that you have all the needed firmware in place.
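
One way to check the "built with the appropriate drivers" part is to look at the running kernel's configuration. The sketch below is only an illustration: the option list is an assumption about what matters for a DesignWare/Amlogic PCIe host with nouveau or amdgpu, and the two config locations it tries (/proc/config.gz and /boot/config-<release>) depend on how the kernel was built and packaged.

```python
import gzip
import os
import platform
import re

# Assumed-relevant options: DesignWare core, Amlogic (meson) PCIe glue,
# and the two open-source GPU drivers. Adjust the list for your board.
WANTED = (
    "CONFIG_PCIE_DW",
    "CONFIG_PCI_MESON",
    "CONFIG_DRM_NOUVEAU",
    "CONFIG_DRM_AMDGPU",
)


def config_state(config_text, options=WANTED):
    """Map each option to its value ('y', 'm', ...) or None if unset/missing."""
    found = {}
    for line in config_text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            found[m.group(1)] = m.group(2)
    return {opt: found.get(opt) for opt in options}


def load_running_config():
    """Return the running kernel's config text, or None if it is not exposed."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


if __name__ == "__main__":
    text = load_running_config()
    if text is None:
        print("kernel config not found")
    else:
        for opt, val in config_state(text).items():
            print(opt, val or "not set")
```

An option reported as "m" means the driver is a module, so it also has to be present in the root filesystem (or initramfs) to be loaded.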

That should work in theory (and would be fun to see). You’ll need a PCIe-to-M.2 adapter (something like https://www.amazon.com/ADT-Link-External-Graphics-Bracket-GTX1080ti/dp/B07XYZSX55; check that the M.2 version/key is compatible) and dedicated external power for the card. On the software side, include the required kernel driver and userspace components.

Related blog ticket: External PCIe GPU on the Oxalis - 96Boards

I have all the hardware, but it isn’t working.
There is some problem with the PCIe driver, or with something else I can’t identify.
Has anybody had success with the dw pcie driver plus the nvidia/amd open source drivers?

I’m afraid compatibility questions have to be more specific than just “Arm” and “dw pcie”. There is usually glue logic provided by the SoC vendor and it is very common for that glue logic to be validated only on WiFi or NVMe cards (which don’t use the same bus transaction widths).

For example I know the AMD Radeon R7 240D works on Arm but I also know that the same card cannot work on the Arm-based Developerbox due to a problem with wide transactions.

@danielt
I was able to bring up both NVIDIA and AMD GPUs properly on a Khadas VIM3 board with modified PCIe ranges:
ranges = <0x81000000 0x0 0x00000000 0x40 0x00010000 0x0 0x00010000 /* downstream I/O */
          0x82000000 0x0 0x40000000 0x40 0x40000000 0x0 0x40000000>; /* non-prefetchable memory */

I took these ranges from fsl-ls1012a.dtsi,
as it is the same PCI device (dw pci) :)
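
For readers unfamiliar with the `ranges` encoding, here is a rough Python sketch that decodes the two entries above. It assumes the usual PCI host bridge layout of 3 cells of PCI address, 2 cells of parent (CPU) address and 2 cells of size per entry; the function and field names are made up for illustration.

```python
# Address-space code held in bits 25:24 of the first (phys.hi) cell.
SPACE = {0: "config", 1: "I/O", 2: "32-bit memory", 3: "64-bit memory"}


def decode_ranges(cells):
    """Split a flat `ranges` cell list into 7-cell entries and decode them.

    Assumed layout per entry: phys.hi, PCI addr (2 cells),
    CPU addr (2 cells), size (2 cells).
    """
    entries = []
    for i in range(0, len(cells), 7):
        hi, pci_mid, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells[i:i + 7]
        entries.append({
            "space": SPACE[(hi >> 24) & 0x3],
            "prefetchable": bool(hi & 0x40000000),  # bit 30 of phys.hi
            "pci": (pci_mid << 32) | pci_lo,
            "cpu": (cpu_hi << 32) | cpu_lo,
            "size": (size_hi << 32) | size_lo,
        })
    return entries


# The two entries quoted in the post.
RANGES = [
    0x81000000, 0x0, 0x00000000, 0x40, 0x00010000, 0x0, 0x00010000,
    0x82000000, 0x0, 0x40000000, 0x40, 0x40000000, 0x0, 0x40000000,
]

for e in decode_ranges(RANGES):
    end = e["cpu"] + e["size"] - 1
    print(f'{e["space"]}: CPU {e["cpu"]:#x}..{end:#x} -> PCI {e["pci"]:#x}')
```

The decoded CPU windows can be compared against the "host bridge ... ranges" lines that meson-pcie prints at boot.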

nvidia
[ 0.210101] dw-pcie fc000000.pcie: IRQ index 1 not found
[ 0.210286] meson-pcie fc000000.pcie: get phy failed, -517
[ 1.168485] ehci-pci: EHCI PCI platform driver
[ 1.184170] ohci-pci: OHCI PCI platform driver
[ 1.306707] dw-pcie fc000000.pcie: IRQ index 1 not found
[ 1.317955] meson-pcie fc000000.pcie: host bridge /soc/pcie@fc000000 ranges:
[ 1.323909] meson-pcie fc000000.pcie: IO 0x4000010000..0x400001ffff -> 0x0000000000
[ 1.341617] meson-pcie fc000000.pcie: MEM 0x4040000000..0x407fffffff -> 0x0040000000
[ 1.341831] meson-pcie fc000000.pcie: invalid resource
[ 1.369140] meson-pcie fc000000.pcie: Link up
[ 1.377599] meson-pcie fc000000.pcie: PCI host bridge to bus 0000:00
[ 1.388328] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.388330] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 1.388333] pci_bus 0000:00: root bus resource [mem 0x4040000000-0x407fffffff] (bus address [0x40000000-0x7fffffff])
[ 1.388365] pci 0000:00:00.0: [16c3:abcd] type 01 class 0x060400
[ 1.388477] pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref]
[ 1.397632] pci 0000:00:00.0: supports D1
[ 1.409110] pci 0000:00:00.0: PME# supported from D0 D1 D3hot D3cold
[ 1.410972] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
[ 1.466953] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[ 1.473149] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref]
[ 1.480307] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[ 1.487447] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f]
[ 1.492967] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[ 1.499611] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
[ 1.506963] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[ 1.521545] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 1.529780] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
[ 1.535693] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[ 1.542011] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
[ 1.568672] pci 0000:00:00.0: BAR 9: assigned [mem 0x4040000000-0x404bffffff pref]
[ 1.570598] pci 0000:00:00.0: BAR 8: assigned [mem 0x404c000000-0x404d7fffff]
[ 1.577668] pci 0000:00:00.0: BAR 6: assigned [mem 0x404d800000-0x404d80ffff pref]
[ 1.585172] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 1.591211] pci 0000:01:00.0: BAR 1: assigned [mem 0x4040000000-0x4047ffffff 64bit pref]
[ 1.599265] pci 0000:01:00.0: BAR 3: assigned [mem 0x4048000000-0x4049ffffff 64bit pref]
[ 1.607284] pci 0000:01:00.0: BAR 0: assigned [mem 0x404c000000-0x404cffffff]
[ 1.614333] pci 0000:01:00.0: BAR 6: assigned [mem 0x404a000000-0x404a07ffff pref]
[ 1.621828] pci 0000:01:00.1: BAR 0: assigned [mem 0x404d000000-0x404d003fff]
[ 1.628915] pci 0000:01:00.0: BAR 5: assigned [io 0x1000-0x107f]
[ 1.634947] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[ 1.640113] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff]
[ 1.646150] pci 0000:00:00.0: bridge window [mem 0x404c000000-0x404d7fffff]
[ 1.653222] pci 0000:00:00.0: bridge window [mem 0x4040000000-0x404bffffff pref]
[ 1.660912] pcieport 0000:00:00.0: PME: Signaling with IRQ 39
[ 1.666634] pcieport 0000:00:00.0: AER: enabled with IRQ 39
[ 1.672205] pcieport 0000:00:00.0: bw_notification: enabled with IRQ 39
[ 1.678649] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

amd
[ 1.455132] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.460567] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 1.466689] pci_bus 0000:00: root bus resource [mem 0x4040000000-0x407fffffff] (bus address [0x40000000-0x7fffffff])
[ 1.477141] pci 0000:00:00.0: [16c3:abcd] type 01 class 0x060400
[ 1.483083] pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref]
[ 1.489750] pci 0000:00:00.0: supports D1
[ 1.493690] pci 0000:00:00.0: PME# supported from D0 D1 D3hot D3cold
[ 1.501893] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.508110] pci 0000:01:00.0: [1002:1478] type 01 class 0x060400
[ 1.513942] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 1.520782] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[ 1.526417] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:00.0 (capable of 126.024 Gb/s with 16.0 GT/s PCIe x8 link)
[ 1.566708] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.569452] pci 0000:02:00.0: [1002:1479] type 01 class 0x060400
[ 1.575677] pci 0000:02:00.0: PME# supported from D0 D3hot D3cold
[ 1.583845] pci 0000:02:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.589391] pci 0000:03:00.0: [1002:7340] type 00 class 0x030000
[ 1.595027] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[ 1.602146] pci 0000:03:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref]
[ 1.609280] pci 0000:03:00.0: reg 0x20: [io 0x0000-0x00ff]
[ 1.614797] pci 0000:03:00.0: reg 0x24: [mem 0x00000000-0x0007ffff]
[ 1.621005] pci 0000:03:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[ 1.628165] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
[ 1.634198] pci 0000:03:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 1.649112] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 1.657409] pci 0000:03:00.1: [1002:ab38] type 00 class 0x040300
[ 1.663300] pci 0000:03:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[ 1.669911] pci 0000:03:00.1: PME# supported from D1 D2 D3hot D3cold
[ 1.678513] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[ 1.682319] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 03
[ 1.688873] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 03
[ 1.695430] pci 0000:00:00.0: BAR 9: assigned [mem 0x4040000000-0x4057ffffff pref]
[ 1.702919] pci 0000:00:00.0: BAR 8: assigned [mem 0x4058000000-0x40581fffff]
[ 1.709991] pci 0000:00:00.0: BAR 6: assigned [mem 0x4058200000-0x405820ffff pref]
[ 1.717494] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 1.723534] pci 0000:01:00.0: BAR 9: assigned [mem 0x4040000000-0x4057ffffff 64bit pref]
[ 1.731553] pci 0000:01:00.0: BAR 8: assigned [mem 0x4058000000-0x40580fffff]
[ 1.738629] pci 0000:01:00.0: BAR 0: assigned [mem 0x4058100000-0x4058103fff]
[ 1.745708] pci 0000:01:00.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 1.751737] pci 0000:02:00.0: BAR 9: assigned [mem 0x4040000000-0x4057ffffff 64bit pref]
[ 1.759756] pci 0000:02:00.0: BAR 8: assigned [mem 0x4058000000-0x40580fffff]
[ 1.766828] pci 0000:02:00.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 1.772870] pci 0000:03:00.0: BAR 0: assigned [mem 0x4040000000-0x404fffffff 64bit pref]
[ 1.780924] pci 0000:03:00.0: BAR 2: assigned [mem 0x4050000000-0x40501fffff 64bit pref]
[ 1.788945] pci 0000:03:00.0: BAR 5: assigned [mem 0x4058000000-0x405807ffff]
[ 1.795992] pci 0000:03:00.0: BAR 6: assigned [mem 0x4058080000-0x405809ffff pref]
[ 1.803488] pci 0000:03:00.1: BAR 0: assigned [mem 0x40580a0000-0x40580a3fff]
[ 1.810568] pci 0000:03:00.0: BAR 4: assigned [io 0x1000-0x10ff]
[ 1.816607] pci 0000:02:00.0: PCI bridge to [bus 03]
[ 1.821517] pci 0000:02:00.0: bridge window [io 0x1000-0x1fff]
[ 1.827563] pci 0000:02:00.0: bridge window [mem 0x4058000000-0x40580fffff]
[ 1.834631] pci 0000:02:00.0: bridge window [mem 0x4040000000-0x4057ffffff 64bit pref]
[ 1.842662] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[ 1.847822] pci 0000:01:00.0: bridge window [io 0x1000-0x1fff]
[ 1.853869] pci 0000:01:00.0: bridge window [mem 0x4058000000-0x40580fffff]
[ 1.860937] pci 0000:01:00.0: bridge window [mem 0x4040000000-0x4057ffffff 64bit pref]
[ 1.868969] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[ 1.874124] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff]
[ 1.880161] pci 0000:00:00.0: bridge window [mem 0x4058000000-0x40581fffff]
[ 1.887233] pci 0000:00:00.0: bridge window [mem 0x4040000000-0x4057ffffff pref]
[ 1.894848] pcieport 0000:00:00.0: enabling device (0000 -> 0003)
[ 1.900881] pcieport 0000:00:00.0: PME: Signaling with IRQ 39
[ 1.907553] pcieport 0000:00:00.0: AER: enabled with IRQ 39
[ 1.912135] pcieport 0000:00:00.0: bw_notification: enabled with IRQ 39
[ 1.918970] pcieport 0000:01:00.0: enabling device (0000 -> 0003)
[ 1.925136] pcieport 0000:02:00.0: enabling device (0000 -> 0003)
[ 1.930885] pcieport 0000:02:00.0: bw_notification: enabled with IRQ 41
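
As a side note on the "available PCIe bandwidth" lines in the logs above: the figures can be reproduced from the link speed, the lane count and the line encoding. The sketch below assumes 8b/10b encoding for 2.5 and 5.0 GT/s, 128b/130b for 8.0 and 16.0 GT/s, and integer truncation of the per-lane rate in Mb/s; the function names are hypothetical and this is not the kernel's own code.

```python
def lane_mbps(gt_per_s):
    """Usable Mb/s on one lane after line-encoding overhead (truncated)."""
    mt = int(gt_per_s * 1000)          # raw megatransfers per second
    if gt_per_s <= 5.0:
        return mt * 8 // 10            # 8b/10b encoding (Gen1/Gen2)
    return mt * 128 // 130             # 128b/130b encoding (Gen3/Gen4)


def link_gbps(gt_per_s, lanes):
    """Usable Gb/s for a link of the given speed and width."""
    return lane_mbps(gt_per_s) * lanes / 1000


# Speed/width pairs taken from the boot logs in this thread.
for speed, lanes in [(2.5, 1), (5.0, 1), (8.0, 8), (16.0, 8), (16.0, 16)]:
    print(f"{speed} GT/s x{lanes}: {link_gbps(speed, lanes):.3f} Gb/s")
```

So the x1 link on the board offers roughly a sixteenth of what the GT710 could use and an even smaller fraction for the Navi 14 card, which limits performance but should not by itself stop the driver from loading.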

But nouveau crashed like this
(the card is an NVIDIA GT710):

[ 6.166326] SError Interrupt on CPU4, code 0xbf000000 -- SError
[ 6.166328] CPU: 4 PID: 1676 Comm: systemd-udevd Tainted: G C 5.10.0-rc6 #0.9.8
[ 6.166329] Hardware name: Khadas VIM3 (DT)
[ 6.166330] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 6.166331] pc : nvkm_longopt+0x0/0xa8 [nouveau]
[ 6.166332] lr : nvkm_device_ctor+0x2c0/0x3230 [nouveau]
[ 6.166333] sp : ffff8000121937e0
[ 6.166334] x29: ffff8000121937e0 x28: ffff00007cad8400
[ 6.166337] x27: ffff800008f96a70 x26: 0000000000000000
[ 6.166338] x25: ffff0000411560b8 x24: 0000000000000000
[ 6.166340] x23: 0000000000000000 x22: ffff800008f8ca50
[ 6.166342] x21: ffff000040f8a7c0 x20: ffff800009004628
[ 6.166343] x19: 0000000000000000 x18: 0000000000000030
[ 6.166345] x17: 0000000000000000 x16: 0000000000000000
[ 6.166347] x15: ffff000040f8a7c0 x14: 00000000000001b4
[ 6.166348] x13: ffff800010000000 x12: ffff80001118dff8
[ 6.166350] x11: ffff80001118dff8 x10: 0000000000000100
[ 6.166351] x9 : 0000000001000000 x8 : ffff00007f806000
[ 6.166353] x7 : ffff80001137ec24 x6 : 00000000000000c0
[ 6.166355] x5 : 000000404d000000 x4 : ffff800058000000
[ 6.166356] x3 : ffff800018ffffff x2 : 0000000000000000
[ 6.166358] x1 : ffff800008fdd018 x0 : 0000000000000000
[ 6.166360] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 6.166374] SMP: stopping secondary CPUs
[ 6.166375] Kernel Offset: disabled
[ 6.166375] CPU features: 0x0240002,61002004
[ 6.166376] Memory Limit: none

And amdgpu crashed like this with the AMD GPU:

[ 6.181368] [drm] amdgpu kernel modesetting enabled.
[ 6.181518] checking generic (7f807000 7e9000) vs hw (4040000000 10000000)
[ 6.181519] checking generic (7f807000 7e9000) vs hw (4050000000 200000)
[ 6.181520] checking generic (7f807000 7e9000) vs hw (4058000000 80000)
[ 6.181608] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 6.181628] [drm] initializing kernel modesetting (NAVI14 0x1002:0x7340 0x1DA2:0xE421 0xC5).
[ 6.181633] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 6.181650] [drm] register mmio base: 0x58000000
[ 6.181651] [drm] register mmio size: 524288
[ 6.181673] [drm] PCIE atomic ops is not supported
[ 6.182062] SError Interrupt on CPU5, code 0xbf000000 -- SError
[ 6.182064] CPU: 5 PID: 1695 Comm: systemd-udevd Tainted: G C 5.10.0-rc6 #0.9.8
[ 6.182065] Hardware name: Khadas VIM3 (DT)
[ 6.182066] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 6.182066] pc : amdgpu_device_rreg.part.0+0x50/0xd8 [amdgpu]
[ 6.182067] lr : amdgpu_device_rreg+0x1c/0x30 [amdgpu]
[ 6.182068] sp : ffff80001218b6e0
[ 6.182069] x29: ffff80001218b6e0 x28: 00000000fffffff4
[ 6.182072] x27: ffff80000923d000 x26: ffff000042bc0000
[ 6.182073] x25: ffff000042bd4000 x24: 0000000000000001
[ 6.182075] x23: ffff00004151dcc0 x22: 000000000000378c
[ 6.182076] x21: ffff000042bc4000 x20: ffff000042bc0000
[ 6.182078] x19: 0000000000000000 x18: 0000000000000000
[ 6.182079] x17: 0000000000007c78 x16: ffff000042bc0d28
[ 6.182081] x15: ffff00004151dcc0 x14: 000000000000000f
[ 6.182082] x13: 0000000000000000 x12: 0101010101010101
[ 6.182084] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff
[ 6.182085] x9 : 0000000000000000 x8 : ffff000041352000
[ 6.182087] x7 : 0000000000000000 x6 : 000000000000003f
[ 6.182088] x5 : 0000000000000040 x4 : ffff80001218b720
[ 6.182089] x3 : 0000000000080000 x2 : 0000000000000000
[ 6.182091] x1 : 0000000000000de3 x0 : 0000000000000000
[ 6.182093] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 6.182105] SMP: stopping secondary CPUs
[ 6.182106] Kernel Offset: disabled
[ 6.182107] CPU features: 0x0240002,61002004
[ 6.182107] Memory Limit: none
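
Both panics report the same "code 0xbf000000". Assuming that value is the ESR_ELx syndrome for the exception, it can be split into its architectural fields as sketched below. The field positions follow the Armv8-A layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name and the returned dictionary are illustrative only.

```python
def decode_esr(esr):
    """Split an ESR_ELx value into EC, IL and ISS; add SError detail if any."""
    ec = (esr >> 26) & 0x3F       # exception class
    il = (esr >> 25) & 0x1        # instruction length bit
    iss = esr & 0x1FFFFFF         # instruction specific syndrome, bits 24:0
    out = {"ec": ec, "il": il, "iss": iss}
    if ec == 0x2F:                # 0x2F is the SError interrupt class
        ids = (iss >> 24) & 0x1   # 1 = implementation-defined syndrome
        out["ids"] = ids
        if not ids:
            out["dfsc"] = iss & 0x3F  # architected fault status code
    return out


info = decode_esr(0xBF000000)
print({k: hex(v) for k, v in info.items()})
```

For 0xbf000000 this gives EC 0x2f (SError) with the IDS bit set and all other syndrome bits zero, so the code on its own says little beyond "an asynchronous error was reported while the driver was starting up".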

Can’t add much I’m afraid except that GT710 is known working on several different Arm64 platforms (as it happens I’m using one right now with a Cortex A72 system to type this reply).

@danielt
What do I need to do to add firmware?
I am on Ubuntu 20.04. Is there a package with the NVIDIA firmware?

And can you share the patch for the 64-bit PCI problem?

A better call trace:
AMD
[ 35.194873] SError Interrupt on CPU5, code 0xbf000000 -- SError
[ 35.194875] CPU: 5 PID: 2135 Comm: modprobe Tainted: G C 5.10.6 #0.9.8
[ 35.194876] Hardware name: Khadas VIM3 (DT)
[ 35.194877] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 35.194878] pc : amdgpu_device_rreg.part.0+0x50/0xd8 [amdgpu]
[ 35.194879] lr : amdgpu_device_rreg+0x1c/0x30 [amdgpu]
[ 35.194880] sp : ffff800012a636c0
[ 35.194880] x29: ffff800012a636c0 x28: 00000000fffffff4
[ 35.194883] x27: ffff800009266000 x26: ffff00007bd80000
[ 35.194885] x25: ffff00007bd94000 x24: 0000000000000001
[ 35.194887] x23: ffff00007be027c0 x22: 000000000000378c
[ 35.194888] x21: ffff00007bd84000 x20: ffff00007bd80000
[ 35.194890] x19: 0000000000000000 x18: 0000000000000000
[ 35.194891] x17: 0000000000007c78 x16: ffff00007bd80d28
[ 35.194893] x15: ffff00007be027c0 x14: 000000000000000f
[ 35.194894] x13: 0000000000000000 x12: 0101010101010101
[ 35.194896] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff
[ 35.194897] x9 : 0000000000000000 x8 : ffff000041199000
[ 35.194898] x7 : 0000000000000000 x6 : 000000000000003f
[ 35.194900] x5 : 0000000000000040 x4 : 0000000000000000
[ 35.194901] x3 : 0000000000080000 x2 : 0000000000000000
[ 35.194903] x1 : 0000000000000de3 x0 : 0000000000000000
[ 35.194904] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 35.194905] CPU: 5 PID: 2135 Comm: modprobe Tainted: G C 5.10.6 #0.9.8
[ 35.194906] Hardware name: Khadas VIM3 (DT)
[ 35.194906] Call trace:
[ 35.194907] dump_backtrace+0x0/0x1d0
[ 35.194907] show_stack+0x18/0x60
[ 35.194908] dump_stack+0xd0/0x12c
[ 35.194909] panic+0x15c/0x330
[ 35.194909] nmi_panic+0x8c/0x90
[ 35.194910] arm64_serror_panic+0x78/0x84
[ 35.194910] do_serror+0x3c/0x68
[ 35.194911] el1_error+0x90/0x110
[ 35.194911] amdgpu_device_rreg.part.0+0x50/0xd8 [amdgpu]
[ 35.194912] amdgpu_device_rreg+0x1c/0x30 [amdgpu]
[ 35.194913] amdgpu_discovery_reg_base_init+0x50/0x400 [amdgpu]
[ 35.194913] nv_set_ip_blocks+0x1a8/0x6c8 [amdgpu]
[ 35.194914] amdgpu_device_init+0xde4/0x1a60 [amdgpu]
[ 35.194915] amdgpu_driver_load_kms+0x1c/0x1c0 [amdgpu]
[ 35.194915] amdgpu_pci_probe+0x120/0x258 [amdgpu]
[ 35.194916] pci_device_probe+0xbc/0x178
[ 35.194917] really_probe+0xe8/0x4d0
[ 35.194917] driver_probe_device+0xf4/0x160
[ 35.194918] device_driver_attach+0x74/0x80
[ 35.194918] __driver_attach+0xa4/0x170
[ 35.194919] bus_for_each_dev+0x70/0xc0
[ 35.194920] driver_attach+0x24/0x30
[ 35.194920] bus_add_driver+0x140/0x220
[ 35.194921] driver_register+0x64/0x120
[ 35.194921] __pci_register_driver+0x44/0x50
[ 35.194922] amdgpu_init+0x6c/0x1000 [amdgpu]
[ 35.194923] do_one_initcall+0x54/0x1c0
[ 35.194923] do_init_module+0x54/0x208
[ 35.194924] load_module+0x1f34/0x25e8
[ 35.194924] __do_sys_finit_module+0xbc/0x128
[ 35.194925] __arm64_sys_finit_module+0x20/0x30
[ 35.194926] el0_svc_common.constprop.0+0x80/0x220
[ 35.194926] do_el0_svc+0x24/0x90
[ 35.194927] el0_svc+0x1c/0x50
[ 35.194927] el0_sync_handler+0xb0/0xb8
[ 35.194928] el0_sync+0x174/0x180
[ 35.194939] SMP: stopping secondary CPUs
[ 35.194939] Kernel Offset: disabled
[ 35.194940] CPU features: 0x0240002,61002004
[ 35.194941] Memory Limit: none

nvidia
[ 79.235354] SError Interrupt on CPU2, code 0xbf000000 -- SError
[ 79.235355] CPU: 2 PID: 2141 Comm: modprobe Tainted: G C 5.10.6 #0.9.8
[ 79.235356] Hardware name: Khadas VIM3 (DT)
[ 79.235357] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 79.235358] pc : nvkm_longopt+0x0/0xa8 [nouveau]
[ 79.235359] lr : nvkm_device_ctor+0x2c0/0x3230 [nouveau]
[ 79.235359] sp : ffff800012e937c0
[ 79.235360] x29: ffff800012e937c0 x28: ffff000042a9b000
[ 79.235363] x27: ffff800009037a70 x26: 0000000000000000
[ 79.235364] x25: ffff000042f548b8 x24: 0000000000000000
[ 79.235366] x23: 0000000000000000 x22: ffff80000902da50
[ 79.235367] x21: ffff000040fbcf80 x20: ffff8000090a9628
[ 79.235369] x19: 0000000000000000 x18: 0000000000000030
[ 79.235370] x17: 0000000000000000 x16: 0000000000000000
[ 79.235372] x15: ffff000040fbcf80 x14: 00000000000001ce
[ 79.235373] x13: ffff800010000000 x12: ffff80001118e0b8
[ 79.235374] x11: ffff80001118e0b8 x10: 0000000000000100
[ 79.235376] x9 : 0000000001000000 x8 : ffff00007f806000
[ 79.235377] x7 : ffff80001137ec24 x6 : 00000000000000c0
[ 79.235379] x5 : 000000404d000000 x4 : ffff800058000000
[ 79.235380] x3 : ffff800018ffffff x2 : 0000000000000000
[ 79.235382] x1 : ffff80000907e018 x0 : 0000000000000000
[ 79.235383] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 79.235384] CPU: 2 PID: 2141 Comm: modprobe Tainted: G C 5.10.6 #0.9.8
[ 79.235385] Hardware name: Khadas VIM3 (DT)
[ 79.235385] Call trace:
[ 79.235386] dump_backtrace+0x0/0x1d0
[ 79.235386] show_stack+0x18/0x60
[ 79.235387] dump_stack+0xd0/0x12c
[ 79.235387] panic+0x15c/0x330
[ 79.235388] nmi_panic+0x8c/0x90
[ 79.235389] arm64_serror_panic+0x78/0x84
[ 79.235389] do_serror+0x3c/0x68
[ 79.235390] el1_error+0x90/0x110
[ 79.235390] nvkm_longopt+0x0/0xa8 [nouveau]
[ 79.235391] nvkm_device_pci_new+0xec/0x288 [nouveau]
[ 79.235392] nouveau_drm_probe+0x58/0x1f0 [nouveau]
[ 79.235392] pci_device_probe+0xbc/0x178
[ 79.235393] really_probe+0xe8/0x4d0
[ 79.235393] driver_probe_device+0xf4/0x160
[ 79.235394] device_driver_attach+0x74/0x80
[ 79.235395] __driver_attach+0xa4/0x170
[ 79.235395] bus_for_each_dev+0x70/0xc0
[ 79.235396] driver_attach+0x24/0x30
[ 79.235396] bus_add_driver+0x140/0x220
[ 79.235397] driver_register+0x64/0x120
[ 79.235397] __pci_register_driver+0x44/0x50
[ 79.235398] nouveau_drm_init+0x178/0x1000 [nouveau]
[ 79.235399] do_one_initcall+0x54/0x1c0
[ 79.235399] do_init_module+0x54/0x208
[ 79.235400] load_module+0x1f34/0x25e8
[ 79.235400] __do_sys_finit_module+0xbc/0x128
[ 79.235401] __arm64_sys_finit_module+0x20/0x30
[ 79.235402] el0_svc_common.constprop.0+0x80/0x220
[ 79.235402] do_el0_svc+0x24/0x90
[ 79.235403] el0_svc+0x1c/0x50
[ 79.235403] el0_sync_handler+0xb0/0xb8
[ 79.235404] el0_sync+0x174/0x180
[ 79.235416] SMP: stopping secondary CPUs
[ 79.235416] Kernel Offset: disabled
[ 79.235417] CPU features: 0x0240002,61002004
[ 79.235418] Memory Limit: none

@danielt
What do I need to do to add firmware?
I am on Ubuntu 20.04. Is there a package with the NVIDIA firmware?

Not really sure TBH.

I’m using Debian Bullseye. It’s quite a mature installation so I don’t
remember much about how the GT710 was set up. I think any firmware that
is needed comes from firmware-misc-nonfree.

@danielt
And what about the PCIe 64-bit kernel patch? Can you share it?
Thanks.

Sorry, I’m afraid I’m not clear what patch you mean (not least because I’m currently running an unmodified upstream kernel).

from this article

there is a note about a kernel patch that can solve it.

It looks like the Arm PCIe drivers and other device drivers will need some tuning.

As I said before, I really wouldn’t recommend assuming this is an arm (or designware) issue.

The problems on Developerbox are normally mitigated with a firmware fix that makes the PCIe behaviour avoid known bugs in some hardware glue logic that is specific to this SoC. The kernel patch is only useful for users who do not want to deploy the firmware-level fix. It is unlikely to offer much benefit for other platforms unless they independently developed a similar silicon bug (e.g. the inability to map PCIe memory with the cache enabled).

If you are interested the patch for Developerbox is here: daniel.thompson/linux.git - Personal repo for development and upstreaming. Contains significant work on NMI/FIQ.

… and to be clear I am not currently using this patch to get GT-710 working on my arm64 platform (because I am not working on a Developerbox at the moment).

Thanks for the patch.

@danielt I hoped that you and Linaro were advocates for Arm-based CPUs :)
And we know that Arm SoCs still have some hardware and software bugs.

The best I can do here is share what is known to work well on Arm platforms (where I know this myself).

As it happens, when the SoC vendor has correctly integrated PCIe then GT-710 has been working on Arm64 for more than three years now! That’s not to say things are perfect… there are occasionally Arm-specific bugs but the main reason for the problem isn’t really an Arm thing. Instead it is because PCIe (especially x1) is usually added to an embedded SoC design to support a WiFi or modem card and these do not interact with the PCIe bus in the same way as a graphics card. That means SoC vendors simply do not consider the bugs with graphics cards to be significant since the intended functions work correctly.

This sucks if you are a hacker trying to add advanced features to a system using the PCIe bus but these choices by vendors are often entirely economically rational. Fixing bugs in silicon is monumentally expensive (it’s a long time since I worked for a silicon company but we are talking at least 6 figures and perhaps even 7 these days). Thus respinning a chip to fix a bug that does not affect the system when sold for its intended markets is a waste of money.

In short, to learn whether PCIe is correctly implemented (and tested) on Khadas Vim3 you’d need to ask Khadas or Amlogic about how they tested it.

PS You are getting SError messages so it is worth trying the Developerbox patch… just don’t be too optimistic!