Thermal problem? Playback hangs when cputemp is only 48 C
I have been testing a LePotato board with no heatsink, decoding 4K HEVC and VP9 content with the latest CoreELEC image: CoreELEC-LePotato.arm-8.90.4.img.gz
I am monitoring the CPU temperature with cputemp, and after 10-20 minutes of playback, when the core temperature reaches around 48 C, the processor hangs, playback stops, and the ssh terminal session dies. By the way, gputemp always displays the same value as cputemp.
If I put an external fan over the board, the core temperature stays around 32-35 C and the same clips will play in a loop all day. This does seem to be a thermal-related issue.
Any processor will fail when it runs too hot, but in this case the temperature at failure is quite low. For example, the maximum rated operating temperature of an Intel Core i7 processor is around 67 C.
My temperature-calibrated finger also tells me that this processor isn't very hot when playback fails. I have seen other processors working correctly when they were too hot to touch; the S905X is only warm when the system fails. Sure, I can put a heatsink and fan on it, but 48 C is too cool for a failure IMHO, and I want to understand what is going on.
My questions:
- What is the maximum core operating temperature for the S905X, as specified by Amlogic?
- Is this a thermal problem involving memory (which isn't hot to the touch at all)?
- Has anyone else noticed this?
- Do I have a bad processor (that maybe should have failed manufacturing test) on my LePotato?
- Is cputemp inaccurate and reporting lower than actual temperature? (It reports the same info as cat /sys/class/thermal/thermal_zone0/temp.)
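To correlate a hang with the temperature at that moment, a minimal Python sketch like the one below could poll the sysfs node quoted above and print a timestamped reading every few seconds. It assumes a Python 3 interpreter is available on the board (not something a CoreELEC image of this era necessarily ships) and that the value is in millidegrees Celsius, as the Linux thermal sysfs interface normally reports. The function names and the 5-second interval are hypothetical choices for illustration.

```python
import time
from datetime import datetime
from pathlib import Path

# Node quoted in the post; cputemp reports the same value.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_temp_c(raw: str) -> float:
    """Convert a raw sysfs reading to degrees Celsius.

    The Linux thermal sysfs interface normally uses millidegrees ("48000").
    Assumption: a value below 1000 is treated as already being in whole
    degrees, in case this vendor kernel reports it that way.
    """
    value = int(raw.strip())
    return value / 1000.0 if value >= 1000 else float(value)


def format_sample(when: datetime, temp_c: float) -> str:
    """One log line: ISO timestamp plus temperature."""
    return f"{when.isoformat(timespec='seconds')}  {temp_c:.1f} C"


def log_temps(node: Path = THERMAL_ZONE, interval_s: float = 5.0) -> None:
    """Print one timestamped sample per interval until interrupted.

    Output goes to stdout (the ssh client) instead of a file, so the logger
    adds no microSD writes and the last sample before a hang stays visible
    in the client's terminal.
    """
    while True:
        temp_c = parse_temp_c(node.read_text())
        print(format_sample(datetime.now(), temp_c), flush=True)
        time.sleep(interval_s)
```

Started from an ssh session with log_temps(), the last line left in the client's terminal after the session dies is the last temperature the board reported.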
Thanks
Comments
It is an old kernel and is most likely reporting a lower temperature. Use a mainline kernel and see if the temperature matches. You need a heatsink in any production environment because of chip hotspots: there is no integrated heat spreader, and the packaging isn't sufficient to distribute localized hotspots.
CoreELEC locks up quite a bit on Le Potato at this point in time. Most people reporting issues are pointing at a file transfer/network issue. It may be related to the network interface but one person who seemed to do decent testing was also seeing it with local transfers. I have seen temps over 120F (~48.9C) with no lockups unless there was network activity. I've gone back to my C2 for now since I'm unsure how to help. The issue appears to have been fixed in the mainline kernel but that is not yet what CoreELEC is using and it's been said that the patches are not directly portable. You can find more discussion here: https://discourse.coreelec.org/t/coreelec-8-90-3-le-potato-segmentation-fault-on-large-file-transfer/690/23
@adamg has said the root cause may be a hardware issue, even if there is a software fix. I'm sure it's on the CoreELEC radar but last I saw the Le Potato user share was pretty low so it's probably not their highest priority.
It's not a hardware issue. It's a software issue. They should have patched it in the latest CoreELEC builds. The network issue is not related to temperature.
I'm still having the same lockup issues in CoreELEC 8.90.4 and even a build of HEAD yesterday (same results in LibreELEC HEAD). Either it wasn't fixed or I face some other issue, but I haven't seen any mention they fixed it.
As for temperature, I am still seeing temps reported up to 122F (50C) without any lockups. That's why I wonder whether ormike's issue is due to temperature at all, or at least whether temperature is the root cause. I use the LoveRPI fan case, and the lowest I see under moderate load is 115F with the fan on high, so a thermal crash at 48C would be pretty un-amusing.
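Since the readings in this thread mix Fahrenheit and Celsius, converting them with C = (F - 32) × 5/9 puts them side by side with the 48 C failure point. For example, 115 F works out to about 46.1 C, so even the lowest fan-cooled reading here is within about 2 C of the temperature at which the original poster's board hung. A small Python sketch of that comparison:

```python
def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


# Readings quoted in this thread, compared with the reported 48 C failure point.
FAILURE_POINT_C = 48.0

readings_f = [
    ("lowest under moderate load, fan on high", 115.0),
    ("seen without lockups", 120.0),
    ("highest reported without lockups", 122.0),
]

for label, temp_f in readings_f:
    temp_c = f_to_c(temp_f)
    delta = temp_c - FAILURE_POINT_C
    print(f"{temp_f:.0f} F = {temp_c:.1f} C ({delta:+.1f} C vs. 48 C) [{label}]")
```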
We are not familiar enough with the 3.x kernel used by CoreELEC to provide any helpful support in this regard. We haven't had any reported issues on the mainline kernel, but we will see once decoder support is added to mainline.
@hernano Thank you very much for sharing your observations. In general, I like the idea of isolating problems by testing different builds, but in this case there are so many differences between LibreELEC 8.2.2.2 and the CoreELEC 8.90.4 I am using that, if we saw different behavior, it would be impossible to know which difference was responsible. But I have learned more about this problem. Please read the new comments below.
Thank you @hernano, @loverpi and @frerk for your comments; they were very helpful. Here are my updated observations:
I switched playback from the 16 GB Class 10 microSD card to a Samba (smb://) file share over ethernet. I played the same files in a playlist loop for over 13 hours without any hang and observed cputemp as high as 54 C with no heat sink or fan.
I would like to change the title of this discussion to "LePotato hangs with large file access reading/writing to microSD card"
I have observed this problem now in 3 scenarios:
1. playing back MPEG-2 files captured from ATSC broadcast: these always hang within 5-10 mins when played from microSD card, even with a fan keeping cputemp < 35 C
2. scp files from host computer to root@coreelec:/storage/videos: hang observed multiple times after several GB transferred at 11 MB/s
3. playing back HEVC/mp4 (4 Mb/s) and VP9/webm (25 Mb/s) files from microSD card: spurious hangs seem to be prevented by cooling the processor or board.
No hangs observed when playing the same media files from a samba share over ethernet. The network interface seems to work fine.
conclusions/next steps:
The discussion on the CoreELEC forum linked in @frerk's comment is consistent with my observations (see link above).
I noted that in that discussion the hang issue always involves the microSD file system. In these recent CoreELEC builds I am not seeing any hang problems with network I/O that don't also involve the microSD file system.
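One way to separate microSD I/O from decoding and networking would be to repeat scenario 2 locally: write several gigabytes to the card, force them out with fsync, then read them back while checking a SHA-256 digest. The sketch below is hypothetical (the function name, chunk size and target path are illustrative, and it again assumes Python 3 on the board). If it hangs on its own, that would point at the microSD path rather than the network or the decoder; if it completes, decoding combined with microSD reads would be the next thing to test.

```python
import hashlib
import os
import time
from pathlib import Path

CHUNK = 4 * 1024 * 1024  # 4 MiB per write/read (arbitrary choice)


def write_and_verify(path: Path, total_bytes: int) -> bool:
    """Write total_bytes of random data to path, read it back, and compare
    SHA-256 digests. Progress is printed every 256 MiB so the last line on
    screen shows roughly how far the transfer got before any hang."""
    written = hashlib.sha256()
    start = time.monotonic()
    with path.open("wb") as f:
        done = 0
        while done < total_bytes:
            data = os.urandom(min(CHUNK, total_bytes - done))
            f.write(data)
            written.update(data)
            done += len(data)
            if done % (64 * CHUNK) == 0 or done == total_bytes:
                rate = done / max(time.monotonic() - start, 1e-9) / 1e6
                print(f"wrote {done / 1e9:.2f} GB ({rate:.1f} MB/s)", flush=True)
        f.flush()
        os.fsync(f.fileno())  # push the data to the card, not just the page cache
        # Advise the kernel to drop this file's cached pages so the read-back
        # actually hits the card (advisory, Linux only).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    read_back = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            read_back.update(chunk)
    return read_back.hexdigest() == written.hexdigest()
```

For example, write_and_verify(Path("/storage/stress.bin"), 8 * 10**9) would write about 8 GB to a hypothetical file on the card, in the same range where the scp hangs were seen.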
@frerk As soon as hardware decoding support is mature enough in mainline, LePotato will receive a CoreELEC release based on it. I'm somewhat reluctant to modify the 3.14 kernel any further, as any time spent on it is just wasted; I think the same view is shared across the team as well. It's nothing but a headache.
I'm aware of the network transfer issues with the 3.14 kernel. Unfortunately, the newer stmmac Ethernet driver can't be backported from mainline. After some testing I initially thought it might be hardware related, but with reports of it working fine in mainline it's very confusing: this is the only device we have this issue with, and I have also seen reports of the same issue when using a Wi-Fi adapter, which to me suggests it's not network related at all.
I use the LePotato as my main development device for CoreELEC, with the LoveRPI heatsink, and have not experienced any lockups when playing videos over the network. I have not tried playing from a microSD card, though; I will try installing CE to the eMMC later on and see if the same issues still exist.
@adamg Yeah, I just tried with a wifi adapter and still get consistent lockups just the same as with ethernet. I don't recall it ever happening while playing a movie, only when navigating the Movie list. All of my media is on an NFS share.
I have been running that release since CoreELEC 8.90.3 without an issue.
The segmentation fault is the result of the kernel trying to access a part of memory that doesn't exist or that it doesn't have permission to access.
I believe it would then be a relatively easy fix for this long-standing issue. I could always pull the compiled device tree from the Android image, decompile it and compare, but decompiled device trees sometimes differ from the original.
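The device tree comparison described above could be scripted along these lines, assuming the device tree compiler dtc is installed and both .dtb files have already been extracted; the Python helper names and file names are hypothetical. A compiled blob normally keeps no label names, so the decompiled source shows phandle references as raw numbers, which is one reason a decompiled tree does not read like the original source and why some diff noise should be expected.

```python
import difflib
import subprocess
from pathlib import Path
from typing import List


def decompile_dtb(dtb: Path) -> str:
    """Decompile a flattened device tree blob to DTS text using dtc
    (dtc writes to stdout when no -o option is given)."""
    result = subprocess.run(
        ["dtc", "-I", "dtb", "-O", "dts", str(dtb)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_dts(a: str, b: str, a_name: str = "a.dts", b_name: str = "b.dts") -> List[str]:
    """Return a unified diff of two DTS texts (empty list if identical)."""
    return list(
        difflib.unified_diff(
            a.splitlines(), b.splitlines(),
            fromfile=a_name, tofile=b_name, lineterm="",
        )
    )
```

Usage, with hypothetical file names: print("\n".join(diff_dts(decompile_dtb(Path("android.dtb")), decompile_dtb(Path("coreelec.dtb")), "android.dts", "coreelec.dts"))).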
LePotato is what I primarily use for CoreELEC development and probably my most favourite device so I want this resolved just as much as everyone else.
We are nowhere near ready to use mainline either but LePotato will probably be the first device CoreELEC uses it for.