• 0 Posts
  • 16 Comments
Joined 8 months ago
Cake day: May 21st, 2025



  • A GPU bench might raise temps enough to make the problem recur, but I’m not sure you’d see anything unless data is flowing to or from the drive at the same time. So maybe try running the GPU bench and, at the same time, run sudo dd if=/dev/{your drive} of=/dev/null bs=1M status=progress. That just pulls data from the drive and writes it to nowhere, but be careful not to swap if and of, or you might overwrite your whole drive. While those are going, run sudo dmesg -w in another terminal and watch for the same error you were getting before. If you don’t get errors, the problem was probably just a power state issue that the kernel parameter fixed. But I have to tell you, unfortunately, that the error also showing up under Windows is a bad sign that points to a hardware problem, so I don’t feel very hopeful. Independent of all the other suggestions, could you try running sudo nvme smart-log /dev/{your drive}? That might give you some data.
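    If it helps, the read test and the log watching described above could be wrapped up roughly like this. It is only a sketch: the function names read_test_cmd and filter_errors are made up for illustration, the device name in the usage comments is a placeholder, and the grep patterns are my guess at what the kernel's PCIe/NVMe messages look like on your machine, so adjust them to match the error you actually saw.

```shell
#!/bin/sh
# Sketch only. Nothing here touches a drive until you run the
# commands in the usage comments yourself.

# Print the read-only dd command for a device.
# Refuses anything that is not a /dev path, and always puts the
# device after if= (input), never after of= (output).
read_test_cmd() {
    case "$1" in
        /dev/?*)
            printf 'dd if=%s of=/dev/null bs=1M status=progress\n' "$1"
            ;;
        *)
            echo "refusing: '$1' is not a /dev path" >&2
            return 1
            ;;
    esac
}

# Keep only kernel log lines that look like PCIe or NVMe trouble.
# The patterns are assumptions; tune them to your own dmesg output.
filter_errors() {
    grep -E 'AER|PCIe Bus Error|nvme'
}

# Usage (placeholder device name, check yours with lsblk first):
#   terminal 1:  sudo dmesg -w | filter_errors
#   terminal 2:  sudo $(read_test_cmd /dev/nvme0n1)
```

    Printing the dd command instead of running it directly gives you a chance to eyeball if= and of= before anything executes.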



  • I’d wager a toe from my left foot that if you look in the Event Viewer on Windows you will see similar-looking errors (though no doubt less descriptive; it might say something like “corrected read error” or something equally obtuse). This is a hardware issue that Linux tends to be more aggressive in handling. These errors are on the physical layer and data link layer, so it is likely a communication problem between the drive and the motherboard. Interestingly, though, they are corrected on retry, so the data the system requests from the drive is fine even if it sometimes fails to get there in time. This screams electrical connection to me: either thermal expansion is making the contacts wonky (and they might not be seated perfectly), there is a flaw in the traces somewhere, or there is some power management issue affecting your PCIe bus. Can you try running it with one more kernel parameter? Alongside pcie_aspm=off, add nvme_core.default_ps_max_latency_us=0 and watch dmesg while running something heavy.
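    After rebooting, it is worth confirming that both parameters actually made it onto the kernel command line before drawing conclusions from the test. A rough sketch of that check follows; has_param and check_boot_params are hypothetical helper names, and how you add the parameters in the first place depends on your bootloader, which I am not assuming here.

```shell
#!/bin/sh
# Sketch only: checks whether kernel parameters are present on a
# command line string. Helper names are made up for illustration.

# Succeed if the command line in $1 contains the exact parameter $2
# as a whole space-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report both parameters against the running kernel's command line.
check_boot_params() {
    cmdline=$(cat /proc/cmdline)
    for p in pcie_aspm=off nvme_core.default_ps_max_latency_us=0; do
        if has_param "$cmdline" "$p"; then
            echo "$p: active"
        else
            echo "$p: missing"
        fi
    done
}

# Usage, after rebooting with the new parameters:
#   check_boot_params
```

    If either one shows as missing, the bootloader config probably was not regenerated or the edit went to the wrong entry.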












  • “fucking around in the South China” = sailing in international waters as defined by the UNCLOS, to which China is a signatory

    “US keeps trying to use the Taiwan situation as a wedge issue” = maintaining the status quo in the face of repeated, overt invasion threats

    “desperately want a proxy war” = can’t allow an oligo-fascist state to seize control of the single-path source of chips that enable modern life

    “no western news coverage of this” = I don’t look at the news