• Soyweiser@awful.systems
    4 days ago

    So, does anybody know the usual Linux commands to turn a PDF into Markdown? I assume there is a simple command that does it for you, if there isn't already a pdf2markdown.

    • flaviat@awful.systems
      3 days ago

      There can't really be such a thing, since PDF doesn't store its content in a structured form. There is an extension to the standard that would let a program do it for you, but nobody uses it (PDF/UA-1). (Also, pandoc is vibe coded now.)
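      To illustrate the "no structure" point: a page's content stream mostly records which glyphs to paint, in which font size, at which position. Nothing in it says "this is a heading". Below is a minimal Python sketch, assuming a hand-written uncompressed toy stream and only the `Tf`, `Td` and `Tj` operators. The names `SAMPLE_STREAM`, `TextRun`, `extract_runs` and `to_markdown` are hypothetical, made up for this example, and the heading rule (most common font size is body text, anything larger is a heading) is just one guess a converter might make, not how any particular tool works.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextRun:
    size: float  # font size from the most recent Tf operator
    y: float     # baseline height from the most recent Td operator
    text: str    # the literal string painted by Tj


# Hypothetical toy content stream. Real PDFs normally compress streams
# (e.g. FlateDecode), use many more operators, and may escape parentheses.
SAMPLE_STREAM = """
BT /F1 24 Tf 72 720 Td (Getting started) Tj ET
BT /F1 12 Tf 72 690 Td (Install the tool first.) Tj ET
BT /F1 12 Tf 72 676 Td (Then run it on a file.) Tj ET
BT /F1 18 Tf 72 640 Td (Options) Tj ET
BT /F1 12 Tf 72 610 Td (There are none yet.) Tj ET
"""

# Matches only the three operators this sketch understands.
OP = re.compile(
    r"/\S+\s+(?P<size>[\d.]+)\s+Tf"         # select font and size
    r"|(?P<x>[\d.]+)\s+(?P<y>[\d.]+)\s+Td"  # move to a new text position
    r"|\((?P<text>[^)]*)\)\s*Tj"            # paint a literal string
)


def extract_runs(stream: str) -> list[TextRun]:
    """Return positioned text runs; nothing in the stream says 'heading'."""
    runs = []
    size = y = 0.0
    for m in OP.finditer(stream):
        if m.group("size") is not None:
            size = float(m.group("size"))
        elif m.group("y") is not None:
            # Td is relative, but each BT starts a fresh text matrix,
            # so in this toy stream it yields an absolute position.
            y = float(m.group("y"))
        else:
            runs.append(TextRun(size, y, m.group("text")))
    return runs


def to_markdown(runs: list[TextRun]) -> str:
    """Guess structure: most common size = body text, bigger = heading."""
    body = Counter(r.size for r in runs).most_common(1)[0][0]
    heading_sizes = sorted({r.size for r in runs if r.size > body}, reverse=True)
    lines = []
    # PDF y coordinates grow upward, so sort descending to read top-down.
    for run in sorted(runs, key=lambda r: -r.y):
        if run.size in heading_sizes:
            level = heading_sizes.index(run.size) + 1
            lines.append("#" * level + " " + run.text)
        else:
            lines.append(run.text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(extract_runs(SAMPLE_STREAM)))
```

      On this toy input the guess happens to work, but a document whose headings are merely bold at body size, or laid out in two columns, would come out wrong with no way to tell from the stream alone. Tagged PDF (which PDF/UA-1 requires) is meant to carry that missing structure explicitly.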

      • froztbyte@awful.systems
        3 days ago

        yeah, my answer to this also used to be pandoc until they took the prompt unto their soul

        it’s deeply fucking frustrating

        • Soyweiser@awful.systems
          3 days ago

          That sucks so much. But thanks anyway, everybody; my post was half shitpost, half serious. (And I know some things can't be easily converted, but my regexp to match xhtml script is almost complete.)

          I'm a bit surprised (but not totally) that there actually was a proper tool for it for a while, even if it's vibe-corrupted now.