Do y’all know about Textise? A quick search here doesn’t turn up any mention of it. https://www.textise.net/

It can be used with the DuckDuckGo bang !textise
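
For anyone who wants to script the bang, here is a minimal sketch that builds a DuckDuckGo query URL with `!textise` in front of a page address. The function name `textise_bang_url` is made up for illustration, and what happens after DuckDuckGo redirects is up to the bang itself, so treat this as an assumption rather than a documented API.

```python
from urllib.parse import urlencode


def textise_bang_url(page_url: str) -> str:
    """Build a DuckDuckGo search URL whose query is '!textise <page_url>'.

    Hypothetical helper: it only assembles the query string and does not
    contact DuckDuckGo or Textise.
    """
    query = f"!textise {page_url}"
    # urlencode percent-encodes the bang and the target URL for the q parameter.
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    print(textise_bang_url("https://example.com/article"))
```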

It also works over Tor, where I can use it as a proxy to avoid Cloudflare checkpoints.

I don’t think it is open source, but I’m not completely sure.

Copy from the site intro:

Textise is a new way of looking at the Web. It’s an internet tool that removes everything from a web page except for its text. In practice, this means that images, forms, scripts, adverts, they all go, leaving plain text. Find out more here… (https://textise.wordpress.com/about-textise/)

How to use this page

  1. Type or paste the URL of a web page into the box below and click “Textise”. A text only version of the web page will be displayed.
  2. Type a search term into the box, select a search engine from the drop-down list, and click “Search”. You will be taken to a text only version of the search results.
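
To give a feel for what "removing everything except text" means, here is a rough sketch using only Python's standard library. This is not Textise's actual code (which I haven't seen); the names `TextOnly` and `to_text` are made up, and the list of skipped tags is my own assumption about what counts as non-text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside non-text elements."""

    # Assumed set of elements whose contents are not readable text.
    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Images, forms and other tags are simply dropped; only SKIP tags
        # need tracking so that their contents can be ignored too.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            # Collapse runs of whitespace inside each text node.
            self.parts.append(" ".join(data.split()))


def to_text(html: str) -> str:
    """Return the text nodes of an HTML document, one per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h1>Hello</h1><img src='a.png'><script>alert(1)</script>"
        "<p>Plain text only</p></body></html>"
    )
    print(to_text(sample))
```

Each text node ends up on its own line, so inline markup such as bold or links splits a sentence across lines. A real tool would presumably be smarter about that.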
  • Kissaki@feddit.de
    1 year ago

    Does Textise handle cases that Reader mode doesn’t? If Reader mode can’t determine the central content, does Textise have more logic to do so?

    Given the wording, I also want to point out that a website doesn’t have to explicitly support Reader mode. It only has to follow HTML standards for marking up its content, which is a general accessibility practice too.

    • deleted@lemmy.world
      1 year ago

      Technically, you’re correct.

      However, many websites don’t follow the appropriate HTML standards and just abuse h1 and p.

      I just tried it with Google.com and it seems to strip all HTML markup, leaving only the text.

      It’s useful in some cases, such as one-page WordPress websites that contain their story, mission, products, etc.