Hey folks!
I am looking for feedback from active lemm.ee users on what you all value when it comes to images on Lemmy. I’ll go into a bit of detail about what our options are, and then I would ask you to voice your opinion about the issue in the comments.
First, some context for those who don’t know. Lemmy software can be configured to handle images in three different ways:
- Store images locally - whenever an external image is posted somewhere, lemm.ee will download a permanent local copy. When you view posts, you are seeing our local copy of the image.
- Proxy all images - as with the first option, lemm.ee will download a local copy of external images, but this copy is temporary. It is automatically deleted shortly afterwards, and if users open the relevant post/comment again later, lemm.ee will attempt to download a temporary copy again at that point.
- Pass through external images directly - lemm.ee never downloads any external images. Users always connect directly to the source servers to load the images.
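To make the difference concrete, here is a rough sketch of which URL a browser ends up loading in each mode. This is not how Lemmy actually implements it: the `ImageMode` names and the `/image_proxy` and `/stored_image` paths are made up purely for illustration.

```python
from enum import Enum
from urllib.parse import quote, urlparse


class ImageMode(Enum):
    # Hypothetical names for the three configurations described above.
    STORE_LOCALLY = "store_locally"
    PROXY = "proxy"
    PASS_THROUGH = "pass_through"


def url_seen_by_browser(mode, external_url, instance="https://lemm.ee"):
    """Return the URL the user's browser would load for an external image."""
    if mode is ImageMode.PASS_THROUGH:
        # The browser talks to the source server directly.
        return external_url
    encoded = quote(external_url, safe="")
    if mode is ImageMode.PROXY:
        # Hypothetical path: the instance fetches a temporary copy on demand.
        return f"{instance}/image_proxy?url={encoded}"
    # Hypothetical path: the instance serves its permanent local copy.
    return f"{instance}/stored_image?source={encoded}"


def host_that_sees_user_ip(mode, external_url, instance="https://lemm.ee"):
    """Return the host that receives the request from the user's browser."""
    return urlparse(url_seen_by_browser(mode, external_url, instance)).netloc
```

In the first two modes the only host that sees your IP address is the instance itself. With pass-through it is whoever hosts the image.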
There are pros and cons to each configuration.
Storing images locally
Benefits:
- Your IP address is never leaked to external image hosts, as you never connect directly to the source server. External image hosts only see the IP address of the lemm.ee server.
- External servers don’t become bottlenecks for opening lemm.ee posts. If an external server is slow, it won’t matter, because the image is always available locally.
Downsides:
- As time goes on, our storage will fill up with hundreds of gigabytes of useless images, most of which will never be viewed again after the relevant posts fall off the front page.
- Many big external image hosts will rate limit bigger Lemmy servers, causing broken images when we fail to make a local copy.
- Crucially: some people love to spend their time uploading illegal content to online servers. There are tools to try and filter out such content, but these are not perfect. The end result is that there is a high chance of some content like this inadvertently reaching lemm.ee storage and staying there permanently. This downside is why lemm.ee has not, and will not, use this particular configuration.
Proxying images
Benefits: We get the same benefits as with permanent local storage. On top of that, because local copies are only made temporarily, at the moment our users request them, we free up a ton of storage & remove the risk of permanently storing illegal content on our servers.
Downsides: The key downside is that external rate limits hit us much harder, as we will be requesting external images far more often. This results in a lot of broken images on lemm.ee.
Passing through external images
Benefits:
- Images are rarely broken, unless the source server goes down.
- The images never touch our servers, removing a lot of risk with illegal content as well as with storage costs.
Downsides:
- Our users lose a degree of privacy. Every external image that is loaded in your browser will result in the remote server getting a request directly from your computer to fetch that image - this is pretty much the same as if you had visited that external server directly, which lets them log your IP address if they wish.
- When remote servers are slow, it can slow down the entire page load in some cases.
Current situation
Initially, lemm.ee was using the third option of passing through images. As soon as support for option 2, image proxying, was implemented in Lemmy code, we switched to that option, mainly for the privacy benefits. However, after many months, and being blocked by more and more external servers, it is clear that image proxying is seriously degrading the user experience on lemm.ee. We often end up with broken images, and our users have to deal with the results.
I still believe image proxying is a really valuable feature, but I am starting to believe it is a better fit for small instances which make far fewer requests to external servers.
As a result, I am now seriously considering switching back to the previous method of passing through external images.
This is where you come in - I would ask you as users to please let me know which you value more: the privacy that you get from image proxying, or the better user experience you get from directly passing through images from their source. Please let me know in the comments how you feel. If I get enough feedback from people who are against image proxying, then I will be switching it off for lemm.ee soon. Thanks for reading & sharing your thoughts, and I hope you have a great weekend!
Wow, thanks for the full transparency. You are awesome!
My opinion would be option 2 (proxy requests), but with a higher cache TTL or simply an LRU (Least Recently Used) cache.
If you’re getting throttled, it could be mitigated by increasing the cache retention period (or improving the cache hits).
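Roughly what I mean by TTL plus LRU, as a minimal in-memory sketch. The `TtlLruCache` class and its parameters are made up for illustration. I don't know how Lemmy's proxy actually caches images, and a real one would store them on disk or in object storage rather than in a dict.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Hypothetical image cache: entries expire after a TTL, and the
    least recently used entry is evicted once the cache is full."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # url -> (expires_at, image_bytes)

    def get(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # Too old: drop it so the caller re-fetches from the origin.
            del self._entries[url]
            return None
        # Mark as most recently used.
        self._entries.move_to_end(url)
        return data

    def put(self, url, data):
        self._entries[url] = (self._clock() + self._ttl, data)
        self._entries.move_to_end(url)
        # Evict least recently used entries until we are within capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

A longer TTL means fewer repeat requests to the origin for popular images, and the size cap keeps storage bounded no matter how long the TTL is.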
Another improvement: would it be possible to change the proxy so that, if the proxied requests are throttled, it simply sends the user an HTTP 302 redirect to the origin (instead of a broken image)?
Regarding option 1 (full cache): I greatly appreciate your desire to hide/protect your users’ IP addresses, but it is outside the scope of what I expect from a Lemmy server. Maybe you could market and upsell this increased privacy as a subscription-based feature. However, if I want privacy - I’ll use a VPN.
Regarding option 3 (user fetches content from origin): From a user’s perspective, I really don’t want my Lemmy experience to be based on hitting a bunch of (potentially) unreliable services. When I, as a lemm.ee user, request a post from Lemmy.world (for example), lemm.ee will proxy and cache that post and the comments. This is the distributed nature of Lemmy (as far as I understand). Why restrict this caching to just posts/threads/comments and not include images (which, let’s face it, are as meaningful as pure text - especially wrt memes)?
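Something like this, as a sketch of the idea only. `ProxyResponse`, `proxy_image` and the `fetch_upstream` callback are hypothetical names and not part of Lemmy or pict-rs; `fetch_upstream(url)` is assumed to return a `(status_code, body_bytes)` tuple.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyResponse:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def proxy_image(url, fetch_upstream):
    """Serve the image through the proxy, or redirect to the origin on failure.

    fetch_upstream is a hypothetical callable: url -> (status_code, body_bytes).
    """
    try:
        status, body = fetch_upstream(url)
    except OSError:
        # Network-level failure: treat it like any other failed fetch.
        status, body = None, b""
    if status == 200:
        return ProxyResponse(200, body=body)
    # Throttled (e.g. 429) or otherwise failed: let the browser fetch the
    # image from the origin itself instead of showing a broken image.
    return ProxyResponse(302, headers={"Location": url})
```

The catch is that every redirected image is effectively a pass-through, so the origin sees the user's IP address for that one image.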
I’ll wager “no” to your question. That sounds like something the Lemmy codebase itself would have to implement, not something that’s just configurable.
It’s sad, but I think you’re right.
I assumed/hoped that Lemmy’s architecture was more decoupled.
The changelog hints that the image reverse proxy is built in, maybe using pict-rs.
Which certainly reeks of Not Invented Here Syndrome, as image uploading/storing, reverse proxies, and caching are well-understood problems.