
You are currently reading a thread in /g/ - Technology

Thread replies: 21
Thread images: 1
File: 1361712463586.jpg (44 KB, 500x442)
Let's say I want to start an image hosting service.

Should I go for the SSD or the HDD option?
>>
>>54685028
It doesn't matter
None of this matters
>>
>>54685028
Use the cloud
>>
>>54685047

explain
>>
hdd if you need the space; popular images will be cached in ram, so they'll still be served quickly
if the ssd is big enough, then it'd be better
>>
>>54685087

The problem I have here is that I don't know how many people will use my service and I'm not sure how to predict it. Therefore I have no idea how much space I really need.

I'm considering buying a 20 GB SSD vs a 50 GB HDD, but I'm not sure if the first option is enough.
>>
>>54685231

I can only estimate that the average user will upload ~60 KB files (it's a very specific image hosting service) and might do so around 5-10 times.
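OP's numbers are enough for a rough capacity estimate. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and that the whole disk is available for images; in practice the OS and software on the VPS will take part of it, so treat the results as an upper bound.

```python
# Rough capacity estimate from OP's numbers.
# Assumptions (not from the thread): decimal units, whole disk usable for images.

AVG_FILE_KB = 60              # ~60 KB per image, per OP
UPLOADS_PER_USER = (5, 10)    # each user uploads about 5-10 times, per OP


def users_that_fit(disk_gb: float, uploads_per_user: int, avg_kb: float = AVG_FILE_KB) -> int:
    """Number of users whose uploads fit on a disk of `disk_gb` gigabytes."""
    disk_kb = disk_gb * 1_000_000
    per_user_kb = uploads_per_user * avg_kb
    return int(disk_kb // per_user_kb)


if __name__ == "__main__":
    for disk in (20, 50):
        heavy = users_that_fit(disk, UPLOADS_PER_USER[1])  # 10 uploads each
        light = users_that_fit(disk, UPLOADS_PER_USER[0])  # 5 uploads each
        print(f"{disk} GB: roughly {heavy:,} to {light:,} users")
```

With these numbers it prints roughly 33,333 to 66,666 users for the 20 GB option and 83,333 to 166,666 users for the 50 GB option.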
>>
>>54685028
I have always wondered whether image hosting sites hash their images to avoid storing duplicate files and save space.

It might be a little more overhead but the disk space saved would probably be worth it.
>>
>>54685231
i'd probably go for the hdd, consider an ssd later if it turns out to be too slow

commonly-accessed images will be cached by your OS in memory, so will be served without fetching from disk

unless you have a lot of visitors at any one time, it's better to have more space than to worry about IO latency

you can always change it later on if need be
>>
Will OP host porn?
>>
Use the SSD as a cheap "ramdisk" (store the most popular images) and the HDD as the main backbone.

This solves the issue of space and speed.
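One way to do the SSD-as-cache idea at the application level is a least-recently-used (LRU) tier: every image lives on the slow disk, and the most recently requested ones are copied to the fast disk. This is a minimal sketch, assuming two directories (one per disk) and a capacity counted in files rather than bytes; `TieredImageStore` is a hypothetical name, and a real setup might use a filesystem- or block-level cache instead.

```python
import os
import shutil
import tempfile
from collections import OrderedDict


class TieredImageStore:
    """Hypothetical two-tier store: a small fast directory (e.g. on an SSD)
    caching a large slow directory (e.g. on an HDD)."""

    def __init__(self, fast_dir: str, slow_dir: str, max_fast_files: int):
        self.fast_dir = fast_dir
        self.slow_dir = slow_dir
        self.max_fast_files = max_fast_files
        # Names currently on the fast tier, least recently used first.
        self._lru: "OrderedDict[str, None]" = OrderedDict()

    def put(self, name: str, data: bytes) -> None:
        # Every image always lives on the slow tier (the "backbone").
        with open(os.path.join(self.slow_dir, name), "wb") as f:
            f.write(data)
        if name in self._lru:  # drop a stale fast-tier copy, if any
            del self._lru[name]
            os.remove(os.path.join(self.fast_dir, name))

    def path_for(self, name: str) -> str:
        """Return the fast-tier path for `name`, promoting it there if needed."""
        fast_path = os.path.join(self.fast_dir, name)
        if name in self._lru:
            self._lru.move_to_end(name)  # cache hit: mark as most recently used
            return fast_path
        # Cache miss: copy from the slow tier, then evict the oldest entries.
        shutil.copyfile(os.path.join(self.slow_dir, name), fast_path)
        self._lru[name] = None
        while len(self._lru) > self.max_fast_files:
            oldest, _ = self._lru.popitem(last=False)
            os.remove(os.path.join(self.fast_dir, oldest))
        return fast_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ssd, tempfile.TemporaryDirectory() as hdd:
        store = TieredImageStore(ssd, hdd, max_fast_files=2)
        for n in ("a.jpg", "b.jpg", "c.jpg"):
            store.put(n, n.encode())
            store.path_for(n)
        print(sorted(os.listdir(ssd)))  # ['b.jpg', 'c.jpg']
```

As the thread points out, the OS already keeps hot files in its RAM page cache, so a tier like this mainly pays off when the popular set is bigger than RAM but fits on the SSD.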
>>
>>54685368
even 4chan does this

there are many ways to do it, like:
- purely frontend, you have a list of images and hashes in a database, and only store unique files, pointing identical images to the same file
- filesystem-level: some filesystems, like btrfs and zfs, support deduplicating entire volumes, separately from userspace (if you have two identical files on disk, they only use the space of one copy)
- something as simple as a shell script that periodically hashes files and replaces copies with symlinks
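The third option, periodically hashing files and replacing copies with symlinks, can be sketched in a few lines. This is a Python stand-in for the shell script, assuming a single flat directory of images on a POSIX system, SHA-256 as the hash, and that nothing is writing to the directory while it runs; `dedupe_with_symlinks` is a hypothetical name.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_symlinks(directory: str) -> int:
    """Replace duplicate regular files in `directory` with symlinks to the
    first copy seen (in sorted name order). Returns how many were replaced."""
    first_seen: dict[str, str] = {}
    replaced = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # skip existing symlinks and subdirectories
        digest = file_digest(path)
        original = first_seen.setdefault(digest, path)
        if original != path:
            os.remove(path)
            # Relative target, resolved against the link's own directory.
            os.symlink(os.path.basename(original), path)
            replaced += 1
    return replaced
```

Running it again is harmless: existing symlinks are skipped, so it only touches new duplicates. Note that deleting the "original" later would break every link to it, which is one reason the replies below prefer a database or content-addressed storage.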
>>
>>54685453
i imagine op is getting this choice from a VPS vendor, so he might not have the option to pick both

though yes, if you're physically making a server, using an ssd as a cache to a hdd/raid backend is a smart way to make the most of both technologies
>>
>>54685496

OP here, you are correct. It's VPS SSD vs VPS HDD
>>
>>54685472
The way I would do it is to have a temp upload directory and a program that hashes the files in that directory and deletes duplicates from it. If the hash already exists, I would have the CGI program redirect the user's browser to the existing image URL seamlessly.

That would be much cleaner than fooling with symlinks.

Or even better have js hash the file on the client side then the server only has to verify the hash. If it exists no upload is needed.
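A minimal sketch of that server-side flow, assuming unique files are stored under their SHA-256 hex digest in one directory and served at a base URL plus the digest. The function names and the URL layout are hypothetical, and the actual HTTP redirect is left to whatever CGI or web framework calls `accept_upload`.

```python
import hashlib
import os
import re
import shutil


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def accept_upload(tmp_path: str, store_dir: str, base_url: str) -> tuple[str, bool]:
    """Handle one file sitting in the temp upload directory.

    Returns (url, was_duplicate). A duplicate is deleted from the temp dir and
    the caller can redirect the browser to the existing URL; a new file is
    moved into the store under its digest."""
    digest = sha256_of(tmp_path)
    stored = os.path.join(store_dir, digest)
    url = base_url + digest
    if os.path.exists(stored):
        os.remove(tmp_path)
        return url, True
    shutil.move(tmp_path, stored)
    return url, False


_HEX64 = re.compile(r"[0-9a-f]{64}")


def hash_is_known(digest: str, store_dir: str) -> bool:
    """Client-side variant: the browser sends only the hex digest first, and
    True means the upload can be skipped. The digest is validated so it
    can't be abused as a path (e.g. '../')."""
    return bool(_HEX64.fullmatch(digest)) and os.path.exists(os.path.join(store_dir, digest))
```

In the client-side variant, the server should still re-hash any bytes it actually receives rather than trusting a digest computed by the browser.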
>>
>>54685844
>Or even better have js hash the file on the client side then the server only has to verify the hash. If it exists no upload is needed.
some services do this as well; it's an advantage for both parties bandwidth-wise, but it uses more cpu/memory resources in the client's browser
this would also imply you're using a database mapping 'uploads' to stored files, unless you outright refuse identical files (which might not be a good idea: there are several cases where one might want to upload the same file with different metadata, such as upload date, filename, ID, or membership in a particular collection, if that's to be a feature)
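A database that maps uploads to stored files might look like the sketch below. It is an independent illustration using SQLite, not the contents of any linked paste: the table and column names are assumptions, and SHA-256 stands in for whatever hash the service would choose. Identical contents share one `files` row (and so one file on disk), while each upload keeps its own filename and date.

```python
import hashlib
import sqlite3
import time

# Hypothetical schema: one row per unique file, one row per upload.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    digest TEXT PRIMARY KEY,   -- SHA-256 of the file contents
    size   INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS uploads (
    id          INTEGER PRIMARY KEY,
    digest      TEXT NOT NULL REFERENCES files(digest),
    filename    TEXT NOT NULL,
    uploaded_at REAL NOT NULL
);
"""


def record_upload(conn: sqlite3.Connection, data: bytes, filename: str) -> tuple[int, bool]:
    """Record one upload; returns (upload_id, contents_were_new)."""
    digest = hashlib.sha256(data).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO files (digest, size) VALUES (?, ?)",
        (digest, len(data)),
    )
    new_file = cur.rowcount == 1  # 0 means these bytes were already stored
    # A real service would write `data` to disk here only when new_file is True.
    cur = conn.execute(
        "INSERT INTO uploads (digest, filename, uploaded_at) VALUES (?, ?, ?)",
        (digest, filename, time.time()),
    )
    conn.commit()
    return cur.lastrowid, new_file


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(record_upload(conn, b"same bytes", "cat.jpg"))    # (1, True)
    print(record_upload(conn, b"same bytes", "kitty.jpg"))  # (2, False)
```

Deleting an upload then only removes its `uploads` row; the file itself can be removed once no upload references its digest.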
>>
>>54686013
to give you an example of what i'm thinking of with a database:
http://hastebin.com/awakekubup.avrasm
(4chan spam filter)
>>
>>54686185
(oh, you should probably use something better than md5 if your server has a good cpu)
>>
>>54686013
Wouldn't altering the metadata change the hash?
>>
>>54685395
This, use HDD. It's better to have a slightly slower service that works than have your blazing fast service spit out "no space" errors.
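Whichever disk is chosen, running out of space is easier to handle if the service checks free space before accepting an upload instead of failing mid-write. A minimal sketch using the standard library's `shutil.disk_usage`; the 1 GB reserve and the function name are assumptions.

```python
import shutil

# Hypothetical reserve left free for logs, temp files and the OS.
MIN_FREE_BYTES = 1_000_000_000


def can_accept_upload(storage_path: str, upload_size: int, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if storing `upload_size` more bytes still leaves `min_free` bytes free."""
    free = shutil.disk_usage(storage_path).free
    return free - upload_size >= min_free
```

When this returns False, the service can show a clear "uploads temporarily disabled" message while existing images keep being served.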
>>
>>54687393
depends on which metadata you're talking about
metadata relating to the filesystem or your web service won't affect the file contents (and therefore, the file hash)
only metadata relating to the file format itself will, such as gif comments, jpeg EXIF, mkv titles, etc, etc
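The distinction can be checked directly: renaming a file or changing its timestamps leaves the hash alone, while any change to the bytes inside the file changes it. In this sketch the file contents are placeholder bytes, not a real JPEG, and appending bytes stands in for editing embedded metadata such as an EXIF field or a comment.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def metadata_demo() -> tuple[bool, bool]:
    """Returns (hash unchanged after filesystem-metadata edits,
                hash unchanged after an in-file edit)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "photo.jpg")
        with open(path, "wb") as f:
            f.write(b"placeholder bytes standing in for a JPEG")
        before = sha256_file(path)

        # Filesystem metadata: new name and modification time, contents untouched.
        renamed = os.path.join(d, "renamed.jpg")
        os.rename(path, renamed)
        os.utime(renamed, (0, 0))
        fs_same = sha256_file(renamed) == before

        # In-file metadata: editing it rewrites bytes inside the file;
        # appending bytes stands in for that here.
        with open(renamed, "ab") as f:
            f.write(b" [edited comment]")
        content_same = sha256_file(renamed) == before
    return fs_same, content_same


if __name__ == "__main__":
    print(metadata_demo())  # (True, False)
```

This is also why two uploads of "the same" photo with different EXIF data would be stored as two separate files by hash-based deduplication.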
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.