Let's say I want to start an image hosting service.
Should I go for the SSD or the HDD option?
>>54685028
It doesn't matter
None of this matters
>>54685028
Use the cloud
>>54685047
explain
hdd if you need the space; popular images will be cached in ram, so they'll still be served quickly
if the ssd is big enough, then it'd be better
>>54685087
The problem is that I don't know how many people will use my service, and I'm not sure how to predict it, so I have no idea how much space I really need.
I'm considering a 20 GB SSD vs a 50 GB HDD, but I'm not sure the first option is enough.
>>54685231
I can only estimate that the average user will upload ~60 KB files (it's a very specific image hosting service), around 5-10 of them each.
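Those two numbers are enough for a rough capacity estimate. A minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and that the whole disk goes to images; in reality the OS, database, thumbnails and filesystem overhead on the VPS also take a share. `max_users` is just an illustrative helper name.

```python
def max_users(disk_gb: int, file_kb: int, uploads_per_user: int) -> int:
    """Rough number of users whose uploads fit on a disk.

    Assumes decimal units (1 GB = 1,000,000 KB) and that the entire disk
    is available for images (no OS, database, or thumbnails).
    """
    per_user_kb = file_kb * uploads_per_user
    return (disk_gb * 1_000_000) // per_user_kb


if __name__ == "__main__":
    for disk_gb in (20, 50):
        for uploads in (5, 10):
            users = max_users(disk_gb, 60, uploads)
            print(f"{disk_gb} GB, {uploads} uploads/user: ~{users:,} users")
```

At those rates the 20 GB option holds roughly 33,333 to 66,666 users and the 50 GB one roughly 83,333 to 166,666, before subtracting whatever the system itself uses.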
>>54685028
I have always wondered whether image hosting sites hash their images to avoid storing duplicate files and save space.
It might be a little more overhead, but the disk space saved would probably be worth it.
>>54685231
i'd probably go for the hdd and consider an ssd later if it turns out to be too slow
commonly accessed images will be cached in memory by your OS, so they'll be served without touching the disk
unless you have a lot of visitors at any one time, it's better to have more space than to worry about IO latency
you can always change it later on if need be
Is OP going to host porn?
Use the SSD as a cheap "ramdisk" (storing the most popular images) and the HDD as the main backend.
This solves the issue of space and speed.
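If you control the hardware, the idea looks roughly like this: every image lives on the HDD, and the SSD holds copies of the most recently requested ones, evicting the least recently used when it's full. A minimal sketch; the class name, the two directory parameters and the count-based limit are all made up for illustration (a real cache would cap by bytes, not file count). In practice, block-level tools such as Linux's bcache or LVM's lvmcache can do this transparently underneath the filesystem.

```python
import os
import shutil
import tempfile
from collections import OrderedDict


class TieredImageStore:
    """Hypothetical two-tier store: HDD dir holds every image, SSD dir
    caches the most recently requested ones (LRU, capped by file count)."""

    def __init__(self, hdd_dir: str, ssd_dir: str, max_cached: int) -> None:
        self.hdd_dir = hdd_dir
        self.ssd_dir = ssd_dir
        self.max_cached = max_cached
        self._lru: "OrderedDict[str, None]" = OrderedDict()  # names currently on the SSD

    def put(self, name: str, data: bytes) -> None:
        # New uploads always land on the big, slow disk.
        with open(os.path.join(self.hdd_dir, name), "wb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        ssd_path = os.path.join(self.ssd_dir, name)
        if name in self._lru:
            # Cache hit: mark as most recently used and serve from the SSD.
            self._lru.move_to_end(name)
        else:
            # Cache miss: copy from HDD to SSD, evicting the LRU file if over the cap.
            shutil.copyfile(os.path.join(self.hdd_dir, name), ssd_path)
            self._lru[name] = None
            if len(self._lru) > self.max_cached:
                evicted, _ = self._lru.popitem(last=False)
                os.remove(os.path.join(self.ssd_dir, evicted))
        with open(ssd_path, "rb") as f:
            return f.read()

    def cached(self) -> list:
        return list(self._lru)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as hdd, tempfile.TemporaryDirectory() as ssd:
        store = TieredImageStore(hdd, ssd, max_cached=2)
        for n in ("a.png", "b.png", "c.png"):
            store.put(n, n.encode())
        for n in ("a.png", "b.png", "c.png"):
            store.get(n)
        print(store.cached())  # ['b.png', 'c.png']
```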
>>54685368
even 4chan does this
there are many ways to do it, like:
- purely in the frontend (the web app itself): you keep a database of images and their hashes, store only unique files, and point identical images to the same file
- filesystem-level: some filesystems like btrfs and zfs support deduplicating entire volumes, separately from userspace (if you have two identical files on disk, they only take up the space of one copy)
- something as simple as a shell script that periodically hashes files and replaces copies with symlinks
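The third option, written in Python rather than shell as a minimal sketch: walk a directory, hash each regular file with SHA-256, and replace any later copy with a symlink to the first one seen. `dedupe_with_symlinks` and `file_digest` are hypothetical names; this assumes a POSIX system where the process is allowed to create symlinks, and it only catches byte-identical files.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_symlinks(root: str) -> int:
    """Replace duplicate regular files under `root` with symlinks to the
    first copy seen. Returns the number of files replaced."""
    first_seen: dict = {}
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip links from earlier runs and special files
            original = first_seen.setdefault(file_digest(path), path)
            if original != path:
                os.remove(path)
                os.symlink(os.path.abspath(original), path)
                replaced += 1
    return replaced


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, data in (("a.jpg", b"same"), ("b.jpg", b"same"), ("c.jpg", b"other")):
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        print(dedupe_with_symlinks(d))  # 1
```

One catch: if the first copy is ever deleted, every symlink pointing at it breaks, which is one reason to dedupe at upload time instead.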
>>54685453
i imagine op is getting this choice from a VPS vendor, so he might not have the option to pick both
though yes, if you're physically building a server, using an ssd as a cache in front of an hdd/raid backend is a smart way to make the most of both technologies
>>54685496
OP here, you're correct. It's an SSD VPS vs an HDD VPS.
>>54685472
The way I would do it is to have a temp upload directory and a program that hashes the files in that directory and deletes duplicates from it. If the hash already exists, I would have the cgi program seamlessly redirect the user's browser to the existing image url.
That would be much cleaner than fooling with symlinks.
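A minimal sketch of that flow, with one twist: each stored file is named after its own SHA-256 hash, so "is this a duplicate?" becomes a single existence check rather than a lookup table. `finalize_upload`, the `/i/` URL prefix and the directory layout are assumptions for illustration; the actual CGI handler and HTTP redirect are left to whatever web stack is in use.

```python
import hashlib
import os
import tempfile

URL_PREFIX = "/i/"  # hypothetical public URL prefix for stored images


def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def finalize_upload(tmp_path: str, store_dir: str) -> tuple:
    """Move a finished upload out of the temp directory.

    Stored files are named <sha256><ext>, so a duplicate shows up as an
    existing name. Returns (url, is_duplicate); the web handler would
    redirect the browser to `url` in both cases.
    """
    ext = os.path.splitext(tmp_path)[1].lower()
    name = sha256_file(tmp_path) + ext
    final_path = os.path.join(store_dir, name)
    if os.path.exists(final_path):
        os.remove(tmp_path)  # duplicate: drop the new copy, reuse the old one
        return URL_PREFIX + name, True
    os.replace(tmp_path, final_path)  # both dirs must be on the same filesystem
    return URL_PREFIX + name, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as store:
        for upload in ("first.jpg", "second.jpg"):
            path = os.path.join(tmp, upload)
            with open(path, "wb") as f:
                f.write(b"identical image bytes")
            print(finalize_upload(path, store))
```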
Or, even better, have JS hash the file on the client side so the server only has to check the hash. If it already exists, no upload is needed.
>>54685844
>Or, even better, have JS hash the file on the client side so the server only has to check the hash. If it already exists, no upload is needed.
some services do this as well; it saves bandwidth for both parties, but uses more cpu/memory in the client's browser
this would also imply you're using a database that maps 'uploads' to stored files, unless you outright refuse identical files (which might not be a good idea: there are several cases where someone might want to upload the same file with different metadata, such as upload date, filename, ID, or membership in a particular collection, if that's going to be a feature)
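One hypothetical shape for that mapping, sketched in SQLite: a `blobs` table with one row per stored file (keyed by its hash) and an `uploads` table with one row per upload, carrying the per-upload metadata. Table, column and function names are made up for illustration. `blob_exists` is the check a server could run when the client sends only a hash first; skipping the upload on a match does mean trusting the client's claimed hash.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS blobs (
    sha256 TEXT PRIMARY KEY,          -- hash of the file contents
    path   TEXT NOT NULL,             -- where the single stored copy lives
    size   INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS uploads (
    id          INTEGER PRIMARY KEY,
    sha256      TEXT NOT NULL REFERENCES blobs(sha256),
    filename    TEXT NOT NULL,        -- per-upload metadata
    uploaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def blob_exists(db: sqlite3.Connection, sha256: str) -> bool:
    """True if a file with this content hash is already stored."""
    row = db.execute("SELECT 1 FROM blobs WHERE sha256 = ?", (sha256,)).fetchone()
    return row is not None


def record_upload(db: sqlite3.Connection, sha256: str, filename: str,
                  path: str, size: int) -> int:
    """Always add an upload row; add a blob row only if the content is new."""
    with db:  # commits both inserts as one transaction
        db.execute(
            "INSERT OR IGNORE INTO blobs (sha256, path, size) VALUES (?, ?, ?)",
            (sha256, path, size),
        )
        cur = db.execute(
            "INSERT INTO uploads (sha256, filename) VALUES (?, ?)",
            (sha256, filename),
        )
    return cur.lastrowid


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    h = "ab" * 32  # placeholder hash for the demo
    record_upload(db, h, "cat.jpg", "/srv/img/" + h, 61_440)
    record_upload(db, h, "same_cat_renamed.jpg", "/srv/img/" + h, 61_440)
    print(db.execute("SELECT COUNT(*) FROM uploads").fetchone()[0])  # 2
    print(db.execute("SELECT COUNT(*) FROM blobs").fetchone()[0])    # 1
```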
>>54686013
to give you an example of what i have in mind for the database:
http://hastebin.com/awakekubup.avrasm
(4chan spam filter)
>>54686185
(oh, you should probably use something stronger than md5, like sha256, if your server has a good cpu)
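In Python, swapping the hash is close to a one-line change with hashlib. A small sketch that hashes one ~60 KB buffer with md5, sha256 and blake2b (all in `hashlib.algorithms_guaranteed`) and times each; speed depends heavily on the CPU, so it prints timings rather than naming a winner. `content_hash` is a hypothetical helper. The usual argument against md5 for dedup is that two different files with the same md5 can be crafted on purpose, so one upload could end up served in place of another.

```python
import hashlib
import timeit

ALGORITHMS = ("md5", "sha256", "blake2b")  # all in hashlib.algorithms_guaranteed


def content_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of `data`; changing the dedup hash means changing `algorithm`."""
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    sample = bytes(60 * 1024)  # ~60 KB of zero bytes, about one upload's size
    for algo in ALGORITHMS:
        seconds = timeit.timeit(lambda: content_hash(sample, algo), number=1000)
        print(f"{algo:8} {len(content_hash(sample, algo))} hex chars, "
              f"{seconds * 1000:.1f} ms per 1000 hashes")
```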
>>54686013
Wouldn't altering the metadata change the hash?
>>54685395
This. Use the HDD. It's better to have a slightly slower service that works than a blazing fast one that spits out "no space" errors.
>>54687393
depends on which metadata you're talking about
metadata relating to the filesystem or your web service won't affect the file contents (and therefore, the file hash)
only metadata stored inside the file itself will, such as gif comments, jpeg EXIF, mkv titles, etc.
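A quick way to see the difference: hash a file, change only filesystem metadata (timestamps and filename), hash again, then change a comment stored inside the file and hash a third time. The "JPEG" here is a hand-built byte sequence (start marker, one COM comment segment, end marker), not a displayable image, and the helper names are made up for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def fake_jpeg(comment: bytes) -> bytes:
    """SOI marker, one COM (comment) segment, EOI marker. Not a real
    image, just enough bytes to show where an in-file comment lives."""
    length = (len(comment) + 2).to_bytes(2, "big")  # length field counts itself
    return b"\xff\xd8" + b"\xff\xfe" + length + comment + b"\xff\xd9"


def demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.jpg")
        with open(path, "wb") as f:
            f.write(fake_jpeg(b"original"))
        results["before"] = sha256_of(path)

        # Filesystem metadata only: new timestamps and a new filename.
        os.utime(path, (0, 0))
        renamed = os.path.join(d, "renamed.jpg")
        os.rename(path, renamed)
        results["after_fs_change"] = sha256_of(renamed)

        # In-file metadata: rewrite the comment segment inside the file.
        with open(renamed, "wb") as f:
            f.write(fake_jpeg(b"edited"))
        results["after_comment_change"] = sha256_of(renamed)
    return results


if __name__ == "__main__":
    r = demo()
    print(r["before"] == r["after_fs_change"])       # True
    print(r["before"] == r["after_comment_change"])  # False
```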