4chanarchives logo
The Archive Project - Help Request From the Community

You are currently reading a thread in /h/ - Hentai

Thread replies: 46
Thread images: 8
File: panorama.jpg (467 KB, 1280x960)
Dear /h/!

You might've run across the collections I made a couple of years ago:
http://sukebei.nyaa.eu/?page=torrentinfo&tid=63268

I want to continue this task, bringing a new, comprehensive "ARCHIVE" to the people.

I'm an OCD collector of English scanlated hentai, and my collection strove to be not only comprehensive but also well organized and user-friendly (through the inclusion of ample metadata).

With the recent Wani debacle I thought I'd once again take up the mantle of collecting and archiving... however, I've come to the conclusion that the scope of the task has simply grown beyond what a single man can achieve, hence I turn to you for support.

As I'll outline below, the scope of the task is so big that by the time I'm done with a fraction of what needs done, I'll be behind schedule on other parts... it's like trying to paint the Golden Gate Bridge by yourself.

My old collection, released in 2011, had about 6500 titles that were collected over two years. I've once again tried to catch up with what scanlators have released since then, and it's been a mixed bag:

I have ~2000 titles already downloaded, sorted, named, ready to go.

Got ~3800 bookmarks that are live and need to be downloaded/cataloged.
Got ~1400 bookmarks that are dead, but the sites are alive.
Got ~600 bookmarks, where the translator/scanlator site itself is dead.

When I finally tallied the numbers, I realized I need help, a "posse" that would help download, sort and catalog stuff and upload it to some central repository (maybe MEGA?) for fast processing and iterative release.

The question is, would you guys be interested in something like this? Do you have any advice or requests of your own?
>>
>>3933220
>As I'll outline below the scope of the task is so big that by the time I'm done with a fraction of what needs done, I'll be behind schedule on other parts...
What are your difficulties?
What is the degree of automation? What parts are automated, what do you do manually? Maybe you just need someone writing some scripts for you...
Or do you have bandwidth problems?
>but the sites are alive
Do you simply download images from manga sites? Do you have other sources? Do you value quality, or is every quality level fine? (I'm asking in case the site serves only downscaled images. Also, an archive meant for the indefinite future should always keep the best-quality files; see the Internet Archive.)
Maybe you should contact some site owners. If you can convince them, maybe they would give you some kind of direct access. (This would also be better for the host, since you would waste less of their traffic.) (I mean an online manga reading site, not a file host.)
I wish you the best of luck with your project and the treatment.
>>
>>3933259
My problem is related to my inability to automate the tasks:
Instead of aggregation sites like FAKKU, sadpanda, pururin or nhentai, I typically check the blogs & websites of the translators themselves. Why? Because I want to get the highest quality available, and aggregation sites may resize or watermark the files.

Translators use filehosting services like MEGA, etc. where bulk downloading is not possible.

Here is a pastebin of live bookmarks of stuff that needs to be downloaded & sorted:
http://pastebin.com/83e0J2xN

My workflow is this:

1. Download & extract files.
2. Assemble tankoubons and serial works (e.g. continuous stories released in magazines) into single folders.
3a. Name everything with this scheme: "franchise {author} - title [translator]"
3b. If missing, research the appropriate metadata using image search and sources like doujinshi.org.
4. Zip the folders using a batch script.
5. Export the file list to a text file.
6. Convert the list to CSV using some batch/VBScript scripts.
7. Insert this into an Excel spreadsheet.*
8. Check for duplicates using some horribly convoluted Excel expressions.
9. For tanks & stories, look for individual chapters previously added as mag-scans --> create a list of these.
10. Once finalized, assign new ID numbers to files added to the collection.
11. Write new batch/bash scripts to facilitate migration from old versions of the collection:
11a. Move removed files to a specific folder (using the list written above).
11b. Update renamed files. (You *always* find mistakes in your old work.)
12. Double-check everything, create the torrent, upload stuff to a seedbox and release the torrent, with the appropriate scripts *also* available on file hosts besides the torrent root itself.

*Actually a proper DB could be better, but I couldn't be arsed to convert this to tables.
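A minimal Python sketch of what that "proper DB" could look like for steps 3a and 5-8, assuming every archive follows the 3a naming scheme (with the franchise part optional) and ends in .zip. The table layout, function names and sample file names are hypothetical. The duplicate check simply treats the same author + title (ignoring case) as a duplicate candidate, so it also flags alternate translations for manual review.

```python
import csv
import io
import re
import sqlite3

# Hypothetical parser for the step-3a scheme "franchise {author} - title [translator]".
# The franchise part is optional (original works have none); the last [...] is the translator.
NAME_RE = re.compile(
    r"^(?P<franchise>.*?)\s*\{(?P<author>[^}]+)\}\s*-\s*(?P<title>.+?)\s*\[(?P<translator>[^\]]+)\]$"
)


def build_catalog(filenames):
    """Load archive names into an in-memory SQLite table; return (db, names that don't parse)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE titles (
               id INTEGER PRIMARY KEY,
               franchise TEXT,
               author TEXT NOT NULL,
               title TEXT NOT NULL,
               translator TEXT NOT NULL,
               filename TEXT NOT NULL UNIQUE)"""
    )
    rejects = []
    for name in filenames:
        stem = name[:-4] if name.lower().endswith(".zip") else name
        m = NAME_RE.match(stem)
        if m is None:
            rejects.append(name)  # doesn't follow the scheme; fix by hand
            continue
        db.execute(
            "INSERT INTO titles (franchise, author, title, translator, filename) "
            "VALUES (?, ?, ?, ?, ?)",
            (m["franchise"] or None, m["author"], m["title"], m["translator"], name),
        )
    db.commit()
    return db, rejects


def duplicate_pairs(db):
    """Step 8 without Excel: pairs of files with the same author and title.

    Note: stock SQLite lower() only folds ASCII letters.
    """
    return db.execute(
        """SELECT a.filename, b.filename
           FROM titles a JOIN titles b
             ON lower(a.author) = lower(b.author)
            AND lower(a.title) = lower(b.title)
            AND a.id < b.id
           ORDER BY a.id, b.id"""
    ).fetchall()


def export_csv(db):
    """Steps 5-6: dump the parsed catalog as CSV text (e.g. for the release's file list)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "franchise", "author", "title", "translator", "filename"])
    writer.writerows(
        db.execute("SELECT id, franchise, author, title, translator, filename FROM titles ORDER BY id")
    )
    return buf.getvalue()
```

Names that fail to parse come back in the reject list instead of silently polluting the table, which doubles as a check on step 3a itself.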
>>
>>3933711
So I need help to:
a) Download stuff.
b) Research stuff. (Check metadata for new releases).
c) Organize stuff. (Check for duplicates, or presence of individual chapters from tanks in the collection as outlined in my previous post).
d) Assemble the release. (Some automation scripts to create the release scripts could help with this... heck, if this was the only hurdle I could just write a program for this in Java or C#).
e) Host & distribute the release. (The collection is starting to go above the 200 GB mark, so I'll have to switch to a different seedbox subscription, maybe switch to a server of my own).
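For d) and steps 10-11, a small Python sketch that generates the migration script, assuming each release's catalog is kept as an ID -> filename mapping and that IDs stay stable between releases. The function names and the `_removed` folder name are hypothetical, and it emits a POSIX sh script, so Windows users would need a batch equivalent.

```python
import shlex


def assign_ids(existing, new_files):
    """Step 10: give each new file the next free ID after the highest existing one.

    `existing` maps ID -> filename; IDs are assumed never to be reused.
    """
    catalog = dict(existing)
    next_id = max(catalog, default=0) + 1
    for name in sorted(new_files):  # sorted only to make the numbering reproducible
        catalog[next_id] = name
        next_id += 1
    return catalog


def migration_script(old, new, removed_dir="_removed"):
    """Step 11: emit a POSIX sh script that updates a copy of the old release in place.

    IDs missing from `new` are moved to `removed_dir` (11a); IDs whose
    filename changed are renamed (11b). Paths are relative to the collection root.
    """
    lines = ["#!/bin/sh", "set -e", f"mkdir -p {shlex.quote(removed_dir)}"]
    for file_id in sorted(old):
        old_name = old[file_id]
        if file_id not in new:
            lines.append(f"mv -- {shlex.quote(old_name)} {shlex.quote(removed_dir)}/")
        elif new[file_id] != old_name:
            lines.append(f"mv -- {shlex.quote(old_name)} {shlex.quote(new[file_id])}")
    return "\n".join(lines) + "\n"
```

`shlex.quote` handles the spaces, braces and brackets the naming scheme produces. One caveat: chained renames (A becomes B while the old B becomes C) would need the moves ordered or routed through temporary names, which this sketch does not do.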
>>
>>3933711
I think sadpanda neither resizes (as in, the doggie bag archive lets you download the full-quality upload) nor watermarks shit.
Same with FAKKU, I think, but that only works for the shit they publish.
>>
File: 2011_3.jpg (4 MB, 2830x4045)
>>3933727
>doggie bag archive

See, that's the problem: unless one uploads stuff, you'll be hard-pressed to have sufficient credits to just doggie-bag whatever you want.

*Granted, I've never had the patience to screw around with that browser game that's supposed to give you credits, but IIRC they've restructured how credits work recently. (Also, DL-ing older archives costs *more* now.)

If some *reliable* method to download (doggie-bag) from sadpanda were available, that would make my work a lot easier.
>>
>>3933756
Almost all "worthwhile" works have torrents on sadpanda.
>>
>>3933756
UPDATE: Having read the ehwiki, it seems like the easiest way would be to run an H@H client... Since I'd rather not have my desktop run 24/7 (it's a beast with lots of peripherals), I'm trying to figure out how to do this on my OpenWrt router.
>>
File: 2301004214abcade340407.jpg (19 KB, 160x147)
>>3933798
Further update: ...darn, the damn thing uses Java. Things get super complicated if I want to run this on a device with low computing power.

Anyway:
>>3933790
For a lot of things, yes. Are there actual seeds, though?
Also, I simply want to archive everything /h/ that gets scanlated, not act as yet another gatekeeper or quality checker on the content.

To sum up my plea:
Even with the ability to DL from something like sadpanda, since that can't be automated, I still NEED YOUR HELP an0n! Drop a line if you're interested in doing your patriotic duty to preserve hentai scanlations.
>>
>>3933220
After creating your archive, did you not really download much afterwards?

A lot of your links are several years old, some before your previous torrent.

I've always done something similar, downloading from the translator, but I tend to delete 90% of what I download.

I didn't do a comprehensive check, but there are probably various other groups you are missing as well.

DesuDesu and SaHa have batch torrents, if you haven't seen those. There are a few other batch torrents as well.

Then there are ones like: http://sukebei.nyaa.se/?page=view&tid=1101618

exhentai is several tens of terabytes, but obviously that's not all translated.

If you can convince someone to send you credits/GP on ex, then that could help, but you'd likely need literally millions upon millions for that.

If you are not averse to spending, you could have someone do the bounties for you and acquire tens of millions that way.
http://g.e-hentai.org/bounty.php?act=tops
http://g.e-hentai.org/bounty.php?act=topt
http://g.e-hentai.org/bounty.php?act=tope


You could try asking the blog owners if they would cooperate, assuming they have the files. For the ones still alive, anyway.

Honestly, I think you are too concerned with the extraneous work, which seems like it would take most of the time.

There are usually torrents on ex, and sukebei has a bot that mirrors them as well. Seeders are a problem, though.

http://exhentai.org/torrents.php

But yeah, it would be good for you to be able to automate. Maybe you can get some help on /a/ or /g/? I don't know. You could certainly make a post about it in the sadpanda threads on /a/ if you haven't already.

I think if you actually got everything, it would be a lot larger than you think.

I don't know how much of it you've done previously, but what you list as step 9, trying to find which magazine scans go into which tankoubon, seems like a hell in and of itself.

Most of it does by itself alone.
>>
Here's a very incomplete list of sites I compiled over the years in a lazy, casual manner. I know it's incomplete because you have several on your list that I don't have listed. If anything, this unsorted/messy/marginally useful list is probably demoralizing.

Not all are for English translations, but most are. Various sites are missing because I have no interest in certain fetishes; others I simply don't know about, or I looked at them before I started keeping loose track, in the ~15 years I've been casually downloading hentai.

As stated above, I've deleted 90%+ of what I've downloaded. If I had kept everything, who knows how much I would have. I downloaded tons of h-manga/doujinshi when #lurk was around, and a lot since. I lost everything a few times due to carelessness/lack of backups in the early years, but probably not in the last 10 years.

Some of it I haven't edited in years and some of it is recent; it's very patchwork.

http://pastebin.com/edupDmMq

Good luck, but I think you'll need to be less ambitious, especially if you can't find others to help. Even if you do, it may be difficult to get everything from them, and/or their help may be redundant.
>>
>>3935668
I forgot to mention that some people don't have a website and only upload to exhentai or, in some cases, FAKKU. I've seen a few that do that, but I've been remiss in recording them, which is unfortunate, since those would be among the hardest to find.
>>
>>3935661
Thanks for the feedback, I'll try to answer your questions:
>After creating your archive, did you not really download much afterwards?
As I wrote in the OP, I *did* download some more stuff, namely ~2000 titles.
(I was pretty active until June 2013, when I decided to re-prioritize my life and focus on getting my comp-sci degree.)
>batch torrents...
I know about them, hence why my "need to download" list only has stuff from desudesu *since* his compilation torrent. I haven't done a cross-reference with SaHa's latest torrent, but I did go through it in the past.
>Then there are ones like: http://sukebei.nyaa.se/?page=view&tid=1101618 - Compilation torrent of aggregation sites like hentai rules:
In the past I'd have said these are less than useful; however, with the Wani massacre, chances are these contain stuff that's not hosted anywhere else... sorting through them, though... yup, gonna be "FUN" (in the masturbating-with-a-cheese-grater kind of way).
>I don't how much you've previously done of it, but what you list as 9. ,trying to find which magazine scans go into which tankobon seems like a hell in of itself.
I've done this for all my releases. Frankly, all that needs doing is checking the stuff by the same author. The fact that my collection actually makes this "relatively" easy is why I insist on supplying proper metadata for all titles.
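Since the step-9 check boils down to "same author", the candidate list can be generated mechanically and only the final call left to a human. A minimal Python sketch, assuming each catalog entry already carries an author and a tank/chapter flag (the `Entry` type and function name are hypothetical):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    author: str
    title: str
    is_tank: bool  # True for tankoubon/serial compilations, False for single chapters/mag-scans


def chapter_candidates(entries):
    """Step 9: for each tank, list single chapters by the same author for manual review.

    Authors are compared case-insensitively; tanks with no candidates are skipped.
    """
    chapters_by_author = defaultdict(list)
    for e in entries:
        if not e.is_tank:
            chapters_by_author[e.author.casefold()].append(e.title)

    result = {}
    for e in entries:
        if e.is_tank:
            candidates = sorted(chapters_by_author.get(e.author.casefold(), []))
            if candidates:
                result[(e.author, e.title)] = candidates
    return result
```

Keying the result by (author, title) keeps two tanks with the same title by different artists apart.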

>>3935668
Thanks for the list, I have a feeling I know about most of these sites, but I'll have to check first.
That reminds me, here are some more lists of stuff I track:
Active Hentai Scanlation Sites - http://pastebin.com/955kL8Nv
Inactive/Dead Hentai Scanlation Sites - http://pastebin.com/P0LFZNM5

...and the bookmarked download links:
Live - http://pastebin.com/83e0J2xN
Dead (site live) - http://pastebin.com/NrGsXndJ
Dead (site dead) - http://pastebin.com/MBzdgAvD
>>
>>3933866
>Even with the ability to DL from smg. like sadpanda, since that can't be automated
afaik it can be automated, you just need a shitton of credits
>>
>>3935910
Well, the thing about hentairules is that Oliver commissions stuff as well and puts it on his site. It's mostly aggregated, but yeah.

I don't think it's possible to collect them all, what with dead sites and private commissions, but maybe 75% could be achievable. Not that you would be able to tell, since there certainly isn't a master list.
>>
http://exhentai.org/?f_doujinshi=1&f_manga=1&f_artistcg=0&f_gamecg=0&f_western=0&f_non-h=0&f_imageset=0&f_cosplay=0&f_asianporn=0&f_misc=0&f_search=language%3Aenglish&f_apply=Apply+Filter

I think this should show only English-translated doujinshi and manga.

Showing 1-25 of 30,906

Of course, that excludes the expunged/purged/hidden ones.

If you add "show only galleries with torrents", it drops to 17,539.
>>
>>3935930
Even without downloading, you could scrape the titles and compare them against what you have to see what's missing, both on the site and in your database. Whether that's a worthwhile thing to do, I don't know.
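Assuming the site's titles have already been exported to a plain list (the scraping itself is not shown), the comparison could be sketched in Python like this. The normalization is a deliberately crude, hypothetical choice: NFKC, casefolding, and dropping everything except letters and digits, so "Some Title!" and "some title" count as the same.

```python
import re
import unicodedata


def normalize(title):
    """Crude comparison key: NFKC-normalize, casefold, keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"[\W_]+", "", folded)


def missing_titles(site_titles, my_titles):
    """Titles from the site list with no normalized match locally (first spelling kept)."""
    have = {normalize(t) for t in my_titles}
    missing, seen = [], set()
    for title in site_titles:
        key = normalize(title)
        if key not in have and key not in seen:
            seen.add(key)
            missing.append(title)
    return missing
```

Gallery names on aggregation sites usually carry extra decoration ([artist], (event), [English], {group}), so those parts would need stripping before comparison, otherwise almost nothing will match.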
>>
>>3935931
Hentai sites (images, vndb..) have the best tags I've seen.
However, I've always wondered why none of them (as far as I've seen) actually introduced "unique", identifiable published items. That is, you'd have one item per manga, and all uploads would simply be releases of that same unique item (in different languages, by different translators, etc.). That would make a lot of stuff easier, especially finding translations and getting an overview of different releases.
(Including automatic sorting / identification as in this case.)
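The idea can be sketched as a tiny data model in Python: one `Work` per published item, with every upload attached to it as a `Release`. The class names, fields and the "W-0001" ID format are hypothetical; a real system would also need some authority to assign and look up the IDs, which is the hard part.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    language: str
    translator: str
    filename: str


@dataclass
class Work:
    work_id: str  # hypothetical stable identifier for the published item, DOI-style
    author: str
    original_title: str
    releases: list[Release] = field(default_factory=list)  # every upload is a Release of this Work

    def add_release(self, release):
        self.releases.append(release)

    def in_language(self, language):
        return [r for r in self.releases if r.language == language]


def untranslated(works, language="English"):
    """IDs of works with no release in `language`, i.e. still waiting for a translation."""
    return [w.work_id for w in works if not w.in_language(language)]
```

With uploads grouped this way, "which translations exist" and "what's still missing" become simple lookups instead of title matching.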
>>
>>3935943
I'm uncertain what you mean.
Something like this?
http://en.wikipedia.org/wiki/Digital_object_identifier
>>
I noticed you only have manga-updates links for some. Here are the sites:

http://pastebin.com/rpdtx3vz

If you have any questions about what I posted in this thread, let me know.
sirc's other site had separate content as well. He also uploaded non-h English-published scans, but that isn't relevant.
http://web.archive.org might be useful. The page for 4029 said "invalid group ID", so I had to look it up on the archive. I used an add-on that restores all the removed information so that I didn't have to use the archive for all of them. There might be other removed groups as well.
>>
Oh, there are far too many cases of the same people on different sites. Especially annoying when they don't transfer previous content. But what can be done?
>>
>>3935970
>http://pastebin.com/rpdtx3vz
Thank you for your contribution! This could indeed be useful when combined with the web-archive.

(Though IIRC, I got the content off most of those sites *before* they went belly up)

BTW: Updated my pastebins, since I've managed to fix the biribiri links and added some more active links.
http://pastebin.com/u/Flaser
>>
As for littlewhitebutterflies (LWB), they have a batch torrent, but it's 3 years old. In the comments they say they might get around to a new one, so maybe if you asked. Otherwise, they simply dumped everything from the previous site, which accounts for a lot of your dead links.

SaHa (Saya/Hazard, though I wonder if it's still both of them?) said they won't fix the dead links and that the torrent was the only way they would provide them. Maybe you could get that updated as well.

When I looked in the archive for your previous posts, I saw one where people were posting about how much they had. Some had TBs; it would be good if those people contributed.

lustyladyproject shut down because of wani, as did some others.

For http://xcxscans.wordpress.com/, while their new website doesn't have it, they say all their stuff is at http://exhentai.org/uploader/XCX%2BScans
which is why I had it on the list.

cgrascal / oneofakind has HTTP links for most of them, so that shouldn't be a problem.

There's so many in general.
>>
File: fuuuuuuck.png (407 KB, 800x800)
I started archiving some shit today and fuck. It wasn't smart to put this off for a few years.

I support this cause.
>>
Hey everyone,
I'm looking for an h-comic. From what I was told the artist (whom I also don't know the name) is really into panties.

The h-comic as I recall it is basically a teacher inspecting all the girls in the class to make sure they're all wearing the 'correct' panties. Which are very lewd. One of the students either doesn't have the right ones or any at all and they all start having sex.

Anyone help an anon out?
>>
>>3936134
In the spirit of this thread, I'm going to help you out and assume it's this.

[Asaki Takayuki] Sho-Pan!! [English] {SaHa}
http://exhentai.org/g/793942/cb092561ab/

You're welcome. Consider helping, if possible, I guess.
>>
>>>/a/124511127
>>>/a/124511011
heh
>>
>>3933711
I'm intrigued, OP. I like to collect hentai as well, but I'm more of a raws collector. I helped collect most of the raw Wani magazines when sadpanda started removing them.

I find JDownloader 2 works for bulk downloading MEGA links; you just copy the URL and it'll pick it up.

Sadpanda does resize images when you're browsing the gallery. If you run a script that downloads the gallery image by image, you'll get the resized stuff. A userscript that includes a downloader that seems to work is https://dnsev-h.github.io/eze/ It'll save you the GP costs, but to get the original images you'll need the Source Nexus Hath perk.

The only other surefire way to get the unresized gallery is to download the archives with gp or credits.
>>
>>3936279
Also I have got quite a lot of GP, so if you need help downloading off panda I might be able to help.
>>
>>3936279
Doesn't collecting raws basically mean downloading magazines and whatever comes out of conventions?

Or really, simply downloading these:
http://sukebei.nyaa.se/?page=search&cats=7_0&term=pant.su&sort=5

Terabytes of RAWs right there.
>>
>>3933220
Godspeed.
>>
A week in and looks like nothing happened. Just saw the thread today. A pity. It's a nice idea, but it was dead on arrival at /h/.
>>
>>3938013
I happened to come down with something nasty, hence my inactivity. IMHO the next step should be creating some place (forum/file share) to help collaboration.
>>
>>3938670
check your gmail inbox
>>
>>3938013
Heh, can't you read?
It's a multi-year project.
A week is only 2% of a year.
>>
Saw this in the archives:
https://docs.google.com/spreadsheets/d/1_I-QQ94x3EDwF7NwWa1xcOsp26wXBRxTIxPQTtiGN6A/edit?pli=1#gid=1168538798

Torrents:
http://sukebei.nyaa.se/?page=view&tid=621284&showfiles=1
http://sukebei.nyaa.se/?page=view&tid=622468&showfiles=1

http://sukebei.nyaa.se/?page=view&tid=417985&showfiles=1

http://sukebei.nyaa.se/?page=view&tid=1359260

There are various others.
>>
Translator specific torrents:
http://sukebei.nyaa.se/?page=view&tid=881610
http://sukebei.nyaa.se/?page=view&tid=196240
http://sukebei.nyaa.se/?page=view&tid=1255074
http://sukebei.nyaa.se/?page=view&tid=21779
http://sukebei.nyaa.se/?page=view&tid=507625
http://sukebei.nyaa.se/?page=view&tid=99715
http://sukebei.nyaa.se/?page=view&tid=109690
http://sukebei.nyaa.se/?page=view&tid=363029
http://sukebei.nyaa.se/?page=view&tid=1555534
http://sukebei.nyaa.se/?page=view&tid=86494
http://sukebei.nyaa.se/?page=view&tid=290327
http://sukebei.nyaa.se/?page=view&tid=769502

http://sukebei.nyaa.se/?page=view&tid=1359260
>>
>>3942576
>http://sukebei.nyaa.se/?page=view&tid=621284&showfiles=1
Shit, does anyone have this? I'm trying to download it but it doesn't really have any seeds.
>>
Figured this might be a good place to ask.

There was this (probably vanilla) hentai about a couple who got together through blind dating. The woman was from a family with traditional/conservative roots (wearing a kimono and stuff), and she would do things like try to understand her husband's hentai interests. Did it get purged in the whole Wani thing or something? I can't seem to find it again.
>>
>>3943416
[Tomohiro Natsuki] Your and Mine Chance Love! (COMIC Nakado 2013-09) [English] [Red Leaves]
>>
>>3945062
>COMIC Nakado
>Tomohiro Natsuki
Is this a ruse?
>>
>>3945761
Yes.
>>
>>3945821
4/10. I should have given up looking sooner.
>>
>>3945824
You're welcome.
I give you an 8/10.
Would ruse again.
>>
>>3933220
You are doing the lord's work sir. I salute you. o7
>>
>>3945976
It's the thought that counts, even if nothing happens, right?

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.