
You are currently reading a thread in /qa/ - Question & Answer

Thread replies: 30
Thread images: 1
File: image.jpg (27 KB, 615x615)
Rule 14. The use of scrapers, bots, or other automated posting or downloading scripts is prohibited. Users may also not post from proxies, VPNs, or Tor exit nodes.

So why the fuck are 4chan archive sites allowed to archive? Fgts.jp archives almost everything on 4chan and no one seems to give a shit. Shut that shit down, MODS!

4chan works because images go away, if everything is archived somewhere else then that feature is fucking useless. Stopping crawlers is better for everyone in 4chan who actually contributes.

Why isn't this enforced?

Archives (small list):
http://fgts.jp
https://boards.fireden.net
archive.nyafuu.org
archive.rebeccablacktech.com
desustorage.org
arch.b4k.co
deploy.loveisover.me
http://archive.4plebs.org/_/articles/credits/
>>
Sheeky Forums.
Go for a walk.
>>
>>484709
Yes because we should really go back to the days of "archive this ebin bread only 5 more votes guys"
>>
>>484709
If they know these large-scale scrapers exist, why don't they become more proactive in stopping them?
>>
>>484709
What can be done except suing them for violating the terms of 4chan's API?
>>
Because it turns out to be a useful feature, especially on the image dump boards. It preserves our history and increases the transparency of moderation because we can still see deleted posts.
>>
>>484728
moot was the only one against archives. Moot was the only one who emphasized the ephemerality aspect so much. I doubt hiro understands the problem.

2ch archives have always been a thing.
>>
>4chan works because images go away, if everything is archived somewhere else then that feature is fucking useless
Images on 4chan still go away and archives have been much more unreliable than 4chan, especially after foolz died

>>484728
>why don't they become more proactive in stopping them?
That's the purpose of the 7 days built-in archive, it deflects some traffic from other archivers that would otherwise appear first on a search engine ranking, with more hiroadvertising to increase profit.
>>
>>484739
>>484738
Archiving 4chan is exactly the opposite of 4chan's original goal. The boards are temporary for a reason. 4chan wipes its server data, but if it knows that there are large-scale scrapers out there, how can they harp on that point? Not only that, but scraping takes content, and it also artificially increases site metrics and costs.

Personal scrapers are one thing, but large scale collection definitely goes against the original purpose of 4chan.
>>
>>484748
Fgts.jp does a pretty decent job of archiving /b, which has no archive.
>>
>>484749
I think the temporary nature of imageboards was always more of a pragmatic thing than a cultural decision.
It was more about bandwidth than anything else.

But that part got kinda lost in translation to moot. Hiro, on the other hand, knows how it really is.
>>
>>484731
They could change how images are shown on the page. Instead of a straight href, they could use some server-side control to fetch the image. Don't put the image URL on the page explicitly.
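
A rough sketch of what that could look like, assuming the page carries a short-lived signed token in place of the image URL and a server endpoint resolves it. Everything here (`SECRET_KEY`, `TOKEN_TTL_SECONDS`, `issue_token`, `resolve_token`) is made up for illustration and is not how 4chan actually serves images.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"example-secret"  # hypothetical server-side secret
TOKEN_TTL_SECONDS = 60          # hypothetical token lifetime


def _sign(payload: str) -> str:
    """HMAC-SHA256 signature of the payload, hex encoded."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(image_id: str, now: Optional[float] = None) -> str:
    """Build 'image_id:expiry:signature' to embed in the page instead of a URL."""
    now = time.time() if now is None else now
    expiry = int(now) + TOKEN_TTL_SECONDS
    payload = f"{image_id}:{expiry}"
    return f"{payload}:{_sign(payload)}"


def resolve_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the image id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        image_id, expiry_text, signature = token.rsplit(":", 2)
        expiry = int(expiry_text)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{image_id}:{expiry}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered or forged
    if now > expiry:
        return None  # expired
    return image_id
```

This only stops stale or hotlinked URLs from working. Anything that loads the page can read the token and redeem it before it expires, so on its own it raises the cost of scraping rather than preventing it.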
>>
>>484757
It may have been pragmatic at the time, but it's become a hallmark of 4chan now. Especially /b. Most people who go to /b think what they are posting is temporary, which, as we all know, isn't true. If 4chan would at least make it more challenging to scrape, that would be a step in the right direction.
>>
>>484749
>the original purpose of 4chan.
Sharing weeb images?

>>484753
>Fgts.jp does a pretty decent job of archive /b
Until it doesn't anymore and all the data is lost or deleted for fear of illegal content, and it wouldn't be the first time they drop it.

>>484759
And force the servers to do even more shit

>>484764
Most people are stupid, nothing to see here
>>
>>484731
They could also make the expansion event to view an image only human-clickable. So a server-side call to get the resource along with mouse click enforcement would get them pretty far. Google does this for extension installs from the Chrome Web Store. The chrome.webstore.install API only allows the call to succeed if it's triggered by a legitimate mouse click. You can't use jQuery, you can't dispatch an event, you can't use the document object. I've spent a lot of time trying to bypass that and it's pretty damn hard.
>>
>>484771
The servers can handle it. And you don't have to implement it across all boards. Just the boards they want to prevent scraping on.
>>
Too little too late, my friend. These archive sites are what the people wanted. You can't enforce a democratic issue.

The original 4chanarchive became legacy and died out because it didn't save -enough- threads and the process was too strict. Everyone complained about all of the awesome things they missed out on and the result is the dozens of mirror sites. Now they don't have to worry about missing a single "epic thread" and can stay up to date without having to confront that Anonymous guy who's kind of a prick because he doesn't share the same passion of running tired jokes into the ground that I have.

Now people only come here because of the archive. The archive goes down for a day or two and they lose their damn minds. If they all go down tonight, someone will get pissed off enough to make their own tomorrow. The damage is already done, you can't fix this shit now.
>>
>>484774
But that's not how server-side security works. If you can see it on the page, a bot can scrape it.
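
To make that concrete, here is a minimal sketch using only Python's standard `html.parser`. The sample markup, the `data-src` attribute, and the names `ImageLinkCollector` and `collect_image_links` are assumptions for illustration, not 4chan's real page structure. The point is that whatever the browser receives, a script receives too, whether the image reference sits in an `href` or in some other attribute.

```python
from html.parser import HTMLParser
from typing import List

IMAGE_SUFFIXES = (".jpg", ".png", ".gif")


class ImageLinkCollector(HTMLParser):
    """Collect attribute values that look like image paths."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src", "data-src") and value:
                if value.lower().endswith(IMAGE_SUFFIXES):
                    self.links.append(value)


def collect_image_links(html: str) -> List[str]:
    """Return image-like references in document order."""
    collector = ImageLinkCollector()
    collector.feed(html)
    return collector.links
```

Moving the reference out of the `href` and into another attribute changes which attribute name the script looks for, nothing more.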
>>
>>484816
Not to mention full images are only a minuscule portion of what makes archives.

Most people care about text, then thumbnails, and full images last.
>>
>>484709
If you think about it, every bad change to 4chan has come from outside sites that were too casual for the 4chan experience. From archives to extensions, it is all by and for people too normalfag to browse 4chan.
>>
>>484816
well, to start, you can prevent full size images from being scraped by requiring a function call to show the image instead of just an href. As for the text and the thumbnail, that's a different story.

>>484820
I'd disagree with you there. Full-size images make up the majority of content on image boards, that's why they're called image boards...
>>
>>484799
if this is true, then why does 4chan have an explicit rule stating it doesn't support archiving? Of all the anons on 4chan, a very small minority know that archiving takes place or where to find the archives.

I think archivers are so easy to write for 4chan because of the board structure. If 4chan just changes some of the code-behind to make things harder for scrapers but keep the experience, users won't even know.

Archivers defeat the purpose of 4chan and they need to be dealt with.
>>
>>484749
>original 4chan goal

the third news post has moot saying he wanted to implement an archive

ephemerality in terms of long term access of previous threads helps with absolutely nothing, a pointless tradition.

Whether you have off-site archives or not, people will still forget their history and replace the gaps with invented answers.

We have had almost 100% of it being archived since 2010 and nothing has changed.

>>484771
iirc bibanon has uploaded fgts backups to the Internet Archive
>>
>>485602
The rule was originally coined before archives were widespread.

It was mainly designed to give a legitimate ban reason for spambots before captcha. Downloading scripts and scrapers are in fact virtually impossible to trace and were only added because moot didn't think his rules through. It hasn't been reworked yet because the mods are dumb and "ideologically" opposed to them with no real practical reason.
>>
>>485609
they may be hard to differentiate from normal users, but simple changes can prevent them from being effective. Not using href is a good first step.
>>
>>484749
>artificially increases site metrics and the costs.
>metrics
No, the big scrapers just use the JSON API, which doesn't have any metric tie-ins.
>costs
Scrapers are cheap as fuck in comparison to the userbase.

>The boards are temporary for a reason.
Because 4chan can't afford the space and bandwidth it would require.
>>
>>484782
No, they can't. The servers can't even handle generating pages on the fly, much less routing and checking up to 250 image requests for every page fetched. Chan software is built around the idea that "the servers can't handle doing anything".
>>
>>485778
they may not have any metric tie-ins, but json api calls = server processing = costs.

>>485783
4chan creates dynamic pages all the time, and if the JSON API had some throttling on it, there would be less load on the servers.
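
For what throttling could mean in practice, here is a minimal token-bucket sketch keyed per client. This is a generic technique, not anything 4chan is known to run; `TokenBucket`, `PerClientThrottle`, the rate, the capacity, and the idea of keying on a client id are all assumptions for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity       # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Spend one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientThrottle:
    """Keep one bucket per client id (for example an IP address)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `allow(client_id)` first and answer with an error such as HTTP 429 when it returns False. The check itself is a few arithmetic operations per request, so it costs far less than serving the response it rejects.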
>>
Stop complaining, we already have most boards' threads archived for 7 days after they reach their end. Do you argue this is also against the spirit of 4chan? Well, I don't think it is, and it's a hell of a lot more convenient than it used to be, when threads would 404 immediately after they reached the final page. If they weren't on any archive, or there weren't any archives around back then, and you didn't still have an unrefreshed copy of the page open, you were screwed.
>>
>all this newfag

These have been around since 2008, and in fact Moot himself offered assistance to archivers when the new HTML changes came around in 2012

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.