Rule 14. The use of scrapers, bots, or other automated posti

Thread replies: 30
Thread images: 1

Dwayne Adams 2016-04-01 22:58:15 Post No. 484709

File: image.jpg (27 KB, 615x615)

Rule 14. The use of scrapers, bots, or other automated posting or downloading scripts is prohibited. Users may also not post from proxies, VPNs, or Tor exit nodes.

So why the fuck are 4chan archive sites allowed to archive? Fgts.jp archives almost everything on 4chan and no one seems to give a shit. Shut that shit down, MODS!

4chan works because images go away, if everything is archived somewhere else then that feature is fucking useless. Stopping crawlers is better for everyone in 4chan who actually contributes.

Why isn't this enforced?

Archives (small list):
http://fgts.jp
https://boards.fireden.net
archive.nyafuu.org
archive.rebeccablacktech.com
desustorage.org
arch.b4k.co
deploy.loveisover.me
http://archive.4plebs.org/_/articles/credits/

>>

Julia Mcknight 2016-04-01 23:01:51 Post No.484720

Sheeky Forums.
Go for a walk.

>>

Myron Carson 2016-04-01 23:02:59 Post No.484724

>>484709
Yes because we should really go back to the days of "archive this ebin bread only 5 more votes guys"

>>

Elsie Davenport 2016-04-01 23:07:04 Post No.484728

>>484709
If they know these large-scale scrapers exist, why don't they become more proactive in stopping them?

>>

Walter Henson 2016-04-01 23:07:48 Post No.484731

>>484709
What can be done except suing them for violating the terms of 4chan's API?

>>

Billy Lynn 2016-04-01 23:11:41 Post No.484738

Because it turns out to be a useful feature, especially on the image dump boards. It preserves our history and increases the transparency of moderation because we can still see deleted posts.

>>

Luis Cunningham 2016-04-01 23:12:18 Post No.484739

>>484728
moot was the only one against archives. Moot was the only one who emphasized the ephemerality aspect so much. I doubt hiro understands the problem.

2ch archives have always been a thing.

>>

Hazel Case 2016-04-01 23:17:08 Post No.484748

>4chan works because images go away, if everything is archived somewhere else then that feature is fucking useless
Images on 4chan still go away and archives have been much more unreliable than 4chan, especially after foolz died

>>484728
>why don't they become more proactive in stopping them?
That's the purpose of the built-in 7-day archive: it deflects some traffic from other archivers that would otherwise appear first in search engine rankings, with more hiroadvertising to increase profit.

>>

Guy Ritter 2016-04-01 23:17:18 Post No.484749

>>484739
>>484738
Archiving of 4chan is exactly the opposite of 4chan's original goal. The boards are temporary for a reason. 4chan wipes its server data, but if it knows that there are large-scale scrapers out there, how can they harp on that point? Not only that, but it takes content and it also artificially increases site metrics and the costs.

Personal scrapers are one thing, but large scale collection definitely goes against the original purpose of 4chan.

>>

Bob Cummings 2016-04-01 23:19:32 Post No.484753

>>484748
Fgts.jp does a pretty decent job of archiving /b, which has no archive.

>>

Pauline Cohen 2016-04-01 23:22:12 Post No.484757

>>484749
I think the temporary nature of imageboards was always more of a pragmatic thing than a cultural decision.
It was more about bandwidth than anything else.

But that part got kinda lost in translation to moot. Hiro, on the other hand, knows how it really is.

>>

Brandi Jordan 2016-04-01 23:22:50 Post No.484759

>>484731
They could change how images are shown on the page. Instead of a straight href, they could use some server-side control to fetch the image. Don't put the image URL on the page explicitly.
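
A rough sketch of what that kind of server-side control could look like, assuming the page embeds a short-lived signed token per image instead of a direct URL. Everything here (SECRET_KEY, TOKEN_TTL, make_token, check_token) is a made-up name for illustration, not anything 4chan actually runs. Note that it only raises the cost of scraping: a bot that loads the page gets the same token a browser does.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to clients
TOKEN_TTL = 60             # assumed lifetime of a token, in seconds


def make_token(image_id: str, now: Optional[float] = None) -> str:
    """Build an 'expiry:signature' token that the page embeds instead of an image URL."""
    current = time.time() if now is None else now
    expiry = int(current + TOKEN_TTL)
    msg = f"{image_id}:{expiry}".encode()
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{expiry}:{sig}"


def check_token(image_id: str, token: str, now: Optional[float] = None) -> bool:
    """Server-side check that would run before the image bytes are served."""
    try:
        expiry_str, sig = token.split(":", 1)
        expiry = int(expiry_str)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current > expiry:
        return False  # token has expired
    msg = f"{image_id}:{expiry}".encode()
    expected = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison so the signature cannot be guessed byte by byte.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

The image endpoint would call check_token() and return an error when it fails, so a URL copied off the page stops working after TOKEN_TTL seconds.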

>>

Frankie Hunt 2016-04-01 23:25:17 Post No.484764

>>484757
It may have been pragmatic at the time, but it's become a hallmark of 4chan now. Especially /b. Most people who go to /b think what they are posting is temporary, which, as we all know, isn't true. If 4chan would at least make it more challenging to scrape, that would be a step in the right direction.

>>

Sallie Hoffman 2016-04-01 23:29:40 Post No.484771

>>484749
>the original purpose of 4chan.
Sharing weeb images?

>>484753
>Fgts.jp does a pretty decent job of archiving /b
Until it doesn't anymore and all the data is lost or deleted for fear of illegal content, and it wouldn't be the first time they drop it.

>>484759
And force the servers to do even more shit

>>484764
Most people are stupid, nothing to see here

>>

Melissa Barry 2016-04-01 23:30:37 Post No.484774

>>484731
They could also make the expand event for viewing an image respond only to human clicks. A server-side call to get the resource, along with mouse-click enforcement, would get them pretty far. Google does this for extension installs in the Chrome Web Store. The chrome.webstore.install API only allows the call to succeed if it's triggered by a legitimate mouse click. You can't use jQuery, you can't dispatch an event, you can't use the document object. I've spent a lot of time trying to bypass that and it's pretty damn hard.

>>

Delia Huynh 2016-04-01 23:32:20 Post No.484782

>>484771
The servers can handle it. And you don't have to implement it across all boards. Just the boards they want to prevent scraping on.

>>

Simon Mercer 2016-04-01 23:40:04 Post No.484799

Too little too late, my friend. These archive sites are what the people wanted. You can't enforce a democratic issue.

The original 4chanarchive became legacy and died out because it didn't save -enough- threads and the process was too strict. Everyone complained about all of the awesome things they missed out on and the result is the dozens of mirror sites. Now they don't have to worry about missing a single "epic thread" and can stay up to date without having to confront that Anonymous guy who's kind of a prick because he doesn't share the same passion of running tired jokes into the ground that I have.

Now people only come here because of the archive. The archive goes down for a day or two and they lose their damn minds. If they all go down tonight, someone will get pissed off enough to make their own tomorrow. The damage is already done, you can't fix this shit now.

>>

Jordan Shelton 2016-04-01 23:56:36 Post No.484816

>>484774
But that's not how server-side security works. If you can see it on the page, a bot can scrape it.
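
To illustrate the point, here is a minimal sketch using only the Python standard library on a made-up page fragment. The markup, the showImage() function name, and the i.example.org host are all invented for the example. It shows that moving the URL from an href into a JavaScript call does not hide it from a parser.

```python
import re
from html.parser import HTMLParser

# Made-up markup: one plain link, one image "hidden" behind a JS function call.
SAMPLE_PAGE = """
<a class="fileThumb" href="//i.example.org/b/111.jpg">thumb</a>
<a class="fileThumb" onclick="showImage('//i.example.org/b/222.png')">thumb</a>
"""


class ImageFinder(HTMLParser):
    """Collect image URLs from href attributes and from quoted onclick arguments."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if value is None:
                continue
            if name == "href":
                self.urls.append(value)
            elif name == "onclick":
                # Pull every single-quoted string out of the handler source.
                self.urls.extend(re.findall(r"'([^']+)'", value))


finder = ImageFinder()
finder.feed(SAMPLE_PAGE)
print(finder.urls)
```

Both URLs end up in finder.urls, whichever way the page references them.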

>>

Aurora Delacruz 2016-04-01 23:58:18 Post No.484820

>>484816
Not to mention full images are only a minuscule portion of what makes archives.

Most people care about text, then thumbnails, and full images last.

>>

Lindsey Haynes 2016-04-02 00:21:40 Post No.484837

>>484709
If you think about it, every bad change to 4chan came from outside sites that were too casual for the 4chan experience. From archives to extensions, it is all by and for people too normalfag to browse 4chan.

>>

Lottie Rosales 2016-04-02 02:33:01 Post No.484996

>>484816
well, to start, you can prevent full size images from being scraped by requiring a function call to show the image instead of just an href. As for the text and the thumbnail, that's a different story.

>>484820
I'd disagree with you there. Full-size images make up the majority of the content on image boards, that's why they're called image boards...

>>

Anonymous 2016-04-02 15:28:57 Post No.485602

>>484799
if this is true, then why does 4chan have an explicit rule stating it doesn't support archiving? Of all the anons on 4chan, a very small minority know that archiving takes place or where to find the archives.

I think archivers are so easy to write for 4chan because of the board structure. If 4chan just changes some of the code-behind to make things harder for scrapers but keep the experience, users won't even know.

Archivers defeat the purpose of 4chan and they need to be dealt with.

>>

Anonymous 2016-04-02 15:36:26 Post No.485608

>>484749
>original 4chan goal

the third news post has moot saying he wanted to implement an archive

ephemerality in terms of long term access of previous threads helps with absolutely nothing, a pointless tradition.

Whether you have off-site archives or not, people will still forget their history and replace the gaps with invented answers.

We have had almost 100% of posts being archived since 2010 and nothing has changed.

>>484771
iirc bibanon has uploaded fgts backups to the Internet Archive

>>

Anonymous 2016-04-02 15:40:23 Post No.485609

>>485602
The rule was originally coined before archives were widespread.

It was mainly designed to give a legitimate ban reason for spambots before captcha. Downloading scripts and scrapers are in fact virtually impossible to trace and were only added because moot didn't think his rules through. It hasn't been reworked yet because the mods are dumb and "ideologically" opposed to them with no real practical reason.

>>

Anonymous 2016-04-02 17:54:30 Post No.485677

>>485609
they may be hard to differentiate from normal users, but simple changes can prevent them from being effective. Not using href is a good first step.

>>

Anonymous 2016-04-02 20:18:56 Post No.485778

>>484749
>artificially increases site metrics and the costs.
>metrics
No, the big scrapers just use the JSON api which doesn't have any metric tie ins.
>costs
Scrapers are cheap as fuck in comparison to the userbase.

>The boards are temporary for a reason.
Because 4chan can't afford the space and bandwidth it would require.

>>

Anonymous 2016-04-02 20:25:39 Post No.485783

>>484782
No they can't. The servers can't even handle generating pages on the fly, much less routing and checking up to 250 images for every page fetched. Chan software is built around the idea "the servers can't handle doing anything".

>>

Anonymous 2016-04-02 21:13:19 Post No.485811

>>485778
they may not have any metric tie-ins, but json api calls = server processing = costs.

>>485783
4chan creates dynamic pages all the time, and if the json api had some throttling on it, there would be less load on the servers.
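
For what throttling could mean in practice, here is a minimal token-bucket sketch in plain Python. The class, the per-IP dictionary, and the RATE and CAPACITY numbers are assumptions picked for illustration. Nothing here reflects how 4chan's API is actually limited.

```python
import time
from typing import Dict, Optional

RATE = 1.0      # assumed refill rate: one request per second
CAPACITY = 5.0  # assumed burst size: up to five requests at once


class TokenBucket:
    """Each request costs one token; tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        elapsed = max(0.0, current - self.last)
        # Refill for the time that has passed, without exceeding the cap.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = current
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: Dict[str, TokenBucket] = {}  # one bucket per client address


def allow_request(client_ip: str, now: Optional[float] = None) -> bool:
    """Return True if this client may make another API call right now."""
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(RATE, CAPACITY, now)
    return bucket.allow(now)
```

A request handler would call allow_request() first and answer with an error (HTTP 429, say) when it returns False, so a polling scraper gets slowed down while ordinary browsing stays under the limit.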

>>

Anonymous 2016-04-02 22:02:09 Post No.485836

Stop complaining, we already have most boards' threads archived for 7 days after they reach their end. Do you argue this is also against the spirit of 4chan? Well, I don't think it is, and it's a hell of a lot more convenient than it used to be, when threads would 404 immediately after they reached the final page. If they weren't on any archive, or there weren't any archives around back then, and you didn't have an unrefreshed backup page still open, you were screwed.

>>

Anonymous 2016-04-02 23:22:45 Post No.485889

>all this newfag

These have been around since 2008, and in fact Moot himself offered assistance to archivers when the new HTML changes came around in 2012