[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y ] [Home]
Why is no one man enough to archive /b/?

You are currently reading a thread in /g/ - Technology

Thread replies: 27
Thread images: 2
File: 1441082104170.png (1 MB, 595x1274)
Why is no one man enough to archive /b/?
>>
some researchers at MIT did, but storing the images exposed them to basically all of the legal dangers a researcher that studies online communities could be exposed to.

i forgot how they handled it, but someone later (or maybe them, i'm not sure) just stored the text. even that might be dicey though.
>>
>>54797577
>Comments are owned by the Poster.
You could sue them
>>
Nothing of value was lost
>>
There must be at least one non public archive of b, and it belongs to the FBI, I think.
>>
>>54797611
Don't forget the sheeky forum
>>
>>54797604
Downloading comments from 4chan for the purposes of research falls under "fair use" by *at least* two separate lines of reasoning, to say nothing of the fact that proving a copyright claim to a comment on 4chan would be (nearly) impossible.
>>
>>54797565
Why would you want to archive /b/?

Out of all the boards /b/ makes the least sense to archive.
>>
>>54797565
The most recent one that archived it, fgts.jp, just got shut down by their provider for failing to delete CP in time. Of course the owner archived other boards too, but /b/ is the most prominent when it comes to CP posting.
>>
You'd need a team of moderators big enough and active 24/7 whose job would be to check every single post at a rate of like 20 posts every second.
>>
There's nothing to stop anyone from archiving anything. Just don't make it publicly accessible, or privately accessible to anyone else for that matter.
>>
>>54797700
Why not just autodelete everything deleted by mods and let them do the heavy work?
>>
There have been a couple of archives doing exactly that, but they dropped it after just a few weeks.
>>
>>54797565
> archive /b/
Is this a joke? The board has been only porn dumping for over 5 years.
It is dead and buried, it will never yield anything interesting anymore, courtesy of normies. And even for them I don't get the appeal of the board, is it to feel like pseudo-outcasts or something? I would even delete it if it wasn't such a symbolic board, all the people there would probably quit 4chan forever never to come back, it's not even a containment board.
>>
>>54797731
not the guy you asked, but i know the researchers who were at MIT. the gist is that the mods aren't perfect, and sometimes stuff just disappears for myriad reasons (the most benign being that the poster deleted their post, or the thread fell off the edge of page 15), so a missing post doesn't tell you a mod removed it.
>>
>>54797757
I assume it's just 12 year old boys who aren't smart enough to find /gif/.
>>
There isn't anything *worth* archiving on /b/.
If there ever is, it can be archived on a thread-by-thread basis.
>>
>>54797896
and how do you determine what's worth archiving as the threads are passing by?

pretty much every sane approach to archiving threads depends on keeping track of all of them and then doing *something* to determine what to discard. waiting for cues to tell you to download a thread (being conservative rather than greedy) is the worst way to go for mining.
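
A minimal sketch of the "track everything, discard later" idea, under stated assumptions: `ThreadTracker`, the snapshot shape (thread id mapped to a list of posts), and the `keep` predicate are all hypothetical names for illustration, and nothing here talks to a real board.

```python
class ThreadTracker:
    """Greedy tracker: record every thread, decide what to discard only once it is gone."""

    def __init__(self, keep):
        self.keep = keep      # hypothetical predicate: list of posts -> bool
        self.live = {}        # thread id -> latest list of posts seen
        self.archived = {}    # thread id -> posts retained after the thread died

    def observe(self, snapshot):
        """snapshot maps thread id -> list of posts currently on the board."""
        # Any thread we were tracking that is missing from the snapshot has died.
        for thread_id in list(self.live):
            if thread_id not in snapshot:
                posts = self.live.pop(thread_id)
                if self.keep(posts):
                    self.archived[thread_id] = posts
        # Keep the freshest copy of everything still alive.
        self.live.update(snapshot)


tracker = ThreadTracker(keep=lambda posts: len(posts) >= 3)
tracker.observe({1: ["a", "b", "c"], 2: ["x"]})
tracker.observe({3: ["new"]})   # threads 1 and 2 are gone now
```

The point of the design is that the discard decision sees the whole thread, which a cue-driven (conservative) downloader never gets to do.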
>>
>>54797955
Mining? Pardon me but TOP KEK!

>>54797793
Give us some insight into what your colleagues extracted from that library of Alexandria that /b/ is. That is, if you know.
>>
>>54797955
I'd agree with you for every board apart from /b/.

Scraping /b/ in its entirety for the occasional good thread is like collecting all of the sewer waste to filter for money people accidentally flush down the toilet.
It's generally not worth it.

I liked the chanarchive approach to thread archival, particularly with regards to /b/.
Users could suggest threads for archival, and the threads were then voted on for their merits for a week or so after they 404'd.
At any point in time, the thread could be collected as a zip file for personal archives (say if the thread had value to you but not to the archive in general).
You ended up with an archive where most threads were suggested for archival, but because you could be backtraced if you suggested CP be archived, it was a lot less likely to be archived.
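
A rough sketch of that suggest-then-vote flow, for illustration only. How chanarchive actually stored suggestions and counted votes isn't described here, so `Suggestion`, `threads_to_keep`, the +1/-1 vote values, and the `min_score` cutoff are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    thread_id: int
    suggested_by: str                           # recorded, so a suggestion is traceable
    votes: dict = field(default_factory=dict)   # voter id -> +1 or -1

    def vote(self, voter, value):
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[voter] = value               # one vote per voter; a later vote overwrites

    def score(self):
        return sum(self.votes.values())


def threads_to_keep(suggestions, min_score=1):
    """Once the voting window closes, keep threads whose net score is high enough."""
    return [s.thread_id for s in suggestions if s.score() >= min_score]


s1 = Suggestion(100, "anon1")
s1.vote("a", 1)
s1.vote("b", 1)
s1.vote("a", -1)        # "a" changes their mind, net score drops to 0
s2 = Suggestion(200, "anon2")
s2.vote("a", 1)
kept = threads_to_keep([s1, s2])
```

Recording `suggested_by` is what gives the deterrent effect described above: a suggestion is tied to whoever made it.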

If someone (semi competent, the original had database issues) wanted to make a new version of chanarchive, I'd be all for it.
>>
>>54797955
You could build up a dictionary of shitposting terms, and then analyse each comment for a probability of being a shitpost. If a thread reaches a certain ratio of shitposts/normposts, it gets canned.
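
Something like this, as a minimal sketch. The term list, both thresholds, and the function names are made up for illustration, and the "probability" is just the fraction of words that hit the dictionary, which is a crude stand-in for a real estimate.

```python
import re

# Hypothetical term list; a real one would have to be curated by hand.
SHITPOST_TERMS = {"kek", "bump", "rekt"}


def shitpost_probability(comment, terms=SHITPOST_TERMS):
    """Fraction of words in the comment that appear in the term dictionary."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return hits / len(words)


def should_can_thread(comments, post_threshold=0.5, ratio_threshold=2.0):
    """Can the thread once shitposts outnumber normposts by ratio_threshold."""
    shitposts = sum(1 for c in comments if shitpost_probability(c) >= post_threshold)
    normposts = len(comments) - shitposts
    if normposts == 0:
        return shitposts > 0    # nothing but shitposts (or an empty thread)
    return shitposts / normposts >= ratio_threshold
```

With these assumed thresholds, a thread of `["kek", "bump", "nice writeup on zfs"]` has two shitposts to one normpost and gets canned, while a one-to-one thread survives.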
>>
>>54798025
>chanarchive
Those were the days.
>>
>>54798071
You have multiple problems there.
First you need to define what shitposting really is with quantifiable data.
You would have a lot of false positives/negatives with a ratio and dictionary method.
Even with pattern recognition, shitposting styles evolve so fast that it wouldn't work.
>>
>>54798149
Well, as shitposting evolves, you'd simply need to pick up common key words and phrases shitposting contains that aren't currently flagged as shitposting, and tag them as potential emerging shitposts.

As for what constitutes shitposting, on /b/ I don't think there really are false positives, just false negatives.
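
A minimal sketch of picking up emerging terms, assuming "emerging" means an unflagged word that keeps turning up in comments that already contain a flagged one. `emerging_terms`, `tokenize`, and the `min_count` cutoff are hypothetical.

```python
import re
from collections import Counter


def tokenize(comment):
    """Lowercase the comment and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())


def emerging_terms(comments, known_terms, min_count=2):
    """Return unflagged words that co-occur with flagged ones in at least min_count comments."""
    candidates = Counter()
    for comment in comments:
        words = set(tokenize(comment))      # count each word once per comment
        if words & known_terms:             # comment already looks like a shitpost
            candidates.update(words - known_terms)
    return {w for w, n in candidates.items() if n >= min_count}


new_terms = emerging_terms(
    ["kek wew", "wew kek lad", "nice thread"],
    known_terms={"kek"},
)
```

As written this would also tag ordinary filler words ("the", "a") that happen to sit next to flagged terms, so a real version would need a stop-word list or a comparison against how often each word appears in unflagged comments.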
>>
>>54797565
There is, it's called sheecky forums or something like that. Guy runs a clickbot that pulls posts from various boards, grabs twitter handles and lets it run. He fakes the traffic and makes money. Pretty good idea really.
>>
>>54798176
> As for what constitutes shitposting, on /b/ I don't think there really are false positives, just false negatives.
Haha yes, thank you for the laugh.
>>
Literally give me a week.
Working shit out with a cute loli voice actor, then it's time to activate it.

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.