
You are currently reading a thread in /sci/ - Science & Math

Thread replies: 35
Thread images: 5
File: 1451461440897.jpg (627 KB, 2048x1365)
Ok /sci/. Let's talk about nudity detection algorithms and how they work.

Currently, nudity detection seems to center around:
- Detecting skin by color.
- Attempting to detect faces and their size, then comparing that to the size of the skin blob extracted from the photo. If the skin blob is about the size of a whole human, the person is probably undressed; if it's smaller, the person is probably dressed. If no face or no blob is found, the algorithm moves on.
- If the image comes from a webpage or other context with some text, scanning the text for cues, and also considering the source of the image.
- As a last resort, running a detection pipeline like R-CNN to find genitals. (Why not run it first? Because it's expensive (slow), and because it is not nearly as accurate as needed.)
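
A minimal sketch of the first two steps in the list above (skin by color, then skin area versus face area), in plain Python. The RGB thresholds are one commonly cited rule of thumb and the `ratio_threshold` of 8.0 is a made-up placeholder; neither comes from this thread, and face detection is assumed to be done elsewhere.

```python
def is_skin(r, g, b):
    """Rough RGB skin rule. The thresholds are an assumed heuristic."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_fraction(pixels):
    """pixels: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    total = sum(len(row) for row in pixels)
    skin = sum(1 for row in pixels for (r, g, b) in row if is_skin(r, g, b))
    return skin / total if total else 0.0


def looks_undressed(skin_pixels, face_pixels, ratio_threshold=8.0):
    """Compare skin area to face area.

    Returns None when there is no face or no skin blob, which mirrors
    "the algorithm moves on". ratio_threshold is hypothetical.
    """
    if face_pixels <= 0 or skin_pixels <= 0:
        return None
    return skin_pixels / face_pixels >= ratio_threshold
```

Color-only rules like this fire on sand, wood and warm lighting, which is part of why the later pipeline stages exist.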

This is state-of-the-art nudity detection, /sci/, and it's bullshit.

I think we can do better. This thread is about figuring out how.

In about two weeks' time I'll implement all suggestions in this thread and give you a porn filter (or finder).

Categories for discussion include:
--How to collect training/test data.
--Manual Feature Engineering
--Model selection

If anyone wants to get on board, I may even make a Slack for a temporary /sci/ working group that can grow into other projects in the future if we mesh well.
>>
>>7896603
[MODEL] Initially I want to go with boosted trees because they're the easiest to implement, and because classifiers and branching criteria can be added over time without heavily impacting performance.
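
A rough sketch of what "boosted trees" could look like at their simplest: AdaBoost over depth-1 trees (decision stumps), in plain Python. The feature layout (for example `[skin_fraction, skin_to_face_ratio]`) is an assumption for illustration only. Each round adds one more weak classifier to the model, which is the "add classifiers over time" property mentioned above.

```python
import math


def train_stump(X, y, w):
    """Find the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best


def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] >= t else -pol


def adaboost(X, y, rounds=10):
    """X: list of feature rows, y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clip the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

In practice a library implementation of gradient boosting would be the sensible choice; this only shows the mechanics.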
>>
>>7896603
[Data Collection] I'm thinking about running a 4chan and Reddit downloader on the exclusively SFW sections of 4chan and Reddit to produce a control category, then running it on the exclusively NSFW sections.

Gifs and webms can either be disregarded or completely decomposed into separate frames and added to our dataset.
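
A small sketch of labeling by source, assuming the downloader yields `(source, filename)` pairs. The source names below are placeholders, not real board or subreddit names, and the 80/20 split is an arbitrary choice. Hashing the filename keeps the train/test assignment stable across repeated crawls.

```python
import hashlib

# Placeholder source names; substitute whatever the downloader reports.
SFW_SOURCES = {"sfw_section_a", "sfw_section_b"}
NSFW_SOURCES = {"nsfw_section_a", "nsfw_section_b"}


def label_for(source):
    """0 = control (SFW), 1 = NSFW, None = mixed or unknown source."""
    if source in SFW_SOURCES:
        return 0
    if source in NSFW_SOURCES:
        return 1
    return None


def split_for(filename, test_percent=20):
    """Deterministic train/test assignment from a hash of the filename."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return "test" if int(digest, 16) % 100 < test_percent else "train"


def build_manifest(items):
    """items: iterable of (source, filename). Unknown sources are skipped."""
    manifest = []
    for source, filename in items:
        label = label_for(source)
        if label is None:
            continue
        manifest.append({"file": filename, "label": label,
                         "split": split_for(filename)})
    return manifest
```

Labels derived from the source alone will be noisy, since SFW sections still contain rule-breaking posts and NSFW sections contain plenty of ordinary images.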
>>
File: crouching-female-nude-1959.jpg (230 KB, 799x1000)
>>7896615
deary me, I guess I'll have to corrupt your training data then


WE DON'T WORK FOR FREE!
>>
Post teen ass guyz
>>
>>7896603
>Currently, nudity detection seems to center around:
>- Detecting skin by color.
>- Attempting to detect faces and their size, then comparing that to the size of the skin blob extracted from the photo. If the skin blob is about the size of a whole human, the person is probably undressed; if it's smaller, the person is probably dressed. If no face or no blob is found, the algorithm moves on.
>- If the image comes from a webpage or other context with some text, scanning the text for cues, and also considering the source of the image.
>- As a last resort, running a detection pipeline like R-CNN to find genitals. (Why not run it first? Because it's expensive (slow), and because it is not nearly as accurate as needed.)

None of this is necessary. I wrote a naked boob recognition algorithm using SVMs that performs reasonably well. I just gathered a bunch of pics of naked boobs along with a bunch of pics of other shit, scaled the data, labeled it, and fed it into my standard SVM engine. Works great. I use it to power a web crawler that I can start up and search for random tit pics. It annoys the crap out of my wife.
>>
File: image.jpg (11 KB, 256x144)
>>7896630
>>
>>7896603
>I think we can do better. This thread is about figuring out how.
NOT YOUR PERSONAL ARMY FAGGOT.
>>
>>7896694
>None of this is necessary. I wrote a naked boob recognition algorithm using SVMs that performs reasonably well. I just gathered a bunch of pics of naked boobs along with a bunch of pics of other shit, scaled the data, labeled it, and fed it into my standard SVM engine. Works great. I use it to power a web crawler that I can start up and search for random tit pics. It annoys the crap out of my wife.
Share please?
>>
>>7896676
>>
>>7896733
>>7896630

>acting like they can actually contribute
>>
I got blocked from posting and sending messages on Facebook for 72 hours for messaging a friend a picture of a baby's bloody circumcised penis. The person didn't even see the message. They are watching, I tell you.
>>
>>7896630
I like you.

>>7896603
Just use machine learning on your porn folder, anon. Don't pretend your porn folder isn't the largest data set of penises available.
>>
>>7896736
I'm in the process of wrapping the model in an Android app. I wonder if Google and/or Amazon will let me distribute the app through them? After all, we're only talking about titties here.
>>
PORN WANTS TO BE FREE
>>
>>7897908
How does it react to man nips though?

And what about gyno?

This is important. If you tried to run it on /fit/, it would break down completely.
>>
>>7898006
I admit that I get a lot of false positives with the man boobs, but fewer than you might think. Maybe it is the shape or the nipple placement. Who knows with SVMs... I'm sure I could improve on this if I added more examples of man tits to the training data set, but I'm not nearly so motivated to search for/cull/scale/label them as I am for the female variety. Plus, even running with the GPU-enabled version of libsvm, my training process is getting irritatingly slow.

I haven't tried training on a vag dataset. My wife is barely tolerant when walking into the room and finding me with 4 monitors full of tit pics. Full gyno might be crossing a line in our relationship.
>>
Lol op is a fagget cs. I am a mathematician and could write this in a day. It is below me as a math student tho.

Off to sticking pencils up my ass.
>>
>>7898587
Just keep on shoving pencils up your butt, you cuck
>>
>>7896694
This doesn't make sense. How are you preprocessing your data? What sort of features are you extracting?

Are you just pushing a vector of normalized RGBL at it? Or are you doing something fancier with boosting?
>>
>>7898481
I'm not understanding how SVMs apply to this. Can you go into details?
>>
>>7898672
>Are you just pushing a vector of normalized RGBL at it?

Essentially, this is all that I am doing. Nothing fancy. After cropping/scaling each image in the training dataset to a standard size (the smaller the better, up to a point), I unfold the image data into a rather long vector. You can normalize the pixel values if you want, but I've found it is not necessary in this case because all of the pixel values fall within the same well-defined range. This is done to each image in the training set to form the matrix libsvm needs for training the model. Originally, I did a very time-consuming grid search to determine my SVM parameters. Now, I just reuse those for each new iteration of the training process. Not best practice, perhaps, but all I am doing is looking for tits, not making decisions for the financial industry. Once I have my model, prepping a test image to run through the decision engine is trivial.

Because I have to shrink the training images so much in order to make the model training run in a reasonable amount of time, the accuracy of the resulting model suffers. I get a lot of false positives and false negatives, but I get enough hits to keep me amused. I also tried converting all of the images to grayscale to shrink the dataset, but that didn't work very well. Apparently, the flesh tones are important.
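
A minimal sketch of the shrink-and-unfold step described above, in plain Python. The nearest-neighbour resize and the helper names are assumptions for illustration, not the poster's actual code. The output line follows libsvm's sparse text format (`label index:value ...`, 1-based indices, zeros omitted).

```python
def downscale(pixels, out_w, out_h):
    """Nearest-neighbour resize of a list-of-rows image of (r, g, b) tuples."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


def unfold(pixels):
    """Flatten rows of (r, g, b) tuples into one long vector, keeping color."""
    return [channel for row in pixels for px in row for channel in px]


def to_libsvm_line(label, vector):
    """Format one training example as a libsvm sparse-format text line."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0)
    return f"{label} {feats}".rstrip()
```

A 32x32 color image unfolds to 3 * 32 * 32 = 3072 features, so the vector length grows with the square of the image side.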
>>
>>7898747
I would just love it for your tit-detection algorithm to somehow evolve into a true AI. He's got a bright future ahead of him.
>>
>>7896603
is this a real place jw
>>
>>7898747
I was building a haar cascade for tits. But your approach somehow seems more reliable.
>>
>>7898837
Yeah, I don't know. I've been working with SVMs for over 15 years, so they are my go-to tool for classification. I'm sure there are better approaches, but I know them very well, so that is what I use. The advent of the gpu has made things a lot better for the hobbyist.
>>
>>7896603
Why not just gather a lot of undergrads, show them the pictures you want, and measure their boners?
>>
>>7897764
why the fuck would you send that
>>
>>7898587
This post made me laugh.
>>
>>7899084
You'd have to have some kind of screening test for engineering majors though.

Sounds costly.
>>
>>7896603
That is a fucking disgusting library, whoever designed it should be shot.
>>
>>7900402
I'm really digging the vibe of that libe.
>>
>>7896603
>>7896694
Why not use convolutional neural nets? They are state of the art on almost all image classification and localization tasks, translation invariant, and relatively easy to implement as a black box.

Feature engineering will probably turn out to be a waste of time. It's difficult, takes ages, and convnets that learn features automatically almost always perform better these days.

As for getting labeled data, maybe we could pay some money and use labelme? http://labelme.csail.mit.edu/Release3.0/browserTools/php/mechanical_turk.php
>>
>>7901130
I'm always open to new ideas. It's just that I already have a very high level of understanding of SVMs while I would classify myself at the "dangerous amateur" level with neural nets. What size of a training set are you talking about? I have a labeled dataset that I am working with that has ~5000 observations that are split about 50/50 tits/not tits. Also, what length of a feature vector would be reasonable to work with? With SVMs, you get to a certain point with the input matrix size where the quadratic optimization component gets really bogged down.
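
A back-of-the-envelope sketch of why the quadratic optimization gets bogged down: a kernel SVM's dual problem involves an n x n kernel matrix over the n training examples, and every entry costs work proportional to the feature-vector length. The 8-bytes-per-entry figure is an assumption, and real solvers typically cache only part of the kernel matrix rather than holding all of it.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (assumes 8-byte floats)."""
    return n_samples * n_samples * bytes_per_entry


def feature_matrix_bytes(n_samples, n_features, bytes_per_entry=8):
    """Memory for a dense n x d training matrix (assumes 8-byte floats)."""
    return n_samples * n_features * bytes_per_entry


if __name__ == "__main__":
    n = 5000                      # dataset size mentioned above
    d = 3 * 100 * 100             # hypothetical 100x100 color images
    print(kernel_matrix_bytes(n) / 1e6, "MB for the kernel matrix")
    print(feature_matrix_bytes(n, d) / 1e9, "GB for the training matrix")
```

Doubling the number of examples quadruples the kernel matrix, while doubling the image side quadruples the feature length.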
>>
>>7901182
>I'm always open to new ideas. It's just that I already have a very high level of understanding of SVMs while I would classify myself at the "dangerous amateur" level with neural nets

I have a little bit of experience with convnets (Kaggle competitions) and get decent performance out of them. I imagine good enough to outperform an SVM, just because convnets are so well suited to image classification. What we could do is just train an SVM and a convnet and ensemble them.

>What size of a training set are you talking about? I have a labeled dataset that I am working with that has ~5000 observations that are split about 50/50 tits/not tits.
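
One simple way to ensemble two classifiers, sketched under the assumption that both models output a probability for the positive class. The equal weighting and the 0.5 cutoff are placeholders that would normally be tuned on held-out data.

```python
def ensemble_probability(p_svm, p_convnet, w_svm=0.5):
    """Weighted average of two positive-class probabilities."""
    return w_svm * p_svm + (1.0 - w_svm) * p_convnet


def ensemble_label(p_svm, p_convnet, w_svm=0.5, cutoff=0.5):
    """1 if the blended probability reaches the cutoff, else 0."""
    return 1 if ensemble_probability(p_svm, p_convnet, w_svm) >= cutoff else 0
```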

5000 examples should be plenty, unless maybe there is an absolutely massive level of scale variation for the tits.

>Also, what length of a feature vector would be reasonable to work with? With SVMs, you get to a certain point with the input matrix size where the quadratic optimization component gets really bogged down.

Convnets actually take a 2D or 3D tensor as input, so for a 24x24 color image they take a 3x24x24 array of RGB values. I'm not sure how well convnets scale to larger images. People have definitely trained convnets on 224x224 images, but that would probably take ages to train as I have a shit GPU. I'd prefer around 100x100, but since a critical feature (nipples) is pretty small, it might be necessary to use a higher resolution.

Out of curiosity, how big of a matrix does it take to bog down an SVM?
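
To make the 3x24x24 layout concrete, here is a plain-Python sketch that rearranges a height x width image of `(r, g, b)` tuples into channels-first order and counts the input values at the resolutions mentioned above. The helper names are made up for illustration; a real pipeline would use an array library.

```python
def to_channels_first(pixels):
    """Convert rows of (r, g, b) tuples (H x W x 3) into a 3 x H x W nested list."""
    h, w = len(pixels), len(pixels[0])
    return [[[pixels[y][x][c] for x in range(w)] for y in range(h)]
            for c in range(3)]


def input_size(side, channels=3):
    """Number of input values for a square image with the given side length."""
    return channels * side * side


if __name__ == "__main__":
    for side in (24, 100, 224):
        print(side, input_size(side))
```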

This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.