Computer Science

You are currently reading a thread in /sci/ - Science & Math

Thread replies: 7
Thread images: 1
File: 2016-01-12-185924_752x799_scrot.png (107 KB, 752x799)
How does one get a good understanding of preprocessing data before starting to think about neural network architecture, etc?

Is there a checklist or something? I guess there's imputation if needed, converting categorical features to numerical ones, then... I look for correlations (correlation matrix) and maybe mutual information (to catch non-linear dependencies), but what else? I don't know, is there a complete guide for this?
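For reference, the steps listed above can be sketched in plain Python. This is only a rough illustration, not a complete checklist: the helper names (`impute_median`, `one_hot`, `pearson`, `mutual_information`) and the toy data are made up for the example, median imputation and equal-width binning are assumed choices, and the mutual information value is a crude plug-in estimate that depends on the bin count.

```python
import math
from collections import Counter
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def one_hot(labels):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(labels))
    return {c: [1 if x == c else 0 for x in labels] for c in categories}

def pearson(x, y):
    """Pearson correlation coefficient; only captures linear dependence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def bin_equal_width(values, bins=4):
    """Assign each value to one of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=4):
    """Plug-in mutual information estimate (in nats) from binned values."""
    bx, by = bin_equal_width(x, bins), bin_equal_width(y, bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )

# Toy data: y depends on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

print(impute_median([1.0, None, 3.0]))
print(one_hot(["red", "blue", "red"]))
print("pearson:", pearson(x, y))
print("mutual information:", mutual_information(x, y))
```

On the toy data the Pearson coefficient comes out at zero because the relationship is symmetric, while the mutual information estimate is clearly positive, which is the reason for checking both.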

Also, computer science general
>>
>http://blog.kaggle.com/2016/01/04/how-much-did-it-rain-ii-winners-interview-1st-place-pupa-aka-aaron-sim/

>mfw random physics guy jumps into ML and gets #1
>>
>>7779060
fuck NNs, bayesian program learning BTFO deep learning: http://science.sciencemag.org/content/350/6266/1332.full
>>
>>7779090
nice paywall
>>
>If I were to take one point away from this contest, it is that the days of manually constructing features from data are almost over. The machines will win. I experienced this in the Plankton classification contest where the monumental effort that my teammate and I put into extracting image features was eclipsed within minutes by even the shallowest of CNNs.
>>
>>7779060
That basically means you have to learn the field you are trying to do learning on.

>>7779172
People in general don't bother reading it if it's behind a paywall. Also, the machines won't win if you don't have a method of selecting relevant training data. Any machine learning method can fail if you train it on the wrong data. Manually selected features could be used to disqualify the worst training data so it doesn't ruin the network.
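A minimal sketch of that last idea, assuming a hand-picked feature (here just the maximum reading of a sample) and a median/MAD cutoff. Both are arbitrary choices for illustration, and the names `robust_bounds` and `filter_samples` are hypothetical:

```python
from statistics import median

def robust_bounds(values, k=3.0):
    """Return (low, high) as median -/+ k times the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - k * mad, med + k * mad

def filter_samples(samples, feature, k=3.0):
    """Keep only samples whose hand-crafted feature lies within the robust bounds.

    Note: if more than half of the feature values are identical, the MAD is 0
    and everything except that exact value is dropped, so inspect the bounds.
    """
    scores = [feature(s) for s in samples]
    lo, hi = robust_bounds(scores, k)
    return [s for s, f in zip(samples, scores) if lo <= f <= hi]

# Toy training set: each sample is a list of readings; the last one is corrupt.
samples = [[1, 2, 3], [2, 4, 1], [5, 0, 2], [6, 3, 1], [900, 1, 2]]

clean = filter_samples(samples, feature=max)
print(clean)
```

The filtered list is what would then be fed to the network. Which feature and cutoff make sense depends entirely on the domain, which is the point about having to learn the field.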
>>
>>7780343
>not being part of a group that provides access to all papers you want
