How to into big data ?

You are currently reading a thread in /g/ - Technology

Thread replies: 19
Thread images: 1
File: StatisticalPatternRecognition.png (60 KB, 300x239)
How to into big data?
>>
1. Collect lots of data
2. Analyse it
>>
>>51910178
sql
>>
>>51910214
3. Make stupid but pretty graphs nobody will ever read carefully
>>
>>51910178
Download porn
>>
>>51910178
Install Gentoo
>>
1) buy big server.
2) fill with data.
Congratulations, you have achieved your goal.
>>
>>51910214
>>51910263
>lots of small data is one big data
uh no
>>
>>51910299
Uh yes, that's exactly what big data is.
Just lots of data.
The reason it's 'big' data is because the quantity of data is so colossal that analysing it is the challenge.
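
One way to see why quantity alone makes analysis hard: once the data no longer fits in memory, even a mean and variance have to be computed in a single pass. The sketch below uses Welford's online algorithm for that. The names `running_mean_variance` and `readings` are made up for illustration, and the generator only stands in for a source too large to load at once.

```python
def running_mean_variance(stream):
    """Single-pass mean and population variance (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def readings():
    """Hypothetical stand-in for a data source that is never held in memory."""
    for i in range(1, 1_000_001):
        yield float(i % 10)


count, mean, variance = running_mean_variance(readings())
print(count, mean, variance)
```

Only three numbers are kept in memory regardless of how long the stream is, which is the property that matters once the data stops fitting on one machine.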
>>
Learn cluster algorithms
>>
>>51910326
If you can fit it on a single server then it isn't big data.
>>
>>51910531
Then get another server.
>>
>>51910559
You probably need around 100 before you can consider it big data.
>>
>>51910531
Your logic is wrong. You can have 4 TB of data and it can still be considered big data.

Stop thinking you are smart on a topic you are unsure about...
>>
There are lots of open APIs from which you can pull quite large sets of data, for example NASA just opened up a lot of stuff.

I began my adventures in machine learning with the MNIST database. It is a collection of handwritten digits that was made specifically for testing ML algorithms. I suggest you pick a programming language you are decent in, sign up for an ML course on edX/Coursera/whatever, learn the basics, and implement the algorithms you've learnt. Then test those against the MNIST data.

It's pretty straightforward, and you should be able to understand things like linear regression, multiclass classification, the Bayes optimal classifier, k-means / k-medoids clustering, and ROC curves in a month or two, depending on how familiar you are with statistics and math right now.
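
As a rough idea of what "implement the algorithms you've learnt" can look like, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It runs on two made-up 2-D blobs instead of MNIST, and it picks evenly spaced points as starting centroids to keep the example deterministic; real implementations usually use random or k-means++ initialization. All names here are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    # Simplified, deterministic initialization: k evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point gets the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ended up empty.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data: two well-separated blobs standing in for real feature vectors.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(100)]
data = blob_a + blob_b

centroids, labels = kmeans(data, k=2)
print(centroids)
```

For MNIST you would flatten each 28x28 image into a 784-element vector and use k=10, but plain Python gets slow at that size, so a numeric library is the usual choice there.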
>>
>>51910590
The MNIST database is only a few tens of MB. Nowhere near big data.
>>
>>51910178
This m8

Open Source Data Science Masters

http://datasciencemasters.org/
>>
>>51910178
Hadoop
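
Hadoop is built around the MapReduce programming model. The sketch below imitates that model's three phases (map, shuffle, reduce) with a word count in a single Python process. It does not use the Hadoop API, and the function names are made up for illustration; on a real cluster the map and reduce calls would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


documents = [
    "big data is just lots of data",
    "lots of small data is one big data",
]
counts = word_count(documents)
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without the pieces needing to talk to each other.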
>>
>>51910531
This is nonsense
