[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y ] [Home]
4chanarchives logo
database design
Images are sometimes not shown due to bandwidth/network limitations. Refreshing the page usually helps.

You are currently reading a thread in /g/ - Technology

Thread replies: 22
Thread images: 1
File: muh soggy knees.webm (1 MB, 640x360) Image search: [Google]
muh soggy knees.webm
1 MB, 640x360
I'm working on a project that needs to store 121 billion attributes (int) for 33 million accounts - each account having just under 4000 attributes (all have values, most are 0).

What's the best way to store it so that each value can be indexed against the properties of the account?
The optimal solution would obviously be just having a 4000 column table, but none of the engines i tried support that many columns (or that many indexes).
putting the values into their own table with accompanying mask data for indexing causes absurd redundancy (mongo ended up at about 3.5tb for that one table).

what the fuck do i do? my last option was partitioning the very wide table so that each partition holds 50-100 values, but i don't really want to juggle 40 tables when fetching all the values for a single account.
>>
>>53635917
arrays.
you must be new
>>
>>53635917
lmao wtf is that webm
>>
>>53635917
Just make a table with 121 billion fields. Call them attribute1, attribute2, etc.
>>
>>53635917
use mongoDB
it's webscale
>>
>>53635943
what engines support indexing arrays by index, all of them seem to just index the set so it can quickly detect what values are in it, but not which order the rows are in based on the value of that array index.

>>53635975
I'm currently using it, it's not webscale enough, it's choking hard and using up a fuck ton more space than the data it conains, 48 bytes minimum per row is a stupid property.
>>
>>53635917
Use excel
>>
>>53635917
store them in a comma delimited file
>>
>>53635917
install gentoo
>>
InfluxDB?
>>
tell your boss to hire a faggot who doesn't need to ask a phone rumors discussion board on how to do his job.
>>
>>53635972
Weight is too high, and she locks her knees while her legs/feet are too low.
>>
Why the fuck would an account have so many attributes?
>>
>>53635917
you can split db's so that only so many go into one db. like last names a-g go into one and so on.
>>
>>53636919
>implying I'm employed

>>53637102
action statistics mostly, how many time they did x action.

>>53637153
ye that was my last resort, seems like there should be a better way.
>>
>>53635917
Is this webm actually legit? It's just so fucking horrific
>>
>>53635917
an array of arrays. to store it just serialize the main array. if you need more space get a bigger hard drive.
>>
>>53637332
it needs to be indexed, currently it just sitting in a gzip'd blob on the account table.
>>
>>53637359
to extract it use: "gunzip -d file.gz"
>>
use .INI files
>>
> 121 billion attributes
Cucker-berg is that you?
>>
>>53635917
Women should not be allowed to lift weights
Thread replies: 22
Thread images: 1

banner
banner
[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y] [Home]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.