
You are currently reading a thread in /g/ - Technology

Thread replies: 54
Thread images: 3
So with the PS4 using GDDR5 as both system memory and VRAM, it got me thinking.

Would it be possible for an APU to have HBM as both system RAM and VRAM?

What would the limitations be? What would the advantages be?

It could be partitioned:
>4GB VRAM
>12GB system RAM

16GB total memory.

Maybe even for gaming laptops.
>>
>>55165514
IIRC AMD is going to do HBM on some APUs eventually as a replacement for L3$/L4$

Imagine 4GB of L3$...
>>
>>55165546
Holy shit
>>
File: 1444505445293.gif (532 KB, 320x240)
>>55165546
>Imagine 4GB of L3$
>>
>>55165514
Yes
Limitations:
you're capped at whatever capacity is on the package
bolting on any extra onboard memory would slow it down
it's exceptionally expensive
CPUs don't like high latencies.
>>
>>55165546
Imagine 4GB of being a poorfag cuck who can't afford a decent/superior Nvidia product

The 1080 is the best GPU ever made, and it would be a mistake not to buy one
>>
>>55165546
>high latency/high bandwidth ram as cache
do not want.jpg
>>
>>55165575
It'd be good for laptops though. You won't need more than 16GB, and the space from the RAM slots could be used to squeeze in a bigger cooler for the APU.

Not sure about latency. How did Sony do it?
>>
>>55165600
>high latency
Current APUs don't even have an L3$. Even if it has higher latency, it'd still be faster to keep decoded uops in DRAM-like memory than to discard them and decode all over again.
>>
>>55165663
>current APUs don't have L3 cache
LOL
damn I like AMD but this is too much.
SDRAM cache still won't work like normal caches do though.
>>
>>55165546
Y tho
>>
>>55165514
Yes, they'll probably move in that direction for their mobile APUs. They'd waste system power having both HBM on package and DDR4 soldered onto the board.

>>55165546
HBM is far, far too slow to act as a CPU cache. That isn't happening.

>>55165663
Literally everything you posted there is wrong. How long it takes to reissue an instruction can be measured in clock cycles; the same is true of how long it takes to access DRAM through the memory controller.
HBM as a last-level cache would be so slow that it would have a negative impact on performance. DRAM and SRAM are in entirely different leagues.
>>
>>55165684
Except people do it all the time. What do you think 1T-SRAM is?
Hint: It's not SRAM.

We're not talking about replacing L1/L2 with DRAM, but rather adding an extra level of DRAM cache (which, btw, has far lower latency than DDR4 system RAM).

>>current APUs don't have L3 cache
>LOL
They actually don't though...
>>
>>55165593
Wow you got me, I'm buying 4 1080s right now.
>>
>>55165719
Look at the performance difference between Intel i7s and Xeons with similar clocks/cores/threads.
More cache == better performance.

The HBM on Fiji didn't do a whole lot for gaming performance because that's limited by the core count/speed above a certain memory throughput. The compute performance of Fiji is insaaaaaaane.
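If you want to see the cache cliff yourself, here's a crude C sketch: walk buffers of growing size in cache-line strides and time each access. The exact sizes and timings depend on your CPU, and the hardware prefetcher will soften the jumps, so treat it as a toy, not a rigorous benchmark.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk a power-of-two sized buffer in 64-byte (cache line) strides and
   report ns per access. When the buffer outgrows a cache level, the
   cost per access jumps. */
static double walk(size_t n) {
    volatile char *buf = malloc(n);
    for (size_t i = 0; i < n; i++) buf[i] = (char)i;   /* warm the pages */
    const size_t iters = 50u * 1000 * 1000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t idx = 0;
    for (size_t i = 0; i < iters; i++) {
        buf[idx]++;
        idx = (idx + 64) & (n - 1);   /* wrap within the buffer */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free((void *)buf);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

int main(void) {
    for (size_t n = (size_t)1 << 14; n <= (size_t)1 << 26; n <<= 2)
        printf("%6zu KB: %.2f ns/access\n", n / 1024, walk(n));
    return 0;
}

On a typical desktop chip you'll see a step up somewhere around the L2 size and another once you fall out of L3 into DRAM, which is exactly the "more cache == better performance" effect.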
>>
>>55165742
>1T-SRAM
basically dram ^:)

>They actually don't though...
no i believe you (i hope you're right) and that is pretty shocking. no wonder APUs are kinda shitty.
>>
File: 12-21-48-75496.png (35 KB, 650x400)
>>55165751
Just looked it up, oh boy.

Stupid question though: if cache is so important, why not add more? Why not L4 and L5 caches?
>>
>>55165751
Is compute performance usually more bandwidth-limited than gaming?

I've done a little GPGPU programming and I know that memory access is the major bottleneck, but I've never done any game programming.
>>
>>55165797
Adding more cache levels means copying data between them.
The less you have to copy between caches, the lower the latency for memory requests, and the less time the CPU spends waiting for the data to arrive. Ideally, you would have a single pool of memory that the CPU accesses directly (which is how simple microcontrollers work).
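To put toy numbers on the hierarchy, the textbook formula is AMAT = hit time + miss rate x miss penalty, applied level by level. Every figure in this sketch is an illustrative guess, not a measurement of any real chip:

#include <stdio.h>

int main(void) {
    /* AMAT = hit time + miss rate * next level's AMAT.
       All numbers below are made-up ballpark assumptions. */
    double dram = 70.0;                 /* ns to DRAM on a full miss         */
    double l3   = 10.0 + 0.50 * dram;   /* 10 ns hit, 50% of lookups miss    */
    double l2   =  3.0 + 0.20 * l3;     /* 3 ns hit, 20% miss down to L3     */
    double l1   =  1.0 + 0.05 * l2;     /* 1 ns hit, 5% miss down to L2      */
    printf("AMAT with an L3:    %.2f ns\n", l1);
    double l2b  =  3.0 + 0.20 * dram;   /* same core, no L3 at all           */
    double l1b  =  1.0 + 0.05 * l2b;
    printf("AMAT without an L3: %.2f ns\n", l1b);
    return 0;
}

Even with generous made-up numbers, the extra level only shaves tenths of a nanosecond off the average access: each additional cache level pays off less than the one before it.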

>>55165815
I'm not sure on the specifics, but gaming is more dependent on ROP and shader speed; it's less about throwing data around and more about doing really fast calculations over small pieces of data.
>>
>>55165864
Then why not a larger l1 cache instead of several smaller l1, l2, and l3 caches?
>>
>>55165881
L1 cache is hella expensive and hard to manufacture without defects. It has to be right next to the core on the die to keep clock speeds high, since the speed of light limits how far you can send data across the chip in a single cycle.
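Back-of-the-envelope version of the speed-of-light point (the 4 GHz clock and 0.5c signal speed are assumptions for illustration):

#include <stdio.h>

int main(void) {
    double c     = 3.0e8;      /* m/s, speed of light in vacuum */
    double clk   = 4.0e9;      /* Hz, assumed core clock        */
    double cycle = 1.0 / clk;  /* 0.25 ns per cycle             */
    /* On-die signals propagate well below c; 0.5c is a rough guess. */
    printf("vacuum: %.1f mm per cycle\n", c * cycle * 1e3);
    printf("on-die: %.1f mm per cycle (at ~0.5c)\n", 0.5 * c * cycle * 1e3);
    return 0;
}

A round trip across a die of a couple of centimetres already eats a meaningful chunk of a cycle before you count any gate delays, which is why L1 hugs the core.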
>>
>>55165881
The larger a cache is, the slower it is. This is why the fastest SRAM arrangements are always the smallest and closest to logic. L1 is about an order of magnitude faster than L2, and that isn't by accident.
The L1 is hit most often, so it needs to be fastest. L2 is still hit often, but larger working sets land there, so you need more of it, even if it's private per core. L3 is larger still, and an order of magnitude slower than L2, but it tends to be shared between cores.
How associative the cache is also has a huge impact on performance.
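For anyone wondering what associativity means mechanically, here's a toy address-to-set mapping in C. The 256 KB / 8-way / 64 B line geometry is an example for illustration, not any specific CPU's layout:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Toy geometry: 256 KB cache, 8-way set associative, 64 B lines.
       That gives 512 sets; a line can live in any of 8 ways of one set. */
    uint64_t addr = 0xdeadbeefULL;
    uint64_t line = 64;
    uint64_t ways = 8;
    uint64_t size = 256 * 1024;
    uint64_t sets = size / (line * ways);     /* 512 */
    uint64_t set  = (addr / line) % sets;
    uint64_t tag  = addr / (line * sets);
    printf("addr 0x%llx -> set %llu of %llu, tag 0x%llx, %llu candidate ways\n",
           (unsigned long long)addr, (unsigned long long)set,
           (unsigned long long)sets, (unsigned long long)tag,
           (unsigned long long)ways);
    return 0;
}

More ways means fewer conflict misses for addresses that collide on the same set, at the cost of checking more tags on every lookup.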

>>55165797
The more stages in a cache hierarchy, the less often the outer ones are hit. It's diminishing returns.
You spend more and more transistors on something that offers you no performance in the end, and it still draws power.

>>55165751
You cannot compare how a GPU uses VRAM bandwidth to how a CPU uses its caches.
Fiji's biggest bottleneck is pixel throughput from the same limited ROP configuration featured in Hawaii.


As always this board is filled with tech illiterate retards who still feel the need to talk out of their asses.
>>
>>55166024
>You cannot compare how a GPU uses VRAM memory bandwidth to how a CPU uses its caches
You can compare it almost directly to how something like L3 cache works, which is very similar to how AMD APU CPU cores utilize DRAM.
>>
>>55166061
No, you can't. VRAM swaps geometry and texture data primarily, not ops.
GPUs have their own L1 and L2 caches. DRAM and SRAM are not used for the same things. Stop trying to participate in a conversation you have no business being in.
>>
>>55165815
It depends. It's just as conceivable that a graphics workload can eat as much bandwidth. It's just that in gaming, most data is forwarded from the command processors into caches and fed into the queues, so most of the bottlenecking happens in texture mapping or rasterization. Those stages are usually much longer, require signaling to be offloaded, and can be more or less serial rather than parallelizable, unlike most GPGPU operations intended for a coprocessor. If you write the code right, cards can usually saturate the compute queue and burn through the entire compute stack faster than you can feed it work. The Fiji cards are in fact ideal for parallel compute-only scenarios because they don't have that many ROPs but have a huge number of shaders that wouldn't be bottlenecked by the command processor if it were fed only a long compute queue.
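One way to quantify "compute-bound vs bandwidth-bound" is arithmetic intensity (FLOPs per byte moved) against a roofline. The peak figures in this sketch are rough Fiji-class numbers from memory, so treat them as assumptions:

#include <stdio.h>

int main(void) {
    double peak_flops = 8.6e12;  /* FLOP/s, rough Fiji-class figure */
    double peak_bw    = 512e9;   /* B/s, rough HBM1 figure          */
    /* A kernel is memory-bound when its FLOPs-per-byte sits below
       the "ridge point" of the roofline. */
    double ridge = peak_flops / peak_bw;
    printf("ridge point: %.1f FLOP/byte\n", ridge);
    /* SAXPY does 2 FLOPs per 12 bytes (read x, read y, write y): */
    double saxpy = 2.0 / 12.0;
    printf("SAXPY: %.2f FLOP/byte -> ~%.0f%% of peak compute\n",
           saxpy, 100.0 * saxpy / ridge);
    return 0;
}

A streaming kernel like SAXPY uses about 1% of the shader throughput before the memory bus saturates; only kernels well above the ridge point keep all those Fiji shaders busy.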
>>
>>55166076
Data is Data.
Can you explain to me why copying ops is different from copying something like texture data?
>>
>>55166098
INSTRUCTIONS ARE EXTREMELY TIMING SENSITIVE YOU TECH ILLITERATE RETARD

Memory controllers can spend over 500 clock cycles processing some blocks, depending on their size. The data moved through VRAM is not timing sensitive. If a CPU core had to wait 100 clock cycles every time an instruction was fetched, it would cripple performance. You'd have Bulldozer performance.
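Rough math on what that kind of stall does to throughput (the rates and widths here are illustrative assumptions, not any real core):

#include <stdio.h>

int main(void) {
    double base_cpi     = 0.25;  /* a 4-wide core retiring at full tilt */
    double stall_rate   = 0.02;  /* fraction of instructions that stall */
    double stall_cycles = 100.0; /* assumed penalty per stall           */
    double cpi = base_cpi + stall_rate * stall_cycles;
    printf("CPI %.2f -> %.2f, i.e. %.0fx slower\n",
           base_cpi, cpi, cpi / base_cpi);
    return 0;
}

Just 2% of instructions eating a 100-cycle fetch is enough to make the core an order of magnitude slower.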
Stop talking out of your ass.
>>
>>55165514

What card is that OP?
>>
>>55166161
>data moved through VRAM is not timing sensitive
That'll be why people don't OC VRAM then.
>>
>>55166172
File name
>>
>>55166172
>HBM die
>2x8-pin connectors
>short board
I wonder what?
>>
>>55166172
Are you kidding?

>>55166179
The amount of bandwidth a GPU needs is directly proportional to the throughput of its ALUs. You overclock VRAM along with the core clock so performance scales as linearly as possible without incurring any bottleneck.

This has absolutely nothing to do with how timing sensitive the workload is. You're still trying to equate DRAM and SRAM. They are not comparable. The data read and written to them is entirely different.

It's about time you stopped posting.
>>
>>55165593
>Nvidia makes x86 CPUs
>>
>>55166282
They tried once.
>>
>>55166304
Really?
>>
>>55166235
So you're saying that slow DRAM makes the GPU wait for data, that slow SRAM makes the CPU wait for data, and that there are no similarities?
Say I have a cache miss in L2 and have to fetch from DRAM. You're telling me that DRAM on a motherboard behind a bunch of controllers is no different from putting that DRAM right next to the CPU?
>>
>>55166331
The interfaces that logic uses to talk to DRAM and SRAM are entirely different.
If you were regularly pulling instructions out of system RAM then you would have a CPU core stalling, doing nothing, for hundreds of clock cycles. A really bad pipeline stall is maybe 20 clocks.

Stop posting. You are too fucking stupid for words. Go to your local shit tier community college and take intro to CS instead of continuing this tirade of ignorance.
>>
>>55166304
What happened? Intel fuckery?
>>
>>55166404
I understand that. What I was saying is that larger low-level caches improve performance because they lower the number of cache misses. When a miss occurs in the highest cache level on the CPU, time is wasted fetching from DRAM, which is where HBM gives an enormous advantage over traditional SODIMMs, the same way it gives an advantage over GDDR.
>>
>>55166467
HBM IS DRAM
The stack is made of DRAM dies on top of a base layer with a little control logic.

It is only marginally faster than GDDR5 in a few metrics because of how it handles signaling; it is still DRAM. It cannot be compared to SRAM, and you would not use it as a CPU cache. The fastest DRAM is still horrendously slow compared to SRAM.
If swapping an instruction out of your last-level cache takes longer than reissuing it, then you are directly causing a performance regression.
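The break-even for that last point, with made-up latencies (both figures are assumptions for illustration, not measurements):

#include <stdio.h>

int main(void) {
    /* Assumed costs: refetching from L2 and decoding again,
       vs. a hit in a hypothetical HBM-based last-level cache. */
    double redecode_ns = 5.0;
    double hbm_llc_ns  = 50.0;
    printf("HBM LLC hit: %.0f ns vs re-decode: %.0f ns\n",
           hbm_llc_ns, redecode_ns);
    printf("net: %.0f ns lost per 'hit' -> the cache makes things worse\n",
           hbm_llc_ns - redecode_ns);
    return 0;
}

If the cache "hit" is slower than simply redoing the work, every hit is a regression; that's the whole argument against DRAM as a last-level cache.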
>>
>>55166710
>and you would not use it as a CPU cache
Intel uses DRAM as L4 cache on Iris Pro CPUs
>>
>>55166710
Do you just hate AMD?
>>
>>55166710
>It is only marginally faster than GDDR5
One stack alone is as fast as a whole GDDR5 bus.
This means four stacks are about 4 times faster than any 2D counterpart.
>>
>>55166732
No, Intel uses bidirectional eDRAM. It is considerably faster than HBM; it just doesn't offer the same bandwidth. It also uses a ton of power. Unsurprisingly, there are trade-offs to everything.

HBM's trade off is that it wouldn't ever be suitable for a last level cache.
http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3

>>55166797
Bandwidth is not access time and latency, dipshit.
>>
>>55166956
>it is faster
>but it has less fastness
Nice.
>>
>>55165514
>ITT: something I thought about when Fury X launched
>>
>>55166956
>No, Intel uses bidirectional eDRAM
That's just signaling; it's still slower than HBM.
It's just on-die DRAM, and stacked DRAM is still faster.
>Bandwidth is not access time and latency, dipshit.
Latency and access time are about the same as HBM's.
>>
>>55169689
Stop mincing words like a fucking moron.
HBM offers more bandwidth; it is not lower latency. eDRAM is not *just* DRAM: it has shorter wire lengths and a lower-latency interface with the logic it's feeding than anything else. The article I linked explicitly states the access time for the L4.
HBM is a full 10ns slower.
>>
>>55169748
What if HBM cache were used differently? As an in-between tier for the CPU and the RAM, to make some things faster?
Not that guy, I'm just a tech illiterate retard who made his way onto /g/, so please no hate.
>>
>>55165546
>>55165574
Isn't small size one reason why caches are fast, the other one being that they are made of SRAM instead of DRAM?
>>
>>55166772
not the same guy but KYS
>>
>>55170107
kys
>>
>>55170552
>>55170107
you guys are misspelling kiss, what's the deal??
>>
>>55170577
kYs
