computer tech seemed to evolve faster and faster from 16 to 32
You are currently reading a thread in /g/ - Technology

Thread replies: 52
Thread images: 2
File: 128-bit.jpg (24 KB, 278x161)
computer tech seemed to evolve faster and faster from 16 to 32 and then to 64 bit, but now we seem to have hit a wall. why is there no talk about 128 bit processors? have we reached the limit? imagine what we could do with 128 bit processors...
>>
128 bit processors could become self-conscious maybe.
>>
>>51580182

Bait? We are far, far away from using the memory bandwidth 64 bit gives us.
>>
>>51580182
I have two questions : Do you know what the number of bits means?
If your answer to the previous question was yes : Why the heck would you want a 128-bit processor?
>>
for what reason tho
>>
>>51580282
having direct access to even more registers would speed things up significantly
>>
>>51580462
>If your answer to the previous question was yes : Why the heck would you want a 128-bit processor?

how else are we gonna truly simulate the universe, anon? we need like 1024bit address space for that!
>>
>>51580182
>imagine what we could do with 128 bit processors. ..
Like what?
>>
>>51580489
32 to 64bit did not create a significant speed up for most applications
>>
>>51580489
the improved instruction sets and chips matter more than the size of the address space. not many apps need to address 6GB of memory contiguously. in some cases on early 64bit systems you could actually get a slowdown on 64bit apps over 32bit apps for some things.

but yeah in general 64bit is gonna be better than 32bit
>>
>>51580489
>having direct access to even more registers would speed things up significantly
1. That has nothing to do with the 'number of bits'
2. More registers == more state to save in a context switch. Go study the SPARC architecture.
>>
>>51580182
Yeah, Itanium was such a huge success.
>>
>>51580521
create space ships and time machines, duh anon
>>
File: 1434050943345.jpg (29 KB, 539x566)
neo-/g/ sure is new
>>
>>51580182
mainstream computing is still nowhere near the memory limits of a 64bit CPU. companies will probably start figuring it out once it's common to see stuff with 192GB of ram.
>>
Because there's neither need nor support for it. If you actually need that much ram you'd just run a dual cpu setup or use parallel computing
>>
>>51580729
>companies will probably start figuring it out once it's common to see stuff with 192GB of ram.
That's still less than 38 bits.
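
A quick sanity check of that figure, as a minimal sketch. The helper name `address_bits_needed` is made up for illustration, and "GB" is assumed to mean 2^30 bytes:

```python
GIB = 2**30  # assuming "GB" here means 2^30 bytes


def address_bits_needed(num_bytes: int) -> int:
    """Smallest number of address bits that can index num_bytes distinct bytes."""
    return (num_bytes - 1).bit_length()


ram_bytes = 192 * GIB
print(address_bits_needed(ram_bytes))  # 38
print(address_bits_needed(4 * GIB))    # 32, the classic 32-bit limit
```

192GB sits between 2^37 and 2^38 bytes, so a 38-bit address covers it with room to spare.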
>>
>>51580182

The RISC-V people have already designed a 128 bit ISA that is a simple extension of their 64 bit ISA. There is a paper floating around about their design decisions, but basically we are not that far away from exascale systems that need that much addressable memory. These are not desktop, but warehouse scale computers.

In Spectrum, I read about a system to allow addressing ram on a datacenter scale for fast storage or computation. Basically the tech does not even have an application for it yet. Fucking insane shit. Anyway, it would be really nice to dereference a 128 bit pointer in your code and get a dword from across the datacenter without you even knowing. Virtualization of virtualization to create massive single system images. That's where my mind went anyway.
>>
>>51580182
It seems that you are a complete retard
>>
>>51582020

Just try to keep in mind that our largest warehouse-size computing systems are at most on the scale of a small mammal's brain. The power that such a computer uses is megawatts, unlike the watts of the rat. Not that we even know how to encode biological style computation on the machines we can build.

We have so far to go. We will certainly need to exploit data-sets that require 128 bits of addressability, probably within our lifetimes.
>>
>>51580182
we already have 1024bit processors, just cluster a few 64bit computers together to share their processing bits.
>>
Intel's new cpus are coming out with an instruction set to accelerate arbitrarily large integers.
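
For anyone wondering what "accelerating arbitrarily large integers" means in practice, here is a minimal sketch of the underlying work: big numbers are stored as a list of 64-bit limbs and added limb by limb with a carry. The function names are made up for illustration, and this models the arithmetic only, not any particular instruction set:

```python
MASK64 = (1 << 64) - 1  # a limb is one 64-bit machine word


def to_limbs(value: int) -> list[int]:
    """Split a non-negative integer into little-endian 64-bit limbs."""
    limbs = []
    while True:
        limbs.append(value & MASK64)
        value >>= 64
        if value == 0:
            return limbs


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Add two limb lists, propagating the carry between 64-bit words."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        s = x + y + carry
        out.append(s & MASK64)  # low 64 bits stay in this limb
        carry = s >> 64         # overflow moves to the next limb
    if carry:
        out.append(carry)
    return out


print(add_limbs([MASK64], [1]))  # [0, 1]
```

The carry chain is the serial bottleneck here, which is why hardware support for add-with-carry matters for this kind of code. None of it requires wider general purpose registers.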
>>
>>51582166

That is not exactly the same thing. Typically the number of bits in the general purpose registers of a processor is considered to be the number of bits in a pointer, and therefore the absolute limit of addressable memory.

Any modern (desktop) cpu core is multi-issue (really hundreds of instructions are in-flight and at various stages of decoding and execution).

This means you have multiple separate independent pipelines for integer or fp or other, each processing say 64 bit chunks at a time.

But this again is not the same as 128, or 256, or 512 bit computing. It is more of just a vector instruction within the 32 or 64 bit computer.
>>
>>51580670
We created space ships in the 1960s
>>
>>51580651
Shhh, stop saying logical things.
>>
>>51580651

>1. That has nothing to do with the 'number of bits'
AArch64 and x86-64 both have twice as many general purpose registers as their respective 32-bit architectures. In the case of x86-64, the 32-bit version was register starved (it only had 8 registers, 2 of which were needed for the stack), so improved performance can often be found in the 64-bit versions of applications, even if it's a bit minute.

>2. More registers == more state to save in a context switch. Go study the SPARC architecture.
Context switching is far less common than register shuffling within a process.
>>
>>51580282

memory bandwidth has little to do with the width of the computer. i.e. you can give a 32 bit machine a 64 bit wide memory bus (or wider).

Effectively the reason is caches. Modern machines don't talk directly to ram, they talk to the ultra fast nearby sram (caches) integrated into the processor die. When a chunk of cache needs to be replaced with new data it is written out to ram as a chunk (if it is dirty), over the much wider memory bus. It is also read/filled as a chunk, streaming a (largeish) block into it, hiding the horrible access latencies of dram. This works because most data accesses are local (high cache hit rates).
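
To put a number on "most data accesses are local", here is a toy direct-mapped cache, as a sketch only. The line size and line count are assumptions picked for illustration, and it models fills only (no dirty write-back, no associativity, no prefetching):

```python
def hit_rate(addresses, line_bytes=64, num_lines=512):
    """Hit rate of a toy direct-mapped cache over a sequence of byte addresses."""
    tags = [None] * num_lines  # which memory line each cache slot holds
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes  # memory line this byte belongs to
        slot = line % num_lines    # direct-mapped: exactly one possible slot
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line      # miss: the whole line is filled as a chunk
        total += 1
    return hits / total if total else 0.0


sequential = range(0, 1 << 20, 8)   # walk 1 MiB in 8-byte steps
strided = range(0, 1 << 26, 4096)   # touch one word every 4 KiB

print(hit_rate(sequential))  # 0.875
print(hit_rate(strided))     # 0.0
```

With 64-byte lines and 8-byte accesses, the sequential walk misses once per line and hits the other seven times. The strided walk never reuses a line, so every access pays the full DRAM latency.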
>>
>>51582766

heh. you missed the point. the registers need to be saved on even a function call. (context switch too, but that was not the point of sparc)
>>
>>51582766
>Context switching is far less common than register shuffling within a process.

Agreed. Personally, I think it's probably time sparc-style register windows made a comeback..
>>
>>51582843

Not necessarily in the case of sparc. For shallow call stacks, the processor's register file could contain all the register windows for the call stack without having to spill any into memory.
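
A rough sketch of that point, counting how often a windowed register file would have to spill to memory for a given call/return pattern. The function name and the default of 8 windows are assumptions for illustration, and real SPARC details (such as a window kept free for trap handling) are ignored:

```python
def count_window_traps(trace, num_windows=8):
    """Count spills (overflow) and fills (underflow) for a call/return trace.

    trace is a string of 'c' (call) and 'r' (return) characters.
    """
    resident = 1  # windows currently in the register file (the current frame)
    spilled = 0   # caller windows that were written out to memory
    spills = fills = 0
    for op in trace:
        if op == "c":
            if resident == num_windows:
                spills += 1    # no free window: oldest one goes to memory
                spilled += 1
            else:
                resident += 1
        elif op == "r":
            if resident > 1:
                resident -= 1  # caller's window is still in the register file
            elif spilled > 0:
                fills += 1     # caller's window must be reloaded from memory
                spilled -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError(f"unknown op {op!r}")
    return spills, fills


print(count_window_traps("ccc" + "rrr"))        # (0, 0)
print(count_window_traps("c" * 10 + "r" * 10))  # (3, 3)
```

As long as the call depth stays below the window count, calls and returns never touch memory. Go deeper and every extra level costs a spill on the way down and a fill on the way back.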
>>
>>51582881

In what realm of modern software do you have shallow call stacks?

But good call, I'm not terribly familiar with sparc; I have read that the hardware implementation required a massive mux around the register bank which limited performance, and its success.
>>
>>51582766
>AArch64 and x86-64 both have twice as many general purpose registers as their respective 32-bit architectures.
Still has nothing to do with them being 64-bit. The extra registers were sorely needed, and the ABI break was a good opportunity to add them.
>Context switching is far less common than register shuffling within a process.
Not in all processes. In your single threaded games, sure. Again, go look at SPARC.
>>
>>51582881
The SPARC architecture was substantially about mitigating the downsides of massive numbers of registers. So, yes, SPARC avoids some of the problems and creates others. The ways they do it are instructive.
>>
>>51580182
>computer tech seemed to evolve faster and faster from 16 to 32 and then to 64 bit
Go read about computer history a bit, you'll understand why then.
>>
>>51582946
>>Context switching is far less common than register shuffling within a process.
>Not in all processes. In your single threaded games, sure. Again, go look at SPARC.
Also, call stack. You have to save the register state somewhere....
>>
>>51582844

I wanna fight with a guy that knows a bit about computer architecture.

I think the next massive advance upon us is going to arrive, like now in the form of the new xeon phi, knights landing. Fuck the implementation, I am referring to the AVX-512 contained within.

Basically, if you look at a GPU, they have a group of threads executing the same instruction, (a warp). What Intel looks to be doing is to have the equivalent of a warp be executed in a single simd instruction.

Basically I argue that AVX-512 (plus high core counts) is going to bring GPU style computing performance into normal programming models.

Anyway, back to my initial point, how the fuck do you intend to store all this extra context in a register window? How do you make this hardware??? So no, good sir, I do not see register windows having a come-back any time soon.
>>
>>51583035

Well, the papers I've read on the subject concluded that link-time register placement can produce similar results to register windows, and that register windows represented a problematic complication..

That being said, this paper iirc was quite old (about the age of the early sparc cpus, naturally). Reasons I think it could do with another look:
- Link time register placement still isn't done that much
- Stuff like dynamically linked libraries make it impractical to do in many cases
- Register windows are probably simpler than register renaming, which wasn't a widespread technique when the big fight over register windows happened.

But yeah, it's unlikely to make a comeback because nobody's making a fight of it in the market with a windowed architecture..

I basically agree with you about the similarity of AVX and GPU workloads (AMD GCN cores in particular work much like a cpu's vector instructions).

Main difference being, I think, the memory subsystem. (Discrete GPUs typically have faster memory, as well as the fact their caches are organised in a "tiled" manner to match the access patterns when working on images).

I'm not clear on your context argument, though. With SMT/hyperthreading, cores need to have circuitry to keep track of which internal registers are being used by which virtual core. This logic probably wouldn't be complicated much by the register windows.
>>
>>51583257
>I'm not clear on your context argument, though. With SMT/hyperthreading, cores need to have circuitry to keep track of which internal registers are being used by which virtual core. This logic probably wouldn't be complicated much by the register windows.

I was referring of the need to build a massive mux around multiple sets of 32 512 bit long registers.

In a massive OoO core like you hint on in your last comments, Jesus, what a clusterfuck. I can't imagine what that would look like.

Mind you, I've only implemented the classic simple mips-like designs, so don't get too worked up.

That being said, of what real utility is a register window in a massive OoO design. I assume that a function call does not even register as significant, just a burst of more writes to an area of memory in cache which may occur in parallel to real computation coded before and across the function call? (assuming a branch prediction hit)
>>
>>51583257
>Main difference being, I think, the memory subsystem. (Discrete GPUs typically have faster memory, as well as the fact their caches are organised in a "tiled" manner to match the access patterns when working on images).

Oh yeah. take a look at what is arriving with knights landing. A 16GB on-package dram buffer (MCDRAM) that can be configured either at a fixed place within the address space or as a last level cache. Embedded dram is coming for all semiconductors, and is going to be huge. As far as stride/access patterns, pfff, on a cache miss if it is not on-chip in L3, odds are it's next door in the on-package dram.
>>
>>51583498
>I was referring of the need to build a massive mux around multiple sets of 32 512 bit long registers.

True, but don't they need to do that anyway due to register renaming?

>That being said, of what real utility is a register window in a massive OoO design. I assume that a function call does not even register as significant, just a burst of more writes to an area of memory in cache which may occur in parallel to real computation coded before and across the function call? (assuming a branch prediction hit)

I guess you're right that the caches could mitigate the need to delay when saving register state on a function call. The caches can take a few clock cycles to access, though, and I guess the order in which architectural registers are used could create false dependencies in the code?

I'm not really an expert on this either desu. Just a hobbyist.
>>
>>51583787
>True, but don't they need to do that anyway due to register renaming?

Gosh, I wouldn't think so. I think OoO is done with maybe 100 internal hidden registers, each of which maps down to a particular hardware register at a particular point in time. I don't think they would ever have to be treated as a unified block, like a register window would have to be.
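
For reference, a bare-bones sketch of the renaming idea: each write to an architectural register gets a fresh physical register, and reads go through a mapping table. Everything here is a simplification for illustration (the function name is made up, the pool of physical registers is unlimited, and nothing is ever freed), not a description of any real core:

```python
def rename(instrs, num_arch=4):
    """Rename architectural registers to physical ones.

    instrs is a list of (dest, src1, src2) architectural register numbers.
    Returns the same instructions expressed in physical register numbers.
    """
    mapping = {r: r for r in range(num_arch)}  # arch reg -> current phys reg
    next_phys = num_arch                       # next unused physical register
    renamed = []
    for dest, src1, src2 in instrs:
        p1, p2 = mapping[src1], mapping[src2]  # read the sources first
        mapping[dest] = next_phys              # every write gets a fresh register
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed


program = [
    (0, 1, 2),  # r0 = r1 op r2
    (3, 0, 0),  # r3 = r0 op r0   (true dependency on the first write)
    (0, 2, 2),  # r0 = r2 op r2   (reuses r0: a false dependency)
]
print(rename(program))  # [(4, 1, 2), (5, 4, 4), (6, 2, 2)]
```

After renaming, the third instruction writes p6 instead of overwriting the register the second one still reads, so the two are free to execute in either order. The lookup is per register, which fits the point that the registers never have to be handled as one unified block.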
>>
>>51583787
>I guess you're right that the caches could mitigate the need to delay when saving register state on a function call. The caches can take a few clock cycles to access, though, and I guess the order in which architectural registers are used could create false dependencies in the code?

I have to guess that across a function call a big OoO core is going to know that a particular register is going to be written to a memory location and used after the branch (passing an argument in a register). A copy of that value is going to be made. The conflict will be resolved by the copy: one can proceed down a memory op pipeline, while the other is free to get modified in parallel.

I've never tried to build anything like this, but there is a reason we have billions of transistors in a die :) (go intel and amd)
>>
>>51583879

Maybe I have the wrong idea about how it's implemented, but isn't the translation between internal and external registers already done by the time the decoded op gets to the scheduler/reservation stations/whatever you want to call them..

Wouldn't that require the reservation station to be able to read any of the internal registers?
>>
>>51583985

I didn't think there is traditionally any implementation of a fixed register file (except incidentally) in modern stuff.

I had the impression that each pipeline stage had a copy of whatever data it needed.

When amd was disclosing its bobcat cores they made a huge deal about the fact they did not pass the entire register contents from pipeline stage to pipeline stage. They made the argument this saved a bunch of power (makes sense).

Anyway, I'm not sure the idea of a single block of registers in the sense of 80's risc, or even older cisc designs exists in practice.

But you are rapidly dragging me out of my knowledge domain. (I am happy to say).
>>
>>51584092

Fair enough - I haven't read anything about an alternative approach to a register file, so I'll have to leave it there.
>>
>>51584201

Fuck yeah, man. Great chat. Have a nice holiday.

Sitting here coding vhdl (east coast), and drinking whiskey.
>>
>>51580182

64 bit was more gimmick than necessity. Everyone but Intel was in a huge hurry to jump into it.

32 bit is still alive for now, and 64 bit is really only beginning to show its true power. Why move onto something "better" when you haven't even begun to push what you're replacing it with to its limits?
>>
>>51584251

Duuuuude. Intel was pushing IA64 pretty hard, which nobody wanted, when AMD came up with AMD-64. Intel copied the specification, which they were allowed to do, and released x64.

AMD-64 was total fucking genius.

Seriously, 32-bit x86 needs to die. AMD and intel need to bury the hatchet and kill off the lesser used parts of x86. Even super deep embedded stuff like the Quark should be this simplified x64.

Why drag 2 compilers and sets of libraries around? At 15nm what exactly is the cost in die area on a tiny core of 64 bit registers?
>>
>>51584522
IA64 wasn't for the low-end market though. They were content to stick with 32 bit for that.
>>
>>51584522
A million years from now when humans are virtual beings living on pure energy, we'll still boot in virtual 8086 mode
>>
>>51580182
>imagine what we could do with 128 bit processors. ..
Not much.

We already have 256bit wide floating point vector arithmetic (AVX) anyway, 64bit integer math and everything else is fine as it is now.
It's going to be quite a while before we will ever need more than 16384 petabytes of RAM.
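
Where the 16384 petabytes comes from, as a quick check (assuming binary units, i.e. a petabyte of 2^50 bytes):

```python
PIB = 2**50  # binary petabyte (pebibyte)
EIB = 2**60  # binary exabyte (exbibyte)

flat_64 = 2**64    # bytes addressable with a flat 64-bit pointer
flat_128 = 2**128  # bytes addressable with a flat 128-bit pointer

print(flat_64 // PIB)       # 16384
print(flat_64 // EIB)       # 16
print(flat_128 // flat_64)  # 18446744073709551616, i.e. another factor of 2^64
```

So a full 64-bit address space is 16 exabytes, and going to 128 bits multiplies that by 2^64 again.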
>>
>>51584997

zomg, but they had designs on the future. If only they could push the entire market into an architecture that AMD was NOT legally licensed to copy, they could make BELLIONS!!!!

Too bad EPIC sucked and the Chinese dev teams could not write a compiler that could extract enough ILP.

This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.