Discussion:
[Simh] UC15 development is done (for now)
Bob Supnik
2018-05-18 18:16:39 UTC
At long last, I've finished testing the PDP15/UC15 combination, and it
works well enough to run a full XVM/DOS-15 sysgen. I've sent an "RC1"
package to Mark P. for trial integration.

The configuration consists of two separate simulators - the PDP15 and a
stripped-down PDP11 called the UC15. It uses the "shared memory"
facility that Mark P. created, both for the memory shared between the
PDP15 and the PDP11 and for the control link state. Getting decent
performance requires multiple cores and tight polling, so this initial
implementation has a number of restrictions:

- Windows and Linux only.
- The host system must have at least two cores or processors.
- The host system needs wall power, because the two simulated processes
run flat out, without idling.
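
For reference, on the Linux side the underlying host mechanism boils
down to POSIX shared memory, roughly as in the sketch below. This is a
hypothetical helper, not Mark P.'s shmem library, which has its own
API and also covers Windows:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Minimal sketch: map a named region that both simulator
       processes can open. The creator passes create = 1; the name
       (e.g., "/uc15") is hypothetical. */
    void *map_shared (const char *name, size_t size, int create)
    {
        int fd = shm_open (name, O_RDWR | (create? O_CREAT: 0), 0600);
        void *p;

        if (fd < 0)
            return NULL;
        if (create && (ftruncate (fd, (off_t) size) < 0)) {
            close (fd);
            return NULL;
            }
        p = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close (fd);                 /* mapping stays valid after close */
        return (p == MAP_FAILED)? NULL: p;
    }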

There are a couple of clear areas for future development:

- Implementation of the 'shmem' simulator library on other host
platforms (Mac, VMS).
- Augmentation of the shared memory capability with some sort of
directed interrupt, so that the two simulator processes can sleep/idle
when nothing is happening. (Specifically, the simulator needs a 'sleep
until <a> or <b>' capability, where the conditions are expiration of the
sleep timer or receipt of a signal from another process; a sketch of
one approach follows this list.)
- Identification of the idle loop on the PDP15 side (the PDP11 sits at a
WAIT instruction).
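
For the directed-interrupt item, one way to get 'sleep until <a> or
<b>' on POSIX hosts is a process-shared semaphore placed in the shared
region: sem_timedwait() returns either when the peer process posts the
semaphore or when the timeout expires. A minimal sketch (hypothetical
helper; Windows would need a different mechanism, such as an event
object):

    #include <errno.h>
    #include <semaphore.h>
    #include <time.h>

    /* Sleep until the peer simulator posts 'wakeup' or 'usec'
       microseconds elapse. 'wakeup' lives in the shared region and is
       initialized once with sem_init (wakeup, 1, 0); the peer wakes
       the sleeper with sem_post (wakeup).
       Returns 1 if woken by the peer, 0 on timeout, -1 on error. */
    int idle_wait (sem_t *wakeup, long usec)
    {
        struct timespec ts;

        clock_gettime (CLOCK_REALTIME, &ts);    /* absolute deadline */
        ts.tv_nsec += (usec % 1000000L) * 1000L;
        ts.tv_sec += usec / 1000000L + ts.tv_nsec / 1000000000L;
        ts.tv_nsec %= 1000000000L;
        if (sem_timedwait (wakeup, &ts) == 0)
            return 1;                   /* signal from the other process */
        return (errno == ETIMEDOUT)? 0: -1;     /* 0 = timer expired */
    }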

Solving the idling problem will help make this configuration 'mobile
friendly,' which it certainly is not at the moment.

When 3.10 is ready to go, I'll post it on the SimH web site, along with
a design paper about the UC15 configuration, and additional instructions
for running XVM/DOS-15 in a UC15 configuration.

/Bob Supnik
Paul Koning
2018-05-25 00:04:23 UTC
Impressive!

I wonder if there might be some inspiration in Tom Hunter's DtCyber emulator. That is also a multi-processor simulation with tightly controlled timing and shared memory. Tom's implementation supports CDC Cyber configurations with 10 or 20 peripheral processors plus one central processor. The central processor is actually not all that time critical, and I have extended his code (in a fork) with dual-CPU support using a separate thread for the other processor. That required no special considerations to get the timing right.

But it turns out that near-lockstep operation of the PPUs is critical. At one point I tried splitting those into separate threads, but deadstart (system boot) then failed miserably. Tom's answer is straightforward: the simulator is single threaded, timeslicing among the individual emulated processors a few cycles at a time. It actually does one PPU cycle for each PPU, then N CPU cycles (for configurable N -- 8 or so is typical, to mimic the real hardware performance ratio). It's likely that it would still work with M > 1 PPU cycles per iteration, but that hasn't been tried as far as I know.
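
In outline, that main loop looks something like the sketch below
(hypothetical names; the real DtCyber code is organized differently):

    #include <stdbool.h>

    #define NUM_PPU 10              /* real configurations have 10 or 20 */
    #define CPU_PER_SLICE 8         /* "N": CPU cycles per full PPU sweep */

    extern void ppu_step (int unit);    /* one PPU minor cycle (stub) */
    extern void cpu_step (void);        /* one CPU cycle (stub) */

    void run_timesliced (volatile bool *stop)
    {
        while (!*stop) {
            int i, n;
            for (i = 0; i < NUM_PPU; i++)   /* one cycle for each PPU, */
                ppu_step (i);               /* in strict round-robin order */
            for (n = 0; n < CPU_PER_SLICE; n++)
                cpu_step ();                /* then N CPU cycles */
            }
    }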

This structure of course means that entry and exit from each processor-cycle emulation is frequent, which puts a premium on low-overhead entry/exit to the CPU cycle action. But it works quite well without requiring multiple processes with tight sync between multiple host CPU cores.

DtCyber doesn't have idling (it isn't part of the hardware architecture) though it's conceivable something could be constructed that would work on the standard Cyber OS. There isn't a whole lot of push for that. I made a stab at it but the initial attempt wasn't successful and I set it aside for lack of strong need.

Anyway... it's open source, and might be worth a look.

paul
Bob Supnik
2018-05-25 00:39:24 UTC
That's how I built the SiCortex simulator - six instances of the MIPS
CPU were executed, round-robin, from within a single thread to ensure
the lock-step behavior that was required.

Tom's implementation accurately represents how the CDC machines were
built - or at least, how the original 6600 was built. There was only one
set of logic for the 10? 12? peripheral processors, and it was
time-sliced in strict round-robin form. One of Chuck Thacker's classic
designs at Xerox operated the same way; the Alto, perhaps?

I looked fairly carefully at a software threaded model but concluded
that the numerous name-space collisions between the PDP15 and PDP11
simulators would make merging them into a single executable too
invasive. With the multicore/multisimulator approach, the changes to
both simulators are very modest.

/Bob
Paul Koning
2018-05-25 00:54:00 UTC
Yes, you might say that the 6600 was the first hyperthreaded machine, 10 hardware threads in one processor. It's a fascinating design especially if you dig deep: the PPU state rotates around a 10-stage shift register (the "barrel"). The general descriptions show it as a storage element with the logic living at one stage, sort of in a "gap" in the circumference. Reality was more complicated; for example, the memory read is issued from a spot 6 stages around the barrel, i.e., 6 minor cycles before the instruction is due to be executed. And there's decoding ahead of execution, not surprisingly.
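
A toy model of that rotation, with hypothetical names and the stage
offsets simplified:

    #define STAGES 10
    #define READ_AHEAD 6        /* access launched 6 minor cycles early */

    typedef struct { unsigned A, P; } PPU_CTX;  /* accumulator, P register */

    extern void issue_read (PPU_CTX *p);    /* start the memory access */
    extern void execute (PPU_CTX *p);       /* execute at the "gap" */

    /* One minor cycle, t: 10 PPU contexts circulate past a single set
       of execution logic, and the memory access for a PPU is launched
       READ_AHEAD stages before that PPU reaches the execution slot. */
    void minor_cycle (PPU_CTX barrel[STAGES], int t)
    {
        issue_read (&barrel[(t + READ_AHEAD) % STAGES]);
        execute (&barrel[t % STAGES]);
    }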

I can imagine that name space collisions would be painful. If the code were C++ and each machine a class, that could be handled nicely. Probably too invasive a change for a not very common scenario. Though there were other multiprocessors around; I have seen some documentation of a PDP-8 front-end for an Electrologica X8 timesharing system. Unfortunately the PDP-8 code was preserved but the X8 code was not.
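
To illustrate in the plain C the simulators are written in: what a
class hierarchy would buy is per-instance state. Names here are
hypothetical, not the simulators' actual variables:

    /* The collisions come from both simulators defining globals like
       PC and M (memory), so they cannot be linked into one image.
       Hoisting the state into a per-machine struct -- what a C++ class
       would give for free -- removes the clash, but every reference in
       both code bases would have to change. */
    typedef struct {
        unsigned int PC;            /* program counter */
        unsigned int *M;            /* main memory image */
        /* ... the rest of the machine state ... */
    } MACHINE;

    extern void pdp15_step (MACHINE *m);    /* each instance now carries */
    extern void pdp11_step (MACHINE *m);    /* its own name space */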

paul