To strive for higher outputs
The race for higher processor clock speeds on the RISC platform is decelerating
as silicon real estate faces the limits of current technology. 2006 will see
the emergence of multithreading on CPUs to shatter the barriers that are slowing
down throughput computing.
by Anil Patrick R
RISC: the name itself evokes feelings of power and blazing performance rates.
For anyone tracking the evolution of these processors, it is evident that the
options for raising performance are narrowing, Moore's law notwithstanding.
This is why industry majors are looking at innovations to bolster RISC server
performance for high-end enterprises, and that is why the focus will shift in
2006 from incorporating more cores per CPU to simultaneously processing more
threads per core.
Moore's Law Isn't Enough
The time has come for approaches that go beyond ramping up clock speed. Unlike
x86, the RISC architecture has never relied wholly on clock speed. One bottleneck
has been memory performance. Of what use are blazing clock speeds if all the
output is just going to wait in a queue while memory plays catch up? This is
why CPU manufacturers decided that it was time to explore what more can be achieved
within the processor while waiting for the slower memory to catch up.
With silicon real estate shrinking, the first approach tried
was to put multiple cores on a single processor. With this approach, performance
improved but it still could not hold a candle to multi-processor machines.
Now the focus has shifted towards running more threads per core to increase
output. It is clear that this approach, if properly implemented, will be quite
advantageous in bulk data crunching environments. Most software programs
perform each process as you request it. Multithreaded software, however, treats
each process as a separate thread. So each request launches a specific thread
that runs at the same time as the other threads in the program. "In situations
like unexpected additional traffic or an increased number of users,
single-threaded processors face problems. This is where multi-threaded
processors will prove advantageous," says B Satyanarayan, CIO and Head, IT,
Dimexon Diamonds.
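The thread-per-request model described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not code from
any vendor: the names handle_request and serve are hypothetical, and the short
sleep merely stands in for time a request spends waiting on memory or I/O.

```python
import threading
import time


def handle_request(request_id, results):
    """Hypothetical request handler: each call runs in its own thread."""
    time.sleep(0.05)  # stand-in for waiting on slow memory or I/O
    results[request_id] = f"done-{request_id}"


def serve(request_ids):
    """Launch one thread per request, then wait for all of them to finish."""
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(rid, results))
        for rid in request_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Eight requests overlap their waiting time instead of queuing up.
    print(len(serve(range(8))), "requests completed")
```

Because the requests wait concurrently, a burst of extra users adds little to
the total elapsed time in this sketch, which is the behaviour that
multi-threaded hardware aims to support at the processor level.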
Apart from the increased performance, multi-threading processors
also offer other advantages such as easier deployment, space/power savings and
cooler operation. "Deployment of these multithreading processors will be
easy. In fact it will be easier than deploying blades which have their own set
of challenges such as single vendor dependencies and power/cooling considerations,"
says Munish Mittal, VP, IT, HDFC Bank.
The Playing Field
Multi-threading as such is not a new approach. It has been around since 1995
on mainframe platforms in one form or another. However, on the RISC server side,
IBM's Power5 processor was the first to emphasise the need for multi-threading
when it was launched in May 2004. As of now, there are two players in the RISC
field (HP PA-RISC being almost out of the race) adopting multi-threading. These
are IBM's Power5/Power5+ and Sun's UltraSPARC T1 Niagara processor, released
in November 2005.
IBM uses Simultaneous Multi-Threading (SMT) technology while
Sun has gone in for Chip Multi-Threading (CMT). The differences are legion.
Both methods have their pros and cons. At this stage, IBM is focusing on the
large enterprise and HPC segments while Sun is focusing on the Web and application
2006 is going to be the year when multi-threading on RISC will be adopted by
large enterprises on a notable scale.
Although it is still premature to predict how the future of multi-threading
will pan out, 2006 is still the year when RISC multi-threading will be tested
to the core. "I personally believe that at present the vendor focus with
multi-threaded processors is to do faster processing at the front end for consolidation,
and reduce complexity of deployment, with better ease of management. This will
be a positive development if it delivers what it promises," says Mittal.
While IBM leads the clock speed race, Sun focusses on the
threads processed per core. These are early days, and it will take a couple
of months before the dust settles and the merits and demerits of each approach
can be decided. However, the clear positioning of both chips ought to save evaluators
some time when making buying selections.
Power Packed Punch
The IBM Power5+ is a migration of the earlier Power5 from 130 nm to a 90 nm
process. Until the entry of Niagara, the Power chips had no competition on the
The Power chips' forte has been their higher clock speeds, leading to their
supremacy in areas where heavy floating-point processing is required, such as
transaction processing and HPC applications. At present, the Power5+ is available
as dual core and quad core processors with clock speeds of 1.5, 1.65, and 1.9 GHz.
In a Power5 or Power5+ chip, each core runs two threads using SMT. With the
SMT approach, instructions from multiple threads can execute at the same
time on whichever execution units in the core are free. For example, consider
a core with several execution units, one of which is idle. That unit picks up
an instruction from the next waiting thread and processes it. This means that
the processor is able to execute more instructions while waiting for memory to
respond. IBM has had the edge on this front due to its faster clock speeds.
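The slot-filling idea behind SMT can be illustrated with a toy model. The
sketch below is an assumption-laden simplification, not a description of the
Power5 pipeline: the function slots_used and the per-cycle demand lists are
hypothetical, and a demand of 0 stands for a cycle in which a thread is stalled
waiting on memory.

```python
def slots_used(threads, width):
    """Count execution-unit slots filled over the run.

    threads: one list per hardware thread, giving how many instructions
             that thread could issue in each cycle (0 = stalled on memory).
    width:   number of execution units available per cycle.
    """
    used = 0
    for demands in zip(*threads):  # one tuple of demands per cycle
        free = width
        for demand in demands:
            issued = min(demand, free)  # a thread takes only what is free
            used += issued
            free -= issued
    return used


# Toy workloads over six cycles on a core with four execution units.
THREAD_A = [2, 0, 0, 3, 1, 0]
THREAD_B = [1, 2, 2, 0, 2, 3]

single = slots_used([THREAD_A], width=4)         # thread A alone fills 6 slots
smt = slots_used([THREAD_A, THREAD_B], width=4)  # both threads fill 16 slots
print(f"single thread: {single}/24 slots, two-way SMT: {smt}/24 slots")
```

In this toy run the second thread fills units that would otherwise sit idle
while the first is stalled, which is the effect the paragraph above describes.
Real cores also share caches and other resources between threads, which this
model ignores.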
According to IBM, its present focus is not on the number of threads that can
be processed per core. "Even when we first came out with a dual core processor,
it was technically possible to run more than two threads. However, there is
no significant performance boost if six threads are run on a single core,"
says Rajeev Sreekantan, Product Manager, pSeries servers, IBM India.
The logic is that throughput does not grow linearly with the thread count;
therefore, there is no need to increase the number of threads. As per IBM, the
performance boost achieved with anything more than two threads per core is not
sufficient to justify the effort. With four threads per chip there is an
effective performance boost of 40 percent depending on the application and the
way it is threaded. "For the next one to two years we don't need to increase
the number of threads per core since performance per processor core is
sufficient today in IBM's microprocessors," says Sreekantan.
As the Niagara Flows
While IBM believes in blazing speeds, Sun's focus is
more on increasing the number of cores per processor and the threads that can
be processed per core. Hence, the Niagara is an eight-core processor that can
run 32 threads (four threads per core) simultaneously, while operating at a
clock speed of 1.2 GHz.
According to Sun, Niagara is not meant to be a fast processor;
the focus is on higher throughput. "A distinction has to be made between
how fast threads are created vis-à-vis how many of them can be processed
at the same time. A higher number of threads being simultaneously processed
means that a higher number of users can be connected at the same time,"
says Anil Valluri, Country Director, Client Solutions, Sun Microsystems. This
is one reason why Sun has targeted the Niagara at the Web/database segments.
"This is not a processor for floating point calculations. It is more for
applications such as OLTP, web caching, e-mail, firewalls, application servers,
and databases," says Valluri.
Instead of competing with IBM in the race for higher clock speeds, Sun decided
to attack memory latencies to improve performance. This was meant to narrow
the gap between the processor gains promised by Moore's law and memory
performance. The Niagara has memory controllers built in on the chip along
with major chip space saving designs.
"We knocked out unwanted elements and processes that are used for the traditional
5 to 10 percent performance boosts that other vendors use. The same real estate
is used for going up to eight cores saving close to 80 percent of it,"
says Valluri.
Two of the most interesting features of the Niagara are its low power consumption
levels and cooler operating temperatures. Niagara runs at 70 W per processor
compared to the Power chips that consume around 350 W and 250 W, a desirable
feature in the data centre environments that the Niagara targets.
Sun hopes to cash in on the multi-threaded nature of Solaris in a big way with
the Niagara. "Existing Solaris software runs on it. Solaris 10 has been
threaded for around six years. Linux has no kernel code for running multiple
threads and Windows has no threading capability," says Valluri. Sun is
also trying to sort out the licensing issues associated with eight cores at
present. For example, Oracle has already announced that every Niagara chip will
be licensed at the rate of two cores.
Multi-threading will eventually change the way servers are sized. Sizing documents
and tools will have to be changed.
CIOs Need Convincing
Although multi-threaded RISC CPUs are going to ship in quantity in 2006, CIOs
have qualms about the maturity of this technology.
Satyanarayan stresses the need for careful evaluation. "Multi-threading
technology is already coming in, but there is a need for careful evaluation
by organisations before selection. The biggest concern is whether the technology
is mature," he says.
Yet another concern is that enterprise software applications will have to be
rewritten to take full advantage of multi-threaded hardware. That said, there
will be performance boosts even if the software hasn't been optimised for
multi-threaded CPUs. "Existing software will gain significant performance
boosts even if it has not been written for optimal operation on a multi-threaded
processor," says Satyanarayan.
There's hope yet, as mission-critical applications are prime targets for
this technology, once it is bulletproof, that is.
"At HDFC Bank we are more interested in applications such as databases,
data warehousing and BI. Applications will benefit once multi-threading processors
move in this direction," says Mittal.