Mar 30, 2009

Who Says Elephants Can't Dance?



I am finally reading "Who Says Elephants Can't Dance?" (Nikkei, 2002, Japanese translation). The book was written by Louis Gerstner, who became IBM CEO in 1993 from outside the IT industry, brought a dramatic turnaround to a then-struggling IBM, and left the company in the spring of 2002.
I intentionally passed over the book while I was an IBM employee, because the subject was too close to home for me to enjoy. Instead of reading it, I always pictured scenes from Dumbo when I saw the title. Now I know it is a rich and excellent book.

According to the book, when the selection board was searching for a new IBM CEO, Scott McNealy, CEO of Sun Microsystems, bluntly told a journalist that whoever took the job, they should pick someone dull.

Anyway, Gerstner revived IBM. On the HPC side, IBM shipped its first scalable parallel system, the SP1, in 1993, lighting the fuse of IBM's coming HPC prosperity under the leadership of Irving Wladawsky-Berger.

In March 2009, just sixteen years later, the WSJ reported a rumor that Sun Microsystems had approached both HP and IBM about an acquisition.

In the book, Gerstner clearly says that he never bought companies that contributed only revenue, not profit. From that, we can imagine a little of how he might view the rumor if he were still CEO.
Now we hold our breath for the decision of Samuel Palmisano, whom Louis Gerstner appointed as his successor (if the WSJ rumor is true).

Mar 22, 2009

RIKEN Symposium “The Third Generation PC Clusters”



A RIKEN Symposium titled “The Third Generation PC Clusters” was held on March 12 at the RIKEN Wako Institute, sponsored by the RIKEN Advanced Center for Computing and Communication and the RIKEN Next-Generation Supercomputer R&D Center.

All the presentations (mostly in Japanese) are now available from the RIKEN symposium program site.

The symposium was interesting to me because RIKEN reviewed its five years of experience with the RIKEN Super Combined Cluster (RSCC), which ranked seventh on the TOP500 list in June 2004, and presented its successor, the RIKEN Integrated Cluster of Clusters (RICC), which is to begin operation in August 2009. The aggregate computing capacity of RICC is estimated at 100 + 106 + 64 = 270 TFLOPS, according to Kurokawa-san's presentation (the sixth presentation).

As for Japan's Next-Generation Supercomputer (NGS), the frame of a six-story building has appeared on Kobe Port Island (it looks like the Blue Waters building, but is taller), and the prototyping and evaluation phase for the NGS system units is scheduled for this year. In line with this schedule, the role of RICC is enhanced to:
1. Provide application development environments for NGS
2. Provide HW/SW testing environments for post-NGS
3. Provide higher performance systems for current users
4. Develop new application areas for RIKEN,
according to Dr. Ryutaro Himeno, director of RIKEN Advanced Center for Computing and Communication.

The afternoon sessions included three GPGPU presentations, covering topics such as performance evaluation with the Himeno benchmark. RICC will attach 100 GPGPUs to a 100-node multi-purpose cluster system.

More details, with many pictures, have been published by a MYCOM reporter on their web site (in Japanese).

Mar 17, 2009

Prof. Genki Yagawa and Dr. Tadashi Watanabe, a world-class HPC researcher and architect respectively, are awarded the Japan Academy Prize for FY2009



The Japan Academy (Nippon Gakushi-in) was established on 15 January 1879, in the Meiji Period, for the purpose of advancing education and science in Japan. It now operates under the auspices of the Ministry of Education, Culture, Sports, Science and Technology.

Last week, the Japan Academy announced the 10 recipients of the Japan Academy Prize for FY2009; among them are Prof. Genki Yagawa and Dr. Tadashi Watanabe, a world-class HPC researcher and architect respectively.

Dr. Genki Yagawa, professor emeritus at the University of Tokyo, executive member of the Science Council of Japan, and Director of the Center for Computational Mechanics Research at Toyo University, is well known for his nuclear engineering research and has recently led Japan's simulation project for multi-scale, multi-physics phenomena.

Dr. Tadashi Watanabe, Project Leader of the Next-Generation Supercomputer R&D Center at RIKEN, is a world-class supercomputer architect whose honors include the Seymour Cray Award.

It must be a landmark event that the Japan Academy has honored researchers in the third mode of science (HPC, i.e., computational science), alongside those in the rich and traditional experimental and theoretical sciences.

Mar 10, 2009

HPC ASIA & APAN 2009 in Taiwan Last Week



Last week, HPC ASIA & APAN 2009 was held on March 2–5 in Kaohsiung, Taiwan. Its presentation materials and proceedings are available for download from the advance program.

It was the 10th HPC Asia event, hosted this time by the National Center for High-Performance Computing (NCHC) in Taiwan. The seventh, HPC Asia 2004, was held in Japan, followed by HPC Asia 2005 in China and HPC Asia 2007 in Korea.

The Taiwanese government is enthusiastic about HPC, and researchers from Taiwan also strike me as very sharp and active; I think of Hsu-san, father of the Deep Blue project, or Chiu-san, who develops Blue Gene.

The first day of HPC ASIA & APAN 2009 was dedicated to tutorials, and the three remaining days to keynotes, presentations, and workshops.

From Japan, AIST, Tokyo Institute of Technology, Keio University, Kyushu University, ISIT, and RIKEN appear in the HPC proceedings.

There were several keynote and invited speakers: the well-known Jack Dongarra; William Kramer, Deputy Director of the Blue Waters project (who recently moved from his post as General Manager at NERSC); Zhiwei Xu of ICT, which developed the Dawning 5000, currently #1 in Asia on the TOP500; and Mark Seager of LLNL, who leads the 20 PFLOPS Sequoia procurement.

Their presentation materials contain nothing particularly new. What I did notice is that Rob Pennington (who visited Tokyo last fall) and Edward Seidel have left the Blue Waters project, while William Kramer and William Gropp have joined.


As an aside on HPC in Asia and the Middle East: TOP500 Asia has been announced. It targets Asia and the Middle East, assembling and maintaining a list of the 500 most powerful computer systems in the region. The list will be compiled twice a year during the GITEX exhibitions in Riyadh and Dubai. So there are now varieties of TOP500 as well.

Mar 6, 2009

New Earth Simulator (ES2) and New Plasma Simulator Enter Operation



According to recent Japanese news releases, two Japanese supercomputer sites have announced the start of operation. One system is the 131 TFLOPS new Earth Simulator (ES2) at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC), and the other is the 77 TFLOPS new Plasma Simulator at the National Institute for Fusion Science (NIFS) in Toki, Japan.

ES2, the successor to the well-known Earth Simulator, is an NEC SX-9/E vector supercomputer, and the new Plasma Simulator is a Hitachi SR16000 model L2, a POWER6-based supercomputer.

Both supercomputers use architectures different from the general-purpose, commodity-based clusters that have rapidly become dominant even in HPC, as the TOP500 supercomputer list shows.

JAMSTEC and NIFS can clearly specify the real requirements for their successor systems from an application point of view, covering not only performance but also memory capacity, memory bandwidth, and so on, and they may also value continuity for their major application codes and programming know-how. Hence, one can imagine that the NEC SX-9/E and the Hitachi SR16000 were evaluated as the best fit for the JAMSTEC and NIFS computational environments, respectively.

The following is a summary comparison of the two systems, based on the specifications listed further below.
(For comparison purposes, some ES2 and SR16000 data are filled in from SX-9 model A and public IBM POWER6 material, respectively, which should be reasonable assumptions. Values marked ** are such complemented values. A short sanity-check script after the two spec lists reproduces the derived figures.)

- Per processor, the SX-9 delivers roughly five times the peak FLOPS of a POWER6 core (not surprising, since one SX-9 CPU includes 8 vector units and a scalar unit).

- The SX-9 provides large and steady memory bandwidth. Its memory transfer per FLOP, 2.5 Byte/FLOP, is very high and stable thanks to the vector architecture (no cache).

- The Byte/FLOP of the POWER6 varies between 0.21 and 4, depending on where the data reside, i.e., in cache or in memory. Hence, cache-aware programming is desirable.

- The SX-9 is expensive and not green (low MFLOPS/W), probably because of the rich circuitry needed to deliver the highest vector performance.

- Peak performance per node is nearly the same for the two simulators, approximately 820 GFLOPS for ES2 and 602 GFLOPS for the Plasma Simulator (SR16000), both below 1 TFLOPS.

- The SR16000 achieves 102 MFLOPS/W energy efficiency, more than three times better than the SX-9. (According to the Green500, the top is about 500 MFLOPS/W for the IBM QS22 using the PowerXCell 8i processor, followed by 372 MFLOPS/W for Blue Gene/P. Even a recent quad-core Xeon server can provide 200+ MFLOPS/W.)

- The SX-9 uses traditional air cooling. In contrast, the SR16000 adopts an efficient direct water cooling system.

The following are the characteristics of ES2 and the new Plasma Simulator.

● ES2 (NEC SX-9/E)
(System)
- 131 TFLOPS vector peak performance, 20 TB memory, Fat-tree Network
- Number of nodes: 160
- Air Cooling
- Peak performance/power: **about 27.3 MFLOPS/W (819.2 GFLOPS / 30 kVA)
- OS: NEC SUPER-UX
- Construction and 6-year lease fee: about 18.9 B yen (about $192M)

(CPU)
- Vector Peak Performance: 102.4 GFLOPS (3.2 GHz Clock)
- 8 Vector units + 1 Scalar unit
- 32-port memory crossbar
- 65 nm CMOS, 11 copper layers

(Node)
- Vector peak performance: 819.2 GFLOPS
- CPU/node: 8
- Memory/node: 128 GB (SMP)
- Memory bandwidth: **2,048 GB/s (8 CPUs)
- Byte/FLOP: **2.5
- Inter-node transfer: 128 GB/s (8 GB/s x 8 x 2)

● New Plasma Simulator (Phase 1: Hitachi SR16000 model L2)

(System)
- 77 TFLOPS peak performance, 16 TB memory, InfiniBand Fat Tree Network
- Number of nodes: 128
- External storage: 0.5 PB
- Direct water cooling
- Peak performance/Power: 102.1 MFLOPS/W
- OS: IBM AIX5L
- Contract price: about 5.4 B yen (about $55M)
(In Phase 2 (October 2012 to March 2015), the system will be upgraded to 315 TFLOPS.)

(CPU chip)
- Chip peak performance: 37.6 GFLOPS (Dual core)
- Dual core POWER6 processor (4.7 GHz clock)
- 32 MB L3 cache
- 8-channel memory controller (DDR2/DDR3)
- 65 nm CMOS, copper + SOI

(Node)
- Peak performance: 601.6 GFLOPS
- CPU core/node: 32
- Memory/node: 128 GB (**32 way cc-NUMA SMP)
- Memory bandwidth: **128~160 GB/s (**4~5 GB/s x 32 cores)
(When data reside in the L2 or L3 cache, the effective bandwidth becomes significantly larger; this behavior differs from that of a vector processor, which shows a steady value.)
- Byte/FLOP: **0.21 (data in memory) ~ 4 (data in L2 cache)
- Inter-node transfer: 32 GB/s (bidirectional)
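
As a quick sanity check, the derived figures quoted in the comparison above can be reproduced from the published specs. The short Python sketch below does so; note that the ** inputs are the complemented values from SX-9 model A and public POWER6 material described earlier, and that 1 kVA is assumed to be roughly 1 kW for the power calculation.

# Reproduce the derived comparison figures from the published node specs.

# ES2 (NEC SX-9/E), per node
sx9_cpu_gflops  = 102.4       # vector peak per CPU
sx9_node_gflops = 819.2       # 8 CPUs per node
sx9_node_bw     = 2048.0      # ** memory bandwidth, GB/s
sx9_node_kva    = 30.0        # node power in kVA (assumed ~kW)

# New Plasma Simulator (Hitachi SR16000, POWER6), per node
p6_core_gflops  = 37.6 / 2    # dual-core chip -> 18.8 GFLOPS per core
sr_node_gflops  = 601.6       # 32 cores per node
sr_node_bw      = 128.0       # ** memory bandwidth, GB/s (lower bound)
sr_mflops_per_w = 102.1       # published system figure

print(f"SX-9 CPU / POWER6 core : {sx9_cpu_gflops / p6_core_gflops:.1f}x")  # ~5.4x
print(f"SX-9 Byte/FLOP         : {sx9_node_bw / sx9_node_gflops:.2f}")     # 2.50
print(f"SR16000 Byte/FLOP      : {sr_node_bw / sr_node_gflops:.2f}")       # ~0.21
es2_eff = sx9_node_gflops / sx9_node_kva  # GFLOPS/kW equals MFLOPS/W
print(f"ES2 MFLOPS/W           : {es2_eff:.1f}")                           # ~27.3
print(f"Efficiency ratio       : {sr_mflops_per_w / es2_eff:.1f}x")        # ~3.7x

Running it confirms the roughly five-times per-core gap, the 2.5 versus 0.21 Byte/FLOP contrast, and the more-than-three-times energy efficiency advantage of the SR16000.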

NCAR's 76.4 TFLOPS Bluefire is an IBM p575 with almost the same peak performance as the new Plasma Simulator.

If we measure the value of supercomputers simply on a performance/price scale, commodity-based clusters may be the favorite. In HPC, however, the right scale necessarily depends on the supercomputer site, and there should be systems of different architectures so that users can choose by their own criteria, such as memory, reliability, power, and space, in addition to peak performance and price.

Based on my hard experience competing against Japanese vector supercomputers at IBM Japan from around 1985 to 1995, I hope the vector supercomputer's advantages will be rediscovered through successive innovative challenges, including, if possible, energy efficiency and price.