Tuesday, November 4, 2008

Computing in Clusters

Computing plays two fundamental roles in bioinformatics and computational biology, according to Royyuru. First, computers participate in data analysis, ranging from accessing high-throughput data and sequencing single nucleotide polymorphisms to analyzing microarrays and proteomics experiments. “The biggest challenge,” says Royyuru, “is reducing the dimensionality of these data so that scientists can understand them.” Doing that requires combining data mining with biological insight. Second, computers can run in silico models that test biological theories. “You can ask questions like: What happens to a cell system when you knock out a certain gene or administer a specific drug?” Royyuru explains. “In silico modeling must capture homeostatic behavior of cells, systems, and organisms. Then, scientists can explore what happens during disease or other perturbations.” Reaching higher levels of modeling demands more computing capability, and all major information technology vendors, including Apple, IBM, Hewlett-Packard, and Sun Microsystems, offer solutions that address this market.


One computing advance relies on tightly coupled clusters of processors. In other words, many processors are connected, with high levels of communication between them, so that they work as a team. Examples include the IBM P series of supercomputers and the Sun Fire 15K server with Sun Cluster software. Tightly coupled clusters work very well for many applications, including simulations of molecular biology, chemical kinetics, and protein folding.


The question for the future is: How many processors can be packed into a reasonable space? Today’s machines use a few thousand processors to churn out a few tens of teraflops (one teraflops is a trillion floating-point operations per second, or flops). “You can do a good amount of science on these,” says Royyuru, “but there is more science to get to that is not possible with these machines.” He thinks it will take petaflops machines (1,000 trillion flops) to simulate the behavior of proteins, for example. To get that kind of power, computer scientists could simply put more boxes in a room or rethink how they put processors together.


Royyuru points out that putting more boxes in a room quickly becomes impractical, because even 2,000 processors in a traditional design occupy the floor space of a basketball court. A petaflops machine built this way would be too large and draw too much power. Instead, IBM looked for new approaches to computer architecture. In project Blue Gene, for example, investigators at IBM and the Lawrence Livermore National Laboratory converged on so-called cellular architecture, which mimics the way biological structures are composed. Blue Gene should deliver 6 teraflops from a single rack of equipment, about one thousand processors. This year, scientists at IBM’s Watson Research Laboratory hope to put together half a rack of Blue Gene chips. “We learn as we go along,” Royyuru says, “and we intend to make this hardware relevant to biological research.”
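
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that performance scales linearly with the number of racks, which real machines do not achieve, and it uses the article's round numbers (6 teraflops and about one thousand processors per rack). The function names are illustrative only and do not come from IBM.

```python
import math

# 1 petaflops = 1,000 teraflops (1,000 trillion flops, as stated in the article).
TERAFLOPS_PER_PETAFLOPS = 1_000


def racks_needed(target_teraflops: float, teraflops_per_rack: float) -> int:
    """Smallest whole number of racks whose combined peak meets the target.

    Assumes ideal linear scaling, which is an optimistic simplification.
    """
    return math.ceil(target_teraflops / teraflops_per_rack)


def processors_needed(racks: int, processors_per_rack: int) -> int:
    """Total processor count for a given number of identical racks."""
    return racks * processors_per_rack


if __name__ == "__main__":
    # Article's figures: about 6 teraflops and about 1,000 processors per rack.
    racks = racks_needed(1 * TERAFLOPS_PER_PETAFLOPS, 6)
    print("Racks for one petaflops:", racks)
    print("Approximate processors:", processors_needed(racks, 1_000))
```

Under these assumptions, one petaflops works out to roughly 167 racks and on the order of 167,000 processors, which suggests why simply adding boxes of a traditional design was not seen as a workable route.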
