The UCSF Computer Graphics Laboratory, home to the Resource for Biocomputing, Visualization and Informatics (RBVI), operates a cluster of high-performance servers to meet the compute- and data-intensive needs of our user community. The cluster appears to users as a single computing environment and comprises both hardware and system software, as described below.
Hardware:
Our server hardware is based on Compaq Computer Corp.'s AlphaServer ES45, a high-end server with four processors organized in a symmetric multiprocessing (SMP) architecture. This is the same server that forms the building block for the Terascale Computing System recently installed at the Pittsburgh Supercomputing Center. The CGL environment includes four ES45 servers. Each server has four 1-GHz Alpha EV68 processors and 4 GB of memory, and each processor has a peak floating-point capability of two gigaflops (two billion floating-point operations per second). Additional performance details are available here. The servers are connected by a high-bandwidth, low-latency interconnect technology known as Memory Channel, which supports 90 MB/s of channel bandwidth between any two server nodes with an end-to-end latency of 2.1 µs. An older AlphaServer ES40 is also configured as a node in this server cluster.
Software:
The server hardware described above is integrated into a single software environment known as a "cluster." Our cluster software is based on Compaq's TruCluster Server system and provides high-performance, scalable, highly available services. All server nodes use the same "single system image" of the operating system, and home directories, user files, and system files are accessible from every node in the cluster, so application software runs independently of location. This makes it possible to share application load among cluster nodes; for example, large compute-intensive jobs can run on separate nodes from interactive jobs. It also provides a highly available computing environment: when a hardware or software failure occurs on one member of the cluster, the services that node provided migrate to the remaining active nodes. The entire cluster is accessed through a common cluster address (cluster alias), known as "socrates.ucsf.edu". Depending on which server nodes are available and which specific service is being accessed (e.g. our web server), the cluster alias resolves to a specific node that then provides the requested service. Additional technical details on TruCluster Server are available here. The Open Portable Batch System is used to control the execution of compute-intensive jobs on the cluster.
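The alias behavior described above can be illustrated with a small toy model. This is only a sketch of the idea, not how TruCluster Server is implemented: the class `ClusterAlias`, the node names `node1` through `node4`, and the service-to-node assignments are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterAlias:
    """Toy model of a cluster alias that maps a service to an active node."""

    alias: str
    nodes: dict[str, bool]  # node name -> True if the node is up
    # service name -> node that normally provides it (hypothetical mapping)
    preferred: dict[str, str] = field(default_factory=dict)

    def resolve(self, service: str) -> str:
        """Return the node that should handle `service` right now."""
        active = sorted(name for name, up in self.nodes.items() if up)
        if not active:
            raise RuntimeError(f"no active nodes behind {self.alias}")
        home = self.preferred.get(service)
        if home in active:
            return home
        # The usual node is down (or none is assigned): hand the service
        # to a remaining active node, mimicking service migration.
        return active[0]


# Hypothetical four-node cluster behind the single alias.
cluster = ClusterAlias(
    alias="socrates.ucsf.edu",
    nodes={"node1": True, "node2": True, "node3": True, "node4": True},
    preferred={"web": "node2", "batch": "node4"},
)
```

With every node up, `cluster.resolve("web")` returns `node2`. If `node2` is marked down, the same call returns one of the remaining active nodes, while users continue to connect to the same alias.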