Proposal for a
High Volume Data Analysis
And
Finite Element Analysis
Solution
At the Naval Surface Warfare Center
Prepared for
Code 653 Management
Written by:
Greg Hildstrom
Edited by:
Michael McDonald
Of the
Division of Structures and Composites
Code 653
December 21, 2000
Memorandum
To: Bill Hay
From: Greg Hildstrom
Date: 12/21/2000
Subject: Formal Proposal to Save Money
Here is the formal proposal you asked me to write following our last several discussions.
I have outlined a system that will save our department millions of dollars over the next ten
years. In my proposal I define the problem, discuss our current solution, discuss future
problems, and provide a feasible solution.
Please read my proposal carefully, and do not hesitate to do a little research of your
own. I ask that you keep an open mind and consider every aspect of this project.
Please do not hesitate to email me or call me to set up a
meeting. Thank you in advance for your time.
Table of Contents
Title Page
Memorandum of Transmittal
Table of Contents
Executive Summary
Statement of Problem
Objective
Need
Benefits
Qualifications of Personnel
Data Sources
Limitations and Contingencies
Scope
Methods
Timetable
Materials and Equipment
Personnel
Available Facilities
Necessary Facilities
Cost
Expected Results
Feasibility
Summary of Key Points
Request for Action
Executive Summary
Code 653 of the Naval Surface Warfare Center (NSWC) is currently unable to tackle large
volume data analysis projects. Some of the largest datasets ever analyzed at the Naval
Surface Warfare Center were about ten gigabytes in size. The analysis of this data required
custom software to be written and several thousand man-hours to write, test, compile, debug,
and analyze. The actual analysis of this data was done on a dual-processor personal
computer and required two weeks of compute time to complete.
The time between a model test and the finished report, where the analyzed data is
presented and summarized, is typically more than a year. This discourages many customers
from doing future business with the NSWC because of the huge cost and time delay.
Customers want to pay and see immediate results.
Finite Element Analysis (FEA) is one method of cutting down on model test cost. FEA tests
computer generated models under a variety of conditions in software. The results are then
analyzed and displayed. This means that several structures can be tested in different
conditions in the computer without ever actually building the structure. This can cut down
on overall cost by reducing hardware cost and man-hours. Our current FEA computing
system consists of several Silicon Graphics workstations and a Silicon Graphics server. The
server machine contains four processors and costs around $30,000. We are currently
licensing FEA software for around $15,000 per year. Our current solutions for high volume
data analysis and FEA are inadequate and too expensive.
This proposal examines our current and future analysis problems. It then offers an outline of
a solution, a plan that deals with future problems, and a recommendation for an initial
hardware purchase.
I have two primary goals. I plan on significantly reducing the time between model test data
acquisition and analysis report generation. I also plan on reducing model test cost by
increasing the speed of FEA. Both of these goals reduce cost, decrease man-hours, decrease
hardware cost, increase accuracy, and attract customers.
I plan on solving these problems through the use of standard personal computer hardware,
free open-source software, and custom analysis software.
The combined computing power of several personal computers, working simultaneously,
outperforms our current high-end Silicon Graphics server at a fraction of the cost. This
price/performance gain, combined with free and custom software, will solve our problems.
I propose that we create a Beowulf-class supercomputer to increase our computing
performance. A Beowulf computer system is a group of two or more desktop personal
computers that function as a single computer. Beowulf computers utilize the processing
power of each individual desktop computer to solve computationally intensive problems
faster than a single traditional desktop or server computer.
Statement of Problem
Code 653 of the Naval Surface Warfare Center (NSWC) is currently unable to tackle large
volume data analysis projects. Some of the largest datasets ever analyzed at the Naval
Surface Warfare Center were about ten gigabytes in size. The analysis of this data required
custom software to be written and several thousand man-hours to write, test, compile, debug,
and analyze. The actual analysis of this data was done on a dual-processor personal
computer and required two weeks of compute time to complete.
Future data analysis is not going to get any easier. We are going to continue to record more
data channels over more tests and for longer periods of time. Our datasets are going to be
much larger and we will be stuck with no efficient way to analyze the data. The analysis of
larger datasets is limited by computing power and software development.
The time between a model test and the finished report, where the analyzed data is
presented and summarized, is typically more than a year. This discourages many customers
from doing future business with the NSWC because of the huge cost and time delay.
Customers want to pay and see immediate results.
Finite Element Analysis (FEA) is one method of cutting down on model test cost. FEA tests
computer generated models under a variety of conditions in software. The results are then
analyzed and displayed. This means that several structures can be tested in different
conditions in the computer without ever actually building the structure. This can cut down
on overall cost by reducing hardware cost and man-hours.
The current problem with FEA is also computing power. Typical FEA structures can be
tested and analyzed in several hours, but only under one condition and after they have been
built inside of the computer. If we would like to test a structure in a variety of conditions or
perform a cyclic test, the testing and analysis can take several days of computing time.
Our current FEA computing system consists of several Silicon Graphics workstations and a
Silicon Graphics server. The server machine contains four processors and costs around
$30,000. We are currently licensing FEA software for around $15,000 per year.
Our current solutions for high volume data analysis and FEA are inadequate and too
expensive.
Objective
This proposal examines our current and future analysis problems. It then offers an outline of
a solution, a plan that deals with future problems, and a recommendation for an initial
hardware purchase.
I have two primary goals. I plan on significantly reducing the time between model test data
acquisition and analysis report generation. I also plan on reducing model test cost by
increasing the speed of FEA. Both of these goals reduce cost, decrease man-hours, decrease
hardware cost, increase accuracy, and attract customers.
I plan on accomplishing my goals through the use of a new hardware and software solution.
The solution to our current problems lies in a huge increase in computing performance, a
decrease in hardware cost, and a decrease in software cost.
I plan on solving these problems through the use of standard personal computer hardware,
free open-source software, and custom analysis software.
The combined computing power of several personal computers, working simultaneously,
outperforms our current high-end Silicon Graphics server at a fraction of the cost. This
price/performance gain, combined with free and custom software, will solve our problems.
Need
I became aware of our high volume data analysis shortcomings when I learned of the
British Trimaran project. We helped set up the data acquisition equipment for their full-scale
sea trial. They will be acquiring data at a rate of over 20 gigabytes per day for almost an
entire year. By the end of the trial they will have well over one terabyte (one thousand
gigabytes) of data.
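As a rough check on these figures, the short Python sketch below multiplies the daily
acquisition rate by the trial length. The 20-gigabyte daily rate comes from the paragraph
above; the 330-day trial length is only an assumed stand-in for "almost an entire year",
and the function name is illustrative.

```python
GB_PER_TB = 1000  # this proposal treats one terabyte as one thousand gigabytes


def trial_volume_tb(gb_per_day: float, days: int) -> float:
    """Return the total volume of acquired data in terabytes."""
    return gb_per_day * days / GB_PER_TB


# 20 GB/day is the rate quoted above; 330 days is an assumed trial length.
total_tb = trial_volume_tb(gb_per_day=20, days=330)
print(f"Estimated trial data: {total_tb:.1f} TB")
```

Under that assumption the trial would produce several terabytes, which is consistent with
the statement that the final dataset will be well over one terabyte.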
The British asked us if we had any software to handle this kind and volume of data. We did
not have any hardware or software method of analyzing this massive amount of data. We are
unable to take on a data analysis job because we do not have the means. This does not make
sense because we are in the data analysis business.
If we used our current method of data analysis and faster computers, this analysis job would
take over twenty years. We need a better solution.
We build models of structures constantly. We end up testing a vast number of conditions.
Many of those conditions do not end up yielding any usable results. We actually end up
breaking test equipment and specimens during some of these unusable tests. This wastes
hardware and man-hours.
If we had a more efficient method of doing FEA, we could test a number of those conditions
inside of the computer. This would allow us to see which conditions will be useful in a real
test and which conditions will not.
Our current method of FEA is not fast enough to test multiple conditions in a timely manner.
Again, we need a better solution.
Benefits
The benefits of a new high performance low cost computing solution are numerous. A new
system will decrease hardware cost and decrease man-hours. This results in saved money
and faster results. Faster results will attract more customers, which will result in increased
revenue.
Qualifications of Personnel
The people working on this project will need to have background in three primary areas.
First, we will need people working on computer hardware and software design, and they must
be fluent with the Linux operating system. Second, we will need people who are familiar
with current data analysis methods and procedures. Third, we will need people who are
familiar with FEA design, software, and methods. In addition, we will need a manager who
will coordinate communication among the different areas of expertise and control funding.
Data Sources
There is a wealth of information relating to projects similar to this one at
http://www.beowulf.org. This organization was started by NASA to compile information
about high performance low cost computing solutions. I found all of my information through
this site or through links on this site.
Limitations and Contingencies
This project will be limited by two primary factors. The most severe limitation will be the
amount of money that can be spent. More money will result in better performance. One goal
of this proposal is to suggest a starting system that is easily upgradeable. I aim to find a good
tradeoff between price and performance. The other limitation will be the number of people
working on this project and how many hours they can dedicate to it. We will
also be limited by the desired results. More features will require more time to implement. I
have defined a realistic set of features that can be implemented within the timeframe I
propose.
Scope
The proposed plan includes a detailed description of the methods, costs, personnel,
feasibility, and benefits that will result pending approval of this project.
Methods
I plan on accomplishing my goals through the use of a new hardware and software solution.
The solution to our current problems lies in a huge increase in computing performance, a
decrease in hardware cost, and a decrease in software cost.
I plan on solving these problems through the use of standard inexpensive personal computer
hardware, free open-source software, and custom analysis software.
The combined computing power of several personal computers, working simultaneously,
outperforms our current high-end Silicon Graphics server at a fraction of the cost. This
price/performance gain, combined with free and custom software, will solve our problems.
Our current Silicon Graphics server machine cost over $30,000. This machine has several
gigabytes of memory and four parallel processors. Each analysis task is broken up into small
pieces. Each of the server's four processors works on solving a small piece of the problem at
the same time.
The Silicon Graphics server is very expensive to buy and to upgrade. Memory,
disk storage, and processors are all made especially for Silicon Graphics for use in their
machines. Not many of these are sold, which drives up the overall cost.
Standard desktop computers are mass-produced. Millions of personal computers are sold,
which drives the cost down. There are also many competing personal computer
hardware vendors, which helps to drive the cost down further.
I propose that we create a Beowulf-class supercomputer to increase our computing
performance. A Beowulf computer system is a group of two or more desktop personal
computers that function as a single computer. Beowulf computers utilize the processing
power of each individual desktop computer to solve computationally intensive problems
faster than a single traditional desktop or server computer.
NASA developed Beowulf computers as an alternative to expensive supercomputers. The
original NASA research was conducted under Project Beowulf. Their goal was to find a
low cost solution to their high performance computing problems.
A Beowulf supercomputer consists of two or more standard desktop computers and
communication hardware. Each of these desktop computers must contain memory, a
processor, a communication device, and a hard drive. The communication device is typically
a standard Ethernet network card. The communication hardware is usually an Ethernet
switch and Ethernet cable. A node is any one of the desktop computers that make up the
Beowulf computer. Each node runs operating system software and communication software.
The operating system will be Linux and the communication software will be Message
Passing Interface (MPI) or Parallel Virtual Machine (PVM).
The operating system and communication software must be installed on each node. Each
node must be connected to a power outlet and the Ethernet switch. One node will act as the
control workstation of the Beowulf computer. This node will have a monitor, keyboard, and
mouse attached to it. The user of the Beowulf will interact with it through this node. The
communication software breaks up complicated computing tasks among the different nodes
in the Beowulf computer. The communication software enables each node to work on a
piece of the same problem simultaneously.
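The way a large task is divided among nodes and then recombined can be sketched in a few
lines of Python. This is only an illustration that runs on a single machine: worker
threads stand in for Beowulf nodes, and the names split_range, node_work, and
run_on_cluster are hypothetical. On the real cluster, the MPI or PVM library would carry
the pieces over the Ethernet network instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n_items: int, n_nodes: int) -> list[range]:
    """Divide indices 0..n_items-1 into n_nodes nearly equal contiguous chunks."""
    base, extra = divmod(n_items, n_nodes)
    chunks = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def node_work(chunk: range) -> int:
    """Stand-in for the work one node performs: sum the squares in its chunk."""
    return sum(i * i for i in chunk)


def run_on_cluster(n_items: int, n_nodes: int) -> int:
    """Hand one chunk to each worker (a stand-in for a node) and combine the results."""
    chunks = split_range(n_items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_sums = list(pool.map(node_work, chunks))
    # The control node adds up the partial results returned by the workers.
    return sum(partial_sums)


total = run_on_cluster(n_items=100_000, n_nodes=5)
print(total == sum(i * i for i in range(100_000)))
```

The threads here only demonstrate the divide-and-combine pattern and are not expected to
run faster than a single loop. The speed gain on a Beowulf comes from each chunk running
on a separate physical computer.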
The combined computing power of tens or hundreds of desktop computers often rivals the
power of the world's most expensive supercomputers at a fraction of the cost. The standard
unit for measuring high performance computing power is the GFLOPS, which stands for giga
floating-point operations per second. One GFLOPS is one billion floating-point (decimal
number) calculations performed every second. A new desktop computer can reach nearly
1 GFLOPS. If a Beowulf computer were constructed from 100 new desktop computers, its net
performance could reach nearly 100 GFLOPS. Beowulf computers can be constructed for under
$1000 per GFLOPS. This is much cheaper than systems from the world's leading supercomputer
manufacturer, Cray Research, whose cost per GFLOPS is usually more than five times greater.
You can think of a Beowulf supercomputer as a landscaping crew. If you needed to dig a
large trench, it may take you several days, but you would only need a shovel. If you had
money to burn, you could buy a bulldozer to get the job done quicker. Your other alternative
is to get more people to help you dig. Twenty people may be able to dig as fast as the
bulldozer, but at a fraction of the cost. If you hired even more people, the net digging power
of your crew could match or beat the bulldozer. Beowulf computers utilize power in
numbers and not brute force. The digging crew works the same way that a Beowulf
supercomputer does. The expensive bulldozer works the same way our current Silicon
Graphics server works.
Beowulf computers solve large problems, similar to digging a large trench. Each computer
in the Beowulf works on a piece of the problem at the same time, which is similar to having
an entire landscaping crew digging at the same time. This method of problem solving is
much cheaper than buying a traditional supercomputer, which is like buying a bulldozer.
Beowulf computer hardware is cheap, powerful, and easily upgradeable. If we want to
upgrade the processing power of our Silicon Graphics machine, we must completely replace
the processors or buy a whole new machine. The old processors will be useless. If we
want to expand the disk storage space or the processing power of a Beowulf machine, we
can simply add an entire new personal computer to the cluster. The old processors
can be utilized at the same time as the new ones because all of the computers are working
together. New machines only need to be configured and plugged into the communication
hardware.
Each node of the Beowulf will run the Linux operating system. Linux is free and the source
code is readily available. We will program the custom analysis software using the
C++ programming language, the VSIPL toolkit, and either the MPI or PVM parallel
communication libraries. The parallel communication libraries are used to send small parts
of large problems to each node of the cluster. A free ANSI/ISO compliant C++ compiler is
included with the Linux operating system. The MPI and PVM libraries are also available at
no cost. The Vector Signal Image Processing Library (VSIPL) is available at no cost and
contains many mathematical analysis functions. I have also found open-source FEA design and
analysis tools that were written to take advantage of Beowulf computers. We therefore
expect no initial software cost.
Timetable
I propose that we begin buying parts for a simple hardware configuration in November so
they will arrive by early December. I will work with data analysts and co-op students during
the winter break of 2000, which includes December 2000 and January 2001.
We will work to assemble and configure the initial hardware system over the break. The
assembly will not take longer than a week. The rest of the one-month break will be spent
configuring software and planning development of analysis software.
The summer break of 2001, which extends from May 20, 2001 to August 20, 2001, will be
spent developing analysis software and writing system documentation. We will have a
working system, which will be capable of simple large-scale data analysis, by August 20,
2001.
Subsequent semester breaks from college will be spent planning and implementing further
development of the system, which will depend on funding and scheduling.
Materials and Equipment
We will need the computer hardware detailed in the Cost section. We will also need a
set of screwdrivers and wrenches for assembling the computers. We have plenty of tools in
the various workshops and test bays around the base, so we will not need to purchase any
additional equipment.
Personnel
We currently have all of the staff that will be required for initial development of this
system. Michael McDonald and I will be capable of hardware and software design and
implementation. Tom Brady, Rich Lewis, Dave Kihl, and Bill Hay will be available for help
with analysis procedure design. We also have several FEA designers on staff who would be
able to help with FEA software implementation. Bill Hay and I would be capable of
handling funding and communication between project members.
Available Facilities
We will require adequate space to assemble the Beowulf computer. The workspace behind
room B-19 should be sufficient. We need space to store the cluster of computers after it is
fully assembled. We can use space in B-19 to store the Beowulf permanently. We will also
require desks and chairs for everyone who will be working on the project on a regular basis.
We currently have spare office space and furniture, which will be able to accommodate the
project.
Necessary Facilities
We may want to consider storing the Beowulf in the climate controlled computer room to
maximize the system life. We may also want to have workspaces near the Beowulf to make
working on and with the system easier for the developers.
Cost
I have specified an initial development system that will run free software and have a
price/performance ratio better than our Silicon Graphics server.
Hardware Item                         Price Each  Quantity     Total
1GHz AMD Processor & Motherboard         $417.84         5  $2089.20
Computer Case & Power Supply              $20.00         5   $100.00
80 GB Hard Drive                         $254.00         6  $1524.00
256 MB RAM Chips                         $160.00         6   $960.00
AGP Video Cards                           $35.00         5   $175.00
3Com 100Bt Ethernet Cards                 $40.00         6   $240.00
Sony 8X CD-RW                            $131.00         1   $131.00
DLINK 16-port 100Bt Ethernet Switch      $236.00         1   $236.00
MS Optical Mouse                          $40.00         1    $40.00
MS Natural Keyboard                       $30.00         1    $30.00
--------------------------------------------------------------------
Total                                                       $5525.20
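The hardware total can be checked with a short Python sketch. The prices and quantities
are copied from the hardware list above; the variable and function names are illustrative,
and the Decimal type is used only to keep the cents exact.

```python
from decimal import Decimal

# (item, unit price in dollars, quantity), copied from the hardware list above
HARDWARE = [
    ("1GHz AMD Processor & Motherboard", "417.84", 5),
    ("Computer Case & Power Supply", "20.00", 5),
    ("80 GB Hard Drive", "254.00", 6),
    ("256 MB RAM Chips", "160.00", 6),
    ("AGP Video Cards", "35.00", 5),
    ("3Com 100Bt Ethernet Cards", "40.00", 6),
    ("Sony 8X CD-RW", "131.00", 1),
    ("DLINK 16-port 100Bt Ethernet Switch", "236.00", 1),
    ("MS Optical Mouse", "40.00", 1),
    ("MS Natural Keyboard", "30.00", 1),
]


def line_total(price: str, quantity: int) -> Decimal:
    """Return the cost of one line item (unit price times quantity)."""
    return Decimal(price) * quantity


def hardware_total(items) -> Decimal:
    """Return the sum of all line-item totals."""
    return sum((line_total(price, qty) for _, price, qty in items), Decimal("0"))


print(f"Total: ${hardware_total(HARDWARE)}")
```

Keeping the list in this form makes it easy to re-run the total if a price changes or an
additional node is added to the order.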
Personnel                           Hours Per Week  Length Of Time
Student Intern Summer/Winter Hire   40              6 Months The First Year
Co-op Student                       40              9 Months The First Year
Data Analyst                        5-10            9 Months The First Year
Engineer Data Analyst               1-5             9 Months The First Year
Management                          1-5             9 Months The First Year
Expected Results
This system will be able to outperform our current Silicon Graphics server in raw
floating-point calculations per second. The real benefits will become tangible when we have
a small set of analysis programs that have been written to take advantage of the Beowulf
computer. These programs will make it possible to analyze a huge amount of data in a
practical amount of time, which will save NSWC money and make our results more appealing to
potential customers. Because we will be automating as much of the analysis procedure as
possible, human analysts' mistakes will be minimized, and the analysts will have less
tedious and repetitive work to do. This means that our results will end up more accurate in
addition to all of the other benefits.
When we get the FEA software working, we will eliminate the need for the expensive Silicon
Graphics server and all of the associated software licenses.
Feasibility
I have already used old personal computers, which were not being used, to construct a
three-node Beowulf computer. I was able to install the operating system and parallel
communication software with no problems. I wrote a couple of simple programs that tested
the parallel communication software. I then wrote a program in C++ that performed over a
billion calculations, ran it on our fastest dual-processor computer, and recorded
the time. Next, I rewrote the same program to take advantage of the parallel communication
on the Beowulf. The three machines finished faster than the newer dual-processor computer.
The billion calculations were divided among the three processors in the old
computers. All three computers worked on the same large problem at the same time and
returned the results to the computer I was using. I did all of this from a single
keyboard, monitor, and mouse.
I have already constructed a simple Beowulf computer and written Beowulf software. I have
demonstrated the effectiveness of this technology on a simple scale. I checked to make sure
that every piece of hardware is compatible with every other piece of hardware and with the
Linux operating system. This plan is 100% feasible. There are no reasons why this plan will
not work. The only variable is software development time.
Summary of Key Points
We definitely have a data analysis problem. We are unable to analyze large datasets. We
spend too much money on our current data analysis system. Our current system does not
perform very well.
If this project is funded, we will be able to reduce cost in several areas including hardware,
software, and man-hours. We will be able to achieve better results with this new system.
This new system will help us to get more business.
I have already built and demonstrated a working Beowulf computer system. I have
demonstrated the performance increase when programs are ported to the Beowulf system
architecture. This project is 100% feasible. The only possible variable is development time.
This system will save time and money.
Request for Action
The sooner the NSWC gets started on this project, the sooner we can start ordering
hardware, developing software, and saving money. We need to act now before we are
offered contracts that are too cumbersome for us to handle.
Please carefully consider all of the facts I have given in this proposal. Please visit
http://www.beowulf.org to research Beowulf technology and how it is currently helping
other companies. Do not hesitate to email me or call me
if you have any questions. Please contact me as soon as you reach a decision or if you wish
to set up a meeting to discuss my proposal further.