Distributed Processing


Distributed Processing Overview

As the name suggests, distributed processing divides (distributes, parallelizes) computation tasks among available computer resources. The goal is to decrease the overall task completion time or, in other words, to increase the speed of calculations. Formula optimization in ChaosHunter is a natural candidate for distributed processing because ChaosHunter maintains a genetic pool of formula candidates which can be divided among separate computing agents running in parallel. An agent can be either the main ChaosHunter application (we will call it the server) or one of the smaller programs (clients) that run separately from the server. The server and clients have a way of communicating with each other. During optimization, the server divides the genetic pool into smaller chunks and distributes them among the agents. The agents do all the necessary formula calculations and send the results back to the server. The server collects all the results and goes on to the next genetic generation.

Besides distributing the work, the server also acts as a computing agent. After the server distributes the workload among the clients, it does not simply sit idle waiting for their results; it also performs computations on a chunk that it has assigned to itself. Clients may run on the same computer as the server, on remote networked machines, or both.
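The division of work described above can be illustrated with a minimal Python sketch. It is not ChaosHunter's actual code: the function names `split_pool` and `evaluate_generation` are hypothetical, the even contiguous split is an assumption (the real chunking strategy is not documented here), and the clients are simulated serially instead of over the network.

```python
def split_pool(pool, n_agents):
    """Split a list of candidates into n_agents nearly equal contiguous chunks."""
    if n_agents < 1:
        raise ValueError("need at least one agent")
    base, extra = divmod(len(pool), n_agents)
    chunks, start = [], 0
    for i in range(n_agents):
        # The first `extra` agents receive one additional candidate each.
        size = base + (1 if i < extra else 0)
        chunks.append(pool[start:start + size])
        start += size
    return chunks


def evaluate_generation(pool, n_clients, fitness):
    """Evaluate one generation with n_clients clients plus the server's own agent.

    Chunk 0 stands for the share the server keeps for itself; the remaining
    chunks stand for the shares sent to clients. Everything runs serially
    here, so this only shows the bookkeeping, not the speed gain.
    """
    chunks = split_pool(pool, n_clients + 1)
    results = []
    for chunk in chunks:
        results.extend(fitness(candidate) for candidate in chunk)
    return results


if __name__ == "__main__":
    # Ten candidates shared by the server and two clients (three agents).
    print(split_pool(list(range(10)), 3))
```

Because the results are collected in chunk order, the server ends up with one result per candidate in the original pool order before it moves on to the next generation.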

Distributed processing in ChaosHunter takes advantage of multiple cores/CPUs on local and remote networked computers. Below are possible computer configurations where distributed processing might be beneficial in order to reduce optimization time:

1. A stand-alone computer with a multiple-core CPU, no network.

2. A computer with a single-core CPU and remote computers located on the same LAN.

3. A computer with a multiple-core CPU and remote computers located on the same LAN.

In cases 2 and 3 remote computers can be either single- or multiple-core/CPU.

Remote computers must be on the same LAN and in the same workgroup as the server in order for the machines to see each other.

Distributed processing in ChaosHunter does not work through the Internet.

Installation

A single installation file is used for both the server machine and remote computers; there is no separate installation for remote computers. On each remote computer, run the same installation file that you run on the server. This sets up all the files required for the ChaosHunter client to run remotely. Then go to your main computer where the ChaosHunter server runs and put a check mark next to the desired remote computers on the Clients menu -> Remote Settings dialog.

During setup, ChaosHunter installs three main components of the distributed processing system:

ChaosHunter.exe – main ChaosHunter application (server)

ChaosHunterClient.exe – client ChaosHunter application

WSGLocator.exe – the locator.

When a remote networked computer acts as a client, the ChaosHunter server installed on it is not used. Roles can easily be switched, however: a client machine can become the server, and the server machine can become a client.

The ChaosHunter distributed system relies on a component called WSGLocator. It is a small executable program that runs on each local and remote computer. The locator starts when Windows starts and runs all the time in the background. You can see WSGLocator.exe running on the Processes tab of the Windows Task Manager. The purpose of the WSGLocator is to start and shut down clients upon commands from the server.

The setup automatically starts the WSGLocator at the end of installation. The Windows firewall or any other software firewalls that may be installed on your computer will usually display a dialog box (see below) informing you that the locator application (WSG Locator or WSGLocator.exe) is blocked from accepting connections from other computers. You have to press the “Unblock” button on the Windows firewall dialog. If your computer is running another type of firewall, you need to set the firewall to permanently allow the WSG Locator to communicate with other computers on the network.

If ChaosHunter Is Not Talking to Clients

If your Windows Firewall does not ask you to unblock the WSGLocator, the Firewall may nevertheless block ChaosHunter from communicating. One clue that this is happening is that remote computers show up in your Clients->Remote Settings menu, but ChaosHunter does not start clients running on them when you begin optimizing a model. You will then either have to turn off the Windows Firewall (which some anti-virus programs do anyway) or tell the Firewall that the WSGLocator is an exception. Go to the Windows Control Panel and select "Windows Firewall". If you decide to list the WSGLocator as an exception, select the "Exceptions" tab, select "Add Program", and browse to the WSGLocator in the "Windows" folder. You may have to do this on all computers where you want clients to be able to run. We recommend you do this on the server computer as well.

Windows Firewall

Also, the setup creates a system registry key that starts the WSGLocator when Windows starts. The WSGLocator runs from the Windows directory.

Antivirus Programs Can Block ChaosHunter Execution

When running the ChaosHunter server for the first time, you will most probably receive a warning from your antivirus program if you have such a program installed on your machine. The warning will say that ChaosHunter is trying to listen for connections from other computers on the network (and perhaps within the same computer even if you are not using the Network version). The warning completely halts ChaosHunter execution and waits for your input. In many cases, the antivirus warning is hidden behind the ChaosHunter window and ChaosHunter appears not to be responding. In this case, you need to locate the antivirus program warning dialog on the Windows Task Bar, bring it up, and instruct it to permanently allow ChaosHunter to listen for connections from other computers on the network.

ChaosHunter displays its own warning when an antivirus program halts ChaosHunter execution. The warning informs you why ChaosHunter is not responding and helps you get the antivirus program to unblock it.

Communication Ports

The server and clients communicate through TCP (Transmission Control Protocol). They use a common communication port which can be set on the ChaosHunter Clients menu -> Remote Settings dialog -> “Server Port” box. The default Server Port is 1001.

The locator and the server communicate through UDP (User Datagram Protocol). The server and the WSGLocator use a common communication port which can be set on the ChaosHunter Clients menu -> Remote Settings dialog -> “Locator Port” box. The default Locator Port is 12345. If you ever need to change the locator port, do it with ChaosHunter shut down. Browse to your Windows directory and double-click WSGLocator.exe. This brings the locator window to the front. Change the locator port value, press Apply and Hide. Restart ChaosHunter. The Clients menu -> Remote Settings dialog will display the new locator port value.

The Server port must be different from the Locator port. Both ports must be different from any other port used by any other application/service running on that machine. Since port numbering goes up to 65535, port conflict is usually not a problem.
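The port rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of ChaosHunter: the function name `validate_ports` and the `ports_in_use` argument are hypothetical, and the caller is assumed to supply the set of ports already taken on the machine.

```python
MAX_PORT = 65535  # highest valid TCP/UDP port number


def validate_ports(server_port, locator_port, ports_in_use=()):
    """Return a list of problems; an empty list means the pair is acceptable."""
    problems = []
    for name, port in (("server", server_port), ("locator", locator_port)):
        if not 1 <= port <= MAX_PORT:
            problems.append(f"{name} port {port} is outside 1..{MAX_PORT}")
        elif port in ports_in_use:
            problems.append(f"{name} port {port} is already used by another application")
    if server_port == locator_port:
        problems.append("server and locator ports must differ")
    return problems


if __name__ == "__main__":
    # The documented defaults: Server Port 1001, Locator Port 12345.
    print(validate_ports(1001, 12345))
```

With the documented defaults and no conflicting applications, the function reports no problems.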

Setting Up Clients

The ChaosHunter Clients menu -> Remote Settings dialog allows you to select the computers that you want to engage in distributed processing. If the computer is stand-alone and there are no remote computers (no network), only that computer may appear on the list, along with the number of cores/CPUs available. A stand-alone computer shows up on the list only if it has more than one core/CPU. If there are remote computers, they are listed by name. No core/CPU count information is available for remote computers.

By default, no local or remote computer is selected for distributed processing. To enable local or certain remote computer(s) to be available for distributed processing, go to the Clients menu -> Remote Settings dialog and click on the desired computers. When you press OK, the program sends signals to local/remote computers to run respective clients. It may take a couple of seconds for clients to communicate with the server and establish a connection.

Once the connection is established, a counter of running clients appears at the bottom right corner of the program. Clients are kept up even when ChaosHunter does not perform optimization. They just sit idle and wait for an optimization to start. When ChaosHunter shuts down, it closes all clients. You can always display a list of all running clients by double-clicking the Clients panel on the status bar at the bottom right corner of the program. The popup window will show the clients list with names of respective remote computers. This window is for information only; you cannot stop or start clients from there, or change any of the remote settings.

When you want to remove local and/or certain remote computers from distributed processing, go to the Clients menu -> Remote Settings dialog and uncheck those computers.

By default, clients run invisibly to the user. If you want to display them, uncheck the box “Hide clients when running” on the Clients menu -> Remote Settings dialog.

If you ever need to close down remote clients and restart them, use the Clients -> Restart Clients menu. This command sends signals to all computers selected on the Clients -> Remote Settings window to unconditionally shut down all running ChaosHunter clients and to start them fresh. One possible use of this command is when the ChaosHunter server is running (but not optimizing) and you decide to turn on a remote computer and add it to distributed processing. Another use is the case where you shut down the server, but the remote clients do not go down as expected. The next time you run the ChaosHunter server, it shows double the number of clients. Clicking on the Clients -> Restart Clients menu fixes the situation and brings the number of clients back to its correct value.

When ChaosHunter is optimizing, it cannot add new remote clients to the distributed processing loop. However, when you shut down a remote machine on which a ChaosHunter client is running, the server automatically removes that client from distributed processing and continues optimizing on clients remaining on the list.

How Many Clients Run on Local/Remote Computer(s)?

In order to utilize all available cores/CPUs, ChaosHunter brings up one client per core or CPU. This means that on the local multi-core machine there are (number of cores - 1) clients running, because one core/CPU is always reserved to run the ChaosHunter server and its internal agent. On a remote multi-core machine there are as many clients running as there are cores/CPUs.

For example, if you run the ChaosHunter server on a dual-core computer A and include a remote quad-core computer B, then there will be a total of 5 clients running: one on computer A and four on computer B. However, the total number of computing agents will be 6 (5 clients + the main server agent). In other words, the overall optimization task is divided into 6 separate smaller tasks running concurrently.
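The counting rule can be written as a short Python sketch. The function name `count_agents` is hypothetical and is used only to reproduce the arithmetic of the example above; it is not a ChaosHunter API.

```python
def count_agents(server_cores, remote_cores):
    """Return (clients, agents) for a server machine and a list of remote machines.

    server_cores: cores/CPUs on the machine running the ChaosHunter server.
    remote_cores: cores/CPUs of each selected remote machine.
    """
    # One local core is reserved for the server and its internal agent.
    local_clients = max(server_cores - 1, 0)
    # Remote machines run one client per core/CPU.
    remote_clients = sum(remote_cores)
    clients = local_clients + remote_clients
    # The server's internal agent adds one more computing agent.
    return clients, clients + 1


if __name__ == "__main__":
    # Dual-core computer A (server) plus quad-core remote computer B.
    print(count_agents(2, [4]))  # (5, 6)
```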

Speed Gain Factors

Speed gain on a stand-alone multiple-core/CPU machine arises from the fact that Windows dynamically distributes concurrently running server and client applications between available multiple cores/CPUs. Windows does this parallelization automatically.

Speed gain within a network environment is obviously due to parallelization between physically separate computers.

The overall speed gain in ChaosHunter depends on the balance between pure computation time and overhead time. Pure calculation time is what the server or client spends just to compute formula outcome for a given formula and inputs to it. The overhead time includes time needed:

(1) for the server to divide the genetic population into smaller pieces before sending them out to clients;

(2) to physically transmit pieces to clients;

(3) for the client to receive and pre-process incoming data before even being able to compute the formula;

(4) to physically transmit formula outcomes back to the server;

(5) for the server to collect results from separate agents and process them.

The larger the pure computation time is relative to the overhead time, the greater the speed gain. This means that distributed processing delivers its best speed gains when optimization problems have a large number of rows (a couple thousand or more), larger numbers of columns, and bigger maximum equation sizes (longer formulas increase pure computation time). Increasing the genetic pool size has only a limited effect because the bigger the population, the larger the overhead time.
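The balance between computation and overhead can be illustrated with a simple back-of-the-envelope model in Python. This is an assumed simplification, not a formula taken from ChaosHunter: it supposes that each individual costs a fixed computation time that is shared evenly by the agents, plus a fixed per-individual overhead that is not shared. The function name `estimated_speedup` and the numbers in the usage example are hypothetical.

```python
def estimated_speedup(compute_ms, overhead_ms, n_agents):
    """Estimate the speed gain over a single agent under a simple linear model.

    compute_ms:  pure computation time per individual (milliseconds).
    overhead_ms: assumed overhead per individual (split, transmit, collect).
    n_agents:    number of computing agents, including the server's own agent.

    Serial time per individual is compute_ms; distributed time per individual
    is compute_ms / n_agents + overhead_ms. The pool size cancels out of the
    ratio, which matches the limited effect of a larger genetic pool.
    """
    if n_agents < 1 or compute_ms <= 0 or overhead_ms < 0:
        raise ValueError("invalid timing or agent count")
    return compute_ms / (compute_ms / n_agents + overhead_ms)


if __name__ == "__main__":
    # Hypothetical timings: the same overhead hurts short formulas far more.
    for compute_ms in (1.0, 10.0, 100.0):
        gain = estimated_speedup(compute_ms, overhead_ms=0.5, n_agents=6)
        print(f"{compute_ms:6.1f} ms per individual -> about {gain:.2f}x")
```

In this model the gain approaches the number of agents only when the computation time per individual is much larger than the overhead, which is the qualitative behavior described above.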

By experimentation we’ve found that good speed gains (close to the theoretical maximum) can be achieved if the average computation time per single individual on the server agent is greater than 10 ms. You will find this statistic in the “Avg.time” column in the right pane of the optimization screen, where all agents are listed. Each client also displays its average time per individual in its own window.