Simplified SSH Port Forwarding

To simplify the process of connecting ParaView to HPC resources, we have developed a ParaView connection definition (.pvsc) file and a PBS script that together allow the user to quickly and securely connect a ParaView client to the compute nodes of MJM.

Requirements

The technique described below was developed to solve a particular issue on the ARL DSRC systems, but can be easily transferred to other HPCMP resources. This technique assumes that the user has:

  • an account on MJM at the Army Research Laboratory DSRC (ARL), and
  • ParaView 3.4.0/3.6.1 running on a Windows, Linux, or Mac machine.

When should a cluster be used?

Although running ParaView in client/server mode can be a little more complicated initially, the benefits make it worthwhile if any of the following conditions exist:

  • Your data is too large to move. It may be the case that your data is so large that it will take hours or days to move it to your desktop from the server where it was generated, or the desktop workstation does not have sufficient disk capacity to house the computed dataset.
  • Your data is too large to load into the memory of your machine. You may have tried loading your data in ParaView on your desktop and it failed because it ran out of memory.
  • Your client has minimal rendering resources. Your desktop may be a thin client or have an outdated graphics card. In rare cases your graphics card may not be hardware accelerated.

If your data does not quite fit the conditions above, you will have to experiment to see if running ParaView in client/server mode is beneficial. Keep in mind that there will be added latency from transmitting large images over a network from a remote server, and overhead from server-side communication across multiple cluster nodes. Since most of the HPC clusters do not have user-accessible hardware rendering, all images are generated in software before being sent to the client. Once connected in client/server mode, several parameters can be adjusted to improve rendering efficiency. Refer to ParaView_Client-Server_Mode for more information.

One-time Setup

The first thing that needs to be addressed is to provide the ParaView client with a definition of how to establish the connection to the remote HPC resource. This is accomplished through the use of a ParaView connection definition file. Once this file is loaded into the ParaView client, the definition will be saved into a permanent configuration file. The specific steps are:

  1. Obtain a copy of the ParaView connection definition file that usually has a .pvsc file extension. This file can be downloaded from mjm.pvsc, or on systems at ARL that receive the standard ParaView distribution, the file is /usr/cta/CSE/Misc/mjm.pvsc.
  2. Register the ParaView connection file with ParaView through the following steps:
    1. In ParaView, click File -> Connect -> Load Servers
    2. Choose the .pvsc connection file identified in step 1 and click OK
    3. You should now see a screen that looks like this:

      [Figure: PVBatch connect1.png — PVBatch Connect 1]
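
Once registered, the definition is merged into your personal servers.pvsc file, which ParaView keeps in its per-user configuration directory. The exact location is platform-dependent; for ParaView 3.x the typical defaults are the following (an assumption — verify against your installation):

  ~/.config/ParaView/servers.pvsc        (Linux/Mac)
  %APPDATA%\ParaView\servers.pvsc        (Windows)

You will need to know where this file lives if you ever have to change the fixed local port (see step 5 under "Connecting to the Server" below).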

Connecting to the Server

  1. Get a Kerberos ticket (a command-line example follows this list)
  2. Start ParaView. Once it is running:
    1. Click File -> Connect
    2. Click "MJM", then click Connect
  3. A list of user-configurable options will appear.

    [Figure: PVBatch connect2.png — Connection configuration options]

    1. Select either ParaView version 3.4.0 or 3.6.1
    2. Set the SSH Executable location by typing it in the box or using the browse ("...") button. The actual path to SSH is site-dependent; however, you will want the HPCMP-kerberized version of SSH (for instance, on ARL systems, the path is /usr/krb5/bin/ssh). NOTE FOR WINDOWS USERS: The HPCMP-supplied program "plink.exe" will be used as the "SSH" executable. It will likely be in C:\Program Files\HPCMP\Kerberos\plink.exe or a similar location.
    3. Set your username and project number. These will be saved for the next time you use ParaView.
    4. Set the queue name. The options are debug, urgent, staff, high, challenge, cots, interactive, standard-long, standard, and background. You will likely not have access to all of these queues. For information on run times and priorities for each queue, see the PBS documentation for the individual systems.
    5. Local port number cannot be changed due to a bug in ParaView 3.4.0/3.6.1. If another service is running on port 50002 on your local machine, you will need to edit the servers.pvsc file that is part of your ParaView configuration (a port-change example follows this list). When the bug in ParaView is resolved, this option will be enabled.
    6. Remote port number and Connection ID can usually be ignored. If you are having trouble connecting to a server, it may be useful to try a different Remote port number, as someone else may already be using the port you requested.
    7. The number of processes sets the number of pvserver processes that ParaView can use. As the PBS request in mjm.pvsc is written (mpiprocs=1 with exclusive placement), each process is placed on its own node, so this is also the number of nodes allocated.
    8. If you are memory-bound, it may be useful to change processor tiling from the default to reduce the number of CPUs per node that are used. (This option is temporarily disabled in the provided mjm.pvsc.)
    9. Wall time sets the length of time you want to run ParaView. There is a tradeoff in how long you request: a long wall time means your job may wait longer in the queue before it starts, while a short wall time means your session may be terminated before you are finished. Remember that you can use File -> Save State if you are getting close to your requested wall time. You can then start a new connection and use File -> Load State to pick up where you left off. This also works if you need to increase or decrease the number of CPUs for your session.
  4. Click Connect. The server can take 10 seconds to several minutes or more to connect, depending mostly on the current load of the server and your priority in the queue.
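
For step 1, on a machine with the HPCMP Kerberos kit installed, requesting and checking a ticket looks something like the following sketch (command locations vary by site; /usr/krb5/bin is the typical ARL location):

  /usr/krb5/bin/kinit     # request a new Kerberos ticket (prompts for your passcode)
  /usr/krb5/bin/klist     # confirm that you now hold a valid, unexpired ticket

For step 5, if port 50002 is already in use on your client, one way to work around the bug is to change every occurrence of the port in your saved servers.pvsc (both the option default and the tunnel argument use it). A hypothetical one-liner, assuming the Linux configuration path and 50042 as the replacement port:

  sed -i 's/50002/50042/g' ~/.config/ParaView/servers.pvsc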

Debugging Problems

  • Suggestion:
The debug and interactive queues are the only queues that allow an interactive shell. This may be useful if you are troubleshooting the connection, as additional debugging information may be available.
  • Error:
ssh_askpass: exec(/usr/local/ossh/libexec/ssh-askpass): No such file or directory Host key verification failed.
Solution:
This means you have never logged into this machine with SSH before. Use PuTTY or SSH from the command line to access the machine, and respond 'yes' when you are asked if you want to continue connecting. Then try the ParaView connection again.
  • Error:
Permission denied (gssapi-with-mic).
Solution:
This can mean a couple of things. The most likely is that you need to request or renew your Kerberos ticket. It can also mean that you do not have an account on the machine you are connecting to.
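
When a connection fails, it is often quickest to test each piece by hand before retrying from ParaView. A sketch of useful checks (the kerberized SSH path is the ARL example from above; adjust for your site):

  /usr/krb5/bin/klist                        # is your Kerberos ticket still valid?
  /usr/krb5/bin/ssh -v mjm-l1.arl.hpc.mil    # can you log in at all? -v prints
                                             # verbose diagnostics as it negotiates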

Under the Hood

Some of you are probably trying to figure out what is happening to make all this work. Once you have chosen the connection options and clicked Connect, the ParaView client starts listening on the local port number. An SSH connection is then opened to the chosen server name, and a PBS script is submitted to the scheduler. The SSH connection also carries a tunnel that forwards the remote port number on the server back to the local port number on the client.

When the PBS scheduler decides it is time for your job to run, a script runs on the first cluster node. This script sets up a second tunnel back to the login node that you picked and starts pvserver on all the nodes PBS allocated to you. With this second tunnel in place, there is a pathway all the way from the cluster nodes to the client machine: pvserver connects to a port on its own node, which relays to the cluster's login node, which relays to your desktop. Because both tunnels ride on SSH connections opened from the inside out, any firewalls blocking inbound TCP/IP traffic from the server to your desktop are bypassed. The sketch below shows roughly what the two tunnels look like as plain ssh commands.
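
The following is illustrative only; the real commands, built from your option values, appear in mjm.pvsc and ParaView_batch.pbs below, and <remote_port> stands for the randomly chosen Remote port number:

  # 1) Run by the ParaView client: remote-forward <remote_port> on the login
  #    node back to the client's fixed local port 50002
  ssh -R <remote_port>:127.0.0.1:50002 -l <username> mjm-l1.arl.hpc.mil

  # 2) Run by the PBS script on the first compute node: local-forward
  #    <remote_port> on that node to <remote_port> on the login node
  ssh -N -L <remote_port>:127.0.0.1:<remote_port> $PBS_O_HOST

  # 3) pvserver then connects to 127.0.0.1:<remote_port> on its own node,
  #    and the chained tunnels relay the traffic to the waiting client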

[Figure: ParaView Client Connecting to a Cluster]

The Code

Most users will only need mjm.pvsc (download).

If you are adapting this to work on another server, you will also need ParaView_batch.pbs (download).

mjm.pvsc

<Servers>
  <Server name="MJM" resource="csrc://127.0.0.1">
    <CommandStartup>
      <Options>
        <Option name="VERSION" label="ParaView Version" save="true">
          <Enumeration default="3.6.1">
            <Entry value="3.4.0" label="3.4.0"/>
            <Entry value="3.6.1" label="3.6.1"/>
          </Enumeration>
        </Option>
        <Option name="SSHLOC" label="SSH executable" save="true">
          <File default="/usr/brl/bin/ssh"/>
        </Option>
        <Option name="USERNAME" label="Username" save="true">
          <String default=""/>
        </Option>
        <Option name="PROJECTNUM" label="Project number" save="true">
          <String default=""/>
        </Option>
        <Option name="SERVERNAME" label="Server name" save="true">
          <Enumeration default="mjm-l1.arl.hpc.mil">
            <Entry value="mjm-l1.arl.hpc.mil" label="mjm-l1.arl.hpc.mil"/>
            <Entry value="mjm-l2.arl.hpc.mil" label="mjm-l2.arl.hpc.mil"/>
            <Entry value="mjm-l3.arl.hpc.mil" label="mjm-l3.arl.hpc.mil"/>
            <Entry value="mjm-l4.arl.hpc.mil" label="mjm-l4.arl.hpc.mil"/>
            <Entry value="mjm-l5.arl.hpc.mil" label="mjm-l5.arl.hpc.mil"/>
            <Entry value="mjm-l6.arl.hpc.mil" label="mjm-l6.arl.hpc.mil"/>
            <Entry value="mjm-l7.arl.hpc.mil" label="mjm-l7.arl.hpc.mil"/>
          </Enumeration>
        </Option>
        <Option name="QUEUE" label="Queue name" save="true">
          <Enumeration default="debug">
            <Entry value="debug" label="debug"/>
            <Entry value="urgent" label="urgent"/>
            <Entry value="staff" label="staff"/>
            <Entry value="high" label="high"/>
            <Entry value="challenge" label="challenge"/>
            <Entry value="cots" label="cots"/>
            <Entry value="interactive" label="interactive"/>
            <Entry value="standard-long" label="standard-long"/>
            <Entry value="standard" label="standard"/>
            <Entry value="background" label="background"/>
          </Enumeration>
        </Option>
        <Option name="PV_SERVER_PORT" label="Local port number" readonly="true">
          <Range type="int" min="1024" max="65535" step="1" default="50002"/>
        </Option>
        <Option name="SERVER_PORT" label="Remote port number">
          <Range type="int" min="1024" max="65535" step="1" default="random"/>
        </Option>
        <Option name="PV_CONNECT_ID" label="Connection ID">
          <Range type="int" min="1" max="65535" step="1" default="random"/>
        </Option>
        <Option name="NUMPROC" label="Number Of Processes">
          <Range type="int" min="1" max="256" step="1" default="2"/>
        </Option>
        <Option name="PTILE" label="Processor tiling" >
          <Enumeration default="Temporarily Disabled Feature">
            <Entry value="1" label="1 process per node"/>
          </Enumeration>
        </Option>
        <Option name="WALLTIME" label="Wall time (minutes)">
          <Range type="int" min="1" max="65535" step="1" default="5"/>
        </Option>
      </Options>
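      <!-- Each $NAME$ token in the arguments below is replaced with the value
           of the matching Option above when ParaView builds the startup
           command. -->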
      <Command exec="$SSHLOC$" delay="0">
        <Arguments>
          <Argument value="-q"/>
          <Argument value="-o StrictHostKeyChecking=no"/>
          <Argument value="-R"/>
          <Argument value="$SERVER_PORT$:127.0.0.1:50002"/>
          <Argument value="$SERVERNAME$"/>
          <Argument value="-l"/>
          <Argument value="$USERNAME$"/>
          <Argument value="/usr/bin/env"/>
          <Argument value="PV_CONNECT_ID=$PV_CONNECT_ID$"/>
          <Argument value="PV_PORT=$SERVER_PORT$"/>
          <Argument value="CSE_PARAVIEW=$VERSION$"/>
          <Argument value="qsub"/>
          <Argument value="-V"/>
          <Argument value="-A $PROJECTNUM$"/>
          <Argument value="-N pvserver"/>
          <Argument value="-q $QUEUE$"/>
          <Argument value="-l walltime=$WALLTIME$:00"/>
          <Argument value="-l select=$NUMPROC$:ncpus=4:mpiprocs=1"/>
          <Argument value="-l place=scatter:excl"/>
          <Argument value="-W block=true"/>
          <Argument value="-k oe -j oe"/>
          <Argument value="/usr/cta/CSE/Misc/ParaView_batch.pbs"/>
        </Arguments>
      </Command>
    </CommandStartup>
  </Server>
  <Server name="builtin" resource="builtin:">
    <ManualStartup/>
  </Server>
</Servers>

ParaView_batch.pbs

#!/bin/csh

# December 2008
# LSF script to allow ParaView distributed processing on MJM
# mjm.pvsc is required on the client side

# Modified by Rick Angelini (U.S. Army Research Laboratory) to use QSUB.
# 10/1/2009


/usr/cta/CSE/modules/utils/logger ParaView_batch.pbs MJM

# Check Variables

if (! $?PV_CONNECT_ID) then
    echo PV_CONNECT_ID not set
    exit 1
endif
echo PV_CONNECT_ID = ${PV_CONNECT_ID}

if (! $?PV_PORT) then
    echo PV_PORT not set
    exit 1
endif
echo PV_PORT = $PV_PORT

if (! $?CSE_PARAVIEW) then
    echo CSE_PARAVIEW not set
    exit 1
endif
echo CSE_PARAVIEW = ${CSE_PARAVIEW}

# Initialize Modules

if (-e /usr/cta/modules/3.1.6/init/csh) then
    source /usr/cta/modules/3.1.6/init/csh      # MJM
else
    echo Problem loading modules
endif

module load Master pbs
module load ti06/gcc4.2 ti06/openmpi-1.3
module load cseinit cse-tools
module load cse/mesa/latest cse/ParaView/${CSE_PARAVIEW}

#set up ssh port forwarding back to login node
echo setting SSHPID to connect back to login node ${PBS_O_HOST}

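# -N: do not run a remote command, just forward the port
# -L: connections to PV_PORT on this compute node are relayed to PV_PORT on
#     the login node (PBS_O_HOST), where the client's tunnel picks them up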
(/usr/bin/ssh -o StrictHostKeyChecking=no -N -L${PV_PORT}:127.0.0.1:${PV_PORT} ${PBS_O_HOST} ) &
set SSHPID = $!
echo SSHPID = $SSHPID

#give ssh some time to connect
sleep 3

#run ParaView
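# --reverse-connection makes pvserver initiate the connection: it dials
# 127.0.0.1:${PV_PORT} on its own node and the chained SSH tunnels relay
# the traffic back to the waiting client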
openmpirun.pbs pvserver \
   --use-offscreen-rendering \
   --server-port=${PV_PORT} \
   --client-host=127.0.0.1 \
   --reverse-connection \
   --connect-id=${PV_CONNECT_ID}

#cleanup
echo killing SSH: $SSHPID
kill $SSHPID
