Bultin activities in XPFlow

There is a variety of built-in activities available in XPFlow’s standard library.

Basic builtins

import (or use)

import(name) → nil

Import processes and activities from a different file (or predefined library) to the scope of the current file.

import "common.rb"

process :main do

log (or puts)

log(*messages) → nil

Adds a message (or a list of messages) to the execution log.

process :example do |arg|
    log "Starting process..."
    puts "Passed argument: #{arg}"


sleep(duration) → nil

Suspend the process or the activity for duration seconds.

value (or id)

value(x) → x

Returns the given argument.


system(command) → string

Executes a command on the main machine (i.e., the machine that XPFlow runs on) and returns standard output.

process :main do
    ls = system "ls"
    log ls


range(begin, end, step) → array

Creates an array containing the sequence between begin to end (inclusive) incremented by step.

process :main do
    values = range(0, 10, 1)
    log values


assert(condition) → nil

Does nothing if the condition is true and fails otherwise.


fail(message) → nil

Fails execution of a workflow unconditionally.


var(name, type=:str, default = nil) → type

Gets a variable from arguments provided in command line arguments (using -V switch), from a YAML file (using -f switch) or from user input (otherwise). In the following example, we get the value of :name from command line:

process :main do
    log "Hello #{var(:name, :str, 'john doe')}"
$ xpflow helloname.rb -V name=Henri

There are different types available:

type name example
:str String "abc"
:int Integer 42
:float Float 3.141592
:bool Boolean true
:range Range 2..8

Data structures

This API is not stable and is likely to change in the near future.


data_vector(array) → vector of data

Creates an object that stores an arbitrary values in an array. If called without arguments, it will be created empty, otherwise the vector will be initialized with contents of array.

process :main do
    d1 = data_vector()
    d2 = data_vector([ 1, 2, 3 ])


data_push(vector, object) → nil

Pushes object at the end of vector.

process :main do
    d1 = data_vector()
    data_push(d1, 42)
    data_push(d1, 77)
    log d1


simple_node(“user@machine”) → node

Returns a node object used to access machine. See example of execute_one.


execute_one(node, command) → result of command execution

result is an object which contains stdout, stderr, node and command fields.

Executes a command on a given node and returns its result. If execution of the command fails, a failure is thrown.

process :main do
    node = simple_node "user@machine"
    r = execute_one node, "hostname"
    log "Hello I am #{r.stdout}"
    log "and I executed #{r.command} command"


execute(nodes, command) → results of command executions

Executes command on all nodes passed to the activity. The commands are executed sequentially on consecutive nodes and therefore the activity may not scale when the number of nodes is large. In that case one should use execute_many.


execute_many(nodes, command) → results of command executions

Executes command on all nodes passed to the activity using a scalable and efficient method. The return value is fully compatible with the return value of execute.


file(node, path) → remote file object

Returns an object that points to a remote file at path on a given node. The result can be used as a source in some activities, like distribute or copy.

process :main do
    f = file(localhost, "/etc/resolv.conf")
    copy f, localhost, "/tmp/resolv.conf"


copy(source, nodes, path) → nil

Copies a given source to all nodes at a given path. The files are copied in sequence - if you need more performance, use distribute which offers more scalable way to transfer files.

The source is either:

  • a glob pattern which refers to files on a node executing XPFlow,
  • a file object created with file activity,
  • a result of command execution obtained with execute or execute_one.
process :main do
    copy "/etc/hostname", localhost, "/tmp/hostname-1"
    copy file(localhost, "/etc/resolv.conf"), localhost, "/tmp/hostname-2"
    r = execute_one localhost, "hostname"
    copy r, localhost, "/tmp/hostname-3"


distribute(source, nodes, path) → nil

Copies a given source to all nodes at a given path using an efficient and scalable algorithm. It accepts the same parameters as copy.

Statistical Activities


conf_interval(array, opts = {}) → confidence interval

Calculates a confidence interval for the mean value of values in array.

Options may contain:

  • :dist - distribution of mean of values in array; can be either normal (:n, :normal) or t-Student (:t, :tstudent); by default it is normal,
  • :conf - confidence of the interval, that is, a probability that the real mean is inside the calculated interval (95% by default).

Beware, using :normal assumes that the distribution of average value is close to normal distribution. In practice, this means that the number of initial samples drawn must be at least 30. If you know that the distribution of a single measure is normal (or you have good reasons to believe so) then you should use approximation based on Student’s t-distribution. It is more correct to use it for small sample sizes in that case.

process :main do
    arr = [ 1, 2, 3, 4 ]
    i1 = conf_interval arr
    i2 = conf_interval arr, :conf => 0.9
    i3 = conf_interval arr, :conf => 0.9, :dist => :t
    log(i1); log(i2); log(i3)

Notice that each consecutive interval is bigger than a previous one.


minimal_sample(array, opts = {}) → integer

Calculates a minimal number of samples necessary to obtain a defined confidence interval. The decision is based on a small sample provided in array.

Options may contain:

  • :dist and :conf - just like in conf_interval,
  • :abs - absolute precision, i.e., the real mean value must be at most that value from the center of the interval (with a probability given with :conf),
  • :rel - relative precision, i.e., the real mean value must be at most that many percents from the center of the interval (with a probability given with :conf); the default value is 10%.
process :main do
    arr = [ 1, 2, 3, 4 ]
    c1 = minimal_sample arr, :abs => 1.0
    c2 = minimal_sample arr, :rel => 0.2
    log(c1); log(c2)


sample_enough(array, opts = {}) → true or false

Returns true if the given array is big enough to provide a desired accuracy of its confidence interval as computed with conf_interval. Accepts the same arguments as minimal_sample.

process :main do
    arr = [ 1, 2, 3, 4 ]
    i = conf_interval arr
    se1 = sample_enough arr, :abs => 1.0
    se2 = sample_enough arr, :abs => 2.0
    log(se1); log(se2)

Grid’5000 builtins


g5k_reserve_nodes(specification) → reservation object

Makes a reservation on Grid’5000 platform according to specification provided.

process :main do
    r = g5k_reserve_nodes :site => 'nancy', :nodes => 1, :time => '00:10:00'
    log r

The available reservation parameters are:

  • :site - name of the site to reserve nodes from (required)
  • :nodes - number of nodes to reserve (default: 1)
  • :time - duration of the reservation (default: 1 hour)
  • :type - type of the reservation (:normal or :deploy, the former is default)

The reservations are cleaned up after experiment execution automatically.


g5k_get_avail(specification = nil) → reservation object

Tries to find an existing reservation on Grid’5000 platform. Without arguments, it will search throughout the platform to find an available job. The argument of type :site may be provided to narrow down the search domain.

process :main do
    r = g5k_get_avail :site => 'nancy'
    log r


g5k_sites(reservation) → list of site names

Returns an array of site names in Grid’5000.


g5k_nodes(reservation) → list of node objects

Returns an array of nodes associated with the reservation.

process :main do
    r = g5k_get_avail :site => 'nancy'
    nodes = g5k_nodes r
    log nodes


g5k_kadeploy(reservation, environment) → list of node objects

Deploys environment on nodes associated with reservation.

process :main do
    r = g5k_get_avail :site => 'nancy'
    nodes = g5k_kadeploy r, "wheezy-x64-nfs"
    log nodes


g5k_site(site) → node object

Returns a node object associated with Grid’5000 site.

process :main do
    frontend = g5k_site "nancy"
    r = execute_one frontend, "date"
    log r.stdout


This section a list of activities that are mostly useful to potential developers or people who seek more advanced logging primitives.


send(object, method, *arguments) → result

Calls an arbitrary method on a given object. Method arguments may be provided as well.

process :main do
    x = value "this string has 16 characters"
    y = send x, :gsub, "16", "29"
    log "y = #{y}"


code(*variables) → result

Executes a raw Ruby code during workflow execution.

process :example do |arg|
    code(arg) { |x| IO.write("/tmp/file", x.to_s) }

The workflow variables must be passed explicitely via arguments to code.