Bultin activities in XPFlow

There is a variety of built-in activities available in XPFlow’s standard library.

Basic builtins

import (or use)

import(name) → nil

Import processes and activities from a different file (or predefined library) to the scope of the current file.

import "common.rb"

process :main do
    process_in_common()
end

log (or puts)

log(*messages) → nil

Adds a message (or a list of messages) to the execution log.

process :example do |arg|
    log "Starting process..."
    puts "Passed argument: #{arg}"
end

sleep

sleep(duration) → nil

Suspend the process or the activity for duration seconds.

value (or id)

value(x) → x

Returns the given argument.

system

system(command) → string

Executes a command on the main machine (i.e., the machine that XPFlow runs on) and returns standard output.

process :main do
    ls = system "ls"
    log ls
end

range

range(begin, end, step) → array

Creates an array containing the sequence between begin to end (inclusive) incremented by step.

process :main do
    values = range(0, 10, 1)
    log values
end

assert

assert(condition) → nil

Does nothing if the condition is true and fails otherwise.

fail

fail(message) → nil

Fails execution of a workflow unconditionally.

var

var(name, type=:str, default = nil) → type

Gets a variable from arguments provided in command line arguments (using -V switch), from a YAML file (using -f switch) or from user input (otherwise). In the following example, we get the value of :name from command line:

process :main do
    log "Hello #{var(:name, :str, 'john doe')}"
end
$ xpflow helloname.rb -V name=Henri

There are different types available:

type name example
:str String "abc"
:int Integer 42
:float Float 3.141592
:bool Boolean true
:range Range 2..8

Data structures

This API is not stable and is likely to change in the near future.

data_vector

data_vector(array) → vector of data

Creates an object that stores an arbitrary values in an array. If called without arguments, it will be created empty, otherwise the vector will be initialized with contents of array.

process :main do
    d1 = data_vector()
    d2 = data_vector([ 1, 2, 3 ])
end

data_push

data_push(vector, object) → nil

Pushes object at the end of vector.

process :main do
    d1 = data_vector()
    data_push(d1, 42)
    data_push(d1, 77)
    log d1
end

simple_node

simple_node(“user@machine”) → node

Returns a node object used to access machine. See example of execute_one.

execute_one

execute_one(node, command) → result of command execution

result is an object which contains stdout, stderr, node and command fields.

Executes a command on a given node and returns its result. If execution of the command fails, a failure is thrown.

process :main do
    node = simple_node "user@machine"
    r = execute_one node, "hostname"
    log "Hello I am #{r.stdout}"
    log "and I executed #{r.command} command"
end

execute

execute(nodes, command) → results of command executions

Executes command on all nodes passed to the activity. The commands are executed sequentially on consecutive nodes and therefore the activity may not scale when the number of nodes is large. In that case one should use execute_many.

execute_many

execute_many(nodes, command) → results of command executions

Executes command on all nodes passed to the activity using a scalable and efficient method. The return value is fully compatible with the return value of execute.

file

file(node, path) → remote file object

Returns an object that points to a remote file at path on a given node. The result can be used as a source in some activities, like distribute or copy.

process :main do
    f = file(localhost, "/etc/resolv.conf")
    copy f, localhost, "/tmp/resolv.conf"
end

copy

copy(source, nodes, path) → nil

Copies a given source to all nodes at a given path. The files are copied in sequence - if you need more performance, use distribute which offers more scalable way to transfer files.

The source is either:

  • a glob pattern which refers to files on a node executing XPFlow,
  • a file object created with file activity,
  • a result of command execution obtained with execute or execute_one.
process :main do
    copy "/etc/hostname", localhost, "/tmp/hostname-1"
    copy file(localhost, "/etc/resolv.conf"), localhost, "/tmp/hostname-2"
    r = execute_one localhost, "hostname"
    copy r, localhost, "/tmp/hostname-3"
end

distribute

distribute(source, nodes, path) → nil

Copies a given source to all nodes at a given path using an efficient and scalable algorithm. It accepts the same parameters as copy.

Statistical Activities

conf_interval

conf_interval(array, opts = {}) → confidence interval

Calculates a confidence interval for the mean value of values in array.

Options may contain:

  • :dist - distribution of mean of values in array; can be either normal (:n, :normal) or t-Student (:t, :tstudent); by default it is normal,
  • :conf - confidence of the interval, that is, a probability that the real mean is inside the calculated interval (95% by default).

Beware, using :normal assumes that the distribution of average value is close to normal distribution. In practice, this means that the number of initial samples drawn must be at least 30. If you know that the distribution of a single measure is normal (or you have good reasons to believe so) then you should use approximation based on Student’s t-distribution. It is more correct to use it for small sample sizes in that case.

process :main do
    arr = [ 1, 2, 3, 4 ]
    i1 = conf_interval arr
    i2 = conf_interval arr, :conf => 0.9
    i3 = conf_interval arr, :conf => 0.9, :dist => :t
    log(i1); log(i2); log(i3)
end

Notice that each consecutive interval is bigger than a previous one.

minimal_sample

minimal_sample(array, opts = {}) → integer

Calculates a minimal number of samples necessary to obtain a defined confidence interval. The decision is based on a small sample provided in array.

Options may contain:

  • :dist and :conf - just like in conf_interval,
  • :abs - absolute precision, i.e., the real mean value must be at most that value from the center of the interval (with a probability given with :conf),
  • :rel - relative precision, i.e., the real mean value must be at most that many percents from the center of the interval (with a probability given with :conf); the default value is 10%.
process :main do
    arr = [ 1, 2, 3, 4 ]
    c1 = minimal_sample arr, :abs => 1.0
    c2 = minimal_sample arr, :rel => 0.2
    log(c1); log(c2)
end

sample_enough

sample_enough(array, opts = {}) → true or false

Returns true if the given array is big enough to provide a desired accuracy of its confidence interval as computed with conf_interval. Accepts the same arguments as minimal_sample.

process :main do
    arr = [ 1, 2, 3, 4 ]
    i = conf_interval arr
    se1 = sample_enough arr, :abs => 1.0
    se2 = sample_enough arr, :abs => 2.0
    log(i)
    log(se1); log(se2)
end

Grid’5000 builtins

g5k_reserve_nodes

g5k_reserve_nodes(specification) → reservation object

Makes a reservation on Grid’5000 platform according to specification provided.

process :main do
    r = g5k_reserve_nodes :site => 'nancy', :nodes => 1, :time => '00:10:00'
    log r
end

The available reservation parameters are:

  • :site - name of the site to reserve nodes from (required)
  • :nodes - number of nodes to reserve (default: 1)
  • :time - duration of the reservation (default: 1 hour)
  • :type - type of the reservation (:normal or :deploy, the former is default)

The reservations are cleaned up after experiment execution automatically.

g5k_get_avail

g5k_get_avail(specification = nil) → reservation object

Tries to find an existing reservation on Grid’5000 platform. Without arguments, it will search throughout the platform to find an available job. The argument of type :site may be provided to narrow down the search domain.

process :main do
    r = g5k_get_avail :site => 'nancy'
    log r
end

g5k_sites

g5k_sites(reservation) → list of site names

Returns an array of site names in Grid’5000.

g5k_nodes

g5k_nodes(reservation) → list of node objects

Returns an array of nodes associated with the reservation.

process :main do
    r = g5k_get_avail :site => 'nancy'
    nodes = g5k_nodes r
    log nodes
end

g5k_kadeploy

g5k_kadeploy(reservation, environment) → list of node objects

Deploys environment on nodes associated with reservation.

process :main do
    r = g5k_get_avail :site => 'nancy'
    nodes = g5k_kadeploy r, "wheezy-x64-nfs"
    log nodes
end

g5k_site

g5k_site(site) → node object

Returns a node object associated with Grid’5000 site.

process :main do
    frontend = g5k_site "nancy"
    r = execute_one frontend, "date"
    log r.stdout
end

Advanced

This section a list of activities that are mostly useful to potential developers or people who seek more advanced logging primitives.

send

send(object, method, *arguments) → result

Calls an arbitrary method on a given object. Method arguments may be provided as well.

process :main do
    x = value "this string has 16 characters"
    y = send x, :gsub, "16", "29"
    log "y = #{y}"
end

code

code(*variables) → result

Executes a raw Ruby code during workflow execution.

process :example do |arg|
    code(arg) { |x| IO.write("/tmp/file", x.to_s) }
end

The workflow variables must be passed explicitely via arguments to code.