> Schema  > Products  > SRTransport  > Documentation  > Using DataScript

 
  Using DataScript

 

DataScript is used to manipulate data in the expressions part of a transport and in pre and post processing. (See Transforming Data). It is not intended to be a complete programming language. For quick reference see DataScript Reference

Value Expressions

In the transport window the Source and Target Columns actually represent an object called a tuple. A tuple is like a dictionary or a hash that holds key, value pairs. Each key (variable name) is associated with a variable in the tuple. Expressions often operate on the variables in the source tuple and assign the results to variables in the destination tuple. A 'Value Expression' field is associated with the 'Target Column' field in the same row. The simplest and most common operation is a direct assignment from source to destination. Put the name of the source variable in the expressions row corresponding to the destination variable. The values assigned to the Destination tuple are ultimately sent to the database or output file.

In the expressions column you assign values to the destination variables. They can be values from a source column with or without modifications or they can be derived in some other way.

Source Columns

name

monthlySal

Value Expression

Target Columns

name

Name

monthlySal * 12

yearlySal

"Schema Research Corp."

employer

The most important functions are available by selecting 'Functions' from the Transport menu. To use one, select a target column name and select the desired function from the menu. The function will appear in the 'Value Expression' for the selected column. Fill in the required values in the parenthesis. (See the function explanations under 'Pre and Post Processing' for more information on what parameters and return types functions take.)

Pre and Post Processing

In the center of the main 'Transport' window is a button marked with an ellipsis. If you click on it the Transport Customization window will be displayed. This window has a pop-up menu that contains a number of places to change the way SRTransport handles the rows as they move through from source to destination. If you are unfamiliar with the sequence of events read SRTransport Processing Architecture. What concerns us here are the four pre and post processing areas.

In pre and post processing, the language is DataScript. You can set a variable to a constant value, to some other variable value or to the result of a function call that returns a value. (Procedures that do not return a value are used with the keyword call).

You can try these examples in the Interactive DataScript window ( select 'Interactive DataScript' from the Transport menu).

Working With Variables

There are 4 types of variables that are important for processing - integer, double, string and text. There are several other types that SRTransport uses that are not needed for processing but you might see mentioned. They include 'tuple', and 'list'. As explained above a tuple is basically a dictionary, and a 'list' is basically an array.

All variables are global and affect SRTransport's processing space.
All variables must be 'declared' and should be 'removed' when no longer needed.
Variables that have been 'declared' can be used in the 'Value Expression' column of the transport.
The fields of the source tuple are available as constants. They should not be 'declared', 'removed' or changed. If you need to manipulate them before using them in the Expressions, assign them to an intermediate variable and change that variable.
The '#' sign precedes comments.

The use of declare, set, print and remove when working with variables.
The output of print is directed to the Log output in the Transporter Room or to the console in the Log window.

   declare i integer 
   set i = 100
   print i
   remove i

Variables may be initialized with a constant when they are declared.
More than one variable may be declared on the same line.
Multiple variables may be removed in one statement.

   declare j integer = 100, d double
   set d = 99.99 
   print = j + d
   remove j, d

If you need a variable for processing rows, declare it in Run preprocessing and remove it in Run postprocessing.

Manipulating Strings

Strings, like all other variables must be declared. You can assign a constant value to a string in the declaration, or use set. Constant strings must appear in quotes.They may be single or double quoted. If a string needs to contain one use the other to quote it with.

   declare aString string = "some text"
   print aString
   set aString = 'some "other" text'
   declare anotherString string
   set anotherString = aString

If you must have both kinds of quotes in a string, double quote the string and escape double quotes with another double quote.

   set aString = """A very 'confused' quote"", he said."

Strings can be concatenated using the '+' operator.

   set aString = aString + " with yet more text added"
   print aString

There are a number of utility functions for working with strings.

To remove white space (spaces, tabs, CR's) from either or both ends of a string use trim(string) (both sides), ltrim(string) (left side) or rtrim(string) (right side).

   set aString = "   empty spaces on either end     "
   set aString = trim(aString)
        or
   set aString = ltrim(aString)
        or
   set aString = rtrim(aString)

To remove characters from a string use strip(string, 'characters').

   set aString = '"text that includes quotes and $dollar signs"'
   set aString = strip(aString, '"$')

You can set strings to all upper or lower case with upper(string) and lower(string).

   set aString = 'lowercase text'
   set aString = upper(aString)
   set aString = lower(aString)

You can find out how long a string is with length(string).
You can find a substring within a string with position(substring, string) (returns 0 if not found).
You can select a substring from a string with substr(string, position, length).

   set aString = "This long string contains a 'short string'"
   declare shortString string = "short string"
   set i = position(shortString, aString)
   declare anotherString string = substr(aString, i, length(shortString))
   print anotherString

Convert an expression to a string with convert(type, expression).

   set aString = convert(string, i*12)

Substitutions can be made with three different substitute functions.

substitute1(key-string, substitute-column, key-column, table)
Look for a given key-string in the key-column of the table. If the key-string is found, return the associated value found in the substitute-column of the same table. If the key-string is not found in the key-column of the table, then the empty string is returned.

substitute2 (key-string, default-string {,value, substitute}+)
Look for a given key-string in the various values. If the key-string matches one of the values, then return the corresponding substitute value. If the key-string does not match any of the values, return the default-string.
Example: substitute2 (aString, "Unknown Color", "R", "Red", "G", "Green", "B", "Blue")

substitute3 (key-string, filename)
Look for the key-string in the file and returns its substitute. If the key-string is not found, return the empty string. The file is formatted with a pipe (|) separating the key-value pairs, each pair on its own line.
Example: given a file "color_map" with this data,
    R|Red
    G|Green
    B|Blue
You would write substitute3 (aString, "color_map" )

That's it for strings - Don't forget to clean up.

   remove aString, shortString, anotherString, i, 

Working with the 'text' type

'text' differs from 'string' in that it can contain binary data. It can also contain ascii characters (and therefore be printable). 'text' can also be longer( megabytes - gigabytes ) whereas a 'string' is limited to 1000 characters. Some functions return 'text' types. Blobs are 'text'. To assign data to a 'text' variable use the print to keywords.

   declare someText text
   print to someText "some very interesting text"
   print someText
   remove someText

Working With Files

During processing you can read from and write to files.
To output a 'text' type to a file use redirect [text] to [filename] with print to [text].
To read the contents of a file into a 'text' use filecontents(path). One example of this is removing blobs from a database and writing their contents to the file system. Then store an ID and the file name in the database.

   declare F text
   redirect F to file "/tmp/photo.data"
   print to F photo
   reset F
   remove F

Going in the opposite direction - Suppose you had a file, 'paths.txt', containing the file path names for blobs you wanted to insert into a table. Use 'paths.txt' as a source and configure one field called 'path' with a type 'string'.

In 'Run preprocessing' declare a text variable -

   declare bigdata text

In 'Row preprocessing' set bigdata's value to the contents of the file in the variable 'path'-

   set bigdata = filecontents(path)

then in 'Value Expression' you can use 'bigdata' as the value of the destination blob.

Important Functions

execsql(sqlString) is a powerful tool in pre- and post- processing.
It allows you to send arbitrary sql at any time during processing.
Warning - execsql() sends sql during "trial" runs of transports!!
Suppose you have a source, sTable, with these columns fname, lname, etc. but the customer_id is in another table, aTable, with columns name, customer_id etc. You want to combine the two in the destination table.

In 'Run preprocessing' declare some variables -

   declare sql string, c_id integer

If in 'Row preprocessing' one wrote something like this -
(We want our sql to look something like this, 'SELECT customer_id FROM aTable WHERE name = "Smith, Joe"')

  (one line)
   set sql = 'SELECT customer_id FROM aTable
              WHERE name = "' + lname + ', ' + fname + '"'
   set c_id = execsql(sql)

You can now use c_id in the Value Expressions column

In 'Run postprocessing' clean up.

   remove sql, c_id

sys(cmdString) (not available on "Windows") The command is executed (by /bin/sh) and the stdout is returned as a 'text' type (can be converted to a string). This allows you to use many unix functions for processing source data. You can execute shell scripts.

For example. Suppose you have a shell script called /usr/home/myscript.sh that takes a string, myString, as an argument and returns some output that you want to assign to a string variable.

   declare result string, cmd string
   set cmd = "sh /usr/home/myscript " + myString
   set result = convert( string, sys(cmd) )

Other Functions

isnull(a,b) If a is null, substitute b.
isnotnull(a,b) If a is not null, substitute b.
hex(string) Convert the hex codes to a string or binary value. The result of function hex() is TText.

Control Statements

DataScript has a number of control structures that are similar to C.

if  -   else if  -   else
while - break - continue

Notice that parenthesis are not required around the condition, that brackets are only required when there is more than one statement as part of the block and that you use '=' for equality (not '==' as you would in C, C++ or Java).

Break causes an exit from the while loop. Continue causes the next iteration of the loop to begin, bypassing any subsequent statements in the loop.

   while [some condition]
   {
      if 1 < 2
         print "one is less than two"
      else if 1 = 2
      {
         print "there is something wrong with this math"
         continue # goes to the top of the loop
      }
      else   # 1 > 2
      {
         print "No way!"
         break      # exits the loop
      }
      print "end of loop" # prints only when 1 < 2
   }

DataScript does not have a for loop but you can use a while loop to accomplish the same thing.

   declare j integer = 100
   declare i integer = 0
   while i < j {
      // do stuff
      set i = i+1
   }
   remove i, j

Writing Your Own Functions

Function declarations use the keyword 'procedure'. To return a value, use the return statement. If you don't explicitly return a value, then NULL is returned implicitly.

Here is a trivial example that shows the basic syntax.

   procedure MyAdder ( a integer, b integer )
   {
      declare answer integer
      set answer = a + b
      return answer
   }

   declare i integer
   set i = MyAdder( 1, 2 )
   print i

See Also

SQL in DataScript