Administration Guide

Installation

Before installing, note that Red Sqirl relies on an online repository. When the repository is reachable, modules can be managed directly from the Red Sqirl application. When it is not, only the administrator can install new modules, and must do so manually. The installation instructions are on the Install Red Sqirl page.

Red Sqirl Folder Content

This section lists the folders and their usage.

  • apache-tomcat-7.0.42: The Tomcat installation. Refer to the Tomcat documentation to change anything there. For troubleshooting, note that the current log file is logs/catalina.out.
  • conf: Shared configuration and related material, used by every user.
    • licenseKey.properties: a file containing the license keys related to the installation (one key per package and one for Red Sqirl itself)
    • redsqirl_sys_lang.properties: a description of every Red Sqirl setting.
    • redsqirl_sys.properties: Settings shared by every user. Some of them can be overridden by a user in their redsqirl_user.properties (see the user folder). This file is readable by everybody, and administrators can change it through the Red Sqirl web application.
    • logo.txt: The Red Sqirl logo in ASCII characters.
  • users/user folder: The user folder contains many auto-generated files; modifying them changes Red Sqirl's behaviour.
    • redsqirl-workflow.log: The log file, containing everything logged during the user session. This log file is key when troubleshooting.
    • redsqirl_user.properties: User properties, which can only be read and written by the user.
    • output_colours.properties: This file specifies the colours of the workflow arrows, as CSS colour names or hexadecimal values.
    • packages: User-specific packages, i.e. the packages installed by the user.
    • superactions: All the user's superactions, classified per model. When a model is changed, the _BACKUP directory of that model is updated with a zip file containing the last state of the previous version.
    • functions files: Files such as functionsOozie.xml and functionsPig.xml contain the entire list of functions available inside Red Sqirl. The XML is divided into menus. Do not create new menus, but you can create or delete functions, for example:
      <function>
        <name>AVG()</name>
        <input>NUMBER</input>
        <return>FLOAT</return>
        <help>@function:AVG( ELEMENT )@short:Use the AVG function to compute the average of a set of numeric values in a single-column bag@param: ELEMENT item to average@description:Computes the average of the numeric values in a single-column bag. AVG requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums@example: AVG(A.id) returns the average value of A.id</help>
      </function>
    • hidden files: The hidden files (starting with .) should not be edited under any circumstances. If deleted, they will be regenerated. Most of them are used as a cache to make signing in to Red Sqirl faster.
    • tmp: Used while a Red Sqirl user session is open.
    • jobs: Used as a temporary folder when a job runs; it should be empty most of the time.
    • lucene: Folder that powers the Red Sqirl help search engine. Do not edit.
  • tutorialdata: Small files used in the tutorials of the Red Sqirl documentation.
  • scripts: Script templates. Red Sqirl users can access these templates through script nodes.
  • superactions: Superactions available to every Red Sqirl user.
  • packages: Lists of all the packages and the files each one contains, used to uninstall packages cleanly and to check for conflicts.
  • war: The Red Sqirl war file (compiled code).
  • lib: The Red Sqirl jar files and dependencies (compiled code). The packages folder inside contains package jars.
  • lucene: Folder that powers the Red Sqirl help search engine. Do not edit.
  • usageRecordLog: Records high-level user interactions with the tool so that trends can be analysed.
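Because a user's redsqirl_user.properties can override some entries of redsqirl_sys.properties, it is sometimes useful to see which value actually applies to a user. The shell sketch below is not part of Red Sqirl: it assumes plain key=value lines (no continuation lines or escapes) and that any key present in the user file wins, whereas Red Sqirl only lets users override some settings. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: resolve a setting, user file first, then the shared file.
# Assumes plain key=value lines; lines starting with "#" or "!" are comments.

# Print the last value of KEY in FILE, trimmed; return 1 if the key is absent.
prop_value() {
  [ -r "$1" ] || return 1
  awk -v k="$2" '
    /^[ \t]*[#!]/ { next }
    {
      i = index($0, "=")
      if (i == 0) next
      name = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { found = 1; last = val }
    }
    END { if (found) print last; else exit 1 }
  ' "$1"
}

# Print the effective value of KEY: user properties first, then system properties.
effective_prop() {
  local key="$1" user_file="$2" sys_file="$3"
  prop_value "$user_file" "$key" || prop_value "$sys_file" "$key"
}
```

For example, effective_prop core.security.enable "$REDSQIRL_HOME/users/etienne/redsqirl_user.properties" "$REDSQIRL_HOME/conf/redsqirl_sys.properties", where REDSQIRL_HOME is a placeholder for the Red Sqirl folder described above.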

A functions file copied from a user folder to the shared conf folder is used by every user who does not have that functions file in their own user folder. This can be used to share UDFs or other customizations.
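The lookup order described above can be checked from the command line. In this sketch the user-folder copy is used when it exists, otherwise the shared copy in conf; the directory arguments are placeholders for your installation, and the optional well-formedness check relies on xmllint from libxml2, which may not be installed. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around the functions files (functionsPig.xml, functionsOozie.xml, ...).

# Print the functions file that applies: user copy first, then the shared conf copy.
resolve_functions_file() {
  local name="$1" user_dir="$2" conf_dir="$3"
  if [ -f "$user_dir/$name" ]; then
    printf '%s\n' "$user_dir/$name"
  elif [ -f "$conf_dir/$name" ]; then
    printf '%s\n' "$conf_dir/$name"
  else
    echo "no $name in $user_dir or $conf_dir" >&2
    return 1
  fi
}

# Check that an edited functions file is still well-formed XML.
check_functions_file() {
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint is not installed" >&2
    return 2
  fi
  xmllint --noout "$1"
}
```

For example, resolve_functions_file functionsPig.xml "$REDSQIRL_HOME/users/etienne" "$REDSQIRL_HOME/conf" prints the file that user etienne gets, and check_functions_file can then be run on that path after editing it.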

Start and Shutdown

Scripts to start and shut down Red Sqirl are in the bin directory. Do not use the Tomcat start and shutdown scripts, as they do not handle shutdown properly. If you have trouble restarting Red Sqirl, check the Red Sqirl processes, kill them all and start again. This can be done from the command line:

$ ps aux | grep '[r]edsqirl'
$ kill -9 <PID>
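kill -9 gives the processes no chance to shut down cleanly. A gentler sketch sends SIGTERM first and only sends SIGKILL to processes still alive after a timeout. It assumes the Red Sqirl processes are Java processes with "redsqirl" in their command line (the default pgrep -f pattern java.*redsqirl is an assumption, so review the list it prints before stopping anything) and that pgrep from procps is installed; the helper names and the 10 second default timeout are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helpers: stop Red Sqirl processes gently, SIGKILL only as a last resort.

# List PIDs whose full command line matches PATTERN (the default is an assumption).
find_redsqirl_pids() {
  local pattern="${1:-java.*redsqirl}" pid
  for pid in $(pgrep -f -- "$pattern"); do
    # Never report the shell running this helper.
    [ "$pid" = "$$" ] || echo "$pid"
  done
}

# Send SIGTERM to every PID, wait up to STOP_TIMEOUT seconds per PID,
# then send SIGKILL to any process that is still running.
stop_pids() {
  local timeout="${STOP_TIMEOUT:-10}" pid waited
  for pid in "$@"; do
    kill -TERM "$pid" 2>/dev/null || echo "cannot signal PID $pid" >&2
  done
  for pid in "$@"; do
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
      sleep 1
      waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
    fi
  done
}
```

Typical use: pids=$(find_redsqirl_pids), check the list with echo "$pids", then run stop_pids $pids (unquoted on purpose, so that each PID becomes one argument) and start Red Sqirl again with the script in the bin directory.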

User and Security

In Red Sqirl, security is managed at the OS level: the application only allows a user to do what that user could do on the command line. To enforce this, Red Sqirl uses SSH and RMI. A process is created on localhost for every user through SSH, and the web application sends RMI requests to that user-specific process. The PID of the process is stored temporarily in the user's Red Sqirl folder, and the process is killed when the user signs out. A user cannot have more than one such process running at the same time, so the same login cannot be used to sign in twice simultaneously.

Since every action runs as the OS user, this design makes integration with security systems such as Kerberos, ACLs or Apache Ranger straightforward: their rules are inherited by Red Sqirl.
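If a user cannot sign in because a previous session still seems to be open, you can check whether the PID recorded in that user's Red Sqirl folder still belongs to a running process. This guide does not name the PID file, so the path given to the sketch below is a placeholder, and the function name is hypothetical. kill -0 only succeeds if the process exists and you are allowed to signal it, so run the check as that user or as root.

```shell
#!/bin/sh
# Hypothetical helper: is the PID stored in a pid file still a running process?
# Returns 0 if alive, 1 if the process is gone, 2 if the file is unreadable or invalid.
pid_file_alive() {
  local pid_file="$1" pid
  if [ ! -r "$pid_file" ]; then
    echo "cannot read $pid_file" >&2
    return 2
  fi
  pid="$(tr -d '[:space:]' < "$pid_file")"
  case "$pid" in
    ''|*[!0-9]*)
      echo "no numeric PID in $pid_file" >&2
      return 2 ;;
  esac
  if kill -0 "$pid" 2>/dev/null; then
    # Show owner and command line so you can confirm it is a Red Sqirl process.
    ps -o user=,pid=,args= -p "$pid" 2>/dev/null || true
    return 0
  fi
  return 1
}
```

When the process is alive, its owner and command line are printed; a PID left behind by a crashed session is reported as gone with status 1.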

Job Management

Jobs executed from Red Sqirl are managed exclusively by Apache Oozie. Jobs can be suspended, killed and resumed through Oozie, and Red Sqirl also provides these features in its UI. Red Sqirl lets you change the execution queues for every user: the master processes (Oozie map-only jobs) run in the launch queue, while the action queue runs the actual data processing jobs. Red Sqirl Oozie jobs are stored in the /user/$user/.redsqirl/jobs folder on HDFS.
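The same operations can be run with the Oozie command-line client. The sketch below wraps oozie job with its -info, -suspend, -resume and -kill options and lists the Red Sqirl job folder on HDFS. It assumes the oozie and hdfs clients are installed on the server and that OOZIE_URL holds your Oozie server URL; the helper names are hypothetical, and job IDs come from the Red Sqirl UI or from oozie jobs.

```shell
#!/bin/sh
# Hypothetical helpers: manage Red Sqirl jobs with the Oozie CLI instead of the UI.
# Assumes OOZIE_URL is set, e.g. export OOZIE_URL=http://oozie-host:11000/oozie
# (oozie-host is a placeholder; 11000 is Oozie's default HTTP port).

# Run one job-level action (info, suspend, resume or kill) on an Oozie job id.
oozie_job_action() {
  local action="$1" job_id="$2"
  case "$action" in
    info|suspend|resume|kill) ;;
    *) echo "unsupported action '$action' (use info, suspend, resume or kill)" >&2
       return 2 ;;
  esac
  if [ -z "$job_id" ]; then
    echo "missing Oozie job id" >&2
    return 2
  fi
  if [ -z "${OOZIE_URL:-}" ]; then
    echo "OOZIE_URL is not set" >&2
    return 2
  fi
  oozie job -oozie "$OOZIE_URL" "-$action" "$job_id"
}

# List the Red Sqirl job folder of a user on HDFS (default: current user).
list_redsqirl_jobs_dir() {
  local user="${1:-$(id -un)}"
  hdfs dfs -ls "/user/$user/.redsqirl/jobs"
}
```

For example, oozie_job_action suspend <job-id> and later oozie_job_action resume <job-id>. Queue selection stays in the Red Sqirl settings and is not covered by this sketch.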

Package Management

If the Red Sqirl application can reach the repository, packages can be installed through the interface. In the administration view, go to the Package page. In the user view, you can install and uninstall plugins through Project>Manage Packages or Project>Sub-Workflow>Manage Models.

If the Red Sqirl application cannot reach the internet, installing a package takes several steps:

  • Log in on the website.
  • Go to Search, choose your package and install it.
  • Download the source and the license file.
  • Go to the administration view.
  • Licence tab: upload the license file.
  • Package view: upload the package source file.
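To decide which of the two procedures applies, you can test from the Red Sqirl server whether the repository answers at all. This sketch uses curl; the repository URL is not given in this guide, so the address passed to it is yours to fill in, and behind a proxy curl will also need the usual https_proxy variable. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: can this server reach the package repository?
# The URL argument is a placeholder; this guide does not give the repository address.
repo_reachable() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "usage: repo_reachable <url>" >&2
    return 2
  fi
  # -s silent, -S still show errors, -f fail on HTTP errors, body discarded.
  if curl -sSf --max-time 10 -o /dev/null "$url"; then
    echo "repository reachable: install packages from the interface"
  else
    echo "repository not reachable: use the offline procedure" >&2
    return 1
  fi
}
```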

Troubleshooting

The most frequent problems encountered when administering Red Sqirl are listed below.
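Most of these problems are diagnosed from two logs: Tomcat's logs/catalina.out and the redsqirl-workflow.log in the affected user's folder. A small sketch to show the latest error lines from both follows; the function names and the arguments (the Tomcat folder and the user folder) are placeholders for your installation, and matching ERROR or Exception is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helpers: print the last error-looking lines of the Red Sqirl logs.
# Assumption: interesting lines contain "ERROR" or "Exception".

# Print up to N (default 20) matching lines of FILE, prefixed by their line number.
recent_errors() {
  local file="$1" n="${2:-20}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  grep -n -E 'ERROR|Exception' "$file" | tail -n "$n"
}

# Show recent errors from catalina.out and from a user's redsqirl-workflow.log.
show_redsqirl_errors() {
  local tomcat_dir="$1" user_dir="$2" n="${3:-20}"
  echo "== $tomcat_dir/logs/catalina.out"
  recent_errors "$tomcat_dir/logs/catalina.out" "$n"
  echo "== $user_dir/redsqirl-workflow.log"
  recent_errors "$user_dir/redsqirl-workflow.log" "$n"
}
```

For example, show_redsqirl_errors "$REDSQIRL_HOME/apache-tomcat-7.0.42" "$REDSQIRL_HOME/users/etienne" 30, where REDSQIRL_HOME stands for the Red Sqirl folder.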

Sign in error

Red Sqirl has two types of sign-in:

  • Sign in to the cloud for downloading apps.
  • Sign in to Red Sqirl as a user.

Problem signing in to the cloud

If you cannot sign in to the cloud from Red Sqirl:

  1. On the Red Sqirl website, click on Sign In and enter your user name and password.
    • If that works, the problem lies with the Red Sqirl application: make sure you enter your email address and not your user name.
    • Check the logs.
  2. If it does not work, use the forgotten password link.
  3. Check your email, including your spam folder.
  4. Click on the link, then on the "click here" button.
  5. Go back to your mailbox and copy your new password.
  6. Go back to the Red Sqirl website and sign in with your new password.

Problem signing in to Red Sqirl

Red Sqirl uses SSH for signing in.

  1. Open a terminal and connect to the Red Sqirl server with ssh.
  2. Run the following command. If it does not work, Red Sqirl will not be able to sign in any user:
    ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no <myuser>@localhost
  3. Check that your environment variables and system are sane. At least Java 1.7 must be available on your PATH.
    [etienne@cdnode1 projects]$ java -version
    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
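The version check of step 3 can be scripted, which helps when several servers must be verified. The sketch below parses the version printed by java -version, handling both the old 1.x scheme (1.7.0_67 is Java 7) and the newer one (11.0.2 is Java 11), and fails below a minimum major version, 7 by default. The function names are hypothetical. Run it from a session opened with the ssh command of step 2, so that it sees the environment of a password SSH login.

```shell
#!/bin/sh
# Hypothetical helpers: check that the java on the PATH is recent enough.

# Convert a Java version string to its major number: 1.7.0_67 -> 7, 11.0.2 -> 11.
java_major_version() {
  local v="$1" major
  major="${v%%.*}"
  if [ "$major" = "1" ]; then
    # Old scheme: 1.<major>.<minor>_<update>
    v="${v#1.}"
    major="${v%%.*}"
  fi
  # Drop suffixes such as "-ea".
  major="${major%%[!0-9]*}"
  printf '%s\n' "$major"
}

# Fail unless java is on the PATH with at least major version MIN (default 7).
check_java() {
  local min="${1:-7}" raw major
  # java -version writes to stderr, e.g.: java version "1.7.0_67"
  raw="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
  if [ -z "$raw" ]; then
    echo "java not found on PATH ($PATH)" >&2
    return 1
  fi
  major="$(java_major_version "$raw")"
  if [ -z "$major" ] || [ "$major" -lt "$min" ]; then
    echo "java $raw is older than the required major version $min" >&2
    return 1
  fi
  echo "java $raw OK"
}
```

On the server shown above, check_java prints java 1.7.0_67 OK.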

Kerberos

Kerberos settings for Red Sqirl can be tricky, and troubleshooting them even trickier. This section covers the keytab file, user permissions and Hadoop user matching. The redsqirl-workflow.log file is critical for troubleshooting, as it can give you a hint of what is going on.

  1. Check that it works at the OS level, with steps similar to the following in your environment. The keytab has to be passwordless, and the principal should always be user/hostname@REALM. Below is what you should see: without a ticket, listing HDFS fails; after kinit with the keytab, it succeeds.
    [etienne@cdnode1 ~]$ kdestroy
    [etienne@cdnode1 ~]$ hadoop fs -ls /
    .
    .
    .
    Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
        at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:731)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
        ... 32 more
    Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
    [etienne@cdnode1 ~]$ kinit -k -t /home/etienne/etienne.keytab etienne/cdnode1.local.net@IDIRO.LOCAL
    [etienne@cdnode1 ~]$ hadoop fs -ls /
    Found 2 items
    drwxrwxrwt   - hdfs supergroup          0 2017-06-01 14:10 /tmp
    drwxr-xr-x   - hdfs supergroup          0 2017-06-01 13:58 /user
  2. Check the permissions: the keytab has to be readable by the user, and ideally only by that user. Here the principal is for the etienne user, so the file should be owned by and accessible to etienne.
    [etienne@cdnode1 ~]$ ls -l /home/etienne/etienne.keytab 
    -rw------- 1 etienne root 466 May 25 11:20 /home/etienne/etienne.keytab
  3. Check your Red Sqirl settings. The settings used in this example are shown below.
    [etienne@cdnode1 ~]$ #Reminder of the principal and keytab path in this example
    [etienne@cdnode1 ~]$ kinit -k -t /home/etienne/etienne.keytab etienne/cdnode1.local.net@IDIRO.LOCAL
    
    [etienne@cdnode1 ~]$ #Content of our property file (also accessible through the Settings in the Red Sqirl web application)
    [etienne@cdnode1 projects]$ cat redsqirl-hadoop-2.6.0-hive-1.1.0--1.4.1/conf/redsqirl_sys.properties | grep security
    core.security.realm=IDIRO.LOCAL
    core.security.enable=TRUE
    core.security.hadoop_conf=/etc/hadoop/conf
    core.security.hostname=cdnode1.local.net
    core.security.user_keytab_template=/home/_USER/_USER.keytab
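The three checks above can be scripted. The sketch below reads core.security.realm, core.security.hostname and core.security.user_keytab_template from redsqirl_sys.properties, replaces _USER with the user name as in the template above, checks that the keytab is readable and warns if it is not private to the user, then runs kinit with the user/hostname@REALM principal and lists HDFS. It assumes plain key=value properties, GNU stat (Linux), that it runs as the user being checked, and that the principal is built from these settings, which the example suggests but this guide does not state. It does not repeat the kdestroy part of step 1, and all function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: repeat the Kerberos checks above from the Red Sqirl settings.
# Assumes plain key=value properties, GNU stat, and that it runs as the checked user.

# Print the last value of KEY in FILE, trimmed; return 1 if the key is absent.
prop_value() {
  [ -r "$1" ] || return 1
  awk -v k="$2" '
    /^[ \t]*[#!]/ { next }
    {
      i = index($0, "=")
      if (i == 0) next
      name = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { found = 1; last = val }
    }
    END { if (found) print last; else exit 1 }
  ' "$1"
}

# Expand the keytab template: every _USER becomes the user name.
keytab_path() {
  printf '%s\n' "$1" | sed "s|_USER|$2|g"
}

# Build the principal in the user/hostname@REALM form.
principal_for() {
  printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Succeed if FILE is owned by USER and grants nothing to group or others.
keytab_private() {
  local file="$1" user="$2" owner mode
  owner="$(stat -c '%U' "$file" 2>/dev/null)" || return 1
  mode="$(stat -c '%a' "$file" 2>/dev/null)" || return 1
  [ "$owner" = "$user" ] || return 1
  case "$mode" in
    *00) return 0 ;;
    *) return 1 ;;
  esac
}

check_kerberos() {
  local conf="$1" user realm host template keytab principal
  user="$(id -un)"
  realm="$(prop_value "$conf" core.security.realm)" || { echo "core.security.realm missing" >&2; return 1; }
  host="$(prop_value "$conf" core.security.hostname)" || { echo "core.security.hostname missing" >&2; return 1; }
  template="$(prop_value "$conf" core.security.user_keytab_template)" || { echo "keytab template missing" >&2; return 1; }
  keytab="$(keytab_path "$template" "$user")"
  principal="$(principal_for "$user" "$host" "$realm")"
  echo "keytab: $keytab  principal: $principal"
  if [ ! -r "$keytab" ]; then
    echo "$keytab is not readable by $user" >&2
    return 1
  fi
  keytab_private "$keytab" "$user" || echo "warning: $keytab should be owned by and readable only by $user" >&2
  # Same commands as step 1: get a ticket from the keytab, then list HDFS.
  kinit -k -t "$keytab" "$principal" && hadoop fs -ls /
}
```

Run as etienne with check_kerberos "$REDSQIRL_HOME/conf/redsqirl_sys.properties" (REDSQIRL_HOME being a placeholder for the Red Sqirl folder), the settings above give the keytab /home/etienne/etienne.keytab and the principal etienne/cdnode1.local.net@IDIRO.LOCAL, the same values as in step 3.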