Run shell script from local directory instead of HDFS via Oozie - hadoop

I want to run a shell script from a local path (edge node) instead of an HDFS directory via Oozie. My local shell script contains ssh steps which I can't run from an HDFS directory.
XYZ is the user ID and xxxx is the server (edge node). I used the action below in the workflow but it is not working. Please help.
<action name="abc">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>xyz@xxxx</host>
<command>/local path/create_success_file.sh</command>
<capture-output/>
</ssh>
<ok to="success-mail"/>
<error to="fail-Email"/>
</action>

Related

How to get oozie jobId in oozie workflow?

I have an Oozie workflow that invokes a shell file, and the shell file in turn invokes the driver class of a MapReduce job. Now I want to map my Oozie job ID to the MapReduce job ID for later processing. Is there any way to get the Oozie job ID in the workflow file so that I can pass it as an argument to my driver class for the mapping?
Following is my sample workflow.xml file
<workflow-app xmlns="uri:oozie:workflow:0.4" name="test">
<start to="start-test" />
<action name='start-test'>
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${jobScript}</exec>
<argument>${fileLocation}</argument>
<argument>${nameNode}</argument>
<argument>${jobId}</argument> <!-- this is how I wanted to pass the Oozie job ID -->
<file>${jobScriptWithPath}#${jobScript}</file>
</shell>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>test job failed
failed:[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
Following is my shell script:
hadoop jar testProject.jar testProject.MrDriver $1 $2 $3
Try to use ${wf:id()}:
String wf:id()
It returns the workflow job ID for the current workflow job.
More info in the Oozie Workflow Functional Specification (EL functions).
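On the shell side nothing special is needed: assuming the third <argument> above is switched to ${wf:id()}, the ID simply arrives as a positional parameter. A hedged sketch of the script from the question:
#!/bin/bash
# $1 = fileLocation, $2 = nameNode, $3 = Oozie workflow ID (if the workflow passes ${wf:id()})
hadoop jar testProject.jar testProject.MrDriver "$1" "$2" "$3"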
Oozie drops an XML file in the CWD of the YARN container running the shell (the "launcher" container), and also sets an environment variable pointing to that XML (I cannot remember its name, though).
That XML contains a lot of information, such as the name of the workflow, the name of the action, the ID of both, the run attempt number, etc.
So you can grep/sed that information back out in the shell script itself.
Of course, passing the ID explicitly (as suggested by Alexei) would be cleaner, but sometimes "clean" is not the best way, especially if you are concerned about whether it's the first run or not...
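As a rough, hypothetical sketch (the OOZIE_ACTION_CONF_XML variable, the action.xml fallback, and the oozie.job.id property are assumptions based on the Oozie shell-action docs, not something stated in this answer), the workflow ID could be fished out like this:
#!/bin/bash
# Locate the action configuration XML that the launcher drops next to the script
# (assumed: the env var from the shell-action docs, falling back to ./action.xml).
CONF_XML="${OOZIE_ACTION_CONF_XML:-./action.xml}"
# Pull the workflow ID out of the <property> entry (assumed name: oozie.job.id).
WF_ID=$(grep -A1 'oozie.job.id' "$CONF_XML" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' | head -1)
echo "Running inside Oozie workflow: $WF_ID"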

How to get oozie workflow duration at the end

Is there any way to email the duration of the workflow with the completion email? Is there such a variable that I can use?
I don't think such a variable is available, but if needed you can do it with shell actions. At the start of your workflow, execute a shell script that records the start time and save it in a variable. Just before your email action at the end of the workflow, have another shell script calculate the current time minus the start time and use that in your email. But this makes your workflow dirty.
This is a remarkable shortcoming of Oozie. Each of our workflows starts with a shell action that calls a simple bash script to get a timestamp.
<action name="start-time">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>utc-time.sh</exec>
<file>../common/utc-time.sh#utc-time.sh</file>
<capture-output/>
</shell>
<ok to="the-first-actual-action"/>
<error to="fail"/>
</action>
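The script itself only needs to print a key=value pair so that <capture-output/> can pick it up; a minimal sketch of what utc-time.sh might look like (the exact output format is an assumption):
#!/bin/bash
# Emit the current UTC time as key=value so <capture-output/> exposes it
# as wf:actionData('start-time')['time'] in later actions.
echo "time=$(date -u +%Y-%m-%dT%H:%M:%SZ)"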
The captured time is then usable via EL in the email we send on completion or error, like so:
<action name="email">
<email xmlns="uri:oozie:email-action:0.1">
<to>${emailsToAlert}</to>
<subject>COMPLETED: ${wf:name()}</subject>
<body>
Workflow ID: ${wf:id()}
Workflow Name: ${wf:name()}
Workflow app path: ${wf:appPath()}
Start Time: ${wf:actionData('start-time')['time']}
End Time: ${timestamp()}
</body>
</email>
<ok to="end"/>
<error to="fail"/>
</action>
Getting the duration is another hoop-jumping exercise that involves passing the start and end times to a bash script; a rough sketch follows.
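Something along these lines (a hypothetical duration.sh, assuming GNU date and ISO-8601 inputs like the ones ${timestamp()} and the start-time action produce):
#!/bin/bash
# duration.sh <start-time> <end-time> -- prints the elapsed time as key=value
start_epoch=$(date -u -d "$1" +%s)
end_epoch=$(date -u -d "$2" +%s)
echo "duration_seconds=$(( end_epoch - start_epoch ))"
Run from another shell action with <capture-output/>, the result can then be referenced in the email body just like the start time above.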
I was investigating the Oozie SLA functionality, but I haven't found a way to extract the data.

How to make Hue - Oozie workflow run a java job which has config file?

I have a buildModel.jar and a folder "conf" which contains a configuration file named config.properties.
The command line for running it looks like this:
hadoop jar /home/user1/buildModel.jar -t fp-purchased-products -i hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together
After doing some analysis, it uses the DB information in the "config.properties" file to store data in MongoDB.
Now I need to run it with a Hue Oozie workflow, so I used Hue to upload the jar file and the "conf" folder to HDFS and then created a workflow. I also added the "config.properties" file to the workflow.
This is the workflow.xml
<workflow-app name="test_service" xmlns="uri:oozie:workflow:0.4">
<start to="run_java_file"/>
<action name="run_java_file">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>xxx.xxx.recommender.buildModel.Application</main-class>
<arg>-t=fp-purchased-products</arg>
<arg>-i=hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together</arg>
<file>/user/user2/service/build_model/conf/config.properties#config.properties</file>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
And this is the workflow-metadata.json
{"attributes": {"deployment_dir": "/user/hue/oozie/workspaces/_user2_-oozie-31-1416890719.12", "description": ""}, "nodes": {"run_java_file": {"attributes": {"jar_path": "/user/user2/service/build_model/buildModel.jar"}}}, "version": "0.0.1"}
After doing the analysis, it gets an error when saving data to MongoDB. It seems the Java code can't see config.properties.
Can anyone guide me on how to use a Hue Oozie workflow to run a Java job which has a config file?
Sorry for the late answer.
As Romain explained above, Hue will copy config.properties into the same directory as buildModel.jar. So I changed the code to let buildModel.jar read the config file from the same directory. It worked!

FNF: Not able to execute ssh-base.sh

Trying to run an Oozie workflow, but I keep getting the following error message:
org.apache.oozie.action.ActionExecutorException: FNF: Not able to execute ssh-base.sh on username@servername | ErrorStream: *********************************************************************
This machine is the property of xyz....
(Note: I've set up passphrase-less access. If I run the steps manually it works, but when I run through Oozie it doesn't. In other words, I can log in to the machine as user 'oozie', then ssh username@servername (without entering a password) and then run the 'command'. This works, but the Oozie workflow doesn't.)
Here's my workflow.xml
<workflow-app name="my app" xmlns="uri:oozie:workflow:0.2">
<start to="sshAction"/>
<action name="sshAction">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>username@servername</host>
<command>cd /export/home/user/test/bin;./test.sh --arg value</command>
<capture-output/>
</ssh>
<ok to="sendEmail"/>
<error to="sendEmail" />
</action>
<action name="sendEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>username@xyz.com</to>
<subject>Output of workflow ${wf:id()}</subject>
<body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
</email>
<ok to="end"/>
<error to="end"/>
</action>
<end name="end"/>
</workflow-app>
Figured out what was wrong by looking at the code. FNF stands for 'File Not Found'. It appears the ssh action doesn't handle commands separated by a semicolon, such as this:
cd /export/home/user/test/bin;./test.sh --arg value
Here's what I did:
1) Changed the command to:
./test.sh --arg value
2) Copied test.sh to the user's home directory.
3) Added cd /export/home/user/test/bin to the beginning of test.sh.
It's working now!
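For reference, the reworked copy of test.sh in the home directory would then start roughly like this (a sketch; the original body of the script is not shown in the question):
#!/bin/bash
# Copy of test.sh placed in the user's home directory so the ssh action can
# invoke it as ./test.sh; the cd added in step 3 restores the original working directory.
cd /export/home/user/test/bin || exit 1
# ... original contents of test.sh follow here ...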

oozie ssh action takes long time to complete

I tried running an ssh action workflow job in Oozie with the following action code.
Passwordless ssh was configured:
<action name="sshaction">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>127.0.0.1</host>
<command>/bin/bash</command>
<args>/home/510600/HADOOP_ECO/CDH4/oozietest/test.sh</args>
<args>first</args>
<capture-output/>
</ssh>
<ok to="WordCount" />
<error to="fail" />
</action>
<action name="WordCount">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/510600/output/" />
</prepare>
<main-class>${parse_mainClass}</main-class>
<arg>${inputDir}</arg>
<arg>${parse_Output}</arg>
</java>
<ok to="end" />
<error to="fail" />
</action>
The problem I encountered with the above code is that the Oozie ssh action takes a long time to complete, even with a two-line shell script, while other actions run very fast.
Of the two actions above, sshaction took 12 minutes to complete while WordCount took only 15 seconds.
My shell script, /home/510600/HADOOP_ECO/CDH4/oozietest/test.sh, is:
#!/bin/bash
rm -rf /home/510600/abc.log
Can anyone explain why the Oozie ssh action takes so long to run?
If everything works fine except sending the status back to the Oozie web server from the shell script, I think the issue would be curl.
The Linux utility curl should be present on the remote machine, because Oozie internally uses two bash scripts, ssh-base.sh and ssh-wrapper.sh, to execute the commands on the remote machine. ssh-base.sh uses curl to send the status back to the Oozie server by invoking the Oozie web service.
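A quick, illustrative check from the remote host (the server address is a placeholder; 11000 is the default Oozie port, and /oozie/v1/admin/status is the Oozie REST status endpoint):
#!/bin/bash
# Verify curl exists and that the remote host can reach the Oozie server's web service.
command -v curl >/dev/null || echo "curl not found -- ssh-base.sh needs it to report status back"
curl -s -o /dev/null -w '%{http_code}\n' http://oozie-server:11000/oozie/v1/admin/status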
Sometimes this happens because of configuration or authentication issues.
Did you try executing the script without Oozie? How long does it take to complete?
