How to handle communication between different physical servers?
#1

[eluser]JuanG[/eluser]
Hello,

After looking around a bit on the CI forums, I've decided to post my question here, hoping you can give me some direction on how best to implement my idea using this great framework.

The situation could be described as follows:

I'm developing a web application in CI that should let users create jobs (calculations) using a web form, then send them for computation on another server located on the same network, retrieving things like status, output, etc.

From the physical computers' point of view, I have:

- One web server with everything installed and running (CI, database, web page with login access, etc.)
- One Beowulf cluster on the same network (set of computers managed through a master node)

Using PHP, I'm used to building jobs (e.g. writing a text file) from a web form, executing shell commands (using exec or passthru), and retrieving status and output, but all on the same server. Now I have to do the same thing, except building everything on the web server, sending execution instructions to the computing server, and then getting the results back to the web server to display on screen.

I understand the MVC model can help me separate not only logical layers but physical layers as well. The problem is, I don't know where to start or how to accomplish this.

Could you help me find my way to this implementation? I don't just want to do it, I'd like to do it right.
#2

[eluser]WanWizard[/eluser]
This is not something CI is meant for, or can do out of the box. But in the end it's based on PHP, and with a programming language you can do (almost) anything.

Normally you would solve this kind of issue with a message queue system. Your front end inserts messages into the queue, possibly with a message type and a priority. The back end runs a process that requests a message from the queue, processes the message (the assignment), and sends the result back. Then the cycle repeats. Another option is a central dispatcher: the processes sit and wait until the dispatcher assigns them a job, which they then execute.

The first option is more scalable, but with a single cluster that's not really relevant.

I use the first approach for ExiteCMS' background processing. I have a cron engine and a workflow engine that both generate background tasks, and a generic background processor that picks up a task (= a specific CI controller/method) and executes it. I can run this on the same system or on other systems, and I can run as many as I need to handle the load. The only requirement is access to the database that holds the message queue.
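
To make the idea concrete, here is a minimal sketch of such a database-backed queue. This is not ExiteCMS code; the table, script and host names are all made up:

[code]
<?php
// Minimal sketch of a database-backed job queue, assuming a MySQL
// table shared by the web and compute hosts (all names are made up):
//
//   CREATE TABLE job_queue (
//       id       INT AUTO_INCREMENT PRIMARY KEY,
//       type     VARCHAR(32),
//       priority INT DEFAULT 0,
//       payload  TEXT,
//       status   ENUM('pending','running','done') DEFAULT 'pending',
//       result   TEXT NULL
//   );

$db = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');

// Front end: insert a job into the queue.
function enqueue(PDO $db, $type, $priority, array $payload)
{
    $stmt = $db->prepare(
        'INSERT INTO job_queue (type, priority, payload) VALUES (?, ?, ?)');
    $stmt->execute(array($type, $priority, json_encode($payload)));
    return $db->lastInsertId();
}

// Back end (run on the compute side, e.g. from cron): claim the
// highest-priority pending job, run it, store the result.
function work(PDO $db)
{
    $db->beginTransaction();
    $job = $db->query(
        "SELECT * FROM job_queue WHERE status = 'pending'
         ORDER BY priority DESC, id ASC LIMIT 1 FOR UPDATE"
    )->fetch(PDO::FETCH_ASSOC);

    if (!$job) { $db->rollBack(); return; }

    $db->prepare("UPDATE job_queue SET status = 'running' WHERE id = ?")
       ->execute(array($job['id']));
    $db->commit();

    // Hand the payload to whatever actually does the computation.
    $output = shell_exec('run_job.sh ' . escapeshellarg($job['payload']));

    $db->prepare("UPDATE job_queue SET status = 'done', result = ? WHERE id = ?")
       ->execute(array($output, $job['id']));
}
[/code]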
#3

[eluser]jedd[/eluser]
Hi Juan,

Sounds like an interesting environment and problem.

I did something superficially similar recently, using system() calls to ssh to remote boxes, run a command-line app, capture the output and then parse it.

What is the mechanism for launching scripts on the cluster - is it a SOAP-style thing, or do you just connect (e.g. ssh) and launch scripts that way?

If the latter, then I'd be tempted to just put that functionality into a library. If you're using system() or exec() calls already, as you say, then it's a near-trivial change to prefix those calls with an ssh connection - something like the sketch below.
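
A minimal sketch, assuming key-based ssh is already set up for the web server user (the host, account, key path and queue command are all placeholders):

[code]
<?php
// Wrap the existing exec() calls so the command runs on the cluster's
// master node over ssh instead of locally. BatchMode makes ssh fail
// fast rather than prompt for a password.
function cluster_exec($command)
{
    $ssh = 'ssh -i /var/www/.ssh/id_rsa -o BatchMode=yes compute@master-node ';
    exec($ssh . escapeshellarg($command), $output, $status);
    return array('status' => $status, 'output' => $output);
}

// Previously: exec('qsub job.txt', $output);
// Now the same command runs on the master node:
$result = cluster_exec('qsub ~/jobs/job.txt');
[/code]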
#4

[eluser]JuanG[/eluser]
Thank you guys for taking the time to answer my question(s). I have to ask a bit more, if that's OK with you:

@WanWizard: how should communication between the front and back end be implemented? How should I pass information from one computer to the other? Should I create identical accounts and log in, or something? This is something I have never implemented before, so I have no idea how to communicate between machines (and let's not even get into the fact that I'm a beginner with CI).

Any further details on this would be great for me to get started.

@Jedd: Indeed, it is pretty interesting. If I weren't up against the clock, I'd be more excited than stressed out ;-). I have no mechanism for launching scripts in mind right now - I'm not clear on that at all. I used SOAP with PHP once and it was a nightmare. Maybe I didn't use it correctly, but I didn't get a warm feeling after having to pick through arrays of arrays of arrays to get the data out.

The thing is, today the user logs in to his cluster account and runs several jobs with command-line instructions. I have to somehow hide all of that from the user. I have no idea yet how to get into his account to read his jobs and display them in the web app, but I'll get to that later.

Now I need to find at least where to start. If you guys have more suggestions, or even a bit more detail on what you've already suggested, I'll be in your debt forever.

Thanks again for your help.
#5

[eluser]jedd[/eluser]
Well, I think we need a bit more info too.

Do these applications have to run as the user that launches them - for either permission reasons or to track/audit usage?

If they do, then it gets a bit more complex - you need to authenticate the users at the web server, and (securely) onward-authenticate to the cluster. You can start doing passwords here, but that'll get painful real quick (and you also need to look at https, self-signed certificates, etc.).

If there's a small number of users involved you could perhaps keep copies of their ssh keys - but this introduces security risks of its own.

If you just need to connect to the remote host and run applications, and you don't care as who - then this is (relatively) easy.

How much data are you sending across, too - and in what format? Lots of small files, very large binary lumps, just a few parameters on a command line?
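
If you did go the per-user key route, something like this is what I have in mind (every name and path here is hypothetical):

[code]
<?php
// Hypothetical per-user key approach: one stored key per lab user,
// selected by whoever is logged in to the web app.
function user_exec($username, $command)
{
    // basename() keeps a malicious username from escaping the key dir.
    $key = '/var/www/keys/' . basename($username) . '.id_rsa';

    $ssh = sprintf('ssh -i %s -o BatchMode=yes %s@master-node ',
                   escapeshellarg($key),
                   escapeshellarg($username));

    exec($ssh . escapeshellarg($command), $output, $status);
    return array('status' => $status, 'output' => $output);
}
[/code]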
#6

[eluser]JuanG[/eluser]
OK I have to figure a few things out here and then I'll get back to you with every detail.

I'm pro simple solutions, and since this is a tool for us to use in the lab (not a public, high-traffic, ultra-secure system), I can skip some things that would be considered essential in a publicly accessible tool, if that makes everything easier.

The main idea will always be to let a user connect to his account (on the web server), reach his files (e.g. input files), save them locally or do whatever he wants with them, select some input and hit "submit", so that this is turned into a queue command and the job starts running on the cluster - roughly like the sketch below.
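
In CI terms, I picture the submit step something like this (a rough sketch only; the controller, table, view and field names are all made up):

[code]
<?php
// Hypothetical CodeIgniter controller for the submit step (CI 1.x
// style; extend CI_Controller instead on 2.x). It only records the
// job; a worker on the cluster side actually runs it.
class Jobs extends Controller {

    function submit()
    {
        // Which input file the user picked in the form.
        $input_file = $this->input->post('input_file');

        // Turn the submission into a queue entry instead of an exec().
        $this->load->database();
        $this->db->insert('job_queue', array(
            'type'     => 'calculation',
            'priority' => 0,
            'payload'  => $input_file,
            'status'   => 'pending',
        ));

        $this->load->view('job_submitted');
    }
}
[/code]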

I'll get back, probably tomorrow, with more details.

Thanks again for everything, don't go away just yet :-)

Cheers,



