HeavyJobs


What (summary)

Manage long-running jobs on available compute resources (servers), using database tables to keep track of work and inter-process communication to keep track of workers.
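
To make the table-driven bookkeeping concrete, here is a minimal sketch of what a work-tracking table could look like. This is illustration only: sqlite stands in for the real database, and every name here (heavy_job_chunks and its columns) is an assumption, not the actual compostus schema.

import sqlite3

# Each chunk of work is one row; workers claim rows, record their ids,
# and bump progress as they go.
db = sqlite3.connect("heavy_jobs.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS heavy_job_chunks (
        id         INTEGER PRIMARY KEY,
        job        TEXT NOT NULL,                    -- 'fetcher', 'parser', 'aggregator'
        state      TEXT NOT NULL DEFAULT 'pending',  -- pending/running/incomplete/done
        worker_ids TEXT DEFAULT '',                  -- every worker that touched this chunk
        progress   INTEGER DEFAULT 0                 -- records processed so far
    )
""")
db.commit()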

Why this is important

We will use this infrastructure to manage our algorithmic data collection. This is a strategic direction for the company.

DoneDone

We will be satisfied with this infrastructure when:

  • we can launch, balance, and diagnose all steps of our pilot whois refresh path.
    • fetchers
    • parsers
    • aggregators
  • we have startup scripts that will resume proper job processing after a machine reboot or other operational events.
  • we can monitor overall health and productivity of all heavy job processing through a web interface.

Bugs and Todos

(new items)

  • Improve the job deployment process (see below)
  • Integrate stop and terminate: stop leaves looping jobs looping
  • Heavy_job_monitor racks up a lot of CPU. Why? Sleeping jobs aren't much better.

(prioritized high, medium, and low for the week with Ethan.)

  • A worker should mark a chunk with its id (an array of ids when restarted); see the sketch after this list
    • this lets us draw a line per worker on the throughput graph
  • Workers should do partially completed chunks before starting new chunks.
    • for now we will add UI that can reset an incomplete chunk to zero.
  • A worker should sleep when the manager has no more work to hand out
  • Integrate the two controllers
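
Here is a minimal sketch of the worker loop these items imply, reusing the assumed chunks table from the summary section. The claim step is deliberately naive; a real one would have to be atomic across competing workers.

import time

def claim_chunk(db, worker_id):
    """Prefer partially completed chunks, then fresh ones; return a chunk
    id, or None when the manager has nothing to hand out."""
    for state in ("incomplete", "pending"):
        row = db.execute(
            "SELECT id FROM heavy_job_chunks WHERE state = ? LIMIT 1",
            (state,)).fetchone()
        if row:
            # Append this worker's id; a restarted chunk accumulates an
            # array of ids, one per worker that has touched it.
            db.execute(
                "UPDATE heavy_job_chunks SET state = 'running', "
                "worker_ids = worker_ids || ? || ' ' WHERE id = ?",
                (str(worker_id), row[0]))
            db.commit()
            return row[0]
    return None

def worker_loop(db, worker_id, process_chunk):
    while True:
        chunk = claim_chunk(db, worker_id)
        if chunk is None:
            time.sleep(30)       # sleep when there is no work to hand out
            continue
        process_chunk(chunk)     # the actual fetching/parsing/aggregating
        db.execute("UPDATE heavy_job_chunks SET state = 'done' WHERE id = ?",
                   (chunk,))
        db.commit()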


  • show chunk id in heavy_jobs/show
  • show ps of workers in heavy_worker/status
  • kill or restart hung workers (see the sketch after this list)
  • move fetchers into the framework, have it create parsing chunks
  • Tally throughput, good records, etc.
  • keep a log of automatic actions
  • Should HeavyJob be the source for actions? We need better requirements here.
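
For the ps and hung-worker items above, a rough sketch of what the status and kill logic might look like. The process name and launch command are assumptions about how workers appear on the machine.

import os
import signal
import subprocess

WORKER_MARK = "heavy_worker"   # assumed: how worker processes appear in ps

def worker_ps():
    """Lines of ps output for worker processes, e.g. for a status page."""
    out = subprocess.check_output(["ps", "-eo", "pid,etime,args"]).decode()
    return [line for line in out.splitlines()[1:] if WORKER_MARK in line]

def kill_worker(pid, restart=False):
    """Terminate a hung worker and optionally relaunch one in its place."""
    os.kill(pid, signal.SIGTERM)
    if restart:
        subprocess.Popen(["./script/heavy_worker"])   # assumed launch command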


  • Finer-grained progress
  • Zabbix script to count busy and idle workers; a sketch follows below. (Or count something else interesting. Ethan is not too interested in this; mostly he doesn't want "noise" alerts that distract him from real emergencies.)
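
A sketch of what such a Zabbix script might look like. A Zabbix UserParameter script just prints one number, which the agent picks up via a UserParameter line in zabbix_agentd.conf. The busy/idle approximation here (one running chunk per busy worker) and all names are assumptions carried over from the sketches above.

#!/usr/bin/env python
import sqlite3
import subprocess
import sys

db = sqlite3.connect("heavy_jobs.db")
busy = db.execute("SELECT COUNT(*) FROM heavy_job_chunks "
                  "WHERE state = 'running'").fetchone()[0]
if sys.argv[1:] == ["idle"]:
    # idle = worker processes alive in ps, minus those busy on a chunk
    ps = subprocess.check_output(["ps", "-eo", "args"]).decode()
    total = sum(1 for line in ps.splitlines() if "heavy_worker" in line)
    print(max(total - busy, 0))
else:
    print(busy)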

Deploying Heavy Jobs

We're currently running these jobs on mist, which is not one of our deployment targets. Mist is chosen by an entry in a config file. A git clone of compostus is brought over. Two processes, a heavy job manager and a heavy job monitor, are launched in a screen session. It is then possible to start new workers through the web interface.

If one wants to add or modify a job algorithm, or modify the monitor, one must log into mist, find the screen session, and update it as follows (see the automation sketch below).

  1. kill monitor (kills workers)
  2. pull code
  3. restart monitor
  4. restart interrupted chunks
  5. start new workers

Use a variation of this to update the manager, a simpler task because it has no children. When jobs are distributed across multiple machines, there will be a monitor per machine but only one manager.
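
A minimal sketch of automating steps 1 through 3 of the monitor update; the repository path, screen session name, and launch command are all assumptions. Steps 4 and 5 go through the web interface and stay manual.

import subprocess

REPO = "/home/aboutus/compostus"    # assumed clone location on mist
SESSION = "heavy_jobs"              # assumed screen session name

def sh(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True, cwd=REPO)

# 1. kill monitor (kills workers); the bracket trick stops pkill from
#    matching this command's own shell
sh("pkill -f '[h]eavy_job_monitor' || true")
# 2. pull code
sh("git pull")
# 3. restart the monitor in a new window of the existing screen session
sh("screen -S " + SESSION + " -X screen ./script/heavy_job_monitor")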

Pilot Workflow

[Image: HeavyJobsWorkflow.png, a diagram of the pilot whois refresh workflow]


