HeavyJobs

Revision as of 19:50, 11 April 2008 by Stephen Judkins (Bugs and Todos: more todos)




What (summary)

Manage long-running jobs on the available compute resources (servers), using database tables to keep track of work and inter-process communication to keep track of workers.
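As a rough illustration of the "database tables keep track of work" idea, here is a minimal sketch in Python using SQLite. Everything named here (the job_chunks table, its columns, and the open_db, add_chunks, claim_next_chunk and finish_chunk helpers) is hypothetical and chosen for illustration only; it is not the actual HeavyJobs schema or code.

```python
import sqlite3

# Hypothetical schema: one row per chunk of work.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_chunks (
    id        INTEGER PRIMARY KEY,
    job_name  TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done
    worker_id TEXT
)
"""


def open_db(path=":memory:"):
    # isolation_level=None puts sqlite3 in autocommit mode so that
    # transactions can be started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def add_chunks(conn, job_name, count):
    """Queue `count` pending chunks for the named job."""
    conn.executemany(
        "INSERT INTO job_chunks (job_name) VALUES (?)",
        [(job_name,)] * count,
    )


def claim_next_chunk(conn, worker_id):
    """Atomically mark one pending chunk as running; return its id or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM job_chunks WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE job_chunks SET status = 'running', worker_id = ? "
            "WHERE id = ?",
            (worker_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish_chunk(conn, chunk_id):
    """Record that a chunk has been fully processed."""
    conn.execute("UPDATE job_chunks SET status = 'done' WHERE id = ?", (chunk_id,))
```

Claiming a chunk inside a single write transaction is what would keep two workers on different processes from picking up the same row; a production database would likely use its own locking mechanism instead.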

Why this is important

We will use this infrastructure to manage our algorithmic data collection. This is a strategic direction for the company.

DoneDone

We will be satisfied with this infrastructure when:

  • we can launch, balance, and diagnose all steps of our pilot whois refresh path:
    • fetchers
    • parsers
    • aggregators
  • we have startup scripts that resume proper job processing after a machine reboot
  • we can monitor the overall health of all heavy job processing with Zabbix, including alerts to system administrators

Bugs and Todos

(not prioritized at the moment)

  • Workers should finish partially completed chunks before starting new chunks.
  • A worker should terminate when its manager has no more work for it.
  • Integrate the two controllers (approach to be determined).
  • Report finer-grained progress.
  • Tally throughput, good records, etc.
  • Should HeavyJob be the source for actions?
  • Need to get the PID of a hung worker (check whether this is already fixed).
  • Kill or restart hung workers.
  • Keep a log of automatic actions.
  • Move the fetchers into the framework.
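
Several of the items above (resuming partially completed chunks first, terminating when there is no work left, and finding hung workers) can be sketched roughly as follows. This is only an in-memory illustration in Python under stated assumptions: the Chunk class, the heartbeat-timestamp approach to detecting hung workers, and all function names are hypothetical, not part of the existing HeavyJobs code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: int
    total: int                   # records in the chunk
    done: int = 0                # records processed so far
    owner: Optional[str] = None  # worker currently holding the chunk

    @property
    def partial(self):
        return 0 < self.done < self.total

    @property
    def complete(self):
        return self.done >= self.total


def next_chunk(chunks):
    """Pick an unowned, incomplete chunk, preferring partially completed ones."""
    candidates = [c for c in chunks if c.owner is None and not c.complete]
    if not candidates:
        return None
    # False sorts before True, so chunks with partial == True come first.
    return min(candidates, key=lambda c: (not c.partial, c.chunk_id))


def run_worker(name, chunks, process):
    """Process chunks until none remain, then return (the worker terminates)."""
    finished = []
    while True:
        chunk = next_chunk(chunks)
        if chunk is None:
            return finished
        chunk.owner = name
        while not chunk.complete:
            process(chunk)   # handle one record
            chunk.done += 1  # progress is tracked per record
        chunk.owner = None
        finished.append(chunk.chunk_id)


def hung_workers(heartbeats, now, timeout):
    """Return PIDs whose last heartbeat is older than `timeout` seconds."""
    return sorted(pid for pid, last in heartbeats.items() if now - last > timeout)
```

For example, given three chunks where only chunk 2 is partially done and chunk 3 is already complete, a worker would process chunk 2 first, then chunk 1, and then exit. The PIDs returned by hung_workers would be the candidates for a kill or restart, and each such action could be appended to a log.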

Pilot Workflow

[Figure: HeavyJobsWorkflow.png]


