Difference between revisions of "HeavyJobs"

Revision as of 01:39, 23 April 2008

OurWork

What (summary)

Manage long-running jobs on available compute resources (servers), using database tables to keep track of work and inter-process communication to keep track of workers.
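As a rough illustration of tracking work in database tables, the sketch below keeps one row per chunk and lets a worker claim a chunk by writing its id into that row. The table name, column names, and the `claim_chunk` helper are all hypothetical, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the real ones.
SCHEMA = """
CREATE TABLE chunks (
    id        INTEGER PRIMARY KEY,
    job_id    INTEGER NOT NULL,
    total     INTEGER NOT NULL,            -- records in this chunk
    done      INTEGER NOT NULL DEFAULT 0,  -- records processed so far
    worker_id TEXT                         -- NULL until a worker claims the chunk
);
"""


def claim_chunk(conn, worker_id):
    """Mark one unclaimed chunk with worker_id; return its id, or None."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM chunks WHERE worker_id IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The "worker_id IS NULL" guard stops a second worker from taking
        # a chunk that was claimed between the SELECT and the UPDATE.
        cur = conn.execute(
            "UPDATE chunks SET worker_id = ? WHERE id = ? AND worker_id IS NULL",
            (worker_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


def demo():
    """Build an in-memory database holding two unclaimed chunks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO chunks (job_id, total) VALUES (?, ?)", [(1, 100), (1, 100)]
    )
    return conn
```

This version stores a single worker id per chunk. Recording every worker that touched a chunk across restarts would need a separate table or a list-valued column.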

Why this is important

We will use this infrastructure to manage our algorithmic data collection. This is a strategic direction for the company.

DoneDone

We will be satisfied with this infrastructure when:

- we can launch, balance, and diagnose all steps of our pilot WHOIS refresh path:
  - fetchers
  - parsers
  - aggregators
- we have startup scripts that resume proper job processing after a machine reboot
- we can monitor the overall health of all heavy job processing with Zabbix, including system administrator alerts

Bugs and Todos

(Prioritized high, medium, and low for this week.)

- ~~A worker should mark a chunk with its id~~ (an array of ids when restarted)
  - this lets us draw a line per worker on the throughput graph
- Workers should finish partially completed chunks before starting new chunks.
  - for now we will add UI that can reset an incomplete chunk to zero.
- ~~A worker should sleep when a manager has no more work to do~~
- ~~Integrate the two controllers~~
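The chunk-ordering and sleep behaviour in the items above could look roughly like the following. This is a minimal sketch: the dict keys (`id`, `done`, `total`) and the `fetch_chunks` and `process` callables are hypothetical stand-ins for the real chunk table and worker code.

```python
import time


def next_chunk(chunks):
    """Pick a partially completed chunk before an untouched one.

    `chunks` is a list of dicts with hypothetical keys id, done, total.
    """
    unfinished = [c for c in chunks if c["done"] < c["total"]]
    if not unfinished:
        return None
    # Partial chunks (done > 0) sort first; ties are broken by id.
    return min(unfinished, key=lambda c: (c["done"] == 0, c["id"]))


def run_worker(fetch_chunks, process, idle_seconds=5.0,
               max_idle_polls=None, sleep=time.sleep):
    """Process chunks, sleeping whenever there is no work to pick up.

    Runs forever unless max_idle_polls is set, in which case it stops
    after that many consecutive empty polls.
    """
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        chunk = next_chunk(fetch_chunks())
        if chunk is None:
            idle_polls += 1
            sleep(idle_seconds)
            continue
        idle_polls = 0
        process(chunk)
```

Passing `sleep` and `max_idle_polls` as parameters is only there to make the loop easy to exercise without real waiting.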

- ~~show chunk id in heavy_jobs/show~~
- ~~show ps of workers in heavy_worker/status~~
- ~~kill or restart hung workers~~
- move fetchers into the framework, and have the framework create parsing chunks
- Tally throughput, good records, etc.
- keep a log of automatic actions
- Should HeavyJob be the source for actions? Need better requirements here.

- Finer-grained progress
- Zabbix script to count busy and idle workers. (Or count something else interesting.)
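A counting script for the last item might be as small as the sketch below. It assumes worker states are already available as a list of strings; the state names `busy` and `idle` and both function names are hypothetical. The result is returned as a bare number in text form, on the assumption that the monitoring item stores whatever single value the script prints.

```python
from collections import Counter


def count_workers(statuses):
    """Return how many workers are busy and how many are idle."""
    counts = Counter(statuses)
    # Any other state (for example a hung worker) is ignored here.
    return {"busy": counts["busy"], "idle": counts["idle"]}


def zabbix_value(statuses, state):
    """Format the count for one state as a bare number string."""
    return str(count_workers(statuses).get(state, 0))
```

A real script would read the states from the heavy job tables or the process list and print `zabbix_value(...)` to standard output.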

Pilot Workflow
