Difference between revisions of "HeavyJobs"

 
== What (summary) ==
 
Manage long-running jobs on available compute resources (servers) using db tables to keep track of work, and inter-process communication to keep track of workers.

* http://www.aboutus.org/au_web_services/heavy_jobs
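The two halves of the summary — a db table that tracks work and process-level tracking of workers — can be sketched as below. This is a minimal illustration, not the actual HeavyJobs code: the `JobTable` class and its column names are hypothetical, and an in-memory array stands in for the real database table.

```ruby
# Sketch of db-table job tracking: each job row carries a status, and a
# worker claims the oldest pending job (recording its pid) before running it.
# An in-memory array stands in for the real db table.

Job = Struct.new(:id, :status, :pid)

class JobTable
  def initialize
    @rows = []
  end

  def insert(id)
    @rows << Job.new(id, :pending, nil)
  end

  # In a real db this would be an atomic UPDATE ... WHERE status = 'pending';
  # the worker's pid is recorded so the monitor can track the process later.
  def claim(pid)
    job = @rows.find { |j| j.status == :pending }
    return nil unless job
    job.status = :running
    job.pid = pid
    job
  end

  def finish(id)
    job = @rows.find { |j| j.id == id }
    job.status = :done if job
  end

  def counts
    @rows.group_by(&:status).transform_values(&:size)
  end
end

table = JobTable.new
table.insert(1)
table.insert(2)

job = table.claim(Process.pid)   # worker claims job 1
table.finish(job.id)             # ...and reports completion
```

Because all state lives in the table, any process on any server can claim work, which is what lets jobs spread across machines later.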
  
 
== Why this is important ==
 
  
 
We will use this infrastructure to manage our algorithmic data collection. This is a strategic direction for the company.

== DoneDone ==

We will be satisfied with this infrastructure when:

* we can launch, balance, and diagnose all steps of our pilot whois refresh path.
** fetchers
** parsers
** aggregators
* we have startup scripts that will resume proper job processing after a machine reboot or other operational events.
* we can monitor overall health and productivity of all heavy job processing through a web interface.

== Bugs and Todos ==

(new items)
* Detect when worker goes dark > 2 min. Record last status in chunk; terminate and restart.
* from feed_aggregator: :error=>"private method `log_error' called for #<HeavyWorker:0xb7e9c318>"
* heavy_jobs/show can't find pid when buried in :last
** (related) first attempt to look both places (in some other method) is coded with nil sensitivity
* Improve the job deployment process (see below)
* Add startup scripts that launch the monitor (and manager) on server reboot
* <s>Integrate stop and terminate: stop leaves looping jobs looping</s>
* <s>Heavy_job_monitor racks up lots of cpu. Why? (trying longer sleeps)</s>
** <s>Sleeping jobs aren't so good either.</s>

(prioritized high, medium and low for week with Ethan.)
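The dark-worker item above amounts to a heartbeat check. A minimal sketch, assuming each worker periodically writes a heartbeat timestamp to its current chunk row; the `Worker` struct, field names, and threshold constant are illustrative, not the actual monitor code:

```ruby
# Sketch of dark-worker detection: the monitor flags any worker whose last
# heartbeat is older than the threshold, records its last known status in
# the chunk, and would then terminate and restart it.

DARK_THRESHOLD = 120 # seconds, i.e. "> 2 min"

Worker = Struct.new(:pid, :last_heartbeat, :last_status)

def dark_workers(workers, now: Time.now)
  workers.select { |w| now - w.last_heartbeat > DARK_THRESHOLD }
end

def restart_dark_workers(workers, now: Time.now)
  dark_workers(workers, now: now).map do |w|
    w.last_status = :terminated   # record last known state in the chunk row
    # In the real monitor this would be Process.kill("TERM", w.pid)
    # followed by relaunching the job; here we only report the decision.
    w.pid
  end
end
```

Keeping the check pure (workers in, pids out) makes the 2-minute policy itself easy to unit test without spawning processes.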
  
 
Use a variation of this to update the manager, a simpler task because it has no children. When jobs are distributed across multiple machines, there will be a monitor per machine but only one manager.

The working part of any heavy job should be unit tested before deployment. It is then wrapped up as a job that can be launched within the Heavy Jobs framework. We don't yet have a functional testing strategy for this part, so be careful. Once a job is deployed and started, check the production db to make sure it is working as intended.
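One way to keep the working part unit-testable is to write it as a plain method with explicit inputs and outputs, and let the job wrapper only handle chunking and status. The example below is a hypothetical whois parser in that style, not code from the actual pilot:

```ruby
# Pure "working part" of a hypothetical whois-refresh job: parse one whois
# record into a hash of fields. Testable without launching the framework.
def parse_whois(text)
  text.lines.each_with_object({}) do |line, fields|
    key, value = line.split(":", 2)
    next unless value
    fields[key.strip.downcase] = value.strip
  end
end

# The heavy-job wrapper would then just loop over its chunk, e.g.:
#   chunk.records.each { |r| save(parse_whois(r)) }
```

Only the one-line loop remains untested by unit tests, which narrows what the production-db spot check has to verify.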
  
 
== Pilot Workflow ==
 
 
[[Image:HeavyJobsWorkflow.png|500px]]
 
[[Image:HeavyJobsWorkflow.png|500px]]
  
</noinclude>

[[Category:DevelopmentTeam]]
[[Category:OpenTask]]
 

Latest revision as of 18:34, 4 May 2008


Retrieved from "http://aboutus.com/index.php?title=HeavyJobs&oldid=15425162"