Difference between revisions of "Rewrite PageCreationBot"
Arif Iqbal (Added stuff from RewritePageCreationScraper so we can nuke RewritePageCreationScraper)
Revision as of 11:39, 22 August 2007
What (summary)
- New page-building bot
- Still relies on Java/Tomcat to do crawling (for now)
- Carefully tested
- This is essentially a one-to-one rewrite of the bot's scraping pieces in Ruby instead of Java.
Why this is important
- We need to have control over the pages that are created on our site.
- The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
- We need mastery over the code so that we can add new features easily.
DoneDone
- Creates new pages based on a template
- Monitoring and logging have been added (tests whether or not the bot succeeds)
- Hooked into all the points where the old bot was
- Checks robots.txt before spidering the website.
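
The robots.txt item above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the class name RobotsRules and the user agent string "PageCreationBot" are made up for the example, and the matching is simplified to plain path prefixes (no wildcards, no Crawl-delay), with the longest matching rule winning.

```ruby
# Simplified robots.txt check (a sketch under the assumptions stated above).
# RobotsRules and "PageCreationBot" are hypothetical names.
class RobotsRules
  def initialize(robots_txt)
    @groups = {} # lowercased user-agent token => list of [kind, path prefix]
    parse(robots_txt)
  end

  # True if `path` may be fetched by `user_agent`.
  def allowed?(user_agent, path)
    matches = rules_for(user_agent).select { |_kind, prefix| path.start_with?(prefix) }
    # Longest matching prefix wins; Allow wins a tie.
    best = matches.max_by { |kind, prefix| [prefix.length, kind == :allow ? 1 : 0] }
    best.nil? || best.first == :allow
  end

  private

  def parse(text)
    agents = []
    in_rules = false
    text.each_line do |line|
      field, value = line.sub(/#.*/, "").strip.split(":", 2).map(&:strip)
      next if field.nil? || value.nil?

      case field.downcase
      when "user-agent"
        agents = [] if in_rules # a new group starts after rule lines
        in_rules = false
        agents << value.downcase
        @groups[value.downcase] ||= []
      when "allow", "disallow"
        in_rules = true
        next if value.empty? # an empty Disallow restricts nothing
        agents.each { |agent| @groups[agent] << [field.downcase.to_sym, value] }
      end
    end
  end

  # Prefer a group naming this bot, else fall back to the "*" group.
  def rules_for(user_agent)
    ua = user_agent.downcase
    name = @groups.keys.find { |agent| agent != "*" && ua.include?(agent) }
    @groups[name] || @groups["*"] || []
  end
end

robots_txt = <<~ROBOTS
  User-agent: *
  Disallow: /private/
  Allow: /private/press/
ROBOTS

rules = RobotsRules.new(robots_txt)
rules.allowed?("PageCreationBot", "/")                  # => true
rules.allowed?("PageCreationBot", "/private/data")      # => false
rules.allowed?("PageCreationBot", "/private/press/faq") # => true
```

In practice the bot would fetch robots.txt from the target site first and skip spidering any path for which allowed? returns false.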