Difference between revisions of "Rewrite PageCreationBot"

(flagged as noextra)
 

(46 intermediate revisions by 12 users not shown)

Line 1: Line 1:
<noinclude><big>[[DevelopmentTeam]] < [[DevelopmentPriorities|Priorities]] < </noinclude>[[Rewrite AboutUsBot]] ('''3''' remaining) (('''[[User:Stephen Judkins|Stephen]]''') <noinclude></big>
+
<noinclude><big>[[OurWork]] < [[DevelopmentTeam]] < [[DevelopmentTeamPriorities|Priorities]] < </noinclude>('''2''') [[Rewrite PageCreationBot]] ('''[[Mohammad Ghufran|Ghufran]]''', '''[[Umar Sheikh]]''') {{JustTinyEditIcon|Rewrite PageCreationBot}}<noinclude></big>
 
+
__NOTOC__
 
 
 
== What (summary) ==
 
== What (summary) ==
  
* Remove the bot from Thunderclap and shutdown apache
+
* New page-building bot
* Tie in to partner apis
+
* Still relies on Java/Tomcat to do crawling (for now)
 
* Carefully tested
 
* Carefully tested
  
 +
== Current Status ==
 +
* <s>Creates new pages based on a template</s>
 +
* <s>Monitoring and logging have been added</s>
 +
* <s>Test cases added</s>
 +
* We have created a sample page that gives a rough sketch of what a page looks like after being created by the bot. [[PageCreationBot_Sample | Here...]]
 +
* The current version of the PageCreationBot is not using the thumbnail extracted from Alexa. It is currently using the thumbnail tag being used in the Domain_Page template.
 +
** This can be changed by using the get_thumbnail function that is already in place.
  
 
== Why this is important ==
 
== Why this is important ==
 +
* We need to have control over the pages that are created on our site.
 +
* The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
 +
* Gaining mastery over the code so that we can add new features easily.
  
Moving the bot off of our Database server will improve site performance. Also, moving it to the same technology we are using for other parts of the system gives us better control over the bot and hence more opportunities to create good features for the bot.
+
== [[DoneDone]] ==
 +
* Creates new pages based on a template
 +
* Monitoring and logging have been added (tests whether or not the bot succeeds)
 +
** Output to a log file.  Either on each squal box (with aggregation) or an NFS volume. Have emailed Ethan and Michael about this.
 +
* Hooked in to all the points where the old bot was
 +
** Not exactly the same points, but the same end-user functionality.
 +
* [[Projects:BotTest]] problems fixed
  
== [[DoneDone]] ==
+
== Bot insertion points into MediaWiki ==
 +
* <strike>/wiki/skins/common/generatePage.js (and some other javascript that we should remove)</strike>
 +
* <strike>/wiki/extensions/AboutUsDomainRedirect/SpecialRedirectToDomain.php (deprecate and point to CaseSpace)</strike>
 +
* <strike>/wiki/extensions/CaseSpace/CaseSpace.php (Ultimately, here is where the magic will happen.)</strike>
 +
* /wiki/extensions/AboutUsBuildDomain/AboutUsBuildDomain.php should be the best place to keep it.
  
[[Category:DevelopmentTeamProject]]
+
== Schema ==
 +
* New schema location: http://images.aboutus.org/images/b/be/Aboutusbot_new.zip (despite the .zip extension, it is a plain SQL file, not a compressed one).
 +
==Discussion==
 +
* I heard rumor of a possible change in format for new pages.  Is this true?  Where is the discussion about the new format possibilities happening?  [[User:TedErnst|TedErnst]] | <small>[[User talk:TedErnst|talk]]</small> 13:50, 25 October 2007 (PDT)
 +
* I think that the bot is still using <nowiki><graphic></nowiki> tag instead of the tag <nowiki><email></nowiki> with the new name. Please correct me if I'm wrong. :) {{IconSig|Vartan|17:21, 25 October 2007 (PDT)}}
 +
[[Category:OpenTask]]
 +
[[Category:DevelopmentTeam]]
 
</noinclude>
 
</noinclude>

Latest revision as of 11:31, 19 December 2013


What (summary)

  • New page-building bot
  • Still relies on Java/Tomcat to do crawling (for now)
  • Carefully tested

Current Status

  • Creates new pages based on a template
  • Monitoring and logging have been added
  • Test cases added
  • We have created a sample page that gives a rough sketch of what a page looks like after being created by the bot. Here...
  • The current version of the PageCreationBot is not using the thumbnail extracted from Alexa. It is currently using the thumbnail tag being used in the Domain_Page template.
    • This can be changed by using the get_thumbnail function that is already in place.
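
The template-plus-thumbnail behaviour described above could look roughly like the sketch below. It is only an illustration: the template text, the default thumbnail tag, and the one-argument form of get_thumbnail are all assumptions, since the real Domain_Page template and the real get_thumbnail signature are not shown on this page.

```python
from string import Template

# Hypothetical stand-in for the Domain_Page template; the real one is not shown here.
DOMAIN_PAGE_TEMPLATE = Template(
    "{{Domain_Page\n"
    "|domain=$domain\n"
    "|thumbnail=$thumbnail\n"
    "}}\n"
)


def default_thumbnail_tag(domain):
    """Assumed form of the thumbnail tag the template falls back to."""
    return "<thumbnail>" + domain + "</thumbnail>"


def build_page(domain, get_thumbnail=None):
    """Render the wikitext for a new domain page.

    get_thumbnail is an optional callable that takes the domain and returns
    thumbnail markup (assumed signature). If it is missing or returns nothing,
    the default template tag is used instead.
    """
    thumbnail = get_thumbnail(domain) if get_thumbnail is not None else None
    if not thumbnail:
        thumbnail = default_thumbnail_tag(domain)
    return DOMAIN_PAGE_TEMPLATE.substitute(domain=domain, thumbnail=thumbnail)
```

Calling build_page("example.com") uses the default tag, while passing a get_thumbnail callable swaps in the extracted thumbnail without touching the template itself.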

Why this is important

  • We need to have control over the pages that are created on our site.
  • The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
  • Gaining mastery over the code so that we can add new features easily.

DoneDone

  • Creates new pages based on a template
  • Monitoring and logging have been added (tests whether or not the bot succeeds)
    • Output to a log file. Either on each squal box (with aggregation) or an NFS volume. Have emailed Ethan and Michael about this.
  • Hooked in to all the points where the old bot was
    • Not exactly the same points, but the same end-user functionality.
  • Projects:BotTest problems fixed
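
As a rough illustration of the logging item above (recording whether or not the bot succeeds, with output to a log file), here is a minimal sketch. The logger name, the message wording, and the create_page callable are hypothetical; the actual bot code and log location are not given on this page.

```python
import logging


def make_bot_logger(log_path):
    """Return a logger that appends bot results to the file at log_path."""
    logger = logging.getLogger("page_creation_bot")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def create_page_logged(domain, create_page, logger):
    """Run create_page(domain) and log success or failure.

    create_page is whatever callable actually builds the page (assumed here).
    Returns True on success and False if the callable raised an exception.
    """
    try:
        create_page(domain)
    except Exception:
        logger.exception("page creation failed for %s", domain)
        return False
    logger.info("page created for %s", domain)
    return True
```

Pointing log_path at a local file on each box or at a path on a shared NFS volume would cover either of the two options mentioned in the list.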

Bot insertion points into MediaWiki

  • /wiki/skins/common/generatePage.js (and some other javascript that we should remove)
  • /wiki/extensions/AboutUsDomainRedirect/SpecialRedirectToDomain.php (deprecate and point to CaseSpace)
  • /wiki/extensions/CaseSpace/CaseSpace.php (Ultimately, here is where the magic will happen.)
  • /wiki/extensions/AboutUsBuildDomain/AboutUsBuildDomain.php should be the best place to keep it.

Schema

  • New schema location: http://images.aboutus.org/images/b/be/Aboutusbot_new.zip (despite the .zip extension, it is a plain SQL file, not a compressed one).

Discussion

  • I heard rumor of a possible change in format for new pages. Is this true? Where is the discussion about the new format possibilities happening? TedErnst | talk 13:50, 25 October 2007 (PDT)
  • I think that the bot is still using <graphic> tag instead of the tag <email> with the new name. Please correct me if I'm wrong. :) Vartan 17:21, 25 October 2007 (PDT)

Retrieved from "http://aboutus.com/index.php?title=Rewrite_PageCreationBot&oldid=40118904"