Difference between revisions of "WhoisRefreshRunRefresh"

Revision as of 12:54, 5 November 2007


Run over all pages pertaining to website information that have 0 human edits, then fetch and insert fresh whois information. For example, www.aboutus.org/facebook.com.

Steps to DoneDone

Find out how many pages this would hit - approx 7,659,827
modify one page
- Contact information:
  - Contact name
  - Contact email (protected)
  - Street Address (protected)
  - City, State/Province, Postal Code
  - Geocode for maps location
  - Contact Phone Number
  - Contact Fax Number
  - Wiki comment for as-of date of whois info
design a process to modify all pages
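
As a rough illustration of the selection rule and the as-of comment described above, here is a minimal sketch. It assumes a page's revision history is available as a list of editor names and that bot edits can be recognized by account name; the `BOT_ACCOUNTS` set, the function names, and the comment wording are hypothetical and do not come from the existing bots.

```python
from datetime import date

# Hypothetical bot account names; the real list is not given on this page.
BOT_ACCOUNTS = {"PageCreationBot", "PageScrapeBot"}


def has_no_human_edits(editors, bot_accounts=BOT_ACCOUNTS):
    """Return True if every revision of a page was made by a bot account."""
    return all(editor in bot_accounts for editor in editors)


def pages_to_refresh(page_editors):
    """Select the page titles whose revision history contains no human edits.

    `page_editors` maps a page title to the list of editors of its revisions.
    """
    return [title for title, editors in page_editors.items()
            if has_no_human_edits(editors)]


def whois_as_of_comment(as_of):
    """Build a wikitext (HTML-style) comment recording the whois fetch date."""
    return "<!-- whois info as of {} -->".format(as_of.isoformat())
```

For example, `pages_to_refresh({"facebook.com": ["PageCreationBot"]})` would select that page, and `whois_as_of_comment(date(2007, 11, 5))` would produce a comment that can be placed next to the refreshed contact information.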

Status

Currently, PageCreationBot, PageScrapeBot and WhoisParsing together generate a page as follows:

PageCreationBot creates a domainbox template and uses the thumbnail tag to embed the thumbnail in it. PageScrapeBot fetches the thumbnail from Alexa for a given domain name and stores it locally in a pre-defined directory. We need to figure out the mechanism behind the thumbnail tag, i.e. how it locates a particular thumbnail image. We then need a way to place the fetched thumbnail where MediaWiki can locate it.
PageCreationBot creates a section named 'Logo' containing the logo that PageScrapeBot fetched from the site itself. The logo is inserted into the page using the wiki Image tag. We need to find a better way of doing this, à la thumbnails.
Next, PageCreationBot creates a description section, filled with the description fetched from Alexa followed by any "about us" text extracted from the site. (The about-us text is contained in a sub-section.)
The Related Domains and Inlinking Domains sections are populated. Related domains are fetched from Google, whereas sites linking in are fetched from Alexa.
Keywords fetched from meta tags in the home page are placed in a separate section, 'Keyword'.
Categories fetched from Alexa are used, via the categories tag, to create the categories that the page belongs to.
Contact info is to be fetched from the contact table populated by WhoisParsing and put in its own section.
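
The section layout described in the list above could be assembled roughly as follows. This is only a sketch under assumptions: the `info` dictionary and its keys are hypothetical, every section title other than 'Logo' and 'Keyword' is a guess, and the domainbox template and thumbnail tag are left out because their syntax is not documented here. Only standard MediaWiki syntax is used (`== Heading ==`, `[[Image:...]]`, `[[Category:...]]`).

```python
def build_page_wikitext(info):
    """Assemble wikitext for one domain page from already-fetched data.

    `info` is a hypothetical dict; its keys are illustrative and are not
    the bots' real schema.
    """
    parts = []

    def section(title, body):
        # Skip sections for which nothing was fetched.
        if body:
            parts.append("== {} ==\n{}".format(title, body))

    logo = info.get("logo_file")
    section("Logo", "[[Image:{}]]".format(logo) if logo else "")
    section("Description", info.get("description", ""))
    section("Related Domains",
            "\n".join("* " + d for d in info.get("related", [])))
    section("Keyword", ", ".join(info.get("keywords", [])))
    section("Contact", info.get("contact", ""))

    # Category links place the page into each listed category.
    categories = "\n".join("[[Category:{}]]".format(c)
                           for c in info.get("categories", []))
    if categories:
        parts.append(categories)

    return "\n\n".join(parts)
```

Skipping empty sections keeps a best-effort page from showing headings with nothing under them when a fetch returned no data.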

Things to do:

Embed logo and thumbnails in the same manner.
Understand the mechanism behind the thumbnail tag.
Devise a mechanism to detect registration by proxy. Decide on a plan of action if proxy registration is encountered.
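
One possible starting point for the proxy-detection item is a simple string heuristic over the parsed contact fields. The sketch below is an assumption, not a decided design: the marker list is illustrative and incomplete, and `contact` is a hypothetical dict of whois contact fields as WhoisParsing might store them.

```python
# Illustrative markers only; a real list would need to be curated and maintained.
PROXY_MARKERS = ("domains by proxy", "whoisguard", "privacy", "proxy")


def looks_like_proxy_registration(contact):
    """Return True if any contact field contains a known proxy-service marker."""
    haystack = " ".join(str(value) for value in contact.values()).lower()
    return any(marker in haystack for marker in PROXY_MARKERS)
```

Generic markers such as "privacy" will produce false positives (for example, a company whose name contains the word), so a flagged record is probably better queued for review than dropped automatically.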

Possible Scenario

One of our valued clients enters the following URL: http://www.aboutus.org/i_am_not_on_aboutus_yet.com
Unfortunately, this page does not currently exist in our db.
The default wiki behavior is to return a newly created empty page to the client.
Surely, we can do better.
So we try to make a best-effort autogenerated page.
Our top-level glue will first call PageScrapeBot's process method with this new domain as its argument. This will result in domain-specific information being dumped into the database.
It will then fetch whois information in the same way, by calling WhoisParsing's parse method.
At the end of this process, the db is populated with the relevant details regarding this domain.
Once loaded with all this ammunition, it will fire a request to PageCreationBot to create the new page using the relevant data from the db.
And voila, we have a newly created page for our valued client.
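
The top-level glue in this scenario might look something like the sketch below. The three callables are stand-ins for PageScrapeBot's process method, WhoisParsing's parse method, and the request to PageCreationBot; their real signatures are not given here, so they are passed in as parameters. Continuing after a failed fetch is an assumption about what "best-effort" should mean.

```python
def autogenerate_page(domain, scrape, parse_whois, create_page):
    """Best-effort page generation for a domain that has no page yet.

    `scrape`, `parse_whois` and `create_page` are hypothetical callables that
    each take the domain name; the first two are expected to store their
    results in the database.
    """
    completed = []
    for name, step in (("scrape", scrape), ("whois", parse_whois)):
        try:
            step(domain)
            completed.append(name)
        except Exception:
            # Assumed policy: a failed fetch should not block page creation.
            pass

    # Build the page from whatever data the db now holds for this domain.
    create_page(domain)
    completed.append("create")
    return completed
```

Returning the list of completed steps lets the caller log which data sources were missing when the page was generated.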

