WhoisRefreshRun
Run over all pages pertaining to website information that have 0 human edits, and fetch and insert fresh whois information. For example, www.aboutus.org/facebook.com.
Steps to Done
- Find out how many pages this would hit - approx 7,659,827
- modify one page
- Contact information:
- Contact name
- Contact email (protected)
- Street Address (protected)
- City, State/Province, Postal Code
- Geocode for maps location
- Contact Phone Number
- Contact Fax Number
- Wiki comment for as-of date of whois info
- design a process to modify all pages
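The first step above (finding pages with zero human edits) can be sketched as follows. This is a minimal sketch, assuming the run can pull `(page_title, editor_name)` pairs from the wiki's revision history; `BOT_ACCOUNTS` and `pages_to_refresh` are hypothetical names, and the real bot account list would come from the wiki itself.

```python
# Sketch: decide which pages qualify for a whois refresh. A page
# qualifies when every one of its revisions was made by a bot account.
# BOT_ACCOUNTS is an assumed, illustrative list.

BOT_ACCOUNTS = {"PageCreationBot", "PageScrapeBot"}

def pages_to_refresh(revisions):
    """revisions: iterable of (page_title, editor_name) tuples."""
    humans = set()   # pages touched by at least one human
    seen = set()     # all pages seen
    for title, editor in revisions:
        seen.add(title)
        if editor not in BOT_ACCOUNTS:
            humans.add(title)
    return seen - humans   # pages with zero human edits

revs = [
    ("facebook.com", "PageCreationBot"),
    ("example.com", "PageCreationBot"),
    ("example.com", "Alice"),   # a human edit disqualifies the page
]
print(sorted(pages_to_refresh(revs)))   # ['facebook.com']
```

Counting the result of the same query over the full revision table is also how the approx. 7,659,827 figure would be verified before the run.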
Status
Currently, PageCreationBot, PageScrapeBot and WhoisParsing together generate a page as follows:
- PageCreationBot creates a domainbox template and uses the thumbnail tag to embed the thumbnail into it. PageScrapeBot fetches the thumbnail from Alexa for a given domain name and stores it locally in a pre-defined directory. We need to figure out the mechanism behind the thumbnail tag, i.e. how it locates a particular thumbnail image, and correspondingly provide a mechanism to place the fetched thumbnail image where MediaWiki can locate it.
- PageCreationBot creates a section named 'Logo' containing the logo that PageScrapeBot fetched from the site itself. The logo is inserted into the page using the wiki Image tag. We need to find a better way of doing this, along the lines of the thumbnails.
- Next, PageCreationBot creates a description section filled with the description fetched from Alexa, followed by any about-us text extracted from the site. (The about-us text is contained in a sub-section.)
- The Related and Inlinking Domains sections are populated. Related domains are fetched from Google, whereas sites linking in are fetched from Alexa.
- Keywords fetched from meta tags on the home page are placed in a separate section, 'Keyword'.
- Categories fetched from Alexa are used, via the categories tag, to create the categories the page belongs to.
- Contact info is to be fetched from the contact table populated by WhoisParsing and placed in its own section.
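On the open question of how the thumbnail tag locates an image: MediaWiki stores uploaded files on disk under a path derived from an MD5 hash of the underscore-normalized filename, so a bot that drops a file into the matching hashed directory puts it where MediaWiki expects to find it (the file generally also needs to be registered in the wiki's image table, which this sketch does not cover).

```python
# Sketch: reproduce MediaWiki's hashed upload path for a file.
# Path layout: <upload_dir>/<first hex char>/<first two hex chars>/<name>,
# where the hash is md5 of the filename with spaces turned to underscores.
import hashlib

def mediawiki_upload_path(filename, upload_dir="images"):
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "%s/%s/%s/%s" % (upload_dir, digest[0], digest[:2], name)

# e.g. a thumbnail PageScrapeBot fetched for facebook.com
print(mediawiki_upload_path("Facebook.com thumb.jpg"))
```

If this is indeed the mechanism behind the thumbnail tag, the logo could be embedded the same way, which would close the "ala thumbnails" item below.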
Things to do:
- Need to embed logo and thumbnails in same manner.
- Understand mechanism behind thumbnail tag.
- Devise a mechanism to detect registration by proxy. Decide on plan of action if proxy registration encountered.
- Decide on a course of action based on the status of the domain, e.g. parked or locked.
Possible Scenario
- One of our valued clients enters the following URL: http://www.aboutus.org/i_am_not_on_aboutus_yet.com
- Unfortunately, this page currently does not exist in our db.
- The default wiki behavior is to return a newly created empty page to the client.
- Surely, we can do better.
- So we try to make a best-effort autogenerated page
- Our top-level glue will first call PageScrapeBot's process method with this new domain as its argument. This results in domain-specific information being dumped into the database.
- It will then do the same for whois information by calling WhoisParsing's parse method.
- At the end of this process, the db is populated with relevant details regarding this domain.
- Once loaded with all this ammunition, it will fire a request to PageCreationBot to create the new page using the relevant data from the db.
- And voila, we have a newly created page for our valued client.
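The scenario above can be sketched as a small piece of glue code. The process/parse method names come from this page; the stub bot bodies and the create_page name are hypothetical stand-ins for the real PageScrapeBot, WhoisParsing, and PageCreationBot, which would read from and write to the database rather than pass dicts around.

```python
# Sketch of the top-level glue for autogenerating a missing page.
# The three classes are stubs standing in for the real bots.

class PageScrapeBot:
    def process(self, domain):
        # real bot: fetch thumbnail, logo, description, keywords into db
        return {"domain": domain, "description": "stub description"}

class WhoisParsing:
    def parse(self, domain):
        # real bot: fetch whois, populate the contact table
        return {"domain": domain, "contact": "stub contact"}

class PageCreationBot:
    def create_page(self, scraped, whois):
        # real bot: assemble wiki text from the db rows
        return "== %s ==\n%s\n%s" % (
            scraped["domain"], scraped["description"], whois["contact"])

def autogenerate(domain):
    scraped = PageScrapeBot().process(domain)   # scrape the site
    whois = WhoisParsing().parse(domain)        # fetch whois details
    return PageCreationBot().create_page(scraped, whois)

print(autogenerate("i_am_not_on_aboutus_yet.com"))
```

The glue's only job is ordering: scrape, then whois, then page creation, so the creation step always sees a fully populated db.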

