BlueLinking Case Variants
1. We add a page_lookup table with the following schema (this is in extensions/title_updater.php):
page_lookup
--------
page_id
-- new
page_title_ci (now more than just case insensitive: only alphanumerics, ".", and "-" are kept)
page_title
page_len
page_namespace
-- new
page_is_redirect -- (just in case we want to update Parser to use this table too)
2a. Populate the table (page_lookup.sql).
2b. Build a trigger that keeps the table populated (page_lookup_triggers.sql).
The following need to be updated to use this table:
- apache rewrite (search only the default namespace)
- special aboutus search (should order by namespace)
MySQL (before 5.7.2) doesn't support multiple triggers for the same event on the same table, so we have to take out the page_map triggers to add our page_lookup triggers.
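As a rough illustration of steps 2a and 2b, here is a minimal sketch that uses Python's built-in sqlite3 module rather than MySQL, so the trigger syntax is not what page_lookup_triggers.sql would contain. The column types, the trigger name page_lookup_insert, and the sample rows are assumptions. The key expression is the simple lowercase-and-strip-underscores form from the performance tests, not the full alphanumeric filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-in for the wiki's page table (column types assumed).
    CREATE TABLE page (
        page_id          INTEGER PRIMARY KEY,
        page_namespace   INTEGER NOT NULL,
        page_title       TEXT    NOT NULL,
        page_len         INTEGER NOT NULL,
        page_is_redirect INTEGER NOT NULL
    );
    CREATE TABLE page_lookup (
        page_id          INTEGER PRIMARY KEY,
        page_title_ci    TEXT    NOT NULL,
        page_namespace   INTEGER NOT NULL,
        page_title       TEXT    NOT NULL,
        page_len         INTEGER NOT NULL,
        page_is_redirect INTEGER NOT NULL
    );
    INSERT INTO page VALUES (1, 0, 'Main_Page', 120, 0);

    -- Step 2a: backfill the lookup table from rows that already exist.
    INSERT INTO page_lookup
        SELECT page_id, lower(replace(page_title, '_', '')), page_namespace,
               page_title, page_len, page_is_redirect
        FROM page;

    -- Step 2b: keep the lookup table in sync for pages created later.
    CREATE TRIGGER page_lookup_insert AFTER INSERT ON page
    BEGIN
        INSERT INTO page_lookup VALUES (
            NEW.page_id, lower(replace(NEW.page_title, '_', '')),
            NEW.page_namespace, NEW.page_title, NEW.page_len,
            NEW.page_is_redirect);
    END;
""")

# A page created after the backfill is picked up by the trigger.
conn.execute("INSERT INTO page VALUES (2, 0, 'Grass_Commons.org', 40, 0)")
rows = conn.execute(
    "SELECT page_id, page_title_ci FROM page_lookup ORDER BY page_id"
).fetchall()
print(rows)  # [(1, 'mainpage'), (2, 'grasscommons.org')]
```

A complete version would also need update and delete triggers so that renames and deletions reach page_lookup.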
3. Then we change the check for whether a link's target exists so that it looks in the page_lookup table (not the page table):
- if we get one that has a perfect title match, make it blue
- if we get one that has no match, make it red
- if we get one that has a casespace match but no exact-case match, we fix the link in the page and save a new revision.
(This check lives in the Parser's replaceLinkHolders.)
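The three-way decision in step 3 can be sketched as follows. This is only an illustration in Python, not the Parser code: casespace_key, build_lookup and classify_link are hypothetical names, the in-memory dict stands in for a query against page_lookup, and the tie-break when several titles share a key is an arbitrary assumption.

```python
import re

def casespace_key(title):
    """Lowercase and keep only a-z, 0-9, '.' and '-' (the page_title_ci rule)."""
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def build_lookup(pages):
    """Index (namespace, title) pairs by (namespace, casespace key)."""
    lookup = {}
    for namespace, title in pages:
        lookup.setdefault((namespace, casespace_key(title)), set()).add(title)
    return lookup

def classify_link(namespace, title, lookup):
    """Return ('blue', title), ('fix', existing_title) or ('red', None)."""
    candidates = lookup.get((namespace, casespace_key(title)), set())
    if title in candidates:
        return ("blue", title)  # perfect title match
    if candidates:
        # Casespace match only: the caller would rewrite the link to the
        # existing title and save a new revision. Sorting is an arbitrary
        # tie-break for the case where several titles share one key.
        return ("fix", sorted(candidates)[0])
    return ("red", None)  # nothing in this casespace

lookup = build_lookup([("", "Main Page"), ("User", "Ethan")])
print(classify_link("", "Main Page", lookup))  # ('blue', 'Main Page')
print(classify_link("", "main page", lookup))  # ('fix', 'Main Page')
print(classify_link("", "Ethan", lookup))      # ('red', None)
```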
4. That check does not get run on a cached page. That means we need to:
- make sure that when a page gets created, every page with a redlink pointing into that page's casespace has its cache invalidated
- put together a bot strategy to invalidate the existing cache so that all the pages in the system get updated.
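The first cache rule can be sketched like this. It is a minimal illustration under assumptions: redlinks_by_page is a hypothetical in-memory stand-in for whatever link table the wiki keeps, and pages_to_invalidate is an invented helper name.

```python
import re

def casespace_key(title):
    """Lowercase and keep only a-z, 0-9, '.' and '-'."""
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def pages_to_invalidate(new_namespace, new_title, redlinks_by_page):
    """List pages holding a redlink that falls in the new page's casespace.

    redlinks_by_page maps a page name to the (namespace, title) targets of
    its red links.
    """
    target = (new_namespace, casespace_key(new_title))
    return sorted(
        page
        for page, redlinks in redlinks_by_page.items()
        if any((ns, casespace_key(title)) == target for ns, title in redlinks)
    )

redlinks_by_page = {
    "BlueAsWillBe": [("", "main page"), ("User", " ethan")],
    "AlwaysRed": [("", "nonsense"), ("AboutUs", "Main Page")],
}
# Creating "Main Page" in the default namespace only affects BlueAsWillBe.
print(pages_to_invalidate("", "Main Page", redlinks_by_page))  # ['BlueAsWillBe']
```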
Test Cases
The following pages exist:
- Main Page
- User:Ethan
- AboutUs:Community_Portal
- Template:Sample Page
- Yowsers!#$%
- G o o g l e
- (goo_gle)
- Image:BeautifulGirl.jpg
There are links to the following pages:
I. existing as is (on a page named BlueAsIs) (each of the above)
II. doesn't exist, but there are others in casespace (on a page named BlueAsWillBe)
- main page
- @#$main page
- User: ethan
- Template:samplepage
- yowsers
- Image:beautiful_girl.jpg
III. no such page at all (on a page named AlwaysRed)
- Ethan
- Sample Page
- User:google
- AboutUs:Main Page
- MinnieTheMoocher
- go.ogl-e
- nonsense
- Image:beautiful.gif
Test Before
- All the links in Case I are blue and link to the correct page
- All the links in Case II and III are red
Test After
- All the links in Case I and II are blue and link to the correct page
- All the links in Case III are red
Process 1. Re-render each of the case pages by calling the rendering directly.
Process 2. Invalidate the cache on each of the case pages and then request the URL of each page.
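The before and after expectations can be checked against the fixtures with a small sketch. It assumes that namespace prefixes match case-insensitively, that NAMESPACES lists every namespace in play, and that titles are compared as raw strings (MediaWiki's own title normalization is ignored). split_link and is_blue are hypothetical helpers.

```python
import re

NAMESPACES = {"user", "aboutus", "template", "image"}  # assumed, from the fixtures

def casespace_key(title):
    """Lowercase and keep only a-z, 0-9, '.' and '-'."""
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def split_link(link):
    """Split 'User:Ethan' into ('user', 'Ethan'); unknown prefixes stay in the title."""
    prefix, sep, rest = link.partition(":")
    if sep and prefix.strip().lower() in NAMESPACES:
        return prefix.strip().lower(), rest
    return "", link

CASES = {
    "I": ["Main Page", "User:Ethan", "AboutUs:Community_Portal",
          "Template:Sample Page", "Yowsers!#$%", "G o o g l e",
          "(goo_gle)", "Image:BeautifulGirl.jpg"],
    "II": ["main page", "@#$main page", "User: ethan", "Template:samplepage",
           "yowsers", "Image:beautiful_girl.jpg"],
    "III": ["Ethan", "Sample Page", "User:google", "AboutUs:Main Page",
            "MinnieTheMoocher", "go.ogl-e", "nonsense", "Image:beautiful.gif"],
}

exact = {split_link(page) for page in CASES["I"]}  # Case I links are the existing pages
by_key = {(ns, casespace_key(title)) for ns, title in exact}

def is_blue(link, use_casespace):
    """Exact match always counts; a casespace match counts only after the change."""
    ns, title = split_link(link)
    if (ns, title) in exact:
        return True
    return use_casespace and (ns, casespace_key(title)) in by_key

before = {case: [is_blue(link, False) for link in links] for case, links in CASES.items()}
after = {case: [is_blue(link, True) for link in links] for case, links in CASES.items()}
for case in CASES:
    print(case, "before:", all(before[case]), "after:", all(after[case]))
```

Note that "G o o g l e" and "(goo_gle)" share one key in this sketch, which is why "User:google" stays red only because it is in a different namespace.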
Testing Rewrite
Should currently pass:
- wagn.org rewrites to Wagn.org
- Wagn.org rewrites to Wagn.org
- WAGN.Org rewrites to Wagn.org
- Wang.org hmmm... creates new page (first time only, of course)
Should pass with edits:
- Grass_Commons.org rewrites to GrassCommons.org
- assumegoodfaith rewrites to AssumeGoodFaith
- (assume_good_faith) rewrites to AssumeGoodFaith
Testing Special Search
The above pages all get returned instantly by special search (no rewrite...).
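The two lookups exercised above differ only in scope, which a short sketch makes concrete. It assumes the rewrite searches just the default namespace (number 0 in MediaWiki) and that special search returns matches from every namespace ordered by namespace number. resolve_rewrite, special_search and the User-namespace sample page are hypothetical.

```python
import re

MAIN = 0  # MediaWiki's default (main) namespace number

def casespace_key(title):
    """Lowercase and keep only a-z, 0-9, '.' and '-'."""
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def resolve_rewrite(requested, pages):
    """Return the existing main-namespace title for `requested`, or None."""
    key = casespace_key(requested)
    for namespace, title in pages:
        if namespace == MAIN and casespace_key(title) == key:
            return title
    return None  # no page in this casespace: the request creates a new page

def special_search(query, pages):
    """Return every casespace match as (namespace, title), ordered by namespace."""
    key = casespace_key(query)
    return sorted((ns, title) for ns, title in pages if casespace_key(title) == key)

pages = [
    (2, "AssumeGoodFaith"),  # hypothetical page in the User namespace
    (0, "Wagn.org"),
    (0, "GrassCommons.org"),
    (0, "AssumeGoodFaith"),
]
print(resolve_rewrite("WAGN.Org", pages))           # Wagn.org
print(resolve_rewrite("Grass_Commons.org", pages))  # GrassCommons.org
print(resolve_rewrite("Wang.org", pages))           # None
print(special_search("(assume_good_faith)", pages))
```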
Testing Issues
In RunTests we set up a global $wgDatabase that points to the test database. However, the main wiki code apparently isn't using that database; it still uses the primary one.
Relevant Performance Tests
mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 1000,1000;
Query OK, 1000 rows affected, 13 warnings (0.08 sec)
Records: 1000 Duplicates: 0 Warnings: 13

mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 10000,10000;
Query OK, 10000 rows affected, 387 warnings (0.92 sec)
Records: 10000 Duplicates: 0 Warnings: 387

mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 100000,100000;
Query OK, 100000 rows affected, 25254 warnings (11.19 sec)
Records: 100000 Duplicates: 0 Warnings: 25254

mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 1000000,1000000;
Query OK, 1000000 rows affected (1 min 48.82 sec)
Records: 1000000 Duplicates: 0 Warnings: 0
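To compare the four runs, the timings can be turned into rows per second. The sketch below only does that arithmetic on the figures reported by MySQL; nothing else is measured or assumed.

```python
# (rows inserted, elapsed seconds) copied from the four runs above;
# "1 min 48.82 sec" is written as 60 + 48.82.
runs = [
    (1_000, 0.08),
    (10_000, 0.92),
    (100_000, 11.19),
    (1_000_000, 60 + 48.82),
]

rates = [rows / seconds for rows, seconds in runs]
for (rows, seconds), rate in zip(runs, rates):
    print(f"{rows:>9,} rows in {seconds:7.2f} s -> {rate:,.0f} rows/s")
```

The rates stay between roughly 9,000 and 12,500 rows per second, so the insert time grows close to linearly with the batch size in these runs.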
