BlueLinking Case Variants

1. Add a page_lookup table with the following schema (defined in extensions/title_updater.php):

page_lookup
--------
page_id  -- new
page_title_ci -- normalized title (now more than just case-insensitive: only alphanumerics, '.', and '-' are kept)
page_title
page_len
page_namespace -- new
page_is_redirect -- (just in case we want to update Parser to use this table too)
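
A minimal sketch of how the page_title_ci value could be derived from a title, following the rule stated in the schema above. The function name lookup_key is hypothetical, and treating "alphanumeric" as ASCII letters and digits only is an assumption; the real code in extensions/title_updater.php may differ.

```python
import re

# Assumption: "alphanumeric" means ASCII a-z and 0-9 after lowercasing.
_DROP = re.compile(r"[^a-z0-9.\-]")

def lookup_key(title):
    """Hypothetical helper: lowercase, then keep only a-z, 0-9, '.' and '-'."""
    return _DROP.sub("", title.lower())

# Titles taken from the test cases in these notes.
for title in ["Main Page", "G o o g l e", "(goo_gle)", "Yowsers!#$%", "go.ogl-e"]:
    print(repr(title), "->", repr(lookup_key(title)))
```

Under this rule "G o o g l e" and "(goo_gle)" share a key with "Google", while "go.ogl-e" keeps its '.' and '-' and therefore does not.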


2a. Populate the table (page_lookup.sql).

2b. Build triggers that keep the table populated (page_lookup_triggers.sql).

The following also need to be updated to use this table:

  • Apache rewrite (search only the default namespace)
  • Special AboutUs search (should order by namespace)

MySQL (before 5.7.2) doesn't support multiple triggers for the same event and action time on the same table, so we have to take out the page_map triggers in order to add our page_lookup triggers.
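
A rough sketch of the trigger idea, using SQLite through Python's sqlite3 module as a stand-in for MySQL. This is not the contents of page_lookup_triggers.sql: the trigger names, the column types, and the lookup_key normalizer are all assumptions made for illustration, and in MySQL the normalization would have to be written in SQL or as a stored function.

```python
import re
import sqlite3

def lookup_key(title):
    # Hypothetical normalizer: lowercase, keep only a-z, 0-9, '.' and '-'.
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

conn = sqlite3.connect(":memory:")
# Expose the Python normalizer to SQL so the triggers can call it.
conn.create_function("lookup_key", 1, lookup_key)

conn.executescript("""
CREATE TABLE page (
    page_id          INTEGER PRIMARY KEY,
    page_namespace   INTEGER NOT NULL,
    page_title       TEXT    NOT NULL,
    page_len         INTEGER NOT NULL DEFAULT 0,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE page_lookup (
    page_id          INTEGER PRIMARY KEY,
    page_title_ci    TEXT    NOT NULL,
    page_title       TEXT    NOT NULL,
    page_len         INTEGER NOT NULL,
    page_namespace   INTEGER NOT NULL,
    page_is_redirect INTEGER NOT NULL
);
-- One trigger per event keeps page_lookup in step with page.
CREATE TRIGGER page_lookup_ai AFTER INSERT ON page
BEGIN
    INSERT INTO page_lookup
    VALUES (NEW.page_id, lookup_key(NEW.page_title), NEW.page_title,
            NEW.page_len, NEW.page_namespace, NEW.page_is_redirect);
END;
CREATE TRIGGER page_lookup_au AFTER UPDATE ON page
BEGIN
    UPDATE page_lookup
       SET page_title_ci    = lookup_key(NEW.page_title),
           page_title       = NEW.page_title,
           page_len         = NEW.page_len,
           page_namespace   = NEW.page_namespace,
           page_is_redirect = NEW.page_is_redirect
     WHERE page_id = OLD.page_id;
END;
CREATE TRIGGER page_lookup_ad AFTER DELETE ON page
BEGIN
    DELETE FROM page_lookup WHERE page_id = OLD.page_id;
END;
""")

query = "SELECT page_title_ci, page_title FROM page_lookup"
conn.execute("INSERT INTO page (page_id, page_namespace, page_title, page_len) "
             "VALUES (1, 0, 'Main_Page', 120)")
rows_after_insert = conn.execute(query).fetchall()
conn.execute("UPDATE page SET page_title = 'Main_Page_2' WHERE page_id = 1")
rows_after_update = conn.execute(query).fetchall()
conn.execute("DELETE FROM page WHERE page_id = 1")
rows_after_delete = conn.execute(query).fetchall()
print(rows_after_insert, rows_after_update, rows_after_delete)
```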

3. Then we change the check for whether a link's target exists so that it looks in the page_lookup table (not the page table):

  • if a link has a perfect title match, make it blue
  • if a link has no match, make it red
  • if a link has a casespace match (same page_title_ci) but no exact-case match, we fix the link on the page and save a new revision.

Relevant code: Parser::replaceLinkHolders
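
The three outcomes in step 3 can be sketched as follows. This is an illustration in Python, not the PHP parser code: classify, split_link, lookup_key, and the NAMESPACES set are hypothetical, and matching only within the link's own namespace is an assumption inferred from the test cases (for example, User:google stays red). The notes do not say which page wins when several share a casespace, so the sketch returns all candidates.

```python
import re

# Assumption: the namespaces that appear in the test pages of these notes.
NAMESPACES = {"User", "AboutUs", "Template", "Image"}

def lookup_key(title):
    # Hypothetical normalizer: lowercase, keep only a-z, 0-9, '.' and '-'.
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def split_link(link):
    # Split "Namespace:Title"; anything else is in the default namespace "".
    ns, sep, rest = link.partition(":")
    if sep and ns in NAMESPACES:
        return ns, rest
    return "", link

def classify(link, existing):
    """Return ("blue" | "fix" | "red", list of candidate page names)."""
    if link in existing:
        return "blue", [link]
    ns, title = split_link(link)
    key = lookup_key(title)
    candidates = sorted(
        page for page in existing
        if split_link(page)[0] == ns and lookup_key(split_link(page)[1]) == key
    )
    return ("fix", candidates) if candidates else ("red", [])

existing = {
    "Main Page", "User:Ethan", "AboutUs:Community_Portal",
    "Template:Sample Page", "Yowsers!#$%", "google", "Google",
    "GOOGLE", "G o o g l e", "(goo_gle)", "Image:BeautifulGirl.jpg",
}

for link in ["Main Page", "main page", "User: ethan", "gOOgle",
             "User:google", "go.ogl-e"]:
    print(link, "->", classify(link, existing))
```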


4. That check does not run on a cached page. That means we need to:

  • make sure that when a page gets created, every page with a redlink pointing into that page's casespace has its cache invalidated
  • put together a bot strategy to invalidate the existing cache so that all the pages in the system get updated.
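
One way to picture the first bullet: keep an index from each casespace key to the pages that currently hold a redlink into it, and consult it when a page is created. The sketch below is an in-memory illustration only; RedlinkIndex and its methods are hypothetical names, and a real implementation would presumably read the wiki's link tables and cache timestamps instead.

```python
import re
from collections import defaultdict

def lookup_key(title):
    # Hypothetical normalizer: lowercase, keep only a-z, 0-9, '.' and '-'.
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

class RedlinkIndex:
    """Tracks which pages hold a redlink into each (namespace, key) casespace."""

    def __init__(self):
        self._waiting = defaultdict(set)

    def record_redlink(self, source_page, namespace, title):
        # Called when rendering source_page produces a red link.
        self._waiting[(namespace, lookup_key(title))].add(source_page)

    def on_page_created(self, namespace, title):
        # Pages returned here should have their cache invalidated so the
        # link check runs again the next time they are rendered.
        return self._waiting.pop((namespace, lookup_key(title)), set())

index = RedlinkIndex()
index.record_redlink("AlwaysRed", "", "MinnieTheMoocher")
index.record_redlink("AlwaysRed", "", "nonsense")

to_invalidate = index.on_page_created("", "Minnie The Moocher")
print(to_invalidate)
```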


Test Cases

The following pages exist:

  • Main Page
  • User:Ethan
  • AboutUs:Community_Portal
  • Template:Sample Page
  • Yowsers!#$%
  • google
  • Google
  • GOOGLE
  • G o o g l e
  • (goo_gle)
  • Image:BeautifulGirl.jpg


There are links to the following pages:

I. existing as is (on a page named BlueAsIs): each of the above

II. doesn't exist, but there are others in casespace (on a page named BlueAsWillBe)

  • main page
  • @#$main page
  • User: ethan
  • Template:samplepage
  • yowsers
  • gOOgle
  • Image:beautiful_girl.jpg

III. no such page at all (on a page named AlwaysRed)

  • Ethan
  • Sample Page
  • User:google
  • AboutUs:Main Page
  • MinnieTheMoocher
  • go.ogl-e
  • nonsense
  • Image:beautiful.gif


Test Before

  1. All the links in Case I are blue and link to the correct page
  2. All the links in Case II and III are red


Test After

  1. All the links in Case I and II are blue and link to the correct page
  2. All the links in Case III are red


Process 1: Re-render each of the case pages by calling the rendering directly.
Process 2: Invalidate the cache on each of the case pages, then request the URL of each page.


Testing Rewrite

Should currently pass:

wagn.org rewrites to Wagn.org
Wagn.org rewrites to Wagn.org
WAGN.Org rewrites to Wagn.org
Wang.org (hmmm...) creates a new page (first time only, of course)


Should pass with edits:

Grass_Commons.org rewrites to GrassCommons.org
assumegoodfaith rewrites to AssumeGoodFaith
(assume_good_faith) rewrites to AssumeGoodFaith
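
The rewrite expectations above can be checked against a small sketch of the lookup. It is written in Python for illustration rather than as the actual Apache rewrite; build_rewrite_map, rewrite, and lookup_key are hypothetical names, and letting the first title win when two titles share a key is an arbitrary choice made here.

```python
import re

def lookup_key(title):
    # Hypothetical normalizer: lowercase, keep only a-z, 0-9, '.' and '-'.
    return re.sub(r"[^a-z0-9.\-]", "", title.lower())

def build_rewrite_map(default_namespace_titles):
    # Map lookup key -> canonical title (default namespace only).
    table = {}
    for title in default_namespace_titles:
        table.setdefault(lookup_key(title), title)
    return table

def rewrite(requested, table):
    # Canonical title, or None when no page shares the key
    # (in which case the request would create a new page).
    return table.get(lookup_key(requested))

table = build_rewrite_map(["Wagn.org", "GrassCommons.org", "AssumeGoodFaith"])

for requested in ["wagn.org", "WAGN.Org", "Wang.org", "Grass_Commons.org",
                  "assumegoodfaith", "(assume_good_faith)"]:
    print(requested, "->", rewrite(requested, table))
```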


Testing Special Search

The above pages all get returned instantly by special search (no rewrite...).

Testing Issues

In RunTests we set up a global $wgDatabase that points to the test database. However, the main wiki code apparently isn't using that database; it still uses the primary.


Relevant Performance Tests

mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 1000,1000;
Query OK, 1000 rows affected, 13 warnings (0.08 sec)
Records: 1000  Duplicates: 0  Warnings: 13
mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 10000,10000;
Query OK, 10000 rows affected, 387 warnings (0.92 sec)
Records: 10000  Duplicates: 0  Warnings: 387
mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 100000,100000;
Query OK, 100000 rows affected, 25254 warnings (11.19 sec)
Records: 100000  Duplicates: 0  Warnings: 25254
mysql> insert into page_lookup select page_id, lcase(replace(page_title, '_', '')), page_namespace, page_title, page_len, page_is_redirect from page order by page_id limit 1000000,1000000;
Query OK, 1000000 rows affected (1 min 48.82 sec)
Records: 1000000  Duplicates: 0  Warnings: 0
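
To compare the four runs above, the timings can be turned into rows per second. The row counts and times are copied from the MySQL output; the script itself is only an arithmetic aid.

```python
# (rows inserted, elapsed seconds) from the four runs above;
# 1 min 48.82 sec is written as 108.82 seconds.
runs = [(1_000, 0.08), (10_000, 0.92), (100_000, 11.19), (1_000_000, 108.82)]

rates = [rows / seconds for rows, seconds in runs]

for (rows, seconds), rate in zip(runs, rates):
    print(f"{rows:>9,} rows in {seconds:>7.2f} s -> {rate:>8,.0f} rows/s")
```

Throughput stays in the range of roughly 9,000 to 12,500 rows per second across all four batch sizes, so the populate step scales about linearly with row count in these runs.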

