User:Llywrch/Essay

Thoughts after scrubbing the links in a thousand different articles

First, after looking at a thousand pages on AboutUs, I have to confess: there are a lot of webpages out there on a wide variety of topics. It's one of those things I remember noticing when I first became interested in the World Wide Web back in 1994, but it's something easy to forget. After all of this time using the resources of the Internet, it's quite easy to just hunker down with a few useful websites and rely on an outdated impression of the rest of the Web to shape one's views.

It would be nice to think that AboutUs is helping its users get a handle on this wealth of information out there, providing an overview of the websites and what they contain, as well as providing links to related sites either by "Categories" or the selection of sites under "Related Domains".

However, the power of the "Category" links has yet to be fully harnessed. In part I feel this is because we haven't discussed how we want to use the potential of this tool. There are, generally speaking, two ways that categories could be set up. The first is to build a tree-structure out of them, a hierarchy where something like "Business and Economy" would be the top category, under which would be categories like "Construction", "Technology" and "Internet", & under these another layer of categories like "Home Construction", "Heavy Machinery", "Construction Engineering" and so forth. There are strengths and weaknesses with this approach. For example, there would be a tendency for a large number of overlapping categories to pile up at the bottom of each page, which users would want to simplify down to a few essential ones; thus Intel.com might not have a "Computer" category on it because it already has a "Motherboard" tag and "Motherboard" is a subset of "Computer".
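
To make the Intel.com example concrete, here is a minimal sketch of how a tree of categories could be used to drop tags that a more specific tag already implies. The PARENT table is hypothetical: only "Motherboard" under "Computer" and "Technology" under "Business and Economy" come from the discussion above, and the other links are assumptions made up for illustration.

```python
# Hypothetical child -> parent links. The real hierarchy would have to be
# agreed on first; "Computer" under "Technology" is an assumption.
PARENT = {
    "Motherboard": "Computer",
    "Computer": "Technology",
    "Technology": "Business and Economy",
    "Home Construction": "Construction",
    "Construction": "Business and Economy",
}


def ancestors(category):
    """Return every category above `category` in the tree."""
    found = set()
    while category in PARENT:
        category = PARENT[category]
        found.add(category)
    return found


def prune_redundant(categories):
    """Drop any category already implied by a more specific one on the same page."""
    implied = set()
    for cat in categories:
        implied |= ancestors(cat)
    return sorted(set(categories) - implied)


# A page tagged at three levels of the same branch keeps only the deepest tag.
pruned = prune_redundant(["Computer", "Motherboard", "Technology"])
print(pruned)
```

The trade-off is the one described above: the page ends up with fewer tags, but a reader browsing "Computer" only finds the page if the software walks down the tree for them.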

The other way to structure categories would be to think of them as a searchable database of metadata tags. One could then find information by using queries to narrow the search; for example, asking the search function for all of the pages in the category "Web 2.0" and "Portland" but not "Maine". This would require a lot of work to clean up the categories -- but so would any scheme to organize these categories. We simply need to decide how to do this.
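
As a rough illustration of the query just described, the sketch below treats each page as a set of tags and filters on tags that must be present and tags that must be absent. The page names and the find_pages function are invented for the example; this is not how the AboutUs search actually works.

```python
# Hypothetical data: page name -> set of category tags (illustrative only).
pages = {
    "example-startup.com": {"Web 2.0", "Portland", "Oregon"},
    "example-harbor.com": {"Web 2.0", "Portland", "Maine"},
    "example-bakery.com": {"Portland", "Restaurants"},
}


def find_pages(pages, include, exclude=()):
    """Return sorted page names carrying every tag in `include` and none in `exclude`."""
    include, exclude = set(include), set(exclude)
    return sorted(
        name
        for name, tags in pages.items()
        if include <= tags and not (exclude & tags)
    )


# "Web 2.0" and "Portland" but not "Maine"
result = find_pages(pages, include=["Web 2.0", "Portland"], exclude=["Maine"])
print(result)
```

A query like this only works if the tags are applied consistently, which is why the cleanup discussed next matters whichever scheme is chosen.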

Cleaning these categories -- purging out all of the junk and duplicate categories -- is another priority. At one time I thought that this could only be done by hand, that there are just too many cases where a live human being must make a call for a bot to usefully make inroads on this problem. Having done a fair sample of this work myself, I now see that there are some areas where a bot would help greatly -- & without causing any harm:

  • There are a lot of categories that are a simple preposition, like "For", "in", "During" -- as well as other useless terms like "And", "Or" and "Best". And then every letter of the alphabet (as well as a number of acronyms) has its own category, most of which could easily be purged. (The important exception is "C", which in many cases refers to the programming language.)
  • Does anyone find the category "Cheap" useful? Think about it: it could be made to be useful if we structure categories like a database as I proposed above. Or we might find that it is useless. This is something that needs to be discussed before someone decides to be bold and start purging it from very many pages.
  • There are a lot of duplicate categories, like "User group" and "user groups". I'm partial to the latter, but if we favor the plural form over the singular (e.g. "Restaurants" over "Restaurant"), we should do it for every category. While it would be easy to stop all work on this by endless indecisive talk, even to announce that "it's going to be this way unless someone speaks up" pushes the matter to a consensus. And gives some of us (okay, me) confidence to proceed in making sweeping changes.
  • On the other hand, going through these categories has proved to me that there are some that are badly needed. For example, I've found the need for "Commercial Real Estate" and "Century 21" (a chain of realtors, whose members have over a thousand websites). I'm certain that a little more persistent work slogging through pages will reveal more of the same.
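
The safer parts of that list could be sketched for a bot roughly as follows. This is only an illustration under stated assumptions: the junk word list is taken from the examples above, the function names are made up, and the singular/plural matching is a naive "strip one trailing s" rule rather than real stemming, so it only reports candidates for a human to review. "Cheap" is deliberately left alone, since that one needs discussion first.

```python
import string

# Junk terms named in the list above; a real list would need agreement first.
JUNK_WORDS = {"for", "in", "during", "and", "or", "best"}
KEEP_LETTERS = {"C"}  # "C" often refers to the programming language


def is_junk(category):
    """True for categories that are junk words or stray single letters."""
    name = category.strip()
    if len(name) == 1 and name.upper() in string.ascii_uppercase:
        return name.upper() not in KEEP_LETTERS
    return name.lower() in JUNK_WORDS


def merge_key(category):
    """Naive grouping key: lowercase, minus one trailing 's' (an assumption)."""
    key = category.strip().lower()
    return key[:-1] if key.endswith("s") and len(key) > 3 else key


def find_duplicates(categories):
    """Group non-junk categories that differ only by case or a plural 's'."""
    groups = {}
    for cat in categories:
        if not is_junk(cat):
            groups.setdefault(merge_key(cat), []).append(cat)
    return {key: names for key, names in groups.items() if len(names) > 1}


cats = ["For", "in", "C", "Q", "User group", "user groups",
        "Restaurant", "Restaurants", "Cheap"]
junk = [c for c in cats if is_junk(c)]
dups = find_duplicates(cats)
print(junk)
print(dups)
```

Which spelling wins in each duplicate group is left out on purpose: that is the policy question raised above, not something a bot should decide.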

Now for "Related Domains".

It would be easy to make this no more than a duplicate of "Categories" -- but at the same time, it has the potential to provide connections that "Categories" do not. As an example, look how they are used on OVP.com: there the "Related Domains" are grouped not only by associated businesses, but also by clients. Unfortunately, the weakness of this feature is that for it to be truly useful, this section has to be created by hand on every page. As a result, for many of the pages where I have modified this section, I have tended to copy the section from one page to another. Not the best solution, but in many cases I feel I have left things better than I found them.