ConcernsPage/Archive/Archive1
I have a copyright and historical information concern
Some people here have expressed concern that they don't want either the description of their site or their contact information shown on AboutUs pages and further that deleting the information using the standard edit is insufficient because the history remains. We now have a way to get rid of the history and reinstate a domain page with a minimal set of information. This will prevent the page from getting recreated and contact information restored by the bot. It's not automated yet, but send me (ray@aboutus.com) any names you own and wish to have scrubbed in this manner and I will do them manually for now. --User:Ray King | talk 21:33, 27 August 2006 (PDT)
Additionally, upon normal creation of a page, the system has been adjusted to take much less descriptive text from the site and it now calls that out with "Excerpted from the website description" and the actual text is indented and italicized. You can see examples of how this is now working by looking at pages recently created. --User:Ray King | talk 22:54, 27 August 2006 (PDT)
I don't want the AboutUsBot to gather information from my site
If there is not already a page on the wiki for your site, then simply use one of the methods described here. If there is already a page, then see the paragraph above. --User:Ray King | talk 22:56, 27 August 2006 (PDT)
I don't want my address and other contact information shown
Send me an e-mail (ray@aboutus.com) and I will remove it and reinstate it with a shorter page that doesn't include that information or a history of it.
As a note, domain ownership contact information is publicly available at a number of other sites, some examples below:
In addition to this site, if you would like to prevent your information from showing up anywhere, you may wish to contact your domain name registrar and purchase a privacy service, which most of the major domain name registrars offer. That doesn't deal with historical information which may already be out there, but at least it deals with it going forward. --User:Ray King | talk 13:10, 28 August 2006 (PDT)
Put concerns below this section and I will do my best to address them above
Well, the changes to the bot and the manual deletion are a step in the right direction, but there's still nothing to prevent someone who has nothing whatsoever to do with a domain from editing and posting information which you cannot guarantee the accuracy of. The site still does not require domain owners to "opt in" to the listing, and so many will have no idea that you're still writing their personal information on an easy to search public site. However even this deletion will still leave a "placeholder" page on the site - this is not good enough - I don't want my domain listed on this site. Ever. You need a way to achieve this.
I hope you've got a lot of stuff in place to prevent spammers crawling the site, ripping off people's addresses, email addresses and phone numbers - information like that is very valuable to the spam/con community.
Some responses:
- It's a wiki and edits made by most people are constructive, but unfortunately, vandals and spammers do show up from time to time. We are watching changes carefully so we can spot and remove vandalism.
- On "full deletion", this is a wiki of websites, so someone else wanting to make a page for a given site can always do so. The "placeholder" prevents the bot from creating a more complete page with contact information at that time. Our users should have the right to write about and review any site they want to, just as anyone can blog about another site.
- We have gone to lengths to protect e-mail addresses. They default to a protected graphic and are only scrapable if later edited without the graphic tag. And even then we try to go back and put those in for people.
These things said, your points are still well taken and we will continue to work towards making this a productive exercise. Thanks --User:Ray King | talk 01:52, 28 August 2006 (PDT)
I don't think what you're doing here is either useful or ethical or even legal.
I was in shock to discover whole pages of several of my websites sitting here, some of them with my full name and address and even a street map to my house! Gee, thanks! It's been worth my while to get privacy protection for my registration when you guys just dig up old archived information from several years ago from a registrar I no longer use. Can you tell how furious I am?
Why did you do it? Did you ever stop to think that maybe we don't want our content to be replicated anywhere else? And most likely we don't want private information plastered in a wiki? As far as I'm concerned you're not authorized to disseminate this information, even if you somehow obtained it readily - though it must have been also rather unethically since all registrars I know discourage bulk enquiries.
Yeah, sure, if the information is public from the registrar anybody can get it - from that registrar, when asking for it specifically, if it's not protected by privacy protection as mine is. Not from a whatchamaycallit wiki that got a feed from god knows where!
Maybe you feel it's a public service or an exercise in web programming to find and extract all manner of information like that.
Before you guys dig yourselves into a really deep hole, stop it right there. You'd be well advised to get permission from site owners before you disseminate this information.
- I appreciate the fair and considered criticism, and am sorry that we are not on the same page. We believe that AboutUs is a valuable free resource to many people. Yes, the whois information is public and even without AboutUs, whois history is available from many other sources. But if you would prefer it not to be on this site, then please either edit it away or contact me with any further questions. --User:Ray King | talk 15:54, 24 August 2006 (PDT)
- Ray, the problem is that even if the information is "edited away", it's still in the page's history. If you hide the page histories from the world, that would go some distance toward meeting the objections many will have. Even better would be a "delete this entry and its history completely" button or other function. Maybe add something to your bot that looks for a top-level file with aboutus.org instructions, rather like robots.txt. That way you could let site authors mark their domain for purging from your wiki and have some reasonable assurance that the request came from them. But I think you're going to need some kind of opt-out mechanism, and fairly quickly, because your only other reasonable alternative is to dump every bit of information you've collected and rebuild it on a purely opt-in basis. --Eric A. Meyer
- Eric, yes, I am using a more or less standard implementation of MediaWiki and need to figure out how to make your request possible. I also agree that you are not the only one who will be asking this question. My current understanding is that search engines will only look at the current page, although users can go to the page history if they are so motivated. I will research this further and thanks again for your thoughts --User:Ray King | talk 05:24, 25 August 2006 (PDT)
How to stop the bot
How can I prevent your bot crawling my site and stealing my copyrighted content?
- The site won't re-create your page unless it is deleted. So please use the edit button and delete whatever you don't want to appear here --User:Ray King | talk 16:00, 24 August 2006 (PDT)
- Why should webmasters and content owners be responsible for editing out information for every domain they own when they did not want it here in the first place? I personally spent more than an hour removing all my information for about 30 domains. I was hardly impressed. I'll be even less impressed if my information is scraped and reposted. In fact, if my information is reposted, then I'll be seeking other remedies to protect my copyrighted material from being scraped and posted.
- Indeed. I don't care about editing the information after the fact, I don't want you to scrape my copyrighted content and break domain registrar agreements to begin with. Why don't you have your bot obey robots.txt and give it a decent UA in the first place? Should be pretty simple to do. As for editing your own content, seems your "recent changes patrol" likes to go a bit overboard (rather pointless to allow people to remove their own content if you are going to have that patrol, so make up your mind. Personally I'd say dump your patrol). I had a fight with one of them after attempting to edit my own page and he kept reverting it.
- Apologies on recentchanges patrol, the editor probably didn't know why the page was being erased. --User:Ray King | talk 22:30, 27 August 2006 (PDT)
- It's all well and good saying "use the edit button and delete whatever you don't want to appear here" as it's still present under that "history" button at the top. How do I get something removed completely, and what guarantee do you provide that the site software will not allow an entry for a domain with that name to be recreated in the future?
- We can now do that, see post at the top --User:Ray King | talk 22:30, 27 August 2006 (PDT)
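The request above, that the bot obey robots.txt and identify itself with a recognizable user agent, can be illustrated with a minimal sketch using Python's standard urllib.robotparser module. This is not the AboutUs bot's actual code. The user-agent token "AboutUsBot", the sample robots.txt rules, and the helper name may_fetch are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish. The token
# "AboutUsBot" is an assumed user-agent name, not a confirmed one.
ROBOTS_TXT = """\
User-agent: AboutUsBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/about.html"
    print(may_fetch(ROBOTS_TXT, "AboutUsBot", page))
    print(may_fetch(ROBOTS_TXT, "SomeOtherBot", page))
```

A crawler written this way would run the check before every request and skip any URL for which it returns False, which is what gives the site owner control before any content is copied.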
What about copyright?
Ray, even though I don't know you... and I have no reason to think this is made with "not-quite-honorable" intentions... I'm wondering, what about the copyright? Every single one of my sites has been scraped and you are displaying its content here, alongside the information on the domains being displayed (which I know is public through the whois), and I don't feel comfortable with this.
So please, is there an e-mail address where I can send you all of my domains to be deleted from this database, and can you instruct your bot to stop crawling my sites? Many thanks
- Yes. Also, there is no intention to violate copyright, but rather, like search engines do, provide a little text to help the user quickly understand more about the site, and then a link to the site (which in this case is the thumbnail). Also, the site is entirely editable, so while I'm happy to assist, you can also edit pages yourself and show as much or as little as you'd like. Thanks and let me know if I can be of further assistance --User:Ray King | talk 16:29, 24 August 2006 (PDT)
- I disagree with your comparison to search engines. Here's why. Search engines either accept URL submissions (opt in) or follow links, and in the process obey robots.txt files and/or meta tags (allowing the website owner to choose). They also honor removal requests by this method. They also give webmasters the opportunity to remove any images, or include them, by either robots.txt or meta tags. In other words, legit services put control over a site's content exactly where it belongs, in the hands of the site owner him/herself. On the other hand, AboutUs.org not only has zero information on how to completely remove domains and their associated history files, the only response I see so far is to "edit your own information".
- Another difference is that search engines will deliver you quality traffic for your product or service. Where is the benefit to webmasters or content owners by having their pages and content scraped and posted here -- information that can be edited by any anonymous user or worse, a malicious competitor, at any time they please? Where is the protection of copyright and the protection against unjust statements, for site owners? Most owners would certainly not have the time to constantly monitor unapproved/unverified changes to their listings. This particular aspect is rather infuriating that anyone can alter anyone else's listings anytime they like.
- I'm sorry, but to try and spin this site as being in the same context as a legit search engine service is hardly a valid comparison.
- Unless I am mistaken, this site neither respects robots.txt nor meta tags, and certainly not copyright. Quite frankly, this seems to be a site built on the work, copyrights, trademarks, and domains of others, for your own personal gain, and without acquiring anyone's consent (to opt in). At this point, I would not even consider this site as legitimate, but rather the results of a rogue harvesting bot.
- You have still not answered the original question... How to prevent the bot stealing my site content. Please provide the IP address it uses, so I can deny it access at the server level, or the UA string, so I can deny it access. The approach of "steal first, and if the webmaster doesn't like it he should edit it away" is no good, as many webmasters will not know this site exists. You should only include sites in your database who have clearly stated that they want to be here i.e. "opt-in" rather than "opt-out". Another difference between your site and a search engine, is that with a search engine, I don't need to know the name of the site - here, I do. So why don't I just go to the site?
- Basically, I do not approve of the contents of my site (even just a section of the about page) reproduced under a reference to my domain name, then placed on a page carrying adverts. I have adverts on my site - if people can read my content here, how do I get revenue?
- This is exactly the reason why I deem this site as one built for personal gain by the owner(s). Ray King (and it appears NameIntelligence/DomainTools.com are possibly involved), in my opinion, seems to have conspired to both profit and gain search engine relevance on the backs of others' domains, trademarks, service marks and copyrights -- for which they have neither any claim, nor have permission. Beyond the Google advertisements, I suspect this same data scraping may possibly be sold to marketers as well. In other words, this site benefits no one except the owner(s) and data harvesters. It certainly provides no redeeming benefit for the site owners whose content has been scraped. The lack of any official reasonable response throughout this discussion leads me to believe that the only way to ensure one's web properties, service marks, trademarks, and copyrights are permanently deleted from this site, and free of damages, may be via injunctive relief. This is, however, a weekend, so I believe it only fair to wait for an official response after the weekend.
- See latest post on top and send me the domains so I can remove the history for you --User:Ray King | talk 22:33, 27 August 2006 (PDT)
I'm Just Horrified
How can you justify creating a site that is based on stealing copyrighted content? Your site is sucking content from two of my sites, but has conveniently edited the copyright notices off its pages. Please implement a system to allow site owners to disallow their sites with robots.txt!
--User:lisavollrath | talk 16:29, 24 August 2006 (PDT)
I think this site, and its scraping methods, without providing any method of removal of entire domains and content by the owners of the content, is a recipe for a legal challenge. I am by no means a litigious person, but without a method to opt out entire domains, this system seems destined for lawsuits -- more especially with the ability of anonymous users being able to post negatively against a competitor unjustly. An opt out method is absolutely needed, and I would suggest rather quickly, in light of the negative attention this site is quickly gathering in various web circles.
- I think we need self-written prose. Other than that, I see no reason why this website would be infringing upon copyright.—♦♦ SʘʘTHING(Я) 13:30, 25 August 2006 (PDT)
- This is insane. I don't want to have to edit my entries. If I don't want to be here, I would expect that you would honor any requests to be removed from your DB. So please let us know how we can get our stuff PERMANENTLY removed.
- Uhhh, we don't want our personal information displayed and our copyrights broken? Drop the history, give us webmasters a way to opt-in (opt-out is so very spammer-like), and I think you'll please most of the people here.
- See latest post on top and send me the domains so I can remove the history for you. My apologies for not having this sooner --User:Ray King | talk 22:35, 27 August 2006 (PDT)
Respect Copyright
It's laughable that your edit page contains the warning:
"You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see Project:Copyrights for details). DO NOT SUBMIT COPYRIGHTED WORK WITHOUT PERMISSION!"
When your bot doesn't respect other people's copyright.
Plus, once copyrighted material or personal contact details are on here, they are permanently in the page history - so simply deleting the text doesn't work.
- Ironic, isn't it? They yell at us to not submit copyrighted work and yet they scrape our pages for a lot more text than a search engine does. Not to mention that a search engine is controllable when the AboutUs bot is not, except by banning it by IP. Copyright notices were even removed, according to one account up the page. I was able to successfully block the bot by adding 66.249.16.207 to my iptables on my server. Tested and AboutUs complained there was no text on the page when there actually was.
- Yes, blocking by IP at the server level is one method. However, it is possible to place the bot on any IP within their given IP block. That is, it does not necessarily need to run from the AboutUS.com/org domain's IP itself. Of course, given the somewhat shady scraping method of operation already, I would not hold my breath on disclosure of an IP range by which to block the bot(s). And blocking still does not provide the content owner relief from potential damages resulting from false accusations, unjust comments and discussion, or other harmful posts, contained in permanent history or otherwise. As such, I feel the only reasonable solution is to allow website owners to completely remove their domains in their entirety, including removal of all history, and make the service opt-in. Anything less is unacceptable.
- I absolutely agree, it was just one suggestion for the interim until we can work out how to stop this mess altogether. I for one would fully support aboutus.org going down completely, but since that is not likely an option without legal action, it was one suggestion to keep the bot at bay.
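The difference discussed above between blocking one address and blocking a whole address block can be sketched with Python's standard ipaddress module. The single address 66.249.16.207 is the one reported in this thread. The /24 range below is purely a hypothetical example, since no range for the bot was ever disclosed here, and the helper name is_blocked is likewise illustrative.

```python
import ipaddress

# The one address reported in this thread as having been blocked.
SINGLE_HOST = ["66.249.16.207"]

# ASSUMPTION: an illustrative /24 around that address. The bot's real
# address range is not stated anywhere in this discussion.
HYPOTHETICAL_RANGE = ["66.249.16.0/24"]


def is_blocked(client_ip: str, blocked: list[str]) -> bool:
    """Return True if client_ip falls inside any blocked address or network."""
    addr = ipaddress.ip_address(client_ip)
    # A bare address such as "66.249.16.207" is treated as a one-host network.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in blocked
    )


if __name__ == "__main__":
    neighbour = "66.249.16.208"
    print(is_blocked(neighbour, SINGLE_HOST))
    print(is_blocked(neighbour, HYPOTHETICAL_RANGE))
```

A single-address rule stops matching as soon as the crawler moves to a neighbouring address, which is the weakness raised above. A range rule covers that case but only helps if the operator's range is actually known.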
How do I prevent my site from turning up here in the first place?
Opting out after my information has been posted online just isn't cutting it for me. If you are going to provide this "service" (whom does this benefit?) you need to provide a way for webmasters to opt out BEFORE your bot steals copyrighted content and reposts it right alongside your address and a map to your house. I am sending an e-mail requesting that my domain be removed, but what about my websites that have not been posted yet?
- See AboutUsBot for info on robots.txt --User:Ray King | talk 22:36, 27 August 2006 (PDT)
THIS WEBSITE IS A HORRIBLE MISTAKE
Stealing content to drive ad revenue. Nice one, Ray. Way to totally miss the clueboat.
Hope you can keep up with all the negative comments that people are leaving. (Shouldn't that be a clue that this is a bad idea?)
- I shouldn't have to opt-out of a site before I know it exists. Blocking the robot is all well and good, but once it's ripped off my copyrighted content and dumped it in the page history, the damage is done.
- See top post for information on scrubbing the page history --User:Ray King | talk 22:37, 27 August 2006 (PDT)
No title
So, essentially, am I to understand that you have publicly lied about honoring requests for removal of domains by request? YOUR words were exactly as follows: If you are the site owner, just let me know and i'll be happy to delete the domain. --User:Ray King. And now you refuse after stating you will?
- Sure, this is a brand new beta site and I'm in the process of understanding people's concerns and trying to create policy that best addresses those concerns while at the same time develops the site. Rather than get into unprofessional comments back and forth, I'm trying to more clearly articulate fair and well thought out policies. If that's not happening fast enough, I can only apologize. We did post something on the AboutUsBot and Robots.txt just a couple of hours ago. If you wish to help, then do it in a constructive manner and I will engage as best I can. Also, I don't mind anonymous comments, but understand that if you don't leave your name I will take it less seriously. --User:Ray King | talk 20:34, 26 August 2006 (PDT)
- Ray, you not only have my name, but my e-mail address AND phone number, plus the list of domains (sent 3 times, of which you replied once). Given that I want nothing to do with this site once my domains are deleted, I have little interest in registering. The only thing I want from you, is to please honor my request by your own statement promising removal. -Robert
- Robert, A) I didn't know it was you that left this particular comment. B) You sent your first e-mail to the wrong address so I didn't get it, and then two e-mails in close proximity, and now you're deleting the pages yourself - none of which I've reverted. As I mentioned earlier, I am working on clearer policies to handle situations such as these and will post as soon as possible. --User:Ray King | talk 20:54, 26 August 2006 (PDT)
Copyright and Personal Information in the History
You have still not explained how to purge the history containing copyrighted content and/or personal information, nor have you explained how to opt out of the site forever, and prevent recreation of the page about a particular domain.
I still believe that the site should be "opt in" not "opt out" as even when following your instructions to "opt out" by deleting content, the copyrighted material stolen from other sites is still present in the history.
I do not want my domains listed on this site, I do not even want "placeholder" pages here to prevent you determining this as "vandalism", I do not want anyone else to be able to create a page about a domain I own once my domain has been removed from your service. Is this too much to ask? I think not.
- I will be posting a solution which I think will address much of your concern, but probably not all of it. --User:Ray King | talk 03:26, 27 August 2006 (PDT)
- I believe this site does not comply with the Acceptable Use Policy of your upstream host. You should have had these "solutions" in place before you went live and ripped people's copyrighted material from their own sites. Blocking the robot is only any good before the domain is indexed. Unfortunately, people generally only find out about this site after the fact - as my sites were ripped off before I even knew this site existed, blocking the robot is very much "shutting the gate after the horse has bolted" - far too little, far too late. You should not be ripping content off of other sites to provide content for your own site (which seems to serve little purpose other than to generate Google Ads revenue) without the permission of the copyright holder.
No Title
So you're scraping content from other places, slapping Google ads on it, and hoping the internet will fill in the rest so you can make a buck. Very classy.
I hope you listen to the comments here and call off this whole thing. It can only end in lawyers.
- Admittedly, I too was skeptical at first. But when you look closer at the potential this site has for networking and creating working relationships with similar websites which you may not otherwise have known about, I believe it's worth taking a second look. This site clearly has enormous potential. Ray, the founder, seems to take criticism well. In fact, he seems to welcome it to help shape the community itself, and he appears to be working towards reasonable resolutions to users' concerns. I think it's important to note that this site is in a BETA stage and constantly evolving as each day goes by. While I understand your points of concern, I think you will find your concerns do not go unnoticed here, and that this site will ultimately become one that everyone will "want" to be listed in, rather than removed from. I think it just has to evolve, and right now it is experiencing growing pains. Just my $0.02 --Robert
- Thanks for the note --User:Ray King | talk 01:33, 28 August 2006 (PDT)