Question: What do you know about Cyveillance?
starcom
Feb 1st, 2006, 10:02 AM
Yesterday my site was visited by Cyveillance.
I've read that Cyveillance is a company that searches for unauthorised copies of copyrighted music on behalf of the music industry (RIAA).
I'm completely safe as I play only podsafe music and royalty free production music in my podcasts, but it did catch my attention.
Has anyone had any experience with Cyveillance?
bg
Slone
Feb 1st, 2006, 11:14 AM
Deleted my last post without checking my htaccess filters...
Cyveillance has sucked up a lot of my pages before, so I have the following in my htaccess if you just want to cut and paste it into yours. The first two conditions block their IP ranges, and the third blocks their spider by user agent.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^38\.118\.25\.(5[6-9]|6[0-3])$ [OR]
RewriteCond %{REMOTE_ADDR} ^38\.118\.42\.3[2-9]$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cyveillance
# Send 403 Forbidden to anything matched above
RewriteRule .* - [F]
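If the IP patterns in those RewriteCond lines look cryptic, here is a rough way to see which addresses they cover. This is only a sketch in Python, assuming the alternation character in the first pattern is a plain `|`; the helper names `is_blocked` and `blocked_last_octets` are made up for illustration.

```python
import re

# The two address patterns from the RewriteCond lines,
# with the alternation written as a plain "|" (an assumption).
PATTERNS = [
    re.compile(r"^38\.118\.25\.(5[6-9]|6[0-3])$"),
    re.compile(r"^38\.118\.42\.3[2-9]$"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address matches either pattern."""
    return any(p.search(ip) for p in PATTERNS)


def blocked_last_octets(prefix: str) -> list[int]:
    """List the final octets (0-255) under `prefix` that the patterns match."""
    return [n for n in range(256) if is_blocked(f"{prefix}.{n}")]


if __name__ == "__main__":
    for prefix in ("38.118.25", "38.118.42"):
        octets = blocked_last_octets(prefix)
        print(f"{prefix}.{octets[0]} to {prefix}.{octets[-1]} "
              f"({len(octets)} addresses)")
```

So each pattern covers a block of eight consecutive addresses, not a single IP.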
Hope this helps!
Scott
mongrel
Feb 1st, 2006, 11:59 AM
Rock on!
Would that be at the root level?
Slone
Feb 1st, 2006, 12:46 PM
Yes, in www or htdocs or wherever you store your HTML files.
I'd provide technical assistance, but I don't want to be blamed for any server issues. If you know your way around htaccess, though, the following code will work fine for you. (RewriteEngine On only needs to appear once, before the first rule.)
Block the IP address
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^38\.118\.25\.(5[6-9]|6[0-3])$ [OR]
RewriteCond %{REMOTE_ADDR} ^38\.118\.42\.3[2-9]$
RewriteRule .* - [F]
Block the Spider
RewriteCond %{HTTP_USER_AGENT} ^Cyveillance
RewriteRule .* - [F]
htaccess is better than the usual robots.txt file: the server enforces htaccess rules, while robots.txt only asks a bot to stay away.
For newbies to htaccess - It's easy! Here are some tutorials.
http://www.javascriptkit.com/howto/htaccess.shtml (pretty good one)
http://httpd.apache.org/docs/1.3/howto/htaccess.html
http://www.freewebmasterhelp.com/tutorials/htaccess/
Hope this helps...
Slone
Feb 1st, 2006, 12:48 PM
P.S. Keep an eye on spider and bot IP addresses so your filters stay up to date when those change. A lot of companies realize they are banned from many sites and change their IP addresses or their spider/bot name.
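One way to keep an eye on this is to scan your access log for the spider's name and see which addresses it is coming from. A minimal sketch in Python, assuming your host writes Apache's "combined" log format (user agent as the last quoted field); the sample lines are made up and `ips_for_agent` is just an illustrative helper.

```python
import re
from collections import Counter

# Pulls the client address and the final quoted field (the user agent)
# out of a line in Apache's "combined" log format (assumed here).
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def ips_for_agent(lines, needle):
    """Count requests per client IP whose user agent contains `needle`."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and needle.lower() in m.group("agent").lower():
            hits[m.group("ip")] += 1
    return hits


# Made-up sample lines for illustration only; not real log data.
SAMPLE = [
    '38.118.25.59 - - [01/Feb/2006:10:02:11 -0500] '
    '"GET /show1.mp3 HTTP/1.1" 200 1048576 "-" "Cyveillance"',
    '38.118.25.59 - - [01/Feb/2006:10:02:15 -0500] '
    '"GET /index.html HTTP/1.1" 200 5120 "-" "Cyveillance"',
    '192.0.2.10 - - [01/Feb/2006:10:03:40 -0500] '
    '"GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, count in ips_for_agent(SAMPLE, "cyveillance").most_common():
        print(ip, count)
```

Any address that shows up here but is not covered by your filters is a candidate for a new rule. A bot that changes its user agent string will of course not show up in this search.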
starcom
Feb 1st, 2006, 01:55 PM
I just talked to GoDaddy.com, which is my host, and they suggested doing a robots.txt file.
Is there any way to exclude Cyveillance using a robots.txt file?
bg
Slone
Feb 1st, 2006, 02:07 PM
GoDaddy is wrong and out of the loop... remember you probably spoke with a level I tech, who is probably going off a manual ;)
A robots.txt file is often ignored just because it is so easy to ignore.
If someone is searching for copyright violations, whether that's Cyveillance or even the RIAA itself, their team is going to ignore robots.txt for sure.
.htaccess is your most secure bet here! …and if I may add: to make your .htaccess more secure, be sure to enter this below.
<Files .htaccess>
deny from all
</Files>
This prevents anyone from viewing your .htaccess file through the web server.
Cheers,
Scott
Slone
Feb 1st, 2006, 02:11 PM
Oh and if you must use robots.txt
1.) Create a file called: robots.txt
2.) Paste in the following: (plus a few others you should include)
User-agent: *
Disallow: /cgi-bin/

User-agent: Cyveillance
Disallow: /
You're done! Now upload it via FTP to the folder where your web files are...
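Before uploading, it can be worth checking what a well-behaved crawler would make of the file. A rough sketch using Python's standard `urllib.robotparser`, assuming a file with just a /cgi-bin/ record for everyone and a whole-site record for the agent named Cyveillance; `allowed` is a made-up helper name. This only shows what a bot that obeys robots.txt would do, it does not stop one that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt contents (an assumption for this sketch):
# keep everyone out of /cgi-bin/, and keep the agent named
# "Cyveillance" out of the whole site.
RULES = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: Cyveillance
Disallow: /
"""


def allowed(agent: str, path: str, rules: str = RULES) -> bool:
    """Ask Python's robots.txt parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Cyveillance", "Googlebot"):
        for path in ("/index.html", "/cgi-bin/form.pl"):
            print(agent, path, allowed(agent, path))
```

Be careful with a record that pairs `User-agent: *` with `Disallow: /`: it tells every compliant crawler, search engines included, to stay off the whole site.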
starcom
Feb 1st, 2006, 02:25 PM
Thanks for the help!
In appreciation....I've subscribed via iTunes and will listen to the show tomorrow.
I also found a great resource on writing robots.txt files for anyone who is new to it like I was.
http://www.searchengineworld.com/robots/robots_tutorial.htm
bg
Slone
Feb 1st, 2006, 02:38 PM
Hope you enjoy the show...
Happy to help, downloading your show as we speak :D
Cheers!
Scott
ElNacho
Feb 1st, 2006, 04:23 PM
wait, why should people block their IPs and robots and such?
Slone
Feb 1st, 2006, 04:50 PM
Good question:
In this case 'Cyveillance' sucks a lot of bandwidth, which is valuable for podcasters. Why would you give up your bandwidth to someone scraping your site for copyrighted material... they don't pay for it! It's a poorly designed spider anyway.
More than anything - filtering who scrapes your content helps protect your content. Perhaps you have a membership side of your site, and want to keep things secure.
It's all part of taking your site management to the next level.
Scott
ElNacho
Feb 1st, 2006, 04:59 PM
if i just blocked the 3 IPs from my cPanel, would that do the same as the .htaccess? does it just add those blocked IPs to my .htaccess?
edit: whaa? I could have sworn there was a "Block IP" button...huh...
ElNacho
Feb 1st, 2006, 05:06 PM
do i just add
RewriteCond %{REMOTE_ADDR} ^38\.118\.25\.(5[6-9]|6[0-3])$ [OR]
RewriteCond %{REMOTE_ADDR} ^38\.118\.42\.3[2-9]$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cyveillance [OR]
to the .htaccess file in the index of my site?
what's with
deny from 38.118.25.uh...
deny from 38.118.42.uh...
well replace the uh...with...i cant really understand that ip bit there
would that work?
just...what do i add to my .htaccess file?
right now it looks like this:
RewriteCond %{REMOTE_ADDR} ^38\.118\.25\.(5[6-9]|6[0-3])$ [OR]
RewriteCond %{REMOTE_ADDR} ^38\.118\.42\.3[2-9]$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cyveillance [OR]
that enough?
Slone
Feb 1st, 2006, 05:15 PM
I don't want to find myself advising on how to set up a good htaccess, for fear that you make a mistake and I end up in a support situation... I'm sure you can understand. Use the links posted above for help and tutorials.
All that aside:
RewriteCond is much better than deny, in my opinion.
deny from 00.0.000.00 - It's less code, but not as native as a rewrite.
It's personal preference, so it's your call. If you find yourself with a huge filter list, 'deny' may be the way to go.
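If you do go the deny route, the address ranges need to be written as networks, not regex patterns. A small sketch in Python that works out the CIDR blocks, assuming the ranges are the ones the two RewriteCond IP patterns in this thread cover (last octets 56-63 and 32-39); `deny_lines` is a made-up helper name.

```python
import ipaddress

# Assumed ranges: the addresses covered by the two RewriteCond IP
# patterns quoted in this thread (last octets 56-63 and 32-39).
RANGES = [
    ("38.118.25.56", "38.118.25.63"),
    ("38.118.42.32", "38.118.42.39"),
]


def deny_lines(ranges):
    """Turn (first, last) address pairs into 'deny from <CIDR>' lines."""
    lines = []
    for first, last in ranges:
        nets = ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
        )
        lines.extend(f"deny from {net}" for net in nets)
    return lines


if __name__ == "__main__":
    print("\n".join(deny_lines(RANGES)))
```

Each range collapses to a single /29 block, so the deny version comes out as two lines.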