It depends on how well-behaved the crawler is...<br><br>If it sends a distinctive user agent, respects robots.txt, and keeps a fixed IP address, then you can block it on any of those: a robots.txt rule, a user-agent filter, or an IP block.<br><br>If it sends a user agent of "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)", ignores robots.txt, fetches a page only every 15 seconds or so, and switches IP addresses after a short while using anonymous proxies (read: virus-infected computers worldwide), then no program, or human, can tell from the server side that it isn't a person browsing.<br>
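<br>To make the two cases concrete, here is a rough Python sketch, not a real defence. The bot name "ExampleBot", the robots.txt contents, and the blocked network 192.0.2.0/24 are all made-up values for illustration; the two helper functions are hypothetical as well. It only uses the standard library (urllib.robotparser and ipaddress).<br><br>

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical values, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /
"""
BLOCKED_AGENT_SUBSTRINGS = ("examplebot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)

# The browser-like user agent quoted in the mail.
STEALTH_UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; "
              ".NET CLR 1.1.4322)")


def polite_crawler_may_fetch(user_agent, url):
    """What a robots.txt-respecting crawler would decide for itself."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_should_block(user_agent, remote_addr):
    """Server-side check: refuse by user-agent substring or source network."""
    agent = user_agent.lower()
    if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # A nice crawler identifies itself and is stopped by either method.
    print(polite_crawler_may_fetch("ExampleBot", "http://example.com/page"))
    print(server_should_block("ExampleBot/1.0", "203.0.113.7"))
    # A crawler faking a browser from a fresh address passes both checks.
    print(polite_crawler_may_fetch(STEALTH_UA, "http://example.com/page"))
    print(server_should_block(STEALTH_UA, "203.0.113.7"))
```

<br>The second pair of calls is the point of the mail: once the crawler borrows a browser's user agent and moves to an address that is not on the list, neither check has anything to match on.<br>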
<br>--<br><br> Tzafrir Rehan.<br><br><div class="gmail_quote">On Feb 10, 2008 8:47 AM, Shahar Dag <<a href="mailto:dag@cs.technion.ac.il">dag@cs.technion.ac.il</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi<br><br>OK, this sounds interesting, but what about the other side?<br>How can a webmaster block all those crawlers?<br>(I prefer a mail answer since I can't come to the lecture)<br><br>Thanks<br>Shahar Dag<br>_____________________________________________________________________________________________<br>
I am looking for old vinyl records.<br>If you have any that you don't need, please mail me<br><br>Thanks<br>Shahar<br><div><div></div><div class="Wj3C7c"><br>----- Original Message -----<br>From: "Eli Billauer" <<a href="mailto:eli@billauer.co.il">eli@billauer.co.il</a>><br>
To: "Haifa linux club" <<a href="mailto:haifux@haifux.org">haifux@haifux.org</a>><br>Cc: "linux-il" <<a href="mailto:linux-il@cs.huji.ac.il">linux-il@cs.huji.ac.il</a>><br>Sent: Saturday, February 09, 2008 3:15 PM<br>
Subject: [Haifux Meeting] Crawling in Lightning<br><br><br>> Next Monday, 11th of February, at 18:30, the Haifa Linux Club will gather<br>> for a lightning talk session<br>><br>> Crawling in Lightning<br>
><br>> Abstract<br>><br>> This is a show-me-the-source meeting, during which several one-liners and<br>> scripts will be presented. The core subject is methods for interacting<br>> with HTTP web servers ("faking Firefox") in order to fetch information,<br>
> vote automatically in polls etc.<br>><br>> This meeting consists of several short talks, by several speakers (*). The<br>> agenda is as follows, 5-10 minutes per item (subject to change):<br>><br>> * A very short introduction to HTTP (mainly showing a typical session<br>
> transcript)<br>> * GET<br>> * wget<br>> * curl<br>> * A script in Python with exception handling<br>> * A short script in Python for fetching mp3's<br>> * Perl script to rip image galleries (LWP) with cookie handling for login<br>
> * A Ruby script<br>> * Perl: Using the POST method to vote automatically<br>> * A Perl/Tk GUI script helping in developing crawlers<br>><br>> (*) It turned out that there is more interest than experience in the field<br>
> among Haifuxers. As a result, more than one of the items above will be<br>> delivered by yours truly.<br>><br>> ======================================================<br>><br>> We meet in Taub building, room 6. For location information see:<br>
> <a href="http://www.haifux.org/where.html" target="_blank">http://www.haifux.org/where.html</a><br>><br>> Attendance is free, and you are all invited!<br>><br>> ======================================================<br>
><br>> Future Lectures:<br>><br>> Tapping into the Fountain of CPUs---On Operating System Support for<br>> Programmable Devices, by Muli Ben-Yehuda, 25/2/2008<br>><br>> ======================================================<br>
><br>> We are always interested in hearing your talks and ideas. If you wish to<br>> give a talk, hold a discussion, or just plan some event Haifux might be<br>> interested in, please contact us at <a href="mailto:webmaster@haifux.org">webmaster@haifux.org</a><br>
><br>><br>><br></div></div><div><div></div><div class="Wj3C7c">
</div></div></blockquote></div><br>