[Haifux] [Haifux Meeting] Crawling in Lightning

Alon Altman alon at 8ln.org
Sun Feb 10 10:41:35 MSK 2008


Well, then you can just block that user agent, with the added benefit of
blocking IE users.

  Alon
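
For illustration, the "nice crawler" case from the quoted message below (a
fixed user agent and a stable IP address) could be filtered roughly like
this. It is a minimal sketch, not something presented in the thread: the
function name is_blocked, the ExampleBot user agent, and the address ranges
are all assumptions made up for the example.

```python
import ipaddress

# Hypothetical block lists; names and values are illustrative only.
BLOCKED_AGENT_SUBSTRINGS = ("ExampleBot",)
# 192.0.2.0/24 is a range reserved for documentation (TEST-NET-1).
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def is_blocked(user_agent, remote_addr):
    """Return True if the request matches a blocked user agent or address."""
    agent = user_agent.lower()
    if any(s.lower() in agent for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    ie7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
    # A crawler that announces itself is caught by its user agent.
    print(is_blocked("ExampleBot/1.0", "198.51.100.7"))
    # A crawler on a known address is caught even with a browser-like agent.
    print(is_blocked(ie7, "192.0.2.15"))
    # A browser-like agent from an unlisted address passes the filter.
    print(is_blocked(ie7, "198.51.100.7"))
```

The last call shows the limit of this approach: a crawler that sends a
browser user agent and rotates addresses looks the same as a real visitor,
and adding "MSIE 7.0" to the block list would also shut out genuine IE users.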

On 2/9/08, Tzafrir Rehan <tzafrir.r at gmail.com> wrote:
> Depends on how nice the crawler is...
>
> If it uses a specific user agent, respects robots.txt, and keeps a certain
> IP address, then you can block it using those methods.
>
> If it sends a user agent of "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT
> 5.2; .NET CLR 1.1.4322)", ignores robots.txt, crawls a page every 15 seconds
> or so, and switches IP addresses after a short while using anonymous
> proxies (also read: virus-infected computers worldwide), then no program, or
> human, can tell that it's not a human surfing.
>
> --
>
>    Tzafrir Rehan.
>
>
> On Feb 10, 2008 8:47 AM, Shahar Dag <dag at cs.technion.ac.il> wrote:
> > Hi
> >
> > OK, this sounds interesting, but what about the other side?
> > How can a webmaster block all those crawlers?
> > (I prefer a mail answer since I can't come to the lecture)
> >
> > Thanks
> > Shahar Dag
> >
> > _______________________________________________
> > I am looking for old vinyl records.
> > If you have any that you don't need, please mail me.
> >
> > Thanks
> > Shahar
> >
> >
> >
> >
> > ----- Original Message -----
> > From: "Eli Billauer" <eli at billauer.co.il>
> > To: "Haifa linux club" <haifux at haifux.org>
> > Cc: "linux-il" <linux-il at cs.huji.ac.il>
> > Sent: Saturday, February 09, 2008 3:15 PM
> > Subject: [Haifux Meeting] Crawling in Lightning
> >
> >
> > > Next Monday, 11th of February, at 18:30, the Haifa Linux Club will
> > > gather for a lightning talk session
> > >
> > >             Crawling in Lightning
> > >
> > > Abstract
> > >
> > > This is a show-me-the-source meeting, during which several one-liners
> > > and scripts will be presented. The core subject is methods for
> > > interacting with HTTP web servers ("faking Firefox") in order to fetch
> > > information, vote automatically in polls, etc.
> > >
> > > This meeting consists of several short talks by several speakers (*).
> > > The agenda is as follows, 5-10 minutes per item (subject to change):
> > >
> > > * A very short introduction to HTTP (mainly showing a typical session
> > > transcript)
> > > * GET
> > > * wget
> > > * curl
> > > * A script in Python with exception handling
> > > * A short script in Python for fetching mp3's
> > > * Perl script to rip image galleries (LWP) with cookie handling for
> > >   login
> > > * A Ruby script
> > > * Perl: Using the POST method to vote automatically
> > > * A Perl/Tk GUI script helping in developing crawlers
> > >
> > > (*) It turned out that there is more interest than experience in the
> > > field among Haifuxers. As a result, more than one of the items above
> > > will be delivered by yours truly.
> > >
> > > ======================================================
> > >
> > > We meet in Taub building, room 6. For location information see:
> > > http://www.haifux.org/where.html
> > >
> > > Attendance is free, and you are all invited!
> > >
> > > ======================================================
> > >
> > > Future Lectures:
> > >
> > > Tapping into the Fountain of CPUs---On Operating System Support for
> > > Programmable Devices, by Muli Ben-Yehuda, 25/2/2008
> > >
> > > ======================================================
> > >
> > > We are always interested in hearing your talks and ideas. If you wish to
> > > give a talk, hold a discussion, or just plan some event Haifux might be
> > > interested in, please contact us at webmaster at haifux.org
> > >
> > >
> > >
> > >
> > > =================================================================
> > > To unsubscribe, send mail to linux-il-request at cs.huji.ac.il with
> > > the word "unsubscribe" in the message body, e.g., run the command
> > > echo unsubscribe | mail linux-il-request at cs.huji.ac.il
> >
> >
> >
> > >
> >
> > _______________________________________________
> > Haifux mailing list
> > Haifux at haifux.org
> > http://hamakor.org.il/cgi-bin/mailman/listinfo/haifux
> >
>
>


