[Haifux] some additions and errata to today's lecture

Shachar Raindel shacharr at gmail.com
Tue Mar 15 12:00:02 MSK 2011


"Hijacking" the thread to a more general HD discussion.

Since there was an interest in SSD (flash) drives, here is a benchmark
of normal hard-drives and flash drives:

http://techreport.com/articles.x/19330/3

Two points which are easy to see in the graphs, and which were raised in
yesterday's discussion (a quick way to check them yourself is sketched
below):

* In normal (mechanical) hard drives, the first bytes of the disk are
read much faster than the last bytes - the outer tracks pass more data
under the head per rotation than the inner ones.

* In SSDs the read speed is *mostly* the same everywhere (though it
depends on the controller's behavior and the workload history of the
drive).
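
If you want to check this on your own machine, here is a rough sketch
with dd (run as root; /dev/sda and the skip offset are just examples -
the offset assumes a disk of roughly 500GB, so adjust it to yours):

    # read 100MiB from the start of the disk (outer tracks on a HDD)
    dd if=/dev/sda of=/dev/null bs=1M count=100 iflag=direct

    # read 100MiB from near the end (inner tracks); skip counts
    # bs-sized blocks
    dd if=/dev/sda of=/dev/null bs=1M count=100 skip=475000 iflag=direct

iflag=direct bypasses the page cache, so dd reports the speed of the
disk itself. On a mechanical drive the first command should show a
noticeably higher MB/s; on an SSD both should be about the same.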

--Shachar

On Tue, Mar 15, 2011 at 3:52 AM, guy keren <choo at actcom.co.il> wrote:
>
> someone reminded me about the "small trail through the linux kernel"
> link i mentioned. it is:
>
> http://www.win.tue.nl/~aeb/linux/vfs/trail.html
>
> note that it is from 2001 and relates to kernel 2.4 (or even older) -
> but the general picture has not changed much since then.
>
> you can find more up-to-date information about this in the book
> "understanding the linux kernel, 3rd edition" - in the VFS chapter.
>
> --guy
>
> On Tue, 2011-03-15 at 01:35 +0200, guy keren wrote:
> > 1. etzion asked about controlling the age of dirty pages before pdflush
> >    flushes them - the default value is 30 seconds, and can be seen by:
> >
> >     cat /proc/sys/vm/dirty_expire_centisecs
> >
> >     (the value there is in centiseconds, i.e. hundredths of a second,
> > so the default shows up as 3000). it can be changed by echoing the
> > desired time into that file, e.g. to change it to 40 seconds:
> >
> >      echo 4000 > /proc/sys/vm/dirty_expire_centisecs
> >
> > this parameter (and some other pdflush-related parameters) is described
> > in the link i mentioned during the meeting today - that talks about
> > configuring pdflush for write-intensive workloads:
> >
> >     http://www.westnet.com/~gsmith/content/linux-pdflush.htm
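> >
> > for illustration, here are the main pdflush-related knobs that link
> > discusses (the paths are as in the 2.6 kernels we talked about):
> >
> >     # age (in centiseconds) after which a dirty page gets flushed
> >     cat /proc/sys/vm/dirty_expire_centisecs
> >     # how often (in centiseconds) the flush daemon wakes up
> >     cat /proc/sys/vm/dirty_writeback_centisecs
> >     # % of memory that may be dirty before background flushing starts
> >     cat /proc/sys/vm/dirty_background_ratio
> >     # % of memory that may be dirty before write()-ers are throttled
> >     cat /proc/sys/vm/dirty_ratio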
> >
> > 2. if you read the above link - you'll see that when there are too
> > many dirty pages, the writing to disk is done directly by the
> > processes calling the write() system call (if you check their stack
> > traces, you'll see these processes waiting for the page transfer to
> > complete). this serves as a flow-control mechanism that slows down
> > the processes which fill up the dirty cache.
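> >
> > you can watch this throttling happen - start a large buffered write,
> > and watch the dirty-page counters in /proc/meminfo (the file name and
> > sizes here are just examples):
> >
> >     # terminal 1: dirty pages faster than the disk can absorb them
> >     dd if=/dev/zero of=/tmp/bigfile bs=1M count=4096
> >     # terminal 2: watch Dirty grow, until the writer gets throttled
> >     watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'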
> >
> > 3. regarding whether the write system call copies the data directly
> > into the page-cache, or into a temporary buffer - it indeed copies
> > the data directly into the page-cache. the generic write() system
> > call passes control to the file-system's code - this eventually
> > allocates the page-cache pages, maps each page into the kernel's
> > address space, and copies the user data into it. note: i checked this
> > for the ext3 file-system in kernel 2.6.18 - but the code it uses
> > resides in generic kernel code - so i expect other file-systems to
> > behave the same.
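> >
> > a quick way to see that write() returns once the copy to the
> > page-cache is done, without waiting for the disk, is to time the same
> > write with and without a sync at the end (file names are examples):
> >
> >     # returns almost immediately - the data only reached the page-cache
> >     time dd if=/dev/zero of=/tmp/test1 bs=1M count=256
> >     # conv=fsync makes dd wait until the data reaches the disk
> >     time dd if=/dev/zero of=/tmp/test2 bs=1M count=256 conv=fsync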
> >
> > 4. regarding the output of iostat -x:
> >
> >     the 'wrqm/s' (write requests merged per-second) field shows the
> > number of write requests that were merged into existing requests by
> > the elevator. i.e. if 3 requests were merged together, '2' will be
> > added to this counter (the first of these requests was not merged;
> > the other two were merged into it). this is at least what the code of
> > kernel 2.6.18 does.
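> >
> > to see the merging in action, generate sequential writes and run
> > iostat while they go on (replace sda with your device):
> >
> >     # sequential writes create many adjacent requests that the
> >     # elevator can merge - expect wrqm/s to be high relative to w/s
> >     dd if=/dev/zero of=/tmp/seqfile bs=4k count=100000 &
> >     iostat -x sda 1 5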
> >
> >
> > if i forgot some question, or you have a question about what we covered
> > today - please shout!
> >
> > --guy


