1. [crit] (28)No space left on device — problem

    Today I got a call from a friend saying he was unable to restart a crashed Apache. I logged in, tailed the Apache error log, and saw the following error:

    [crit] (28)No space left on device: mod_jk: could not create jk_log_lock 

    After fiddling around with the created files, I simply searched for the above string (easy, right?). That pointed me to a link saying the cause may be that all available semaphores are in use and no more can be created.

    I searched for how to list semaphores and ran the following command as root:

    root# ipcs -s

    Now the next task was to clean all these semaphores.

    root# ipcrm -s <semaphore_id>

    was the command to use, but I had lots of semaphores to delete, so I wrote the following shell one-liner to do it:

    root# for i in `ipcs -s | grep apache | cut -d " " -f2`; do ipcrm -s $i; done

    The string to grep for depends on which software is failing to release its semaphores on exit. Take care not to remove a semaphore that is still in use.
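    A slightly more careful variant of the one-liner, as a minimal sketch: it matches the owner column exactly instead of grepping the whole line, and it assumes the usual Linux `ipcs -s` column layout (key, semid, owner, perms, nsems). The function name `sem_ids_for_owner` and the owner `apache` are illustrative only.

```shell
# Print the semaphore IDs owned by one user.
# Reads `ipcs -s` style output on stdin; assumed columns: key semid owner perms nsems.
sem_ids_for_owner() {
    owner="$1"
    # Only data rows start with a hex key; header and title lines are skipped.
    awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Dry run: print the ipcrm commands instead of running them.
# ipcs -s | sem_ids_for_owner apache | while read -r id; do echo ipcrm -s "$id"; done
```

    Printing the commands first makes it easy to check that no in-use semaphore is on the list before dropping the `echo`.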

  2. Windows troubleshooting utilities  →

  3. via @sarathmenon 

  4. MySql hot backups

    After some very painful experiences with MySQL backups, maintenance and restores, I finally stumbled upon Xtrabackup (I found it a year back, but it reached 1.0 only recently).

    I was using mysqldump plus “binary log copying” as an incremental backup, taken from a dedicated slave in a second datacenter (AWS EC2) separate from the main one (chosen simply because the second datacenter provides easy redundant file storage, read S3). I also have one more slave running in the main datacenter.

    With the log copying above, the worst case is the loss of one hour of data (fortunately that never happened). This backup is needed only when somebody runs a drop database on the master and it propagates to all its slaves, or when the master and all the slaves conk out.

    This strategy also makes restores slow, because loading the SQL files created by mysqldump takes its own sweet sequential time (though this can be reduced by splitting the dump file into per-table SQL files and loading them in parallel).

    I have just started playing with Xtrabackup on my home systems and found it very good, much better than other things I have evaluated, such as the Maatkit parallel backup script.

    After some searching I also found tests that people have published online. I will try similar tests at home.

    A few helpful links:

      1. Simple usage doc

      2. Few tips from users

  5. A 2 yr old article on server performance audit framework by Percona →

  6. epoll v/s poll - myth breaking -- via @vinayakh →

  7. Article about scalable networking  →

  8. Sudoku in lisp

    I wrote this when I was totally rusty with ‘parentheses’.

  9. Multilingual Text Stemmer - a naive comparison

    Stemmers are critical in any text mining, text analysis or information retrieval application, such as search or text classifiers. Below are the few I found that have been in use for some time (some for the last 30 years).

    1. Porter Stemmer - One of the oldest. Better ones are available, so it can be avoided as it is primitive (Porter2 is an improved version of it).

    2. Kstem - Available in Lucene; I feel this is good for search applications. The stems it produces are closer to real words.

    3. Snowball - Also available in Lucene and even in Sphinx. This is a collection of stemmers for different languages (e.g. Russian, Dutch, English, 2 German). It has the Porter stemmer in its collection too, and even a Lovins stemmer. This is 

  10. Creating User Defined Service on Windows  →

    You can delete your service with (ref):

    sc [<ServerName>] delete [<ServiceName>]

  11. Flash Security Sandbox and permission controls

    Following links should make things easy to understand.

    Security Sandboxes

    Permission Control 

    Allow Secure Domain

    Global Security Setting Panel

    Bitmap Drawing Security Violation

    Setting Sandbox Type

  12. One of my favorite comics; I use it to lecture my younger colleagues about SQL injection during code reviews.

visit xkcd

  13. Identifying process causing lots of IOWait

    The most common reason for high IOWait is a bottleneck on disk I/O. Other reasons could be high network traffic or even output to a terminal.

    Once you see high IOWait, the problem becomes identifying its cause.

    To do that you can use the block I/O debugging capability of the Linux kernel:

    echo 1 > /proc/sys/vm/block_dump

    After this run 

    dmesg | egrep "READ|WRITE|dirtied" |  awk '{print $1}'|  sort | uniq -c | sort -rn | head
       1583 kjournald(2764):
        545 kjournald(1023):
         48 beam.smp(21021):
         47 sendmail(20992):
         28 crond(20974):
    

    Now disable the block IO debugging

    echo 0 > /proc/sys/vm/block_dump

    Ignore kernel threads like kjournald (the background thread that journals all writes happening through ext3/4 filesystems); the remaining user processes should be on your target list (here beam.smp and sendmail). Thanks to Vijayr for pointing this out.

    The process causing the IOWait is sure to be in the output list. You can proceed from here with further investigation.

    pidstat from Sysstat 7.1.6 (available in the Debian repos but not on RHEL/CentOS) gives per-process I/O statistics:
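    The counting pipeline and the "ignore kernel threads" step can be combined, as a rough sketch. The function name `block_dump_top` and the default skip pattern are assumptions for illustration; extend the pattern for whatever kernel threads show up on your system. It also strips a leading dmesg timestamp, in case your dmesg prints one.

```shell
# Count block_dump events per process, skipping kernel threads.
# $1 is an optional regex of process names to ignore (the default is an assumption).
block_dump_top() {
    skip="${1:-^(kjournald|pdflush|jbd2)}"
    grep -E 'READ|WRITE|dirtied' |
        sed -E 's/^\[[ 0-9.]+\] *//' |   # drop a "[  12.345]" timestamp if present
        awk '{ print $1 }' |             # keep only "name(pid):"
        grep -Ev "$skip" |
        sort | uniq -c | sort -rn | head
}

# Usage, while block_dump is enabled:
# dmesg | block_dump_top
```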

    pidstat -d 2

    This gives a snapshot of the processes doing I/O every two seconds, which should help too.

    iotop is a top-like utility which shows who is doing the most I/O right now. It requires Linux 2.6.20 or later with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled.
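    When neither pidstat nor iotop is installed, kernels built with TASK_IO_ACCOUNTING also expose per-process counters in /proc/<pid>/io. A minimal sketch for reading them follows; the helper name `io_bytes` is made up for this example, and reading another user's file needs root.

```shell
# Print "read_bytes write_bytes" from a /proc/<pid>/io style file.
io_bytes() {
    awk 'BEGIN { r = 0; w = 0 }
         $1 == "read_bytes:"  { r = $2 }
         $1 == "write_bytes:" { w = $2 }
         END { print r, w }' "$1"
}

# Usage: io_bytes /proc/self/io
```

    The counters are cumulative for the life of the process, so take two readings a few seconds apart and compare them to see who is doing I/O right now.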

  14. Per CPU kernel threads appearing in top’s output

    Today a server suddenly stopped responding on port 80 and I got an alert. I sshed in and found lots of “ksoftirqd, watchdog, migration” in the output of top, with none of the app server processes taking any CPU.

    These are basically per-CPU kernel threads. If you have n CPUs you will find n threads of each of the above types (there are more per-CPU kernel threads, like events, aio, kblockd, cqueue, ata, kondemand, rpciod, kmpathd).

    Kernel threads can be seen in top or ps output; they have a zero VM size as they don’t have any userspace memory.

    So, about today’s problem-creating threads.
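    The zero-VM-size property gives a quick way to list them, sketched below. The function name `kernel_threads` is illustrative, and the filter is only a heuristic: zombie processes also report a VSZ of 0, and a command name containing spaces would be cut at the first space.

```shell
# List "pid name" for entries with zero virtual memory size.
# Reads `ps -eo pid,vsz,comm` output on stdin.
kernel_threads() {
    awk 'NR > 1 && $2 == 0 { print $1, $3 }'
}

# Usage: ps -eo pid,vsz,comm | kernel_threads
```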

     

           ksoftirqd  is  a  per-cpu  kernel  thread that runs when the machine is
           under heavy soft-interrupt load.  Soft interrupts are normally serviced
           on  return from a hard interrupt, but it's possible for soft interrupts
           to be triggered more quickly than they can  be  serviced.   If  a  soft
           interrupt  is  triggered  for  a  second time while soft interrupts are
           being handled, the ksoftirq daemon is  triggered  to  handle  the  soft
           interrupts in process context.  If ksoftirqd is taking more than a tiny
           percentage of CPU time, this indicates the machine is under heavy  soft
           interrupt load.
           migration: this thread migrates processes from one CPU to another.
           watchdog: this thread keeps checking that the system is working fine.

    So today lots of I/O (unusual disk I/O plus the usual network I/O of a serving server) was causing lots of interrupts, and that put pressure on the ksoftirqd daemon. This led to userspace CPU starvation, and because of that migration started trying to move processes between processors. In the middle of this chaos watchdog, which wanted to publish a heartbeat, was waiting to do the publishing I/O. I started the hunt for the trouble-making process: the zipping of a 100GB file was triggering the problem. It was the effect of too much logging.
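    To see which kind of soft interrupt is piling up in a situation like this, newer kernels expose per-CPU counters in /proc/softirqs (this file is not mentioned in the notes above and is absent on older kernels). A small sketch that totals each type across CPUs; the function name `softirq_totals` is made up for this example.

```shell
# Sum each softirq type across all CPUs.
# Reads /proc/softirqs style input on stdin; the first line is the CPU header.
softirq_totals() {
    awk 'NR > 1 {
        sum = 0
        for (i = 2; i <= NF; i++) sum += $i
        printf "%s %d\n", $1, sum
    }'
}

# Usage: softirq_totals < /proc/softirqs
```

    The counts are cumulative since boot, so run it twice a few seconds apart and look at which rows grow fastest.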

  15. Cloud node cheaper than Amazon EC2

    vps.net looks cheaper to start with. I like.