2014-09-08

tools

linux performance tools


linux benchmarking tools

linux observability sar

linux tuning tools

Tools: Basic

uptime
top or htop
mpstat
iostat
vmstat
free
ping
nicstat
dstat

Tools: Intermediate

sar
netstat
pidstat
strace
tcpdump
blktrace
iotop
slabtop
sysctl
/proc

Tools: Advanced

perf
- perf record program [program_option]
- perf report
DTrace
SystemTap
and more…
- ps
- pmap
- traceroute
- ntop
- ss
- lsof
- oprofile
- gprof
- kcachegrind
- valgrind
- google profiler
- nfsiostat
- cifsiostat
- latencytop
- powertop
- LTTng
- ktap

Reference

2014-09-07

tools

sync

Introduction

sync is a Linux command that writes any data buffered in memory out to disk.

The command itself does nothing except invoke the sync() system call.

So, to study what sync really does, we have to look into the kernel.

sync command

File: coreutils-8.9/src/sync.c

int
main (int argc, char **argv)
{
  initialize_main (&argc, &argv);
  set_program_name (argv[0]);
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);

  atexit (close_stdout);

  parse_long_options (argc, argv, PROGRAM_NAME, PACKAGE, Version,
                      usage, AUTHORS, (char const *) NULL);
  if (getopt_long (argc, argv, "", NULL, NULL) != -1)
    usage (EXIT_FAILURE);

  if (optind < argc)
    error (0, 0, _("ignoring all arguments"));

  sync ();
  exit (EXIT_SUCCESS);
}

As shown, the command only parses its options and then calls sync().

sync architecture in kernel

The sync() system call is defined in fs/sync.c:

/*
 * sync everything.  Start out by waking pdflush, because that writes back
 * all queues in parallel.
 */
SYSCALL_DEFINE0(sync)
{
    wakeup_flusher_threads(0);
    sync_filesystems(0);
    sync_filesystems(1);
    if (unlikely(laptop_mode))
        laptop_sync_completion();
    return 0;
}

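sync() calls sync_filesystems() twice: first with wait=0 to start writeback on every filesystem, then with wait=1 to wait for it to complete.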

static void sync_filesystems(int wait)
{
    struct super_block *sb;
    static DEFINE_MUTEX(mutex);

    mutex_lock(&mutex);        /* Could be down_interruptible */
    spin_lock(&sb_lock);
    list_for_each_entry(sb, &super_blocks, s_list)
        sb->s_need_sync = 1;

restart:
    list_for_each_entry(sb, &super_blocks, s_list) {
        if (!sb->s_need_sync)
            continue;
        sb->s_need_sync = 0;
        sb->s_count++;
        spin_unlock(&sb_lock);

        down_read(&sb->s_umount);
        if (!(sb->s_flags & MS_RDONLY) && sb->s_root && sb->s_bdi)
            __sync_filesystem(sb, wait);
        up_read(&sb->s_umount);

        /* restart only when sb is no longer on the list */
        spin_lock(&sb_lock);
        if (__put_super_and_need_restart(sb))
            goto restart;
    }
    spin_unlock(&sb_lock);
    mutex_unlock(&mutex);
}

sync_filesystems() calls __sync_filesystem() for every mounted, writable superblock:

static int __sync_filesystem(struct super_block *sb, int wait)
{
    /*
     * This should be safe, as we require bdi backing to actually
     * write out data in the first place
     */
    if (!sb->s_bdi)
        return 0;

    /* Avoid doing twice syncing and cache pruning for quota sync */
    if (!wait) {
        writeout_quota_sb(sb, -1);
        writeback_inodes_sb(sb);
    } else {
        sync_quota_sb(sb, -1);
        sync_inodes_sb(sb);
    }
    if (sb->s_op->sync_fs)
        sb->s_op->sync_fs(sb, wait);
    return __sync_blockdev(sb->s_bdev, wait);
}

__sync_filesystem() calls sb->s_op->sync_fs(sb, wait) when the filesystem provides it,
so each filesystem can implement its own sync_fs().

when to use

Before shutting down the computer (by habit, sync is often run two or more times).
After writing a big file to disk, for example with cp.
To be continued.

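sync, fsync, fdatasync

sync flushes all modified buffers system-wide. POSIX only requires it to
schedule the writes, so it may return before the data reaches the disk; on
Linux, as the kernel code above shows, the second pass waits for completion.
fsync works on a single file, given by its file descriptor, and waits until the disk write has finished.
fdatasync is similar to fsync, but only flushes the file data (plus the metadata
needed to read it back). fsync flushes the data and all file attributes.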
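
As a rough illustration of the difference, dd can request either call on its output file. This is only a sketch, assuming GNU dd (which accepts conv=fsync and conv=fdatasync); the scratch file comes from mktemp and the sizes are arbitrary.

```shell
# Scratch file standing in for real data (hypothetical, created by mktemp).
tmp=$(mktemp)

# conv=fsync: dd calls fsync() on the output, flushing data and metadata.
dd if=/dev/zero of="$tmp" bs=4096 count=4 conv=fsync 2>/dev/null

# conv=fdatasync: dd calls fdatasync(), flushing the file data only.
dd if=/dev/zero of="$tmp" bs=4096 count=4 conv=fdatasync 2>/dev/null

# sync: flush every dirty buffer in the system, not just this file.
sync

size=$(wc -c < "$tmp")
echo "wrote $size bytes"
rm -f "$tmp"
```

The per-file variants are usually the better choice after writing one big file, because they do not force unrelated data to disk.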

special application

What if I want to sync only some virtual disks (VDs) and skip the others?
To be continued.

Reference

man sync
info coreutils ‘sync invocation’
kernel code
coreutils-8.9
The sync, fsync and fdatasync functions
Cache synchronization: the sys_sync system call

2014-09-06

language

bash pitfalls

From http://blog.charlee.li/bash-pitfalls/

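Thanks to fcicq, whose "new 30 days" series has brought us many good articles.

Today I want to go through Bash Pitfalls, which describes some classic mistakes in bash
programming. fcicq says it may not suit beginners, but I think beginners are exactly the ones who should read it carefully.

Below I go through the mistakes from that article one by one. This is not a complete translation:
some parts are skipped, and some notes are added.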

1.for i in `ls *.mp3`

A common but wrong way to write it:

for i in `ls *.mp3`; do     # Wrong!

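Why is it wrong? The unquoted output of ls is split on whitespace, so a file name containing spaces becomes several words.
With 01 - Don't Eat the Yellow Snow.mp3, i takes the values 01, -, Don't, and so on.

Double quotes do not help either: the whole output of ls *.mp3 is then treated as a single word.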

for i in "`ls *.mp3`"; do   # Wrong!

The correct form is

for i in *.mp3; do
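
The difference is easy to see by counting loop iterations. The sketch below works in a scratch directory from mktemp; the two file names are made up for the example.

```shell
# Scratch directory with one file whose name contains spaces.
dir=$(mktemp -d)
touch "$dir/01 - Yellow Snow.mp3" "$dir/02.mp3"
cd "$dir" || exit 1

split=0
for i in $(ls *.mp3); do    # wrong: the ls output is split on whitespace
  split=$((split + 1))
done

globbed=0
for i in ./*.mp3; do        # right: one word per file, whatever the name
  [ -e "$i" ] || continue   # skip the literal pattern when nothing matches
  globbed=$((globbed + 1))
done

echo "split=$split globbed=$globbed"
```

With these two files the first loop runs once per word of the ls output, while the second runs once per file.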

2.cp $file $target

This is mostly correct, but it has the same word-splitting problem, so use double quotes:

cp "$file" "$target"

But if the file name happens to start with -, cp treats it as a command-line option, which is still a headache.
Try this instead.

cp -- "$file" "$target"

If you are unlucky enough to be on a system whose cp does not support --, the only option left is to make every variable start with a directory.

for i in ./*.mp3; do
  cp "$i" /target
  ...

3.[ $foo = "bar" ]

When $foo is empty, the command above becomes

[ = "bar" ]

Similarly, when $foo contains spaces:

[ multiple words here = "bar" ]

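Both fail. So put the variable in double quotes:

[ "$foo" = bar ]      # almost perfect

This can still misbehave when $foo starts with -. In newer versions of bash you can use the
[[ keyword instead, which handles empty values, spaces and leading dashes correctly.

[[ $foo = bar ]]      # correct

In older versions of bash you can use this trick (although it is harder to read):

[ x"$foo" = xbar ]    # correct

Or simply put the variable on the right-hand side, because the [ command still works
when the right-hand side of = is empty or starts with a dash. (Java coding style has a similar habit, though for a different reason.)

[ bar = "$foo" ]      # correct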

4.cd `dirname "$f"`

This has the same whitespace problem, so add quotes.

cd "`dirname "$f"`"

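Is this a mistake? Because the double quotes are nested, you might think that "`dirname " is the first
string and "`" is the second. That is how C works, not bash. In bash, the double quotes inside a
command substitution (the text between the backquotes) are matched up correctly on their own, with no escaping needed.

The $() syntax behaves the same way; the following is correct.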

cd "$(dirname "$f")"

5.[ "$foo" = bar && "$bar" = foo ]

&& cannot be used inside [. [ is really the test command, and && splits the line into two commands.
Use one of the following forms instead.

[ bar = "$foo" -a foo = "$bar" ]       # Right!
[ bar = "$foo" ] && [ foo = "$bar" ]   # Also right!
[[ $foo = bar && $bar = foo ]]         # Also right!

6.[ $foo > 7 ]

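Inside [ the > is an output redirection (the command creates a file named 7), and inside [[ the > operator compares strings, not numbers. A numeric comparison should be written like this: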

(( $foo > 7 ))

Or use the classic form:

[ $foo -gt 7 ]

The -gt form has one problem: it fails when $foo is not a number, so you must validate the value first.

This also works.

[[ $foo -gt 7 ]]
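
A minimal sketch of such a check, assuming only non-negative integers need to be accepted. The function name gt7 is made up for this example.

```shell
# Return 0 when $1 is a non-negative integer greater than 7, 1 otherwise.
gt7() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit: not a number
  esac
  [ "$1" -gt 7 ]
}

for value in 10 7 abc ''; do
  if gt7 "$value"; then
    echo "'$value' is greater than 7"
  else
    echo "'$value' is not greater than 7"
  fi
done
```

The case pattern rejects bad input before [ ever sees it, so -gt never receives a non-numeric operand.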

7.grep foo bar | while read line; do ((count++)); done

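This code counts the lines of the file bar that contain foo, in a roundabout way (it is equivalent to grep -c foo bar or grep foo bar | wc -l). It looks fine at first glance, but after it runs the count variable has no value.
Each command in a pipeline runs in a new subshell, so the count variable set inside the subshell never reaches the parent shell.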
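
One way around this, sketched below, is to keep the loop out of the pipeline by redirecting its input from a file. The temporary files and their contents are placeholders for this example.

```shell
input=$(mktemp)
matches=$(mktemp)
printf '%s\n' 'foo one' 'bar' 'foo two' > "$input"

count=0
grep foo "$input" | while IFS= read -r line; do
  count=$((count + 1))      # increments a copy inside the pipeline's subshell
done
piped=$count                # stays 0 in bash with default settings

grep foo "$input" > "$matches"
count=0
while IFS= read -r line; do
  count=$((count + 1))      # runs in the current shell, so the value survives
done < "$matches"

echo "piped=$piped redirected=$count"
rm -f "$input" "$matches"
```

When only the number is needed, grep -c foo bar remains the simplest answer.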

8.if [grep foo myfile]

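A common beginner mistake is to treat the [ after if as part of the if syntax. It is actually a command,
equivalent to test, and not part of the if syntax. C programmers should take particular note.

if uses the exit status of the commands between if and then as its condition. So the statement above should be written as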

if grep foo myfile > /dev/null; then

9.if [bar="$foo"]

Again, [ is a command and not part of the if statement, so mind the spaces.

if [ bar = "$foo" ]

10.if [ [ a = b ] && [ c = d ] ]

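The same problem: [ is not part of the if statement, and it is not a grouping parenthesis for logical expressions either. It is a command.
Perhaps C programmers are more prone to this mistake?

if [ a = b ] && [ c = d ]        # correct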

11.cat file | sed s/foo/bar/ > file

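You cannot read from and write to the same file in one pipeline. Depending on how the pipeline is set up, file is either truncated to 0
bytes or grows without bound until the disk is full. To change the contents of the original file, write the output to
a temporary file first and then use mv.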

sed 's/foo/bar/g' file > tmpfile && mv tmpfile file
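
The same pattern with mktemp, so that the temporary name cannot collide with an existing file. This is a sketch; the file contents are invented for the example.

```shell
file=$(mktemp)
printf '%s\n' 'foo here' 'more foo foo' > "$file"

tmpfile=$(mktemp)
# Replace the original only if sed succeeded.
sed 's/foo/bar/g' "$file" > "$tmpfile" && mv "$tmpfile" "$file"

result=$(cat "$file")
echo "$result"
rm -f "$file"
```

For a rename that is atomic, the temporary file should live on the same filesystem as the original, for example in the same directory.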

12.echo $foo

What could be wrong with this? It is usually fine, but the following example shows the problem.

MSG="Please enter a file name of the form *.zip"
echo $MSG         # wrong!

If the current directory happens to contain zip files, the output becomes

Please enter a file name of the form freenfss.zip lw35nfss.zip

So even with echo, do not forget to quote variables.
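
The effect can be reproduced in a scratch directory. In this sketch the two zip file names are the ones from the output above, and printf is used for the quoted version.

```shell
dir=$(mktemp -d)
touch "$dir/freenfss.zip" "$dir/lw35nfss.zip"
cd "$dir" || exit 1

MSG="Please enter a file name of the form *.zip"
unquoted=$(echo $MSG)             # *.zip is expanded against the current directory
quoted=$(printf '%s\n' "$MSG")    # the text is printed exactly as stored

echo "$unquoted"
echo "$quoted"
```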

13.$foo=bar

No $ is needed when assigning a variable. This is not Perl or PHP.

14.foo = bar

No spaces are allowed around the equals sign in an assignment. This is not C.

15.echo <<EOF

A here document is a handy way to output a block of text without quoting it or worrying about
line breaks. But use cat, not echo, to print a here document.

# This is wrong:
echo <<EOF
Hello world
EOF


# This is right:
cat <<EOF
Hello world
EOF

16.su -c 'some command'

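The original article says this is basically correct, but the user's goal is to pass -c 'some command' to the shell.
Since su happens to have a -c option of its own, su passes only 'some command' to the shell. So it should be written as: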

su root -c 'some command'

But on my platform, man su describes -c as

-c, --command=COMMAND
            pass a single COMMAND to the shell with -c

In other words, su -c 'some command' also passes -c 'some command' to the shell,
which contradicts this item. Either way, I am recording the item here.

17.cd /foo; bar

cd may fail, and then bar runs in a directory you did not expect. So always check the exit status of cd.

cd /foo && bar

If several commands depend on the exit status of cd, use ||.

cd /foo || exit 1;
bar
baz

A side note on directories: suppose your script changes the working directory frequently, as in the following code:

find ... -type d | while read subdir; do
  cd "$subdir" && whatever && ... && cd -
done

It is better written like this:

find ... -type d | while read subdir; do
  (cd "$subdir" && whatever && ...)
done

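The parentheses force a subshell, so changing the working directory inside it does not affect the parent shell (the one
running the script), and the cd - is no longer needed.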
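
A small sketch that shows the parent shell staying where it was. The scratch directory and the file name created-here are made up for the example.

```shell
start=$PWD
dir=$(mktemp -d)

# The cd only affects the subshell started by the parentheses.
(cd "$dir" && touch created-here)

after=$PWD
echo "before: $start"
echo "after:  $after"
```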

You can also use pushd, popd and dirs to manage the working directory.

18.[ bar == "$foo" ]

== is not a valid operator for the POSIX [ command (bash accepts it as an extension). Write it as

[ bar = "$foo" ] && echo yes
[[ bar == $foo ]] && echo yes

19.for i in {1..10}; do ./something &; done

No ; should follow &, because & already acts as a command separator.

for i in {1..10}; do ./something & done

20.cmd1 && cmd2 || cmd3

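Some people use this form as a replacement for if...then...else, but it is not quite the same: if cmd2 returns
a non-zero (false) status, cmd3 runs as well. So it is better to write a plain if cmd1; then cmd2; else
cmd3; fi.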
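
The difference shows up as soon as cmd2 fails. In this sketch cmd1 is true, cmd2 is a made-up function that always fails, and cmd3 is an assignment that records whether it ran.

```shell
cmd2() { return 1; }            # stand-in for a cmd2 that fails

ran_chain=no
true && cmd2 || ran_chain=yes   # cmd3 runs because cmd2 failed

ran_if=no
if true; then
  cmd2 || status=$?             # cmd2's failure is only recorded here
else
  ran_if=yes                    # reached only when cmd1 itself fails
fi

echo "chain: $ran_chain, if/else: $ran_if"
```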

21.The UTF-8 BOM (Byte-Order Mark) problem

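A UTF-8 file can start with a few bytes that mark the byte order of the encoding; these bytes are called the BOM. UTF-8
text on Unix does not need a BOM. An extra BOM interferes with shell parsing; in particular, a leading #!/bin/sh
line is no longer recognized.

MS-DOS line endings (CRLF) cause the same kind of problem. If you save a shell script in DOS format, it cannot be executed.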

$ ./dos
-bash: ./dos: /bin/sh^M: bad interpreter: No such file or directory
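
A sketch of detecting and removing the carriage returns with grep and tr. It covers the CRLF case only, not the BOM, and the script content is invented for the example.

```shell
script=$(mktemp)
printf '#!/bin/sh\r\necho hello\r\n' > "$script"    # DOS line endings

# Count the lines that end in a carriage return.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr\$" "$script")

# Strip every carriage return into a fixed copy, then run it.
fixed=$(mktemp)
tr -d '\r' < "$script" > "$fixed"
output=$(sh "$fixed")

echo "lines with CR: $cr_lines, output after fix: $output"
rm -f "$script" "$fixed"
```

Tools such as dos2unix do the same conversion where they are installed.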

22.echo "Hello World!"

Running this command interactively produces the following error:

-bash: !": event not found

This happens because !" is treated as a history expansion. Shell scripts do not have this problem.

Unfortunately, you cannot escape the ! with a backslash:

$ echo "hi\!"
hi\!

One solution is to use single quotes:

$ echo 'Hello, world!'

If you must use double quotes, try turning off history expansion with set +H.

set +H
echo "Hello, world!"

23.for arg in $*

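$* stands for all command-line arguments, so you might write this to process them one by one, but it fails when an argument contains spaces. For example: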

#!/bin/bash
# Incorrect version
for x in $*; do
  echo "parameter: '$x'"
done


$ ./myscript 'arg 1' arg2 arg3
parameter: 'arg'
parameter: '1'
parameter: 'arg2'
parameter: 'arg3'

The correct way is to use "$@" (quoted).

#!/bin/bash
# Correct version
for x in "$@"; do
  echo "parameter: '$x'"
done


$ ./myscript 'arg 1' arg2 arg3
parameter: 'arg 1'
parameter: 'arg2'
parameter: 'arg3'

The bash manual describes $* and $@ as follows:

*    Expands to the positional parameters, starting from one.  
     When the expansion occurs within double quotes, it 
     expands to a single word with the value of each parameter 
     separated by the first character of the IFS special variable.  
     That is, "$*" is equivalent to "$1c$2c...",
@    Expands to the positional parameters, starting from one. 
     When the expansion occurs within double quotes, each 
     parameter expands to a separate word.  That  is,  "$@"  
     is equivalent to "$1" "$2" ...

So unquoted $* and $@ behave the same, but "$*" expands to a single string while "$@" expands to one word per argument.
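
Counting the resulting words makes the three cases concrete. The helper count_args is made up for this sketch, and the default IFS is assumed.

```shell
count_args() { echo $#; }        # prints how many arguments it received

set -- 'arg 1' arg2 arg3         # same arguments as in the example above

unquoted=$(count_args $*)        # 'arg 1' is split into two words
star=$(count_args "$*")          # everything is joined into one word
at=$(count_args "$@")            # one word per original argument

echo "unquoted=$unquoted star=$star at=$at"
```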

24.function foo()

This works in bash but may fail in other shells. Do not combine the function keyword with parentheses. The
safest form uses the parentheses alone:

foo() {
  ...
}

2014-09-02

others

mbr

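Introduction

MBR is the abbreviation of Master Boot Record.

MBR (512 bytes) = boot code (446 bytes) + partition table (4 entries x 16 bytes = 64 bytes) + signature 55 AA (2 bytes)

partition entry (16 bytes)

Offset  Size     Meaning
0       1 byte   partition flag (80H = bootable, 00H = not bootable)
1~3     3 bytes  CHS address of the first sector
4       1 byte   file system flag (partition type)
5~7     3 bytes  CHS address of the last sector
8~11    4 bytes  start sector (LBA, little-endian)
12~15   4 bytes  number of sectors (little-endian)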

File system flag:
00H Empty (unused entry)
01H FAT12
04H FAT16, smaller than 32 MB
05H Extended
06H FAT16, larger than 32 MB
07H HPFS or NTFS
0BH Windows 95 FAT32
0CH Windows 95 FAT32 (LBA)
0EH Windows 95 FAT16 (LBA)
0FH Windows 95 Extended (LBA, larger than 8 GB)
82H Linux swap
83H Linux
85H Linux extended
86H NTFS volume set
87H NTFS volume set

A case

The current OS is CentOS 6.4 x86_64, running from /dev/sdb, which is an SSD.

[root@localhost ~]# dd if=/dev/sdb of=/root/ssd.dat bs=1 count=512
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00146756 s, 349 kB/s
[root@localhost ~]# hexdump -C ssd.dat 
00000000  eb 48 90 d0 bc 00 7c fb  50 07 50 1f fc be 1b 7c  |.H....|.P.P....||
00000010  bf 1b 06 50 57 b9 e5 01  f3 a4 cb bd be 07 b1 04  |...PW...........|
00000020  38 6e 00 7c 09 75 13 83  c5 10 e2 f4 cd 18 8b f5  |8n.|.u..........|
00000030  83 c6 10 49 74 19 38 2c  74 f6 a0 b5 07 b4 03 02  |...It.8,t.......|
00000040  80 00 00 80 98 3e 04 00  00 08 fa 90 90 f6 c2 80  |.....>..........|
00000050  75 02 b2 80 ea 59 7c 00  00 31 c0 8e d8 8e d0 bc  |u....Y|..1......|
00000060  00 20 fb a0 40 7c 3c ff  74 02 88 c2 52 f6 c2 80  |. ..@|<.t...R...|
00000070  74 54 b4 41 bb aa 55 cd  13 5a 52 72 49 81 fb 55  |tT.A..U..ZRrI..U|
00000080  aa 75 43 a0 41 7c 84 c0  75 05 83 e1 01 74 37 66  |.uC.A|..u....t7f|
00000090  8b 4c 10 be 05 7c c6 44  ff 01 66 8b 1e 44 7c c7  |.L...|.D..f..D|.|
000000a0  04 10 00 c7 44 02 01 00  66 89 5c 08 c7 44 06 00  |....D...f.\..D..|
000000b0  70 66 31 c0 89 44 04 66  89 44 0c b4 42 cd 13 72  |pf1..D.f.D..B..r|
000000c0  05 bb 00 70 eb 7d b4 08  cd 13 73 0a f6 c2 80 0f  |...p.}....s.....|
000000d0  84 f0 00 e9 8d 00 be 05  7c c6 44 ff 00 66 31 c0  |........|.D..f1.|
000000e0  88 f0 40 66 89 44 04 31  d2 88 ca c1 e2 02 88 e8  |..@f.D.1........|
000000f0  88 f4 40 89 44 08 31 c0  88 d0 c0 e8 02 66 89 04  |..@.D.1......f..|
00000100  66 a1 44 7c 66 31 d2 66  f7 34 88 54 0a 66 31 d2  |f.D|f1.f.4.T.f1.|
00000110  66 f7 74 04 88 54 0b 89  44 0c 3b 44 08 7d 3c 8a  |f.t..T..D.;D.}<.|
00000120  54 0d c0 e2 06 8a 4c 0a  fe c1 08 d1 8a 6c 0c 5a  |T.....L......l.Z|
00000130  8a 74 0b bb 00 70 8e c3  31 db b8 01 02 cd 13 72  |.t...p..1......r|
00000140  2a 8c c3 8e 06 48 7c 60  1e b9 00 01 8e db 31 f6  |*....H|`......1.|
00000150  31 ff fc f3 a5 1f 61 ff  26 42 7c be 7f 7d e8 40  |1.....a.&B|..}.@|
00000160  00 eb 0e be 84 7d e8 38  00 eb 06 be 8e 7d e8 30  |.....}.8.....}.0|
00000170  00 be 93 7d e8 2a 00 eb  fe 47 52 55 42 20 00 47  |...}.*...GRUB .G|
00000180  65 6f 6d 00 48 61 72 64  20 44 69 73 6b 00 52 65  |eom.Hard Disk.Re|
00000190  61 64 00 20 45 72 72 6f  72 00 bb 01 00 b4 0e cd  |ad. Error.......|
000001a0  10 ac 3c 00 75 f4 c3 00  00 00 00 00 00 00 00 00  |..<.u...........|
000001b0  00 00 00 00 00 00 00 00  a9 51 05 00 00 00 80 20  |.........Q..... |
000001c0  21 00 83 65 24 41 00 08  00 00 00 00 10 00 00 65  |!..e$A.........e|
000001d0  25 41 82 90 85 4b 00 08  10 00 00 00 80 00 00 90  |%A...K..........|
000001e0  86 4b 83 fe ff ff 00 08  90 00 00 20 2a 03 00 00  |.K......... *...|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200
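
The partition table in the dump above starts at offset 0x1be, so the first entry is the 16 bytes 80 20 21 00 83 65 24 41 00 08 00 00 00 00 10 00. As a sketch, the function below decodes such an entry; the name decode_entry and its output format are made up for this example, and 512-byte sectors are assumed.

```shell
# Decode one 16-byte MBR partition entry given as 16 space-separated hex bytes.
decode_entry() {
  set -- $1                                  # split the hex string into $1..$16
  boot=$1                                    # byte 0: partition flag
  ptype=$5                                   # byte 4: file system flag
  start=$((0x${12}${11}${10}$9))             # bytes 8~11, little-endian
  count=$((0x${16}${15}${14}${13}))          # bytes 12~15, little-endian
  echo "boot=0x$boot type=0x$ptype start_lba=$start sectors=$count size_mib=$((count / 2048))"
}

entry=$(decode_entry "80 20 21 00 83 65 24 41 00 08 00 00 00 00 10 00")
echo "$entry"
```

For this entry the result is a bootable Linux (83H) partition that starts at sector 2048 and spans 1048576 sectors, i.e. 512 MiB.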

/dev/sda is a RAID 1 virtual disk (built from two disks).
boot.disk, which was backed up with dd if=/dev/sda1 of=/root/backup/boot.disk,
contains no MBR: a partition image starts at the first sector of the partition, not at sector 0 of the disk.

[root@localhost ~]# dd if=/root/backup/boot.disk of=/root/b.dat bs=1 count=512
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.0019702 s, 260 kB/s
[root@localhost ~]# hexdump -C b.dat 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200

The new RAID 1 logical disk did not have the original MBR either, because all I had restored was:
dd if=/root/backup/boot.disk of=/dev/sda1
dd if=/root/backup/root.disk of=/dev/sda2

[root@localhost ~]# dd if=/dev/sda of=/root/raid1.dat bs=1 count=512
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00175408 s, 292 kB/s
[root@localhost ~]# hexdump -C raid1.dat 
00000000  fa b8 00 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |................|
00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |...|.........!..|
00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |....8.u........u|
00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 01 8b  |.........|...t..|
00000040  4c 02 cd 13 ea 00 7c 00  00 eb fe 00 00 00 00 00  |L.....|.........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001b0  00 00 00 00 00 00 00 00  c2 02 0b 00 00 00 80 20  |............... |
000001c0  21 00 83 9f 06 19 00 08  00 00 00 40 06 00 00 9f  |!..........@....|
000001d0  07 19 83 fe ff ff 00 48  06 00 00 00 00 01 00 fe  |.......H........|
000001e0  ff ff 82 fe ff ff 00 48  06 01 00 20 80 00 00 00  |.......H... ....|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200

With this virtual disk the OS cannot boot normally: it stops at a flashing
underscore cursor, where it should show the GRUB menu.

So when I back up the OS, I should also back up the MBR, and when restoring I
should write it to the new disk as well.
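
A sketch of that backup and restore with dd. To keep it safe to run, two scratch image files stand in for the old and new disks; on a real system they would be block devices such as /dev/sda, and the commands would need root.

```shell
disk=$(mktemp)       # stand-in for the original disk (hypothetical)
newdisk=$(mktemp)    # stand-in for the new disk (hypothetical)
backup=$(mktemp)     # the 512-byte MBR backup

# Build a 1 MiB fake disk whose first sector ends with the 55 aa signature.
dd if=/dev/zero of="$disk" bs=512 count=2048 2>/dev/null
printf '\125\252' | dd of="$disk" bs=1 seek=510 conv=notrunc 2>/dev/null
dd if=/dev/zero of="$newdisk" bs=512 count=2048 2>/dev/null

# Back up sector 0, then write it to the new disk without truncating the target.
dd if="$disk" of="$backup" bs=512 count=1 2>/dev/null
dd if="$backup" of="$newdisk" bs=512 count=1 conv=notrunc 2>/dev/null

# Read back the last two bytes of the restored sector.
sig=$(dd if="$newdisk" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
echo "signature on new disk: $sig"
```

Restoring all 512 bytes also overwrites the partition table of the target, which is only right when the new disk uses the same layout.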

Below, I used the DVD to write the MBR to /dev/sda; the first 512 bytes now look correct:

[root@localhost ~]# dd if=/dev/sda of=/root/raid1-ok.dat bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000300859 s, 1.7 MB/s
[root@localhost ~]# hexdump -C raid1-ok.dat 
00000000  eb 48 90 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |.H..............|
00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |...|.........!..|
00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |....8.u........u|
00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 03 02  |.........|...t..|
00000040  80 00 00 80 7a 61 00 00  00 08 fa 90 90 f6 c2 80  |....za..........|
00000050  75 02 b2 80 ea 59 7c 00  00 31 c0 8e d8 8e d0 bc  |u....Y|..1......|
00000060  00 20 fb a0 40 7c 3c ff  74 02 88 c2 52 f6 c2 80  |. ..@|<.t...R...|
00000070  74 54 b4 41 bb aa 55 cd  13 5a 52 72 49 81 fb 55  |tT.A..U..ZRrI..U|
00000080  aa 75 43 a0 41 7c 84 c0  75 05 83 e1 01 74 37 66  |.uC.A|..u....t7f|
00000090  8b 4c 10 be 05 7c c6 44  ff 01 66 8b 1e 44 7c c7  |.L...|.D..f..D|.|
000000a0  04 10 00 c7 44 02 01 00  66 89 5c 08 c7 44 06 00  |....D...f.\..D..|
000000b0  70 66 31 c0 89 44 04 66  89 44 0c b4 42 cd 13 72  |pf1..D.f.D..B..r|
000000c0  05 bb 00 70 eb 7d b4 08  cd 13 73 0a f6 c2 80 0f  |...p.}....s.....|
000000d0  84 f0 00 e9 8d 00 be 05  7c c6 44 ff 00 66 31 c0  |........|.D..f1.|
000000e0  88 f0 40 66 89 44 04 31  d2 88 ca c1 e2 02 88 e8  |..@f.D.1........|
000000f0  88 f4 40 89 44 08 31 c0  88 d0 c0 e8 02 66 89 04  |..@.D.1......f..|
00000100  66 a1 44 7c 66 31 d2 66  f7 34 88 54 0a 66 31 d2  |f.D|f1.f.4.T.f1.|
00000110  66 f7 74 04 88 54 0b 89  44 0c 3b 44 08 7d 3c 8a  |f.t..T..D.;D.}<.|
00000120  54 0d c0 e2 06 8a 4c 0a  fe c1 08 d1 8a 6c 0c 5a  |T.....L......l.Z|
00000130  8a 74 0b bb 00 70 8e c3  31 db b8 01 02 cd 13 72  |.t...p..1......r|
00000140  2a 8c c3 8e 06 48 7c 60  1e b9 00 01 8e db 31 f6  |*....H|`......1.|
00000150  31 ff fc f3 a5 1f 61 ff  26 42 7c be 7f 7d e8 40  |1.....a.&B|..}.@|
00000160  00 eb 0e be 84 7d e8 38  00 eb 06 be 8e 7d e8 30  |.....}.8.....}.0|
00000170  00 be 93 7d e8 2a 00 eb  fe 47 52 55 42 20 00 47  |...}.*...GRUB .G|
00000180  65 6f 6d 00 48 61 72 64  20 44 69 73 6b 00 52 65  |eom.Hard Disk.Re|
00000190  61 64 00 20 45 72 72 6f  72 00 bb 01 00 b4 0e cd  |ad. Error.......|
000001a0  10 ac 3c 00 75 f4 c3 00  00 00 00 00 00 00 00 00  |..<.u...........|
000001b0  00 00 00 00 00 00 00 00  c2 02 0b 00 00 00 80 20  |............... |
000001c0  21 00 83 9f 06 19 00 08  00 00 00 40 06 00 00 9f  |!..........@....|
000001d0  07 19 83 fe ff ff 00 48  06 00 00 00 00 01 00 fe  |.......H........|
000001e0  ff ff 82 fe ff ff 00 48  06 01 00 20 80 00 00 fe  |.......H... ....|
000001f0  ff ff 05 fe ff ff 00 68  86 01 00 d0 2d e7 55 aa  |.......h....-.U.|
00000200

Questions

Why is the maximum 4 primary partitions, or 3 primary partitions plus one extended partition?
Because the MBR reserves only 64 bytes for the partition table and each partition entry takes 16
bytes, so there is room for 64 / 16 = 4 entries.

Reference

2014-08-22

tools

grep

syntax

grep [OPTIONS] PATTERN [FILE…]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE…]
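- -A NUM print NUM lines of context after each matching line
- -B NUM print NUM lines of context before each matching line
- -c count the matching lines
- -E use extended regular expressions
- -i ignore case
- --include="*.c" only search files matching the pattern (here, C files)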
- -l file with matches
- -L file without matches
- -n line number
- -r recursive
- -w match whole word
- -v invert match
regular expression
- [1-4] match digits from 1 to 4
- [^1-4] exclude 1 to 4
- abc|123 contain abc or 123
POSIX character classes (use them inside a bracket expression, e.g. [[:digit:]]):
- [:alpha:] Any alphabetical character, regardless of case
- [:digit:] Any numerical character
- [:alnum:] Any alphabetical or numerical character
- [:blank:] Space or tab characters
- [:xdigit:] Hexadecimal characters; any number or A–F or a–f
- [:punct:] Any punctuation symbol
- [:print:] Any printable character (not control characters)
- [:space:] Any whitespace character
- [:graph:] Any visible character (printable characters excluding space)
- [:upper:] Any uppercase letter
- [:lower:] Any lowercase letter
- [:cntrl:] Control characters

skills

Count the selected lines: dmesg |grep -i error |wc -l, or more simply
dmesg |grep -i error -c
Match whole words only: dmesg |grep -w 'error'
List the C files that contain MSG_QUEUE: grep -rl --include="*.c" MSG_QUEUE ./
Select only the lines in which six, seven or eight appears again later on the same line:
grep -E "(six|seven|eight).*\1" test.txt
Search for text delimited by TAB characters, typing each TAB as [CTRL+V][TAB]: grep "<TAB>504<TAB>" test.txt
Search recursively: grep -r "abc" /root/source
Search recursively in header files only: grep -r --include "*.h" "date" path
Display only the first matching line: grep -m 1 "model name" /proc/cpuinfo
Match abc or 123 (-i ignores case, -E enables extended regular expressions): grep -i -E "abc|123"
Search all files under the current directory (-l lists files with matches, -L files without matches): grep -r -l "main" .
Match whole words: grep -w "linux" *.md
Search for socket in .h and .cpp files only: grep -rl --include=*.{h,cpp} "socket" .
Exclude the given directories: grep --exclude-dir="_posts" --exclude-dir="_site" -r "Dennis" ./
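
A few of these options applied to a small sample file. The log lines are invented for this sketch, and -A/-B are assumed to be available (they are in GNU and BSD grep).

```shell
log=$(mktemp)
printf '%s\n' 'boot ok' 'disk error' 'retrying' 'errors: 0' 'done' > "$log"

count=$(grep -c -i 'error' "$log")      # lines containing "error" in any case
whole=$(grep -c -w 'error' "$log")      # lines with "error" as a whole word
after=$(grep -A 1 -w 'error' "$log")    # the match plus 1 line AFTER it
before=$(grep -B 1 -w 'error' "$log")   # the match plus 1 line BEFORE it

echo "count=$count whole=$whole"
echo "$after"
echo "$before"
rm -f "$log"
```

Note that -w skips the line "errors: 0", because there "error" is followed by another word character.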

Reference

What is grep and how do we use it

2014-08-22

tools

awk

Basic

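built-in variables
- $0 the current record (this variable holds the whole line)
- $1~$n the n-th field of the current record; fields are separated by FS
- FS input field separator, by default spaces or tabs
- NF number of fields in the current record, i.e. the number of columns
- NR number of records read so far, i.e. the line number, starting from 1; with several input files it keeps accumulating
- FNR record number within the current file; unlike NR, it restarts for each file
- RS input record separator, newline by default
- OFS output field separator, a space by default
- ORS output record separator, newline by default
- FILENAME name of the current input file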
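
A short sketch that exercises FS, OFS, NR and NF on a colon-separated sample file. The data is invented for this example.

```shell
scores=$(mktemp)
printf '%s\n' 'Marry:2143:78' 'Jack:2321:66' 'Tom:2122:48' > "$scores"

# -F sets FS to ':'; OFS is what print puts between its arguments.
second=$(awk -F: -v OFS=' ' 'NR == 2 { print $1, NF }' "$scores")

# Sum the third field; in END, NR holds the total number of records read.
total=$(awk -F: '{ sum += $3 } END { print sum, NR }' "$scores")

echo "$second"
echo "$total"
rm -f "$scores"
```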

reg

[:alnum:] alphanumeric characters [:alpha:] alphabetic characters

Skills

awk -F'separator' 'condition1{action1} condition2{action2} ...' filename
awk 'NR==1{print $0}'
arr=($(awk -F'#' '{print $1,$2,$3,$4}' $conf_file))
ps aux | awk 'NR==1{print $0}$3>10{print $0}'
awk -F'<|>' '{if(NF>3){print $2 ":" $3}}' /tmp/test.xml parse xml file
awk -F'=' '/HWADDR/{print $2}' /etc/sysconfig/network-scripts/ifcfg-eth0
ip -o link show eth0 |awk '{ print toupper(gensub(/.*link\/[^ ]* ([[:alnum:]:]*).*/,"\\1", 1)); }'
awk 'END{print $0}' /root/test.log print the last line
lvdisplay |awk '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n;}'
print the mapping between dm-* devices and LV names

statistics

count the total size of all C files: ls -l src/*.c |awk '{sum+=$5} END {print sum}'

BEGIN, END

the original content:

$ cat score.txt
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62

the awk script:

$ cat cal.awk
#!/bin/awk -f
# before processing: runs once, before any input is read
BEGIN {
    math = 0
    english = 0
    computer = 0

    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# main rule: runs for every input record
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
# after processing: runs once, after all input has been read
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}

execute the script:

$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00

ls -l |awk '$6 == "Dec"'
netstat -apn |grep 3260 |awk '{if($2>0||$3>0){print $0}}' print the lines whose Recv-Q
or Send-Q is larger than 0
cat /tmp/.nic |awk '{print $1,$2}'|awk '{aa[$1]=aa[$1]","$2;asorti(aa,tA);}END{for(i in tA)print aa[tA[i]]}'|sed 's/^,//'
echo -e "12345\na25\nt123" | awk '/a25/{print a;}{a=$0}' print the previous line of match line

For more

Reference

2014-08-22

tools

sed

Basic

Syntax
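- -i edit the file in place
- -f run the script from a file, e.g: sed -f script-file test.txt
- -n suppress automatic printing, so only lines printed explicitly (e.g. with p) appear
command
- N, append the next input line to the pattern space
- a, append, adds a line after the current line
- i, insert, adds a line before the current line
- d, delete, deletes the line
- p, print
- : b t, flow control, ":" defines a label, "b" branches to a label and "t" branches only if a substitution has succeeded
- H,h,G,g,x: move content between the pattern space and the hold space (h/H copy/append pattern to hold, g/G copy/append hold to pattern, x exchanges them)
regular expression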
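- ^ matches the beginning of a line, e.g. /^#/ matches lines starting with #.
- $ matches the end of a line, e.g. /}$/ matches lines ending with }.
- \< matches the beginning of a word, e.g. \<abc matches words starting with abc.
- \> matches the end of a word, e.g. abc\> matches words ending with abc.
- . matches any single character.
- * matches zero or more occurrences of the preceding character.
- [ ] is a character set, e.g. [abc] matches a, b or c, and [a-zA-Z] matches any letter. A leading ^ negates the set, e.g. [^a] matches any character other than a.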

Skills

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)
sed -i '/'"$prj"'/{s/\(.*#.*#\)[0-9]\+/\1'"$rev_new"'/}' "$conf_file", replace the revision with the new revision
sed -n 10,23p filepath, print only lines 10 to 23 (-n suppresses the default output)
sed 'N;s/\n/ /' filepath, join every two lines into one
echo -e "11\n22\n33\n" | sed -n '/22/{n;p}', print the line after the match
sed ':a;N;s/\n/ /;ba;' file, join all lines (set label a, append the next line to the
pattern space, replace \n with a space, go to label a and continue).
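
The commands above can be tried on input from printf. This sketch uses slightly more portable spellings (separate -e expressions and $!ba for the join-all case) than the one-liners listed; the sample text is invented.

```shell
# Print the line that follows the match.
next=$(printf '11\n22\n33\n' | sed -n '/22/{n;p;}')

# Join every two lines into one.
pairs=$(printf '%s\n' a b c d | sed 'N;s/\n/ /')

# Join all lines: loop back to label a until the last line has been appended.
all=$(printf '%s\n' a b c | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

# Delete tags: a tag is "<", any run of characters other than ">", then ">".
text=$(printf '%s\n' '<b>bold</b> and <i>italic</i>' | sed 's/<[^>]*>//g')

echo "$next"; echo "$pairs"; echo "$all"; echo "$text"
```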

sample

1. delete tags from html, using sed 's/<[^>]*>//g' test.html
  
  This is what I meant. Understand?

Reference

2014-08-09

storage

optimizing a file system with tune2fs

view filesystem information

dennis@dennis:~$ sudo su
[sudo] password for dennis: 
root@dennis:/home/dennis# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000ee377

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *     1026048     1230847      102400    7  HPFS/NTFS/exFAT
/dev/sda2         1230848   210741247   104755200    7  HPFS/NTFS/exFAT
/dev/sda3       210741248   860366847   324812800    7  HPFS/NTFS/exFAT
/dev/sda4       860368894  1953523711   546577409    5  Extended
Partition 4 does not start on physical sector boundary.
/dev/sda5       860368896  1937033215   538332160   83  Linux
/dev/sda6      1937035264  1953523711     8244224   82  Linux swap / Solaris
root@dennis:/home/dennis# tune2fs -l /dev/sda1
tune2fs 1.42 (29-Nov-2011)
tune2fs: Bad magic number in super-block while trying to open /dev/sda1
Couldn't find valid filesystem superblock.
root@dennis:/home/dennis# tune2fs -l /dev/sda6
tune2fs 1.42 (29-Nov-2011)
tune2fs: Bad magic number in super-block while trying to open /dev/sda6
Couldn't find valid filesystem superblock.
root@dennis:/home/dennis# tune2fs -l /dev/sda5
tune2fs 1.42 (29-Nov-2011)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          cab5dcf8-904f-4f53-a2f8-25276e38f84d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              33652736
Block count:              134583040
Reserved block count:     6729152
Free blocks:              112706967
Free inodes:              33202652
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      991
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Thu Mar 27 19:19:17 2014
Last mount time:          Sat Aug  9 13:51:31 2014
Last write time:          Wed Apr  9 11:19:12 2014
Mount count:              94
Maximum mount count:      -1
Last checked:             Thu Mar 27 19:19:17 2014
Check interval:           0 (<none>)
Lifetime writes:          174 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       18089038
Default directory hash:   half_md4
Directory Hash Seed:      d4c17cf0-bb57-4423-81ab-a8fc3df1bc13
Journal backup:           inode blocks

Terminology

The block size is the unit of work for the file system. Every read and write is
done in full multiples of the block size. The block size is also the smallest
size on disk a file can have.

Optimization

1.Inode block

2.Reserved block

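mkfs.ext3 -b 4096 -i 8192 -m 2 /dev/sda5

Here -b sets the block size in bytes, -i the bytes-per-inode ratio (one inode per 8192 bytes of disk space), and -m the percentage of blocks reserved for root. Note that mkfs creates a new filesystem and destroys the existing data on the partition.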

3.tune2fs

dennis@dennis:~$ tune2fs
tune2fs 1.42 (29-Nov-2011)
Usage: tune2fs [-c max_mounts_count] [-e errors_behavior] [-g group]
    [-i interval[d|m|w]] [-j] [-J journal_options] [-l]
    [-m reserved_blocks_percent] [-o [^]mount_options[,...]] [-p mmp_update_interval]
    [-r reserved_blocks_count] [-u user] [-C mount_count] [-L volume_label]
    [-M last_mounted_dir] [-O [^]feature[,...]]
    [-E extended-option[,...]] [-T last_check_time] [-U UUID]
    [ -I new_inode_size ] device

use tune2fs -l /dev/sda5 to view filesystem information

use tune2fs -c -1 /dev/sda5 to disable the check that is forced after a number of mounts

use tune2fs -c -1 -i 0 /dev/sda5 to also disable the time-based check (interval 0)

use tune2fs -m 3 /dev/sda5 to set reserved_blocks_percent to 3%
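
To see what the reserved blocks amount to, the three relevant fields of the tune2fs -l output shown earlier can be fed to awk. This sketch reads them from a scratch file; on a real system the input would come from tune2fs -l itself.

```shell
info=$(mktemp)
cat > "$info" <<'EOF'
Block count:              134583040
Reserved block count:     6729152
Block size:               4096
EOF

# Split each line on ": " and compute the reserved share and its size.
summary=$(awk -F': *' '
  /^Block count/          { blocks = $2 }
  /^Reserved block count/ { reserved = $2 }
  /^Block size/           { size = $2 }
  END {
    printf "%.1f%% reserved, %.2f GiB\n",
           100 * reserved / blocks, reserved * size / (1024 * 1024 * 1024)
  }
' "$info")

echo "$summary"
rm -f "$info"
```

On a data partition of this size the default 5% is a lot of space, which is why lowering it with -m can be worthwhile.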

Reference

Optimizing the Linux file system to improve read/write speed

2014-08-08

program

speed for getting info from pipe and proc system

Test code

compile with gcc -Wall -o test test.c

The header file time_test.h can be obtained from
https://github.com/matrix207/C/blob/master/util/time_test.h

#include <stdio.h>  
#include <stdlib.h>  
#include <string.h>  
#include <pthread.h>
#include <sys/types.h>  
#include <sys/stat.h>  
#include <fcntl.h>  
#include <unistd.h>  
#include <signal.h>  
#include <sys/socket.h>
#include <sys/wait.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include "time_test.h"

int exec_cmd(const char *cmd, char *result, const int size)
{
    FILE *fp = NULL;

    if (cmd == NULL) {
        printf("cmd can not be NULL\n");
        return 1;
    }

    if (NULL == (fp = popen(cmd, "r"))) {
        fprintf(stdout, "Fail to do %s\n", cmd);
        return 1;
    }

    memset(result, 0, size);
    printf("-------1-------\n");
    fgets(result, size, fp);
    printf("-------2-------\n");

    pclose(fp);
    return 0;
}

void get_from_proc(char *interface)
{
    int sockfd;
    struct ifreq ifr;
    char mac_addr[30]={0};

    /* Any socket works as a handle for the interface ioctls below. */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if( sockfd == -1) {
        perror("create socket failed");
        return; 
    }

    memset(&ifr,0,sizeof(ifr));
    strncpy(ifr.ifr_name, interface, sizeof(ifr.ifr_name)-1);

    if ((ioctl(sockfd, SIOCGIFHWADDR, &ifr)) < 0) {
        perror("mac ioctl error");
        close(sockfd);  /* do not leak the socket on the error path */
        return;
    }

    snprintf(mac_addr, sizeof(mac_addr), "%02x%02x%02x%02x%02x%02x",  
            (unsigned char)ifr.ifr_hwaddr.sa_data[0],
            (unsigned char)ifr.ifr_hwaddr.sa_data[1],
            (unsigned char)ifr.ifr_hwaddr.sa_data[2],
            (unsigned char)ifr.ifr_hwaddr.sa_data[3],
            (unsigned char)ifr.ifr_hwaddr.sa_data[4],
            (unsigned char)ifr.ifr_hwaddr.sa_data[5]);

    printf("local mac:%s \n",mac_addr);

    char *address;
    struct sockaddr_in *addr;
    if (ioctl(sockfd,SIOCGIFADDR,&ifr) == -1) {
        perror("ioctl error");
        close(sockfd);
        exit(1);
    }
    addr = (struct sockaddr_in *)&(ifr.ifr_addr);
    address = inet_ntoa(addr->sin_addr);
    printf("inet addr: %s \n",address);

    close(sockfd);
}

void get_from_pipe(char *interface)
{
    char cmd[100] = {0};
    /* char *format="ifconfig %s |grep %s |awk '{print $5}'"; */
    const char *format="ifconfig %s |grep ether |awk '{print $2}'";
    /* The format contains a single %s, so pass the interface name once. */
    snprintf(cmd, sizeof(cmd), format, interface);
    char result[150] = {0};
    if (0 == exec_cmd(cmd, result, sizeof(result))) {
        printf("mac is %s\n", result);
    } else {
        printf("Fail\n");
    }
}

void time_test(int argc,char **argv)
{
    /* XXX: change this for your computer */
    char *interface = "p4p1";
    if (argc == 2)
        interface = argv[1];

    TIME_START();
    printf("---------from proc system---------\n");
    get_from_proc(interface);
    TIME_END();

    TIME_START();
    printf("---------from pipe---------\n");
    get_from_pipe(interface);
    TIME_END();
}

int main(int argc,char **argv)
{
    time_test(argc, argv);
    return 0;
}

Test detail

Use strace to log the time spent in each system call (output goes to the file 1.log)

strace -tT -o 1.log ./test

The log lines we care about are listed below:

11:00:47 write(1, "---------from proc system-------"..., 35) = 35 <0.000030>
11:00:47 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 <0.000049>
11:00:47 ioctl(3, SIOCGIFHWADDR, {ifr_name="p4p1", ifr_hwaddr=00:23:ae:96:d6:9a}) = 0 <0.000012>
11:00:47 write(1, "local mac:0023ae96d69a \n", 24) = 24 <0.000018>
11:00:47 ioctl(3, SIOCGIFADDR, {ifr_name="p4p1", ifr_addr={AF_INET, inet_addr("172.16.50.39")}}) = 0 <0.000013>
11:00:47 write(1, "inet addr: 172.16.50.39 \n", 25) = 25 <0.000021>
11:00:47 close(3)                       = 0 <0.000019>
11:00:47 write(1, "Elapsed time 471 usec\n", 22) = 22 <0.000017>
11:00:47 write(1, "---------from pipe---------\n", 28) = 28 <0.000018>
11:00:47 brk(0)                         = 0x1af3000 <0.000009>
11:00:47 brk(0x1b14000)                 = 0x1b14000 <0.000010>
11:00:47 brk(0)                         = 0x1b14000 <0.000007>
11:00:47 pipe2([3, 4], O_CLOEXEC)       = 0 <0.000014>
11:00:47 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6bff5eaa10) = 7658 <0.000077>
11:00:47 close(4)                       = 0 <0.000008>
11:00:47 fcntl(3, F_SETFD, 0)           = 0 <0.000007>
11:00:47 write(1, "-------1-------\n", 16) = 16 <0.000020>
11:00:47 fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.001732>
11:00:47 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6bff603000 <0.000016>
11:00:47 read(3, "00:23:ae:96:d6:9a\n", 4096) = 18 <0.003692>
11:00:47 write(1, "-------2-------\n", 16) = 16 <0.000170>
11:00:47 close(3)                       = 0 <0.000009>
11:00:47 wait4(7658, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7658 <0.000043>
11:00:47 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7658, si_status=0, si_utime=0, si_stime=0} ---
11:00:47 munmap(0x7f6bff603000, 4096)   = 0 <0.000018>
11:00:47 write(1, "mac is 00:23:ae:96:d6:9a\n", 25) = 25 <0.000018>
11:00:47 write(1, "\n", 1)              = 1 <0.000014>
11:00:47 write(1, "Elapsed time 6639 usec\n", 23) = 23 <0.000022>

Find the 10 longest durations with awk '{print $NF}' 1.log | sort | tail

<0.000021>
<0.000022>
<0.000030>
<0.000043>
<0.000049>
<0.000077>
<0.000164>
<0.000170>
<0.001732>
<0.003692>

Then find the calls that took that time with grep -E "001732|003692" 1.log

11:00:47 fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.001732>
11:00:47 read(3, "00:23:ae:96:d6:9a\n", 4096) = 18 <0.003692>

So the slow code is the fgets() call that reads the command output from the pipe:

fgets(result, size, fp);

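In this run the ioctl version finished in 471 usec and the pipe version in 6639 usec.
The read() most likely blocks until the child shell and the ifconfig, grep and awk
processes it starts have written their output, so most of that time goes into creating
and waiting for processes. If such code runs often and execution time matters,
avoid pipes to external commands wherever possible.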
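
One way to avoid the pipe is to read the address straight from sysfs. The sketch below assumes the kernel exposes /sys/class/net/<interface>/address (it does on common Linux systems, but check yours). The names read_first_line() and get_from_sysfs() are made up for this example and are not part of the test program above.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` into `buf`, with the trailing newline removed.
 * Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *fp;

    if (size == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical replacement for get_from_pipe(): no fork, no shell.
 * Assumes the sysfs layout /sys/class/net/<interface>/address. */
int get_from_sysfs(const char *interface, char *mac, size_t size)
{
    char path[128];
    int n = snprintf(path, sizeof(path), "/sys/class/net/%s/address", interface);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return read_first_line(path, mac, size);
}
```

Calling get_from_sysfs("p4p1", mac, sizeof(mac)) costs a few file system calls (open, read, close) instead of a clone plus several exec calls, so it should land much closer to the ioctl timing than to the pipe timing. That is an expectation, not something measured here.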

TODO

Does each system call take a fixed amount of time? (Of course, there will be a
little variation between runs.)
Sort the elapsed time of all the system calls
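
For the second item, here is a minimal sketch in C of what the awk | sort pipeline does, but sorting numerically. It assumes the log was written with strace -T, so each system call line ends in <seconds>. The function names are invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the trailing "<0.003692>" duration from one strace -T line.
 * Returns 0 and stores the seconds in *secs, or -1 if the line has no
 * duration (signal lines such as "--- SIGCHLD ... ---", for example). */
int parse_duration(const char *line, double *secs)
{
    const char *lt = strrchr(line, '<');
    char *end;
    double v;

    if (lt == NULL)
        return -1;
    v = strtod(lt + 1, &end);
    if (end == lt + 1 || *end != '>')
        return -1;
    *secs = v;
    return 0;
}

/* qsort comparator: longest duration first. */
int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x < y) - (x > y);
}

/* Read up to `max` durations from `fp` into `out`, sorted longest first.
 * Returns how many were stored. */
size_t sorted_durations(FILE *fp, double *out, size_t max)
{
    char line[1024];
    size_t n = 0;

    while (n < max && fgets(line, sizeof(line), fp) != NULL) {
        if (parse_duration(line, &out[n]) == 0)
            n++;
    }
    qsort(out, n, sizeof(out[0]), cmp_desc);
    return n;
}
```

Open 1.log with fopen(), pass it to sorted_durations(), and the first entries of the array are the slowest calls. Running this over several logs would also help with the first question, how much a given call varies between runs.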

2014-08-07

storage

MegaCli Common Commands and Procedures

From MegaCli Common Commands and Procedures

Introduction

MegaCli commands have raised a number of questions among users of Cisco’s
Physical Security products. Here is an attempt to explain their meaning and uses.

General Parameters

Adapter parameter -aN

The parameter -aN (where N is a number starting with zero or the string ALL)
specifies the adapter ID. If you have only one controller it’s safe to use ALL
instead of a specific ID, but you’re encouraged to use the ID for everything
that makes changes to your RAID configuration.

Physical drive parameter -PhysDrv [E:S]

For commands that operate on one or more physical drives, the -PhysDrv [E:S]
parameter is used, where E is the enclosure device ID in which the drive resides
and S the slot number (starting with zero). You can get the enclosure device ID
using MegaCli -EncInfo -aALL. The E:S syntax is also used for specifying the
physical drives when creating a new RAID virtual drive (see Virtual drive
management below).
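
If MegaCli is driven from a program, the [E:S] list has to be built as a string first. The sketch below shows one way to do that in C. The struct and function names are invented for this example, and it writes the list without spaces so that a shell would pass it as a single argument (an assumption, adjust it if your MegaCli version expects something else).

```c
#include <stdio.h>

/* One physical drive: enclosure device ID and slot number. */
struct drive {
    int enclosure;
    int slot;
};

/* Format `count` drives as "[E:S,E:S,...]" into `buf`.
 * Returns the string length, or -1 if there are no drives or
 * the buffer is too small. */
int format_physdrv(const struct drive *d, size_t count, char *buf, size_t size)
{
    size_t used = 0;
    size_t i;
    int n;

    if (count == 0 || size < 3)
        return -1;
    buf[used++] = '[';
    for (i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, "%s%d:%d",
                     i ? "," : "", d[i].enclosure, d[i].slot);
        if (n < 0 || (size_t)n >= size - used)
            return -1;              /* output was truncated */
        used += (size_t)n;
    }
    if (used + 2 > size)
        return -1;                  /* no room for "]" and the terminator */
    buf[used++] = ']';
    buf[used] = '\0';
    return (int)used;
}
```

For two drives in slots 0 and 1 of an enclosure with a made-up ID of 252, this produces [252:0,252:1], which can then be appended after -PhysDrv or used in the drive list of -CfgLdAdd.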

Virtual drive parameter -Lx

The parameter -Lx is used for specifying the virtual drive (where x is a number
starting with zero or the string all).

The executable can be run either by its full path or from its install directory:

shell> /opt/MegaRAID/MegaCli/MegaCli <cmd>

shell> cd /opt/MegaRAID/MegaCli
shell> ./MegaCli <cmd>

Gather information

Controller information

MegaCli -AdpAllInfo -aALL
MegaCli -CfgDsply -aALL
MegaCli -adpeventlog -getevents -f lsi-events.log -a0 -nolog

Enclosure information

MegaCli -EncInfo -aALL

Virtual drive information

MegaCli -LDInfo -Lall -aALL

Physical drive information

MegaCli -PDList -aALL

MegaCli -PDInfo -PhysDrv [E:S] -aALL

Battery backup information (Cisco MSPs do not have the battery backup unit
installed, but in case yours has one)

MegaCli -AdpBbuCmd -aALL

Check the battery backup warning on boot. If it is enabled on an MSP, the system
requires manual intervention every time it boots.

MegaCli -AdpGetProp BatWarnDsbl -a0

Controller management

Silence active alarm

MegaCli -AdpSetProp AlarmSilence -aALL

Disable alarm

MegaCli -AdpSetProp AlarmDsbl -aALL

Enable alarm

MegaCli -AdpSetProp AlarmEnbl -aALL

Disable battery backup warning on system boot

MegaCli -AdpSetProp BatWarnDsbl -a0

Change the adapter rebuild rate to 60%:

MegaCli -AdpSetProp {RebuildRate -60} -aALL

Virtual drive management

Create RAID 0, 1, 5 drive

MegaCli -CfgLdAdd -r(0|1|5) [E:S, E:S, ...] -aN

Create RAID 10 drive

MegaCli -CfgSpanAdd -r10 -Array0[E:S,E:S] -Array1[E:S,E:S] -aN

Remove drive

MegaCli -CfgLdDel -Lx -aN

Physical drive management

Set state to offline

MegaCli -PDOffline -PhysDrv [E:S] -aN

Set state to online

MegaCli -PDOnline -PhysDrv [E:S] -aN

Mark as missing

MegaCli -PDMarkMissing -PhysDrv [E:S] -aN

Prepare for removal

MegaCli -PdPrpRmv -PhysDrv [E:S] -aN

Replace missing drive

MegaCli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN

The number N of the array parameter is the Span Reference you get using
MegaCli -CfgDsply -aALL and the number N of the row parameter is the
Physical Disk in that span or array starting with zero (it’s not the physical
disk’s slot!).

Rebuild drive - Drive status should be “Firmware state: Rebuild”

MegaCli -PDRbld -Start -PhysDrv [E:S] -aN
MegaCli -PDRbld -Stop -PhysDrv [E:S] -aN
MegaCli -PDRbld -ShowProg -PhysDrv [E:S] -aN     
MegaCli -PDRbld -ProgDsply -physdrv [E:S] -aN

Clear drive

MegaCli -PDClear -Start -PhysDrv [E:S] -aN
MegaCli -PDClear -Stop -PhysDrv [E:S] -aN
MegaCli -PDClear -ShowProg -PhysDrv [E:S] -aN

Bad to good

MegaCli -PDMakeGood -PhysDrv [E:S] -aN

Changes a drive in state Unconfigured-Bad to Unconfigured-Good.

Hot spare management

Set global hot spare

MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

Remove hot spare

MegaCli -PDHSP -Rmv -PhysDrv [E:S] -aN

Set dedicated hot spare

MegaCli -PDHSP -Set -Dedicated -ArrayN,M,... -PhysDrv [E:S] -aN

Walkthrough: Rebuild a Drive that is marked ‘Foreign’ when Inserted:

1. Bad to good

MegaCli -PDMakeGood -PhysDrv [E:S] -aALL

2. Clear the foreign setting

MegaCli -CfgForeign -Clear -aALL

3. Set global hot spare

MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

Walkthrough: Change/replace a drive

1. Set the drive offline, if it is not already offline due to an error

MegaCli -PDOffline -PhysDrv [E:S] -aN

2. Mark the drive as missing

MegaCli -PDMarkMissing -PhysDrv [E:S] -aN

3. Prepare drive for removal

MegaCli -PDPrpRmv -PhysDrv [E:S] -aN

4. Change/replace the drive

5. If you’re using hot spares then the replaced drive should become your new hot spare drive

MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

6. In case you’re not working with hot spares, you must re-add the new drive to
your RAID virtual drive and start the rebuilding

MegaCli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
MegaCli -PDRbld -Start -PhysDrv [E:S] -aN

Gathering Standard logs

On every instance of a hard drive problem with an MSP server, we need to run the
following commands to have any information about the problem:

shell> rm -f MegaSAS.log

shell> /opt/MegaRAID/MegaCli/MegaCli -adpallinfo -a0

shell> /opt/MegaRAID/MegaCli/MegaCli -encinfo -a0

shell> /opt/MegaRAID/MegaCli/MegaCli -ldinfo -lall -a0

shell> /opt/MegaRAID/MegaCli/MegaCli -pdlist -a0

shell> /opt/MegaRAID/MegaCli/MegaCli -adpeventlog -getevents -f lsi-events.log -a0 -nolog

shell> /opt/MegaRAID/MegaCli/MegaCli -fwtermlog -dsply -a0 -nolog > lsi-fwterm.log

Collect the MegaSAS.log, lsi-events.log, and the lsi-fwterm.log files from the
directory where the commands are run (they can be run from any directory on the
MSP server) and attach the logs to the service request. You may use a program
such as WinSCP (freeware) to pull the files off of the server.