2016-04-30

Work diary for 2015

2015-11-15

study plugin architecture

2015-11-05

Handle DTS
Fixed Vimperator 3.9 not working on Firefox 41.
- download the source from https://github.com/vimperator/vimperator-labs
- unzip it and run make
- open Firefox, install the add-on from file, select vimperator-3.10.1.xpi

2015-11-01

learn how to modify the hosts file to browse the web

2015-10-31

TODO: do monthly summary
extract version info from object
check all regression test case
check ST and UT

2015-10-26

add more unit test for data model module
gtest: test assertions with EXPECT_DEATH(function_call, "")
thinking: how to implement the S&N module

2015-10-25

From the Redis book I got the idea of drawing structure diagrams with Graphviz
for my next feature-story design.
Maybe I will write something about VIM with Sphinx some time, and maybe for company projects too.
Study an example of writing a book with Sphinx, see https://github.com/huangz1990/redisbook/
Study Sphinx, a tool that makes it easy to create intelligent and beautiful documentation.
Study Redis from https://github.com/huangz1990/redisbook/
- The SDS type gives me a way to solve a memory issue in a company project
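A minimal Python sketch of the SDS idea, assuming the preallocation rule of Redis's sds.c (`sdsMakeRoomFor`): the string keeps its used length and spare capacity next to the buffer, so getting the length is O(1), and when it must grow it doubles the needed size below 1 MB (`SDS_MAX_PREALLOC`) and adds 1 MB above that. The class and method names are illustrative, and the real C version also reserves a byte for the trailing `'\0'`, which is left out here.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1 MB threshold, as in Redis's sds.c


class SDS:
    """Toy model of a Redis-style dynamic string: len + free + buffer."""

    def __init__(self, init=b""):
        self.buf = bytearray(init)
        self.len = len(init)  # bytes in use, so the length is O(1)
        self.free = 0         # bytes allocated but not yet used

    def _make_room_for(self, addlen):
        if self.free >= addlen:
            return  # preallocated space is enough, no reallocation
        newlen = self.len + addlen
        # Grow aggressively for small strings, linearly for large ones.
        if newlen < SDS_MAX_PREALLOC:
            newlen *= 2
        else:
            newlen += SDS_MAX_PREALLOC
        self.buf.extend(b"\0" * (newlen - len(self.buf)))
        self.free = newlen - self.len

    def cat(self, data):
        """Append bytes, reusing spare capacity when possible."""
        self._make_room_for(len(data))
        self.buf[self.len:self.len + len(data)] = data
        self.len += len(data)
        self.free -= len(data)

    def value(self):
        return bytes(self.buf[:self.len])
```

Appending `b"hello"` to an empty SDS leaves 5 spare bytes, so a following short append needs no reallocation at all; that is where the memory and copy savings come from.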

2015-10-24

try to write more unit test for my own module.
check DTS for iterator release.

2015-10-23

Handle four DTS.
Fix an ST bug: it can't stop after starting.
Fix a bug caused by a logic error in my submitted code.
Roll back my submitted code to fix two issues caused by another wrong fix.
How to mock a memory allocation failure? Redefining malloc with a macro is one possible way.

2015-10-09

vim: get the current value of a setting; set dictionary?
think about version manager
download and build Lua, Readline

2015-10-08

using spf13-vim
- http://blog.misilences.com/vim/2015/05/03/vim-accelerator-key
- http://www.cnblogs.com/zhea55/archive/2012/07/19/2598892.html
- set scrolloff=0
- let g:spf13_no_fastTabs=1 lets H jump to the top of the screen and L to the bottom
- TODO : support dictionary
install vim 7.4 from source
- download v7.4.891 from https://github.com/vim/vim/releases
- make; if it asks for the terminal library "ncurses", run yum install ncurses-devel
- make install
- compile vim with Lua support http://blog.angluca.com/post/69566488641/编译vim和macvim带python和lua支持
  - ./configure --with-features=huge --enable-luainterp --with-lua-prefix=/usr/local
  - ./src/vim --version |grep lua shows -lua, but +lua/dyn was expected
- ./configure --prefix=/usr --with-features=huge --enable-rubyinterp --enable-pythoninterp --enable-luainterp --with-lua-prefix=/usr/local
make a plan for October
- get some online-judge (OJ) problems accepted (AC)
- schema version manager

2015-07-25

flex
- Simple usage of flex, the lexical analyzer generator

2015-07-23

yacc lex

2015-07-22

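Write a schema parsing tool
Plan a tool that generates JSON files automatically
Implement a lexical scanner for a custom syntax in C
- How to define a configuration language that is simple yet expressive enough to describe the JSON file to create
- Once it is defined, how to read and parse it
- After parsing, call the jansson library to generate the JSON file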

2015-07-21

Share the schema parsing code

2015-07-20

Read the jansson code (it ships with avro)

2015-06-24

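Phone interview with Oracle ("O记")
- The biggest difference between processes and threads; inter-process communication; fork; locks
  - Separate memory spaces
  - Inter-process communication: message queues, shared memory, pipes, sockets
  - Thread synchronization: Event, Critical section, Mutex, Semaphore
- The kernel data path; what happens in the open system call; how an fd relates to the file on disk
  - Describe the layers
  - Describe the relationships among the file-system structures
- The three main parts of a makefile; the effect of environment variables
  - targets, prerequisites, commands
- grep, sed, awk, find, counting lines of code, regular expressions
  - wc -l counts the lines of a file
  - To count lines of source code, combine find, xargs, wc and tail
  - grep mainly searches for a string within a file or a directory tree
  - sed mainly modifies files
  - awk mainly searches for and extracts matching information from a file
  - find locates files
- Implementing a variable-length array
- Implementation differences between shared and static libraries; how the information to be fetched (its address) is located
  - Shared libraries: relative positions
- Memory operations
- Debugging with gdb: common commands, breakpoints, single-stepping, how the stack works
  - break
  - st
  - bt
- Design patterns: the singleton, and how to obtain the unique pointer
  - Make the constructor private and the instance pointer private
  - Define a public accessor function. Inside it, use a static bool initialized to false and
    test it with if: on the first call, new an object of the class, assign it to the private
    instance pointer, set the bool to true, and return the instance pointer.
- Network programming: the client/server communication flow, using polling
  - connect, receive/send, bind, listen, accept, select, read/write, close, poll, epoll
  - select vs epoll
  - how epoll works
- Troubleshooting storage performance bottlenecks: top, vmstat, iostat, iftop, netstat
- Does high CPU usage necessarily mean the software has a problem?
- Network issues and tuning various parameters
- Give an example of performance optimization from your work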
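Two items from the interview list, sketched in Python under stated assumptions: a variable-length (growable) array that doubles a fixed-size backing store when it fills up, which is what makes append amortized O(1), and the lazily created singleton described above. Python cannot make a constructor truly private, so `Config()` can still be called directly; the class attribute `_instance` plays the role of the private static pointer. The sketch is not thread-safe (a C++ version would need a lock or a function-local static). `DynArray` and `Config` are hypothetical names.

```python
class DynArray:
    """Growable array over a fixed-size backing store, doubling on overflow."""

    def __init__(self):
        self._cap = 1
        self._size = 0
        self._data = [None] * self._cap   # stands in for a malloc'd buffer

    def append(self, item):
        if self._size == self._cap:
            self._grow(2 * self._cap)     # doubling gives amortized O(1) append
        self._data[self._size] = item
        self._size += 1

    def _grow(self, new_cap):
        new_data = [None] * new_cap       # like realloc: allocate, then copy
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data, self._cap = new_data, new_cap

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]

    def __len__(self):
        return self._size


class Config:
    """Lazily created singleton, mirroring the 'static flag + accessor' idea."""

    _instance = None                      # the 'private static pointer'

    def __init__(self):
        self.settings = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:         # first call: create the object
            cls._instance = cls()
        return cls._instance
```

After ten appends the backing store has grown 1, 2, 4, 8, 16, so only four copies were made; every call to `Config.get_instance()` returns the same object.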

2015-06-11

Re-read "Expert C Programming"
Re-read "C++ Primer"

2015-06-10

How to write a good unit test

2015-06-09

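Write Python code that calls interface functions of a shared library
- and passes values back through function arguments
- how to get the struct-argument data of a function in the shared library
- understand how the regex match method is used
Write unit tests in Python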
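A small Python sketch tying the last two items together: a unittest case that pins down how `re.match` differs from `re.search` and `re.fullmatch` (match only tries the start of the string). The pattern and the test strings are made up for illustration.

```python
import re
import unittest

# Hypothetical pattern: two numbers separated by a dash, e.g. "10-20".
PATTERN = re.compile(r"(\d+)-(\d+)")


class RegexMatchTest(unittest.TestCase):
    def test_match_is_anchored_at_start(self):
        # match() only tries position 0, so leading text makes it fail
        self.assertIsNone(PATTERN.match("id 10-20"))
        self.assertEqual(PATTERN.match("10-20 id").groups(), ("10", "20"))

    def test_search_scans_whole_string(self):
        # search() tries every position until one matches
        self.assertEqual(PATTERN.search("id 10-20").group(1), "10")

    def test_fullmatch_requires_entire_string(self):
        self.assertIsNone(PATTERN.fullmatch("10-20 id"))
        self.assertIsNotNone(PATTERN.fullmatch("10-20"))
```

Save it to a file and run it with `python -m unittest <file>`.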

2015-06-08

Write Python code that calls interface functions of a shared library

2015-06-05

ramcloud
- ramcloud
- RAMCloud: the memory allocation mechanism of in-memory cloud storage

2015-05-21

Clean up my computer

2015-05-20

On leave

2015-05-19

Continue handing over usnmp and zd15

2015-05-18

Hand over the two Dell servers I had, plus a fibre-optic NIC
Continue explaining and handing over SCM
At work

2015-05-15

On leave

2015-05-14

On leave

2015-05-13

On leave

2015-05-12

Hand over SDK, SCM and CI
Morning off

2015-05-11

Take compensatory leave

2015-05-08

Take compensatory leave

2015-05-07

Prepare the fixed-asset transfer
Submit my resignation letter
Hon SDK
- Install Cygwin, selecting the gcc and make packages
- Download ipmitool 1.8.15
  - ./configure --enable-intf-lanplus
  - make
- Install Cygwin
- In the BIOS BMC settings, set the IPMI IP address, user ID and password
  - ipmitool -I lan -H 172.16.60.72 -U root -P 123456 sdr type 'Power Unit'
  - ipmitool -I lan -H 172.16.60.72 -U root -P 123456 sdr type Temperature
  - ipmitool -I lan -H 172.16.60.72 -U root -P 123456 sdr type Fan
- Reference: http://stackoverflow.com/questions/12907005/ipmitool-for-windows
- Reference: Data Center Management: Windows users can use IPMI tools too!

2015-05-06

Write the handover document
Hon SDK
- Install Windows Server 2008
- Install MSM

2015-05-05

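Understanding TCP/IP Network Stack & Writing Network Apps
Explain SNMP to yangzc
Research how to read Intel mainboard hardware info (temperature, fans, power)
- ipmiutil: installed on Windows but unusable; it says imbdrv.sys or ipmidrv.sys cannot be found, even though they are clearly there.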

2015-05-04

Research how to read Intel mainboard hardware info (temperature, fans, power)
- OpenHardwareMonitor 7.0 cannot read the fan info either

2015-04-30

The hostname problem
- SSH into a freshly installed machine and run the following:
  [root@yys ~]# hostname
  yys
  [root@yys ~]# hostname aaaa
  [root@yys ~]# hostname
  aaaa
  [root@yys ~]# ip a
  ... (N lines omitted)
  [root@yys ~]#
  [root@yys ~]# ifconfig eth1 down
  [root@yys ~]# hostname
  yys
- After ifconfig eth1 down, the hostname reverted to the old value.
- rpm -qf $(which hostname)
- download net-tools-1.60.tar.bz2 http://www.linuxfromscratch.org/blfs/view/6.3/basicnet/net-tools.html
  - hostname.c
  - ifconfig.c
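- Reading hostname.c shows the name is obtained through the gethostname system call
  - gethostname is defined in the kernel file kernel/sys.c and gets the value by calling u = utsname();
  - utsname() is defined in include/linux/utsname.h and returns a struct new_utsname
  - So in the end, the hostname command returns the nodename member of struct new_utsname
- Trace the system calls with strace: strace hostname, strace ifconfig
- The system has since been rebooted, so the problem can't be reproduced
- Related configuration files and commands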
  - /etc/hosts
  - /etc/sysconfig/network
  - /proc/sys/kernel/hostname
  - sysctl kernel.hostname
  - /etc/rc.d/rc.sysinit
- Reference: "In-depth understanding of changing the hostname on Linux"
- Reference: "Linux struct utsname explained"
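To look at the same value from user space, here is a small Python check, assuming a Linux host: `socket.gethostname()` wraps the gethostname call traced above, `os.uname().nodename` reads the `nodename` field of the same UTS structure, and /proc/sys/kernel/hostname exposes it too, so all three should agree. Running it before and after `ifconfig eth1 down` would show whether the kernel's nodename itself changed. `hostname_sources` is an illustrative name.

```python
import os
import socket


def hostname_sources():
    """Collect the hostname as seen through three Linux interfaces."""
    names = {
        "socket.gethostname()": socket.gethostname(),   # gethostname(2)
        "os.uname().nodename": os.uname().nodename,      # uname(2) nodename field
    }
    proc_path = "/proc/sys/kernel/hostname"              # same value via procfs
    if os.path.exists(proc_path):
        with open(proc_path) as f:
            names[proc_path] = f.read().strip()
    return names


for source, name in hostname_sources().items():
    print(f"{source:26} {name}")
```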
Choosing a name by the "Three Talents and Five Grids" numerology method is unreliable

2015-04-29

OpenHardwareMonitor, run on 172.16.60.57: no CPU temperature and no fan info
Studying BaZi (the Chinese "Eight Characters" birth chart)

2015-04-28

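Clean up the data on the two Dell servers 110.125 and 110.121 (now 60.75)
Hon SDK for Intel mainboard
- One approach worth trying is to read fan, power and temperature info through OpenHardwareMonitor
ustor centos-7: continue modifying the other affected Makefiles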

2015-04-27

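ustor centos-7: fixed an LIO compile warning "implicit declaration of function". The cause:
the current directory has an iscsi.h, there is also lio/iscsi.h, and both use the guard #ifndef _ISCSI_H_,
so the function declarations in lio/iscsi.h, which is compiled later, were skipped entirely.
The fix is simply to change the guard macro in lio/iscsi.h to #ifndef _LIO_ISCSI_H_
Hon SDK for Intel mainboard
- The old method read the information that Supermicro's tool wrote to the system log
- The new mainboard is Intel, so the Supermicro tool can no longer be used.
vim encryption/decryption
- Encryption method 1: :set key=123456, then save and quit with :wq
- Encryption method 2: :X, enter the password twice, then save and quit with :wq
- Remove the password: :set key=, then save and quit with :wq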

2015-04-24

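Fix the wrong NIC order in ucli nic_map
- Uses an awk array indexed by strings; the order in which such an array is traversed is unspecified,
  so the keys have to be sorted explicitly with gawk's asorti function (and then walked by numeric index)
- cat /tmp/.nic |awk '{print $1,$2}'|awk '{aa[$1]=aa[$1]","$2}END{n=asorti(aa,tA);for(i=1;i<=n;i++)print aa[tA[i]]}'|sed 's/^,//'
- Reference: http://www.cnblogs.com/chengmo/archive/2010/10/09/1846696.html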
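The same grouping, sketched in Python: collect the second column under each first-column key, then emit the groups in sorted key order, which is the job `asorti` does in the awk version. The sample lines are hypothetical, since the format of /tmp/.nic is not recorded here; Python's `sorted` compares strings by code point, which should match asorti's default string ordering for plain ASCII names like `eth0`.

```python
def group_by_first_column(lines):
    """Join column 2 values per column 1 key, keys emitted in sorted order."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, value = fields[0], fields[1]
        groups.setdefault(key, []).append(value)
    # Dict order is insertion order, so sort the keys explicitly, like asorti().
    return [",".join(groups[key]) for key in sorted(groups)]


sample = ["eth1 b", "eth0 a", "eth1 c"]   # hypothetical "key value" lines
for row in group_by_first_column(sample):
    print(row)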
HUS
- Submit the performance test records

2015-04-23

HUS
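- 8:55 confirmed everything is normal; yesterday's 10 playback streams finished and stopped automatically.
- 10:40 stopped the original 10 playback streams and started another 10.
- 14:26 checked, all OK, screenshot taken
- 17:13 checked, all OK, screenshot taken
- 17:28 spot-checked the video files on the last expansion enclosure: playback OK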

2015-04-22

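Summary of related work
1. Completed SNMP development for Windows
2. Completed the Linux version of zd15
3. Completed the Linux interface for the SMS modem
4. 10GbE NIC performance testing and support
HUS
- 9:20 confirmed everything is normal, then started 10 playback streams, screenshot taken
- 14:00 found one playback stream with a "video stream playback failed" error
- 15:00 video files on the last expansion enclosure play OK, screenshot taken
- 17:30 checked, all OK, screenshot taken
Filled in the 2015 Q1 performance appraisal form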

2015-04-21

HUS
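- Recording 100 streams, forwarding 60: 2Gbps link at 66%, CPU 44%, memory 1.57GB
- Re-arranged the disks across the enclosures
- Original storage: one head unit with two expansion enclosures
  - Main enclosure: no disks
  - Expansion enclosure 1: full, 24 disks
  - Expansion enclosure 2: full, 24 disks
- After the change: one head unit with three expansion enclosures
  - Main enclosure: no disks
  - Expansion enclosure 1: 10 disks
  - Expansion enclosure 2: 14 disks
  - Expansion enclosure 3: full, 24 disks
- Created disk groups with these settings:
  - RAID 5
  - Strip Size 64KB
  - Disk Cache Policy : Disable
  - Read Policy : Always Read Ahead
  - IO Policy: Direct IO
  - Current Write Policy: Write Back
  - Default Write Policy: Always Write Back
- Disk group 1: expansion enclosure 1, 10 disks; 9 build the RAID, the remaining 1 is a hot spare; size 14.551TB
- Disk group 2: expansion enclosure 2, 14 disks; 13 build the RAID, the remaining 1 is a hot spare; size 21.826TB
- Disk group 3: expansion enclosure 3, 24 disks; 23 build the RAID, the remaining 1 is a hot spare; size 40.015TB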
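The disk-group sizes above are consistent with the RAID 5 rule that one disk's worth of capacity goes to parity. The notes don't record the disk size, so this Python sketch assumes 2 TB (decimal) disks and a controller that reports binary TB (TiB); under those assumptions it lands within about 0.003 TB of each reported size, the small remainder presumably being controller metadata.

```python
TIB = 2 ** 40  # assumption: the controller's "TB" is binary (TiB)


def raid5_usable_tib(disks_in_raid, disk_bytes):
    """Usable RAID 5 capacity: one disk's worth of space holds parity."""
    return (disks_in_raid - 1) * disk_bytes / TIB


ASSUMED_DISK_BYTES = 2 * 10 ** 12  # assumption: 2 TB (decimal) disks

for members, reported in [(9, 14.551), (13, 21.826), (23, 40.015)]:
    calc = raid5_usable_tib(members, ASSUMED_DISK_BYTES)
    print(f"{members} disks: computed {calc:.3f} TiB, controller showed {reported} TB")
```

Hot spares are excluded from `disks_in_raid`, since they hold no data until a member fails.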

2015-04-20

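Resignation application confirmed
HUS
- Tested from Friday 17:50 until 9:00 this morning with no problems found. Video playback had stopped,
  probably because it had finished. No "video stream playback failed" errors were seen.
- 10:57, increased recording to 80 streams: 2Gbps link at 50%, CPU 40%, memory 1.47GB. Checking playback showed
  a "video stream playback failed" at 10:49, so two playback streams were stopped, leaving 12. Yet the
  network was only at about 900Mbps, still under 1024Mbps
- 16:04, forwarding 30 + recording 80 + playback 12 = 122 streams; no problems found, CPU 60%, memory 1.50GB, network 930Mbps
  Judging from this record, the number of playback streams is the limit (problems seem to appear above 14), while recording streams can be increased.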

2015-04-17

HUS
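- Test records
  - Playback 15, forwarding 30: NVR 2Gbps link at 45%, CPU 63%, memory 1.43GB
    After a night and a morning of testing, many "video stream playback failed" errors had appeared by the afternoon.
  - Playback 12, forwarding 30: NVR 2Gbps link at 45%, CPU 63%, memory 1.48GB
    Test started at 14:30, network about 850Mbps
  - 15:00, the stream simulation server (60.75) and the control center VMS (60.224) were shut down and restarted, and the
    storage server NVR (60.72) was shut down. The lab environment is terrible...
  - 15:30, retest with playback 12, forwarding 30, recording 64: CPU 75%+ (fairly heavy), network 44%, memory 1.22GB
  - 16:40, retest with playback 12, forwarding 30, recording 64: CPU 50%+, network 44%, memory 1.27GB
  - 17:50, retest with playback 12, forwarding 30, recording 64: CPU 50%+, network 44%, memory 1.25GB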

2015-04-16

HUS
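- 128 streams of 1080P@8Mbps concurrent recording + forwarding + playback: 128*8/8 = 128MB/s
- 512 streams of D1@2Mbps concurrent recording + forwarding + playback: 512*2/8 = 128MB/s
- "128 streams" means the total number of concurrent recording + forwarding + playback streams,
  - e.g. 64 recording devices, 30 forwarded and 16 played back makes 110 streams
  - For the storage device, recording traffic is inbound (downstream), while forwarding + playback is outbound (upstream)
- Reference: http://www.zhihu.com/question/22877157
- Reference: "Major mainstream technologies and challenges of HD video storage"
- Reference: "Basic terminology of digital video surveillance technology"
- Check whether playback and forwarding work on the client machine itself; don't judge it over Remote Desktop
- Test records
  - 64 simulated stream devices
  - Playback 0, forwarding 0: NVR 2Gbps link at 26%, CPU 50%
  - Playback 21, forwarding 35: NVR 2Gbps link at 48%, CPU 75%, "video stream playback failed".
  - Playback 16, forwarding 30: NVR 2Gbps link at 45%, CPU 68%, playback reported "video stream creation failed".
  - Playback 12, forwarding 30: NVR 2Gbps link at 42%, CPU 60%, video plays OK.
  - Playback 15, forwarding 30: NVR 2Gbps link at 45%, CPU 63%, memory 1.43GB, "video stream playback failed".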
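A minimal Python sketch of the bandwidth arithmetic above, assuming (as the notes do) that every stream runs at its nominal bitrate and that the 1 Gbps target means 1024 Mbps; 512 × 2 Mbps is also 1024 Mbps, i.e. 128 MB/s. It also splits the mixed 64/30/16 example into inbound (recording) and outbound (forwarding + playback) traffic at the storage server. The function names are hypothetical.

```python
def total_mbps(streams, mbps_per_stream):
    """Aggregate bitrate of identical streams, in Mbps."""
    return streams * mbps_per_stream


def mbps_to_mb_per_s(mbps):
    return mbps / 8  # 8 bits per byte


def storage_load(recording, forwarding, playback, mbps_per_stream):
    """(inbound, outbound) Mbps at the storage server for one stream profile."""
    inbound = recording * mbps_per_stream                  # cameras -> NVR
    outbound = (forwarding + playback) * mbps_per_stream   # NVR -> clients
    return inbound, outbound


for name, streams, rate in [("1080P@8Mbps", 128, 8), ("D1@2Mbps", 512, 2)]:
    mbps = total_mbps(streams, rate)
    print(f"{name}: {mbps} Mbps = {mbps_to_mb_per_s(mbps)} MB/s")

# The mixed example from the notes: 64 recording + 30 forwarding + 16 playback at 8 Mbps
print(storage_load(64, 30, 16, 8))
```

For the mixed example this gives 512 Mbps in and 368 Mbps out, 880 Mbps in total, which is why the notes judge a test by total bitrate rather than by stream count.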

2015-04-15

HUS
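- Suspecting the CPU was too weak, I moved the system disk straight into another machine with a better CPU, but then NIC
  bonding failed and the original team could not be deleted.
- New machine: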
  - Intel Xeon CPU E3-1230 v3 3.30GHz, 4核8线程
  - Intel I210 Gigabit Network

2015-04-14

HUS
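- For devices under the network video server, change the retention period so video is stored longer
- To test playback: in the client's live video view, right-click a device and choose play video -> HUS-NVR (not direct-connect video)
- Don't go by the stream count; check whether the total bitrate of forwarding + recording + playback reaches 1Gbps. Getting there also requires NIC bonding
- NIC bonding (teaming) for the Intel 82574L on Windows Server 2008 R2
  - Driver update: https://downloadcenter.intel.com/product/32210/Intel-82574-Gigabit-Ethernet-Controller
  - Reference: http://struggle.blog.51cto.com/333093/202363
- Using the forwarding tool:
  - server: IP address of the NVR
  - port: keep the default; do not change it
  - Cfg File: choose VIPC_64 or VIPC_15 on the desktop; for a different stream count, copy the needed number
    of entries from VIPC_64 and save them as a new file
  - Mode: TCP or UDP both work; TCP is recommended
  - Client Count: must match the number of entries in the chosen file, e.g. 15 for VIPC_15
    and 64 for VIPC_64
  - ReConnect Time: keep the default 0; do not change it
  - PS: the STOP button is buggy; to stop, click "Exit" and reopen the tool.
    If forwarded streams are still present after closing the forwarding test tool, just restart the NVR
- Requirements: client live video must be clear with no skipped seconds in the timestamp; 50% utilization of the storage's 2Gbps bandwidth is only a target
- Test records
  - 64 simulated stream devices
  - Playback 10, forwarding 16: 2Gbps link at 36%, NVR CPU (not recorded), frames skipped in spot-checked forwarded video: no
  - Playback 16, forwarding 16: 2Gbps link at 38%, NVR CPU (not recorded), frames skipped in spot-checked forwarded video: yes
  - Playback 12, forwarding 16: 2Gbps link at 37%, NVR CPU (not recorded), frames skipped in spot-checked forwarded video: no
  - Playback 12, forwarding 31: 2Gbps link at 43%, NVR CPU 75%, frames skipped in spot-checked forwarded video: yes
  - Playback 12, forwarding 21: 2Gbps link at 39%, NVR CPU 55%, frames skipped in spot-checked forwarded video: no
  - Playback 12, forwarding 20: 2Gbps link at 39%, NVR CPU 65%, frames skipped in spot-checked forwarded video: yes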

2015-04-13

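HUS test: the client cannot play multiple video streams
- AMD FirePro V3900: installed the 9.003.3 WinServer2008R2 driver, but dxdiag shows no driver info
- Installed Win7: dxdiag viewed over Remote Desktop reports some features as unsupported, but run directly on the machine they are supported.
- For now only 2 live streams can play, and playback does not work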

2015-04-10

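HUS test: 128-stream recording at 1080P@8M
- Test machine IPs:
  - Storage server (NVR): 172.16.60.72, Intel Core i3-3220 CPU 3.30GHz, RAM 8GB
  - Stream simulation server (VIPC): 172.16.60.75
  - Management center server (VMC): 172.16.60.224
  - Client machine: 172.16.60.57
- The stream server has 7 service windows open, which in theory provides 20x7=140 recording streams
- Currently testing 80 streams; 60 of them have not been added to a recording rule yet
- Question: why does the stream server show "live555 media server crash,please check the miniDump file"?
- The storage server's backplane needs modifying, so shut the system down first.
- 1080P allows at most 64 recording streams; using the file 1080.264 in the stream simulator simulates 1080P streams
- D1 allows at most 256 recording streams; using the file D1.264 in the stream simulator simulates D1 streams
- Mind the stream limits: don't open too many 1080P streams, or both the simulated front end and the NVR will crash
- Using the stream simulation tool:
  - Open VIPCServer.exe; ignore the dialog for configuring multiple IPs and just close it.
  - MediaServer: check that the port number is free, then click "Start"
  - OnvifServer: check that the port number is free, and uncheck "User authentication"
    For the main and sub streams, make sure the MediaServer option matches the port configured for MediaServer on the left
    Choose 1080.264 or D1.264 as the file name
  - Each MediaServer + OnvifServer pair supports only 22 streams; usually plan for 20.
    So testing 64 streams of 1080P@8M needs 4 different ports and 4 launches of the services, 8 console windows in total.
- NVR settings
- VMC settings
- Required bandwidth (throughput) = 1024Mbps, supporting 128 streams of 1080P@8Mbps or 512 streams of D1@2Mbps of concurrent recording + forwarding + playback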

2015-04-09

LIO
- CHAP login solved: the userid and password must be set under tpg1. I had set them under the iqn in acls,
  which is wrong.
- Setting credentials in /etc/iscsi/iscsid.conf doesn't seem to take effect directly; use iscsiadm on the command line:
  iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.authmethod --value=CHAP
  iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.username --value=alan
  iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.password --value=123555
  iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 --login
- Next problem: the targetcli sessions issue
The 64-bit Windows Server 2008 Standard Edition supports at most 32GB of RAM
http://jingyan.baidu.com/article/22a299b52469ce9e19376ac8.html

2015-04-08

Windows Server 2008 R2 Standard: 4GGC4-9947F-FWFP3-78P6F-J9HDR, 180-day trial
http://www.kwstu.com/ArticleView/419895180_2014123134259111
HUS
- Reinstall the server side

2015-04-07

HUS
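- Server-side configuration
- The stream simulation tool also needs its own machine: it is CPU-hungry, and running alongside the server it probably won't reach 256 streams
- The client needs a dedicated graphics card for testing recording playback; every server instance started needs its own port
- Stream tool
  - one server supports only 22 streams; usually set 20
  - 1080P, 8M: at most 64 streams
  - D1, 2M: at most 256 streams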

2015-04-03

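Looking for the group behind the DDoS attack on GitHub
- What TTL does and what it is used for
Setting up the HUS environment
- Install Windows Server 2008 Standard from a USB drive
- Stream simulation server
- Storage array: 1. install "Network Video Recorder" 2. MSM
- Server:
  1. Install Web Server (IIS)
  2. Install Message Queuing
  3. SQL Server 2008 Standard
     - select all features
     - for services, pick the first entry in the drop-down
     - mixed-mode authentication
  4. Install the HUS service components
     - deselect "Network Video Recorder"
     - deselect "Disaster Recovery Backup Service"
     - deselect "Database Management Tool"
- Client: select all HUS client components
- Licence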

2015-04-02

By default Baidu Cloud only accepts uploads under 4GB; uploading larger files requires a VIP subscription
and the Cloud Manager client. See the Help Center.
LIO: enable pr_debug output
- Windows login succeeded.
  - iSCSI Initiator Properties -> Logon
  - Log On to Target -> Advanced…
  - Advanced Setting -> "CHAP logon information": just fill in "User name:" and "Target secret"
How to block large numbers of malicious IP addresses on Linux
- To filter a single IP address: iptables -A INPUT -s 1.1.1.1 -p TCP -j DROP
- To filter a single subnet: iptables -A INPUT -s 1.1.2.0/24 -p TCP -j DROP
- Use IP sets (ipset)
  - ipset create banthis hash:net creates the set
  - ipset list shows all sets
  - ipset add banthis 1.1.1.1 adds an IP
  - ipset add banthis 1.1.2.0/24 adds a subnet
  - ipset add banthis 1.1.5.0/24 adds a subnet
  - iptables -I INPUT -m set --match-set banthis src -p tcp --destination-port 80 -j DROP
  - ipset save banthis -f banthis.txt saves the set to a local file
  - ipset destroy banthis deletes the set
  - ipset restore banthis -f banthis.txt restores the set from the local file
- IP address blacklists: i-blocklist
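To make the matching rule concrete: a `hash:net` set matches a packet when its source address falls inside any stored network, and a bare address is stored as a /32. The Python sketch below uses the standard `ipaddress` module to mimic that membership test for the entries added to `banthis` above; it models the semantics only, not ipset's in-kernel hash implementation. `is_banned` is an illustrative name.

```python
import ipaddress

# Entries added to the 'banthis' set above; a bare address acts as a /32.
BANTHIS = [ipaddress.ip_network(n) for n in ("1.1.1.1/32", "1.1.2.0/24", "1.1.5.0/24")]


def is_banned(addr, nets=BANTHIS):
    """True if addr lies inside any network in the set (the --match-set src test)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in nets)


for addr in ("1.1.1.1", "1.1.2.77", "1.1.3.1"):
    print(addr, is_banned(addr))
```

The point of ipset over one iptables rule per address is that this lookup is a single hash probe per packet no matter how many entries the set holds.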

2015-04-01

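Linux kernel: enabling pr_debug output
- http://blog.csdn.net/helloanthea/article/details/25330809
HUS 4.3
- The SQL Server 2008 version is too old
- "Illustrated SQL Server 2008 installation and configuration"
  - choose mixed-mode authentication, password: XXX!XXX!NNNN
- Installing SQL Server 2008 fails with "microsoft.sql.chainer.package.PropertiesTypeProperty";
  this is mainly because setup was launched from inside the archive. Extract it first and the install works.
  Reference: http://blog.chinaunix.net/uid-10239851-id-2967893.html
- rdesktop connecting to Windows Server 2008 fails because CredSSP is required; change the remote settings and
  select "Allow connections from computers running any version of Remote Desktop (less secure)(L)"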

2015-03-31

HUS 4.3
- Download SQL Server 2008 http://www.microsoft.com/en-us/download/confirmation.aspx?id=5023
- Download .NET Framework 3.5 http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=25150
- Download Windows Server 2008 R2 multilingual support http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=1246
- Install .NET Framework 3.5 SP1
- Install TeamViewer
- The Chinese (CHS) SQL Server 2008 cannot be installed: the system language is not supported

2015-03-30

Learning to use the HUS 4.3 software.
- Install MSMQ: https://msdn.microsoft.com/zh-cn/library/aa967729(v=vs.110).aspx
- Documents:
  - Manual: user manual
  - MI: installation guide
- Prepare 3 machines
  - Install Windows 2008 R2 64bit standard, SQL server 2008 standard, HUS-SVM
  - Install HUS-Client
  - Install Windows 2008 R2 64bit standard, HUS-NVR
Built the iscsi_target_mod module with -DDEBUG, but the CentOS 7 machine running LIO crashed and won't boot. It needs a reinstall.

2015-03-28

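About debugfs: DebugFS in the Linux kernel
When using vim in a terminal on the server, pasting code copied from elsewhere gets auto-commented: everything after a comment line becomes a comment.
The fix is to add set pastetoggle=<F9> to .vimrc and toggle paste mode so the pasted formatting is kept. gvim doesn't have this problem.
- Reference: "Fixing messed-up formatting when pasting code into vim"
★★★ Debugging the LIO kernel code:
- Use printk, e.g.: printk(KERN_ERR "Illegal value %d\n", flag);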
  #define KERN_EMERG   KERN_SOH "0" /* system is unusable */
  #define KERN_ALERT   KERN_SOH "1" /* action must be taken immediately */
  #define KERN_CRIT    KERN_SOH "2" /* critical conditions */
  #define KERN_ERR     KERN_SOH "3" /* error conditions */
  #define KERN_WARNING KERN_SOH "4" /* warning conditions */
  #define KERN_NOTICE  KERN_SOH "5" /* normal but significant condition */
  #define KERN_INFO    KERN_SOH "6" /* informational */
  #define KERN_DEBUG   KERN_SOH "7" /* debug-level messages */
  I plan to use the KERN_DEBUG macro
- Because the final decision depends on the naf_flags field of auth, trace how that field is set along the call path:
  iscsi_handle_authentication in iscsi_target_nego.c
- [root@Ustor linux-3.10.0-123.el7]# modinfo iscsi_target_mod   (finds the module path)
  /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/target/iscsi/iscsi_target_mod.ko
- cp /lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/target/iscsi/iscsi_target_mod.ko{,.bak}
  backs up the original module
- Stop the target service with service target stop, and confirm it stopped with service target status
- lsmod |grep target lists all target-related modules
  [root@Ustor linux-3.10.0-123.el7]# lsmod |grep target
  target_core_pscsi 18810 0
  target_core_file 18030 0
  target_core_iblock 18177 0
  iscsi_target_mod 278732 1
  target_core_mod 299412 5 target_core_iblock,target_core_pscsi,iscsi_target_mod,target_core_file
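- rmmod iscsi_target_mod cannot remove the module: rmmod: ERROR: Module iscsi_target_mod is in use
- lsmod shows a "used by" count of 1 for iscsi_target_mod but an empty list of users; dm_mod and iptable_nat
  look the same. With no way of telling who holds the reference, rmmod can't remove it...
- Borrowed some kernel-module code to check why iscsi_target_mod is in use and cannot be removed
- Force-unloading a kernel module
  - Two points to note:
  - modules_which_use_me is no longer a member of struct module; use source_list instead
  - local_set(module_ref_addr(mod,cpu),0); can no longer be used, because module_ref_addr no longer exists
  - For now I did not change the reference count in code: the module may really be in use, and forcing it could crash the kernel.
- The Makefile can be written like this (reference: http://blog.csdn.net/ghostyu/article/details/6869138):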
  obj-m := mymod.o
  KERNEL_DIR := /lib/modules/$(shell uname -r)/build
  PWD := $(shell pwd)
  all:
  	$(MAKE) -C $(KERNEL_DIR) SUBDIRS=$(PWD) modules
  clean:
  	rm -f *.o *.ko *.mod.c
  .PHONY: all clean
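- Forcibly removing a Linux kernel module (when rmmod fails because of an initialization error)
- Forced removal of Linux kernel modules: ending rmmod processes stuck in disk sleep
- The kernel keeps reporting XFS errors: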
  [root@Ustor iscsi]# dmesg |tail
  [169231.988879] used by NULL
  [169231.988880] name:iscsi_target_mod state:0 refcnt:1
  [169256.555688] XFS (dm-6): xfs_log_force: error 5 returned.
  [169286.606509] XFS (dm-6): xfs_log_force: error 5 returned.
  [169316.657311] XFS (dm-6): xfs_log_force: error 5 returned.
  [169341.992366] [rmmod mymod] name:mymod state:2
  [169346.708131] XFS (dm-6): xfs_log_force: error 5 returned.
- The code has many pr_debug calls; how do I turn them on?
  - https://www.kernel.org/doc/local/pr_debug.txt: add CFLAGS_[filename].o := -DDEBUG to the Makefile
  - http://blog.chinaunix.net/uid-20746260-id-3044842.html

2015-03-27

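Tomorrow: add debug output to find out why CHAP authentication fails!
Building LIO's iSCSI kernel module on its own:
- Find and download the kernel source RPM kernel-3.10.0-123.el7.src.rpm, install it to extract the kernel source, and unpack it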
- /root/linux-3.10.0-123.el7
- cd /root/linux-3.10.0-123.el7
- make oldconfig && make prepare
- make scripts
- make CONFIG_ISCSI_TARGET=m -C /root/linux-3.10.0-123.el7 M=drivers/target/iscsi
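  - Note: CONFIG_ISCSI_TARGET is the variable defined in the Makefile under the relative path drivers/target/iscsi/
  - -C takes the absolute path of the kernel source
  - M= takes the module source path, relative to the kernel source directory
The role and workings of iSNS
Solving the LIO initiator CHAP auth login failure:
- The iSCSI CHAP authentication process
- Yesterday /dev/shm/core/messages showed "CHAP user or password not set for Initiator ACL"
- grep -rl "Initiator ACL" finds the file ./iscsi/iscsi_target_auth.c
  pr_err("CHAP user or password not set for"
         " Initiator ACL\n");
- Because this message is built from two adjacent string literals, grepping for the whole string fails!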
- if (!(auth->naf_flags & NAF_USERID_SET) || !(auth->naf_flags & NAF_PASSWORD_SET))
  checks struct iscsi_node_auth->naf_flags; iscsi_target_core.h defines four values:
  NAF_USERID_SET = 0x01,
  NAF_PASSWORD_SET = 0x02,
  NAF_USERID_IN_SET = 0x04,
  NAF_PASSWORD_IN_SET = 0x08,
  NAF stands for Node Auth Flag.
- 函数调用过程: iscsi_target_start_negotiation -> iscsi_target_do_login
  -> iscsi_target_handle_csg_zero -> iscsi_target_do_authentication
  -> iscsi_handle_authentication -> chap_main_loop -> chap_server_open
- iscsi_target_configfs.c
- iscsi_target_tpg.c
  - iscsit_tpg_add_network_portal
- iscsi_target.c
  - iscsit_add_np
- Login: iscsi_target_login.c
  - iscsi_target_login_thread
  - __iscsi_target_login_thread
- Negotiation: iscsi_target_nego.c
  - iscsi_target_start_negotiation
  - iscsi_target_do_login
  - iscsi_target_handle_csg_zero
- Authentication: iscsi_target_auth.c
  - chap_main_loop
  - chap_server_open
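- How LIO works:
  For each LU, LIO allocates a recv thread and a send thread. The recv thread receives iSCSI PDUs from the initiator,
  parses them into SCSI requests and hands them to the send thread; the send thread issues the requests to the LU and returns the LU's results to the initiator.
  The send and recv threads communicate through a queue. Some SCSI requests in the queue don't care about ordering and some do;
  this is only dealt with when the send thread walks the queue.
  Reference: "Survey of iSCSI protocol support in distributed storage"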
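A Python sketch of two ideas from this entry, under stated assumptions. First, the `naf_flags` test: the flag values are copied from the list above, and `chap_credentials_ok` mirrors the `if` condition that leads to the "not set" message (it returns False exactly when that message would be printed). Second, a toy model of the per-LU recv/send thread pair talking through a queue; `run_lu_pipeline` and the `lu` callback are hypothetical, and the plain FIFO ignores the ordered vs. unordered SCSI request handling the real send thread performs.

```python
import enum
import queue
import threading


class NafFlag(enum.IntFlag):
    # Values as listed from iscsi_target_core.h in the notes above.
    USERID_SET = 0x01
    PASSWORD_SET = 0x02
    USERID_IN_SET = 0x04
    PASSWORD_IN_SET = 0x08


def chap_credentials_ok(naf_flags):
    """False exactly when the C code would log 'CHAP user or password not set'."""
    return bool(naf_flags & NafFlag.USERID_SET) and bool(naf_flags & NafFlag.PASSWORD_SET)


def run_lu_pipeline(pdus, lu):
    """Toy recv/send thread pair for one LU, connected by a FIFO queue."""
    requests = queue.Queue()
    responses = []

    def recv_thread():
        for pdu in pdus:
            requests.put(("SCSI", pdu))    # 'parse' each PDU into a request
        requests.put(None)                 # sentinel: no more requests

    def send_thread():
        while True:
            req = requests.get()
            if req is None:
                break
            responses.append(lu(req[1]))   # issue to the LU, keep its result

    threads = [threading.Thread(target=recv_thread), threading.Thread(target=send_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses
```

With only `USERID_SET` present the check fails, which matches the symptom in the log: a user name configured but no password in the structure the login path reads.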
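Inside the ucli shell, running cifs_service -a -c "nas server"... splits the double-quoted argument value.
Checking the code showed that main tokenizes the arguments on spaces before passing them to the
cifs_service function, so of course it breaks. Running ucli cifs_service … directly works fine,
because in that mode main does not process the arguments and passes them straight to cifs_service.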
Set terminal title on Fedora 21
- My version is GNOME Terminal 3.14.2; the option to set the title seems to have been removed after 3.14. The workaround is
  to add code to .bashrc.
- How to rename terminal tab in Fedora 21
- How to rename terminal tab title in gnome-terminal?

2015-03-26

LIO
- Fixed CHAP auth set from the command line not taking effect
- Changed block to fileio
- Working on the CHAP login failure:
  - /dev/shm/core/messages shows "CHAP user or password not set for Initiator ACL"
Wireshark shows 172.16.50.10 constantly sending SSDP packets
- SSDP, the Simple Service Discovery Protocol, defines how to discover network services on a
  network.
Test vmware tool on Fedora x86_64
- https://my.vmware.com/web/vmware/info/slug/desktop_end_user_computing/vmware_horizon_clients/3_0
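- Install: as root, run sh ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle
- Uninstall: as root,
  - sh ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle -l lists the product names
  - sh ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle -t lists the component names
  - sh ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle -u vmware-horizon-client uninstalls
- More info: sh ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle --help
- Or open ./VMware-Horizon-Client-3.2.0-2331566.x86.bundle in gvim to read the shell code directly
- I installed it, but running vmware-view from the command line complained about missing libraries, so I uninstalled it again...
Rebuilt the disk group, created the logical volumes and LIO, but the kernel still reports errors in /dev/shm/core/messages:
- Checking the iSCSI connections shows many in TIME_WAIT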
  [root@Ustor ~]# netstat -apn |grep 3260
  tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN -
  tcp 0 0 172.16.130.100:3260 172.16.70.17:59983 TIME_WAIT -
  tcp 0 0 172.16.130.100:3260 172.16.70.17:59981 TIME_WAIT -
  tcp 0 0 172.16.130.100:3260 172.16.130.5:57426 TIME_WAIT -
  tcp 0 0 172.16.130.100:3260 172.16.70.17:59982 TIME_WAIT -
  tcp 0 0 172.16.130.100:3260 172.16.130.5:57391 TIME_WAIT -
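- Preliminary conclusion: the many "iSCSI Login negotiation failed." messages occur because those iSCSI volumes don't exist.
- What remains is to log in to these initiator machines and check whether they are making wrong connections.
- Confirmed: the initiators on these two machines keep reconnecting. The software kept its earlier connections,
  but the target side had since changed its disk groups and logical volumes, so the reconnects fail while the initiators keep sending login requests.
  What can the LIO kernel do in this case? Nothing but keep logging that the LUN does not exist.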

2015-03-25

Troubleshooting a failure to connect to https://github.com
- Pushing the commits to the origin master branch failed with:
  [dennis@localhost matrix207.github.com]$ git push origin master
  fatal: unable to access 'https://github.com/matrix207/matrix207.github.com.git/':
  Could not resolve host: github.com
- Cannot ping www.baidu.com
  [dennis@localhost ~]$ ping www.baidu.com
  ping: unknown host www.baidu.com
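- The browser cannot open the Baidu homepage
- Can ping the gateway 172.16.50.1
- Can ping the machine 172.16.50.30
- An earlier traceroute www.baidu.com had shown the routers upstream of the gateway and some external hops: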
  - ping 219.134.89.153 ok
  - ping 10.1.100.37 ok
  - ping 121.15.130.18 ok
  - ping 8.8.8.8 ok
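- The analysis gives one key fact: external hosts are reachable by IP address but not by domain name,
  so name resolution is broken. The network settings listed two DNS servers, 172.16.1.255 and 8.8.8.8.
  A colleague said our DNS is 172.16.1.250. After changing 255 to 250 and restarting the network with service network restart,
  the problem was solved!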
After a reboot there is no /dev/dgXXX, so the LIO service cannot start
[root@Ustor ~]# grep -rn "/dev/" /etc/target/saveconfig.json
338: "dev": "/dev/dg3/liolv2",
369: "dev": "/dev/dg3/liolv1",
400: "dev": "/dev/dg5/iscsi1",
[root@Ustor ~]# ucli dg_query_all
dg5
dg3
count: 4
[root@Ustor ~]# ls /dev/r
random raw/ rtc rtc0
[root@Ustor ~]# ls /dev/d
dg5/ disk/ dm-0 dm-1 dri/
[root@Ustor ~]# service target start
Redirecting to /bin/systemctl start target.service
Job for target.service failed. See 'systemctl status target.service' and 'journalctl -xn' for details.
[root@Ustor ~]# service target status
Redirecting to /bin/systemctl status target.service
target.service - Restore LIO kernel target configuration
Loaded: loaded (/usr/lib/systemd/system/target.service; disabled)
Active: failed (Result: exit-code) since Wed 2015-03-25 15:39:56 CST; 5s ago
Process: 4606 ExecStart=/usr/bin/targetctl restore (code=exited, status=1/FAILURE)
Main PID: 4606 (code=exited, status=1/FAILURE)
Mar 25 15:39:56 Ustor systemd[1]: target.service: main process exited, code=exited, status=1/FAILURE
Mar 25 15:39:56 Ustor systemd[1]: Failed to start Restore LIO kernel target configuration.
Mar 25 15:39:56 Ustor systemd[1]: Unit target.service entered failed state.
See which processes have /dev/shm/core/messages open
[root@Ustor ~]# lsof |grep '/core/messages'
rsyslogd 612 root 5w REG 0,17 145664 16152 /dev/shm/core/messages
in:imuxso 612 747 root 5w REG 0,17 145664 16152 /dev/shm/core/messages
in:imklog 612 748 root 5w REG 0,17 145664 16152 /dev/shm/core/messages
rs:main 612 749 root 5w REG 0,17 145664 16152 /dev/shm/core/messages
Reference: http://www.ibm.com/developerworks/cn/aix/library/au-lsof.html
About the /dev/shm directory
- It is not on disk; it exists only in memory.
- df -h and cat /proc/meminfo show that /dev/shm is half the size of RAM.
- Reference: http://www.xifenfei.com/1605.html
Analyzing LIO kernel-mode errors in /dev/shm/core/messages (who created the disk groups r5, rd5 and raid5?)
Ustor kernel: [400406.256525] Unable to locate Target IQN: iqn.2007-10.lio.com:rd5.sc6 in Storage Node
Ustor kernel: [400406.256537] iSCSI Login negotiation failed.
Ustor kernel: [400406.256569] Unable to locate Target IQN: iqn.2007-10.lio.com:r5.sc3 in Storage Node
Ustor kernel: [400406.256576] iSCSI Login negotiation failed.
Ustor kernel: [400408.255101] Unable to locate Target IQN: iqn.2007-10.lio.com:raid5.sc1 in Storage Node
Ustor kernel: [400408.255115] iSCSI Login negotiation failed.
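Continued reply on the high nfsd CPU usage issue:
- Judging by the current CPU state the system load is fairly heavy, but this is an iozone stress/performance test.
  Does the real production workload look like this (with that much pressure)?
  Probably only the customer knows. So with 50 or more clients, unless they all run iozone like this, there should be no problem.
- Even with iozone, given the 10GbE NIC speed relative to the disks, network traffic did not climb (the 10GbE link was not saturated),
  so the disks should be able to handle the read/write rate.
- As clients are added, data volume and system pressure grow, the CPU gets busier and nfsd's CPU usage should rise too,
  but not linearly with the number of clients; if it did, a file system like NFS would not still be so
  popular today. The likely consequence is a slower system, but not a crash.
- A few questions still need deeper study:
  - There are only 8 nfsd processes; does that number stay fixed as clients are added?
  - How many clients can NFS support at most, what are the limits, and how are they calculated?
  - What are the limits of our storage products? Bandwidth, disk I/O, IOPS, sustainable data load, etc.
NFS source code analysis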

2015-03-24

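Reply to technical support:
Analysis:
top shows wa peaking above 60%, which means the CPU spends most of its time waiting for I/O to complete.
The test setup is two servers running iozone read/write tests against the storage machine over 10GbE.
Throughput exceeds 800MB/s, which is a fairly heavy load.
iostat shows %util sometimes reaching 100, meaning the disks are saturated.
I tested another server here, also with iozone over an NFS mount, and saw fairly high nfsd usage (10%+).
Therefore:
1. nfsd's CPU usage is normal for this kind of test environment (large-scale read/write testing).
2. A high wa is also to be expected in a test with disk throughput above 800MB/s. Whether other units of
the same model show lower values while only this customer's is high, and what level counts as reasonable, would take time to investigate.
Possible causes of an excessively high wa in top:
- First, a wa above 50% confirms that the CPU spends most of its time waiting for I/O to complete
- It may be caused by a disk performance bottleneck
- Reference: http://bbs.51cto.com/thread-953213-1.html
Differences between x86, i386, amd64 and ia64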

2015-03-23

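The D and Z states in ps
- D: usually a wait caused by unavailable I/O resources; in the kernel source fs/proc/array.c it is labeled "D (disk sleep)"
- Z: a zombie process (find them with ps -ef|grep defunct). Ways to clear zombie processes:
  - kill -SIGCHLD PPID (PPID is the zombie's parent; this prods the parent into reaping its children, if it handles SIGCHLD)
  - kill -15 PID1 PID2 (PID1, PID2 are the parent's other children), then kill the parent: kill -15 PPID
  - Handling Linux process states D and Z
- The uninterruptible sleep (D) state of Linux processes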
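A Linux-only Python sketch of why zombies exist: after `fork()`, the child exits at once but stays in state Z in /proc/<pid>/stat until the parent calls `waitpid()`, after which its entry disappears. That is why both remedies above act on the parent rather than on the zombie itself. `proc_state` and `demo_zombie` are illustrative names.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    """Fork a child that exits at once; observe it as Z, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                  # child: exit immediately
    state = proc_state(pid)
    for _ in range(100):             # poll up to ~1 s for the child to exit
        if state == "Z":
            break
        time.sleep(0.01)
        state = proc_state(pid)
    os.waitpid(pid, 0)               # parent reaps: the zombie is released
    return state, os.path.exists(f"/proc/{pid}")
```

Calling `demo_zombie()` on Linux should return `('Z', False)`: the child was a zombie until reaped, and gone afterwards.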
iostat
- iostat
top wa at 58%, iostat %util at 100%: two servers attached to the storage, testing with IOZONE over 10GbE.
- Meaning of the CPU-state fields:
  - us, user : time running un-niced user processes
  - sy, system : time running kernel processes
  - ni, nice : time running niced user processes
  - wa, IO-wait : time waiting for I/O completion
  - hi : time spent servicing hardware interrupts
  - si : time spent servicing software interrupts
  - st : time stolen from this vm by the hypervisor
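- High wait and a high load average in top clearly mean the CPU is waiting for data; the bottleneck should be the disks
- %util in iostat is the share of each second spent on I/O, computed as (r/s+w/s)*(svctm/1000) with svctm in milliseconds
- Meaning of the iostat columns
  - %util: percentage of CPU time during which I/O requests were issued to the device. Close to 100% means the device is saturated
- NFS normally runs on port 2049
- High wa in top: dig in with iostat
- Troubleshooting high %wa and load average in top
- Testing NFS with iozone
- IOZONE 3.397, download links
  - http://www.iozone.org/src/current/
  - http://www.iozone.org/src/current/iozone-3-397.src.rpm
  - make linux-AMD64; check the CPU type first, see http://www.361way.com/cpuinfo/1510.html
- Testing NFS with IOZONE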
  - showmount -e 172.16.130.111
  - ls /mnt
  - mkdir /mnt/nastest
  - mount -t nfs 172.16.130.111:/share/nfs1 /mnt/nastest
- The customer's test parameters: ./iozone -az -b /nas1/wuliji.xls -g 256G -y 32k -i 0 -i 1
- Local test machine: storage on Sunway (神威)
  - top : Cpu(s): 4.0%us, 32.0%sy, 0.0%ni, 35.5%id, 15.5%wa, 0.0%hi, 13.0%si, 0.0%st
- Building iostat: ./configure --build=sw_64-unknown-linux-gnu, then make
  - sysstat-10.1.5-4.el7.x86_64
  - http://sebastien.godard.pagesperso-orange.fr/download.html
  - wget http://pagesperso-orange.fr/sebastien.godard/sysstat-11.0.2.tar.xz
  - wget https://github.com/sysstat/sysstat/archive/v11.0.2.tar.gz
  - tar zxvf sysstat-8.0.4.1.tar.gz
    cd sysstat-8.0.4.1
    ./configure --build=sw_64-unknown-linux-gnu
    make
    make install
- The precise meaning of I/O wait time in Linux
- Can anyone explain precisely what IOWait is?
- For high wa, a handy troubleshooting tool is given at the end of this post; you can also work through the following blog post:
- ★★★Troubleshooting High I/O Wait in Linux
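The %util formula noted above as a small Python function, a sketch assuming svctm is in milliseconds as iostat prints it: completed I/Os per second times the average service time gives the fraction of each second the device was busy. As far as I know, sysstat actually derives svctm from the device busy time and the I/O rate, so this is an identity rather than an independent measurement, and newer sysstat versions warn that svctm should not be trusted.

```python
def iostat_util(reads_per_s, writes_per_s, svctm_ms):
    """%util from the classic formula: IOPS times average service time."""
    busy_fraction = (reads_per_s + writes_per_s) * (svctm_ms / 1000.0)
    return min(busy_fraction, 1.0) * 100  # a device can't be more than 100% busy


print(iostat_util(100, 0, 2))    # about 20.0: busy 200 ms of every second
print(iostat_util(150, 50, 5))   # 100.0: saturated, as in the iozone test above
```

A device at 100% can still accept more requests if it serves them in parallel (RAID sets, SSDs), so %util near 100 is a strong hint of saturation rather than proof.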
Updated my résumé; installing pandoc pulled in many ghc packages, since pandoc is written in Haskell.
coding
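Report on LIO progress:
After last week's work plus Saturday, the LIO code has been committed to the centos7 branch of the SVN repository, but some
functionality is still unfinished. Summary:
- iSCSI volume create/delete/show [done]
- CHAP user create/delete/show [done]
- iSCSI service enable/disable [done]
- I/O type: currently blockio, needs to change to fileio [code change only] [difficulty 0.1]
- Initiator login in CHAP mode is broken [to be solved] [difficulty 0.7+]
- Session query/display is broken, apparently unsupported [to be verified] [difficulty 0.7+]
- I estimate 3 working days to deal with the issues above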

2015-03-21

LIO: the newly added code seems to have problems; even CHAP doesn't seem to take effect!
iscsiadm
- View sessions: iscsiadm -m session
- Set CHAP: vim /etc/iscsi/iscsid.conf
- With CHAP set on LIO, iscsiadm on Linux can't connect. Why?
- Login fails after editing the config file /etc/iscsi/iscsid.conf.
- Login also fails when the settings are made with commands:
  [root@localhost ~]# iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.authmethod --value=CHAP
  [root@localhost ~]# iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.username --value=alan
  [root@localhost ~]# iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 -o update --name node.session.auth.password --value=123555
  [root@localhost ~]# iscsiadm -m node -T iqn.2007-10.lio.com:dg3.liolv1 -p 172.16.130.100 --login
  Logging in to [iface: default, target: iqn.2007-10.lio.com:dg3.liolv1, portal: 172.16.130.100,3260] (multiple)
  iscsiadm: Could not login to [iface: default, target: iqn.2007-10.lio.com:dg3.liolv1, portal: 172.16.130.100,3260].
  iscsiadm: initiator reported error (11 - iSCSI PDU timed out)
  iscsiadm: Could not log into all portals
- https://wiki.debian.org/SAN/iSCSI/open-iscsi
- http://www.rootop.org/pages/2396.html
Edit the iSCSI config file to add one-way (initiator) authentication:
[root@localhost ~]# vi /etc/iscsi/iscsid.conf
node.startup = automatic
node.session.auth.username = admin
node.session.auth.password = admin1234567890
[root@localhost ~]# service iscsi restart
LIO cannot query session information!!!
- targetcli sessions shows (no open sessions)
- netstat -apn |grep 3260 |grep ESTABLISHED shows the connected IPs
- The code behind the sessions command: /lib/python2.7/site-packages/targetcli/ui_root.py, function ui_command_sessions
- Traced it to the file: /lib/python2.7/site-packages/rtslib_fb/root.py
  - In the code, sessions belong to node_acls, node_acls belong to tpgs, and tpgs belong to targets
  - The current configuration sets no ACLs, so there are no sessions; would they appear if ACLs were set?
- Traced it to the file: /lib/python2.7/site-packages/rtslib_fb/target.py
How the old web page gets the connection status (client IP, InitiatorName, storage IP, state)
- Possible output: 172.16.110.119 iqn.1994-05.com.redhat:4635fed69b24 172.16.130.114 active
- Located the file with Firebug and grep: /opt/html/iscsi/iscsi_edit.php, then found the function iscsi_get_access_status
- /opt/html/public/inc/iscsi_manager.inc runs vd_query_conns -d $vgname -v $lvname
- vd_mngt.c -> iscsi.c reads /proc/net/iet/session
Overtime [9:30 ~ 17:30], LIO

2015-03-20

C macros: if a macro parameter is used with # or ##, that argument is not macro-expanded.
- "Special uses of C macros and a few pitfalls"
  - Stringification: #define TOSTRING(x) #x
  - Concatenation: #define COMMAND(NAME) { #NAME, NAME ## _command }
LIO feature implementation and code commit
- iSCSI service
  - Web page file: option_iscsi.php, iscsi_set_status
  - Web page file: iscsi_manager.inc, iscsi_get_status->iscsi_service -l, iscsi_set_status->iscsi_service -s
  - Config file: /etc/cf/conf/iscsi.conf
  - Source file: sys_mngt.c
- Query the service: ucli iscsi_service -l
- Enable the service: ucli iscsi_service -s enable
  - system ("/etc/init.d/iscsi-target stop >&/dev/null");
  - system ("/etc/init.d/iscsi-target stop >&/dev/null");
  - system ("/etc/init.d/iscsi-target start");
- Disable the service: ucli iscsi_service -s disable
  - system ("/etc/init.d/iscsi-target stop >&/dev/null");
  - system ("/etc/init.d/iscsi-target stop >&/dev/null");
- After the network settings change, the iSCSI service also has to be started.
- Keep the original flow and configuration, but change /etc/init.d/iscsi-target start|stop to service target start|stop
  - /etc/init.d/target does not exist, yet service target start|stop still works; it is
    ultimately redirected to /bin/systemctl start|stop|status target.service
  - Every occurrence of /etc/init.d/iscsi-target must be changed
- Change the code that reads the iSCSI service status: get_iscsi_service_status
- In addition, activating the iSCSI service must also run vd_recover (0, NULL)
  - vd_recover(0,NULL) -> vd_recover_iscsi (vd->dg, vd->name) -> vgchange -ay activates the volume groups
  - iscsi_get_attr
  - iscsi_target_recovert

2015-03-19

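★★★JavaScript Standard Reference Tutorial (alpha)
Disk capacity
- Cylinders
- Tracks
- Sectors
- Disk capacity = cylinders * tracks per cylinder (heads) * sectors per track * 512 bytes
- hdparm -I /dev/sda shows the disk information
- Example: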
  [root@Ustor ~]# hdparm -I /dev/sda
  Logical max current
  cylinders 15525 15525
  heads 16 16
  sectors/track 63 63
  [root@Ustor ~]# echo 15525*16*63*512/1024/1024 |bc
  7641
  From the disk information, the capacity works out to 7641 MB; fdisk -l can confirm it.
  [root@Ustor ~]# fdisk -l /dev/sda
  Disk /dev/sda: 8012 MB, 8012390400 bytes
  255 heads, 63 sectors/track, 974 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  But fdisk -l reports 8012 MB, which doesn't match. It turns out this one is computed in decimal units! Sneaky vendors!
  [root@Ustor ~]# echo 15525*16*63*512 |bc
  8012390400
  [root@Ustor ~]# echo 15525*16*63*512/1000/1000 |bc
  8012
  In fact, hdparm already shows both values:
  [root@Ustor ~]# hdparm -I /dev/sda |grep -i "device size with M"
  device size with M = 1024*1024: 7641 MBytes
  device size with M = 1000*1000: 8012 MBytes (8 GB)
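The same arithmetic as a short sketch, using the hdparm geometry above; the helper names are hypothetical. It shows that the 7641 vs 8012 difference is only the unit: dividing by 1024*1024 versus 1000*1000.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity from a CHS geometry: cylinders * heads (tracks per cylinder)
 * * sectors per track * bytes per sector. */
static uint64_t chs_capacity_bytes(uint64_t cylinders, uint64_t heads,
                                   uint64_t sectors_per_track, uint64_t sector_size)
{
    return cylinders * heads * sectors_per_track * sector_size;
}

/* Binary megabytes (what `bc .../1024/1024` prints) vs decimal MB (what fdisk prints). */
static uint64_t to_mib(uint64_t bytes) { return bytes / (1024 * 1024); }
static uint64_t to_mb(uint64_t bytes)  { return bytes / (1000 * 1000); }
```

With 15525 cylinders, 16 heads and 63 sectors per track this gives 8012390400 bytes, i.e. 7641 in binary units and 8012 in decimal units, matching both tools.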
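Storage machine network unreachable
- ip addr shows an IP address and the link is up; route -n shows the routing table is fine
- ping 172.16.130.1 fails; the gateway is unreachable
- service network restart restarts the network service; after setting the IP again with ifconfig and adding the
  routes back with route add, the network works again.
- Checked the kernel messages with vim /dev/shm/core/messages and found the following: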
  Mar 19 02:19:28 Ustor kernel: frm_size=56, mbx_size=256
  Mar 19 02:19:29 Ustor kernel: set event class to 0
  Mar 19 02:19:29 Ustor kernel: megasas_start_aen: seq_num=12737
  Mar 19 02:19:29 Ustor kernel: megasas_register_aen[6]: already registered
  Mar 19 02:19:29 Ustor kernel: megasas_register_aen ret=0
  Mar 19 03:22:42 Ustor kernel: irq 33: nobody cared (try booting with the "irqpoll" option)
  Mar 19 03:22:42 Ustor kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #3
  Mar 19 03:22:42 Ustor kernel: Call Trace:
  Mar 19 03:22:42 Ustor kernel: [] ? __report_bad_irq+0x2b/0xa0
  Mar 19 03:22:42 Ustor kernel: [] ? note_interrupt+0x18c/0x1d0
  Mar 19 03:22:42 Ustor kernel: [] ? handle_edge_irq+0xf5/0x180
  Mar 19 03:22:42 Ustor kernel: [] ? handle_irq+0x49/0xa0
  Mar 19 03:22:42 Ustor kernel: [] ? do_IRQ+0x6c/0xf0
  Mar 19 03:22:42 Ustor kernel: [] ? ret_from_intr+0x0/0x11
  Mar 19 03:22:42 Ustor kernel: [] ? acpi_idle_enter_c1+0xa3/0xc1
  Mar 19 03:22:42 Ustor kernel: [] ? acpi_idle_enter_c1+0x82/0xc1
  Mar 19 03:22:42 Ustor kernel: [] ? cpuidle_idle_call+0xa7/0x140
  Mar 19 03:22:42 Ustor kernel: [] ? cpu_idle+0xb6/0x110
  Mar 19 03:22:42 Ustor kernel: [] ? start_secondary+0x22a/0x26d
  Mar 19 03:22:42 Ustor kernel: handlers:
  Mar 19 03:22:42 Ustor kernel: [] (e1000_msix_other+0x0/0x1f0 [e1000e])
  Mar 19 03:22:42 Ustor kernel: Disabling IRQ #33
  Mar 19 03:22:57 Ustor kernel: frm_size=56, mbx_size=256
  Mar 19 03:23:01 Ustor kernel: set event class to 0
  Mar 19 03:23:01 Ustor kernel: megasas_start_aen: seq_num=12738
- Search the kernel source
  [dennis@localhost linux-2.6.32-279.el6]$ grep -rl e1000_msix_other ./
  ./drivers/net/e1000e/netdev.c
  [dennis@localhost linux-2.6.32-279.el6]$ grep -rl "nobody cared " ./
  ./kernel/irq/spurious.c
  [dennis@localhost linux-2.6.32-279.el6]$ grep -rl irqpoll ./
  ./Documentation/kernel-parameters.txt
  ./Documentation/kdump/kdump.txt
  ./drivers/net/tg3.c
  ./kernel/irq/spurious.c
- "Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #3" is printed by the function
  show_regs_common in arch/x86/kernel/process.c.
- eth0 uses the e1000e driver
  [root@Ustor ~]# ethtool -i eth0
  driver: e1000e
  version: 2.1.4-NAPI
  firmware-version: 2.1-2
  bus-info: 0000:03:00.0
- ./drivers/net/e1000e/netdev.c : e1000_netpoll -> e1000_intr_msix -> e1000_msix_other
- The system interrupt table shows that IRQ 33 is raised by eth0.
  [root@Ustor ~]# grep 33: /proc/interrupts
  33: 53026 146978 PCI-MSI-edge eth0
  [root@Ustor ~]# ls /proc/irq/34
  affinity_hint megasas node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/33
  affinity_hint eth0 node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/32
  affinity_hint eth0-tx-0 node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/31
  affinity_hint eth0-rx-0 node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/30
  affinity_hint eth3 node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/29
  affinity_hint eth2 node smp_affinity smp_affinity_list spurious
  [root@Ustor ~]# ls /proc/irq/28
  affinity_hint eth1 node smp_affinity smp_affinity_list spurious
- http://lxr.free-electrons.com/source/Documentation/zh_CN/IRQ.txt
- Binding hardware interrupts to different CPUs on multi-core Linux (IRQ Affinity)
- http://rfyiamcool.blog.51cto.com/1030776/1335700
- http://blog.csdn.net/lucien_cc/article/details/7522618
- http://michaelkang.blog.51cto.com/1553154/1265232
- e1000e driver source code analysis
- http://zh.wikipedia.org/zh/中斷
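- Does this message mean that eth0, its driver module, or the network has a problem?
  Can this problem make the network driver module exit, so that applications built on the network
  can no longer communicate? To be investigated! Next time, remember to finish checking before restarting the network.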
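To find the owner of an IRQ from a program rather than by eye, a line of /proc/interrupts can be parsed as below. This is a sketch that assumes the layout shown above (IRQ number, one counter per CPU, chip type, device name last); `parse_irq_line` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/interrupts line such as
 *   "33: 53026 146978 PCI-MSI-edge eth0"
 * into the IRQ number, the sum of the per-CPU counters and the last
 * field (device/handler name). Returns 0 on success, -1 otherwise.
 * Non-numeric rows such as "NMI:" are rejected. */
static int parse_irq_line(const char *line, int *irq,
                          unsigned long long *total,
                          char *dev, size_t devlen)
{
    char *end;
    long n = strtol(line, &end, 10);   /* skips leading blanks */
    if (end == line || *end != ':')
        return -1;
    *irq = (int)n;

    const char *p = end + 1;
    *total = 0;
    for (;;) {                          /* one counter per CPU */
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the chip type */
        *total += strtoull(p, &end, 10);
        p = end;
    }

    /* The device name is the last whitespace-separated token. */
    size_t len = strlen(p);
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;
    size_t start = len;
    while (start > 0 && !isspace((unsigned char)p[start - 1]))
        start--;
    if (start == len || len - start >= devlen)
        return -1;
    memcpy(dev, p + start, len - start);
    dev[len - start] = '\0';
    return 0;
}
```

Reading /proc/interrupts with fgets() and calling this for each line maps IRQ 33 to eth0; for the sample line above the total is 53026 + 146978 = 200004.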
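Measuring disk write speed correctly with dd
- dd bs=1M count=256 if=/dev/zero of=test only measures how fast data is written into memory (the page cache). Fastest.
- dd bs=1M count=256 if=/dev/zero of=test; sync gives the same speed as above, because sync runs after dd has already reported its speed.
- dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync writes into memory and syncs to disk once at the end, and the reported speed includes that sync. Reasonable.
- dd bs=1M count=256 if=/dev/zero of=test oflag=dsync syncs to disk after every block is written. Slowest.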
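A rough C analog of the three dd variants, showing where the flush sits relative to the timer. This is a sketch: `dd_like_write` and `enum sync_mode` are hypothetical names, and the rate uses 1 MB = 10^6 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

enum sync_mode {
    SYNC_NONE,       /* like plain dd: measures writes into the page cache */
    SYNC_FDATASYNC,  /* like conv=fdatasync: one flush before the clock stops */
    SYNC_DSYNC       /* like oflag=dsync: every write waits for the disk */
};

/* Write count blocks of bs zero bytes to path and return MB/s, or -1.0 on error. */
static double dd_like_write(const char *path, size_t bs, size_t count, enum sync_mode mode)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC | (mode == SYNC_DSYNC ? O_DSYNC : 0);
    char *buf = calloc(1, bs);
    if (buf == NULL)
        return -1.0;
    int fd = open(path, flags, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int ok = 1;
    for (size_t i = 0; ok && i < count; i++)
        ok = write(fd, buf, bs) == (ssize_t)bs;   /* short writes treated as errors */
    if (ok && mode == SYNC_FDATASYNC)
        ok = fdatasync(fd) == 0;                  /* flush inside the timed region */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);
    if (!ok)
        return -1.0;
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;                              /* guard against a zero interval */
    return (double)bs * count / 1e6 / secs;
}
```

Only SYNC_FDATASYNC and SYNC_DSYNC include disk time in the measurement; with SYNC_NONE the data may still sit in the page cache when the clock stops, which is why plain dd looks fastest.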

2015-03-18

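About RAID
- RAID 0: several disks striped side by side and used as one large disk; read/write speed is n times that of a single disk
- RAID 1:
  - With two disks, write performance is at most that of one disk, while read performance can reach twice that of a single disk.
  - With more disks, write speed = n/2: with 8 identical disks a1, a2 ~ a8, write speed is only 4*a1
  - With more disks, read speed = n: with 8 identical disks a1, a2 ~ a8, read speed reaches 8*a1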
- raid5
- raid6
- raid10
- http://www.dbabeta.com/2009/io-performence-02_cache-and-raid.html
Stress testing CPU, memory, IO and network on Linux
- Simple CPU test: echo "scale=5000; 4*a(1)" | bc -l -q
- Simple memory test: memtester 1G 5
- Simple IO test:
  - time dd if=/dev/zero of=/tmp/test bs=1M count=4096
  - fio
  - iozone
- Network test tool: iperf
Shenwei net-snmp
- ./configure --prefix=/usr/ --host=sw_64-unknown-linux-gnu
Is disk speed the same as disk bandwidth?
How much bandwidth can a single disk provide?
- FC
  - FC 15 krpm: 12 MB/s and 180 IOPS
  - FC 10 krpm: 10 MB/s and 140 IOPS
  - SATA2 7.2 krpm: 8 MB/s and 80 IOPS
  - ATA 7.2 krpm: 7 MB/s and 60 IOPS
  - ATA 5.4 krpm: 7 MB/s and 50 IOPS
  - EFD (flash): 100 MB/s and 2500 IOPS
- SAS:
  - Flash drives(Solid State Device): capable of around 250MB/s and 3000 IOPS
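  - 15K SAS drives: capable of around 50 MB/s and 180 IOPS
  - 10K SAS drives: capable of around 35 MB/s and 140 IOPS
  - 7200 rpm Near-Line SAS drives: capable of around 25 MB/s and 90 IOPS
Two disk metrics to watch: IOPS and transfer bandwidth (throughput)
- What counts as one IO?
  A system is built from layered modules, each with its own interface, and the data flowing between those interfaces is IO.
  But each module has its own definition of one IO, so "one IO" is only meaningful within a specific module.
- Summary:
  - High transfer bandwidth pays off when moving large blocks of contiguous data
  - High IOPS pays off when moving small blocks of non-contiguous data
Throughput (bandwidth)
- Data throughput
- How much bandwidth can a single disk provide?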
- NIC bandwidth: the raw rate of a gigabit NIC = 1000 Mb/s / 8 = 125 MB/s, before protocol overhead
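IOPS calculation
- IOPS = Input/Output Operations Per Second, i.e. the number of reads and writes per second
- IOPS is the number of I/O requests a system can handle per unit time, usually counted per second; an I/O request is typically a read or write request
- IOPS depends directly on: seek time tseek, rotational latency trotation, and data transfer time transfer
  - tseek: the time needed to move the read/write head to the correct track; typical disk seek times range from 3 to 15 ms
  - trotation: the time for the platter to rotate the requested sector under the read/write head; on average half a revolution, for example:
    7200 rpm (revolutions per minute): 60*1000/7200/2 = 4.17 ms
    15000 rpm: 60*1000/15000/2 = 2 ms
  - transfer: the time needed to transfer the requested data; IDE/ATA interfaces reach 133 MB/s and SATA II reaches 300 MB/s
    Transfer = IO Chunk Size / Max Transfer Rate
    Assuming a 64 KB IO and a 100 MB/s disk interface, Transfer = 64KB / (100MB/s) = 0.000625 s = 0.625 ms (taking 1 MB = 1024 KB)
    Since IOs are usually small (under 64 KB?) and interface rates keep rising, transfer is generally under 1 ms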
- IOPS=1000/(tseek+trotation+transfer)
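  - For a typical mechanical disk, IOPS = 1000/(5ms + 2ms) = 142 (5 ms is a fairly low seek time, 2 ms corresponds to 15k rpm, and transfer, being under 1 ms, is ignored)
  - For an SSD there is no rotation and seeking is negligible, and the transfer rate can reach 300 MB/s:
    IOPS = 1000ms/(64KB/300MB/s) = 4687 (assuming a 64 KB IO and a 300 MB/s SSD interface)
  - So an SSD completes far more IO operations per second than a conventional hard disk.
- For a mechanical disk doing sequential reads/writes (almost no seeking or rotation), what happens to IOPS? Assume a 100 MB/s disk interface: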
  - 4K : 1000/(4/100) = 25000
  - 16K : 1000/(16/100) = 6250
  - 32K : 1000/(32/100) = 3125
  - 64K : 1000/(64/100) = 1562
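  - So the smaller the IO size (does that correspond to small files?), the higher the IOPS
- Disk IOPS calculation and measurement
- http://dadaru.blog.51cto.com/218979/481394
- System performance analysis tools && some of my basic understanding of disk IOPS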
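The formulas above as small helpers, handy for trying other drive parameters; the function names are hypothetical. Note that `transfer_ms` uses 1 MB = 1000 KB, as the SSD and sequential examples do, whereas the 0.625 ms example uses 1024.

```c
#include <assert.h>

/* Average rotational latency: half a revolution, in milliseconds. */
static double rotation_ms(double rpm)
{
    assert(rpm > 0);
    return 60.0 * 1000.0 / rpm / 2.0;
}

/* Transfer time in ms for io_kb kilobytes at rate_mb_s MB/s,
 * with 1 MB = 1000 KB so that KB / (MB/s) comes out in ms. */
static double transfer_ms(double io_kb, double rate_mb_s)
{
    assert(rate_mb_s > 0);
    return io_kb / rate_mb_s;
}

/* IOPS = 1000 / (tseek + trotation + transfer), all terms in ms. */
static double iops(double seek_ms, double rot_ms, double xfer_ms)
{
    double t = seek_ms + rot_ms + xfer_ms;
    assert(t > 0);
    return 1000.0 / t;
}
```

For example, `iops(5.0, rotation_ms(15000), 0.0)` reproduces the 142 for a 15k rpm disk, and `iops(0.0, 0.0, transfer_ms(64, 300))` the 4687 for the SSD case.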
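★★★IO system performance, part 1: metrics for measuring performance
- A physical block is the unit in which data is accessed on disk, i.e. the minimum amount of data transferred per I/O operation
- Random access means the sector address of this IO is far from that of the previous IO, so the
  head has to move a long way between the two IOs before it can start reading/writing again.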
- IO Chunk Size
- IO system performance, part 1: metrics for measuring performance
- Hard disk drive
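Learning the blktrace command
- blktrace -d /dev/sda -o - | blkparse -i -
- btrace /dev/sda has the same effect as above
- blktrace /dev/sda /dev/sdb, blkparse sda sdb
- Further reading: more blktrace
- blktrace: an in-depth look at how IO works in Linux
- blktrace, a monitoring tool for the Linux block layer
Learning the fio tool
- fio is an extremely powerful IO performance testing tool. It is no exaggeration to say that once you understand all of fio's parameters,
  you basically understand the IO stack, because its author Jens Axboe is the maintainer of the Linux kernel block IO layer.
- fio performance testing tool gains a graphical front end, gfio
- Testing disk IOPS with fio on Linux
Learning the SystemTap (stap) tool
- Installation
  - yum install systemtap-devel systemtap-runtime kernel-devel
  - See Systemtap On Fedora
- Test: stap -ve 'probe begin { log("hello world") exit() }'
- For Chinese-language articles about stap, search for "chinese" on https://sourceware.org/systemtap/wiki
- SystemTap Homepage
- wikipedia
- Linux introspection and SystemTap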
- stap example
LIO

2015-03-17

LIO: the following features work
- ./ucli vd_iscsi -l
- ./ucli vd_iscsi -C -d dg2 -v lv1 -s 409600
- ./ucli vd_iscsi -D -d dg2 -v lv1
- ./ucli iscsi_chap --list
- ./ucli iscsi_chap --add -d dg2 -v lv1 -u user_name -p user_passwd
- ./ucli iscsi_chap --del -d dg2 -v lv1 -u user_name
LIO: service target restart fails to start the service; dmesg shows the following:
[434707.208964] Rounding down aligned max_sectors from 4294967295 to 4294967288
[434707.209653] emulate_write_cache cannot be changed when underlying HW reports WriteCacheEnabled, ignoring request
[434707.213707] kernel_bind() failed: -98
netstat -apn |grep 3260 showed that port 3260 was held by ietd; killing the ietd process fixed it (-98 is -EADDRINUSE, "Address already in use", on Linux)
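The -98 from kernel_bind() is a negated errno. A sketch of the same check from user space, assuming Linux/POSIX sockets: `tcp_port_in_use` is a hypothetical helper that tries to bind the port (3260 for iSCSI) and reports EADDRINUSE.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if binding the TCP port on 0.0.0.0 fails with EADDRINUSE,
 * 0 if the bind succeeds, -1 on any other error. */
static int tcp_port_in_use(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int rc = bind(fd, (struct sockaddr *)&addr, sizeof addr);
    int err = errno;            /* save before close() can change it */
    close(fd);
    if (rc == 0)
        return 0;
    return err == EADDRINUSE ? 1 : -1;
}
```

Calling `tcp_port_in_use(3260)` before `service target start` would have flagged the leftover ietd. The bind is done without SO_REUSEADDR, so any existing socket on that port makes it fail.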
CentOS 7 release
- 3.10.0-123.el7.x86_64
- http://vault.centos.org/centos/7.0.1406/os/Source/SPackages/
- Kernel source: http://vault.centos.org/centos/7.0.1406/os/Source/SPackages/kernel-3.10.0-123.el7.src.rpm
- targetcli source: http://vault.centos.org/centos/7.0.1406/os/Source/SPackages/targetcli-2.1.fb34-1.el7.src.rpm

2015-03-16

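Dirty page write-back problems caused by mixing buffered IO and direct IO
git diff showed large numbers of modified files; it turned out to be a file mode problem:
- git config core.fileMode false
- git file mode
Analyze how IET implements Type (fileio, blockio), IOMode (wb, wt), Sector (512, 4096)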

2015-03-13

Determine the byte order of the current platform:
int x = 1;
if (*(char *)&x == 1)
    puts("little endian");
else
    puts("big endian");
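The same check as a self-contained function. This sketch copies the first byte with memcpy instead of the pointer cast; both are valid C because character types may alias any object. `is_little_endian` is a hypothetical name; GCC and Clang also predefine `__BYTE_ORDER__` for a compile-time answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian platform, 0 on big-endian: look at the
 * first byte in memory of a multi-byte integer whose value is 1. */
static int is_little_endian(void)
{
    uint32_t x = 1;
    unsigned char first;
    memcpy(&first, &x, 1);   /* same idea as *(char *)&x */
    return first == 1;
}
```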
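Linux kernel study: network devices
VFS -> File System -> Block device
Linux kernel study: file systems and block devices
- VFS only cares about the following structures:
  - superblock
  - inode
  - dentry
  - file
- Block devices
  - A block device can be accessed randomly for reads and writes, and each access covers a fixed minimum amount of data. The smallest
    hardware unit is the sector (usually 512 bytes or larger)
  - To the kernel, a block device is just a series of sectors of data
  - request method, request queue, reordering, bio
- Page cache
  - Write directly to disk and mark the page invalid
  - Write to disk and also update the cached page; this is called write-through
  - Write to the page without writing to disk immediately: mark the page dirty (meaning the data on disk is stale), put
    dirty pages on a dirty list, and write them back to disk periodically (done by the flusher kernel threads);
    this is called write-back
  - One page can cache several disk blocks, because a page is usually 4 KB while a disk block
    may be only 512 B. A file's representation in the page cache is struct address_space.
- Linux kernel study: file systems and block devices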
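How Linux file I/O works
- open system call: open("test.file", O_WRONLY|O_APPEND|O_SYNC)
  - O_DSYNC tells the kernel that a write to the file only completes (write returns success) once the data
    has reached the disk. Related file flags include O_SYNC, O_RSYNC and O_DIRECT.
  - O_SYNC is stricter than O_DSYNC: besides the data, all of the file's metadata (such as the modification time) must also
    reach the disk before write succeeds, whereas O_DSYNC only flushes the metadata needed to read the data back (such as the file length). So O_SYNC does more work than O_DSYNC.
  - O_RSYNC means that when the file is read, its pending data in the OS cache must first be flushed to disk
  - A file opened with O_DIRECT bypasses the OS cache for reads and writes, which go directly to the device (disk).
    Without the OS cache, O_DIRECT lowers the efficiency of sequential reads and writes
- write system call: write(fd, buf, 6)
- flush system call: fdatasync(fd)
  - The difference between fsync and fdatasync is the same as the difference between O_SYNC and O_DSYNC.
  - sync only queues the file's data in the OS cache for writing and does not confirm that it actually reached the disk, so sync is not reliable (POSIX allows this; Linux's sync() does wait for the writes to complete).
- Ignoring how the file is opened, "writing a file" is usually described as two phases:
  - calling write is the write-data phase (its behavior actually depends on the open flags),
  - calling fsync (or fdatasync) is the flush phase.
- innodb_flush_method and File I/O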
- http://man7.org/linux/man-pages/man2/open.2.html
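The two phases described above (write, then flush) in one helper: a sketch that opens with O_WRONLY|O_APPEND like the open() call above but, instead of O_SYNC, calls fdatasync() once at the end. `append_durably` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Append len bytes to path and return only after the data is on stable
 * storage. Phase 1: write(), looping over short writes. Phase 2: flush
 * with fdatasync(). Returns 0 on success, -1 on error. */
static int append_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fdatasync(fd) != 0) {    /* flush phase; fsync() would also flush all metadata */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Swapping fdatasync() for fsync() gives O_SYNC-like guarantees for the final state; opening with O_DSYNC instead would make every write() wait, like dd oflag=dsync.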
man 2 open, man 2 write, man 2 fork show the system call manuals
- Learn system calls with me (1)

2015-03-12

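How the VFS layer implements write-back
- VFS caches: page cache, buffer cache, inode cache, directory (dentry) cache
- fs/open.c, fs/namei.c
- sys_open() ->do_sys_open() ->do_filp_open() ->do_last() ->nameidata_to_filp() ->__dentry_open()
- fs/read_write.c, mm/filemap.c
- sys_read() ->vfs_read() ->do_sync_read() ->generic_file_aio_read() ->do_generic_file_aio_read()
- sys_write() ->vfs_write() ->do_sync_write() ->generic_file_aio_write() ->
- fs/buffer.c, fs/fs-writeback.c
- fs/sync.c, mm/filemap.c
- do_fsync() ->vfs_fsync() ->vfs_fsync_range() ->filemap_write_and_wait_range()
- sync() ->vfs_sync() ->vfs_fsync_range() ->filemap_write_and_wait_range()
- "Storage Technology Principles and Analysis", Chapter 8: File Systems
- A close look at how InnoDB data reaches the disk
- The Linux kernel's VFS Layer
An MIT expert explains the structure of mathematics
Difference between Buffer and Cache
- First, in Chinese, Buffer should be translated as 缓冲 ("buffering") and Cache as 缓存 ("caching")
- At the hardware level, Buffer is memory, and Cache is the high-speed cache integrated into the CPU
- At the software level, Buffer buffers block devices, and Cache caches file systems
  - Buffer (buffer cache) buffers block device operations in units of blocks and syncs them to disk periodically or manually;
    it buffers writes so that many changes reach the disk at once, avoiding frequent disk writes and improving write efficiency.
  - Cache (page cache) caches file system files in units of pages for programs that need to read them;
    it serves reads from memory, avoiding frequent disk reads and improving read efficiency.
- In short, what is in a Buffer is meant to be written elsewhere, and what is in a Cache is meant to be read by others
If a problem is left unsolved, the next time it comes up you still can't solve it and have to start over from scratch, wasting time and effort.
Better to put in enough effort once and solve it as thoroughly as possible; it takes more time, but saves time and energy the next time.
Continue analyzing yesterday's problem: ping 172.16.60.151 gets no response
- Found that eth1 on .151 cannot even ping the gateway 172.16.60.1
- Still can't solve it, 2015/03/12 15:26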

2015-03-11

Another machine does not respond to ping: 172.16.60.151
- [root@localhost ~]# ip addr
  1: lo: mtu 16436 qdisc noqueue state UNKNOWN
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  2: eth0: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:1e:67:c9:9a:f4 brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.150/24 brd 172.16.60.255 scope global eth0
      inet6 fe80::21e:67ff:fec9:9af4/64 scope link
         valid_lft forever preferred_lft forever
  3: eth1: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:1e:67:c9:9a:f5 brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.151/24 brd 172.16.60.255 scope global eth1
      inet6 fe80::21e:67ff:fec9:9af5/64 scope link
         valid_lft forever preferred_lft forever
  [root@localhost ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  172.16.60.0 * 255.255.255.0 U 0 0 0 eth0
  172.16.60.0 * 255.255.255.0 U 0 0 0 eth1
  link-local * 255.255.0.0 U 1002 0 0 eth0
  link-local * 255.255.0.0 U 1003 0 0 eth1
  default 172.16.60.1 0.0.0.0 UG 0 0 0 eth0
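- One question: tcpdump -n -e -i eth0 icmp shows the incoming "ICMP echo request", but no
  ICMP echo reply is sent. It is probably a routing problem: the reply packets don't know how to reach the
  machine running ping (172.16.50.39). Note that route shows the default gateway on eth0,
  while the ping target is 172.16.60.151, the IP of eth1. How does the kernel apply the routing rules in this case?
Machine 172.16.60.121 cannot be reached via the group IP; study ping, the ICMP protocol and how it works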
- [root@localhost ~]# ip addr
  2: eth0: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:07:ec:01:00:79 brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.121/24 scope global eth0
      inet 172.16.60.120/24 scope global secondary eth0
  3: eth1: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:07:ec:01:00:7a brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.122/24 scope global eth1
      inet 172.16.60.120/24 scope global secondary eth1
  4: eth2: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:07:ec:01:00:7b brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.123/24 scope global eth2
      inet 172.16.60.120/24 scope global secondary eth2
  5: eth3: mtu 1500 qdisc mq state UP qlen 1000
      link/ether 00:07:ec:01:00:7c brd ff:ff:ff:ff:ff:ff
      inet 172.16.60.124/24 scope global eth3
      inet 172.16.60.120/24 scope global secondary eth3
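- From 172.16.50.39, pinging 60.121 works, but pinging 60.120 gets no reply.
- Checked the system configuration with sysctl -a |grep net and compared it with a normal storage machine; found nothing unusual.
- Linux kernel parameter reference
- linux-2.6.32-279.el6/include/linux/icmp.h
  - #define ICMP_ECHOREPLY 0: echo reply
  - #define ICMP_ECHO 8: echo request
  - struct icmphdr: the ICMP header structure.
- linux-2.6.32-279.el6/net/ipv4/icmp.c implements ICMP handling
  - icmp_rcv handles incoming ICMP packets by calling icmp_pointers[icmph->type].handler(skb)
  - icmp_pointers is an array of handler structures that runs a different handler for each ICMP type:
    ICMP_ECHOREPLY calls icmp_discard, an empty function that does nothing
    ICMP_ECHO calls icmp_echo, the function that handles echo requests
  - icmp_echo first checks whether all ICMP echo requests are to be ignored; if not, it fills in the ICMP parameters and calls icmp_reply(&icmp_param, skb);
- The rough network path I see so far:
  - ip_input.c: ip_local_deliver => ip_local_deliver_finish => ipprot->handler()
  - af_inet.c defines icmp_protocol, a struct net_protocol whose handler pointer points to icmp_rcv
  - inet_init in af_inet.c => inet_add_protocol(&icmp_protocol, IPPROTO_ICMP)
  - inet_add_protocol in protocol.c adds icmp_protocol to the inet_protos array
  - the ipprot pointer in ip_input.c above points into the inet_protos array
  - Apart from that, I only saw ip_mr_input in ipmr.c calling ip_local_deliver, and ip_mr_input is only
    called from route.c.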
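- The name ping comes from sonar pings; see "TCP/IP Illustrated, Volume 2: The Implementation", page 252
- Summary:
  - Machine A: ping is a user-space program that builds the ICMP packet itself and sends it to machine B through the kernel network stack.
  - Machine B: the kernel network stack receives the packet, sees that it is ICMP, and hands it to icmp_rcv in net/ipv4/icmp.c,
    which builds the ICMP reply packet directly and sends it straight back to machine A
  - So the whole ping exchange runs between user space and kernel space: the request is sent from user space, and the reply is produced in kernel space
  - The kernel's ICMP layer only handles ICMP requests, not "ICMP reply packets", so the kernel on the machine running ping does not process ECHOREPLY packets itself (the ping program receives them through its raw socket)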
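What the echo path amounts to at the packet level, sketched in user space: change type 8 to 0, keep the identifier, sequence number and payload, and recompute the RFC 1071 Internet checksum. This is not the kernel's icmp_echo()/icmp_reply() code; `inet_checksum` and `make_echo_reply` are hypothetical helpers that work on a bare ICMP header plus payload, without the IP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICMP_ECHOREPLY 0
#define ICMP_ECHO      8

/* RFC 1071 Internet checksum over len bytes, summed as big-endian
 * 16-bit words; the result is stored high byte first. */
static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;          /* pad an odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries (one's complement) */
    return (uint16_t)~sum;
}

/* Turn an echo request into an echo reply in place: type 8 -> 0,
 * identifier/sequence/payload unchanged, checksum recomputed. */
static int make_echo_reply(uint8_t *pkt, size_t len)
{
    if (len < 8 || pkt[0] != ICMP_ECHO)
        return -1;
    pkt[0] = ICMP_ECHOREPLY;
    pkt[2] = pkt[3] = 0;                     /* checksum field is zero while summing */
    uint16_t c = inet_checksum(pkt, len);
    pkt[2] = (uint8_t)(c >> 8);
    pkt[3] = (uint8_t)(c & 0xff);
    return 0;
}
```

A receiver validates a packet by running `inet_checksum` over the whole packet, checksum field included; a correct packet sums to 0.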
LIO
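- Yesterday I reread the diary entry for 2015-01-23, which says LIO block mode does not support write-back. More precisely:
  - At that time I read the targetcli code from CentOS 7, and that version does support write-back. Running the
    command targetcli directly shows "targetcli shell version 2.1.fb34"
  - I also read the code on my own host (Fedora 21), and that version does not support write-back. Running targetcli --version
    shows "/usr/bin/targetcli version 2.1.fb39"
  - So a newer version removed write-back support.
  - Why remove it? Do block devices not need write-back? Is cached writing unsupported?
- ★★★Follow-up question: how does the VFS layer implement write-back, and how can a block device achieve write-back?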

2015-03-10

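Solving the disappearing eth1 on a storage machine
- Symptom: neither ifconfig nor ip addr shows eth1, and the port's link light is off.
- dmesg shows "[ 5.508843] udev: renamed network interface eth1 to rename3",
  but eth1 is detected in the lines before it. Check the udev configuration!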
  [root@localhost ~]# cat /etc/udev/rules.d/70-persistent-net.rules
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:07:ec:01:00:7c",
  ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
  [root@localhost ~]# dmesg |grep eth1
  [ 5.441580] igb 0000:04:00.0: eth1: (PCIe:2.5GT/s:Width x1)
  [ 5.441581] igb 0000:04:00.0: eth1: MAC: 00:07:ec:01:00:7a
  [ 5.441597] igb 0000:04:00.0: eth1: PBA No: Unknown
  [ 5.508843] udev: renamed network interface eth1 to rename3
- http://unix.stackexchange.com/questions/91085/udev-renaming-my-network-interface
- Check the udev rules, the ifcfg-ethXX configuration, and the MAC addresses detected at boot
  [root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 |grep HWADDR
  HWADDR=00:07:EC:01:00:79
  [root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1 |grep HWADDR
  HWADDR=0c:c4:7a:0b:a3:95
  [root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2 |grep HWADDR
  HWADDR=00:07:ec:01:00:7b
  [root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth3 |grep HWADDR
  HWADDR=00:07:ec:01:00:7c
  [root@localhost ~]# dmesg |grep MAC
  [ 5.409752] igb 0000:03:00.0: eth0: MAC: 00:07:ec:01:00:79
  [ 5.441581] igb 0000:04:00.0: eth1: MAC: 00:07:ec:01:00:7a
  [ 5.473946] igb 0000:05:00.0: eth2: MAC: 00:07:ec:01:00:7b
  [ 5.504549] igb 0000:06:00.0: eth3: MAC: 00:07:ec:01:00:7c
  The MAC configured in ifcfg-eth1 is wrong; correcting it should fix the problem.
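The comparison done by eye above, as a sketch: HWADDR in ifcfg files is often upper case while dmesg prints lower case, so the comparison ignores case. `struct nic`, `mac_equal` and `first_mismatch` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two MAC strings such as "00:07:EC:01:00:79", ignoring case. */
static int mac_equal(const char *a, const char *b)
{
    size_t i;
    for (i = 0; a[i] != '\0' && b[i] != '\0'; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return a[i] == '\0' && b[i] == '\0';   /* equal only if both ended together */
}

struct nic {
    const char *name;        /* e.g. "eth1" */
    const char *ifcfg_mac;   /* HWADDR from ifcfg-ethX */
    const char *boot_mac;    /* MAC printed by the driver in dmesg */
};

/* Index of the first NIC whose ifcfg HWADDR differs from the MAC seen at boot, or -1. */
static int first_mismatch(const struct nic *nics, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!mac_equal(nics[i].ifcfg_mac, nics[i].boot_mac))
            return (int)i;
    return -1;
}
```

Fed with the eth0 to eth2 values listed above, `first_mismatch` points at eth1, whose HWADDR 0c:c4:7a:0b:a3:95 differs from the 00:07:ec:01:00:7a reported by igb.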
Handling the MSM IP address display problem:
- Run regedit, search for 192.168.23.107, and change it.
  - Path: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\JavaSoft\Prefs\com\/L/S/I\/Vivaldi\71244
  - Name: PERSISTENT_REM_FW_IP
  - reg query HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\JavaSoft\Prefs\com\/L/S/I\/Vivaldi\71244 /s
  - reg query HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\JavaSoft\Prefs\com\/L/S/I\/Vivaldi\71244 /v
    /P/E/R/S/I/S/T/E/N/T_/R/E/M_/F/W_/I/P
  - reg add HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\JavaSoft\Prefs\com\/L/S/I\/Vivaldi\71244 /v
    /P/E/R/S/I/S/T/E/N/T_/R/E/M_/F/W_/I/P /t REG_SZ /d "" /f
- If everything runs on this machine, changing it to 127.0.0.1 is enough.
- If the value (Data) is cleared, MSM picks up the correct local IP
- Why do some machines have this IP recorded in the registry, yet MSM still picks up the updated IP address?

2015-03-09

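Handling the MSM IP address display problem:
- The customer reported more affected machines; asked the customer to provide remote access (TeamViewer)
- Testing confirmed the problem: MSM shows the IP 192.168.23.107
- There are 4 network interfaces: 192.168.0.151 is enabled, and the other 3 are disabled with no cable connected
- Enabled the 3 disabled interfaces, set static IPs 172.16.110.5/6/7, and connected cables
- MSM still shows 192.168.23.107
- Could MSM store this IP somewhere? Searched the installation directory for the content "192.168.23.107": no matching file...
  The IP may not be stored as text, in which case a text search cannot find it.
- The MSM version on site is 8.17.200
- Our company's server has version 11.08.02.07; copied it over to see whether the software was at fault. The new version works:
  it picks up the machine's correct IP, i.e. the displayed IP follows the machine.
- So what is the problem? A version issue? Then why do only some machines have the problem?
- Copied the problematic software from the site and ran it locally: no problem. Searching for the string "192.168.23.107" found it only in
  ./Framework/aaskfdjvuosd.dhdkhsc, which seems fine too.
MegaRAID Storage Manager (MSM) Windows software: download here
- Latest version 14.11.01.00
Disabling IPv6 tunnel adapters with the netsh command
- netsh interface teredo set state disabled
- netsh interface 6to4 set state disabled
- netsh interface isatap set state disabled
- Disabling the 6to4 tunnel in Windows 7
Setting IOMode and Sector in IET
- ietadm --op new --tid=%d --lun=%d --params Path=/dev/%s/%s,IOMode=%s,Sector=%s,Type=%s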
  - IOMode: wb or wt or ro
  - Sector: 512 or 4096
  - Type: fileio or blockio
- ietadm.c: main() -> lunit_handle() -> ietd_request() -> ietd_connect(),ietd_request_send(),ietd_response_recv()
  - ietd_connect() -> socket(AF_LOCAL, SOCK_STREAM, 0), connect()
  - ietd_request_send() -> write(fd, req, sizeof(*req))
  - ietd_request_recv() -> readv(), read()
- ietd.c: main() -> message.c: ietadm_request_listen() ->
  - ietadm_request_listen() -> socket(AF_LOCAL, SOCK_STREAM, 0), bind(), listen()
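A self-contained sketch of the request/response pattern over an AF_LOCAL (AF_UNIX) stream socket. The real ietadm and ietd use socket()+connect() and bind()+listen() on a named address; to keep the example runnable in one process it uses socketpair() instead. `struct req`, `struct rsp` and the op value are hypothetical, not IET's message layout.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size messages; IET defines its own structures. */
struct req { int op; int tid; int lun; };
struct rsp { int rc; };

/* Read exactly len bytes; a stream socket may return fewer per read(). */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;              /* error or peer closed */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Client side, in the style of ietd_request_send(): write(fd, req, sizeof(*req)). */
static int send_request(int fd, const struct req *req)
{
    return write(fd, req, sizeof(*req)) == (ssize_t)sizeof(*req) ? 0 : -1;
}

/* Client side: wait for the daemon's answer. */
static int recv_response(int fd, struct rsp *rsp)
{
    return read_full(fd, rsp, sizeof(*rsp));
}

/* Daemon side: handle one request; op 1 stands for a hypothetical "new LUN". */
static int serve_one(int fd)
{
    struct req r;
    if (read_full(fd, &r, sizeof r) != 0)
        return -1;
    struct rsp s = { r.op == 1 ? 0 : -1 };
    return write(fd, &s, sizeof s) == (ssize_t)sizeof s ? 0 : -1;
}
```

Because SOCK_STREAM preserves no message boundaries, `read_full` loops until the whole fixed-size struct has arrived.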

2015-03-06

Reading the IET source code, iscsitarget-1.4.20
- grep -rin iomode ./
- opt_ignore, "iomode=%s"
- Every opt_ignore case just breaks, i.e. the setting is ignored
- ietadm --op new --tid= --lun= --params
  - calls ietadm.c:lunit_handle ->ietd_request ->ietd_request_send
  - event_loop in ietd.c sees the message and runs message.c:ietadm_request_handle
  - message.c:ietadm_request_handle -> ietadm_request_exec;
    for type C_LUNIT_NEW it runs cops->lunit_add
  - plain.c:plain_lunit_create ->__plain_lunit_create -> ki->lunit_create
  - ctldev.c:iscsi_lunit_create -> ioctl(ctrl_fd, ADD_VOLUME, &info)
  - config.c:add_volume -> volume.c:volume_add -> volume.c:parse_volume_params
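  - volume.c:parse_volume_params finds an opt_iomode option:
    for ro it runs SetLUReadonly(volume);
    for wb it runs SetLUWCache(volume);
    for wt the type is treated as unavailable
  - SetLUReadonly and SetLUWCache are both defined in iscsi.h and only change the flags field of struct iet_volume.
  - The block size (blocksize) is stored in blk_shift of struct iet_volume, and must be a power of two
    between 512 and 4096. Note that the exponent is stored: for 1024, blk_shift is 10
  - For fileio it enters file-io.c:fileio_attach, which calls parse_fileio_params; many parameter values are ignored there.
    Only the path parameter is processed.
  - For blockio it enters block-io.c:blockio_attach, which calls parse_blockio_params; many parameter values are ignored there.
    Only the path parameter is processed.
  - file-io.c and block-io.c deserve a careful read to see how the logical volume is actually created.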
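The blk_shift rule as a sketch. `blocksize_to_shift` is a hypothetical helper, not IET's code, and it treats 4096 as allowed because IET accepts Sector=4096 above.

```c
#include <assert.h>

/* Convert a block size to the exponent stored in blk_shift, enforcing
 * "power of two between 512 and 4096". Returns 9..12, or -1 if rejected. */
static int blocksize_to_shift(unsigned int blocksize)
{
    if (blocksize < 512 || blocksize > 4096)
        return -1;
    if ((blocksize & (blocksize - 1)) != 0)
        return -1;                     /* not a power of two */
    int shift = 0;
    while ((1u << shift) < blocksize)
        shift++;
    return shift;                      /* blocksize == 1u << shift */
}
```

Storing the shift lets the driver turn byte offsets into block numbers with `offset >> blk_shift` instead of a division.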
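Open questions for LIO:
- How to set the IO mode, write-through or buffered write: wb or wt (write-back or write-through)
- How to set the sector size? IET can set it to 4096
- And how does IET do it?
Two network ports on a storage server have their green light off, and those ports dropped from gigabit to 100 Mb
- The orange light means data is being transferred; the green light means the link is up
- ethtool eth2 confirms gigabit is supported, but the current speed is 100 Mb
- ethtool -s eth2 speed 1000 changes nothing
- Moving the gigabit port's cable to the 100 Mb port swapped the symptoms! I later learned the lab has two switches,
  one 100 Mb and one gigabit; with a cable from the 100 Mb switch, the green light goes off and the speed drops to 100 Mb...
- With a cable from the 100 Mb switch
  - dmesg: igb: eth3 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
  - ethtool eth3 shows MDI-X: on
- With a cable from the gigabit switch
  - dmesg: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
  - ethtool eth3 shows MDI-X: off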
- http://www.anxue.net/jisuanji/Linux/2013/1021/216006.html
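Sending log messages by email
- When a log message that should be emailed is produced, write it to a designated file, e.g. /tmp/mail.log
- Start a thread that loops, continuously reading the log file's contents,
- runs the command cat <log file>|mutt -s Err/Warn/Info <email address>
- then calls truncate(<log file>, 0) to clear the log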
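One pass of that loop as a sketch, assuming POSIX popen() and truncate(); `flush_mail_log` is a hypothetical helper, and `cmd` would be the mutt command line (for example `mutt -s Err admin@example.com`, with a placeholder address).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* One iteration of the mail loop: if the log file has content, pipe it to
 * cmd and truncate the file to zero length.
 * Returns 1 if something was sent, 0 if the log was empty, -1 on error. */
static int flush_mail_log(const char *log_path, const char *cmd)
{
    FILE *in = fopen(log_path, "r");
    if (in == NULL)
        return -1;

    char buf[4096];
    size_t n = fread(buf, 1, sizeof buf, in);
    if (n == 0) {
        fclose(in);
        return 0;                       /* nothing to send */
    }

    FILE *out = popen(cmd, "w");        /* cmd reads the mail body on stdin */
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    do {
        fwrite(buf, 1, n, out);
    } while ((n = fread(buf, 1, sizeof buf, in)) > 0);
    fclose(in);

    if (pclose(out) != 0)
        return -1;                      /* keep the log so it can be retried */
    return truncate(log_path, 0) == 0 ? 1 : -1;
}
```

One design caveat: a message appended between the read and truncate() is lost. If writers reopen the log for each message, renaming the file first and mailing the renamed copy avoids that race.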
How is the ietd process started?
- Look up the parent process of ietd
  [root@localhost ~]# ps axjf
  PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
  0 1 1 1 ? -1 Ss 0 0:00 /sbin/init
  1 2717 2717 2717 ? -1 Ss 0 0:00 /usr/sbin/ietd
  ietd's parent process is /sbin/init, i.e. it was started by the system init.
- Search for possible start commands
  grep -rl ietd /etc/
- Suspicious file: /etc/rc3.d/S88ustor
  - grep -rn ietd /etc/rc3.d/S88ustor turns up nothing useful
  - grep -rn iscsi /etc/rc3.d/S88ustor finds the line ucli iscsi_service --apply
- In the ucli code sys_mngt.c, --apply runs iscsi_service_apply, which executes:
  - system ("/etc/init.d/iscsi-target stop >& /dev/null");
  - system ("/etc/init.d/iscsi-target start");
- Summary:
  - The system boots into runlevel 3
  - runs the script S88ustor under /etc/rc3.d/
  - which runs ucli iscsi_service --apply
  - which runs /etc/init.d/iscsi-target start

2015-03-05

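tar -xvf 1425279081.tar.gz var/log/message extracts only that one file
Handling the MSM IP address display problem:
- Talked with the customer:
  Setup: the storage machine is connected directly to a switch.
  Problem: the customer changed the host's IP address directly, but the IP address shown when MSM is opened did not follow the host IP.
  The customer's affected machine has already shipped, so there is no way to inspect it!
  The customer said Beihai reproduced the problem, but when I confirmed with Beihai by phone, MSM there does show the IP following the host IP.
- MSM should display the IP the same way the ipconfig command does, so if the host IP is changed and ipconfig
  confirms the change, reopening MSM should show the new IP address.
  Next time this happens, check with ipconfig.
- Here, MSM running on 172.16.110.121 shows the IP 192.168.56.1, and ipconfig shows that 56.1
  is the IP of "Ethernet adapter VirtualBox Host-Only Network". So MSM picks the IP in a different order from
  ipconfig, but at least the IP it shows does appear in ipconfig. If Beihai set an IP, the customer then changed
  a different interface's IP, and MSM happens to pick the IP Beihai set, the customer would see exactly this odd behavior.
- Also, remember to restart MSM after changing the IP.
- With a static IP and no cable connected, MSM still shows the previously configured static IP
- With DHCP and no cable connected, MSM shows 127.0.0.1
- So the suspicion on the customer's side is that the IP they set is not the one Beihai used. There are four network ports in total, and the customer has blocked the unused IPs at their network center.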
Analyzing the ietd process exit problem from the diagnostic logs:
- diagnosis/sys/dmesg contains the following:
  proc_mkdir net/iet
  iscsi_trgt: Registered io type fileio
  iscsi_trgt: Registered io type blockio
  iscsi_trgt: Registered io type nullio
  iscsi_trgt: open_path(140) Can't open /dev/001/v1 -2
  iscsi_trgt: fileio_attach(240) -2
- The warnings above have this format:
  - iscsi_trgt is the message prefix defined in iscsi_dbg.h,
  - open_path is the function name,
  - 140 is the line number,
  - followed by the warning text and finally the error code
- The rough call path is:
  - usr/ietd.c: main; when ietd starts it runs cops->init(config, &isns, &isns_ac), which actually calls plain_init
  - usr/plain.c: plain_init calls plain_target_init, which calls plain_target_create and plain_lunit_create;
    __plain_lunit_create calls the kernel interface's lunit_create, which is actually the function iscsi_lunit_create
  - usr/ctldev.c: iscsi_lunit_create calls into the kernel with ioctl(ctrl_fd, ADD_VOLUME, &info)
  - kernel/config.c: ioctl calls add_volume, and add_volume runs volume_add
  - kernel/volume.c: volume_add calls attach; if the volume type is fileio, this goes to file-io.c
  - kernel/file-io.c: fileio_attach calls parse_fileio_params, which in turn calls open_path;
    open_path finds that the returned pointer is an error (IS_ERR) and runs eprintk("Can't open %s %d\n", path, err)
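- Understanding the IS_ERR macro: see "IS_ERR in the Linux kernel"
  - IS_ERR checks whether a pointer value falls in the last page of the kernel address space, which is reserved for encoded error codes
- That explains why the warnings appear, but from the code, failing to open these devices does not make the process exit, so more analysis is needed.
- There are also these messages:
  md: md0 stopped.
  md: bind
  md: bind
  - The md module lives in the Linux kernel source under linux-x.y.z/drivers/md
  - Searching:
    [root@localhost md]# pwd
    /home/dennis/work/kernel/linux-2.6.32-279.el6/drivers/md
    [root@localhost md]# grep -rn "md: bind" ./
    ./md.c:2056: printk(KERN_INFO "md: bind<%s>\n", b);
    [root@localhost md]# grep -rn "md: .* stop" ./
    ./md.c:5162: printk(KERN_INFO "md: %s stopped.\n", mdname(mddev));
  - md0 stopped may mean the kernel received an ioctl command such as STOP_ARRAY. There may be other causes,
    but I'm not familiar with md.c, so I'll leave it for now.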
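- Later a line "iSCSI Target - version 1.2.1" is printed, which shows that ietd was started again. But how do I confirm that
  ietd exited abnormally before that? Just before this version line there is:
  "iscsi_trgt: Removing all connections, sessions and targets"
  ../kernel/target.c:318: iprintk("Removing all connections, sessions and targets\n");
  release in config.c calls target_del_all in target.c,
  but I can't find any caller of release in config.c. Strange! (It may have no direct caller: a function named release is
  usually registered as the .release callback of a struct file_operations, which the kernel calls when the device file is closed.)
- Check the log file diagnosis/sys/log/ucli.log
  - The system has been running since [2011/04/06-16:56:13]; the last log entry is dated [2015/03/04-15:25:09]
  - Searching all iSCSI-related entries, the only ones are the creation of logical volumes v1 to v14 in dg=001 on 2011/08/04,
    plus set iscsi disable and set iscsi enable at [2015/03/02-17:13:40]
  - Note the current log package name, diagnosis-1425453916.tar.gz; decoding the timestamp with
    date -d @1425453916 gives Wed Mar 4 15:25:16 CST 2015, i.e. the logs were captured yesterday afternoon.
    So the ietd version message appears twice because someone manually disabled and re-enabled the iSCSI service at 5 pm on Monday, March 2.
- The email from 2015/03/03 says: "Two problems found: on one machine the iSCSI service exited by itself yesterday afternoon; on the other, the disk group can't be found."
  - "Yesterday afternoon" means 2015/03/02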
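The IS_ERR idiom behind the open_path warning, imitated in user space as a sketch. The kernel versions live in include/linux/err.h; these copies are simplified, and `open_path_stub` is hypothetical. The -2 in the log is -ENOENT (no such file or directory), consistent with /dev/001/v1 not existing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The top MAX_ERRNO addresses are never valid pointers, so a negative
 * errno can travel through a pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for open_path(): fails with -ENOENT like the log above. */
static void *open_path_stub(const char *path)
{
    (void)path;
    return ERR_PTR(-ENOENT);
}
```

A caller then does `if (IS_ERR(p)) err = PTR_ERR(p);`, which is how open_path ends up printing -2.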
Replied by email about the MSM Host IP address problem; there is no storage machine here to verify it, but I submitted a possible solution!

2015-03-04

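The customer reported that the IETD process exited, which affects the use of iSCSI volumes on the storage server. But what usually
makes the ietd process exit? A system fault? Did the program exit by itself, or was it killed by the system? Perhaps only the
relevant system logs can answer that!
Verified the MSM Host IP problem: after changing it and restarting, the IP has to be entered again.
David's Alibaba EMC machine can't run ActiveMQ because it requires root privileges.
Building net-snmp on Shenwei
- /root/rpmbuild/SOURCES/net-snmp-5.7.2
for index in $(seq 1 10); do ucli vd_iscsi -C -d r5w -v vd_test${index} -s 102400; done
- This fails with "fail(invalid parameter)": the name vd_test can't be used; changing it to vdtest works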

2015-03-03

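In the Shenwei web GUI, the network interfaces shown under "Settings > Network settings" and "Settings properties" don't match. For example, "Network settings"
has an IP set only for eth1, but "Settings properties" shows "IP address [interface 0]" where it should show "IP address [interface 1]".
- The settings page source is /opt/html/option/option_pro.php; it calls getNicInfo to get the network information,
  which really just runs ucli network --conf to get the IP information
- But the displayed interface label comes from the code "._OPTIONS_NETWORK_INTERFACE.$i.", which uses the index variable i.
  So whenever only one port other than eth0 is configured, it shows "IP address [interface 0]"; this has probably always been the case.
  It is not a problem as long as "interface 0" is not equated with "eth0".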
LIO
- [root@localhost ucli]# ./ucli vd_iscsi -l
  lvm.c, 788, get_all_lv, 65544
  dg_name vd_name vd_size type create_date sector target_name
  dg2 lv2 10240M iscsi 1970/01/01-08:00:00 iqn.2007-10.lio.com:dg2.lv2
  dg2 lv3 20480M iscsi 2015/02/17-09:55:51 512 iqn.2007-10.lio.com:dg2.lv3
  dg2 lv1 20480M iscsi 2015/02/17-09:56:11 512 iqn.2007-10.lio.com:dg2.lv1
  dg2 iscsi 100000M iscsi 2015/02/17-18:33:39 iqn.2007-10.lio.com:dg2.iscsi
- On a Windows server (110.121), IOMeter test against one volume, 1M transfers, 0% read, 0% random: about 70 MB/s
- Completed commands: (create/delete dg, create/delete auth, list iscsi or auth)
  - ./lio -A -o iscsi -d dg2 -v lv1
  - ./lio -D -o iscsi -d dg2 -v lv1
  - ./lio -A -o auth -d dg2 -v lv1 -u dennis -p 123456
  - ./lio -D -o auth -d dg2 -v lv1 -u dennis -p 123456
  - ./lio -L -o auth -d dg2 -v lv1
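Understanding the details of SQL in-memory databases
- Does an in-memory database put the whole database in memory?
  - No!
  - It puts only specified tables in memory, not the whole database;
  - It organizes its data files on disk as file streams;
  - Its data files are split into data files and delta files, which always come in pairs;
- If all the data is in memory, isn't it lost on a crash or power failure?
  - No! The in-memory database guarantees durability in two ways: the transaction log and checkpoints.
- How is the data stored in memory? Still in pages? Is there a row size limit?
  - Not in pages; the maximum row size is 8060 bytes
- In-memory databases claim a lock-free design; how does SQL handle concurrency conflicts?
  - The in-memory database uses row versioning to handle conflicts
Career directions after learning database development
- Database application development
- Data modeling expert
- Business intelligence expert
- ETL development
- Data architect
- Database administrator
- Data warehouse expert
- Performance tuning engineer
- Senior database administrator
A complete memory pool implementation and some thoughts
In-memory database kernel development: how in-memory indexes work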

2015-03-02

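C++ application performance optimization, Chapter 6: Memory pools
LIO
- 130.200 could not start the target service, but when I tried again at 17:40 it started. I don't understand why!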
  [root@localhost ~]# service target status
  Redirecting to /bin/systemctl status target.service
  target.service - Restore LIO kernel target configuration
  Loaded: loaded (/usr/lib/systemd/system/target.service; disabled)
  Active: active (exited) since Fri 2015-03-13 15:41:16 CST; 1h 58min ago
  Process: 3317 ExecStart=/usr/bin/targetctl restore (code=exited, status=0/SUCCESS)
  Main PID: 3317 (code=exited, status=0/SUCCESS)
  CGroup: /system.slice/target.service
  Mar 13 15:41:16 localhost.localdomain systemd[1]: Starting Restore LIO kernel target configuration…
  Mar 13 15:41:16 localhost.localdomain systemd[1]: Started Restore LIO kernel target configuration.
  Mar 13 15:41:29 localhost.localdomain systemd[1]: Started Restore LIO kernel target configuration.
  Mar 13 17:39:38 localhost.localdomain systemd[1]: Started Restore LIO kernel target configuration.
- From the output above, the service had been active since 15:41:16

2015-02-28

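Nmap scanning: principles and usage
SNMP problem on the Shenwei machine
- The MIB download fails because there is no mib.tar file in /etc/cf/conf/
- Our company's MIB library source has to be added and the package rebuilt to get information on disk groups, individual disks and the system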

2015-02-27

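LIO needs a plan; expected to take two weeks
Year-end bonus of at least 2 months
Continued investigating the Shenwei network problem, with no result.
First day back at work after the Chinese New Year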

2015-02-17

The Nim language is attracting attention
Last working day of the year; no year-end bonus before the holiday, so I'm just coasting!!!

2015-02-16

Shenwei machine: it can be reached from outside again, but it sometimes stalls, and inexplicably
130.103 is no longer reachable; it can only be accessed via 130.106.
- eth0: 172.16.130.106
- eth1: 172.16.130.103
- ping 172.16.50.39 -I eth0 succeeds
- ping 172.16.50.39 -I eth1 fails
- [root@yys ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
  172.16.130.0 * 255.255.255.0 U 0 0 0 eth1
  172.16.130.0 * 255.255.255.0 U 0 0 0 eth0
- The routes should be fine, because they looked exactly like this earlier when both IPs were reachable.
- [root@yys ~]# ping 172.16.130.107 -I eth0 fails
  [root@yys ~]# ping 172.16.130.107 -I eth1 succeeds
  [root@yys ~]# ping 172.16.50.39 -I eth0 succeeds
  [root@yys ~]# ping 172.16.50.39 -I eth1 fails
  [root@yys ~]# arp -a
  ? (172.16.130.107) at 00:24:ec:20:01:61 [ether] on eth0
  ? (172.16.130.107) at 00:24:ec:20:01:61 [ether] on eth1
  ? (172.16.130.1) at 00:0f:e2:b1:c7:5d [ether] on eth0
  ? (172.16.130.1) at 00:0f:e2:b1:c7:5d [ether] on eth1
  [root@yys ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
  172.16.130.0 * 255.255.255.0 U 0 0 0 eth1
  172.16.130.0 * 255.255.255.0 U 0 0 0 eth0
- Why does this happen? 103 can ping 107 and 106 can ping 50.39, but everything else fails.
- ifconfig shows the two have different Bcast values: 172.16.130.255 for eth0 (106) but 0.0.0.0 for eth1 (103).
  ifconfig eth1 172.16.130.103 netmask 255.255.255.0 makes them the same.
  route add default gw 172.16.130.1 eth1 adds a default gateway.
  Now 50.39 can't reach 130.106 or 130.103 at all; it can only reach 130.107 first and then go from 130.107 to 130.106 or 130.103
- [root@yys ~]# who
  root pts/0 2015-02-16 15:16 (172.16.130.105)
  root pts/1 2015-02-16 12:40 (172.16.130.109)
  root pts/2 2015-02-16 15:46 (172.16.130.109)
  root pts/3 2015-02-16 15:49 (172.16.130.109)
  root pts/4 2015-02-16 15:59 (172.16.50.39)
  root pts/5 2015-02-16 16:15 (172.16.50.108)
  root pts/6 2015-02-16 16:21 (172.16.130.109)
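- pkill -9 -t pts/X leaves only the current connection
- ip addr and ifconfig show nothing wrong, but route shows eth1's default gateway listed first. Is that related?
  route del default gw 172.16.130.1 eth0
  route add default gw 172.16.130.1 eth0
  Deleting and re-adding makes sure eth0's default gateway comes first. Now 50.39 can ping 130.106, but 103 is still unreachable.
- Sigh, no solution!!!
Two ways to add a static route in Linux, and how to delete one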
[root@yys ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
default 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
172.16.0.0 * 255.255.0.0 U 0 0 0 eth0
172.16.130.0 * 255.255.255.0 U 1 0 0 eth1
[root@yys ~]# route del -net 172.16.0.0 netmask 255.255.0.0 dev eth0
wireshark filter:
- Filter packets by an IP address range: ip.src >172.16.130.1&&ip.src<172.16.130.250
- http://www.yacer.cn/jishu/doc-128.html
Building tcpdump on the Shenwei platform (an Alpha-based platform)
- [root@yys tcpdump-4.6.2]# history |grep configure finds the configure command used before;
  it used the --build=sw_64-unknown-linux-gnu parameter, so try that.
- [root@yys tcpdump-4.6.2]# ./configure --build=sw_64-unknown-linux-gnu
  - configure: error: see the INSTALL doc for more info
  - INSTALL.txt says it also fails without libpcap installed
- [root@yys libpcap-1.6.2]# ./configure --build=sw_64-unknown-linux-gnu
- [root@yys libpcap-1.6.2]# make
- [root@yys libpcap-1.6.2]# make install
- [root@yys tcpdump-4.6.2]# ./configure --build=sw_64-unknown-linux-gnu succeeds
- [root@yys tcpdump-4.6.2]# make
- 测试
  - Local machine 50.39: [dennis@localhost ~]$ ping 172.16.130.106
  - Problem machine 130.106: [root@yys tcpdump-4.6.2]# ./tcpdump -n -e -i eth0 icmp
    09:43:38.260219 00:0f:e2:b1:c7:5d > 00:aa:00:64:10:74, ethertype IPv4 (0x0800),
    length 98: 172.16.50.39 > 172.16.130.106: ICMP echo request, id 938, seq 1, length 64
    09:43:39.259572 00:0f:e2:b1:c7:5d > 00:aa:00:64:10:74, ethertype IPv4 (0x0800),
    length 98: 172.16.50.39 > 172.16.130.106: ICMP echo request, id 938, seq 2, length 64
    09:43:40.259551 00:0f:e2:b1:c7:5d > 00:aa:00:64:10:74, ethertype IPv4 (0x0800),
    length 98: 172.16.50.39 > 172.16.130.106: ICMP echo request, id 938, seq 3, length 64
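  - Finding: only requests arrive and no reply is sent. Why?
- If only requests show up, do the following two checks to see whether ping is being blocked.
  - 1. Check sysctl -a | grep icmp_echo and confirm net.ipv4.icmp_echo_ignore_all=0
  - 2. Check iptables -vL and confirm -p icmp is ACCEPT
  - Neither check shows a problem, yet only requests arrive and no reply goes out!!!
  - Use tcpdump to confirm whether the Linux system receives and answers ping packets
- On reflection these checks were unnecessary: 130.107 could already ssh into this machine
  and could ping it as well, so a ping-blocking configuration would have blocked 130.107 too.
  The one thing that could be missed is an iptables rule that happens to let 130.107 through.
- Get MACs via NetBIOS: using code written earlier, queried the MAC addresses in the 172.16.130.1/24 range; no duplicates found.
  - nmap -p139 192.168.50.1/24 | grep -B 3 'open' | grep 'scan' | awk '{print $NF}' | xargs -n1 ./netbios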

2015-02-15

tcpdump fails to build on the Sunway platform because it is Alpha-based; ./configure errors out
- checking build system type... ./config.guess: unable to guess system type
- configure: error: cannot guess build type; you must specify one
- [root@yys ~]# uname -a
  Linux yys 3.8.0-sw2f #1 SMP Fri Jan 16 08:37:13 CST 2015 sw_64 sw_64 sw_64 GNU/Linux
- Probably some option has to be passed to ./configure.
- tcpdump source code download page
Kicking other logged-in SSH users off a Linux machine
- who
- who am i
- pkill -kill -t pts/1
- pkill -9 -t pts/1
Problem: after configuring two IPs in normal mode through the web GUI, the machine is unreachable
- ssh login to 130.106 and confirm the IP (130.107) was set successfully
- IP information
  [root@yys ~]# ip addr
  1: lo: mtu 65536 qdisc noqueue state UNKNOWN
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
      link/ether 00:aa:00:64:10:74 brd ff:ff:ff:ff:ff:ff
      inet 172.16.130.106/24 scope global eth0
      inet6 fe80::2aa:ff:fe64:1074/64 scope link
         valid_lft forever preferred_lft forever
  3: eth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
      link/ether 00:aa:00:64:10:75 brd ff:ff:ff:ff:ff:ff
      inet 172.16.130.87/16 brd 172.16.255.255 scope global eth1
      inet6 fe80::2aa:ff:fe64:1075/64 scope link
         valid_lft forever preferred_lft forever
  4: eth2: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether 00:aa:00:64:10:76 brd ff:ff:ff:ff:ff:ff
  5: eth3: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether 00:aa:00:64:10:77 brd ff:ff:ff:ff:ff:ff
- From my machine 50.39, neither IP above (130.106, 130.87) answers ping
- From 130.107, another machine on the same switch, both IPs (130.106, 130.87) answer ping
- Routing table
  [root@yys ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
  172.16.0.0 255.255.0.0 U 0 0 0 eth1
  172.16.130.0 255.255.255.0 U 0 0 0 eth0
- The routing table above shows eth0 has a default route and eth1 does not
- Routing table of another machine whose connectivity is normal:
  [root@DCN ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  172.16.130.0 255.255.255.0 U 0 0 0 eth1
  172.16.130.0 255.255.255.0 U 0 0 0 eth0
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
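- Read up online on how IP routing works
  - Linux routing table explained
- After running arp -a the routing table looks like this; the local machine (50.39) still cannot ping.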
  [root@yys ~]# route
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  default 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
  172.16.130.0 255.255.255.0 U 0 0 0 eth0
  172.16.130.0 255.255.255.0 U 1 0 0 eth1
- [root@yys ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
  DEVICE="eth0"
  BOOTPROTO=none
  NM_CONTROLLED="yes"
  ONBOOT="no"
  TYPE="Ethernet"
  UUID="3e138003-0cbc-4bc9-b1ca-9f96cc905ebc"
  IPADDR=172.16.110.105
  PREFIX=24
  GATEWAY=172.16.110.1
  DEFROUTE=yes
  IPV4_FAILURE_FATAL=yes
  IPV6INIT=no
  NAME="System eth0"
  HWADDR=00:AA:00:64:10:74
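- Could tcpdump show whether ping (ICMP) packets from the outside IP (50.39) reach 130.106?
  - But tcpdump currently cannot be built on 130.106, which is an Alpha-architecture machine.
  - Although 130.106 cannot ping 50.39, the other direction works (50.39 can ping 130.106).
  - How to block ping: echo 'net.ipv4.icmp_echo_ignore_all=1' >> /etc/sysctl.conf
    http://daddysgirl.blog.51cto.com/1598612/1185170
  - iptables settings that block other hosts' pings
  - grep -i icmp /etc/sysctl.conf: no ping-blocking setting found
  - iptables -L |grep -i icmp: no ping-blocking rule found
- Reference links
- Querying routes
  - route -n
  - netstat -rn
- Tried a reboot: after restarting, connectivity is back and the web pages load normally too.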
- [root@yys ~]# route -n
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric Ref Use Iface
  0.0.0.0 172.16.130.1 0.0.0.0 UG 0 0 0 eth0
  0.0.0.0 172.16.130.1 0.0.0.0 UG 0 0 0 eth1
  169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 eth1
  172.16.130.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
  172.16.130.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
- The routing table has changed
- After setting the eth1 IP through the web page, access breaks again... Is the command it invokes wrong? Pinging
  50.39 fails as well
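Most of the symptoms above depend on which entry wins when the table holds two `default` routes (or two 172.16.130.0/24 routes) with the same metric. The sketch below is a simplified model, not the kernel's FIB code: it assumes longest prefix first, then lowest metric, then the earlier table entry. `struct route`, `pick_route` and `ip4` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified routing-table entry (host byte order). */
struct route {
    uint32_t dest;      /* destination network, e.g. 172.16.130.0 */
    uint32_t mask;      /* netmask; 0 for the default route */
    int metric;         /* lower is preferred */
    const char *iface;  /* outgoing interface */
};

/* Number of leading one bits in a contiguous netmask. */
static int prefix_len(uint32_t mask)
{
    int n = 0;
    while (mask & 0x80000000u) {
        n++;
        mask <<= 1;
    }
    return n;
}

/* Index of the chosen route for dst, or -1 if nothing matches.
 * Simplified model: longest prefix wins, then the lowest metric,
 * and on a full tie the entry listed first stays selected. */
static int pick_route(const struct route *t, size_t n, uint32_t dst)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if ((dst & t[i].mask) != t[i].dest)
            continue;
        if (best < 0) {
            best = (int)i;
            continue;
        }
        int pl = prefix_len(t[i].mask);
        int bl = prefix_len(t[best].mask);
        if (pl > bl || (pl == bl && t[i].metric < t[best].metric))
            best = (int)i;
    }
    return best;
}

/* Build a host-order IPv4 address from dotted-quad parts. */
static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}
```

With the post-reboot table (eth0 default, eth1 default, 169.254/16, then 172.16.130.0/24 on eth1 and on eth0), this model sends traffic for 172.16.50.39 out the first default entry (eth0) and traffic for 172.16.130.x out whichever /24 entry is listed first (eth1). That is why the order of duplicate entries can decide which interface replies leave from.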
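Problem: iSCSI does not support the "SCSI-3 Persistent Reservation" feature
- Persistent Reservation means a reservation that persists
- Windows Server failover clustering
  - One of the storage tests during cluster validation checks this feature. If a storage solution does not support SCSI-3 PR commands,
    failover clustering will not support that storage either.
- The current LIO release supports SCSI-3 Persistent Reservation
- IET and STGT do not, http://linux-iscsi.org/wiki/Features
- See the 2008 IET mailing-list thread http://sourceforge.net/p/iscsitarget/mailman/message/19984639/
  - Ross said it would be supported in the near future
  - Ming Zhang said it was not supported and not planned yet.
- The mailing lists can also be searched for Persistent Reservation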
  - http://sourceforge.net/p/iscsitarget/mailman/search/?q=Persistent+Reservation&mail_list=iscsitarget-devel
  - http://sourceforge.net/p/iscsitarget/mailman/search/?q=Persistent+Reservation&mail_list=iscsitarget-commits
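  - Searching the commits list shows PR support only arrived in 2011
  - http://sourceforge.net/p/iscsitarget/mailman/message/28022461/ shows the patch was contributed on 2011-08-31.
  - In the latest source, svn rev 503, kernel/persis.{h,c} also carries a 2011 copyright
  - So Persistent Reservation was probably added to IET in 2011
- Now check the official IET releases
  - http://sourceforge.net/projects/iscsitarget/files/iscsitarget/ the latest release, 1.4.20.2, is dated 2010-07-15,
    so the latest release does not support Persistent Reservation.
- SCSI-3 PR is a set of SCSI commands used to coordinate multiple systems accessing one shared storage device,
  and it is not compatible with the older command set. Windows 2008 uses SCSI-3 PR, while Windows 2003 uses SCSI-2.
  Errors occur when Windows 2003 accesses SCSI-3 shared storage; MS KB911030 explains this bug
  - http://blog.sina.com.cn/s/blog_537d1d300100lka6.html
- SCSI locks
  - Generally there are two kinds of SCSI locks: SCSI-2 Reservation and SCSI-3 Reservation, where
    SCSI-3 Reservation is also called Persistent Reservation. The two kinds cannot coexist on one LUN.
- SCSI access control principles
- Storage SCSI locks explained: Windows Cluster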

2015-02-13

Reading the NIC speed in C on Linux
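A minimal sketch of the `ioctl(fd, SIOCETHTOOL, &ifr)` approach that comes up in the net-snmp investigation below, assuming kernel headers recent enough to provide `ethtool_cmd_speed()`. It uses the legacy `ETHTOOL_GSET` request, which is deprecated but still answered by current kernels. `get_if_speed` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Return the link speed of ifname in Mb/s, 0 if the driver reports it as
 * unknown (0xffffffff, e.g. no link), or -1 if the query fails. */
static long get_if_speed(const char *ifname)
{
    struct ethtool_cmd ecmd;
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for ioctl */
    if (fd < 0)
        return -1;

    memset(&ecmd, 0, sizeof ecmd);
    memset(&ifr, 0, sizeof ifr);
    ecmd.cmd = ETHTOOL_GSET;                   /* legacy "get settings" request */
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&ecmd;

    int rc = ioctl(fd, SIOCETHTOOL, &ifr);
    close(fd);
    if (rc < 0)
        return -1;

    uint32_t speed = ethtool_cmd_speed(&ecmd); /* joins speed and speed_hi */
    return speed == 0xffffffffu ? 0 : (long)speed;
}
```

On the storage server below, get_if_speed("eth2") should then agree with ethtool's 10000Mb/s, since both read the same driver data.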
Why does the customer see a speed of 10 (Mb/s) for a 10GbE NIC?
- [dennis@localhost net-snmp-5.4.4]$ grep -rn --include=*.{h,c} ifSpeed ./
  finds a suspicious file ./agent/mibgroup/if-mib/ifXTable/ifXTable.c:1271:
  (*ifHighSpeed_val_ptr) = rowreq_ctx->data.ifSpeed / 1000000;
- That C file shows the value can be fetched via the OID ".1.3.6.1.2.1.31.1.1.1.15"
- [dennis@localhost net-snmp-5.4.4]$ snmpwalk -v2c -c public 172.16.130.90 ifspeed
- [dennis@localhost net-snmp-5.4.4]$ snmpwalk -v2c -c public 172.16.130.90 .1.3.6.1.2.1.31.1.1.1.15
  IF-MIB::ifHighSpeed.1 = Gauge32: 10
  IF-MIB::ifHighSpeed.2 = Gauge32: 0
  IF-MIB::ifHighSpeed.3 = Gauge32: 1000
  IF-MIB::ifHighSpeed.4 = Gauge32: 10
  IF-MIB::ifHighSpeed.5 = Gauge32: 10
  IF-MIB::ifHighSpeed.7 = Gauge32: 10
- [dennis@localhost net-snmp-5.4.4]$ snmpwalk -v2c -c public 172.16.130.90 ifdesc
  IF-MIB::ifDescr.1 = STRING: lo
  IF-MIB::ifDescr.2 = STRING: eth1
  IF-MIB::ifDescr.3 = STRING: eth0
  IF-MIB::ifDescr.4 = STRING: eth2
  IF-MIB::ifDescr.5 = STRING: eth3
  IF-MIB::ifDescr.7 = STRING: bond0
- eth2 and eth3 report 10
- Query the speed on the storage server itself:
  [root@localhost ~]# ethtool eth3 |grep Speed
  Speed: 10000Mb/s
  [root@localhost ~]# ethtool eth2 |grep Speed
  Speed: 10000Mb/s
  [root@localhost ~]# ethtool eth0 |grep Speed
  Speed: 1000Mb/s
- ethtool reports 10000
- So the value rowreq_ctx->data.ifSpeed is wrong. Where is it assigned?
- ./agent/mibgroup/if-mib/ifTable/ifTable_defs.h:4:#define ifSpeed ifentry->speed
  Note the macro substitution here
- Call chain:
  - ./agent/mibgroup/host/hr_network.c:221: Interface_Scan_Init();
  - ./agent/mibgroup/if-mib/data_access/interface.c:159:
    rc = netsnmp_arch_interface_container_load(container, load_flags);
  - ./agent/mibgroup/if-mib/data_access/interface_sysctl.c:489:
    netsnmp_sysctl_get_if_speed(entry->name, &entry->speed,
- Ultimately:
  - s = socket(AF_INET, SOCK_DGRAM, 0)
  - ioctl(s, SIOCGIFMEDIA, (caddr_t)&ifmr)
- It may also be netsnmp_linux_interface_get_if_speed in ./agent/mibgroup/if-mib/data_access/interface_linux.c,
  which gets the speed via ioctl(fd, SIOCETHTOOL, &ifr). Since interface_sysctl.c and SIOCGIFMEDIA belong to the BSD code path, this is the more likely one on Linux.
- [root@localhost ~]# snmpd --version
  NET-SNMP version: 5.4.2
- Does 5.4.2 lack 10GbE support? That version was released in 2008
  - http://sourceforge.net/projects/net-snmp/files/OldFiles/net-snmp-5.4.x/5.4.2/
  - netsnmp_linux_interface_get_if_speed in that version:
```
/* Any speed other than 10/100/1000 (e.g. 10000) falls back to MII */
if (edata.speed != SPEED_10 && edata.speed != SPEED_100 &&
    edata.speed != SPEED_1000) {
    DEBUGMSGTL(("mibII/interfaces", "fallback to mii for %s\n",
                ifr.ifr_name));
    /* try MII */
    return netsnmp_linux_interface_get_if_speed_mii(fd,name);
}
```
- Find a way to see the DEBUGMSGTL output of 5.4.2
  - /usr/sbin/snmpd -c /etc/snmp/snmpd.conf -Le
  - Run it like this: /usr/sbin/snmpd -c /etc/snmp/snmpd.conf -d -L -DmibII/interfaces
    mibII/interfaces: SIOCGMIIREG on eth1 failed
    mibII/interfaces: No link…
    mibII/interfaces: ETHTOOL_GSET on eth0 speed = 1000
    mibII/interfaces: fallback to mii for eth2
    mibII/interfaces: SIOCGMIIPHY on eth2 failed
    mibII/interfaces: fallback to mii for eth3
    mibII/interfaces: SIOCGMIIPHY on eth3 failed
    mibII/interfaces: ETHTOOL_GSET on bond0 failed
    mibII/interfaces: Auto-negotiation disabled.
  - Download version 5.7.3, build it, and run ./agent/snmpd -c /etc/snmp/snmpd.conf -d -L -DmibII/interfaces
    mibII/interfaces: ETHTOOL_GSET on eth1 speed = 0xffffffff -> 0
    mibII/interfaces: ETHTOOL_GSET on eth0 speed = 0x3e8 -> 1000
    mibII/interfaces: ETHTOOL_GSET on eth2 speed = 0x2710 -> 10000
    mibII/interfaces: ETHTOOL_GSET on eth3 speed = 0x2710 -> 10000
    mibII/interfaces: ETHTOOL_GSET on bond0 failed (-1 / 0)
  - [dennis@localhost net-snmp-5.4.4]$ snmpwalk -v2c -c public 172.16.130.90 ifhighspeed
  - [dennis@localhost net-snmp-5.4.4]$ snmpwalk -v2c -c public 172.16.130.90 .1.3.6.1.2.1.31.1.1.1.15
    IF-MIB::ifHighSpeed.1 = Gauge32: 10
    IF-MIB::ifHighSpeed.2 = Gauge32: 0
    IF-MIB::ifHighSpeed.3 = Gauge32: 1000
    IF-MIB::ifHighSpeed.4 = Gauge32: 10000
    IF-MIB::ifHighSpeed.5 = Gauge32: 10000
    IF-MIB::ifHighSpeed.7 = Gauge32: 10
  - So the newer version does report the 10GbE speed.
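For context, IF-MIB (RFC 2863) defines ifSpeed as a Gauge32 in bits per second, so any link faster than 4,294,967,295 bit/s must report that maximum in ifSpeed, and the real value goes in ifHighSpeed (units of 1,000,000 bit/s). The sketch below maps an ethtool speed in Mb/s to both objects. As in the 5.7.3 debug line "0xffffffff -> 0", it treats 0xffffffff as unknown. `if_mib_speeds` is a hypothetical helper, not net-snmp code.

```c
#include <stdint.h>

#define GAUGE32_MAX 4294967295u

/* Map an ethtool speed (Mb/s) to the IF-MIB ifSpeed and ifHighSpeed values.
 * ethtool reports 0xffffffff when the speed is unknown (e.g. no link). */
static void if_mib_speeds(uint32_t ethtool_mbps,
                          uint32_t *if_speed, uint32_t *if_high_speed)
{
    uint64_t mbps = (ethtool_mbps == 0xffffffffu) ? 0 : ethtool_mbps;
    uint64_t bps = mbps * 1000000u;

    /* ifSpeed is a Gauge32 in bit/s: clamp at its maximum value. */
    *if_speed = bps > GAUGE32_MAX ? GAUGE32_MAX : (uint32_t)bps;
    /* ifHighSpeed is in units of 1,000,000 bit/s, i.e. Mb/s. */
    *if_high_speed = (uint32_t)mbps;
}
```

For a 10000 Mb/s port this gives ifSpeed = 4294967295 and ifHighSpeed = 10000, which matches what 5.7.3 returns for eth2/eth3 above.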
net-snmp: getting the interface speed
- snmpwalk -v2c -c public 172.16.130.90 ifSpeed
- snmpwalk -v2c -c public 172.16.130.90 .1.3.6.1.2.1.2.2.1.5
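- ./agent/mibgroup/if-mib/ifTable/ifTable.c, function ifSpeed_get
- Using SNMP to get host network traffic statistics
Linux tar prints tar: Removing leading `/' from member names
- Not an error: tar strips the leading / from absolute paths and prints this notice
- To avoid it, cd first and archive relative names: cd /var/home && tar -zcf aa.tar.gz cc.wav dd.wav
Getting the timestamp of a directory/file in shell
- date +%s -r DIR_OR_FILE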
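`date -r` prints the file's last modification time. The same number can be obtained from C with `stat()`. This is a minimal sketch, and `file_mtime` is a hypothetical name:

```c
#include <sys/stat.h>

/* Last-modification time of path as a Unix timestamp (the value
 * `date +%s -r path` prints), or -1 if stat() fails. */
static long long file_mtime(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_mtime;
}
```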
Backing up files (directories) on a schedule with a shell script on Linux, and deleting backups and logs older than a given date
- http://xiongyingqi.com/linux/2014/01/04/linux-backup-shell.html

2015-02-12

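Finished the script that collects system information for iSCSI problems. Logs go to /tmp/diagnosis/iscsi-log, so when
system diagnosis is run from the web page, the logs can be downloaded directly.
★★★ The most awesome coding patterns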
tcpdump
- Capture traffic between host 210.27.48.1 and host 210.27.48.2 or 210.27.48.3
  tcpdump host 210.27.48.1 and \( 210.27.48.2 or 210.27.48.3 \)
- Capture all packets sent by host hostname: tcpdump -i eth0 src host hostname
- Monitor all packets sent to host hostname: tcpdump -i eth0 dst host hostname
- http://blog.h2ero.cn/wiki/tcpdump.html
- http://www.cnblogs.com/ggjucheng/archive/2012/01/14/2322659.html

2015-02-11

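tcpdump: write each packet to the file immediately by adding -U
- With -w, -U makes file writes packet-buffered: each packet is written to the file as soon as it is
  saved, instead of only when the output buffer fills up
- See http://blog.h2ero.cn/wiki/tcpdump.html
Writing the diagnosis script, based on the dmon.sh written earlier
- tcpdump -U port 3260 -w /root/tcpdump.pcap writes every packet to the file immediately
Analyzing why these messages appear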
- EXT3-fs (sda3): orphan cleanup on readonly fs
- ext3_orphan_cleanup: truncating inode 8083 to 0 bytes
- EXT3-fs (sda3): 1 truncate cleaned up
- Search
  [dennis@localhost ext3]$ pwd
  /home/dennis/work/kernel/linux-2.6.32-279.el6/fs/ext3
  [dennis@localhost ext3]$ grep -rn "truncating inode .* to .* bytes" ./
  ./super.c:1501: "%s: truncating inode %lu to %Ld bytes\n",
  ./super.c:1503: jbd_debug(2, "truncating inode %lu to %Ld bytes\n",
- Analyze the function ext3_orphan_cleanup in super.c
- Analysis of the filesystem orphan inode mechanism
  - lsof /tmp |grep delete
    gnome-she 1757 dennis 18u REG 0,35 12288 26789 /tmp/ffi8DfHM7 (deleted)
    gnome-ter 2058 dennis 18u REG 0,35 0 29398 /tmp/#29398 (deleted)
    gnome-ter 2058 dennis 20u REG 0,35 0 29400 /tmp/#29400 (deleted)
  - [dennis@localhost ext3]$ ll /tmp/ffi8DfHM7
    ls: cannot access /tmp/ffi8DfHM7: No such file or directory
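  - /tmp/ffi8DfHM7 has become an orphan file: its name is gone but gnome-she (PID 1757) still holds it open. In the lsof columns
    12288 is the file SIZE and 26789 is the NODE, i.e. the inode number; this is the orphan inode
  - An orphan inode, as the name suggests, is an "orphaned" node: a kind of inode in the Linux ext filesystem family
    that has been deleted and no longer has an owner (directory entry).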
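The "(deleted)" entries in the lsof output can be reproduced with a file that is unlinked while still open. The inode's link count drops to 0, but the data stays reachable through the descriptor until the last close. Roughly, ext3 records such inodes (and truncates in progress) on its on-disk orphan list, so that if the machine goes down first, ext3_orphan_cleanup finishes the job at the next mount. The sketch only shows the user-space side; `unlinked_but_open_demo` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

/* Create a temp file, unlink it while keeping it open, then show that the
 * data is still readable through the descriptor although the name is gone.
 * Returns the link count seen via fstat() (expected 0), or -1 on error. */
static int unlinked_but_open_demo(void)
{
    char path[] = "/tmp/orphanXXXXXX";
    const char msg[] = "still here";
    char buf[sizeof msg];
    struct stat st;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {           /* remove the only directory entry */
        close(fd);
        return -1;
    }

    /* While fd is open, lsof would list this file as "(deleted)". */
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg ||
        lseek(fd, 0, SEEK_SET) != 0 ||
        read(fd, buf, sizeof buf) != (ssize_t)sizeof buf ||
        memcmp(buf, msg, sizeof msg) != 0 ||
        fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);                         /* last reference: inode is freed now */
    return (int)st.st_nlink;
}
```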
CentOS 2.6.32-279 source code download
- http://vault.centos.org/6.3/os/Source/SPackages/
- http://vault.centos.org/6.3/os/Source/SPackages/kernel-2.6.32-279.el6.src.rpm
Using the 360 portable WiFi on Linux, verified on Fedora 21. Steps:
- Plug in the USB WiFi adapter
- lsusb
  - Bus 001 Device 007: ID 148f:760b Ralink Technology, Corp. MT7601U Wireless Adapter
- git clone https://github.com/eywalink/mt7601u.git
- yum install kernel-devel -y
- yum install dhcp -y
- vim /etc/dhcp/dhcpd.conf and add the following configuration:
  subnet 192.168.199.0 netmask 255.255.255.0 {
  range 192.168.199.10 192.168.199.20;
  option routers 192.168.199.1;
  option domain-name-servers 8.8.8.8;
  }
- cd mt7601u
- sh miwifi_build.sh
- vim miwifi_work.sh and change eth0 to the interface your machine uses for Internet access, e.g. enp2s0 in my case
  - iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
- sh miwifi_work.sh
- ifconfig to confirm ra0 is up and has been assigned an IP
- To change the SSID and password later, edit /etc/Wireless/RT2870AP/RT2870AP.dat
  - ifconfig ra0 down
  - SSID=WiFi_TEST is the WiFi name
  - WPAPSK=TEST2015 is the password
  - ifconfig ra0 up
- http://www.freemindworld.com/blog/2013/131010_360_wifi_in_linux.shtml
- http://blog.csdn.net/sumang_87/article/details/38168877
vim: ignore case for a single search, /\cYOUR_SEARCH_WORDS or /YOUR_SEARCH_WORDS\c
Continuing with the DCN issue
- Check dmesg
  - EXT3-fs (sda3): recovery required on readonly filesystem
  - EXT3-fs (sda3): write access will be enabled during recovery
  - kjournald starting. Commit interval 5 seconds
  - EXT3-fs (sda3): orphan cleanup on readonly fs
  - ext3_orphan_cleanup: truncating inode 8083 to 0 bytes
  - EXT3-fs (sda3): 1 truncate cleaned up
  - EXT3-fs (sda3): recovery complete
  - EXT3-fs (sda3): mounted filesystem with ordered data mode
- dmesg from a local test machine that behaves normally
  - EXT3-fs (sda3): recovery required on readonly filesystem
  - EXT3-fs (sda3): write access will be enabled during recovery
  - kjournald starting. Commit interval 5 seconds
  - EXT3-fs (sda3): recovery complete
  - EXT3-fs (sda3): mounted filesystem with ordered data mode

2015-02-10

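feedly.com, a successor to Google Reader
How the upgrade works
- Take an upgrade package and unpack it to see what is inside
- Password file for unpacking upgrade packages: /opt/scripts/common/sys_clear_all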
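About the 10GbE NIC ports repeatedly disconnecting and reconnecting
- diagnosis/logfile shows bond0 currently uses eth3's MAC and bond1 uses eth1's MAC
  - bond0, 192.168.10.10; the email says it should be 192.168.1.0. On checking, it was apparently changed to a plain IP later.
  - bond1, 10.24.2.190,
These logs were all collected while the storage connection was healthy, so nothing useful can be learned from them. How can
information be captured automatically when the storage runs into trouble?
- Add wireshark packet capture
- How were the captures brought back by technical support taken?
  - In the web UI, choose System -> System Diagnosis -> Diagnose
  - sysCheck.php call add_sys_check()
  - system_check.inc, add_sys_check() run exec_common("diagnosis $bkpath$timestamp.'tar.gz'")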
  - /opt/scripts/common/diagnosis
  - pvs >> ${dspath}/lvm/pvs 2>&1 &
  - lvs >> ${dspath}/lvm/lvs 2>&1 &
  - ps auxf >${dspath}/sys/psfile
  - netstat -an >> ${dspath}/logfile
  - ip addr >> ${dspath}/logfile
  - netstat -anp | grep 3260 >${dspath}/iscsi/iscsi_conninfo
Continue analyzing the logs of the DCN iSCSI connection issue
- diagnosis/iscsi/iscsi_conninfo: storage IP 192.168.1.101, server 192.168.1.16
- diagnosis/iscsi/initiator.allow: allows 192.168.1.16 to access iqn.2007-10.DCN.com:zhu1.1
- diagnosis/iscsi/session, current session:
  - tid:1 name:iqn.2007-10.DCN.com:zhu1.1
  - sid:562950876233792 initiator:iqn.1991-05.com.microsoft:win-779iu2vuh5u
  - cid:1 ip:192.168.1.16 state:active hd:none dd:none tip:192.168.1.101
- diagnosis/iscsi/volume,
  - tid:1 name:iqn.2007-10.DCN.com:zhu1.1
  - lun:0 state:0 iotype:fileio iomode:wb blocks:5126866432 blocksize:4096 path:/dev/zhu1/1
- diagnosis/logfile
  - mdadm information
  - [>....................] resync = 2.9% (86468096/2930266432) finish=1115.2min speed=42500K/sec
  - It looks like the RAID is resyncing?
  - netstat information: 192.168.1.16 is connected to port 3260 on 192.168.1.101
  - ip addr information: only one IP address, 192.168.1.101, is configured
- diagnosis/sys/dmesg
  - ADDRCONF(NETDEV_UP): eth0: link is not ready
  - ADDRCONF(NETDEV_UP): eth1: link is not ready
  - ADDRCONF(NETDEV_UP): eth2: link is not ready
  - ADDRCONF(NETDEV_UP): eth3: link is not ready
  - ADDRCONF(NETDEV_UP): eth4: link is not ready
  - ADDRCONF(NETDEV_UP): eth5: link is not ready
  - ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
  - Only eth0 is cabled, so this last line is expected and not a problem
- diagnosis/sys/psfile
  - Only 3 ist-related processes: istd1, istiod1, istiod1
- diagnosis/sys/uitos-release: NCS3600_1.0.2_19769_20140218
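- diagnosis/sys/log/ucli.log: why do CHAP errors appear? In theory this can only happen when the web page shows iSCSI volume info
  and the disk group or logical volume is broken; the files involved are /etc/{ietd.conf,iscsi.user},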
  - [2015/01/30-15:00:43][error]0x00020006iscsi chap query err[0x00020006]:
  - [2015/02/02-13:42:09][error]0x00020006iscsi chap query err[0x00020006]:
  - [2015/02/09-15:38:43][error]0x00020006iscsi chap query err[0x00020006]:

2015-02-09

Learning message-queue middleware
Low-altitude (airspace) development will usher in a new wave of IT growth
How does IET set the sector size?
- 1. ietadm --op new --tid=%d --lun=%d --params Path=/dev/dg_name/vd_name,IOMode=%s,Sector=%s,Type=%s
- 2.iet/usr/
- 3.iet/kernel/
smsb(DCN) iscsi issue
- ./diagnosis/sys/log/ucli.log contains "iscsi chap query err"
- chap error
  - vd_mngt.c iscsi_chap --list
  - iscsi.c iscsi_target_query_user -> find_conf_target(trgt_name) -> regex_cmp_file (ISCSI_CONF_FILE, pattern)
  - iscsi.c iscsi_target_query_user -> fopen (ISCSI_USER_FILE, "r")
  - iscsi.h: #define ISCSI_CONF_FILE "/etc/ietd.conf"
  - iscsi.h: #define ISCSI_USER_FILE "/etc/iscsi.user"
- The log files ./diagnosis/iscsi/ietd.conf and ./diagnosis/iscsi/iscsi.user
  exist and contain content matching the regular expression.
- Summary: the chap error only appears when running ucli vd_chap --list -d dg_name -v lv_name,
  and only when no string matching the target name is found in ietd.conf, or when opening iscsi.user
  fails. This command should only be invoked when the web page shows iSCSI volume info.
- Time of the error: date +%s -d "Feb 2, 2015 13:42:09" gives the timestamp 1422855729
- Unpacked the log bundle that may contain the error (tar xvf 1422859264.tar.gz) and checked ./dev/shm/core/messages;
  nothing suspicious found.
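`date +%s -d` uses the machine's local time zone. The value 1422855729 above corresponds to 13:42:09 at UTC+8 (China Standard Time). To correlate log timestamps without relying on `TZ`, the conversion can be done with an explicit offset. The sketch uses Howard Hinnant's days_from_civil algorithm, and `local_to_epoch` is a hypothetical helper.

```c
/* Days since 1970-01-01 for a proleptic Gregorian date
 * (Howard Hinnant's days_from_civil algorithm). */
static long long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                              /* years start in March */
    int era = (y >= 0 ? y : y - 399) / 400;
    int yoe = y - era * 400;                  /* year of era, [0, 399] */
    int mp = (m + 9) % 12;                    /* March = 0 ... February = 11 */
    int doy = (153 * mp + 2) / 5 + d - 1;     /* day of year, [0, 365] */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return (long long)era * 146097 + doe - 719468;
}

/* Unix timestamp of a wall-clock time at a fixed UTC offset
 * (seconds east of UTC, e.g. 8 * 3600 for CST). */
static long long local_to_epoch(int y, int mon, int d, int h, int mi, int s,
                                int utc_offset)
{
    return days_from_civil(y, mon, d) * 86400LL
           + h * 3600LL + mi * 60LL + s - utc_offset;
}
```

local_to_epoch(2015, 2, 2, 13, 42, 9, 8 * 3600) gives the same 1422855729 as the date command above.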
Changing the physical sector size on Linux
- sg_format --format --size=520 /dev/sdd

2015-02-06

Linux yes command: used to answer interactive prompts automatically. By default yes prints y
- for el in $(seq 1 10);do read -p 'continue?[y/n]';echo ${el}; done
- yes |for el in $(seq 1 10);do read -p 'continue?[y/n]';echo ${el}; done
- yes n |for el in $(seq 1 10);do read -p 'continue?[y/n]';echo ${el}; done
- http://blog.chinaunix.net/uid-24981550-id-3342765.html
Reading the source code of coreutils-8.23
- basename.c
- cat.c
- cp.c
- dd.c
- yes.c
- Couldn't understand any of them...
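yes.c is the easiest place to start. Its core job is to write a line over and over. The bounded sketch below is not the coreutils code, which loops forever, joins all arguments with spaces and batches many copies per write. The `count` limit is added only so the sketch can be tested, and `yes_n` is a hypothetical name.

```c
#include <stdio.h>

/* Simplified yes(1): write `word` (default "y") plus a newline `count`
 * times. Returns 0 on success, -1 on a write error (e.g. closed pipe). */
static int yes_n(FILE *out, const char *word, long count)
{
    if (word == NULL)
        word = "y";
    for (long i = 0; i < count; i++)
        if (fputs(word, out) == EOF || fputc('\n', out) == EOF)
            return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```

yes_n(stdout, "n", 10) produces the same input that `yes n |` feeds to the read loop above, limited to 10 answers.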
[root@localhost ~]# find /sys/devices/ -name "eth*"
/sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/net/eth4
/sys/devices/pci0000:00/0000:00:02.2/0000:02:00.1/net/eth5
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth0
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth1
/sys/devices/pci0000:00/0000:00:1c.0/0000:07:00.0/net/eth2
/sys/devices/pci0000:00/0000:00:1c.0/0000:07:00.1/net/eth3
[root@localhost ~]# find /sys/class/ -name "eth*"
/sys/class/net/eth4
/sys/class/net/eth5
/sys/class/net/eth0
/sys/class/net/eth1
/sys/class/net/eth2
/sys/class/net/eth3
Each /sys/class/net/ethX is a symlink into /sys/devices
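Because each /sys/class/net entry is a symlink, resolving it gives the device path. Going by the find output above, eth0 resolves to the PCI function 0000:04:00.0, which is the bus-info that udev rules can match on. A minimal sketch with `realpath()`, where `netdev_sysfs_path` is a hypothetical name:

```c
#define _DEFAULT_SOURCE
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Resolve /sys/class/net/<ifname> to its /sys/devices/... path.
 * out must hold PATH_MAX bytes. Returns 0 on success, -1 otherwise. */
static int netdev_sysfs_path(const char *ifname, char *out)
{
    char link[PATH_MAX];
    int n = snprintf(link, sizeof link, "/sys/class/net/%s", ifname);
    if (n < 0 || (size_t)n >= sizeof link)
        return -1;
    return realpath(link, out) != NULL ? 0 : -1;   /* follows the symlink */
}
```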
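Code review checklist
- General
  - Does the code work? Does it implement the intended functionality, is the logic correct, etc.
  - Is all the code easy to understand?
  - Does the code follow your coding conventions? This usually covers brace placement, variable and function names, line length, indentation, formatting and comments.
  - Is there redundant or duplicated code?
  - Is the code as modular as possible?
  - Are there global variables that could be replaced?
  - Is there commented-out code?
  - Do loops have a set length and correct termination conditions?
  - Is there code that could be replaced by library functions?
  - Is there logging or debug code that can be removed?
- Security
  - Are all data inputs checked (for correct type, length, format and range) and encoded?
  - Where third-party utilities are used, are returned errors caught?
  - Are output values checked and encoded?
  - Are invalid parameter values handled?
- Documentation
  - Are there comments, and do they describe the intent of the code?
  - Are all functions commented?
  - Is unusual behavior or edge-case handling described?
  - Are the use and functions of third-party libraries documented?
  - Are data structures and units of measurement explained?
  - Is there incomplete code? If so, should it be removed or flagged with a suitable marker such as 'TODO'?
- Testing
  - Is the code testable? E.g. no overly many or hidden dependencies, objects can be initialized, test frameworks can use the methods, etc.
  - Do tests exist, and are they understandable? E.g. do they reach at least the code coverage you are satisfied with?
  - Do the unit tests actually test that the code performs the intended functionality?
  - Are arrays checked for "out-of-bound" errors?
  - Is there test code that could be replaced by existing APIs?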

2015-02-05

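Change to the network startup script so that only one interface is brought up at boot; path: /etc/rc.d/init.d/network (excerpt from the per-interface loop)
```
rc_prev=$rc
action $"Bringing up interface $i: " ./ifup $i boot
rc=$((rc+$?))
rc_cur=$rc
# rc unchanged means ifup succeeded: stop after the first interface that comes up
if [ $rc_cur -eq $rc_prev ]; then
    action $"Only start one ethernet, 2015-02-04"
    break
fi
```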
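DNS feature, handled by dns_conf
- Related commands
  - dns_conf -a 202.96.134.133 add a DNS server address
  - dns_conf -d 202.96.134.133 delete a DNS server address
  - dns_conf -l list the system DNS configuration
  - dns_conf -s 3 202.96.134.133 202.96.134.134 202.96.134.135 add multiple DNS server addresses
  - dns_conf -c clear the DNS configuration
- Code file involved: sys_mngt.c
- System configuration file involved: /etc/resolv.conf; the tool simply reads and writes this file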
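Since dns_conf only reads and writes /etc/resolv.conf, the listing side (dns_conf -l) comes down to collecting `nameserver` lines. The sketch below is not sys_mngt.c's actual code. `read_nameservers` is hypothetical, and it assumes each address fits in 63 characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Copy up to max "nameserver" addresses from a resolv.conf-style stream
 * into out[]. Comment lines and other keywords (search, options) are
 * skipped. Returns the number of servers found. */
static int read_nameservers(FILE *fp, char out[][64], int max)
{
    char line[256];
    int n = 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        char addr[64];
        /* leading space skips indentation; %63s bounds the copy */
        if (sscanf(line, " nameserver %63s", addr) == 1)
            strcpy(out[n++], addr);
    }
    return n;
}
```

Calling it on fopen("/etc/resolv.conf", "r") should list the same servers that dns_conf -l reports.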
Getting the MAC address on Linux
- cat /sys/class/net/eth0/address
- ip -o link show eth0 | awk '{ print toupper(gensub(/.*link\/[^ ]* ([[:alnum:]:]*).*/, "\\1", 1)); }'
- from /etc/sysconfig/network-scripts/network-functions
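The same upper-case MAC that the network-functions awk produces can be read in C with the `SIOCGIFHWADDR` ioctl. This is a minimal sketch, and `format_mac`/`get_mac` are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Format 6 bytes as an upper-case MAC string "AA:BB:CC:DD:EE:FF". */
static void format_mac(const unsigned char mac[6], char out[18])
{
    snprintf(out, 18, "%02X:%02X:%02X:%02X:%02X:%02X",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}

/* Read an interface's hardware address with SIOCGIFHWADDR.
 * Returns 0 on success, -1 on error (e.g. no such interface). */
static int get_mac(const char *ifname, char out[18])
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    int rc = ioctl(fd, SIOCGIFHWADDR, &ifr);
    close(fd);
    if (rc < 0)
        return -1;
    format_mac((const unsigned char *)ifr.ifr_hwaddr.sa_data, out);
    return 0;
}
```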
Changing the MAC address on Linux
- Method 1
  - ifconfig eth0 down
  - ifconfig eth0 hw ether MAC_ADDRESS
  - ifconfig eth0 up
  - Add the commands above to /etc/rc.local
- Method 2
  - Add this line to /etc/sysconfig/network-scripts/ifcfg-eth0: MACADDR=00:AA:BB:CC:DD:EE
- http://linuxguest.blog.51cto.com/195664/676152
- http://linux-wiki.cn/wiki/zh-hans/Linux更改网卡物理地址(Mac_Address)

2015-02-04

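LIO:
- To list iSCSI volumes, the cached path cannot be used; in lvm.c, have get_all_lv call get_all_lv_nocache() directly.
- The storage object must be deleted in targetcli first; only then can the corresponding logical volume be removed
- targetcli /backstores/block delete dg2lv1
- lvremove --force /dev/dg2/lv1
- So the order of the steps when ucli deletes an iSCSI volume must be swapped
Analysis of a Linux server being hacked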
- http://www.iuvotech.com/analysis-of-a-hack/
- http://www.iuvotech.com/analysis-of-a-hack-part-2/
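Linux ps usage tips
Linux boot process analysis
Scripts in the rc0-rc6 directories:
K ## every file starting with K performs a stop
S ## every file starting with S performs a start
0-99 (execution order: smaller numbers run first)
User-defined startup programs (/etc/rc.d/rc.local)
Commands or scripts can be added to /etc/rc.d/rc.local as needed, and they run at boot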
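The K/S rule above can be modeled in a few lines: all K* (stop) scripts run before all S* (start) scripts, and each group runs in name order. Because the prefixes are two-digit numbers, name order is the same as numeric order. This sketch is a model of that rule, not the CentOS rc script. `rc_order` is a hypothetical name, and the script names other than S05kudzu and S10network are made-up examples.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Order rc script names as they run: all K* (stop) scripts first, then all
 * S* (start) scripts, each group sorted by name (K01 before K50, S05 before
 * S10). Other names are dropped. out must hold n pointers; returns count. */
static size_t rc_order(const char **names, size_t n, const char **out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (names[i][0] == 'K')
            out[k++] = names[i];
    size_t nk = k;
    for (size_t i = 0; i < n; i++)
        if (names[i][0] == 'S')
            out[k++] = names[i];
    qsort(out, nk, sizeof *out, cmp_str);
    qsort(out + nk, k - nk, sizeof *out, cmp_str);
    return k;
}
```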
[root@localhost ~]# runlevel
N 3
[root@localhost ~]# ls /etc/rc3.d/*network
/etc/rc3.d/S10network
https://www.centos.org/forums/viewtopic.php?t=26389 says ifcfg-eth0.bak is created by
kudzu [i.e. /etc/init.d/kudzu, from /etc/rc3.d/S05kudzu or /etc/rc5.d/S05kudzu]
when it detects that the MAC address has changed.
Based on the messages printed at boot:
- Bringing up loopback interface: [ OK ]
- Bringing up interface eth2 : [ OK ]
- Bringing up interface eth3 : hangs here
- Search with grep -rl "Bringing up interface" /etc/
- The network startup scripts are all symlinks to /etc/init.d/network
- [root@localhost ~]# grep -rn "Bringing up loopback interface" /etc/init.d/network
  68: action $"Bringing up loopback interface: " ./ifup ifcfg-lo
- [root@localhost ~]# grep -rn "Bringing up interface" /etc/init.d/network
  142: action $"Bringing up interface $i: " ./ifup $i boot
  154: action $"Bringing up interface $i: " ./ifup $i boot
- What actually runs is /etc/sysconfig/network-scripts/ifup eth3 boot

2015-02-03

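Qiniu Cloud Storage user guide (Developer Center): a good reference for learning how to build an SDK
and design API interfaces for many languages
Pinning the boot order of multiple NICs on CentOS, CentOS 5.8
- dmesg shows the Intel PCI NIC is detected before the onboard NIC
- cat /etc/udev/rules.d/60-net.rules
  - ACTION=="add", SUBSYSTEM=="net", IMPORT{program}="/lib/udev/rename_device"
  - SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/net.hotplug"
- Check the driver and bus-info with ethtool -i ethX
- Edit the order
  - DRIVER is the driver, e.g. e1000e
  - ID is the bus-info, i.e. the PCI ID
- vi /etc/udev/rules.d/60-net.rules
  - DRIVER=="bnx2",ID=="0000:01:00.0",NAME="eth0"
  - DRIVER=="bnx2",ID=="0000:01:00.1",NAME="eth1"
  - DRIVER=="e1000e",ID=="0000:03:00.0",NAME="eth2"
  - DRIVER=="e1000e",ID=="0000:03:00.1",NAME="eth3"
- reboot
- Check messages to confirm the boot order
Alternatively, unpack the Linux-2.6.37.6 source directly on the on-site machine, copy over the latest igb driver source and build it there.
If it compiles, swap the driver in and reboot to see whether the problem is fixed.
/boot/ contains a config file such as config-2.6.18-308.el5; consider copying config-2.6.37-6.el5
into the downloaded linux-2.6.37.6 directory and rebuilding.
Booting the newly compiled kernel fails; the system does not come up.
kernel: linux-2.6.37.6, directory: drivers/net/igb/, file: igb_main.c
#define DRV_VERSION "2.1.0-k2"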
char igb_driver_name[] = "igb";
char igb_driver_version[] = DRV_VERSION;

cat /boot/grub/grub.conf
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.37.6)
    root (hd0,0)
    kernel /vmlinuz-2.6.37.6 ro root=/dev/VolGroup00/LogVol00
    initrd /initrd-2.6.37.6.img
title CentOS (2.6.18-308.el5)
    root (hd0,0)
    kernel /vmlinuz-2.6.18-308.el5 ro root=/dev/VolGroup00/LogVol00
    initrd /initrd-2.6.18-308.el5.img

[root@localhost linux-2.6.37.6]# ethtool -i eth0
driver: igb
version: 3.0.6-k2-2
firmware-version: 1.5-2
bus-info: 0000:06:00.0
[root@localhost linux-2.6.37.6]# uname -r
2.6.18-308.el5
[root@localhost linux-2.6.37.6]# make install
sh /usr/src/kernels/linux-2.6.37.6/arch/x86/boot/install.sh 2.6.37.6 arch/x86/boot/bzImage \
    System.map "/boot"
WARNING: No module ehci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module ohci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module uhci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module ahci found for kernel 2.6.37.6, continuing anyway
WARNING: No module isci found for kernel 2.6.37.6, continuing anyway
WARNING: No module ahci found for kernel 2.6.37.6, continuing anyway
WARNING: No module ehci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module ohci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module uhci-hcd found for kernel 2.6.37.6, continuing anyway
WARNING: No module usb-storage found for kernel 2.6.37.6, continuing anyway
[root@localhost linux-2.6.37.6]# make modules_install
INSTALL arch/x86/kernel/test_nx.ko
INSTALL drivers/scsi/scsi_wait_scan.ko
INSTALL net/netfilter/xt_mark.ko
DEPMOD 2.6.37.6
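Only 3 .ko modules were installed. Maybe the options were not chosen properly during configuration?
Before building and upgrading the kernel: [root@localhost linux-2.6.37.6]# ls /boot/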
- config-2.6.18-308.el5
- initrd-2.6.18-308.el5.img
- message
- System.map-2.6.18-308.el5
- grub
- lost+found
- symvers-2.6.18-308.el5.gz
- vmlinuz-2.6.18-308.el5
After building and upgrading the kernel: [root@localhost linux-2.6.37.6]# ls /boot/
- config-2.6.18-308.el5
- initrd-2.6.37.6.img
- symvers-2.6.18-308.el5.gz
- System.map-2.6.37.6
- vmlinuz-2.6.37.6
- grub
- lost+found
- System.map
- vmlinuz
- initrd-2.6.18-308.el5.img
- message
- System.map-2.6.18-308.el5
- vmlinuz-2.6.18-308.el5
[root@localhost linux-2.6.37.6]# md5sum /usr/src/kernels/linux-2.6.37.6/arch/x86/boot/bzImage /boot/vmlinuz-2.6.37.6
- 82265730205fda20e75ea9f0be69f10b /usr/src/kernels/linux-2.6.37.6/arch/x86/boot/bzImage
- 82265730205fda20e75ea9f0be69f10b /boot/vmlinuz-2.6.37.6
[root@localhost linux-2.6.37.6]# make install
sh /usr/src/kernels/linux-2.6.37.6/arch/x86/boot/install.sh 2.6.37.6 arch/x86/boot/bzImage \
    System.map "/boot"
- Parameters
  - $1 - kernel version
  - $2 - kernel image file
  - $3 - kernel map file
  - $4 - default install path (blank if root directory)
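- What the script does
  - cat $2 > $4/vmlinuz writes the image file to vmlinuz in the install directory
  - cp $3 $4/System.map copies the map file to System.map in the install directory
CentOS distribution kernel sources are at http://vault.centos.org, but no 2.6.37 kernel can be found there.
A Google search for 2.6.37 site:http://vault.centos.org finds nothing either.
Building and upgrading the Linux kernel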
- tar xvf linux-2.6.37.6.tar.gz
- cd linux-2.6.37.6
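- make defconfig generates a default config file
- yum install ncurses-devel
- make menuconfig for menu-driven configuration
  - y builds the option into the kernel image, n does not build it at all, m builds it as a loadable module
- make -j8 builds the kernel
- make modules builds the modules
- make modules_install installs the modules
- make install installs the kernel
- Manually compiling and upgrading the Linux kernel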
- http://300second.blog.51cto.com/7582/816758
- http://seanlook.com/2014/10/24/upgrade-centos6_kernel-to-3.10.x/
- http://www.kroah.com/lkn/
- http://www.lenky.info/archives/2012/05/1688

2015-02-02

SFML: Simple and Fast Multimedia Library http://www.sfml-dev.org
- yum install SFML-devel
★★★ Relearning C/C++
- I hope to complete all the projects below in C++ https://github.com/karan/Projects
- Numbers
- Classic Algorithms
- Graph
- Data Structures
- Text
- Networking
- Classes
- Threading
- Web
- Files
- Databases
- Graphics and Multimedia
- Security
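- This page https://github.com/karan/Projects-Solutions has some reference solutions
What wasting time means
- Taking a roundabout route to a problem that could have been solved directly. In that situation, even after half a year of work,
  your knowledge may have grown by only three or four weeks' worth. OMG.
If you need to build a kernel driver (module), you very likely do not need the whole kernel source installed.
Installing just the kernel-devel package may be enough.
- http://wiki.centos.org/zh/HowTos/I_need_the_Kernel_Source
Difference between kernel-devel and kernel source
- The kernel-devel package contains only the kernel headers and Makefiles needed for a kernel development environment, while kernel-source contains the full kernel source code
Network unreachable after a CentOS kernel upgrade
- Kernel version preinstalled: 2.6.18-308.el5.x86_64
- Kernel version after the upgrade: 2.6.37-6.el5.x86_64
- http://download.appexnetworks.com/ls.do?m=availables lists 2.6.37 as matching CentOS 5.4
- The NIC driver igb was 3.0.6 before the upgrade and 2.0.1 after,
- The problem is most likely the NIC driver; downloading a newer driver and rebuilding it against 2.6.37-6 should work
- The open question is where to download the kernel tree for 2.6.37-6.el5.x86_64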
- https://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.gz
- http://vault.centos.org/
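Embedded real-time OS test engineer
- Position: Senior Linux test engineer
- Responsibilities: independently own the testing of OS subsystems.
- Requirements:
  - 1. PhD with 1+ years, master's with 4+ years, or bachelor's with 5+ years of experience; experience developing test frameworks and test tools;
  - 2. Preferred: experience in Linux or embedded development, or in testing and verifying chips and drivers; having led large-scale
    software testing with successful test plans or automation frameworks delivered; experience in specialized reliability
    or performance testing of large software;
  - 3. Fluent in one or more programming languages such as C/C++; familiarity with scripting languages such as shell or Python is a plus.
Linux kernel upgrade steps
Big Data technology overview
Command-line tricks every Linux user should know
Understanding SQL in ten steps
Which useful computer skills can be learned in a day?
- 7. Build a crawler that fetches web pages and parses some basic data
- 38. Learn design patterns ("A Concise Tutorial on 23 Design Patterns")
How I handle 430 million records a day in SQL Server
Databases
- Traditional database: disk database, DRDB: Disk-Resident Database
- In-memory database: main-memory database, MMDB: Main Memory Database
The goal of an in-memory database is to raise throughput and lower latency by storing data in memory, unlike
traditional database management systems that store data on disk. Because the internal optimization algorithms are simpler and
fewer CPU instructions are executed, in-memory data is faster than a disk-based database, and accessing data in memory improves response times. (Mich Talebzadeh)
- Guide to mainstream in-memory databases (PDF)
- Comparison of 8 NoSQL database systems
- Index structures in in-memory databases: T-trees, cache-conscious CSS/CSB+ trees, tries, and hashing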
  - http://blog.csdn.net/zhujunxxxxx/article/details/42490335

2015-01-31

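About ifcfg-ethX.bak files appearing
- How to remove eth0.bak and extra NICs on Linux
- On Linux, changing a NIC's MAC address, replacing a NIC, or swapping hard disks between identically configured machines all produce a .bak backup of the NIC configuration file
- http://fanli7.net/a/caozuoxitong/OS/20131015/433191.html
CentOS 5.x vs 6.x: udev rule changes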
- CentOS 5.x, /etc/udev/rules.d/60-net.rules
- CentOS 5.x, KERNEL=="eth*", SYSFS{address}=="00:30:48:56:A6:2E", NAME="eth0"
- CentOS 6.x, /etc/udev/rules.d/70-persistent-net.rules
- CentOS 6.x, SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="54:52:00:ff:ff:dd", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
- http://www.pwrusr.com/tag/eth0eth0rm-etcudevrules-d70-persistent-net-rules
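cv: a command that shows the progress of running Linux commands
A somewhat silly mistake: enabling debug mode in /etc/udev/udev.conf made udev debug messages flood the console at boot,
so the OS could not be entered. Even single-user mode did not stop the debug messages. It should have been set to udev_log="info".
The current idea is to boot a live system from a USB stick and edit the files on the hard disk directly.
ustor system log locations:
- /etc/cf/log/message
- /etc/cf/log/sys/XXX.tar.gz
mount -o loop XXX.iso /mnt
Capturing packets with wireshark as a regular user
- groupadd wireshark adds the group
- usermod -a -G wireshark dennis adds the current user to the wireshark group
- chgrp wireshark /usr/sbin/dumpcap changes the group ownership of dumpcap
- grep wireshark /etc/group confirms the user is in the wireshark group
- logout or reboot, try
This article explains NAS and SAN very well, http://benjr.tw/286
- DAS : Application -> File system -> Disk Storage
  - The common direct-attached storage: disks attached directly to the system
- NAS : Application -> Networking -> File system -> Disk Storage
  - NFS, a network file system for Linux clients
  - SAMBA/CIFS, a network file system for Windows clients
- SAN : Application -> File system -> Networking -> Disk Storage
  - IP-SAN: iSCSI; before client applications can use it, it must first be formatted with a filesystem.
ChinaCache project: on an installed CentOS 5.8, after applying ChinaCache's own kernel, the server's NIC detection order gets scrambled,
making the server unreachable. Possible causes:
- This happened before: after the motherboard of an installed system was suddenly replaced, the IP could no longer be set from the web page. The reason is that at boot
  udev configures the network, and the existing configuration contains rules mapping each MAC address to an ethX name. Normally
  a freshly installed system names its ports eth0 and eth1 (for a dual-port board). With a new motherboard the MAC addresses change,
  so udev finds eth0 and eth1 already bound to other MACs and only assigns eth3, eth4...
  - how-to-fix-nic-name-confusion
- But here the trigger was not a motherboard swap but a kernel replacement
- CentOS 5.8 corresponds to kernel 2.6.18-308, see http://blog.csdn.net/zhongyhc/article/details/39205925
- How to replace the kernel, see http://os.51cto.com/art/201101/242682.htm
- For pinning the eth order on Linux, see
- All of this is only preliminary study and reference; without access to the affected machine no accurate judgment is possible. That said,
  my current guess is that it is related to udev, since udev controls the eth order.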

2015-01-30

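Using valgrind
- yum install valgrind
- Compile the program test.c: gcc -Wall test.c -g -o test
- Check the program for bugs with Valgrind: valgrind --tool=memcheck --leak-check=full ./test
- Common problems:
  - Use of uninitialized memory
  - Out-of-bounds memory reads/writes
  - Overlapping memory: strcpy, strncpy, memcpy, strcat and similar functions all take a source address (src)
    and a destination address (dst); src and dst must not overlap, otherwise the result is unpredictable
  - Dynamic memory management errors: free must pair with malloc, delete with new
  - Memory leaks
- Original link: http://bbs.ednchina.com/BLOG_ARTICLE_1772918.HTM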
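A small example of the overlap rule. Shifting a string inside its own buffer is an overlapping copy, so it needs memmove(). Doing the same with memcpy() is undefined behaviour, and memcheck typically reports it as "Source and destination overlap in memcpy" (unless the compiler inlined the call). `shift_right` is a hypothetical name.

```c
#include <string.h>

/* Shift a NUL-terminated string right by `by` characters in place and pad
 * the front with `fill`. Source and destination overlap, so memmove() is
 * required here. The buffer must hold strlen(s) + by + 1 bytes. */
static void shift_right(char *s, size_t by, char fill)
{
    size_t len = strlen(s);
    memmove(s + by, s, len + 1);   /* overlapping copy, including the NUL */
    memset(s, fill, by);
}
```

Compiled with -g and run under valgrind --tool=memcheck, this version should stay clean, whereas swapping in memcpy is the kind of bug the "overlapping memory" item above describes.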
Why multithreaded programs on Linux consume so much virtual memory
- valgrind --leak-check=full --track-fds=yes --log-file=./AuthServer.vlog RUN_YOUR_PROGRAM &
- strace -f -e"brk,mmap,munmap" -p $(pidof AuthServer)
- Use ulimit -s to view and set the thread stack size
- pmap $(pidof main)
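One contributor that the ulimit -s hint points to: with glibc, each new thread's default stack size is taken from the stack rlimit (often 8 MB of virtual memory per thread). That shows up in pmap output even if little of it is ever touched. A program can request a smaller stack per thread. This is a minimal sketch, and `run_with_stack` is a hypothetical name.

```c
#include <pthread.h>
#include <stddef.h>

static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start one thread with an explicit stack size instead of the default.
 * Returns 0 on success, or the pthreads error number (e.g. EINVAL when
 * stack_bytes is below PTHREAD_STACK_MIN). */
static int run_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```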

2015-01-29

Getting the current timestamp in shell
- date '+%s' gets the current timestamp
- date -d '1970-01-01 UTC 946684800 seconds' +"%Y-%m-%d %T %z" converts a timestamp to a date
- date -d "-30 minute" +%Y%m%d:%H:%M gives the time 30 minutes ago
- da=$(date -d "yesterday" +"%Y-%m-%d")
- http://kure6.blog.51cto.com/2398286/864929
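The reverse direction (timestamp to date) in C uses `gmtime_r()` and `strftime()`. This sketch formats in UTC, like `date -u`. `format_utc` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
 * Returns 0 on success, -1 if conversion fails or buf is too small. */
static int format_utc(time_t ts, char *buf, size_t len)
{
    struct tm tm;
    if (gmtime_r(&ts, &tm) == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) == 0 ? -1 : 0;
}
```

format_utc(946684800, buf, sizeof buf) yields "2000-01-01 00:00:00", matching the date example above.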
BIT Coin https://blockchain.info/zh-cn/
LIO
- ./lio -A -o iscsi -d dg2 -v lv1
- ./lio -D -o iscsi -d dg2 -v lv1
- ./lio -A -o auth -d dg2 -v lv1 -u dennis -p 123456
- ./lio -D -o auth -d dg2 -v lv1 -u dennis -p 123456
- ./lio -L -o auth -d dg2 -v lv1
The LV configuration is written to /dev/shm/lvs.conf
Steps to create a logical volume:
- if the disk group (dg) does not exist:
  - pvcreate --metadatasize 4M -ff -y /dev/md0 -M 2
  - rm -f /dev/dg2
  - vgcreate -s 2M dg2 /dev/md0
- lvcreate -p rw -L 10240M -n lv1 dg2 --alias rw --xtype iscsi --addtag @%d

2015-01-28

LIO
- implement chap deletion
- get acl info
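Encrypting and decrypting files with OpenSSL
- 1. Encrypt a file with the aes-128-cbc algorithm:
  - openssl enc -aes-128-cbc -in install.log -out enc.log
    (install.log is the file to encrypt and enc.log is the encrypted output; after pressing Enter you are prompted for a password.)
- 2. Decrypt the file just encrypted:
  - openssl enc -d -aes-128-cbc -in enc.log -out install.log
    (enc.log is the encrypted file and install.log the decrypted output; the -d option selects decryption.)
- 3. Encrypt and Base64-encode the output:
  - openssl enc -aes-128-cbc -in install.log -out enc.log -a
- 4. Encrypt using a different password source (here the password is given inline):
  - openssl enc -des-ede3-cbc -in install.log -out enc.log -pass pass:111111
Encrypting and decrypting archives with tar and OpenSSL
- For the pma directory:
  - Compress and encrypt: # tar -zcvf - pma|openssl des3 -salt -k password | dd of=pma.des3
  - Decrypt and extract: # dd if=pma.des3 |openssl des3 -d -k password|tar zxf -
Hydra, an online brute-force password cracking tool for Linux, explained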
Hacking
- How to hack
  - Use a *nix terminal for commands
  - Secure your machine first
  - Test the target, ping
  - Determine the operating system (OS): port scan / fingerprint it (nmap or p0f)
  - Find a path or open port in the system
  - Crack the password or authentication process
  - Get super-user privileges
  - Using various tricks
  - Create a backdoor
  - Cover your tracks
- Top Sites about: Hacking Forums
- Rainbow Hash Cracking
- http://network.chinabyte.com/120/12286120_3.shtml

2015-01-27

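Valid IQN format: "iqn.yyyy-mm.<naming authority>:<identifier>"
network programming
LIO: last week's problem of Windows being unable to connect is gone. The only change is that storage machine 130.200 was shut down and restarted.
- Add access control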
screen
- screen -list
- screen -S name
- screen -r id/name
- ctrl+a d
ucli vd_iscsi --create fails
- lvs
- lvremove /dev/dg2/iscsi01
- [root@localhost ucli]# ./ucli vd_iscsi --create -d dg2 -v iscsi01 -s 10240M
  1139, ietadm --op new --tid=4 --lun=0 --params Path=/dev/dg2/iscsi01,IOMode=wb,Sector=512,Type=blockio
  fail(target failed to run when created)
Compiling IET
- uname -r to check the kernel version
- ls /usr/src/kernels/ to check whether a kernel source directory exists
- if there is no matching one, install it with yum: yum install -y kernel-devel
- [dennis@localhost iscsitarget-code]$ ls /usr/src/kernels/
  3.18.3-201.fc21.x86_64
  [dennis@localhost iscsitarget-code]$ uname -r
  3.18.3-201.fc21.x86_64
- [dennis@localhost iscsitarget-code]$ svn info
  Path: .
  Working Copy Root Path: /home/dennis/work/svn/iscsitarget-code
  URL: svn://svn.code.sf.net/p/iscsitarget/code/trunk
  Relative URL: ^/trunk
  Repository Root: svn://svn.code.sf.net/p/iscsitarget/code
  Repository UUID: 48a34bb2-7106-0410-bc49-8aa7273d22a1
  Revision: 503
  Node Kind: directory
  Schedule: normal
  Last Changed Author: agr1
  Last Changed Rev: 503
  Last Changed Date: 2014-06-18 05:16:41 +0800 (Wed, 18 Jun 2014)
- On the work PC (Fedora 21), revision 503 builds successfully against kernel 3.18.3-201.
- On the target machine (CentOS 7.0), building revision 503 fails while compiling conn.c
  [root@localhost iscsitarget-code]# uname -r
  3.10.0-123.el7.x86_64
  [root@localhost iscsitarget-code]# ls /usr/src/kernels/
  3.10.0-123.el7.x86_64
  [root@localhost iscsitarget-code]# make
  …
  /root/iet/iscsitarget-code/kernel/conn.c: In function ‘conn_info_show’:
  /root/iet/iscsitarget-code/kernel/conn.c:51:19: error: ‘struct ipv6_pinfo’ has no member named ‘daddr’
  &inet6_sk(sk)->daddr);
  …
  [root@localhost iscsitarget-code]# vim kernel/conn.c
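- The failing code is in the connection-info display function (conn_info_show). It can be commented out for now, and the build then succeeds.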

2015-01-26

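Cleaning up the work PC in preparation for installing Fedora Workstation 21
- After more than half a year of use there are many unneeded files to clean up. The system directory layout needs re-planning, and files not worth keeping should be deleted.
- The current version (Fedora 20) has a problem where pop-up windows turn black
- gvim conflicts with ibus: with gvim running and the ibus Chinese input method enabled, key presses have no effect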

2015-01-24

LIO
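- The Linux initiator connects successfully
- The Windows initiator fails to connect
  - A Wireshark capture on the target (T) side shows that 15 ms after the initiator (I) sends a Text Command, the target replies with [FIN,ACK].
    Normally the target should quickly reply with an ACK and then send a Text Response
  - The Text Command packet sent by the Linux initiator has an Options field; the Windows one does not.
TCP header flags (SYN, FIN, ACK, PSH, RST, URG)
- Abbreviations
  - SYN (synchronize: open a connection)
  - ACK (acknowledgement)
  - PSH (push)
  - FIN (finish)
  - RST (reset)
  - URG (urgent)
  - Sequence number
  - Acknowledgment number
- SYN opens a connection
- FIN closes a connection
- ACK acknowledges received data
- PSH asks the receiver to push the data to the application right away
- RST resets the connection
- URG marks urgent data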
- http://rainbow702.iteye.com/blog/2007177
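The flags listed above are individual bits in byte 13 of the TCP header, with bit values as defined in RFC 793. This small Python sketch decodes that byte, which helps when reading raw captures such as the [FIN,ACK] reply seen from the target. The helper names are hypothetical.

```python
from __future__ import annotations

# Control bits in byte 13 of the TCP header, lowest bit first (RFC 793).
TCP_FLAG_BITS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_tcp_flags(flags: int) -> list[str]:
    """Return the names of the flags set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAG_BITS if flags & bit]


def flags_from_header(tcp_header: bytes) -> list[str]:
    """Decode the flags from a raw TCP header (at least 14 bytes long)."""
    return decode_tcp_flags(tcp_header[13])


if __name__ == "__main__":
    print(decode_tcp_flags(0x12))  # ['SYN', 'ACK']: second step of the handshake
    print(decode_tcp_flags(0x11))  # ['FIN', 'ACK']: the reply seen from the target
```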
IET, the latest source code is here
- the current latest svn revision is r503
- Fails to compile on CentOS 7.0.1406, kernel 3.10.0-123.el7.x86_64. Error message:
  /iscsitarget-code/kernel/conn.c: In function ‘conn_info_show’:
  /iscsitarget-code/kernel/conn.c:51:19: error: ‘struct ipv6_pinfo’ has no member named ‘daddr’
LIO
- No need to implement IP restriction for LIO
- tcpdump -n -i enp3s0 host 172.16.50.39 and port 3260
Linux hosting providers
- Linode
- Aliyun
- Godaddy
Auto-connecting a VPN with the NetworkManager command-line tool (nmcli)
- nmcli
- nmcli con help
- nmcli con show
- The VPN configuration file is /etc/NetworkManager/system-connections/VPN_AM
- nmcli connection up VPN_AM15
- nmcli con up uuid e923399-0b97-4dbe-8199-434343437b
- Ref

2015-01-23

LIO coding
- grep -rn 'vd_iscsi ' /opt/html/*
- For IET, io_mode (wb, wt) is set with ietadm; how is io_mode set for LIO?
- [dennis@localhost lio-iscsi]$ grep -rl 'write-thru' /lib/python2.7/site-packages/
  /lib/python2.7/site-packages/targetcli/ui_backstore.pyc
  /lib/python2.7/site-packages/targetcli/ui_backstore.pyo
  /lib/python2.7/site-packages/targetcli/ui_backstore.py
  [dennis@localhost lio-iscsi]$ grep -rn 'write-thru' /lib/python2.7/site-packages/targetcli/ui_backstore.py
  447: wb_str = "write-thru"
  460: wb_str = "write-thru"
- vim /lib/python2.7/site-packages/targetcli/ui_backstore.py
```
class UIBlockStorageObject(UIStorageObject):
    def summary(self):
        so = self.rtsnode
        if so.write_back:
            wb_str = "write-back"
        else:
            wb_str = "write-thru"
```
- Checking class UIBlockBackstore(UIBackstore) in ui_backstore.py shows that block backstores have no 'write_back' option.
- The write_back option only takes effect for FILEIO
  [root@localhost src]# targetcli /backstores/fileio/ create fileio01.dat /tmp/fileio01.dat 10M write_back=False
  Created fileio fileio01.dat with size 10485760
  [root@localhost src]# ls /tmp/fileio01.dat -lh
  -rw-r--r--. 1 root root 10M Feb 3 15:01 /tmp/fileio01.dat
  [root@localhost src]# targetcli /backstores/fileio/ ls
  o- fileio ............................................ [Storage Objects: 1]
    o- fileio01.dat ............ [/tmp/fileio01.dat (10.0MiB) write-thru deactivated]
  [root@localhost src]# targetcli /backstores/fileio/ create fileio02.dat /tmp/fileio02.dat 10M write_back=True
  Created fileio fileio02.dat with size 10485760
  [root@localhost src]# targetcli /backstores/fileio/ ls
  o- fileio ............................................ [Storage Objects: 2]
    o- fileio01.dat ............ [/tmp/fileio01.dat (10.0MiB) write-thru deactivated]
    o- fileio02.dat ............ [/tmp/fileio02.dat (10.0MiB) write-back deactivated]
- For a block-type storage object, how can I set wb or wt?
- Another question: how can initiator access be controlled by IP address?
Installing the Emulex driver on Windows Server 2008 R2
- Double-click elxdrvr-nic-10.2.478.1-5.exe
- Unpack all drivers (note the install directory)
- After the installation, open Device Manager and find the Ethernet item
- Update driver -> browse the local directory, and select the installation directory

2015-01-22

LIO coding
★★★ NFS transfer speed tuning
- Block size
  - time dd if=/dev/zero of=/testfs/testfile bs=8k count=1024 to test NFS writes
  - time dd if=/testfs/testfile of=/dev/null bs=8k to test NFS reads
- Network packet size
  - ping -s 2048 -f hostname, try different packet sizes
  - nfsstat -o net
  - tracepath node1/port
  - ifconfig eth0
  - ifconfig eth0 mtu 16436, modify the MTU value
  - /proc/sys/net/ipv4/ipfrag_high_thresh and /proc/sys/net/ipv4/ipfrag_low_thresh
- NFS mount options
  - timeo: how long the client waits after a timeout before retrying, in tenths of a second.
  - retrans: number of retries after a timeout.
  - bg: retry the mount in the background; very useful
  - hard: if the server does not respond, the client keeps retrying indefinitely.
  - wsize: write block size
  - rsize: read block size
  - intr: allows a hung NFS operation to be interrupted
  - noatime: do not update inode access times, which improves speed.
  - async: asynchronous reads and writes.
- Number of nfsd threads
  - ps -efl |grep nfsd
  - vi /etc/init.d/nfs, modify RPCNFSDCOUNT
  - service nfs restart
- nfsd queue (socket buffer) sizes
  - /proc/sys/net/core/rmem_default
  - /proc/sys/net/core/rmem_max
  - /proc/sys/net/core/wmem_default
  - /proc/sys/net/core/wmem_max
  - vi /etc/sysctl.conf
- mount 192.168.1.220:/mnt/nfs /mnt/nfs_t -o nolock,rsize=1024,wsize=1024,timeo=15

2015-01-21

Using Google without circumventing the firewall
Vim
- :%TOhtml converts the buffer to HTML
- g? ROT13-encodes the selected text; press u to undo
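Vim's g? is a plain ROT13 transform, and Python ships the same transform as the `rot_13` text codec, so a sketch takes one call. Applying it twice returns the original text, which is why g? on already-encoded text decodes it. The wrapper name `rot13` is hypothetical.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places, like Vim's g?; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    print(rot13("Hello, Vim!"))  # Uryyb, Ivz!
```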
Security-focused operating systems
- BackTrack
- Kali linux
  - nmap
  - wireshark
  - John the Ripper
  - Aircrack-ng
  - metasploit
Linux check/modify MTU
- netstat -i
- cat /sys/class/net/eth0/mtu
- echo "1460" > /sys/class/net/eth0/mtu
- http://blog.csdn.net/codejoker/article/details/5997447
Linux NFS service performance tuning
- nfsstat -rc
- http://blog.chinaunix.net/uid-24404943-id-3389539.html
- NFS queue (socket buffer) sizes; change them on both the server and the client (262144 = 256 KB):
  #echo 262144 > /proc/sys/net/core/rmem_default
  #echo 262144 > /proc/sys/net/core/rmem_max
  #echo 262144 > /proc/sys/net/core/wmem_default
  #echo 262144 > /proc/sys/net/core/wmem_max
- [root@localhost iperf-2.0.5]# dd if=/dev/zero of=/tmp/test01.dat bs=1M count=10240
  10240+0 records in
  10240+0 records out
  10737418240 bytes (11 GB) copied, 28.2585 s, 380 MB/s
- ping -s 2048 -f 192.16.110.80
- http://www.cnblogs.com/derekchen/archive/2013/01/17/2865207.html
- http://www.lichaozheng.info/2011/10/13/nfs性能优化/
- http://blog.csdn.net/anghlq/article/details/8532312
- tracepath 192.16.110.80
- [root@localhost iperf-2.0.5]# echo "4096" >/sys/class/net/enp10s0f0/mtu
- [root@localhost iperf-2.0.5]# dd if=/dev/zero of=/tmp/test04.dat bs=1M count=10240
  10737418240 bytes (11 GB) copied, 28.0778 s, 382 MB/s
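dd reports throughput in decimal megabytes (10^6 bytes), so the figures above can be checked with one division. This is a minimal Python sketch; `dd_rate_mb_s` is a hypothetical helper name.

```python
def dd_rate_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput as dd prints it: decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    # Figures taken from the two dd runs above.
    print(round(dd_rate_mb_s(10737418240, 28.2585)))  # 380
    print(round(dd_rate_mb_s(10737418240, 28.0778)))  # 382
    print(round(10737418240 / 1e9))                   # 11, the "(11 GB)" in dd's output
```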
To find the NFS version on Linux, run nfsstat -m on the client and look for the string "vers=4.0"
- Compare the characteristics of each NFS version
Linux NIC performance testing:
- netperf
  - ./netserver
  - ./netperf -H 192.168.1.40
  - http://www.cppblog.com/fwxjj/archive/2013/11/22/204377.html
- iperf
  - iperf -s
  - iperf -c 192.16.110.1 -f M
  - iperf usage summary
- ./src/iperf -s
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Server listening on TCP port 5001
  TCP window size: 85.3 KByte (default)
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [ 4] local 192.16.110.80 port 5001 connected with 192.16.110.50 port 43341
  [ ID] Interval Transfer Bandwidth
  [ 4] 0.0-10.0 sec 9.20 GBytes 7.90 Gbits/sec
  [ 5] local 192.16.110.80 port 5001 connected with 192.16.110.50 port 43342
  [ 5] 0.0-10.0 sec 9.26 GBytes 7.95 Gbits/sec
  [ 4] local 192.16.110.80 port 5001 connected with 192.16.110.50 port 43343
  [ 4] 0.0-10.0 sec 8.98 GBytes 7.71 Gbits/sec
- ./src/iperf -c 192.16.110.80 -f M -i 2
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Client connecting to 192.16.110.80, TCP port 5001
  TCP window size: 0.02 MByte (default)
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [ 3] local 192.16.110.50 port 43343 connected with 192.16.110.80 port 5001
  [ ID] Interval Transfer Bandwidth
  [ 3] 0.0- 2.0 sec 1598 MBytes 799 MBytes/sec
  [ 3] 2.0- 4.0 sec 1919 MBytes 960 MBytes/sec
  [ 3] 4.0- 6.0 sec 1892 MBytes 946 MBytes/sec
  [ 3] 6.0- 8.0 sec 1894 MBytes 947 MBytes/sec
  [ 3] 8.0-10.0 sec 1893 MBytes 947 MBytes/sec
  [ 3] 0.0-10.0 sec 9196 MBytes 920 MBytes/sec
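- Judging from the test above, the NIC reaches nearly 950 MB/s without any tuning, which is already quite good.
- The bottleneck is neither the disk (local dd reaches 1.1 GB/s) nor the NIC (iperf reaches 950 MB/s).
  So what causes the poor NFS performance (375 MB/s)? The NFS protocol itself?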
yum
- yum clean all
- yum check-update
- yum install net-tools (for command: route)
Emulex 10GbE NIC NAS share performance test (NFS); server OS: CentOS 7.0.1406
- ethtool -i enp10s0f0, version : 10.0.600.0r
- yum install nfs-utils (for command: showmount)
- dd if=/dev/zero of=/tmp/7test01.file bs=1M count=10240, result:28.7827 s, 373 MB/s
- scp be2net-10.2.470.14-1.src.rpm root@172.16.130.17:/root/
- rpm -ivh be2net-10.2.470.14-1.src.rpm
- yum install rpm-build -y
- cd rpmbuild/SPECS; rpmbuild -ba benet.kmp.spec
- scp CentOS-7.0-1406-x86_64-DVD.iso root@172.16.130.17:/root/
- mkdir /media/cdrom
- mount -t iso9660 CentOS-7.0-1406-x86_64-DVD.iso /media/cdrom
- vim /etc/yum.repos.d/CentOS-Media.repo
- mv /etc/yum.repos.d/CentOS-Base.repo{,.bak}
- yum clean all
- yum makecache
- yum install kernel-devel gcc
- rpmbuild -ba benet.kmp.spec
- lspci
- ip addr
- ethtool -i enp10s0f0
- modinfo be2net
- scp root@172.16.130.17:/root/be2net.ko ./
- rpm -ivh rpmbuild/RPMS/x86_64/kmod-be2net-10.2.470.14-1.x86_64.rpm
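- After installing the new driver (10.2.470.14-1), dd still only reaches 370 MB/s. Strange: so the driver is not the problem?
- A direct local dd test reaches 1.1 GB/s, so the RAID is not the problem
- blockdev --getra /dev/sdb, result: 256
- blockdev --setra 16384 /dev/sdb
- blockdev --setra 16384 /dev/mapper/h55-n1
- dd still only reaches 375 MB/s
- iostat 1, get Blk_wrtn/s : 738000.00 (/dev/sdb)
  - Blk_wrtn/s: blocks (sectors) written per second (one sector is 512 bytes)
  - echo 738000*512/1024/1024 |bc, result: 360, i.e. 360 MB/s
- Could it be the CPU? During the dd test on the server the CPU was almost saturated (kworker/2:0 at 98%, dd at about 59%),
  but vmstat 1 shows nothing abnormal, with CPU usage at about half.
- The 2014-10-17 tests are worth revisiting: the earlier Chelsio and Intel SMB share tests also only reached about 380 MB/s.
  So perhaps 375 MB/s is about as fast as it gets here?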
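The iostat conversion above can be written as a Python sketch, assuming 512-byte sectors as noted. The result is in MiB/s (1024 x 1024 bytes), which is why it comes out lower than the same rate expressed in decimal MB. The helper name is hypothetical.

```python
SECTOR_BYTES = 512  # iostat's Blk_wrtn/s counts 512-byte blocks


def blk_per_s_to_mib_s(blk_per_s: float) -> float:
    """Convert iostat Blk_wrtn/s (or Blk_read/s) into MiB/s."""
    return blk_per_s * SECTOR_BYTES / 1024 / 1024


if __name__ == "__main__":
    rate = blk_per_s_to_mib_s(738000.00)
    print(int(rate))  # 360, same as `echo 738000*512/1024/1024 | bc`
```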
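From four minutes to two seconds: best practices for client-side performance optimization
- Basic principles to follow:
  - Think from the user's point of view
  - Never push choices onto the user
  - Always consider the worst-case scenario
- Basic techniques for faster system response
  - Fetch on demand
  - Lazy loading
  - Straighten detours (shorten the request path)
  - Caching
  - Asynchrony
  - Batching (merge requests)
  - Visual tricks (improve perceived speed)
    - Show hints or a progress bar
    - Load quietly in the background
    - Simplify the data
- Program stability
  - Use unit tests
  - Provide thorough logging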
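Talking about machine learning
- What is machine learning
  - Machine learning is a method in which the computer uses existing data (experience) to derive a model (e.g. the pattern behind being late)
    and then uses that model to predict the future (whether someone will be late).
  - Broadly speaking, machine learning gives machines the ability to learn, so that they can perform tasks that direct programming cannot.
  - In practical terms, machine learning means using data to train a model and then using the model to make predictions.
- Scope of machine learning
  - Pattern recognition, statistical learning, data mining, computer vision, speech recognition, natural language processing
- Machine learning methods
  - Regression: linear regression and logistic regression
  - Neural networks
  - SVM (support vector machine)
  - Clustering
  - Dimensionality reduction
  - Recommendation algorithms
- EasyPR
- EasyPR is an open-source Chinese license-plate recognition system that aims to be a simple, efficient and accurate plate-recognition engine.
- EasyPR: an open-source Chinese license-plate recognition system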
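As a concrete instance of "train a model from data, then predict", here is a minimal ordinary-least-squares fit of a straight line (simple linear regression) in plain Python. The data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Use the trained (slope, intercept) model on a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # "experience": made-up points on y = 2x + 1
    print(model)              # (2.0, 1.0)
    print(predict(model, 4))  # 9.0
```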

2015-01-20

Emulex 10G NIC Test(NAS)
- time dd if=/dev/zero of=/tmp/10GB.file bs=1M count=10240
  - 200MB/s
- time dd of=/dev/null if=/tmp/10GB.file bs=1M
  - 170MB/s
- 130.60 storage machine, ethtool -i eth2 |grep version, 10.2.470.14
- 130.17 server machine, ethtool -i eth2 |grep '^version', 4.4.161.0r
  - cat /etc/centos-release, output "CentOS release 6.4 (Final)"
  - echo "proxy=http://172.16.50.39:3128">>/etc/yum.conf, add yum proxy
  - kernel-devel.x86_64 : Development package for building kernel modules to match the kernel
  - yum install kernel-devel kernel-headers kernel -y, download size 60MB
  - yum install redhat-rpm-config
  - yum install gcc, download size 35MB
  - reference http://www.chenjunlu.com/2012/08/how-to-install-nic-driver-on-oracle-vm-server/
  - rpmbuild be2net driver
  - rpm -ivh kmod-be2net-10.2.470.14-1.x86_64.rpm
  - reboot
- 130.17 server machine
  - showmount -e 192.16.110.80
  - mount 192.16.110.80:/share/n1 /tmp
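- On 130.17 (CentOS 6.4), the new kernel plus the matching development packages and headers were installed, and the driver package built successfully.
  However, the machine would not boot from the new kernel, so the goal of testing with a single driver version was not met. Gave up and planned to switch OS.
  That was probably a mistake: mounting this system's ISO and installing the matching packages from it might have worked.
- [dennis@localhost ~]$ ls /mnt/dvd/Packages/ |grep '^kernel'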
  kernel-2.6.32-358.el6.x86_64.rpm
  kernel-debug-2.6.32-358.el6.x86_64.rpm
  kernel-debug-devel-2.6.32-358.el6.x86_64.rpm
  kernel-devel-2.6.32-358.el6.x86_64.rpm
  kernel-doc-2.6.32-358.el6.noarch.rpm
  kernel-firmware-2.6.32-358.el6.noarch.rpm
  kernel-headers-2.6.32-358.el6.x86_64.rpm
From 172.16.50.XX, ping 172.16.130.17 gets no response; after route add default gw 172.16.130.1 eth2,
SSH login works.
How to cover your tracks
What you can learn from Mandiant_APT1_Report.pdf
- When starting a hack, register new (random) accounts, and do not
  use your real IP address, which gives your identity away.
- tools
- scripts
iSCSI mutual CHAP (two-way iSCSI authentication)
- An incomplete guide to iSCSI CHAP authentication
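iSCSI CHAP (RFC 3720) reuses the RFC 1994 CHAP computation. With the MD5 algorithm, the response is MD5(identifier || secret || challenge). Mutual CHAP runs the same exchange in the reverse direction with the target's own secret. Below is a minimal Python sketch of that arithmetic only. The secrets are made up, and a real login carries these values as CHAP_I / CHAP_C / CHAP_N / CHAP_R text keys, which are not modelled here.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP with MD5 (RFC 1994): MD5(identifier octet || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """The authenticator recomputes the response with its own copy of the secret."""
    return hmac.compare_digest(chap_response(identifier, secret, challenge), response)


if __name__ == "__main__":
    initiator_secret = b"initiator-secret"  # hypothetical
    target_secret = b"target-secret-2"      # hypothetical

    # One-way CHAP: the target challenges the initiator.
    t_id, t_challenge = 1, os.urandom(16)
    i_answer = chap_response(t_id, initiator_secret, t_challenge)
    print("target accepts initiator:", chap_verify(t_id, initiator_secret, t_challenge, i_answer))

    # Mutual CHAP: the initiator also challenges the target.
    i_id, i_challenge = 2, os.urandom(16)
    t_answer = chap_response(i_id, target_secret, i_challenge)
    print("initiator accepts target:", chap_verify(i_id, target_secret, i_challenge, t_answer))
```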

2015-01-19

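When a shell command is run with popen, "Broken pipe" appears after pclose. The reason is that pclose closes the pipe immediately,
while the shell command, running in another (forked) process, still tries to write output to the closed pipe.
- See http://stackoverflow.com/questions/15564014/the-issue-when-using-popen-and-pclose
- See http://blog.chinaunix.net/uid-26707720-id-3965207.html
- Also, the "[Errno 32] Broken pipe" seen when calling targetcli from Linux C code is probably caused by the way the Python
  script writes its output. See http://stackoverflow.com/questions/14207708/ioerror-errno-32-broken-pipe-python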
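The failure mode above can be reproduced in a few lines. This is a Python sketch, not the original C code. Writing to a pipe whose read end is already closed fails with EPIPE (errno 32, the same number as in the targetcli message). A C program with default signal handling is killed by SIGPIPE instead, whereas CPython ignores SIGPIPE and raises `BrokenPipeError`. The second helper shows the usual remedy: read all of the child's output before the pipe goes away. Both function names are hypothetical.

```python
import errno
import os
import subprocess


def write_after_reader_closed() -> int:
    """Write to a pipe whose read end is already closed; return the errno we get (0 if none)."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away first, as when pclose() closes the stream early
    try:
        os.write(write_fd, b"output nobody will read\n")
        return 0
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE, so EPIPE surfaces as an exception instead of killing us.
        return exc.errno
    finally:
        os.close(write_fd)


def run_and_drain(cmd) -> str:
    """Safer pattern: collect all of the child's output before the pipe is closed."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True, text=True).stdout


if __name__ == "__main__":
    print(write_after_reader_closed() == errno.EPIPE)
    print(run_and_drain(["sh", "-c", "echo hello"]), end="")
```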
#define ISCSI_PARAM_DEFAULT {"wb", "fileio", "512", 2, 0}; the default type is fileio
The new iSCSI implementation should support fileio too.
Parsing command line options with multiple arguments
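The linked article's approach is not reproduced here. As a Python sketch of the same idea, argparse's `nargs="+"` lets a single option take several values. The option names (`-d`, `-v`, `-i`) are hypothetical, loosely modelled on the ucli/lio commands in these notes.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="hypothetical iSCSI access tool")
    parser.add_argument("-d", "--diskgroup", required=True)
    parser.add_argument("-v", "--volume", required=True)
    # nargs="+": one or more values after a single -i
    parser.add_argument("-i", "--initiator", nargs="+", default=[], metavar="IP")
    return parser.parse_args(argv)


if __name__ == "__main__":
    ns = parse_args(["-d", "dg2", "-v", "lv1", "-i", "172.16.130.211", "172.16.130.212"])
    print(ns.initiator)  # ['172.16.130.211', '172.16.130.212']
```

If repeating the flag (`-i A -i B`) is preferred over listing the values after it, `action="append"` is the standard argparse alternative.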

2015-01-16

On leave

2015-01-15

On leave

2015-01-14

On leave

2015-01-13

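A tester reported a problem with a setup of one storage system and two Windows servers, used to test NAS performance over directly
connected 10GbE NICs: one of the servers could not reach the storage. Investigation showed that both servers had static IPs, but
both were set to 192.16.110.60, so ipconfig on one of them showed a problematic IP and it could not connect.
With targetcli there is no need to delete the LUNs before deleting an IQN: targetcli /iscsi delete WWN
Finding the geographic location of an IP address on Linux
- curl ipinfo.io/IP_ADDRESS
- Another way is to install geoip (yum install geoip). Also installing the MaxMind GeoIP databases
  gives more detailed location information.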
  - geoiplookup 23.66.166.151
  - geoiplookup -f /usr/share/GeoIP/GeoLiteCity.dat 23.66.166.151
- bash script
  #!/bin/bash
  # !!!NOTICE: run the script as root
  # check and install geoip
  which geoiplookup || yum install geoip
  # download and install the MaxMind databases
  wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
  wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
  wget http://download.maxmind.com/download/geoip/database/asnum/GeoIPASNum.dat.gz
  gunzip GeoIP.dat.gz
  gunzip GeoIPASNum.dat.gz
  gunzip GeoLiteCity.dat.gz
  rm -f GeoIP.dat.gz GeoIPASNum.dat.gz GeoLiteCity.dat.gz
  mkdir -p /usr/share/GeoIP/
  mv GeoIP.dat GeoIPASNum.dat GeoLiteCity.dat /usr/share/GeoIP/
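- Use the latitude and longitude to see the exact location on a map.
  - Use the Baidu coordinate picker http://api.map.baidu.com/lbsapi/getpoint/index.html
  - Note that this lookup expects the coordinates in the opposite order from geoiplookup!
    E.g. if geoiplookup gives "25.107854,121.549153", query with "121.549153,25.107854"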
- http://xmodulo.com/geographic-location-ip-address-command-line.html
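The order swap noted above is easy to get wrong by hand. Here is a tiny Python sketch of it; the function name is hypothetical. Input is "lat,lon" as geoiplookup reports it, and output is "lon,lat" for the Baidu picker.

```python
def geoip_to_baidu(coords: str) -> str:
    """Swap "lat,lon" (geoiplookup order) into "lon,lat" (the order the Baidu picker expects)."""
    lat, lon = (part.strip() for part in coords.split(","))
    return f"{lon},{lat}"


if __name__ == "__main__":
    print(geoip_to_baidu("25.107854,121.549153"))  # 121.549153,25.107854
```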

2015-01-12

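Uploaded the be2net RPM package and source code to SVN
How to build an RPM package
- steps
  - rpm -q rpm-build, to check whether rpm-build is installed
  - Create the spec file
  - Copy XXX.tar.gz into SOURCES
  - rpmbuild -ba SPECS/hellorpm.spec
- What RPM does when you run rpmbuild -ba filename.spec
  - Reads and parses filename.spec
  - Runs the %prep section to unpack the source code into a temporary directory and apply all patches.
  - Runs the %build section to compile the code.
  - Runs the %install section to install the code into a directory on the build machine.
  - Reads the file list in the %files section, collects the files and creates the binary and source RPMs.
  - Runs the %clean section to remove the temporary build directory.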
- Reference
Viewing the contents of an RPM package
- To list only the files in the package: rpm -qpl packagename
- To extract the package contents without installing it: rpm2cpio pkgname | cpio -idv
- http://blog.csdn.net/yetyongjin/article/details/6735165
Getting started with Netfilter/iptables
- iptables -t filter -L
- iptables -t nat -L
- iptables -t mangle -L
- iptables -t raw -L
Netfilter examples:
- Block internal users from accessing http://www.website.com
  #iptables -A FORWARD -p tcp -i eth1 -o eth0 -d www.website.com -j DROP
- Match destination ports
  #iptables -A FORWARD -i eth1 -o eth0 -p tcp -s 192.168.11.0/24 --dport 21:22 -j REJECT
- Match a MAC address
  #iptables -A INPUT -p tcp --dport 1433 -m mac --mac-source 00:0c:29:53:ab:60 -j ACCEPT
- Match an IP address range
  #iptables -A INPUT -m iprange --src-range 192.168.11.110-192.168.11.150 -j DROP
- Match the payload carried in the packet
  #iptables -A FORWARD -i eth0 -o eth1 -p tcp -d 10.0.0.1 --dport 80 -m string --algo bm --string "system32" -j DROP
Netfilter's three built-in chains in the filter table
- INPUT
- OUTPUT
- FORWARD
iptables in practice
- iptables syntax
  iptables [-t table] command [chain] [rules] [-j target]
- Block host 192.168.11.100 from pinging this machine
  #iptables -A INPUT -p icmp -s 192.168.11.100 -j DROP
- Allow host 192.168.11.100 to log in to this machine via SSH
  #iptables -A INPUT -p tcp -s 192.168.11.100 --dport 22 -j ACCEPT

2015-01-09

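How using a VPN can still leak personal information
- Registering on a forum the first time without using the VPN
- Investigators cooperating with the VPN provider
- …
Information security:
- http://www.zhihu.com/topic/19561983
- How to learn web security from scratch: http://www.zhihu.com/question/21606800
- Where security researchers get their knowledge: http://www.zhihu.com/question/23073812
- http://www.zhihu.com/question/21680381
Interim summary:
- Already checked on the storage:
  - Whether the disk groups and logical volumes have problems
  - Whether the network (netstat) has problems
  - Whether IET reports any obvious errors
- Open questions:
  - NIC link going Down/Up
  - Why does the storage log so many logouts?
  - What does vms_restart.sh contain?
  - Why did the link speed drop from 1000 to 100? Under what conditions does that happen?
Searched for error messages with grep -rin error ./; nothing obvious found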
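Difference between ifconfig and ip addr add:
- An address can be added by either of two user-space programs: ifconfig or ip addr add.
- ifconfig adds addresses via ioctl, while ip adds them via netlink.
- Why is an address added with ip addr add invisible to ifconfig, while an address set with ifconfig shows up in ip addr show?
  - The ioctl path used by ifconfig returns the IP address of the ifa (interface address) it finds. All ifa entries are linked into a
    linear list, and the lookup stops at the first match, so it only ever returns one result: the entry at the head of the list.
  - ip addr show is different: it is implemented in inet_dump_ifaddr, which walks every ifa and copies them into the user-space buffer.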
- http://blog.csdn.net/dog250/article/details/5303542/
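The explanation above can be made concrete with a toy model in Python. This is an illustration of the first-match versus walk-all difference, not kernel code. The addresses are hypothetical, and matching on the label is an assumption about how the lookup selects an entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ifa:
    label: str    # interface label, e.g. "eth0"
    address: str  # IPv4 address


def ioctl_style_get(ifa_list: List[Ifa], label: str) -> Optional[str]:
    """Model of the ioctl path: walk the list and stop at the first matching ifa."""
    for ifa in ifa_list:
        if ifa.label == label:
            return ifa.address
    return None


def netlink_style_dump(ifa_list: List[Ifa]) -> List[str]:
    """Model of inet_dump_ifaddr: report every ifa on the list."""
    return [ifa.address for ifa in ifa_list]


if __name__ == "__main__":
    eth0 = [Ifa("eth0", "192.168.1.10"),   # set with ifconfig (head of the list)
            Ifa("eth0", "192.168.1.20")]   # added later with ip addr add
    print(ioctl_style_get(eth0, "eth0"))   # only the first address
    print(netlink_style_dump(eth0))        # both addresses
```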
ls -l ./ |sort -k5 -nr lists the files in the current directory sorted by size, largest first
Advanced Bash-Scripting Guide http://www.tldp.org/LDP/abs/abs-guide.pdf
Revision History:
- Revision 6.5 05 Apr 2012 Revised by: mc 'TUNGSTENBERRY' release
- Revision 6.6 27 Nov 2012 Revised by: mc 'YTTERBIUMBERRY' release
- Revision 10 10 Mar 2014 Revised by: mc 'PUBLICDOMAIN' release
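The storage's sys/message.var contains "ietd: Received close MSG" and "kernel: Received logout cmnd:"
- 172.4.35.1 :
  - first appears at 12:02 and continues until 20:33
  - frequent eth0 Down/Up from 12:08 to 19:23
- 172.4.35.4 :
  - appears at 20:30; the log only covers this time
- 172.4.35.13 :
  - appears at 19:05 and continues until 19:30
- 172.4.35.141:
  - appears at 21:03; the log only covers this time
The logouts in the storage's messages all occur at 20:33, which is when the servers run vms_restart.sh
- 173.4.35.7
  - Jan 8 20:46:57: large numbers of I/O errors on multiple disks
- 173.4.35.9
  - Jan 8 12:30:37: large numbers of I/O errors on sdbu
- 173.4.35.10
  - Jan 8 12:30:35 to 20:56: large numbers of I/O errors on sdbu
- 173.4.35.11
  - Jan 8 12:30:30 to 20:57: large numbers of I/O errors on sdbu
- 173.4.35.12
  - Jan 8 12:30:30 to 20:57: large numbers of I/O errors on sdbu
When does the storage's dmesg show the following messages?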
kernel: Received logout cmnd: Target 34 Initiator: iqn.1996-04.de.suse:01:4ee4974b0d
kernel: Received logout cmnd: Target 6 Initiator: iqn.1996-04.de.suse:01:4ee4974b0dc
…
ietd: conn 0 session 6755400921448960lx target 17, state 0
ietd: Received close MSG: conn 0 session 6755400921448960lx target 17
…
kernel: Received logout cmnd: Target 3 Initiator: iqn.1996-04.de.suse:01:67117f255fac
ietd: Received close MSG: conn 0 session 32651101479961088lx target 38
On site, the storage's eth0 keeps going down and up. What impact does this have on the storage?
Jan 8 12:08:32 SV1600A kernel: eth0: Link is Down
Jan 8 12:08:35 SV1600A kernel: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: None
…
Jan 8 19:22:27 SV1600A kernel: eth0: Link is Down
Jan 8 19:22:29 SV1600A kernel: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jan 8 19:23:09 SV1600A kernel: eth0: Link is Down
Jan 8 19:23:24 SV1600A kernel: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: None
Yesterday (2015-01-08) the customer site again had the problem of being unable to record video.

2015-01-08

TODO:
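- Complete the quarterly review
- Add the Emulex driver to the NAS branch
Knownsec R&D skills list v2.2
Networking fundamentals series: networking in detail (updated January 8)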
cat test.md |tr -s '[:space:]' '\n' |tr '[:upper:]' '[:lower:]' |sort |uniq -c| sort -nr |head -10
This finds the 10 most frequent words in a document
- tr (translate) is mainly used for replacing and deleting characters; think of it as a simplified sed
- http://blog.chinaunix.net/uid-9525959-id-2001634.html
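The same word count can be written in Python with `collections.Counter`. This is a minimal sketch, and `top_words` is a hypothetical name. One small difference from the pipeline: among words with equal counts, `most_common` keeps first-seen order, whereas `sort -nr` orders ties by the text of the line.

```python
from collections import Counter


def top_words(text: str, n: int = 10):
    """Rough equivalent of the tr | sort | uniq -c | sort -nr | head pipeline above."""
    words = text.lower().split()  # split on any run of whitespace, then lowercase
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "the quick fox and The lazy dog and the cat"
    for word, count in top_words(sample, 3):
        print(count, word)
```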
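How a command is sent to a SCSI device from the application layer: sg_inq actually triggers an ioctl system call, and after
a few hops sd_ioctl is called. sd_ioctl calls the SCSI core function sg_io, and the path it finally takes still goes through
blk_execute_rq. How that function ends up connected to usb-storage was covered in detail in the block-layer analysis of
SCSI commands.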
- http://dato0123.iteye.com/blog/1259924
- http://blog.csdn.net/fudan_abc/article/details/6966911
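Test 5: single IP
- Conclusion: storage without a group IP, connecting via a single IP
  - [tried twice] One server, connection left open, storage rebooted: the server's discovery of the storage IP is OK
  - [tried twice] Two servers, connections left open, storage rebooted: discovery of the storage IP fails on both servers;
    however, discovery of the other IP (eth1) is OK
Test 4: group IP
- Conclusion: storage with a group IP
  - One server, connection left open, storage rebooted: the server's discovery of the group IP is OK;
    however, netstat -apn |grep 3260 shows the actual connections are not to the group IP but to the eth0 and eth1 ports
  - Two servers, connections left open, storage rebooted: discovery of the group IP fails on both servers
Test 3: single IP
- 1. Storage IPs: eth0: 132, eth1: 133
- 2. Server 211 connects to 132 and logs in with iscsiadm; then reboot 132
- 3. After 132 finishes rebooting, discovery from 211 is OK
- 4. Repeat steps 2 and 3 four times: OK
- 5. Add servers 212 and 213 to the test, log in to 132, reboot 132: discovery fails
- 6. Reboot 132: servers 211, 212 and 213 can't discover 132.
- 7. It seems that even a single IP cannot be discovered normally when several machines are connected. Verify once more.
- 8. 211, 212 and 213 all log out with iscsiadm and delete their nodes; reboot 132; discovery and login from 211, 212 and 213 are normal
- 9. Reboot 132: discovery from 211, 212 and 213 fails
- Conclusion: storage without a group IP, connecting via a single IP
  - One server, connection left open, storage rebooted: discovery OK
  - Several servers, connections left open, storage rebooted: discovery fails
  - Several servers, connections closed, storage rebooted: discovery OK
Test 2
- On 130.211 run: iscsiadm -m node -U all; iscsiadm -m node -o delete;
- Confirm that all servers logged in to 130.107 are disconnected, then reboot 130.107
- On 130.211 run: lsscsi; iscsiadm -m discovery -t st -p 172.16.130.107;
- Result: discovery OK
- But the connection was clearly made via 107, so why does netstat -apn |grep 3260 show the 108 address?
  Is it because the very first connection was made via 108?
Test 1
- Storage machine: group IP 172.16.130.107, eth0: 172.16.130.108, eth1: 172.16.130.109
- SUSE machine 172.16.130.211 runs: iscsiadm -m discovery -t st -p 172.16.130.108; iscsiadm -m node -L all
- 130.107 reboot
- On 130.211 run: iscsiadm -m discovery -t st -p 172.16.130.108;
- Result: discovery OK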

2015-01-07

for i in {1..9} ; do ucli iscsi_access -A -d j0 -v j0test0$i -i 172.16.130.211 ; done
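File containing the unzip password: /opt/scripts/common/sys_clear_all
Before deciding whether to upgrade, three questions need answering:
- What is causing the current problem, and why was there no problem during the previous two or three years of use?
- Will the upgrade cause data loss, and will it affect the existing configuration (e.g. will disk groups and virtual disks be lost)?
- Will the same problem still occur after the upgrade?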
http://172.19.1.142/trac

2015-01-06

Wrote C code to parse a JSON file and extract the LUN information for each IQN. When deleting iSCSI targets with targetcli, delete the LUNs first, then the IQN.
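The extraction step is sketched below in Python rather than C. It assumes the JSON follows the targetcli/rtslib saveconfig.json layout (`targets` -> `tpgs` -> `luns`, each LUN with `index` and `storage_object`). That layout is an assumption; adjust the keys to whatever file the C code actually reads.

```python
import json


def luns_by_iqn(config_text: str) -> dict:
    """Map each iSCSI target IQN to its (lun index, storage object) pairs.

    Assumed layout: {"targets": [{"fabric": "iscsi", "wwn": ..., "tpgs": [{"luns": [...]}]}]}
    """
    config = json.loads(config_text)
    result = {}
    for target in config.get("targets", []):
        if target.get("fabric") != "iscsi":
            continue  # skip loopback, FC, etc.
        luns = []
        for tpg in target.get("tpgs", []):
            for lun in tpg.get("luns", []):
                luns.append((lun.get("index"), lun.get("storage_object")))
        result[target["wwn"]] = luns  # these LUNs are deleted before the IQN itself
    return result


if __name__ == "__main__":
    sample = json.dumps({"targets": [{"fabric": "iscsi", "wwn": "iqn.2003-01.org.example:sn.1",
                                      "tpgs": [{"luns": [{"index": 0,
                                                          "storage_object": "/backstores/block/lv1"}]}]}]})
    print(luns_by_iqn(sample))
```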
Checking on Linux whether a given port on a machine is open (ping cannot do this):
- telnet 172.16.50.39 3128
- nmap 172.16.50.39 -p 3128
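The same check can be done from a script by attempting a TCP connect, which is what telnet does. This is a minimal Python sketch; `port_open` is a hypothetical name. Note that it only distinguishes "connect succeeded" from "anything else" (refused, filtered/timed out, unreachable), so it is coarser than nmap's open/closed/filtered states.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name resolution failure...
        return False


if __name__ == "__main__":
    # Self-contained demo: open a listener on an ephemeral local port and probe it.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        print(port_open("127.0.0.1", server.getsockname()[1]))
```

Usage against the proxy from these notes would be `port_open("172.16.50.39", 3128)`.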
Squid proxy
- squid -CNd1
- windows
  - IE -> Tools -> Connections -> LAN Settings -> enter the IP and port 3128, confirm and close
  - http://linux.vbird.org/linux_server/0420squid.php#client_browser
- Linux
- Fix the connection problem by stopping the firewall
  - On Fedora 20 the firewall cannot be controlled with service iptables stop/start/status
  - systemctl stop firewalld stops the firewall
  - systemctl status firewalld confirms the firewall is stopped
  - see http://stackoverflow.com/questions/24756240/how-can-i-use-iptables-on-centos-7
- Fix the connection problem with an iptables rule
  - iptables -L INPUT --line-number lists the rules with their numbers
  - iptables -I INPUT 7 -p tcp --dport 3128 -j ACCEPT inserts a rule allowing connections to port 3128
    before the REJECT rule (number 7 here)
  - iptables -D INPUT -p tcp --dport 3128 -j ACCEPT deletes that rule
- http://linux.vbird.org/linux_server/0420squid.php
- http://home.arcor.de/pangj/squid/chap04.html
- http://viong.blog.51cto.com/844766/280978
Wireshark display filter for a specific port: tcp.port eq 3128
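Test 1
- 1. All SUSE machines (IP 210~217) log out with iscsiadm
  - iscsiadm -m node -U all;iscsiadm -m node -o delete
  - confirm with lsscsi on the SUSE machines that no SCSI volumes remain
- 2. Reboot 130.107 and confirm that no SUSE machine is connected
  - netstat -apn |grep 3260 confirms that no other IP is connected; only the listeners on 107, 108 and 109 remain
- 3. The 5 SUSE machines 210~214 log in to 130.107
  - iscsiadm -m discovery -t st -p 172.16.130.107;iscsiadm -m node -L all
  - lsscsi confirms the SCSI volumes are present
- 4. Confirm on 130.107 that 210~214 are logged in, then reboot 130.107
  - grep -c tid /proc/net/iet/volume returns 40: 40 volumes
  - netstat -apn |grep 3260 |wc -l returns 203, which is correct: 40 volumes x 5 machines = 200 connections, plus 3 listeners
- 5. After 130.107 finishes booting, check whether 210~214 can discover 130.107 normally
  - netstat -apn |grep 3260 shows that all connections to 107 are in a bad state, and there are extra connections to 108 and 109
  - Although 210~214 cannot discover 130.107, discovery via 108 and 109 works, which shows that
    only the group IP is affected. lsscsi on 210~214 still shows the SCSI volumes, so failing to discover
    107 does not by itself affect service. However, if all connections were dropped at this point and had to be
    re-established via discovery, service would be affected.
  - Discovery via the group IP 130.107 fails, yet SSH login via it is unaffected. Why?
  - Since the group IP is problematic, could we switch to NIC bonding instead?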

2015-01-05

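130.107 version rollback test, full record:
- Reinstalled 130.107 and rolled back to the old version UStor_1.3.1_20100106. 130.210 cannot discover it,
  and all the previously connected hosts immediately reconnected to 107: 10 machines in total (210~217, 50.70 and 70.143)
- The 8 SUSE machines 210~217 all log out, keeping only the earlier 50.70 and 70.143 connections. None of the 8 machines 210~217
  can discover 107 (the iSCSI service on 107 was also restarted in between)
- The 2 machines 50.70 and 70.143 also log out. After restarting the iSCSI service, discovery and login from 210~217 are both OK
- reboot 130.107. Many sockets show a non-zero Recv-Q (around 500) in TCP state CLOSE_WAIT, belonging to IPs 210~213;
  after about 30 seconds they are all cleared. Discovery from 210 and 214 then times out and fails.
  The ietd processes are all running. After stopping the iSCSI service from the web page, netstat shows port 3260 on IPs 109 and 108
  still connected to 215, 216 and 217 in TCP state TIME_WAIT. On 210, discovery of 107, 108
  and 109 all report connection refused, and discovery from 217 also fails. Checking netstat again, there are no more
  connections on port 3260. This is expected: with the iSCSI service stopped, port 3260 is no longer open.
  After starting the iSCSI service again (from the web page), netstat shows the connections from 210~217, some with a Recv-Q of
  around 500. Now discovery via 108 and 109 works, but discovery via 107 (the group IP) fails.
  Why does discovery via the group IP fail?
- Reboot 130.107. Discovery of 130.107 from 210~217 fails; discovery via 108 and 109 succeeds.
  After disabling the iSCSI service from the web page, netstat -apn |grep 3260 still shows 108, but 107 is gone.
  After re-enabling the iSCSI service from the web page, many 107 connections appear, and all sockets with a non-zero Recv-Q are on the 107 IP.
  210~217 still cannot discover via 107, but discovery via 108 and 109 succeeds.

130.107 version rollback test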

/usr/sbin/ietd -f -d 11
…(hide content)…
1420396717.362198: connection closed
1420396717.362209: [event_loop]connection close: conn sock=7
1420396717.362239: [event_loop]event_loop: incoming_cnt: 0

strace /usr/sbin/ietd -f
…(hide content)…
poll([{fd=7, events=0}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN},
{fd=0, events=0}, {fd=0, events=0}, {fd=0, events=0}, {fd=0, events=0},
{fd=0, events=0}, {fd=5, events=POLLIN}, {fd=3, events=POLLIN},
{fd=0, events=0}, {fd=0, events=0}, {fd=0, events=0},
{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1},
{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=-1},
{fd=-1}, {fd=-1}, {fd=-1}, ...], 77, -1

root:~# grep -c sid /proc/net/iet/session
390
- root:~# netstat -apn |grep 3260 |awk '{print $5}' |awk -F: '{print $1}' |sort |uniq -c
  1 0.0.0.0
  39 172.16.130.210
  39 172.16.130.211
  39 172.16.130.212
  39 172.16.130.213
  39 172.16.130.214
  39 172.16.130.215
  39 172.16.130.216
  39 172.16.130.217
  39 172.16.50.70
  39 172.16.70.143
- reboot 130.107
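The per-IP tally above can also be computed from a script. This is a Python sketch over captured `netstat -apn` lines; `count_peers` is a hypothetical name. Unlike `grep 3260`, which matches the string anywhere in the line, it checks that the local port is exactly 3260. It assumes the usual Linux netstat column order (Proto, Recv-Q, Send-Q, Local, Foreign, ...).

```python
from collections import Counter


def count_peers(netstat_lines, port: str = "3260") -> Counter:
    """Count TCP sockets on local port `port`, grouped by foreign IP."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        local_addr, foreign_addr = fields[3], fields[4]
        if local_addr.rsplit(":", 1)[-1] != port:
            continue
        counts[foreign_addr.rsplit(":", 1)[0]] += 1  # like awk -F: '{print $1}'
    return counts


if __name__ == "__main__":
    sample = [
        "tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 1234/ietd",
        "tcp 0 0 172.16.130.107:3260 172.16.130.210:41229 ESTABLISHED 1234/ietd",
    ]
    for ip, n in sorted(count_peers(sample).items()):
        print(n, ip)
```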
linux-wdoa:~ # strace iscsiadm -m discovery -t st -p 172.16.130.100
…(hide content)..
link("/etc/iscsi/lock", "/etc/iscsi/lock.write") = -1 EEXIST (File exists)
nanosleep({0, 10000000}, NULL) = 0
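One plausible reading of this trace is that iscsiadm guards its configuration with a hard-link lock. `link()` on lock.write fails with EEXIST while another process holds it (or while a stale lock.write is left behind), and iscsiadm sleeps 10 ms and retries. Below is a Python sketch of that pattern, written as an illustration of the technique rather than open-iscsi's actual code. The function names and retry limits are hypothetical.

```python
import os
import tempfile
import time


def acquire_link_lock(lock_path: str, max_tries: int = 100, delay: float = 0.01) -> bool:
    """Take a lock by hard-linking lock_path to lock_path + ".write".

    link() fails with EEXIST while someone else holds the lock; back off and retry.
    """
    if not os.path.exists(lock_path):
        open(lock_path, "a").close()  # the link source must exist
    for _ in range(max_tries):
        try:
            os.link(lock_path, lock_path + ".write")
            return True
        except FileExistsError:  # EEXIST, as in the strace line above
            time.sleep(delay)    # 10 ms, like nanosleep({0, 10000000})
    return False


def release_link_lock(lock_path: str) -> None:
    os.unlink(lock_path + ".write")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "lock")
        print(acquire_link_lock(lock))                # True: we own the lock
        print(acquire_link_lock(lock, max_tries=3))   # False: still held
        release_link_lock(lock)
```

Under this reading, a lock.write left behind by a killed process would make every later caller spin exactly as in the trace, so checking for a stale /etc/iscsi/lock.write would be a reasonable next step.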
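Yesterday's checks, made when the problem was found, showed the following situation:
- 3 machines with other versions + 5 SUSE machines logged in to storage 130.107
- reboot 130.107
- ps shows no IET process, but /var/log/messages shows that IET had been running.
This discussion describes a similar problem: https://groups.google.com/forum/#!topic/open-iscsi/nfJ29YPRpBU
- Steps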
  - a) Install IET 0.4.15 to work with 2.6.21 kernel
  - b) connect the initiator (open-iscsi 2.0-754)
  - c) play a bit, make sure everything works
  - d) if everything works, stop the target:
    /etc/init.d/iscsi-target stop
  - e) wait a couple of seconds (important!), and start the target again
  - f) on the initiator, issue this command:
    iscsiadm -m discovery -t sendtargets -p
    It will take very long time to complete, and you will get “Login I/O
    error, failed to receive a PDU”.
  - If that command succeeds, do either:
    reissue that same command again (three, four times or so) - it will fail
    repeat e) and f) again.
- Checks to use when hitting this kind of problem:
  - netstat
  - tcpdump
Continued checking the simulated environment (steps listed newest first):
- root:~# netstat -apn |grep 3260 |awk '{print $5,$1,$2,$3,$4}' |sort |uniq --check-chars=14 -c
  1 0.0.0.0:* tcp 0 0 0.0.0.0:3260
  39 172.16.130.210:41229 tcp 0 0 172.16.130.107:3260
  39 172.16.130.211:34593 tcp 0 0 172.16.130.107:3260
  39 172.16.130.212:55204 tcp 0 0 172.16.130.107:3260
  39 172.16.130.213:43709 tcp 0 0 172.16.130.107:3260
  39 172.16.130.214:60552 tcp 0 0 172.16.130.107:3260
  39 172.16.50.70:41163 tcp 0 0 172.16.130.107:3260
  39 172.16.70.143:51161 tcp 0 0 172.16.130.107:3260
- Rebooted 130.107; no problem found (discovery normal)
- All 5 SUSE machines logged out and back in; discovery OK
- Rebooted 130.214; discovery OK
- Rebooted 130.107; no problem found (discovery normal)
- netstat -apn |grep 3260 |awk '{if($2>0||$3>0){print $0}}'
- All 8 machines connected OK
- Bring up the connections of the 5 machines (SUSE 11 SP2, 130.21X)
  - iscsiadm -m discovery -t st -p 172.16.130.107; iscsiadm -m node -L all; iscsiadm -m node
- Tear down the connections of the 5 machines (SUSE 11 SP2, 130.21X)
  - iscsiadm -m node -U all; iscsiadm -m node -o delete; iscsiadm -m node
- Rebooted 130.107; no problem found (discovery normal)
- Stopped and restarted the iSCSI service from the web page; discovery normal
- Rebooted 130.107; no problem found (discovery normal)
- cat /proc/net/iet/session |grep ip |awk '{print $2}' |awk -F: '{print $2}' |sort |uniq -c
  39 172.16.130.210
  39 172.16.130.211
  39 172.16.130.212
  39 172.16.130.213
  39 172.16.130.214
  39 172.16.130.5
  39 172.16.50.70
  39 172.16.70.143
  8 machines connected, 312 sessions
- cat /proc/net/iet/session |grep tid |wc -l output 39
- cat /etc/uitos-release output UStor_1.3.1_20120720
- root:~# uname -a
  Linux ustor 2.6.26.2-ustor20091016 #1 SMP PREEMPT Fri Jul 20 12:16:13 CST 2012 x86_64 x86_64 x86_64 GNU/Linux
- Jan 4 18:28:05 ustor kernel: iSCSI Target - version 1.2.1
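- Rebooting 172.16.130.107 takes about 1 min 30 s
It seems Ruijie uses our company's products: http://www.ruijie.com.cn/
Remote session:
- Ran iscsiadm on 173.4.35.7; the connection failed
- 35.1 and 35.4 sometimes show Send-Q greater than 0, but it drains quickly.
- All the storage units run IET 0.98.1, which is a very old version
- We could check which bugs upstream IET fixed between 0.98.1 and 1.4.20. One of them might well be
  the problem of connections not being cleaned up properly when the system reboots.
  - From http://www.ruijie.com.cn/Service/BBSM/20100927041845.pdf it appears that version 98 was released in 2009.
  - From http://sourceforge.net/projects/iscsitarget/files/iscsitarget/ one can
    see that the upgrade from 0.4.17 to 1.4.18 spans 2008-11-30 to 2009-10-05, so version 98 was probably released before October 2009.
  - The 1.4.18 ChangeLog shows that 1.4.18 fixed "clean up all connections,
    sessions and targets in the kernel module if the daemon is gone". This may be related.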

2015-01-04

sshpass -p 123456 ssh root@172.16.130.210 "iscsiadm -m discovery -t st -p 172.16.130.107"
root:~# grep -i 'iscsi target' /var/log/messages
Jan 4 07:59:48 ustor kernel: iSCSI Target - version 1.2.1
root:~# grep 130.21 /proc/net/iet/session|awk '{print $2}' |sort |uniq -c
39 ip:172.16.130.210
39 ip:172.16.130.211
39 ip:172.16.130.212
39 ip:172.16.130.213
39 ip:172.16.130.214
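Many packets in Recv-Q have not been picked up by the application in time (via recv()), which suggests the process is no longer
calling recv() to drain those sockets. So what could cause this?
From the tcpdump capture, the client
root:~# netstat -apn |grep 3260 |awk '{s+=$2} END{print s}'
16905 (total Recv-Q)
- Recv-Q and Send-Q are the network receive queue and send queue; Q stands for Queue.
  Both should normally be 0; a non-zero value may indicate a problem.
- With these two netstat values it is easy to tell whether a program is not receiving data because the packets never arrived or because the process has not read them with recv().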
- http://ikon.iteye.com/blog/1603989
- http://blog.csdn.net/sjin_1314/article/details/9853163
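The two awk one-liners used in these notes can be combined into one scripted check. `awk '{s+=$2} END{print s}'` sums Recv-Q, and `awk '{if($2>0||$3>0){print $0}}'` lists sockets with queued data. The Python sketch below does both over captured `netstat -apn` lines. It assumes the Linux column order (Proto, Recv-Q, Send-Q, Local, Foreign, ...), and `recv_queue_report` is a hypothetical name.

```python
def recv_queue_report(netstat_lines, port: str = "3260"):
    """Return (total Recv-Q, lines with Recv-Q or Send-Q > 0) for TCP sockets on `port`."""
    total = 0
    stuck = []
    suffix = ":" + port
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        if not (fields[3].endswith(suffix) or fields[4].endswith(suffix)):
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        total += recv_q
        if recv_q > 0 or send_q > 0:
            stuck.append(line)
    return total, stuck


if __name__ == "__main__":
    sample = ["tcp 512 0 172.16.130.107:3260 172.16.130.210:41229 CLOSE_WAIT 1234/ietd"]
    total, stuck = recv_queue_report(sample)
    print(total, len(stuck))
```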
tshark
- tshark -n -q -r tcpdump01.pcap -z "io,stat,0,tcp.analysis.retransmission"
- tshark -n -q -r tcpdump01.pcap -z "io,stat,0,tcp.analysis.out_of_order"
- tshark -n -q -r tcpdump01.pcap -z "io,stat,5,tcp.analysis.out_of_order"
- tshark -n -q -r tcpdump01.pcap -z "io,stat,5,tcp.analysis.retransmission"
- tshark -n -q -r tcpdump01.pcap -R "ip.addr==172.16.130.214"
root:~# tcpdump -i eth0 -w /tmp/tcpdump01-130.100.log
tcpdump: Couldn't find user 'pcap'
root:~# useradd pcap
root:~# tcpdump -i eth0 -w /tmp/tcpdump01-130.100.log
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
^C4485 packets captured
8972 packets received by filter
0 packets dropped by kernel
root:~#
netstat -apn: found 400+ in Recv-Q
netstat -apn |grep 3260 |wc -l
iscsiadm -m node -L all: log in to all nodes
What version of iscsiadm is the customer using?
open-iscsi:
- open-iscsi official website
- open-iscsi source code analysis: iscsid
- The source package is open-iscsi; packaged as an RPM it is named iscsi-initiator-utils
- Reference
Viewing the iscsiadm discovery process in Wireshark
- Filter: ip.addr eq 172.16.130.100 and iscsi
- No. Time Source Destination Protocol Length Info
  54 5.674999 172.16.50.39 172.16.130.100 iSCSI 326 Login Command
  57 5.675339 172.16.130.100 172.16.50.39 iSCSI 258 Login Response (Success)
  60 5.675488 172.16.50.39 172.16.130.100 iSCSI 82 Text Command
  62 5.708458 172.16.130.100 172.16.50.39 iSCSI 190 Text Response
- The exchange above is correct; the packets carrying the iSCSI volume information can be seen.
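A healthy discovery follows the four steps in the capture above: Login Command, Login Response (Success), Text Command, Text Response. The sketch below checks for that order in packet-summary lines, so a capture taken at the customer site can be compared quickly. It assumes the Info text in tshark's summary output matches what Wireshark shows above, and the function name is hypothetical. The lines could come from something like `tshark -n -r tcpdump01.pcap -R iscsi` (newer tshark versions use `-Y` for display filters).

```shell
#!/bin/bash
# Hypothetical helper: read Wireshark/tshark packet-summary lines on stdin and
# succeed only if the four discovery steps appear in this order.
discovery_ok() {
    awk '
        BEGIN {
            n = split("Login Command|Login Response (Success)|Text Command|Text Response", want, "|")
            step = 1
        }
        # advance when the next expected step shows up (plain substring match)
        step <= n && index($0, want[step]) { step++ }
        END { exit (step <= n) }    # exit 0 = complete, 1 = incomplete
    '
}
```

A stalled exchange, such as a Login Command that never gets a response, makes the function fail.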
Testing:
- 172.16.130.100, storage server
  - root:~# cat /etc/uitos-release
    UStor_2.0_20130503
  - 2 disk groups (r5, rd5)
  - 32 iSCSI volumes (tt01~tt32, vv01~vv32), 100000 MB each
  - root:~# ip addr
    ```
    …(other information hidden)…
    3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:25:90:77:0f:eb brd ff:ff:ff:ff:ff:ff
        inet 172.16.130.102/24 scope global eth1
        inet 172.16.130.100/24 scope global secondary eth1
    4: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:25:90:77:0f:ea brd ff:ff:ff:ff:ff:ff
        inet 172.16.130.101/24 scope global eth0
        inet 172.16.130.100/24 scope global secondary eth0
    ```
- 172.16.130.210~214, test servers
  - linux-wdoa:~ # cat /etc/SuSE-release
    SUSE Linux Enterprise Server 11 (x86_64)
    VERSION = 11
    PATCHLEVEL = 2
  - linux-wdoa:~ # iscsiadm -m discovery -t st -p 172.16.130.100 fails to connect:
    iscsiadm: can not connect to iSCSI daemon (111)!
    iscsiadm: Could not scan /sys/class/iscsi_transport.
    iscsiadm: iSCSI driver tcp is not loaded. Load the module then retry the command.
    iscsiadm: Could not perform SendTargets discovery: iSCSI driver not found. Please make sure it is loaded, and retry the operation
  - linux-wdoa:~ # chkconfig --list |grep iscsi finds no iscsi service

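Shell script to add access IP addresses:
#!/bin/bash
# e.g. ucli iscsi_access -A -d r5 -v tt01 -i 172.16.*.*
dg_name=r5
vd_name=tt
vd_size=100000    # volume size in MB (not used by this loop)
ip_addr='172.16.*.*'
for i in {1..32}
do
    # volume names are zero-padded to two digits: tt01 ... tt32;
    # "$ip_addr" is quoted so the shell does not expand the * against filenames
    if [ "$i" -lt 10 ]; then
        ucli iscsi_access -A -d "$dg_name" -v "${vd_name}0$i" -i "$ip_addr"
    else
        ucli iscsi_access -A -d "$dg_name" -v "${vd_name}$i" -i "$ip_addr"
    fi
done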
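The two branches of the loop above differ only in zero-padding, which `printf '%02d'` handles directly. A dry-run sketch that prints the `ucli` commands instead of running them, so the list can be checked first; the helper names are hypothetical and the `ucli` arguments are copied from the script above.

```shell
#!/bin/bash
# Print zero-padded volume names: <prefix>01 .. <prefix><count>.
volume_names() {
    local prefix=$1 count=$2 i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Dry run: print one ucli access command per volume instead of executing it.
# The IP pattern is single-quoted in the output so that piping it to a shell
# does not glob-expand the wildcards.
print_access_cmds() {
    local dg=$1 prefix=$2 count=$3 ip=$4 vd
    for vd in $(volume_names "$prefix" "$count"); do
        printf "ucli iscsi_access -A -d %s -v %s -i '%s'\n" "$dg" "$vd" "$ip"
    done
}
```

Usage: `print_access_cmds r5 tt 32 '172.16.*.*'` to review, then append `| bash` to apply.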

Shell command to create an iSCSI volume: ucli vd_iscsi --create -d r5 -v tt0 -s 100000
Timestamp conversion
- date +%s prints the current time as a Unix timestamp
- date -d @1420210697 converts a timestamp to a date/time
- date +%s -d"Jan 1, 1970 00:00:01" converts a date/time (in the local time zone) to a timestamp
- http://tool.chinaz.com/Tools/unixtime.aspx
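Both conversions depend on the time zone, which matters when lining up the customer's email time with syslog entries. A small sketch wrapping the commands above with an explicit zone, assuming GNU date (`-d` is a GNU coreutils option). The machines here report CST in `uname -a`, assumed to be China Standard Time (UTC+8), which can be written as the POSIX TZ string `CST-8`; the function names are hypothetical.

```shell
#!/bin/bash
# Needs GNU date. The optional second argument is a TZ value (default UTC),
# e.g. CST-8 for UTC+8.

# Unix timestamp -> "YYYY-mm-dd HH:MM:SS"
ts_to_date() {
    TZ=${2:-UTC} date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Date/time string (interpreted in the given zone) -> Unix timestamp
date_to_ts() {
    TZ=${2:-UTC} date -d "$1" +%s
}
```

For example, `ts_to_date 1420210697` gives 2015-01-02 14:58:17 in UTC, and `ts_to_date 1420210697 CST-8` gives 2015-01-02 22:58:17.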
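Investigating the iSCSI volume mount failure at the customer site. My approach:
- Since technical support has checked remotely that the storage disks, RAID, volumes and iSCSI
  service all look normal, we can capture packets with tcpdump on both sides during discovery
  and login to see exactly what is exchanged.
- The email was sent at 2015-01-03 03:46 and says the problem lasted from the day before yesterday
  until 3 a.m. today, so it started no later than 2015-01-01 00:00; surveillance video and images
  were not saved to the storage device.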
- HE, M3
  - HE(1,4,13,Fail), M3(141,ok)
  - HE version 2010, M3 version 2011
  - IET versions differ
- Summary of error messages:
  - ietd: Destroy a target 1090808065 10
  - sysctl table check failed: /fs/all-write-without-o_sync .5.22 Unknown sysctl binary path
  - iscsi_trgt: open_path(139) Can't open /dev/dg4/cc068 -2
  - iscsi_trgt: fileio_attach(334) -2
  - Received logout cmnd: Target 28 Initiator: iqn.1996-04.de.suse:01:ab87d8aafb5f
  - [debug]0x00000001open file failed
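To see which of these errors dominate in a given log, each message can be counted as a fixed string. A minimal sketch, assuming the messages land in a plain-text log such as /var/log/messages; the pattern list is copied from the summary above, shortened to its stable parts, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<count><TAB><pattern>" for each error pattern
# from the summary above, counting matching lines in the given log file.
iscsi_error_counts() {
    local log=$1
    printf '%s\n' \
        'Destroy a target' \
        'sysctl table check failed' \
        "Can't open /dev/" \
        'fileio_attach' \
        'Received logout cmnd' \
        'open file failed' |
    while IFS= read -r pattern; do
        # grep -c prints 0 (and exits 1) when nothing matches; that is fine here
        printf '%s\t%s\n' "$(grep -c -F -- "$pattern" "$log")" "$pattern"
    done
}
```

Usage: `iscsi_error_counts /var/log/messages`.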

2015-01-03

Running this command fails: iscsiadm -m discovery -t st -p 173.4.35.4:3260
iscsiadm: socket 3 header read time out
iscsiadm: Login I/O error, failed to receive a PDU
iscsiadm: retrying discovery login to 173.4.35.4
iscsiadm: socket 3 header read time out
iscsiadm: Login I/O error, failed to receive a PDU
iscsiadm: retrying discovery login to 173.4.35.4
- Why?
- Without access to the site where it happened, how can we extract useful information?
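The error is a timeout while reading the first PDU header, which presumably means TCP itself connected and the target never answered the login. One cheap remote check is whether a plain TCP connection to port 3260 opens at all. A minimal sketch using bash's /dev/tcp redirection and coreutils `timeout`; the function name is hypothetical. Reading "TCP opens but iscsiadm still times out" as a problem on the target side (for example the stuck Recv-Q noted above) is a hypothesis, not a confirmed diagnosis.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within
# SECS seconds. Uses bash's /dev/tcp pseudo-device and coreutils timeout.
port_open() {
    local host=$1 port=${2:-3260} secs=${3:-3}
    # host and port are passed to the inner bash as $0 and $1
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage: `port_open 173.4.35.4 3260 && echo "TCP connect OK"`.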

2015-01-01

Three-day New Year's Day holiday