Thứ Hai, 4 tháng 12, 2017

Making a simple Splunk Nginx dashboard

As a DevOps guy I often do incident analysis, post-deployment monitoring and routine log checks. If you are also using Splunk like me, let me show you a few effective Splunk commands for Nginx log monitoring.

Extract fields

To make the commands work, the Nginx log fields have to be extracted into variables.
There are two ways to extract fields:
  1. By default, Splunk recognises the "access_combined" log format, which is the default format for Nginx. If that is your case, congratulations: there is nothing for you to do!
  2. For a custom log format you will need to create a regular expression. Splunk has a built-in user interface for extracting fields, or you can provide a regular expression manually.
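With the fields extracted, a couple of simple searches can already drive dashboard panels. The field names below (status, clientip) assume the standard access_combined extraction, so treat this as a sketch to adapt to your own field names:
sourcetype=access_combined | timechart count by status
sourcetype=access_combined status>=500 | top limit=10 clientip
The first panel charts response codes over time; the second lists the clients generating the most 5xx errors.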

Chủ Nhật, 19 tháng 11, 2017

Simple shell script to backup MySQL databases

#!/bin/bash
# Simple script to backup MySQL databases

# Parent backup directory
backup_parent_dir="/var/backups/mysql"

# MySQL settings
mysql_user="root"
mysql_password=""

# Read MySQL password from stdin if empty
if [ -z "${mysql_password}" ]; then
  echo -n "Enter MySQL ${mysql_user} password: "
  read -s mysql_password
  echo
fi
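The script above ends at the password prompt. A possible continuation (the exclusion list and dump options are assumptions, not part of the original post) that dumps each database into a timestamped directory:

# Create a timestamped directory under the parent backup directory
backup_dir="${backup_parent_dir}/$(date +%Y-%m-%d_%H%M%S)"
mkdir -p "${backup_dir}"

# Dump every database except the internal schemas, one gzip-compressed file each
databases=$(mysql --user="${mysql_user}" --password="${mysql_password}" -Bse 'SHOW DATABASES' | grep -Ev '^(information_schema|performance_schema)$')
for db in ${databases}; do
  mysqldump --user="${mysql_user}" --password="${mysql_password}" --single-transaction "${db}" | gzip > "${backup_dir}/${db}.sql.gz"
done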

Thứ Hai, 6 tháng 11, 2017

How to Redirect Nginx traffic based on the Client’s IP Address

Sometimes developers need to redirect part of their traffic to another web area or a specific URL. Depending on the visitor's IP address, you might want to geo-locate traffic or simply protect areas from specific unwanted IP addresses. So, how do you redirect traffic from IP 1.2.3.4 to http://www.thisurl.com/page2.html and the rest of the traffic to http://www.thisurl.com/? We’ll go over this today.

Redirect using Nginx HttpAccessModule

As Nginx supports conditional configuration, let’s first look at an example that redirects traffic with an if condition on the $remote_addr variable.
server {
    if ($remote_addr = 1.2.3.4) {
        rewrite ^ http://www.yourwebsite.com/otherpage.htm;
    }
}
In this example, 1.2.3.4 is the IP address you want to redirect.
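For the complete scenario from the question above (1.2.3.4 goes to page2.html, everyone else gets the front page), a minimal sketch could look like this; the listen port, server_name and root path are assumptions:
server {
    listen 80;
    server_name www.thisurl.com;

    # Visitors coming from 1.2.3.4 are sent to page2.html
    if ($remote_addr = 1.2.3.4) {
        rewrite ^ http://www.thisurl.com/page2.html permanent;
    }

    # Everyone else gets the regular site starting at /
    location / {
        root /var/www/html;
        index index.html;
    }
}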

Thứ Ba, 24 tháng 10, 2017

SYN Flooding using SCAPY and Prevention using iptables

DoS (Denial of Service) attacks against Web services make them unavailable for legitimate users, affecting the website owner’s potential business. These involve intentional consumption of network, CPU and memory resources. In this article, I will demonstrate how to do a SYN flood using the SCAPY framework, along with other preventive measures.
Over time, DoS attacks have become more complicated, concealing malicious client requests as legitimate ones. Also, a distributed approach, the DDoS (Distributed Denial of Service) is now being adopted, which involves generating multiple requests to create a flood scenario. One type of DDoS flood attack is the TCP SYN queue flood.
A SYN queue flood attack takes advantage of the TCP protocol’s “three-way handshake”. A client sends a TCP SYN (S flag) packet to begin a connection to the server. The target server replies with a TCP SYN-ACK (SA flag) packet, but the client does not respond to the SYN-ACK, leaving the TCP connection “half-open”. In normal operation, the client should send an ACK (A flag) packet followed by the data to be transferred, or an RST reply to reset the connection. On the target server, the connection is kept open, in a “SYN_RECV” state, as the ACK packet may have been lost due to network problems.
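On the prevention side referenced in the title, a commonly used iptables approach is to rate-limit incoming SYN packets and enable SYN cookies; the thresholds below are illustrative assumptions and should be tuned to your traffic:
# Rate-limit incoming TCP SYN packets; drop the excess
iptables -A INPUT -p tcp --syn -m limit --limit 1/s --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --syn -j DROP

# Enable SYN cookies so half-open connections do not exhaust the SYN queue
sysctl -w net.ipv4.tcp_syncookies=1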

Thứ Hai, 23 tháng 10, 2017

How to Fix Nf_conntrack Table Full Dropping Packet

Issue

Packets are dropped on the system for connections tracked by ip_conntrack or nf_conntrack. The following messages are seen in /var/log/kern on the CentOS nodes when one of the instances drops packets:
$ tail -f /var/log/kern
Jul  4 03:47:16 centos kernel: : nf_conntrack: table full, dropping packet
Jul  4 03:47:16 centos kernel: : nf_conntrack: table full, dropping packet
This can happen when you are being attacked, and is also very likely to happen on a busy server even if there is no malicious activity.
NOTE: By default, CentOS will set this maximum to 65,536 connections. This is enough for lightly loaded servers, but can easily be exhausted on heavy traffic servers.

How to Fix
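One common remedy is to raise the conntrack table limit via sysctl; the value below is illustrative and should be sized to the memory available on the box:
# Check the current limit and how many entries are in use
sysctl net.netfilter.nf_conntrack_max
sysctl net.netfilter.nf_conntrack_count

# Raise the limit now, and persist it across reboots
sysctl -w net.netfilter.nf_conntrack_max=262144
echo 'net.netfilter.nf_conntrack_max = 262144' >> /etc/sysctl.conf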

Chủ Nhật, 22 tháng 10, 2017

Nginx: 24: Too Many Open Files Error And Solution

How do I fix this problem under CentOS / RHEL / Fedora Linux or UNIX-like operating systems?

Linux / UNIX sets soft and hard limits for the number of file handles and open files. You can use the ulimit command to view those limits:
su - nginx
To see the hard and soft values, issue the command as follows:
ulimit -Hn
ulimit -Sn

Increase Open FD Limit at Linux OS Level

Your operating system sets limits on how many files can be opened by the nginx server. You can fix this problem by setting or increasing the system open file limits under Linux. Edit the file /etc/sysctl.conf, enter:
# vi /etc/sysctl.conf
Append / modify the following line:
fs.file-max = 70000
Save and close the file. Edit /etc/security/limits.conf, enter:
# vi /etc/security/limits.conf
Set soft and hard limit for all users or nginx user as follows:
nginx       soft    nofile   10000
nginx       hard    nofile  30000
Save and close the file. Finally, reload the changes with sysctl command:
# sysctl -p
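Besides the OS limits, nginx can raise its own per-worker file descriptor limit with the worker_rlimit_nofile directive in /etc/nginx/nginx.conf; the value below is an example to match the limits set above:
# Allow each nginx worker process to open up to 30000 file descriptors
worker_rlimit_nofile 30000;
Reload nginx afterwards (for example with "nginx -s reload") for the directive to take effect.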

Chủ Nhật, 15 tháng 10, 2017

yum, rpm and duplicate versions

Apparently, when a "yum update" fails miserably, you can end up with duplicate versions of packages in the RPM database. This seems harmless, but is annoying. yum provides a tool to check for this, but I was not able to find anything that would automatically repair it. So here's a little tip:
$ yum check duplicates | awk '/is a duplicate/ {print $6}' > /tmp/DUPES
$ yum remove `cat /tmp/DUPES`
Of course, before you remove the dupes, examine the temp file (/tmp/DUPES) and make sure it looks OK.
Update:
There seems to be a command to do this, package-cleanup has an option for it. E.g.
$ package-cleanup --cleandupes
However, testing this command on a second box with the same problem gave bad results; it seems to have uninstalled the "real" packages too.

Thứ Năm, 8 tháng 6, 2017

CentOS 7 – What's new and different compared to CentOS 6 – Part 2 – Systemd

This is arguably the biggest change in CentOS 7. First, let's review the older boot mechanisms used in previous versions of CentOS:
+ init – the first process loaded, which stays there until the system (Red Hat family, CentOS) shuts down. Its purpose is to use run levels to decide which daemons are started and to interact with and control them. However, the fatal inherent flaw of init is that it starts processes strictly in sequence: a task is only started once the previous task has completed. This takes more time, and if a task fails, the boot process stalls there until a way to handle it is found.
+ Upstart – a replacement for init, used mainly on Ubuntu, designed to start processes asynchronously, which speeds up booting and improves system stability. CentOS 7 replaces both with systemd; a quick comparison of the old and new commands is sketched below.
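With systemd, service management moves from the old service/chkconfig pair to systemctl; sshd is used here purely as an illustration:
# CentOS 6 (SysV init)
service sshd start
chkconfig sshd on

# CentOS 7 (systemd)
systemctl start sshd.service
systemctl enable sshd.service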

CentOS 7 – What's new and different compared to CentOS 6 – Part 1

CentOS 7 has been out for almost a year but is not yet widespread. Perhaps this is because web hosting operators, server admins and system admins are still used to CentOS 6, and also to give CentOS 7 time to stabilise, mature and become compatible with today's popular applications such as CloudLinux and web hosting / system control panels like cPanel or DirectAdmin.
It is fair to say that CentOS 7 introduces a fairly new and unusual architecture compared with the changes from CentOS 5 to CentOS 6. The main differences in CentOS 7 for users to note are:
1. A new boot system, systemd, replacing init.
2. The new firewalld, still based on netfilter like iptables, but with more interactive features for managing the firewall modules in the kernel (see the quick example after this list).
3. The new GRUB2 boot loader, supporting more scripts as well as more hardware options on newer mainboards.
4. The new XFS filesystem, supporting large storage, very fast formatting, integrated snapshots, and direct dumps without needing to unmount as with EXT.
5. The GNOME 3 desktop interface.
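For instance, with firewalld a port is opened through firewall-cmd instead of editing iptables rules directly; a quick illustration:
# Open TCP port 80 permanently and reload the firewall rules
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --reload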

Thứ Năm, 25 tháng 5, 2017

Understanding Hadoop Clusters and the Network

This article is Part 1 in a series that will take a closer look at the architecture and methods of a Hadoop cluster, and how it relates to the network and server infrastructure. The content presented here is largely based on academic work and conversations I’ve had with customers running real production clusters. If you run production Hadoop clusters in your data center, I’m hoping you’ll provide your valuable insight in the comments below. Subsequent articles will cover the server and network architecture options in closer detail. Before we do that, though, let’s start by learning some of the basics about how a Hadoop cluster works. OK, let’s get started!
Hadoop Server Roles
The three major categories of machine roles in a Hadoop deployment are Client machines, Master nodes, and Slave nodes. The Master nodes oversee the two key functional pieces that make up Hadoop: storing lots of data (HDFS), and running parallel computations on all that data (Map Reduce). The Name Node oversees and coordinates the data storage function (HDFS), while the Job Tracker oversees and coordinates the parallel processing of data using Map Reduce. Slave nodes make up the vast majority of machines and do all the dirty work of storing the data and running the computations. Each slave runs both a Data Node and a Task Tracker daemon that communicate with and receive instructions from their master nodes. The Task Tracker daemon is a slave to the Job Tracker, and the Data Node daemon is a slave to the Name Node.
Client machines have Hadoop installed with all the cluster settings, but are neither a Master nor a Slave. Instead, the role of the Client machine is to load data into the cluster, submit Map Reduce jobs describing how that data should be processed, and then retrieve or view the results of the job when it is finished. In smaller clusters (~40 nodes) you may have a single physical server playing multiple roles, such as both Job Tracker and Name Node. With medium to large clusters you will often have each role operating on a single server machine.
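As an illustration of the client role, loading data and submitting a job typically looks like the following; the paths, jar and class names are placeholders, not from the original article:
# From a client machine: load local data into HDFS
hadoop fs -put /local/logs/access.log /user/hadoop/input/

# Submit a MapReduce job against that data
hadoop jar my-analysis.jar com.example.LogAnalysis /user/hadoop/input /user/hadoop/output

# Fetch the results back to the client when the job finishes
hadoop fs -get /user/hadoop/output/part-r-00000 ./results.txt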
In real production clusters there is no server virtualization, no hypervisor layer. That would only amount to unnecessary overhead impeding performance. Hadoop runs best on Linux machines, working directly with the underlying hardware. That said, Hadoop does work in a virtual machine. That’s a great way to learn and get Hadoop up and running fast and cheap. I have a 6-node cluster up and running in VMware Workstation on my Windows 7 laptop.

Hadoop Cluster

Thứ Năm, 11 tháng 5, 2017

How to show devices on Linux

lspci -nnk

02:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 05)
 Subsystem: Intel Corporation RAID Controller RS2BL040 [8086:9260]
 Kernel driver in use: megaraid_sas
 Kernel modules: megaraid_sas

lspci -vmmnn

Slot: 02:00.0
Class: RAID bus controller [0104]
Vendor: LSI Logic / Symbios Logic [1000]
Device: MegaRAID SAS 2108 [Liberator] [0079]
SVendor: Intel Corporation [8086]
SDevice: RAID Controller RS2BL040 [9260]
Rev: 05
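To go from the controller to the disks it exposes, lsblk and (if installed) lsscsi are handy companions to lspci; a quick illustration:
# Block devices with size, type and model
lsblk -o NAME,SIZE,TYPE,MODEL

# SCSI devices with their host:channel:target:LUN addresses (lsscsi package)
lsscsi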

Thứ Tư, 12 tháng 4, 2017

Iptables example

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -p tcp -m state --state NEW -m tcp --dport 21 -j DROP
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j DROP
-A INPUT -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p 50 -j ACCEPT
-A RH-Firewall-1-INPUT -p 51 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 5353 -d 224.0.0.251 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 21 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 53 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 53 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 5900:5950 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8002 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8003 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8899 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 7777 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
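This ruleset is in iptables-save format; on a CentOS/RHEL box it would typically live in /etc/sysconfig/iptables and be loaded as sketched below (paths assume the stock iptables service):
# Load the ruleset directly
iptables-restore < /etc/sysconfig/iptables

# Or let the iptables init service apply it on boot
service iptables restart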

Thứ Ba, 14 tháng 3, 2017

How to check Fibre Channel HBAs in Linux

Step-1: Determine the Manufacturer and Model of the HBAs.

Run the lspci command to list all PCI cards detected on the system.
# lspci | grep "Fibre Channel"
07:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
0a:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
The above output shows the system bus has detected two QLogic HBAs.

Step-2: Get the Vendor and Device IDs for the HBAs installed.

These can be obtained from the file /usr/share/hwdata/pci.ids
# vi /usr/share/hwdata/pci.ids
.....
1077  QLogic Corp.
 2532  ISP2532-based 8Gb Fibre Channel to PCI Express HBA
.....
The vendor id for QLogic is 1077 and the device id is 2532. You could confirm this using the lspci -nn command as shown below
# lspci -nn | grep "Fibre Channel"
07:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA [1077:2532] (rev 02)
0a:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA [1077:2532] (rev 02)

Step-3: Check if the driver modules are installed.

This can be done by searching the list of available modules. (Replace 2.6.18-308.el5PAE with your kernel version in the command below)
# grep 1077 /lib/modules/2.6.18-308.el5PAE/modules.* | grep 2532
/lib/modules/2.6.18-308.el5PAE/modules.alias:alias pci:v00001077d00002532sv*sd*bc*sc*i* qla2xxx
/lib/modules/2.6.18-308.el5PAE/modules.pcimap:qla2xxx              0x00001077 0x00002532 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
The above output shows that this HBA is supported by the module qla2xxx

Step-4: Check if the drivers for these HBAs are loaded in the kernel.

The lsmod command will list the currently loaded kernel modules
# lsmod | grep qla2xxx
qla2xxx              1205905  0
scsi_transport_fc      44745  1 qla2xxx
scsi_mod              144277  16 be2iscsi, ib_iser, iscsi_tcp, bnx2i, libcxgbi, libiscsi2, scsi_transport_iscsi2, scsi_dh, sr_mod, sg, usb_storage, qla2xxx, scsi_transport_fc, libata,cciss, sd_mod
The output shows the module qla2xxx is loaded by the kernel. If you don't see any output from the lsmod command, you can load the module using the modprobe command:
# modprobe -v qla2xxx

Step-5: Getting detailed information

You can find detailed information about the fibre channel adapters in the location /sys/class/fc_host/
# ll /sys/class/fc_host/
drwxr-xr-x 3 root root 0 Feb  5 11:28 host3
drwxr-xr-x 3 root root 0 Feb  5 11:28 host4
The directories host3 and host4 in the example above contain information specific to each adapter, such as node name (WWN), port name (WWN), type, speed, state, etc.
An easier way to get this information is to use the systool command.
# systool -c fc_host
Class = "fc_host"
  Class Device = "host3"
    Device = "host3"
  Class Device = "host4"
    Device = "host4"
The -v option gives you detailed output
# systool -c fc_host -v host3
Class = "fc_host"

 Class Device = "host3"
 Class Device path = "/sys/class/fc_host/host3"
   fabric_name         = "0x50014380272531e9"
   issue_lip           = <store method only>
   node_name           = "0x50014380281412e9"
   port_id             = "0x0000e8"
   port_name           = "0x50014380281412e8"
   port_state          = "Online"
   port_type           = "LPort (private loop)"
   speed               = "8 Gbit"
   supported_classes   = "Class 3"
   supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
   symbolic_name       = "HPAK344A FW:v5.06.03 DVR:v8.03.07.09.05.08-k"
   system_hostname     = ""
   tgtid_bind_type     = "wwpn (World Wide Port Name)"
   uevent              = <store method only>

   Device = "host3"
   Device path = "/sys/devices/pci0000:00/0000:00:01.0/0000:07:00.0/host3"
     edc               = <store method only>
     fw_dump           =
     nvram             = "ISP "
     optrom_ctl        = <store method only>
     optrom            =
     reset             = <store method only>
     sfp               = ""
     uevent            = <store method only>
     vpd               = "&" 
 

How to scan new FC LUNs and SCSI disks in Linux?

1. First, find out how many disks are visible in "fdisk -l".
# fdisk -l 2>/dev/null | egrep '^Disk' | egrep -v 'dm-' | wc -l
 
2. Find out how many host bus adapters are configured in the Linux box. You can use "systool -c fc_host -v" to verify the available FC hosts on the system.
# ls /sys/class/fc_host
host0  host1
In this case, you need to scan the host0 and host1 HBAs.


3. If the system virtual memory is too low, do not proceed further. If you have enough free virtual memory, you can proceed with the commands below to scan new LUNs.
# echo "1" > /sys/class/fc_host/host0/issue_lip
# echo "- - -" > /sys/class/scsi_host/host0/scan
# echo "1" > /sys/class/fc_host/host1/issue_lip
# echo "- - -" > /sys/class/scsi_host/host1/scan
Note: You need to monitor the "issue_lip" messages in /var/log/messages to determine when the scan has completed. This is an asynchronous operation.

4. Verify whether the new LUNs are visible by counting the available disks.
# fdisk -l 2>/dev/null | egrep '^Disk' | egrep -v 'dm-' | wc -l
If any new LUNs were added, the count will be higher than it was before scanning.
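As an alternative to echoing into sysfs, the sg3_utils package ships a helper script that performs the same rescan; a quick sketch (the package must be installed):
# Rescan all SCSI hosts for new LUNs (from the sg3_utils package)
rescan-scsi-bus.sh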


Thứ Hai, 13 tháng 3, 2017

Some problems with MySQL & MariaDB

MySQL max_open_files more than 1024


Edit /etc/security/limits.conf and add the following lines
mysql soft nofile 65535
mysql hard nofile 65535

Edit the file /usr/lib/systemd/system/mysqld.service (or /usr/lib/systemd/system/mariadb.service) and /etc/systemd/system/mariadb.service.d/migrated-from-my.cnf-settings.conf

Add at the end of each file:

LimitNOFILE=65535
LimitNPROC=65535
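After editing the unit files (the Limit* lines belong in the [Service] section), systemd must re-read them and the database server must be restarted for the new limits to apply; a typical sequence:
systemctl daemon-reload
systemctl restart mariadb    # or: systemctl restart mysqld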

Chủ Nhật, 12 tháng 3, 2017

Key VMware vSphere -vCenter

1. Key VMware vSphere 5.x Enterprise Plus Unlimited CPU:


JU2HK-4HH05-JZA89-J1CQP-1A2JW
4F47K-6EH47-RZW90-ZH252-C2J45
0U25N-6W305-JZTL0-FK2QM-3CFPC
1F6HF-4824J-GZXH0-D225H-8AUN5
HA4EU-D231L-ZZ5N8-3HA7M-A3TQ2



2. Key VMware vSphere 5.x Enterprise Plus Unlimited VM:


5U6HE-6YJ86-DZ798-1LCZH-12YMH
4Y295-6W00L-6Z5K0-DK95K-8C6H4
1A0MR-08156-MZZ90-KL95M-32U7R
NU2P3-6D113-UZH29-Q0AN6-8AZ1G
1F02R-8H15M-RZ8C1-K10QH-9A3LL

3. Key VMware vSphere 6 Standard:

JA08H-6AJ03-H80G8-WT2Z0-0F06F
JU2D0-2FK13-088Q9-J93X0-928H2
JF092-FFLE6-H8940-Z22NP-06R1F
0V6XK-41J8L-M80P1-K39Z4-AYU2F
1Z6DK-FPK5J-H80U1-N12Z0-2P22D
4V042-DWK8J-084X9-CR0N0-9KU1F
5U4X0-DQ24K-485W0-LV2ZP-0YK5D
4C202-DXH12-H89U0-X39G0-1AK62

Thứ Năm, 9 tháng 3, 2017

Optimizing MySQL, Intermediate results

Configuration options that made a huge difference compared to the default values

1. Query cache must be enabled and configured
query_cache_type=1        
query_cache_limit=1M
query_cache_size=256M
query_cache_min_res_unit=512
2. Various read and sort buffers must be set
key_buffer_size=128M
join_buffer_size=8M
myisam_sort_buffer_size=1M
sort_buffer_size=1M
read_buffer_size=4M 
read_rnd_buffer_size=4M
3. Set InnoDB tables to be stored each in a separate file
innodb_file_per_table=1
4. Tweak MyISAM table
myisam_use_mmap=1
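These options go in the [mysqld] section of my.cnf; after restarting MySQL, the values actually in effect can be checked from the shell as a quick sanity check:
# Confirm the query cache and buffer settings took effect
mysql -u root -p -e "SHOW VARIABLES LIKE 'query_cache%'"
mysql -u root -p -e "SHOW STATUS LIKE 'Qcache%'"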

Chủ Nhật, 5 tháng 2, 2017

Coping with the TCP TIME-WAIT state on busy Linux servers

TL;DR: Do not enable net.ipv4.tcp_tw_recycle.
The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle does:
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
Its sibling, net.ipv4.tcp_tw_reuse, is a little better documented, but the language is about the same:
Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
The mere result of this lack of documentation is that we find numerous tuning guides advising to set both of these settings to 1 to reduce the number of entries in the TIME-WAIT state. However, as stated by the tcp(7) manual page, the net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won’t handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you:
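Before touching either sysctl, it helps to measure how many TIME-WAIT sockets the server actually holds; a quick check:
# Count sockets currently in TIME-WAIT
ss -tan state time-wait | wc -l

# Inspect the current values of the two settings discussed above
sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle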