Common kubectl commands

February 24, 2019 Containers

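Running containers

Running a container in the foreground

kubectl run -it --rm --image=centos --restart=Never test bash

Running containers in the background

kubectl run nginx --image=nginx --replicas=2

On kubectl 1.18 and later, kubectl run only creates a single Pod and no longer accepts --replicas; there, kubectl create deployment nginx --image=nginx followed by kubectl scale gives the same result.

Containers are normally created from YAML files; kubectl run is only for spinning up temporary containers while debugging or troubleshooting.

For example:

Creating a network troubleshooting container

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools

If you have created a MySQL container and exposed it through a Service but cannot reach MySQL through that Service, network tools such as nslookup or dig can help narrow down the problem.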

dnstools# nslookup mysql
Server:         172.21.0.2
Address:        172.21.0.2#53

Name:   mysql.qa.svc.cluster.local
Address: 172.21.147.86

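Creating a MySQL troubleshooting container

kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql --port 3306 -u root -p123456

If a MySQL container can no longer be reached from outside the cluster, start a mysql client container inside the cluster and check whether it can connect from there.

-h is the MySQL host;
--port is the MySQL port;
-u is the MySQL user name;
-p is that user's password; there is no space between -p and the password.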

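Applying resources

kubectl apply -f deployment.yaml

This is the recommended way to create or update resources.

Getting pods

kubectl get pods --all-namespaces -o wide

Lists the pods in all namespaces.

--all-namespaces selects every namespace; by default only resources in the default namespace are returned, and -n selects a specific namespace.
-o wide sets the output format; other formats include json and yaml.

Getting all pods that are not in the Running state

kubectl get pods --all-namespaces -o wide | awk '{if ($4 != "Running") print $0}'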
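The awk filter above also prints the header row, because the literal word STATUS is not equal to Running, and it lists pods that finished normally. Here is a minimal sketch of a variant that skips the header and ignores Completed pods. The function name not_running and the sample rows are made up for illustration, and the sketch assumes the column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints "namespace/name status" for pods that are neither Running nor Completed.
not_running() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Made-up sample rows standing in for real cluster output.
sample='NAMESPACE   NAME        READY   STATUS             RESTARTS   AGE
default     web-1       1/1     Running            0          3d
qa          mysql-0     0/1     CrashLoopBackOff   12         1h
qa          job-x       0/1     Completed          0          2h
kube-system dns-abc     0/1     Pending            0          5m'

printf '%s\n' "$sample" | not_running
```

Against a real cluster the call would be kubectl get pods --all-namespaces | not_running. With the sample rows it prints the mysql-0 and dns-abc pods only.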

Getting other resources

Besides pods, many other resources can be retrieved with kubectl, such as node, service, deployment, statefulset, daemonset, job, pvc, and pv. Most are namespaced, but some are not, for example node, pv, and storageclass.

Deleting pods

kubectl delete pod nginx-deployment-599c95f496-hd2jc

Force delete:

kubectl delete pod nginx-deployment-599c95f496-hd2jc --force --grace-period=0

Bulk force delete:

kubectl get pods | grep Terminating | awk '{print $1}' | xargs kubectl delete pod --force --grace-period=0

Bulk force delete of pods that are not Running

kubectl get pods --all-namespaces | awk 'NR > 1 && $4 != "Running" {system("kubectl -n " $1 " delete pods " $2 " --grace-period=0 --force")}'

The NR > 1 condition skips the header row. Note that this also deletes pods in the Completed state.
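Bulk force deletion is destructive, so it can help to look at the commands before running them. Here is a small sketch that only prints the delete commands instead of executing them. The function name delete_commands and the sample rows are hypothetical, and nothing here contacts a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns `kubectl get pods --all-namespaces` output (stdin)
# into delete commands, one per pod whose STATUS is not Running.
# The commands are printed, not executed.
delete_commands() {
  awk 'NR > 1 && $4 != "Running" {
    printf "kubectl -n %s delete pod %s --grace-period=0 --force\n", $1, $2
  }'
}

# Made-up sample rows standing in for real cluster output.
printf '%s\n' \
  'NAMESPACE NAME    READY STATUS      RESTARTS AGE' \
  'default   web-1   1/1   Running     0        3d' \
  'qa        mysql-0 0/1   Terminating 0        1h' | delete_commands
```

On a real cluster, kubectl get pods --all-namespaces | delete_commands shows what would be removed; piping that output to sh then runs it.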

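Scaling

kubectl scale deployment nginx --replicas 4

A deployment can be scaled down to as few as 0 replicas.

Pausing a rolling update

kubectl rollout pause deployment nginx

Pausing in the middle of a rolling update is one way to implement a canary release.

Resuming a rolling update

kubectl rollout resume deployment nginx

Resumes a paused rollout.

Rolling back

kubectl rollout undo deployment nginx

Rolls back to the previous revision.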

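Adding a taint to a node

kubectl taint nodes node1 key=value:NoSchedule

Once a node is tainted, only pods that tolerate the taint can be scheduled onto it.

Viewing node information

kubectl describe nodes node1

Shows the node's resources, kernel version, pods, labels, taints, and more.

Adding a label to a node

kubectl label node node1 kubernetes.io/role=node --overwrite

Once a node is labeled, node affinity can be used to schedule particular pods onto particular nodes.

Removing a label from a node

kubectl label node node1 kubernetes.io/role-

Filtering by label

kubectl get nodes -l node-type=iot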

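Disabling scheduling on a node

Cordon only, without eviction

kubectl cordon node1

This only marks the node as SchedulingDisabled. Pods already running on the node are unaffected, but no new pods will be scheduled onto it.

Cordon and evict

kubectl drain node1 --ignore-daemonsets --delete-local-data --force

Besides marking the node as SchedulingDisabled, this also evicts the pods already running on it, so that nothing but daemonset pods remains on the node. Newer kubectl releases rename --delete-local-data to --delete-emptydir-data.

Restoring scheduling

kubectl uncordon node1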

Several ways to upgrade the kernel on CentOS

January 17, 2019 linux

Minor version upgrade

# install
yum install kernel* -y
# reboot
init 6

Installing the latest stable kernel

# import key
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# install elrepo repo
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# list kernel
yum --disablerepo=\* --enablerepo=elrepo-kernel list kernel*
# install kernel
yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml -y
# yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-ml.x86_64
# modify grub
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
reboot

Installing a specific kernel version

A mirror site that carries a wide range of kernel versions: http://mirror.rc.usf.edu/compute_lock/elrepo/kernel/el7/x86_64/RPMS

This example installs a 4.19 kernel:

# install
rpm -ivh http://mirror.rc.usf.edu/compute_lock/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm
# modify grub
sed -i "s/GRUB_DEFAULT=saved/GRUB_DEFAULT=0/" /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
reboot

Building from source

Not tried.

References

https://mritd.me/2016/11/08/update-centos-kernel/

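Creating a StorageClass for NFS

December 17, 2018 Containers

Preface

Among the network storage options available to Kubernetes, NFS is a low-cost and easy-to-use one.

NFS is not recommended for production, though. The MySQL databases in our test environment run on NFS and frequently hit problems such as nfs4_reclaim_open_state: Lock reclaim failed and kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s.

NFS server

yum install -y net-tools lsof nfs-utils rpcbind
mkdir /data/nfs -p
vi /etc/exports
# Add the line below to share /data/nfs/ with clients; rw allows reads and writes, no_root_squash lets the client's root user keep root privileges
/data/nfs/ *(insecure,rw,no_root_squash)

# Stop the firewall
systemctl stop firewalld
# Enable rpcbind and nfs at boot (rpcbind must be started first)
systemctl enable rpcbind.service
systemctl enable nfs-server.service
# Then start the rpcbind and nfs services
systemctl start rpcbind.service
systemctl start nfs-server.service

exportfs -r
# Check that the export is in place
exportfs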
/home/nfs 192.168.248.0/24

NFS client

# Install the NFS tools
yum install -y nfs-utils
# Create the mount point
mkdir /data
# Mount the NFS export
mount -t nfs 192.168.80.145:/data/nfs /data
# Unmount
umount /data

# Check the mount status
df -h
showmount -e 192.168.80.145

Creating the NFS StorageClass

nfs-rbac.yaml

kind: ServiceAccount
apiVersion: v1
metadata:
  name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

nfs-deployment.yaml

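kind: Deployment
apiVersion: apps/v1 # extensions/v1beta1 is no longer served by Kubernetes 1.16 and later
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector: # required by apps/v1; must match the template labels
    matchLabels:
      app: nfs-client-provisioner
  template: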
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      tolerations:
        - key: node-env
          value: pre
          effect: NoSchedule
          operator: Equal
      priorityClassName: cluster
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-provisioner
            - name: NFS_SERVER
              value: ##NFS_SERVER_IP##
            - name: NFS_PATH
              value: ##NFS_PATH##
      volumes:
        - name: nfs-client-root
          nfs:
            server: ##NFS_SERVER_IP##
            path: ##NFS_PATH##

nfs-storage.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: nfs-provisioner # or choose another name; it must match the deployment's PROVISIONER_NAME env
parameters:
  archiveOnDelete: "false"

test-nfs-storage.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "nfs-storage"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi

---

kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
    - name: test-pod
      image: busybox
      command:
        - "/bin/sh"
      args:
        - "-c"
        - "touch /mnt/SUCCESS && exit 0 || exit 1"
      volumeMounts:
        - name: nfs-pvc
          mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim

Creation order

nfs-rbac.yaml
nfs-deployment.yaml
nfs-storage.yaml

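Parameters

nfs-deployment.yaml

##NFS_SERVER_IP## is the IP of the NFS server; replace it with the actual IP.
##NFS_PATH## is the exported directory on the NFS server; replace it with the actual directory.

Deployment

NFS_SERVER_IP=192.168.80.145  # replace with your actual IP
NFS_PATH=/data/nfs  # replace with your actual directory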
kubectl apply -f nfs-rbac.yaml
sed "s|##NFS_SERVER_IP##|${NFS_SERVER_IP}|g;s|##NFS_PATH##|${NFS_PATH}|g" nfs-deployment.yaml | kubectl apply -f -
kubectl apply -f nfs-storage.yaml
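
The sed call in the deployment step is plain placeholder substitution. Here is a small sketch that runs the same expression on a two-line inline sample instead of the real nfs-deployment.yaml, so it can be tried without a cluster. The helper name render is hypothetical.

```shell
#!/usr/bin/env bash
NFS_SERVER_IP=192.168.80.145
NFS_PATH=/data/nfs

# Hypothetical helper: replaces both placeholders on stdin. Using | as the sed
# delimiter means the slashes in NFS_PATH need no escaping.
render() {
  sed "s|##NFS_SERVER_IP##|${NFS_SERVER_IP}|g;s|##NFS_PATH##|${NFS_PATH}|g"
}

# Inline sample standing in for the placeholder lines of nfs-deployment.yaml.
printf 'server: ##NFS_SERVER_IP##\npath: ##NFS_PATH##\n' | render
```

Running render < nfs-deployment.yaml without piping into kubectl is a quick way to check the rendered manifest before applying it.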

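Verification

kubectl apply -f test-nfs-storage.yaml

If provisioning works, the claim is bound and the test pod creates a file named SUCCESS in the volume, which is backed by a directory under the NFS export.

Optional: setting the default storage class

Make this StorageClass the default for the cluster:

kubectl patch storageclass nfs-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
[root@master1 nfs-storage]# kubectl get sc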
NAME                    PROVISIONER                    AGE
nfs-storage (default)   nfs-provisioner                12m

References

https://www.cnblogs.com/lixiuran/p/7117000.html

https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client

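Managing container temporary storage with ephemeral-storage

November 28, 2018 Containers

Introduction to ephemeral-storage

Kubernetes 1.8 introduced ephemeral-storage, a new resource type similar to CPU and memory, and the kubelet enables the feature by default as of 1.10.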

Alpha release target (x.y): 1.7/1.8
Beta release target (x.y): 1.10
Stable release target (x.y): 1.11

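ephemeral-storage exists to manage and schedule the transient storage used by applications running in Kubernetes.

On every Kubernetes node, the kubelet root directory (/var/lib/kubelet by default) and the log directory (/var/log) live on the node's root partition. The same partition is also consumed by Pods' EmptyDir volumes, container logs, image layers, and containers' writable layers. ephemeral-storage manages this root partition: the requests and limits declared by applications are used to schedule workloads and to control how much of the partition they consume.

Eviction logic for ephemeral-storage

When the kubelet starts on a node, it measures the allocatable disk capacity of the node's root partition; you can also override the kubelet configuration on the node to set the allocatable amount yourself. When a Pod is created, it is scheduled onto a node that satisfies its storage request, and a Pod that uses more storage than its limit is evicted so that the node's disk is not exhausted.

If the container runtime is given a separate partition, for example by moving Docker's image layers and container writable layers (/var/lib/docker by default) to another partition, that usage is no longer counted towards ephemeral-storage.

A pod is evicted in any of the following cases:

An EmptyDir volume uses more than its SizeLimit.
A container's usage (its logs, plus imagefs when there is no separate overlay partition) exceeds that container's limit.
The pod's total local ephemeral storage usage (all emptyDir volumes plus all containers) exceeds the sum of the limits of all containers in the pod.

Using ephemeral-storage

As with memory and CPU, storage limits are defined on the containers of a Pod:

spec.containers[].resources.limits.ephemeral-storage

spec.containers[].resources.requests.ephemeral-storage

Example: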

apiVersion: v1
kind: Pod
metadata:
  name: teststorage
  labels:
    app: teststorage
spec:
  containers:
  - name: teststorage
    image: nginx:1.14
    command: ["bash", "-c", "while true; do dd if=/dev/zero of=$(date '+%s').out count=1 bs=10MB; sleep 1; done"] # keep writing files into the container's rootfs
    resources:
      limits:
        ephemeral-storage: 100Mi # limit ephemeral storage to 100Mi
      requests:
        ephemeral-storage: 100Mi

[root@master1 ~]# kubectl get pods -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP               NODE            NOMINATED NODE
teststorage                         1/1       Running   0          7s        172.20.189.69    10.208.204.35   <none>
-------------------------------------------------------------------------------------------------
teststorage                         0/1       Evicted   0          1m        <none>           10.208.204.35   <none>

[root@master1 ~]# kubectl describe pod teststorage 
Name:         teststorage
Namespace:    default
Node:         10.208.204.35/
Start Time:   Wed, 28 Nov 2018 13:48:37 +0800
Labels:       app=teststorage
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"teststorage"},"name":"teststorage","namespace":"default"},"spec":{"contai...
Status:       Failed
Reason:       Evicted
Message:      Pod ephemeral local storage usage exceeds the total limit of containers 100Mi. 
IP:           
Containers:
  teststorage:
    Image:      nginx:1.14
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
      while true; do dd if=/dev/zero of=$(date '+%s').out count=1 bs=10MB; sleep 1; done
    Limits:
      ephemeral-storage:  100Mi
    Requests:
      ephemeral-storage:  100Mi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mqzrh (ro)
Volumes:
  default-token-mqzrh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mqzrh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason     Age   From                    Message
  ----     ------     ----  ----                    -------
  Normal   Scheduled  1m    default-scheduler       Successfully assigned default/teststorage to 10.208.204.35
  Normal   Pulled     1m    kubelet, 10.208.204.35  Container image "nginx:1.14" already present on machine
  Normal   Created    1m    kubelet, 10.208.204.35  Created container
  Normal   Started    1m    kubelet, 10.208.204.35  Started container
  Warning  Evicted    8s    kubelet, 10.208.204.35  Pod ephemeral local storage usage exceeds the total limit of containers 100Mi.
  Normal   Killing    8s    kubelet, 10.208.204.35  Killing container with id docker://teststorage:Need to kill Pod
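
As a rough cross-check of the eviction shown above, this sketch works out how many of the test pod's files it takes to pass the 100Mi limit. It assumes GNU dd semantics, where bs=10MB means 10 * 1000 * 1000 bytes, and it ignores anything else written to the container's writable layer. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
limit_bytes=$((100 * 1024 * 1024))   # ephemeral-storage limit: 100Mi
file_bytes=$((10 * 1000 * 1000))     # one file from dd count=1 bs=10MB (assumed GNU dd)

# Smallest number of files whose combined size is strictly above the limit.
files_needed=$((limit_bytes / file_bytes + 1))

echo "limit=${limit_bytes} bytes, file=${file_bytes} bytes, files_needed=${files_needed}"
```

Ten files add up to 100,000,000 bytes, which is still under 104,857,600, so the limit is crossed by the eleventh file, a little over ten seconds into the loop. The eviction in the events above shows up later than that, which fits with the kubelet checking usage periodically rather than on every write.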

References:

https://v1-11.docs.kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/

https://github.com/kubernetes/enhancements/issues/361

https://yq.aliyun.com/articles/594066

http://www.k8smeetup.com/article/VyEncpgA7

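Several ways to download Google container images

November 28, 2018 Containers

Preface

In mainland China the firewall blocks many foreign websites, including Google's image registry gcr.io, but Google's images can still be obtained in other ways.

Using mirror sites in China

Alibaba Cloud mirror

Domain: registry.cn-hangzhou.aliyuncs.com/google_containers

Microsoft mirror

Domain: gcr.azk8s.cn/google_containers

USTC mirror (slower pulls)

Domain: gcr.mirrors.ustc.edu.cn/google_containers

Usage

Replace the Google registry address with the mirror address, for example:

For an image under k8s.gcr.io, such as k8s.gcr.io/coredns:1.1.3:

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.1.3
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.1.3 k8s.gcr.io/coredns:1.1.3

For an image under gcr.io, such as gcr.io/google_containers/heapster-amd64:v1.5.3: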

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.3
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.3 gcr.io/google_containers/heapster-amd64:v1.5.3
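
The two examples follow one renaming rule, which can be sketched as a small shell function. The name mirror_image is hypothetical, and the sketch only covers the two prefixes shown above, using the Alibaba Cloud mirror.

```shell
#!/usr/bin/env bash
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers

# Hypothetical helper: maps a k8s.gcr.io/... or gcr.io/google_containers/...
# image to the same image name under the mirror. Other registries are rejected.
mirror_image() {
  case "$1" in
    k8s.gcr.io/*)               echo "${MIRROR}/${1#k8s.gcr.io/}" ;;
    gcr.io/google_containers/*) echo "${MIRROR}/${1#gcr.io/google_containers/}" ;;
    *) echo "unsupported image: $1" >&2; return 1 ;;
  esac
}

mirror_image k8s.gcr.io/coredns:1.1.3
mirror_image gcr.io/google_containers/heapster-amd64:v1.5.3
```

With it, pulling and retagging an image stored in a variable img becomes docker pull "$(mirror_image "$img")" followed by docker tag "$(mirror_image "$img")" "$img".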

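Using a proxy

This requires a machine that can get through the firewall, and the target machine must be able to reach that machine.

Create the systemd drop-in directory for the docker service:

mkdir -p /etc/systemd/system/docker.service.d

Create the proxy configuration file

cat >/etc/systemd/system/docker.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"
EOF

Reload the configuration and restart Docker

systemctl daemon-reload
systemctl restart docker

Verify that the configuration has been loaded

systemctl show --property=Environment docker
Environment=HTTP_PROXY=http://proxy.example.com:80/

If the configuration is loaded but Google images still cannot be pulled, try changing HTTP_PROXY to http_proxy; in my case it only worked with Environment="http_proxy=http://10.208.204.147:1080/".

HTTPS_PROXY and NO_PROXY are configured the same way; see the proxy configuration section of the official Docker documentation for details.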
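
For reference, here is a sketch of a drop-in that sets all three variables. The proxy URL and the NO_PROXY list are placeholders, not values from my setup, and the file goes to a scratch directory by default so the sketch is safe to try. On a real host the directory would be /etc/systemd/system/docker.service.d.

```shell
#!/usr/bin/env bash
# Scratch directory by default; set DROPIN_DIR to the real drop-in directory
# when writing the actual configuration.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"

# One Environment= line per variable; all values below are placeholders.
cat > "${DROPIN_DIR}/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"
Environment="HTTPS_PROXY=http://proxy.example.com:80/"
Environment="NO_PROXY=localhost,127.0.0.1"
EOF

cat "${DROPIN_DIR}/http-proxy.conf"
```

After writing the real file, the same systemctl daemon-reload and systemctl restart docker steps apply.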

Using a script

curl -sSL https://git.io/getgcr | bash -s k8s.gcr.io/kube-apiserver:v1.14.3

Replace k8s.gcr.io/kube-apiserver:v1.14.3 with the image you want to download.

Under the hood, this method still downloads from a mirror site in China.

References

http://mirror.azure.cn/help/gcr-proxy-cache.html

https://blog.docker.com/2015/10/registry-proxy-cache-docker-open-source/

https://stackoverflow.com/questions/23111631/cannot-download-docker-images-behind-a-proxy

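October 21, 2018

Before Kubernetes, once a container network was in place, the most direct way to reach an application was for the client to connect straight to a backend container.

This approach is the most intuitive and the easiest, but its problems are just as obvious.

When an application has several backend containers, how is load balanced across them? How is session affinity handled? What happens when a container is moved and its IP changes? How are health checks configured? What if a domain name should be the entry point?

These are exactly the problems the Kubernetes Service was introduced to solve.

The purpose of a Service

Kubernetes Pods are mortal: they are created and destroyed, and once destroyed they are never resurrected. A ReplicationController creates and destroys Pods dynamically. Every Pod gets its own IP address, but those addresses cannot be relied on to stay stable. This leads to a problem: if a set of Pods (call them backends) serves other Pods (call them frontends) inside the cluster, how do the frontends find out, and keep track of, which backends are in that set?

A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy for accessing them, often called a microservice.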