Kubernetes Advanced Usage

Course Overview

Advanced Kubernetes Usage     Technologies
Logging                       ElasticSearch (as StatefulSet), fluentd, Kibana, LogTrail
Authentication                OpenID Connect (OIDC), Auth0
Authorization                 Kubernetes RBAC
Packaging                     Helm
The Job Resource
Job Scheduling                CronJob
Deploying on Kubernetes       Spinnaker
Microservices on Kubernetes   Linkerd
Federation                    kubefed
Monitoring                    Prometheus

Logging with fluentd

  • Logging is important to show errors, information and debugging data about the application
  • When you only run one app, it is pretty obvious how to look for the logs
    • You’d just open the “app.log”-file to see what’s going on
    • Or if deployed as a pod, with kubectl logs
  • With Kubernetes, one application runs as one or many pods
    • Finding an error will be much more difficult: what pod do you have to look at?
  • Up until now you might have been using “kubectl logs” to get the log details of a pod
  • To get the correct information, you might have to look up:
    • The pod name
    • The container names
  • And run kubectl logs for every container in every pod that is running for an application
  • Note: older kubectl versions accepted both kubectl log and kubectl logs; current versions only support kubectl logs
  • The solution for this problem is to do Log Aggregation
  • It’s not Kubernetes specific
    • It has been applied for years, even with syslog (a standard for message logging since the 1980s)
    • Log Aggregation is nowadays often implemented with more modern tools
      • the ELK Stack (ElasticSearch + Logstash + Kibana)
      • Several hosted services like loggly.com, papertrailapp.com
  • I’ll show you how to set up centralized logging using:
    • Fluentd (For log forwarding)
    • ElasticSearch (For log indexing)
    • Kibana (For visualisation)
    • LogTrail (an easy to use UI to show logs)
  • This solution can be easily customized: you can create custom dashboards to show what is important for you
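
To make the manual approach described above concrete, here is a minimal Python sketch that builds one kubectl logs command per container of every pod. The pod and container names are hypothetical, and the sketch only assembles the commands (it does not call kubectl), so it mainly illustrates how quickly the number of commands grows without log aggregation.

```python
from typing import Dict, List


def kubectl_logs_commands(
    pods: Dict[str, List[str]], namespace: str = "default"
) -> List[List[str]]:
    """Build one `kubectl logs` invocation per container of every pod."""
    commands = []
    for pod_name, containers in sorted(pods.items()):
        for container in containers:
            commands.append(
                ["kubectl", "logs", pod_name, "-c", container, "--namespace", namespace]
            )
    return commands


# Hypothetical application running as two pods with two containers each.
example_pods = {
    "webapp-7d4b9c-abcde": ["web", "sidecar"],
    "webapp-7d4b9c-fghij": ["web", "sidecar"],
}

if __name__ == "__main__":
    for command in kubectl_logs_commands(example_pods):
        print(" ".join(command))
```

Two pods with two containers each already require four separate commands, and the pod names change every time a pod is recreated.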

Setup (AWS)

storage.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: eu-west-1a

es-statefulset.yaml

# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "services"
  - "namespaces"
  - "endpoints"
  verbs:
  - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: elasticsearch-logging
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v5.5.1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 2
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: v5.5.1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v5.5.1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
      - image: gcr.io/google-containers/elasticsearch:v5.5.1-1
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
            memory: 2.5Gi
          requests:
            memory: 2.5Gi
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: "ES_JAVA_OPTS"
          value: "-XX:-AssumeMP"
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: alpine:3.6
        command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: es-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 8Gi

es-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging

fluentd-es-configmap.yaml

kind: ConfigMap
apiVersion: v1
data:
  containers.input.conf: |-
    # This configuration file for Fluentd / td-agent is used
    # to watch changes to Docker log files. The kubelet creates symlinks that
    # capture the pod name, namespace, container name & Docker container ID
    # to the docker logs for pods in the /var/log/containers directory on the host.
    # If running this fluentd configuration in a Docker container, the /var/log
    # directory should be mounted in the container.
    #
    # These logs are then submitted to Elasticsearch which assumes the
    # installation of the fluent-plugin-elasticsearch & the
    # fluent-plugin-kubernetes_metadata_filter plugins.
    # See https://github.com/uken/fluent-plugin-elasticsearch &
    # https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter for
    # more information about the plugins.
    #
    # Example
    # =======
    # A line in the Docker log file might look like this JSON:
    #
    # {"log":"2014/09/25 21:15:03 Got request with path wombat\n",
    #  "stream":"stderr",
    #   "time":"2014-09-25T21:15:03.499185026Z"}
    #
    # The time_format specification below makes sure we properly
    # parse the time format produced by Docker. This will be
    # submitted to Elasticsearch and should appear like:
    # $ curl 'http://elasticsearch-logging:9200/_search?pretty'
    # ...
    # {
    #      "_index" : "logstash-2014.09.25",
    #      "_type" : "fluentd",
    #      "_id" : "VBrbor2QTuGpsQyTCdfzqA",
    #      "_score" : 1.0,
    #      "_source":{"log":"2014/09/25 22:45:50 Got request with path wombat\n",
    #                 "stream":"stderr","tag":"docker.container.all",
    #                 "@timestamp":"2014-09-25T22:45:50+00:00"}
    #    },
    # ...
    #
    # The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
    # record & add labels to the log record if properly configured. This enables users
    # to filter & search logs on any metadata.
    # For example a Docker container's logs might be in the directory:
    #
    #  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
    #
    # and in the file:
    #
    #  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
    #
    # where 997599971ee6... is the Docker ID of the running container.
    # The Kubernetes kubelet makes a symbolic link to this file on the host machine
    # in the /var/log/containers directory which includes the pod name and the Kubernetes
    # container name:
    #
    #    synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #    ->
    #    /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
    #
    # The /var/log directory on the host is mapped to the /var/log directory in the container
    # running this instance of Fluentd and we end up collecting the file:
    #
    #   /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # This results in the tag:
    #
    #  var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # The Kubernetes fluentd plugin is used to extract the namespace, pod name & container name
    # which are added to the log message as a kubernetes field object & the Docker container ID
    # is also added under the docker field object.
    # The final tag is:
    #
    #   kubernetes.var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # And the final log record looks like:
    #
    # {
    #   "log":"2014/09/25 21:15:03 Got request with path wombat\n",
    #   "stream":"stderr",
    #   "time":"2014-09-25T21:15:03.499185026Z",
    #   "kubernetes": {
    #     "namespace": "default",
    #     "pod_name": "synthetic-logger-0.25lps-pod",
    #     "container_name": "synth-lgr"
    #   },
    #   "docker": {
    #     "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
    #   }
    # }
    #
    # This makes it easier for users to search for logs by pod name or by
    # the name of the Kubernetes container regardless of how many times the
    # Kubernetes pod has been restarted (resulting in several Docker container IDs).

    # Example:
    # {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
  system.input.conf: |-
    # Example:
    # 2015-12-21 23:17:22,066 [salt.state       ][INFO    ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
    <source>
      type tail
      format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
      time_format %Y-%m-%d %H:%M:%S
      path /var/log/salt/minion
      pos_file /var/log/es-salt.pos
      tag salt
    </source>

    # Example:
    # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
    <source>
      type tail
      format syslog
      path /var/log/startupscript.log
      pos_file /var/log/es-startupscript.log.pos
      tag startupscript
    </source>

    # Examples:
    # time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
    # time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404
    <source>
      type tail
      format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
      path /var/log/docker.log
      pos_file /var/log/es-docker.log.pos
      tag docker
    </source>

    # Example:
    # 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal
    <source>
      type tail
      # Not parsing this, because it doesn't have anything particularly useful to
      # parse out of it (like severities).
      format none
      path /var/log/etcd.log
      pos_file /var/log/es-etcd.log.pos
      tag etcd
    </source>

    # Multi-line parsing is required for all the kube logs because very large log
    # statements, such as those that include entire object bodies, get split into
    # multiple lines by glog.

    # Example:
    # I0204 07:32:30.020537    3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kubelet.log
      pos_file /var/log/es-kubelet.log.pos
      tag kubelet
    </source>

    # Example:
    # I1118 21:26:53.975789       6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-proxy.log
      pos_file /var/log/es-kube-proxy.log.pos
      tag kube-proxy
    </source>

    # Example:
    # I0204 07:00:19.604280       5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266]
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-apiserver.log
      pos_file /var/log/es-kube-apiserver.log.pos
      tag kube-apiserver
    </source>

    # Example:
    # I0204 06:55:31.872680       5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-controller-manager.log
      pos_file /var/log/es-kube-controller-manager.log.pos
      tag kube-controller-manager
    </source>

    # Example:
    # W0204 06:49:18.239674       7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312]
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-scheduler.log
      pos_file /var/log/es-kube-scheduler.log.pos
      tag kube-scheduler
    </source>

    # Example:
    # I1104 10:36:20.242766       5 rescheduler.go:73] Running Rescheduler
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/rescheduler.log
      pos_file /var/log/es-rescheduler.log.pos
      tag rescheduler
    </source>

    # Example:
    # I0603 15:31:05.793605       6 cluster_manager.go:230] Reading config from path /etc/gce.conf
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/glbc.log
      pos_file /var/log/es-glbc.log.pos
      tag glbc
    </source>

    # Example:
    # I0603 15:31:05.793605       6 cluster_manager.go:230] Reading config from path /etc/gce.conf
    <source>
      type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/cluster-autoscaler.log
      pos_file /var/log/es-cluster-autoscaler.log.pos
      tag cluster-autoscaler
    </source>

    # Logs from systemd-journal for interesting services.
    <source>
      type systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      pos_file /var/log/gcp-journald-docker.pos
      read_from_head true
      tag docker
    </source>

    <source>
      type systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      pos_file /var/log/gcp-journald-kubelet.pos
      read_from_head true
      tag kubelet
    </source>

    <source>
      type systemd
      filters [{ "_SYSTEMD_UNIT": "node-problem-detector.service" }]
      pos_file /var/log/gcp-journald-node-problem-detector.pos
      read_from_head true
      tag node-problem-detector
    </source>
  forward.input.conf: |-
    # Takes the messages sent over TCP
    <source>
      type forward
    </source>
  monitoring.conf: |-
    # Prometheus Exporter Plugin
    # input plugin that exports metrics
    <source>
      @type prometheus
    </source>

    <source>
      @type monitor_agent
    </source>

    # input plugin that collects metrics from MonitorAgent
    <source>
      @type prometheus_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # input plugin that collects metrics for output plugin
    <source>
      @type prometheus_output_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # input plugin that collects metrics for in_tail plugin
    <source>
      @type prometheus_tail_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      type kubernetes_metadata
    </filter>

    <match **>
       type elasticsearch
       log_level info
       include_tag_key true
       host elasticsearch-logging
       port 9200
       logstash_format true
       # Set the chunk limits.
       buffer_chunk_limit 2M
       buffer_queue_limit 8
       flush_interval 5s
       # Never wait longer than 30 seconds between retries.
       max_retry_wait 30
       # Disable the limit on the number of retries (retry forever).
       disable_retry_limit
       # Use multiple threads for processing.
       num_threads 2
    </match>
metadata:
  name: fluentd-es-config-v0.1.0
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
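
The format1 expression used for the kube component logs can be tried outside of fluentd. The following Python sketch is an illustration only: it translates the Ruby-style named groups (?<name>...) into Python's (?P<name>...) syntax and applies the pattern to the kubelet example line from the configuration. The function name parse_glog_line is an assumption made for this example.

```python
import re

# Python translation of the format1 expression from the ConfigMap.
# Ruby's (?<name>...) named groups become (?P<name>...) in Python.
GLOG_LINE = re.compile(
    r"^(?P<severity>\w)(?P<time>\d{4} [^\s]*)\s+(?P<pid>\d+)\s+"
    r"(?P<source>[^ \]]+)\] (?P<message>.*)"
)


def parse_glog_line(line: str) -> dict:
    """Return the named fields of a glog-style line, or {} if it does not match."""
    match = GLOG_LINE.match(line)
    return match.groupdict() if match else {}


# Example line taken from the kubelet section of the configuration.
sample = (
    "I0204 07:32:30.020537    3368 server.go:1048] "
    "POST /stats/container/: (13.972191ms) 200 "
    "[[Go-http-client/1.1] 10.244.1.3:40537]"
)

if __name__ == "__main__":
    for key, value in parse_glog_line(sample).items():
        print(f"{key}: {value}")
```

The fields severity, time, pid, source and message are the same ones fluentd attaches to the record before it is sent to ElasticSearch.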

fluentd-es-ds.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
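apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es-v2.0.1
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    version: v2.0.1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.1
  template: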
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.1
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: gcr.io/google-containers/fluentd-elasticsearch:v2.0.1
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: libsystemddir
          mountPath: /host/lib
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      # It is needed to copy systemd library to decompress journals
      - name: libsystemddir
        hostPath:
          path: /usr/lib64
      - name: config-volume
        configMap:
          name: fluentd-es-config-v0.1.0

kibana-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      containers:
      - name: kibana-logging

        # official image without logtrail:
        # image: docker.elastic.co/kibana/kibana:5.5.1

        # image with logtrail
        image: wardviaene/kibana-logtrail:5.5.1
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
            memory: 2.5Gi
          requests:
            cpu: 100m
            memory: 2.5Gi
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch-logging:9200
          # use this if you want to use proxy
          #- name: SERVER_BASEPATH
          #  value: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
          - name: XPACK_MONITORING_ENABLED
            value: "false"
          - name: XPACK_SECURITY_ENABLED
            value: "false"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP

kibana-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Kibana"
spec:
  ports:
  - port: 5601
    protocol: TCP
    targetPort: ui
  selector:
    k8s-app: kibana-logging
  type: LoadBalancer

Authentication

  • Currently, by default, X509 certificates are used to authenticate yourself to the kubernetes api-server
  • These certificates were issued when your cluster was set up for the first time
    • If you’re using RKE, minikube or kops, this was done for you
  • In the first Kubernetes course, I showed you how to create new users by generating new certificates and signing them with the Certificate Authority that is used by Kubernetes

Alternative authentication methods

HTTP Basic authentication

  • The kubernetes API server is based on HTTP, so one of the options is to use http basic authentication
  • HTTP Basic authentication only requires sending a username and password to the API server
  • While this is very simple to do, a username-password combination is less secure and still difficult to maintain within Kubernetes
  • To enable basic authentication, you can use a static password file on the Kubernetes master
  • The path to this static password file needs to be passed to the apiserver as an argument:
    • --basic-auth-file=/path/to/somefile
  • The file needs to be formatted as:
password,user,uid,"group1,group2,group3"
  • Basic auth has a few downsides:
    • It was only supported for convenience while the Kubernetes team worked on making the more secure methods easier to use; the --basic-auth-file flag was removed in Kubernetes 1.19
    • To add a user (or change the password), the apiserver needs to be restarted
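
The file format above is plain CSV, so a line can be generated with a CSV writer instead of by hand. The following is a minimal Python sketch, assuming the column order shown above; the helper name basic_auth_line and the example user are hypothetical.

```python
import csv
import io
from typing import Iterable


def basic_auth_line(password: str, user: str, uid: str, groups: Iterable[str] = ()) -> str:
    """Format one line of the static password file: password,user,uid,"g1,g2"."""
    row = [password, user, uid]
    group_list = list(groups)
    if group_list:
        # Several groups share one CSV column, so the writer quotes that column.
        row.append(",".join(group_list))
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")


if __name__ == "__main__":
    # Hypothetical user for illustration only.
    print(basic_auth_line("s3cret", "alice", "1001", ["dev", "ops"]))
```

Letting the CSV writer handle the quoting avoids malformed lines when a user belongs to more than one group.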

Proxy

  • Another way to handle authentication is to use a proxy
  • When using a proxy, you can handle the authentication part yourself
  • You can write your own authentication mechanism and provide the username, and groups to the kubernetes API once the user is authenticated
  • This is a good solution if Kubernetes doesn’t support the authentication method you are looking for
  • The proxy setup will need the following steps:
    • The proxy needs a client certificate signed by the certificate authority that is passed to the API server using --requestheader-client-ca-file
    • The proxy needs to handle the authentication (using a form, basic auth, or another mechanism)
    • Once the user is authenticated, the proxy needs to forward the request to the Kubernetes API server and set an HTTP header with the login
  • This login HTTP header is determined by a flag passed to the API server, for example: --requestheader-username-headers=X-Remote-User
    • In this example, the proxy needs to set the X-Remote-User header after authentication
  • --requestheader-group-headers=X-Remote-Group can be used as an argument to set the group header
  • --requestheader-extra-headers-prefix=X-Remote-Extra- allows you to set extra headers with extra information about the user
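
As a rough illustration of the last proxy step, the Python sketch below builds the identity headers a proxy could attach to the forwarded request. It assumes the header names from the example flags above; the function names and the example user are hypothetical, and the actual authentication and request forwarding are left out.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Header names taken from the example API server flags in the text.
USER_HEADER = "X-Remote-User"
GROUP_HEADER = "X-Remote-Group"
EXTRA_PREFIX = "X-Remote-Extra-"

Header = Tuple[str, str]


def strip_identity_headers(headers: List[Header]) -> List[Header]:
    """Drop identity headers sent by the client so they cannot be spoofed."""
    blocked = (USER_HEADER.lower(), GROUP_HEADER.lower())
    return [
        (name, value)
        for name, value in headers
        if name.lower() not in blocked
        and not name.lower().startswith(EXTRA_PREFIX.lower())
    ]


def identity_headers(
    user: str,
    groups: Iterable[str] = (),
    extra: Optional[Dict[str, str]] = None,
) -> List[Header]:
    """Build the headers the proxy sets once the user is authenticated."""
    headers: List[Header] = [(USER_HEADER, user)]
    # Each group is sent as its own header line with the same header name.
    headers.extend((GROUP_HEADER, group) for group in groups)
    for key, value in sorted((extra or {}).items()):
        headers.append((EXTRA_PREFIX + key, value))
    return headers


if __name__ == "__main__":
    incoming = [("Accept", "application/json"), ("X-Remote-User", "mallory")]
    outgoing = strip_identity_headers(incoming) + identity_headers("alice", ["dev", "ops"])
    for name, value in outgoing:
        print(f"{name}: {value}")
```

Stripping the incoming identity headers first matters: otherwise a client could send its own X-Remote-User header and impersonate another user through the proxy.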

OpenID

  • Another (better) alternative is to use OpenID Connect tokens
  • OpenID Connect is built on top of OAuth2
  • It allows you to securely authenticate and then receive an ID Token
  • Because the ID Token is signed (using HMAC SHA256 or RSA), it can be verified that it really originated from the authentication server
  • This token is a JSON Web Token (JWT)
    • JWT contains known fields like username and optionally groups
  • Once this token is obtained it can be used as credential to authenticate to the apiserver
  • You can pass --token=<yourtoken> when executing kubectl commands
  • kubectl can also automatically renew your ID token when it expires
    • Although this doesn’t work with all identity providers
  • Using this token, you can also authenticate to the Kubernetes UI
    • To make this easier, I created a reverse proxy that can authenticate you using OpenID Connect and will then pass your token to the UI
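
To see what such an ID Token carries, the payload of a JWT can be decoded with the Python standard library alone. The sketch below is for inspection only: it does not verify the signature (the API server does that), and the demo token, issuer URL and claim values are made up for this example.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs leave out."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Unsigned demo token with hypothetical claims, built only for this example.
demo_token = ".".join(
    [
        b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
        b64url_encode(
            json.dumps(
                {"iss": "https://example.invalid/", "sub": "alice", "groups": ["dev"]}
            ).encode()
        ),
        "",
    ]
)

if __name__ == "__main__":
    print(jwt_claims(demo_token))
```

Never trust claims decoded this way for access decisions; they only become trustworthy after the signature has been checked against the identity provider's key.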

Auth0

  • Set up Identity Provider (auth0 account)
  • Create auth0 client for Kubernetes
  • Set up cluster with OIDC (OpenID Connect) using kops edit
  • Deploy authentication server (for UI proxy + to hand out bearer tokens)
  • Change variables to match auth0
  • Create DNS record for auth server
  • Try logging in to the UI through authentication server

auth0-secrets.yml

apiVersion: v1
kind: Secret
metadata:
  name: auth0-secrets
type: Opaque
data:
  AUTH0_CLIENT_SECRET: # enter the base64-encoded auth0 client secret here
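
Values under data in a Secret have to be base64-encoded. A small Python sketch for producing such a value follows; the helper name secret_data_value and the example plaintext are assumptions for illustration, not the real auth0 secret.

```python
import base64


def secret_data_value(plaintext: str) -> str:
    """Base64-encode a value for the `data` section of a Kubernetes Secret."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Hypothetical plaintext; paste the printed value after AUTH0_CLIENT_SECRET:
    print(secret_data_value("secret"))
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the Secret can decode the value.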

auth0-deployment.yml

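apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-auth-server
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      app: kubernetes-auth-server
  template: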
    metadata:
      labels:
        app: kubernetes-auth-server
    spec:
      containers:
      - name: kubernetes-auth-server
        image: wardviaene/kubernetes-auth-server:1.0.1
        imagePullPolicy: Always
        ports:
        - name: app-port
          containerPort: 3000
        env:
          - name: AUTH0_CLIENT_ID
            value:  # change into your client id
          - name: AUTH0_DOMAIN
            value: newtechacademy.eu.auth0.com # change into your domain
          - name: AUTH0_CALLBACK_URL
            value: http://authserver.kubernetes.newtech.academy/callback # change into your callback url
          - name: AUTH0_API_ID
            value: https://newtechacademy.eu.auth0.com/userinfo # change into your identifier
          - name: AUTH0_CONNECTION
            value: Username-Password-Authentication # auth0 user database connection
          - name: KUBERNETES_UI_HOST
            value: api.kubernetes.newtech.academy
          - name: APP_HOST
            value: authserver.kubernetes.newtech.academy
          - name: AUTH0_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                name: auth0-secrets
                key: AUTH0_CLIENT_SECRET

auth0-service.yml

apiVersion: v1
kind: Service
metadata:
  name: kubernetes-auth-server
spec:
  ports:
  - port: 80
    targetPort: app-port
    protocol: TCP
  selector:
    app: kubernetes-auth-server
  type: LoadBalancer

Authorization

  • After authentication, authorization controls what the user can do and which resources the user has access to
  • The access controls are implemented on an API level (kube-apiserver)
  • When an API request comes in (e.g. when you enter kubectl get nodes), it will be checked to see whether you have access to execute this command
  • There are multiple authorization modules available:
    • Node: a special purpose authorization mode that authorizes API requests made by kubelets
    • ABAC: attribute-based access control
      • Access rights are controlled by policies that combine attributes
      • e.g. user “alice” can do anything in namespace “marketing”
      • ABAC does not allow very granular permission control
    • RBAC: role-based access control
      • Regulates access using roles
      • Allows admins to dynamically configure permission policies
      • Beta since Kubernetes 1.6, stable since Kubernetes 1.8
    • Webhook: sends authorization requests to an external REST interface
      • Interesting option if you want to write your own authorization server
      • You can parse the incoming payload (which is JSON) and reply with access granted or access denied
  • To enable an authorization mode, you need to pass --authorization-mode= to the API server at startup
  • For example, to enable RBAC, you pass --authorization-mode=RBAC
  • When using kops, you can create a cluster with the flag --authorization
    • By default (when not specifying the flag), you are allowing everything (same as specifying AlwaysAllow)
      • This might change in the future
    • When specifying --authorization RBAC, a cluster will be created using RBAC
  • If your cluster is already set up with kops, you can use kops edit cluster and change these lines:
kubeAPIServer:
  authorizationMode: RBAC
  • If you’re using minikube, you can add a parameter when starting minikube:
$ minikube start --extra-config=apiserver.Authorization.Mode=RBAC
  • If you’re using another tool to set up the cluster, you’ll have to refer to the documentation of that tool
  • If you set up the cluster manually, you have to take a look at the boot scripts (e.g. systemd)
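
For the Webhook mode, the API server sends a SubjectAccessReview object as JSON and expects the same kind of object back with status.allowed filled in. The Python sketch below shows only the decision logic, without an HTTP server. The policy (user alice may do anything in namespace marketing) reuses the ABAC example above, and the function name and denial reason are hypothetical.

```python
import json


def review(body: str) -> str:
    """Answer a SubjectAccessReview request with an allow/deny decision."""
    request = json.loads(body)
    spec = request.get("spec", {})
    attributes = spec.get("resourceAttributes", {})

    # Hypothetical policy: alice may do anything in the marketing namespace.
    allowed = spec.get("user") == "alice" and attributes.get("namespace") == "marketing"

    response = {
        "apiVersion": request.get("apiVersion", "authorization.k8s.io/v1"),
        "kind": "SubjectAccessReview",
        "status": {"allowed": allowed},
    }
    if not allowed:
        response["status"]["reason"] = "no matching policy"
    return json.dumps(response)


if __name__ == "__main__":
    example = {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": "alice",
            "resourceAttributes": {
                "namespace": "marketing",
                "verb": "get",
                "resource": "pods",
            },
        },
    }
    print(review(json.dumps(example)))
```

A real webhook would wrap this function in an HTTPS endpoint that the API server is configured to call.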

RBAC

  • You can add RBAC resources with kubectl to grant permissions
    • You first describe them in yaml format, then apply them to the cluster
  • First you define a role, then you can assign users/groups to that role
  • You can create roles limited to a namespace, or you can create roles where the access applies to all namespaces
  • The different kinds of resources I’m going to show you next are:
    • Role (single namespace) and ClusterRole (cluster-wide)
    • RoleBinding (single namespace) and ClusterRoleBinding (cluster-wide)
  • RBAC Role granting read access to pods and secrets within default namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "secrets"]
  verbs: ["get", "watch", "list"]
  • If you rather want to create a role that spans all namespaces, you can use ClusterRole
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader-clusterwide
rules:
- apiGroups: [""]
  resources: ["pods", "secrets"]
  verbs: ["get", "watch", "list"]
  • Next step is to assign users to the newly created role
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
  • If you need to assign a user to a cluster-wide role, you need to use ClusterRoleBinding
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: pod-reader-clusterwide
  apiGroup: rbac.authorization.k8s.io
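Subjects are not limited to individual users. As a minimal sketch, the RoleBinding below grants the pod-reader Role defined earlier to a group and to a service account. The names developers and ci-bot are hypothetical, and which group names actually exist depends on how authentication is set up on your cluster.

```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods-team          # hypothetical name
  namespace: default
subjects:
- kind: Group
  name: developers              # hypothetical group, as reported by the authenticator
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: ci-bot                  # hypothetical service account
  namespace: default            # service account subjects take a namespace, not an apiGroup
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```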

RBAC Role

  • A more complex role:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: more-complex-role
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets", "nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Pre-Defined RBAC Roles

  • You don’t need to create your own complex roles; you can also use one of the predefined ones:
Role name       Description
cluster-admin   Superuser access. Can be used with a ClusterRoleBinding (superuser access on the whole cluster) or with a RoleBinding to limit access to a single namespace
admin           Admin access, intended to be used only with a RoleBinding. Has read/write access within the namespace, but can’t change quotas
edit            Read/write access to most objects within a namespace, but can’t view or create roles or rolebindings
view            Read access, but can’t see any secrets, roles, or rolebindings
  • Superuser example:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: admins
  namespace: default
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
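The other predefined roles are bound in the same way. As a sketch, the RoleBinding below gives a user read-only access to a single namespace through the predefined view role. The binding name and the marketing namespace are assumptions made for illustration.

```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: marketing-viewers       # hypothetical name
  namespace: marketing          # hypothetical namespace; access is limited to it
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole             # the predefined roles are ClusterRoles
  name: view
  apiGroup: rbac.authorization.k8s.io
```

Because a RoleBinding is used rather than a ClusterRoleBinding, the permissions of the view role apply only inside the binding's namespace.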

Packaging with Helm

  • Helm is the package manager for Kubernetes
  • Helm helps you to manage Kubernetes applications
  • Helm is maintained by the CNCF - The Cloud Native Computing Foundation (together with Kubernetes, fluentd, linkerd, and others)
  • Helm was started by Google and Deis
  • Deis provides a PaaS on Kubernetes and was bought by Microsoft in April 2017

Charts

  • Helm uses a packaging format called charts
    • A chart is a collection of files that describe a set of Kubernetes resources
    • A single chart can deploy an app, a piece of software or a database for example
    • It can have dependencies, e.g. the wordpress chart depends on a mysql chart
    • You can write your own chart to deploy your application on Kubernetes using helm
  • Charts use templates that are typically developed by a package maintainer
  • They will generate yaml files that Kubernetes understands
  • You can think of templates as dynamic yaml files, which can contain logic and variables
  • This is an example of a template within a chart:
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  myvalue: "Hello World"
  drink: {{ .Values.favoriteDrink }}
  • The favoriteDrink value can then be overridden by the user when running helm install
  • Overriding values can be useful to make sure the app is configured in a way you want
Parameter                  Description                       Default
mysqlRootPassword          Password for the root user        nil
mysqlUser                  Username of new user to create    nil
mysqlPassword              Password for the new user         nil
mysqlDatabase              Name for new database to create   nil
persistence.enabled        Create a volume to store data     true
persistence.size           Size of persistent volume claim   8Gi RW
persistence.storageClass   Type of persistent volume claim   nil (uses alpha storage class annotation)
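To make the override mechanism concrete, here is a small sketch for the ConfigMap template shown earlier. The file name my-values.yaml and the release name demo are assumptions. The file would be passed with helm install -f my-values.yaml, or the same value could be set with --set favoriteDrink=coffee.

```yaml
# my-values.yaml (hypothetical file name): overrides the chart's default value
favoriteDrink: coffee
---
# What the ConfigMap template would render to, assuming a release named "demo"
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-configmap          # {{ .Release.Name }}-configmap
data:
  myvalue: "Hello World"
  drink: coffee                 # {{ .Values.favoriteDrink }}
```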

The Job Resource

  • Up until now we’ve always seen pods as long running services that don’t stop (like webservers)
  • You can also schedule pods as Jobs rather than with a ReplicationController / ReplicaSet
  • With a ReplicationController / ReplicaSet a pod will be indefinitely restarted if the pod stops
  • With the Job resource, pods are expected to run a specific task and then exit
  • There are 3 main types of jobs:
    1. Non-parallel Jobs
    2. Parallel Jobs with fixed completion count
    3. Parallel Jobs with work queue
  • Non-parallel Jobs:
    • The Job resource will monitor the job, and restart the pod if it fails or gets deleted
    • You can still influence this with a restartPolicy attribute
    • When the pod successfully completes, the job itself will be completed
  • An example of a non-parallel job:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  • Parallel jobs with a fixed completion count:
    • In this case a Job can run multiple pods in parallel
    • You specify a fixed completion count
    • The Job is complete when the number of successfully exited pods equals the completion count
    • To use this, add completions: <count> to the spec of your Job
  • Parallel jobs with a work queue:
    • In this case the pods should coordinate among themselves (or with an external service) to determine what each should work on
    • When any pod terminates with success, no more pods are created and the job is considered completed
      • This is because the pods themselves should know when all the work is done: once one pod terminates with success, all other pods should terminate too, because the Job is complete
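As a sketch of a parallel Job with a fixed completion count, the manifest below reuses the pi container from the non-parallel example. The Job name and the numbers 6 and 2 are arbitrary choices for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel             # hypothetical name
spec:
  completions: 6                # the Job is complete after 6 pods have exited successfully
  parallelism: 2                # run at most 2 pods at the same time
  template:
    metadata:
      name: pi-parallel
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```

For the work queue variant, completions is left unset and only parallelism is specified; the worker containers then need their own logic for pulling items from the queue.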

Scheduling Using the CronJob Resource

  • A CronJob can schedule Job resources based on time
    • Once at a specified time
      • e.g. Run this Job once, this night at 3 AM
    • Recurrently
      • e.g. Run this Job every night at 3 AM
  • CronJob is comparable with crontab in Linux/Unix systems
  • CronJob Schedule format (Cron notation, see also https://en.wikipedia.org/wiki/Cron)

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday; 7 is also Sunday on some systems)
│ │ │ │ │
│ │ │ │ │
* * * * *

Example 1: Every night at 3:20 AM
20 3 * * *

Example 2: Every 15 minutes
*/15 * * * *
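A few more schedule strings, read field by field, may help when writing your own. The schedules key below is not a Kubernetes field; it is only a labelled list for illustration, and each quoted string is a candidate value for a CronJob's spec.schedule.

```yaml
# Illustrative only: "schedules" and its keys are made up for this example
schedules:
  weekdays-at-09-00: "0 9 * * 1-5"         # minute 0, hour 9, Monday to Friday
  every-six-hours: "0 */6 * * *"           # minute 0 of hours 0, 6, 12 and 18
  first-of-month-midnight: "0 0 1 * *"     # 00:00 on day 1 of every month
  sundays-at-02-30: "30 2 * * 0"           # minute 30, hour 2, Sunday
```

The quotes matter in YAML: an unquoted value that starts with * would be parsed as an alias rather than as a string.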

CronJob

  • An example of a cronjob:
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "25 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-cronjob
            image: busybox
            args:
            - /bin/sh
            - -c
            - echo This command runs every night at 3:25 AM
          restartPolicy: OnFailure

Deploying on Kubernetes Using Spinnaker

  • Spinnaker is a Continuous Delivery platform
  • It can automate deployments on different cloud providers:
    • AWS EC2, Google Compute/Container/App engine, Azure, Openstack, and Kubernetes
  • Works only for immutable (cloud-native) apps
  • Created at Netflix
  • Also integrates with Continuous Integration tools (Jenkins, Travis), with monitoring tools, and provides different deployment strategies
  • Spinnaker has 2 core sets of features:
    • Cluster Management:
      • View and manage your cluster resources in the cloud (or in this case, within Kubernetes)
    • Deployment Management:
      • Create and manage continuous delivery workflows
      • This is where you create the delivery pipeline
  • Deployment strategies (source: https://www.spinnaker.io/concepts):
_images/spinnaker1.jpg _images/spinnaker2.jpg

Terminology

  • The terminology in Spinnaker is a bit different from the terminology used in Kubernetes:
    • Account: maps to a credential able to authenticate against Kubernetes, and docker registries where your images are stored
    • Instance: maps to a Kubernetes pod
    • Server Group: maps to a Replica Set
    • Cluster: a “Spinnaker deployment” on Kubernetes
      • Spinnaker uses its own orchestration and is different from the Deployment on Kubernetes
    • Load Balancer: maps to a Kubernetes Service

Installation

spinnaker.yaml

# Define which registries and repositories you want available in your
# Spinnaker pipeline definitions
# For more info visit:
#   http://www.spinnaker.io/v1.0/docs/target-deployment-configuration#section-docker-registry

# Configure your Docker registries here
accounts:
- name: dockerhub
  address: https://index.docker.io
  repositories:
    - library/alpine
    - library/ubuntu
    - library/centos
    - library/nginx
    - wardviaene/spinnaker-node-demo
# - name: gcr
#   address: https://gcr.io
#   username: _json_key
#   password: '<INSERT YOUR SERVICE ACCOUNT JSON HERE>'
#   email: 1234@5678.com

# Settings for notifications via email
# For more info visit:
#   http://www.spinnaker.io/docs/notifications-and-events-guide#section-email
mail:
  enabled: false
  host: smtp.example.org
  username: admin
  password: admin
  fromAddress: spinnaker@example.org
  port: 25

# Images for each component
images:
  clouddriver: gcr.io/spinnaker-marketplace/clouddriver:0.5.0-72
  echo: gcr.io/spinnaker-marketplace/echo:0.4.0-72
  deck: gcr.io/spinnaker-marketplace/deck:1.3.0-72
  igor: gcr.io/spinnaker-marketplace/igor:0.4.0-72
  orca: gcr.io/spinnaker-marketplace/orca:0.5.0-72
  gate: gcr.io/spinnaker-marketplace/gate:0.5.0-72
  front50: gcr.io/spinnaker-marketplace/front50:0.4.1-72
  rosco: gcr.io/spinnaker-marketplace/rosco:0.4.0-72

# Change this if you'd like to expose Spinnaker outside the cluster
deck:
  host: localhost
  port: 9000
  protocol: http

gate:
  allowedOriginsPattern: '^https?://(?:localhost|127.0.0.1|[^/]+\.example\.com)(?::[1-9]\d*)?/?$'

# Bucket to use when storing config data in S3 compatible storage
storageBucket: demo-spinnaker

# Change service type for UI service
serviceType: ClusterIP

# Resources to provide to each of
# the Spinnaker components
resources:
  limits:
    cpu: 1000m
    memory: 1280Mi
  requests:
    cpu: 1000m
    memory: 1280Mi

# Redis password to use for the in-cluster redis service
# Redis is not exposed publicly
redis:
  redisPassword: password

# Minio access/secret keys for the in-cluster S3 usage
# Minio is not exposed publicly
minio:
  enabled: true
  imageTag: RELEASE.2016-11-26T02-23-47Z
  serviceType: ClusterIP
  accessKey: spinnakeradmin
  secretKey: spinnakeradmin

gcs:
  enabled: false
  project: my-project-name
  jsonKey: '<INSERT CLOUD STORAGE JSON HERE>'

# Configuration for the Jenkins instance that is attached to the
# igor component of Spinnaker. For more info about the Jenkins integration
# with Spinnaker, visit:
#   http://www.spinnaker.io/docs/jenkins-script-execution-stage
jenkins:
  Master:
    ImageTag: 2.62
    Cpu: "500m"
    Memory: "512Mi"
    ServiceType: ClusterIP
    CustomConfigMap: true
    InstallPlugins:
      - kubernetes:0.11
      - workflow-aggregator:2.5
      - workflow-job:2.11
      - credentials-binding:1.12
      - git:3.2.0

  Agent:
    Image: viglesiasce/spinnaker-jenkins-agent
    ImageTag: v0.2.0
    Cpu: "500m"
    Memory: "512Mi"
  • Install with
helm install --name spinnaker -f spinnaker.yaml stable/spinnaker

Linkerd

  • On Kubernetes you can run a lot of microservices on one cluster
  • It can quickly become difficult to manage the endpoints of all the different services that make up an application within the cluster:
    • Service discovery in Kubernetes is pretty limited
    • Routing is often round-robin based
    • There is no failure handling
      • Except removing pods that fail their healthcheck
    • It’s also difficult to visualize the different services
  • HTTP APIs can be very basic and there’s no failure handling:
    • What happens when app A sends an HTTP GET request to app B, but B is temporarily unavailable?
    • If retry logic is not written into the app, the request will simply fail and will not be retried
  • Linkerd can solve these issues for us, and provides many more features
  • Linkerd is a transparent proxy that adds
    • service discovery
    • routing
      • Latency aware load balancing
      • It can shift traffic to do canary deployments
    • failure handling
      • Using retries, deadlines and circuit breaking
    • and visibility
      • Using web UIs
  • How do the pods find linkerd?

node-name-test.yml

apiVersion: v1
kind: Pod
metadata:
  name: node-name-test
spec:
  restartPolicy: Never
  containers:
  - image: gcr.io/google_containers/busybox
    command: [ "sh", "-c" ]
    args:
      - while true; do
          echo -en '\n';
          nslookup $MY_NODE_NAME;
          echo -en '\n';
          printenv MY_NODE_NAME MY_POD_NAME MY_POD_NAMESPACE;
          printenv MY_POD_IP MY_POD_SERVICE_ACCOUNT;
          sleep 60;
        done;
    name: node-name
    env:
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: MY_POD_SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          fieldPath: spec.serviceAccountName
  • hello-world.yml implementation:
spec:
  dnsPolicy: ClusterFirst
  containers:
  - name: service
    image: buoyantio/helloworld:0.1.4
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: http_proxy
      value: $(NODE_NAME):4140
    args:
    - "-addr=:7777"
    - "-text=Hello"
    - "-target=world"
    ports:
    - name: service
      containerPort: 7777

Install linkerd

linkerd.yml

# runs linkerd in a daemonset, in linker-to-linker mode
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9990

    namers:
    - kind: io.l5d.k8s
      experimental: true
      host: localhost
      port: 8001

    telemetry:
    - kind: io.l5d.prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 0.25

    usage:
      orgId: linkerd-examples-daemonset

    routers:
    - protocol: http
      label: outgoing
      dtab: |
        /srv        => /#/io.l5d.k8s/default/http;
        /host       => /srv;
        /svc        => /host;
        /host/world => /srv/world-v1;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: incoming
          service: l5d
      servers:
      - port: 4140
        ip: 0.0.0.0
      service:
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX

    - protocol: http
      label: incoming
      dtab: |
        /srv        => /#/io.l5d.k8s/default/http;
        /host       => /srv;
        /svc        => /host;
        /host/world => /srv/world-v1;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
      servers:
      - port: 4141
        ip: 0.0.0.0
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
spec:
  template:
    metadata:
      labels:
        app: l5d
    spec:
      volumes:
      - name: l5d-config
        configMap:
          name: "l5d-config"
      containers:
      - name: l5d
        image: buoyantio/linkerd:1.2.0
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 4140
          hostPort: 4140
        - name: incoming
          containerPort: 4141
        - name: admin
          containerPort: 9990
        volumeMounts:
        - name: "l5d-config"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true

      - name: kubectl
        image: buoyantio/kubectl:v1.4.0
        args:
        - "proxy"
        - "-p"
        - "8001"
---
apiVersion: v1
kind: Service
metadata:
  name: l5d
spec:
  selector:
    app: l5d
  type: LoadBalancer
  ports:
  - name: outgoing
    port: 4140
  - name: incoming
    port: 4141
  - name: admin
    port: 9990
kubectl apply -f linkerd.yml
INGRESS_LB=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}")
echo http://$INGRESS_LB:9990

Visualization:

linkerd-viz.yml

---
apiVersion: v1
kind: ReplicationController
metadata:
  name: linkerd-viz
  labels:
    name: linkerd-viz
spec:
  replicas: 1
  selector:
    name: linkerd-viz
  template:
    metadata:
      labels:
        name: linkerd-viz
    spec:
      containers:
      - name: linkerd-viz
        image: buoyantio/linkerd-viz:0.1.5
        args: ["k8s"]
        imagePullPolicy: Always
        env:
        - name: PUBLIC_PORT
          value: "3000"
        - name: STATS_PORT
          value: "9191"
        - name: SCRAPE_INTERVAL
          value: "30s"
        ports:
        - name: grafana
          containerPort: 3000
        - name: prometheus
          containerPort: 9191

      - name: kubectl
        image: buoyantio/kubectl:v1.4.0
        args:
        - "proxy"
        - "-p"
        - "8001"
---
apiVersion: v1
kind: Service
metadata:
  name: linkerd-viz
  labels:
    name: linkerd-viz
spec:
  type: LoadBalancer
  ports:
  - name: grafana
    port: 80
    targetPort: 3000
  - name: prometheus
    port: 9191
    targetPort: 9191
  selector:
    name: linkerd-viz
kubectl apply -f linkerd-viz.yml
VIZ_INGRESS_LB=$(kubectl get svc linkerd-viz -o jsonpath="{.status.loadBalancer.ingress[0].*}")
echo http://$VIZ_INGRESS_LB

Example app

hello-world.yml

---
apiVersion: v1
kind: ReplicationController
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      dnsPolicy: ClusterFirst
      containers:
      - name: service
        image: buoyantio/helloworld:0.1.4
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: http_proxy
          value: $(NODE_NAME):4140
        args:
        - "-addr=:7777"
        - "-text=Hello"
        - "-target=world"
        ports:
        - name: service
          containerPort: 7777
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  clusterIP: None
  ports:
  - name: http
    port: 7777
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: world-v1
spec:
  replicas: 3
  selector:
    app: world-v1
  template:
    metadata:
      labels:
        app: world-v1
    spec:
      dnsPolicy: ClusterFirst
      containers:
      - name: service
        image: buoyantio/helloworld:0.1.4
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: TARGET_WORLD
          value: world
        args:
        - "-addr=:7778"
        ports:
        - name: service
          containerPort: 7778
---
apiVersion: v1
kind: Service
metadata:
  name: world-v1
spec:
  selector:
    app: world-v1
  clusterIP: None
  ports:
  - name: http
    port: 7778
kubectl create -f hello-world.yml
http_proxy=$INGRESS_LB:4140 curl -s http://hello

Examples are from https://github.com/linkerd/linkerd-examples/tree/master/k8s-daemonset/k8s/

Federation

  • Federation is still in alpha stage, so use it at your own risk. One project worth following is Kubernetes Multi Cluster (Federation v2), which targets 2018 Q4 for a beta release (https://github.com/kubernetes/community/tree/master/sig-multicluster).
  • Federation can be used to manage multiple clusters:
    • You can sync resources across clusters
      • The same deployment version will run on cluster A and cluster B
    • You can do cross cluster discovery
      • One DNS record / Virtual IP (VIP) for a resource spanning multiple clusters
      • Can help to achieve High Availability (spreading load across clusters, enabling failover between clusters)
  • A few reasons to run multiple clusters:
    • Lower latency for customers (bringing app geographically closer to customer)
    • Fault isolation when physical hardware (or a rack / full zone) fails
    • Scale beyond the limitations of one cluster
    • Run clusters in a hybrid cloud environment
      • One cluster can run on-premises, with a failover to cloud

Monitoring with Prometheus

  • Prometheus is an open source monitoring and alerting tool
    • You can compare it to the heapster + grafana setup from the first Kubernetes course, but Prometheus can do a lot more
  • It was originally built at SoundCloud
  • Many companies and organizations have adopted it since
  • It is now a standalone and open source project
  • Prometheus joined the Cloud Native Computing Foundation in 2016
  • Prometheus provides a multi-dimensional data model
    • Time series are identified by metric name and key/value pairs
  • It has a flexible query language you can use
  • There is no distributed storage necessary
  • Metric collection happens via a pull model over HTTP
  • Push is also supported using a gateway
  • Service discovery is supported
  • Web UI (dashboard) with graphing capabilities
_images/prometheus1.jpg

Installation

  • Set up permissions for Prometheus

rbac.yml

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - "*"
- apiGroups:
  - monitoring.coreos.com
  resources:
  #- alertmanagers
  - prometheuses
  - servicemonitors
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - nodes
  - pods
  - resourcequotas
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: default
kubectl create -f rbac.yml
  • Deploy Prometheus

prometheus-resource.yml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  replicas: 2
  version: v1.7.0
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchExpressions:
    - {key: prometheus-enabled, operator: Exists}
  #ruleSelector:
  #  matchLabels:
  #    role: prometheus-rulefiles
  #    prometheus: k8s
  resources:
    requests:
      # 2Gi is default, but won't schedule if you don't have a node with >2Gi
      # memory. Modify based on your target and time-series count for
      # production use. This value is mainly meant for demonstration/testing
      # purposes.
      memory: 400Mi
  #alerting:
  #  alertmanagers:
  #  - namespace: monitoring
  #    name: alertmanager-main
  #    port: web

prometheus.yml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.17.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: LoadBalancer
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
kubectl create -f prometheus.yml
kubectl create -f prometheus-resource.yml
  • Deploy Kubernetes monitoring

kubernetes-monitoring.yml

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-prometheus-discovery
  labels:
    k8s-app: kube-scheduler
    prometheus-enabled: "true"
spec:
  selector:
    k8s-app: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager-prometheus-discovery
  labels:
    k8s-app: kube-controller-manager
    prometheus-enabled: "true"
spec:
  selector:
    k8s-app: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - image: quay.io/prometheus/node-exporter:v0.14.0
        args:
        - "-collector.procfs=/host/proc"
        - "-collector.sysfs=/host/sys"
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
        resources:
          requests:
            memory: 30Mi
            cpu: 100m
          limits:
            memory: 50Mi
            cpu: 200m
        volumeMounts:
        - name: proc
          readOnly: true
          mountPath: /host/proc
        - name: sys
          readOnly: true
          mountPath: /host/sys
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: node-exporter
    k8s-app: node-exporter
    prometheus-enabled: "true"
  name: node-exporter
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.0.0
        ports:
        - name: metrics
          containerPort: 8080
        resources:
          requests:
            memory: 100Mi
            cpu: 100m
          limits:
            memory: 200Mi
            cpu: 200m
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-apiserver
  labels:
    k8s-app: apiserver
    prometheus-enabled: "true"
spec:
  jobLabel: component
  selector:
    matchLabels:
      component: apiserver
      provider: kubernetes
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: https
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
    prometheus-enabled: "true"
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  - port: cadvisor
    interval: 30s
    honorLabels: true
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
    prometheus-enabled: "true"
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
  namespaceSelector:
    matchNames:
    - kube-system
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
    prometheus-enabled: "true"
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  selector:
    matchLabels:
      k8s-app: kube-scheduler
  namespaceSelector:
    matchNames:
    - kube-system
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  labels:
    k8s-app: kube-state-metrics
    prometheus-enabled: "true"
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: http-metrics
    interval: 30s
    honorLabels: true
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  labels:
    k8s-app: node-exporter
    prometheus-enabled: "true"
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: node-exporter
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: http-metrics
    interval: 30s
kubectl create -f kubernetes-monitoring.yml
  • Deploy example application

example-app.yml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
    prometheus-enabled: "true"
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
    prometheus-enabled: "true"
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web