Seldon Core Basics: Installing Istio and Seldon Core
I. Installing Istio
1. Istio
Istio is an open-source service mesh. If you are not familiar with the term "service mesh", it is worth reading a bit more about Istio.
Seldon Core can be used in combination with Istio. Istio provides an ingress gateway that Seldon Core can automatically connect new deployments to. The steps for using Istio are described below.
1.1 Download
For Linux and macOS, the easiest way to download Istio is with the following command:
# curl -L https://istio.io/downloadIstio | sh -
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.11.4 sh -
# curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.6.8 TARGET_ARCH=x86_64 sh -
# If that URL is unreachable, fetch the release directly from GitHub instead
wget --no-check-certificate https://github.com/istio/istio/releases/download/1.11.4/istio-1.11.4-linux-amd64.tar.gz
tar -zxvf istio-1.11.4-linux-amd64.tar.gz -C /opt/modules/
[root@centos03 istio]# cd /opt/modules/istio-1.11.4/
[root@centos03 istio-1.11.4]# ls -l
total 28
drwxr-x---.  2 root root    22 Oct 13 22:50 bin
-rw-r--r--.  1 root root 11348 Oct 13 22:50 LICENSE
drwxr-xr-x.  5 root root    52 Oct 13 22:50 manifests
-rw-r-----.  1 root root   854 Oct 13 22:50 manifest.yaml
-rw-r--r--.  1 root root  5866 Oct 13 22:50 README.md
drwxr-xr-x. 21 root root  4096 Oct 13 22:50 samples
drwxr-xr-x.  3 root root    57 Oct 13 22:50 tools
Change into the package directory:
cd istio-1.11.4
Add the istioctl client to your PATH (Linux or macOS):
export PATH=$PWD/bin:$PATH
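To make this persist across shells and to confirm the client works, something like the following does the job (the ~/.bashrc line is optional):
# optional: make the PATH change permanent for this user
echo 'export PATH=/opt/modules/istio-1.11.4/bin:$PATH' >> ~/.bashrc
# confirm the client runs
istioctl version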
1.2 Install Istio
Istio provides a command-line tool, istioctl, to simplify the installation process. The demo configuration profile has a good set of defaults suitable for a local cluster.
istioctl install --set profile=demo -y
Running it:
[root@centos03 istio-1.11.4]# istioctl install --set profile=demo -y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Egress gateways installed
✔ Installation complete
Thank you for installing Istio 1.11. Please take a few minutes to tell us about your install/upgrade experience! https://forms.gle/kWULBRjUv7hHci7T6
[root@centos03 istio-1.11.4]#
Check the Istio pods:
kubectl get pods -n istio-system
If a pod fails to start, describe it to see why:
[root@k8s-master01 ~]# kubectl describe pods istiod-b498dc7b7-9vmw9 -n istio-system
Name:         istiod-b498dc7b7-9vmw9
Namespace:    istio-system
Priority:     0
Node:         k8s-node02/11.0.1.21
Start Time:   Thu, 15 Sep 2022 08:26:44 -0700
Labels:       app=istiod
              install.operator.istio.io/owning-resource=unknown
              istio=pilot
              istio.io/rev=default
              operator.istio.io/component=Pilot
              pod-template-hash=b498dc7b7
              sidecar.istio.io/inject=false
Annotations:  cni.projectcalico.org/containerID: 5a22fc2322db8d8732581a9c6ea6ed1d37524aa317a57f102a4e0b73565f850d
              cni.projectcalico.org/podIP: 10.233.123.37/32
              cni.projectcalico.org/podIPs: 10.233.123.37/32
              prometheus.io/port: 15014
              prometheus.io/scrape: true
              sidecar.istio.io/inject: false
Status:       Pending
IP:           10.233.123.37
IPs:
  IP:  10.233.123.37
Controlled By:  ReplicaSet/istiod-b498dc7b7
Containers:
  discovery:
    Container ID:
    Image:       docker.io/istio/pilot:1.11.4
    Image ID:
    Ports:       8080/TCP, 15010/TCP, 15017/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      CLUSTER_ID: Kubernetes
    Mounts:
      /etc/cacerts from cacerts (ro)
      /var/run/secrets/istio-dns from local-certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from istiod-token-96jb6 (ro)
      /var/run/secrets/remote from istio-kubeconfig (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  local-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  cacerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cacerts
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       17m                  default-scheduler  Successfully assigned istio-system/istiod-b498dc7b7-9vmw9 to k8s-node02
  Warning  Failed          8m10s                kubelet            Failed to pull image "docker.io/istio/pilot:1.11.4": rpc error: code = Unknown desc = context canceled
  Warning  Failed          8m10s                kubelet            Error: ErrImagePull
  Normal   SandboxChanged  8m10s                kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   BackOff         8m6s (x3 over 8m8s)  kubelet            Back-off pulling image "docker.io/istio/pilot:1.11.4"
  Warning  Failed          8m6s (x3 over 8m8s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling         7m53s (x2 over 17m)  kubelet            Pulling image "docker.io/istio/pilot:1.11.4"
[root@k8s-master01 ~]#
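The events above show the pod stuck in ErrImagePull/ImagePullBackOff against docker.io. A common workaround, sketched here under the assumption that the node runs Docker, is to pre-pull the image on the affected node and let the kubelet retry:
# on the node that failed the pull (k8s-node02 in this transcript)
docker pull docker.io/istio/pilot:1.11.4
# then watch the pod come up
kubectl get pods -n istio-system -w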
The namespace label istio-injection=enabled tells Istio to automatically inject sidecar proxies alongside anything we deploy in that namespace. We will set it on the default namespace:
kubectl label namespace default istio-injection=enabled
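You can confirm the label took effect by listing namespaces with the label shown as a column:
kubectl get namespace -L istio-injection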
1.3 Create an Istio Gateway
For Seldon Core to use Istio's features to manage cluster traffic, we need to create an Istio Gateway by running the following command:
kubectl apply -f - << END
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: seldon-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
END
Running it:
[root@centos03 istio-1.11.4]# kubectl apply -f - << END
> apiVersion: networking.istio.io/v1alpha3
> kind: Gateway
> metadata:
>   name: seldon-gateway
>   namespace: istio-system
> spec:
>   selector:
>     istio: ingressgateway # use istio default controller
>   servers:
>   - port:
>       number: 80
>       name: http
>       protocol: HTTP
>     hosts:
>     - "*"
> END
gateway.networking.istio.io/seldon-gateway created
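You can verify that the Gateway resource exists with:
kubectl get gateway -n istio-system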
II. Installing Seldon Core on Kubernetes
1. Requirements
Installation requirements:
- Kubernetes >= 1.18
- Helm >= 3.0
- Istio >= 1.5
[root@centos03 ~]# helm version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}
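The Kubernetes and Istio versions can be checked the same way:
kubectl version --short   # server version should be >= 1.18
istioctl version          # should be >= 1.5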
2. Install Seldon Core with Helm
2.1 Create the namespace
kubectl create namespace seldon-system
[root@centos03 ~]# kubectl create namespace seldon-system
namespace/seldon-system created
2.2 Install
Now we can install Seldon Core in the seldon-system namespace.
helm install seldon-core seldon-core-operator \
--repo https://storage.googleapis.com/seldon-charts \
--set usageMetrics.enabled=true \
--set istio.enabled=true \
--namespace seldon-system
A successful deployment looks like:
[root@centos03 ~]# helm install seldon-core seldon-core-operator \
> --repo https://storage.googleapis.com/seldon-charts \
> --set usageMetrics.enabled=true \
> --set istio.enabled=true \
> --namespace seldon-system
NAME: seldon-core
LAST DEPLOYED: Fri Dec 3 08:33:58 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
You can check whether your Seldon controller is running with:
kubectl get pods -n seldon-system
You should see a seldon-controller-manager pod with STATUS=Running.
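For example (the pod hash, restarts, and age will differ on your cluster; this output is illustrative):
NAME                                        READY   STATUS    RESTARTS   AGE
seldon-controller-manager-xxxxxxxxx-xxxxx   1/1     Running   0          2m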
2.3 Remove a failed seldon-core deployment
Find the name of the deployment that failed to start in the seldon-system namespace:
[root@centos03 istio-1.11.4]# kubectl get deployment -n seldon-system
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
seldon-controller-manager   0/1     1            0           86m
Delete the deployment:
[root@centos03 istio-1.11.4]# kubectl delete deployment seldon-controller-manager -n seldon-system
deployment.apps "seldon-controller-manager" deleted
[root@centos03 istio-1.11.4]#
Reinstall:
[root@centos03 ~]# helm install seldon-core seldon-core-operator \
--repo https://storage.googleapis.com/seldon-charts \
--set usageMetrics.enabled=true \
--set istio.enabled=true \
--namespace seldon-system
Reinstalling seldon-core then fails with the following error: Error: cannot re-use a name that is still in use
The fix is as follows:
helm ls --all-namespaces
[root@centos03 ~]# helm ls --all-namespaces
NAME                  NAMESPACE                     REVISION  UPDATED                                  STATUS    CHART                        APP VERSION
notification-manager  kubesphere-monitoring-system  1         2021-10-09 19:43:37.02050295 +0800 CST   deployed  notification-manager-1.0.0   1.0.0
seldon-core           seldon-system                 1         2021-12-03 08:33:58.992570322 +0800 CST  deployed  seldon-core-operator-1.11.2  1.11.2
snapshot-controller   kube-system                   8         2021-12-03 08:28:11.818233147 +0800 CST  deployed  snapshot-controller-0.1.0    2.1.1
[root@centos03 ~]#
[root@centos03 ~]# kubectl delete namespace seldon-system
namespace "seldon-system" deleted
[root@centos03 ~]# kubectl create namespace seldon-system
[root@centos03 ~]# helm install seldon-core seldon-core-operator \
--repo https://storage.googleapis.com/seldon-charts \
--set usageMetrics.enabled=true \
--set istio.enabled=true \
--namespace seldon-system
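Deleting the namespace works here because Helm 3 stores its release records as Secrets inside the release's namespace, so they are removed along with it. A less destructive alternative that removes only the release is:
helm uninstall seldon-core -n seldon-system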
After reinstalling, the pod stays stuck in Pending status. Check it:
# kubectl get pods -n seldon-system
kubectl describe pod seldon-controller-manager-7b77d5988-7qnkk -n seldon-system
Describing the pod shows:
[root@centos03 ~]# kubectl describe pod seldon-controller-manager-7b77d5988-7qnkk -n seldon-system
Name:           seldon-controller-manager-7b77d5988-7qnkk
Namespace:      seldon-system
Priority:       0
Node:           <none>
Labels:         app=seldon
                app.kubernetes.io/instance=seldon1
                app.kubernetes.io/name=seldon
                app.kubernetes.io/version=v0.5
                control-plane=seldon-controller-manager
                pod-template-hash=7b77d5988
Annotations:    prometheus.io/scrape: true
                sidecar.istio.io/inject: false
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/seldon-controller-manager-7b77d5988
Containers:
  manager:
    Image:       docker.io/seldonio/seldon-core-operator:1.11.2
    Ports:       4443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /manager
    Args:
      --enable-leader-election
      --webhook-port=4443
      --create-resources=$(MANAGER_CREATE_RESOURCES)
      --log-level=$(MANAGER_LOG_LEVEL)
      --leader-election-id=$(MANAGER_LEADER_ELECTION_ID)
    Limits:
      cpu:     500m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      MANAGER_LEADER_ELECTION_ID:                   a33bd623.machinelearning.seldon.io
      MANAGER_LOG_LEVEL:                            INFO
      WATCH_NAMESPACE:
      RELATED_IMAGE_EXECUTOR:
      RELATED_IMAGE_ENGINE:
      RELATED_IMAGE_STORAGE_INITIALIZER:
      RELATED_IMAGE_SKLEARNSERVER:
      RELATED_IMAGE_XGBOOSTSERVER:
      RELATED_IMAGE_MLFLOWSERVER:
      RELATED_IMAGE_TFPROXY:
      RELATED_IMAGE_TENSORFLOW:
      RELATED_IMAGE_EXPLAINER:
      RELATED_IMAGE_MOCK_CLASSIFIER:
      MANAGER_CREATE_RESOURCES:                     false
      POD_NAMESPACE:                                seldon-system (v1:metadata.namespace)
      CONTROLLER_ID:
      AMBASSADOR_ENABLED:                           true
      AMBASSADOR_SINGLE_NAMESPACE:                  false
      ENGINE_CONTAINER_IMAGE_AND_VERSION:           docker.io/seldonio/engine:1.11.2
      ENGINE_CONTAINER_IMAGE_PULL_POLICY:           IfNotPresent
      ENGINE_CONTAINER_SERVICE_ACCOUNT_NAME:        default
      ENGINE_CONTAINER_USER:                        8888
      ENGINE_LOG_MESSAGES_EXTERNALLY:               false
      PREDICTIVE_UNIT_HTTP_SERVICE_PORT:            9000
      PREDICTIVE_UNIT_GRPC_SERVICE_PORT:            9500
      PREDICTIVE_UNIT_DEFAULT_ENV_SECRET_REF_NAME:
      PREDICTIVE_UNIT_METRICS_PORT_NAME:            metrics
      ENGINE_SERVER_GRPC_PORT:                      5001
      ENGINE_SERVER_PORT:                           8000
      ENGINE_PROMETHEUS_PATH:                       /prometheus
      ISTIO_ENABLED:                                true
      EXECUTOR_DEFAULT_CPU_LIMIT:                   500m
      EXECUTOR_DEFAULT_MEMORY_LIMIT:                512Mi
      ENGINE_DEFAULT_CPU_REQUEST:                   500m
      ENGINE_DEFAULT_MEMORY_REQUEST:                512Mi
      ENGINE_DEFAULT_CPU_LIMIT:                     500m
      ENGINE_DEFAULT_MEMORY_LIMIT:                  512Mi
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from seldon-manager-token-j4sgs (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  seldon-webhook-server-cert
    Optional:    false
  seldon-manager-token-j4sgs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  seldon-manager-token-j4sgs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  38h                 default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  38h                 default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  16s (x29 over 27m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
[root@centos03 ~]#
How do you judge whether a node has enough resources? Inspect the node with kubectl describe node and look at the following:
Allocatable: the total amount of resources that pods can request on this node
Allocated resources: the resources already committed on this node (the sum of the Requests of all pods running on it)
Subtracting the latter from the former gives the resources still available to request. If that remainder is smaller than a pod's Request, the node cannot satisfy the pod; the scheduler filters it out during the Predicates (filtering) phase, so the pod will never be scheduled there.
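For example, to pull out just those two sections for the node used in this walkthrough (node name assumed to be centos03):
kubectl describe node centos03 | grep -A 8 "Allocatable:"
kubectl describe node centos03 | grep -A 10 "Allocated resources:"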
Solution:
[root@centos03 ~]# cd /etc/systemd/system/kubelet.service.d/
[root@centos03 kubelet.service.d]# ls -l
total 4
-rw-r--r--. 1 root root 991 Oct  9 19:36 10-kubeadm.conf
[root@centos03 kubelet.service.d]# cat 10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.222.12 --hostname-override=centos03 "
ExecStart=
ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
[root@centos03 kubelet.service.d]#
When systemd starts the kubelet, the ExecStart in 10-kubeadm.conf overrides the ExecStart in /lib/systemd/system/kubelet.service, which is why the kubelet above is launched with that long string of command-line arguments. All we have to do is append the nodefs.available threshold we want to those startup arguments.
For consistency with the existing configuration style, we define a new environment variable, say KUBELET_EVICTION_POLICY_ARGS:
Environment="KUBELET_EVICTION_POLICY_ARGS=--eviction-hard=nodefs.available<5%"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_EXTRA_ARGS $KUBELET_EVICTION_POLICY_ARGS
Then restart the kubelet:
systemctl daemon-reload
systemctl restart kubelet
The underlying problem here, though, was that the virtual machine had too few CPUs allocated, which is why scheduling failed; raising the CPU and memory in the VM configuration fixes it. (The eviction-threshold tweak above is general kubelet tuning and does not by itself resolve an "Insufficient cpu" error.)
Local port forwarding
Because your Kubernetes cluster is running locally, we need to forward a port on your local machine to one in the cluster so that it can be reached from outside. You can do this by running:
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
This forwards any traffic hitting port 8080 on your local machine to port 80 inside the cluster.
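Note that kubectl port-forward only lasts while the command keeps running. Since the demo profile exposes istio-ingressgateway with node ports, an alternative is to look up the node port mapped to port 80 and hit it directly on the node's IP:
# the PORT(S) column shows mappings such as 80:3xxxx/TCP
kubectl -n istio-system get svc istio-ingressgateway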
III. Deploying a Model
You have now successfully installed Seldon Core on your local cluster and are ready to start deploying models as production microservices.
Deploy your model using a pre-packaged model server
We provide optimized model servers for some of the most popular deep learning and machine learning frameworks, allowing you to deploy your trained model binaries/weights without having to containerize or modify them.
You only have to upload your model binary to your preferred object store; in this case we have a trained scikit-learn iris model in a Google Cloud Storage bucket:
gs://seldon-models/v1.12.0-dev/sklearn/iris/model.joblib
Create a namespace to run your model in:
kubectl create namespace seldon
Then we can deploy the model to our Kubernetes cluster with kubectl apply, using the pre-packaged model server for scikit-learn (SKLEARN_SERVER):
$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: seldon
spec:
  name: iris
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.12.0-dev/sklearn/iris
      name: classifier
    name: default
    replicas: 1
END
Running it:
[root@centos03 ~]# kubectl apply -f - << END
> apiVersion: machinelearning.seldon.io/v1
> kind: SeldonDeployment
> metadata:
>   name: iris-model
>   namespace: seldon
> spec:
>   name: iris
>   predictors:
>   - graph:
>       implementation: SKLEARN_SERVER
>       modelUri: gs://seldon-models/v1.12.0-dev/sklearn/iris
>       name: classifier
>     name: default
>     replicas: 1
> END
seldondeployment.machinelearning.seldon.io/iris-model created
[root@centos03 ~]#
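Before sending requests, check that the SeldonDeployment has rolled out and its pods are Running:
kubectl get sdep -n seldon    # sdep is the short name for seldondeployments
kubectl get pods -n seldon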
Send an API request to your deployed model
Every model you deploy exposes a standardized user interface for sending requests, based on our OpenAPI schema.
It is reachable at the endpoint http://<ingress_url>/seldon/<namespace>/<model-name>/api/v1.0/doc/, which lets you send requests directly from your browser. In this setup that is:
http://192.168.222.12:8080/seldon/seldon/iris-model/api/v1.0/doc/
List the services exposed for this deployment:
[root@centos03 ~]# kubectl get svc --all-namespaces | grep seldon
seldon-system   seldon-webhook-service          ClusterIP   10.233.57.127   <none>   443/TCP             16d
seldon          iris-model-default              ClusterIP   10.233.53.254   <none>   8000/TCP,5001/TCP   6d15h
seldon          iris-model-default-classifier   ClusterIP   10.233.61.74    <none>   9000/TCP,9500/TCP   14d
Test the service from inside the cluster:
$ curl -X POST http://10.233.53.254:8000/api/v1.0/predictions \
-H 'Content-Type: application/json' \
-d '{ "data": { "ndarray": [[1,2,3,4]] } }'
Running it on the server:
[root@centos03 ~]# kubectl get svc --all-namespaces | grep seldon
seldon-system   seldon-webhook-service          ClusterIP   10.233.57.127   <none>   443/TCP             16d
seldon          iris-model-default              ClusterIP   10.233.53.254   <none>   8000/TCP,5001/TCP   6d15h
seldon          iris-model-default-classifier   ClusterIP   10.233.61.74    <none>   9000/TCP,9500/TCP   14d
[root@centos03 ~]# curl -X POST http://10.233.53.254:8000/seldon/seldon/iris-model/api/v1.0/predictions \
> -H 'Content-Type: application/json' \
> -d '{ "data": { "ndarray": [[1,2,3,4]] } }'
404 page not found
[root@centos03 ~]# curl -X POST http://10.233.53.254:8000/api/v1.0/predictions \
> -H 'Content-Type: application/json' \
> -d '{ "data": { "ndarray": [[1,2,3,4]] } }'
{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.0006985194531162835,0.00366803903943666,0.995633441507447]]},"meta":{"requestPath":{"classifier":"seldonio/sklearnserver:1.11.2"}}}
[root@centos03 ~]#
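The 404 on the first attempt is expected: the /seldon/<namespace>/<model-name>/ prefix is a route on the Istio ingress gateway, so it only applies when requests enter through the gateway; when calling the iris-model-default ClusterIP directly, the path is just /api/v1.0/predictions. With the port-forward from earlier still running, the same request through the gateway would look like:
curl -X POST http://localhost:8080/seldon/seldon/iris-model/api/v1.0/predictions \
   -H 'Content-Type: application/json' \
   -d '{ "data": { "ndarray": [[1,2,3,4]] } }'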
Deploy your custom model using language wrappers
For more custom deep learning and machine learning use cases with bespoke dependencies (such as third-party libraries, operating system binaries, or even external systems), we can use any of the Seldon Core language wrappers.
You only have to write a class wrapper that exposes your model's logic; for example, in Python we can create a file Model.py:
import pickle

class Model:
    def __init__(self):
        # pickle.load (not pickle.loads) reads from a file object
        with open("model.pickle", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, X, features_names=None):
        output = self._model.predict(X)
        return output
We can now containerize our class file using the Seldon Core s2i utils to produce the sklearn_iris image:
s2i build . seldonio/seldon-core-s2i-python3:0.18 sklearn_iris:0.1
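For the s2i build to know which class to serve, the source directory also needs a .s2i/environment file; for the Python wrapper it looks roughly like this:
# .s2i/environment
MODEL_NAME=Model
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0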
Now we deploy it to our Seldon Core Kubernetes cluster:
$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: model-namespace
spec:
  name: iris
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: sklearn_iris:0.1
    graph:
      name: classifier
    name: default
    replicas: 1
END
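Note that sklearn_iris:0.1 must be pullable by the cluster's nodes. On a multi-node cluster a common approach is to tag and push it to a registry the nodes can reach (<registry> below is a placeholder for your own registry address):
docker tag sklearn_iris:0.1 <registry>/sklearn_iris:0.1
docker push <registry>/sklearn_iris:0.1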
Send an API request to your deployed model
Every model you deploy exposes a standardized user interface for sending requests, based on our OpenAPI schema.
It is reachable at the endpoint http://<ingress_url>/seldon/<namespace>/<model-name>/api/v1.0/doc/, which lets you send requests directly from your browser.