# Scaling and High Availability

GitLab supports several different types of clustering and high availability.
The solution you choose depends on the level of scalability and
availability you require. The easiest solutions are scalable, but not necessarily
highly available.

GitLab provides a service that is essential to most organizations: it
enables people to collaborate on code in a timely fashion. Any downtime should
therefore be short and planned. Luckily, GitLab provides a solid setup even on
a single server without special measures. Due to the distributed nature
of Git, developers can still commit code locally even when GitLab is not
available. However, some GitLab features, such as the issue tracker and
Continuous Integration, are not available when GitLab is down.

**Keep in mind that all highly available solutions come with a trade-off between
cost/complexity and uptime**. The more uptime you want, the more complex the
solution. And the more complex the solution, the more work is involved in
setting up and maintaining it. High availability is not free, and every HA
solution should balance the costs against the benefits.

There are many options when choosing a highly available GitLab architecture. We
recommend engaging with GitLab Support to choose the best architecture for your
use case. This page describes several options and guidelines based on
experience with GitLab.com and Enterprise Edition on-premises customers.

For detailed insight into how GitLab scales and configures GitLab.com, you can
watch [this 1-hour Q&A](https://www.youtube.com/watch?v=uCU8jdYzpac)
with [John Northrup](https://gitlab.com/northrup), which includes live questions from some of our customers.

## GitLab Components

The following components need to be considered for a scaled or highly available
environment. In many cases, components can be combined on the same nodes to reduce
complexity.

- Unicorn/Workhorse - Web requests (UI, API, Git over HTTP)
- Sidekiq - Asynchronous/background jobs
- PostgreSQL - Database
  - Consul - Database service discovery and health checks/failover
  - PgBouncer - Database connection pool manager
- Redis - Key/value store (user sessions, cache, queue for Sidekiq)
  - Sentinel - Redis health check/failover manager
- Gitaly - Provides high-level RPC access to Git repositories

## Scalable Architecture Examples

When an organization reaches a certain threshold, it will be necessary to scale
the GitLab instance. Still, true high availability may not be necessary. There
are options for scaling GitLab instances relatively easily without incurring the
infrastructure and maintenance costs of full high availability.

### Basic Scaling

This is the simplest form of scaling and will work for the majority of
cases. Backend components such as PostgreSQL, Redis and storage are offloaded
to their own nodes while the remaining GitLab components all run on 2 or more
application nodes.

This form of scaling also works well in a cloud environment when it is more
cost-effective to deploy several small nodes rather than a single
larger one.

- 1 PostgreSQL node
- 1 Redis node
- 1 NFS/Gitaly storage server
- 2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq)
- 1 Monitoring node (Prometheus, Grafana)

#### Installation Instructions

Complete the following installation steps in order. A link at the end of each
section will bring you back to the Scalable Architecture Examples section so
you can continue with the next step.

1. [PostgreSQL](database.md#postgresql-in-a-scaled-environment)
1. [Redis](redis.md#redis-in-a-scaled-environment)
1. [Gitaly](gitaly.md) (recommended) or [NFS](nfs.md)
1. [GitLab application nodes](gitlab.md)
1. [Monitoring node (Prometheus and Grafana)](monitoring_node.md)
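
The steps above are done in order. One way to reason about that order is to treat it as a topological sort. This is a minimal sketch. It assumes that application nodes need PostgreSQL, Redis, and Gitaly to be reachable, and that the monitoring node scrapes every other component. The dependency map and the helper names `install_order` and `respects_dependencies` are hypothetical illustrations, not an official GitLab dependency graph. The sketch uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must already be
# running before it is configured. Illustrative only, not official GitLab data.
DEPENDENCIES = {
    "postgresql": set(),
    "redis": set(),
    "gitaly": set(),
    "application": {"postgresql", "redis", "gitaly"},
    "monitoring": {"postgresql", "redis", "gitaly", "application"},
}

# The order used by the installation steps above.
DOCUMENTED_ORDER = ["postgresql", "redis", "gitaly", "application", "monitoring"]


def install_order(deps):
    """Return one valid installation order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def respects_dependencies(order, deps):
    """Check that each component appears after everything it depends on."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[needed] < position[component]
        for component, needs in deps.items()
        for needed in needs
    )


if __name__ == "__main__":
    print(install_order(DEPENDENCIES))
    print(respects_dependencies(DOCUMENTED_ORDER, DEPENDENCIES))
```

Under these assumptions, PostgreSQL, Redis, and Gitaly have no ordering constraint among themselves. The application nodes and the monitoring node must come after them.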

### Full Scaling

For very large installations, it may be necessary to further split components
for maximum scalability. In a fully-scaled architecture, the application node
is split into separate Sidekiq and Unicorn/Workhorse nodes. One indication that
this architecture is required is Sidekiq queues that periodically grow in
size, which points to contention or insufficient resources.

- 1 PostgreSQL node
- 1 Redis node
- 2 or more NFS/Gitaly storage servers
- 2 or more Sidekiq nodes
- 2 or more GitLab application nodes (Unicorn, Workhorse)
- 1 Monitoring node (Prometheus, Grafana)
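
A growing Sidekiq queue is the suggested trigger for moving to full scaling. It can be checked with a simple heuristic. This is a minimal sketch. It assumes you already collect each queue's size at a fixed interval, for example from your monitoring node. The function name and the `min_rises` threshold are hypothetical. Sustained growth is only one possible reading of "periodically increase".

```python
from typing import Sequence


def queue_keeps_growing(samples: Sequence[int], min_rises: int = 3) -> bool:
    """Return True if the queue size rose in at least `min_rises` consecutive samples.

    `samples` are queue sizes taken at a fixed interval, oldest first.
    `min_rises` is a hypothetical tuning knob, not a GitLab setting.
    """
    streak = 0
    for previous, current in zip(samples, samples[1:]):
        # Extend the streak on growth, reset it as soon as the queue drains.
        streak = streak + 1 if current > previous else 0
        if streak >= min_rises:
            return True
    return False


print(queue_keeps_growing([120, 180, 260, 410]))     # True: three rises in a row
print(queue_keeps_growing([120, 900, 80, 700, 60]))  # False: spikes that drain
```

A queue that spikes and then drains is normal load. A backlog that keeps climbing across samples suggests the workers cannot keep up.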

## High Availability Architecture Examples

When organizations require scaling *and* high availability, the following
architectures can be used. As the introduction at the top of this
page mentions, there is a trade-off between cost/complexity and uptime. Be sure
this complexity is absolutely required before taking the step into full
high availability.

For all examples below, we recommend running Consul and Redis Sentinel on
dedicated nodes. If Consul runs on PostgreSQL nodes, or Sentinel on
Redis nodes, high resource usage by PostgreSQL or Redis could prevent
communication with the other Consul and Sentinel nodes. Those nodes may then
conclude that a failure has occurred and that an automated failover is necessary.
Isolating Consul and Sentinel from the services they monitor reduces
the chances of split-brain.
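
The examples use three Consul/Sentinel nodes because of majority voting. Consul servers use the Raft consensus protocol, which needs a majority of servers to agree. Redis Sentinel only performs a failover when a majority of Sentinel processes authorize it. Only one side of a network partition can hold a majority, and that is what limits split-brain. The arithmetic below is a small illustration, not GitLab or Consul code.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - majority(voters)


for n in (1, 2, 3, 4, 5):
    print(f"voters={n} majority={majority(n)} tolerates={tolerated_failures(n)}")
# voters=1 majority=1 tolerates=0
# voters=2 majority=2 tolerates=0
# voters=3 majority=2 tolerates=1
# voters=4 majority=3 tolerates=1
# voters=5 majority=3 tolerates=2
```

Three nodes survive the loss of one. Two nodes survive no loss at all, and a fourth node adds no failure tolerance over three.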

The examples below largely do not address high availability of NFS. Some enterprises
have access to NFS appliances that manage availability, which is the best-case
scenario. In the future, GitLab may offer a more user-friendly solution for
[GitLab HA Storage](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2472).

There are many options in between each of these examples. Work with GitLab Support
to understand the best starting point for your workload and adapt from there.

### Horizontal

This is the simplest form of high availability and scaling. It requires the
fewest individual servers (virtual or physical) but has some
trade-offs and limits.

This architecture will work well for many GitLab customers. Larger customers
may notice that certain events cause contention or high load, for example
cloning many large repositories with binary files, high API usage, or a large
number of enqueued Sidekiq jobs. If this happens, consider moving to a hybrid
or fully distributed architecture, depending on what is causing the contention.

- 3 PostgreSQL nodes
- 2 Redis nodes
- 3 Consul/Sentinel nodes
- 2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq, PgBouncer)
- 1 NFS/Gitaly server
- 1 Monitoring node (Prometheus, Grafana)

![Horizontal architecture diagram](img/horizontal.png)

### Hybrid

In this architecture, certain components are split onto dedicated nodes so that high
resource usage of one component does not interfere with others. In larger
environments, this is a good architecture to consider if you foresee or already have
contention due to certain workloads.

- 3 PostgreSQL nodes
- 1 PgBouncer node
- 2 Redis nodes
- 3 Consul/Sentinel nodes
- 2 or more Sidekiq nodes
- 2 or more GitLab application nodes (Unicorn, Workhorse)
- 1 or more NFS/Gitaly servers
- 1 Monitoring node (Prometheus, Grafana)

![Hybrid architecture diagram](img/hybrid.png)

#### Reference Architecture

- **Status:** Work-in-progress
- **Supported Users (approximate):** 10,000
- **Related Issues:** [gitlab-com/support/support-team-meta#1513](https://gitlab.com/gitlab-com/support/support-team-meta/issues/1513), [gitlab-org/quality/team-tasks#110](https://gitlab.com/gitlab-org/quality/team-tasks/issues/110)

The Support and Quality teams are in the process of building and performance testing
an environment that will support about 10,000 users. The specifications below
are a work-in-progress representation of the work so far. Quality will
certify this environment in FY20-Q2, and the specifications may be adjusted
before certification based on performance testing.

- 3 PostgreSQL - 4 CPU, 8GB RAM
- 1 PgBouncer - 2 CPU, 4GB RAM
- 2 Redis - 2 CPU, 8GB RAM
- 3 Consul/Sentinel - 2 CPU, 2GB RAM
- 4 Sidekiq - 4 CPU, 8GB RAM
- 5 GitLab application nodes - 20 CPU, 64GB RAM
- 1 Gitaly - 20 CPU, 64GB RAM
- 1 Monitoring node - 4 CPU, 8GB RAM
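
Summing the work-in-progress specifications gives a rough sense of the total footprint. The figures are copied from the list above and will change if the specifications are adjusted before certification. `REFERENCE_10K` and `totals` are illustrative names only.

```python
# (node count, CPUs per node, RAM in GB per node), copied from the list above.
REFERENCE_10K = {
    "PostgreSQL": (3, 4, 8),
    "PgBouncer": (1, 2, 4),
    "Redis": (2, 2, 8),
    "Consul/Sentinel": (3, 2, 2),
    "Sidekiq": (4, 4, 8),
    "GitLab application": (5, 20, 64),
    "Gitaly": (1, 20, 64),
    "Monitoring": (1, 4, 8),
}


def totals(spec):
    """Return (total nodes, total CPUs, total RAM in GB) for a node spec."""
    nodes = sum(count for count, _, _ in spec.values())
    cpus = sum(count * cpu for count, cpu, _ in spec.values())
    ram_gb = sum(count * ram for count, _, ram in spec.values())
    return nodes, cpus, ram_gb


print(totals(REFERENCE_10K))  # (20, 164, 474)
```

The five application nodes account for 100 of the 164 CPUs and 320 GB of the 474 GB of RAM. That makes the application tier the main cost driver in this draft specification.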

### Fully Distributed

This architecture scales to hundreds of thousands of users and projects and is
the basis of the GitLab.com architecture. While this scales well, it also comes
with the added complexity of many more nodes to configure, manage, and monitor.

- 3 PostgreSQL nodes
- 4 or more Redis nodes (2 separate clusters for persistent and cache data)
- 3 Consul nodes
- 3 Sentinel nodes
- Multiple dedicated Sidekiq nodes (split into real-time, best effort, ASAP,
  CI Pipeline and Pull Mirror sets)
- 2 or more Git nodes (Git over SSH/Git over HTTP)
- 2 or more API nodes (all requests to `/api`)
- 2 or more Web nodes (all other web requests)
- 2 or more NFS/Gitaly servers
- 1 Monitoring node (Prometheus, Grafana)

![Fully Distributed architecture diagram](img/fully-distributed.png)
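
In the fully distributed layout, a load balancer sends each request to the Git, API, or Web node pool. This is a minimal sketch of that routing decision. It assumes that Git over HTTP is recognized by the Git smart HTTP endpoints (`info/refs`, `git-upload-pack`, `git-receive-pack`) under a `.git` project path, and that the query string has already been removed. `node_pool` and the pattern are hypothetical, and a real load balancer configuration will differ.

```python
import re

# Hypothetical pattern for Git smart HTTP endpoints on a `.git` project path.
# Git LFS and other Git-related endpoints are deliberately not covered here.
GIT_HTTP_PATH = re.compile(r"\.git/(?:info/refs|git-upload-pack|git-receive-pack)$")


def node_pool(path: str, protocol: str = "https") -> str:
    """Return "git", "api", or "web" for a request path (query string removed)."""
    if protocol == "ssh":
        return "git"  # Git over SSH goes to the Git nodes
    if GIT_HTTP_PATH.search(path):
        return "git"  # Git over HTTP goes to the Git nodes
    if path == "/api" or path.startswith("/api/"):
        return "api"  # all requests to /api
    return "web"  # all other web requests


for p in ("/group/project.git/info/refs", "/api/v4/projects", "/group/project/-/issues"):
    print(p, "->", node_pool(p))
```

The `/api/` prefix check, rather than a bare `startswith("/api")`, keeps paths such as `/apiary` on the Web nodes.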

The following pages outline the steps necessary to configure each component
separately:

1. [Configure the database](database.md)
1. [Configure Redis](redis.md)
   1. [Configure Redis for GitLab source installations](redis_source.md)
1. [Configure NFS](nfs.md)
   1. [NFS Client and Host setup](nfs_host_client_setup.md)
1. [Configure the GitLab application servers](gitlab.md)
1. [Configure the load balancers](load_balancer.md)
1. [Monitoring node (Prometheus and Grafana)](monitoring_node.md)