# Elasticsearch integration **(STARTER ONLY)**

> [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/109 "Elasticsearch Merge Request") in GitLab [Starter](https://about.gitlab.com/pricing/) 8.4. Support
> for [Amazon Elasticsearch](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg.html) was [introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1305) in GitLab
> [Starter](https://about.gitlab.com/pricing/) 9.0.

This document describes how to set up Elasticsearch with GitLab. Once enabled,
you'll have the benefit of fast search response times and the advantage of two
special searches:

- [Advanced Global Search](../user/search/advanced_global_search.md)
- [Advanced Syntax Search](../user/search/advanced_search_syntax.md)

## Version Requirements

<!-- Please remember to update ee/lib/system_check/app/elasticsearch_check.rb if this changes -->

| GitLab version | Elasticsearch version |
| -------------- | --------------------- |
| GitLab Enterprise Edition 8.4 - 8.17  | Elasticsearch 2.4 with [Delete By Query Plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/2.4/plugins-delete-by-query.html) installed |
| GitLab Enterprise Edition 9.0 - 11.4   | Elasticsearch 5.1 - 5.5 |
| GitLab Enterprise Edition 11.5+        | Elasticsearch 5.6 - 6.x |
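
To check an existing deployment against the table above, a small shell helper along these lines could be used. It is a sketch, not something GitLab ships: `version_key` and `es_supported` are hypothetical names, the boundaries are transcribed from the table, and it does not check for the Delete By Query plugin that Elasticsearch 2.4 needs. The running Elasticsearch version is reported in the `version.number` field of the JSON returned by `curl http://localhost:9200`.

```shell
# Sketch: check an Elasticsearch version against the "Version Requirements" table.
# Both arguments are MAJOR.MINOR[.PATCH] strings, e.g. "11.5" and "6.8.0".

# Turn "MAJOR.MINOR[.PATCH]" into MAJOR*100+MINOR so versions compare numerically.
version_key() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  echo $((major * 100 + minor))
}

# Exit status 0 if the Elasticsearch version ($2) is listed for the GitLab EE version ($1).
es_supported() {
  gl=$(version_key "$1")
  es=$(version_key "$2")
  if [ "$gl" -ge 804 ] && [ "$gl" -le 817 ]; then
    [ "$es" -eq 204 ]                              # 8.4 - 8.17: Elasticsearch 2.4
  elif [ "$gl" -ge 900 ] && [ "$gl" -le 1104 ]; then
    [ "$es" -ge 501 ] && [ "$es" -le 505 ]         # 9.0 - 11.4: 5.1 - 5.5
  elif [ "$gl" -ge 1105 ]; then
    [ "$es" -ge 506 ] && [ "$es" -le 699 ]         # 11.5+: 5.6 - 6.x
  else
    return 1                                       # predates the integration
  fi
}
```

For example, `es_supported 11.5 6.8.0 && echo supported` prints `supported`, while `es_supported 11.4 6.2.0` returns a non-zero status.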

## Installing Elasticsearch

Elasticsearch is _not_ included in the Omnibus packages. You will have to
install it yourself whether you are using the Omnibus package or installed
GitLab from source. Providing detailed information on installing Elasticsearch
is out of the scope of this document.

Once the data is added to the database or repository and [Elasticsearch is
enabled in the admin area](#enabling-elasticsearch), the search index will be
updated automatically. Elasticsearch can be installed on the same machine as
GitLab or on a separate server, or you can use the [Amazon Elasticsearch](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg.html)
service.

You can follow the steps described on the [official website](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html "Elasticsearch installation documentation") or
use the packages that are available for your OS.

## Elasticsearch repository indexer (beta)

In order to improve Elasticsearch indexing performance, GitLab has made available a [new indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer).
This will replace the included Ruby indexer in the future, but it should be considered beta software for now, so there may be some bugs.

The Elasticsearch Go indexer is included in Omnibus for GitLab 11.8 and newer.

To use the new Elasticsearch indexer included in Omnibus, check the box "Use the new repository indexer (beta)"  when [enabling the Elasticsearch integration](#enabling-elasticsearch).

If you would like to use the Elasticsearch Go indexer with a source installation or an older version of GitLab, please follow the instructions below.

### Installation

First, we need to install some dependencies, then we'll build and install
the indexer itself.

#### Dependencies

This project relies on [ICU](http://site.icu-project.org/) for text encoding,
therefore we need to ensure the development packages for your platform are
installed before running `make`.

##### Debian / Ubuntu

To install on Debian or Ubuntu, run:

```sh
sudo apt install libicu-dev
```

##### CentOS / RHEL

To install on CentOS or RHEL, run:

```sh
sudo yum install libicu-devel
```

##### macOS

To install on macOS, run:

```sh
brew install icu4c
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig:$PKG_CONFIG_PATH"
```

#### Building and installing

To build and install the indexer, run:

```sh
git clone https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer.git
cd gitlab-elasticsearch-indexer
make
sudo make install
```

The `gitlab-elasticsearch-indexer` will be installed to `/usr/local/bin`.

You can change the installation path with the `PREFIX` env variable.
Please remember to pass the `-E` flag to `sudo` if you do so.

Example:

```sh
PREFIX=/usr sudo -E make install
```
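
To confirm where the binary ended up, a check along these lines could help. It assumes, based on the `/usr/local/bin` default and the `PREFIX` variable described above, that the Makefile installs into `$PREFIX/bin`; `check_indexer_installed` is a hypothetical helper, not part of the indexer project.

```shell
# Sketch: confirm the indexer binary exists where `make install` should have put it.
# Assumes installation into "$PREFIX/bin", with PREFIX defaulting to /usr/local.
check_indexer_installed() {
  bin="${PREFIX:-/usr/local}/bin/gitlab-elasticsearch-indexer"
  if [ -x "$bin" ]; then
    echo "found: $bin"
  else
    echo "missing: $bin" >&2
    return 1
  fi
}
```

With the example above, `PREFIX=/usr check_indexer_installed` would look for `/usr/bin/gitlab-elasticsearch-indexer`.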

Once installed, enable it in your instance's Elasticsearch settings, as explained [below](#enabling-elasticsearch).

## System Requirements

Elasticsearch requires additional resources in excess of those documented in the
[GitLab system requirements](../install/requirements.md). These will vary by
installation size, but you should ensure **at least** an additional **8 GiB of RAM**
for each Elasticsearch node, per the [official guidelines](https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html).

Keep in mind that these are the **minimum requirements** according to Elasticsearch. For
production instances, Elastic recommends considerably more resources.

Storage requirements also vary based on the installation size, but as a rule of
thumb, you should allocate the total size of your production database, **plus**
two-thirds of the total size of your Git repositories. Efforts to reduce this
total are being tracked in this epic: [gitlab-org&153](https://gitlab.com/groups/gitlab-org/-/epics/153).
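
The storage rule of thumb is simple arithmetic. A sketch in MiB, where the hypothetical helper `estimate_es_storage_mib` rounds the two-thirds share up so the estimate errs on the generous side:

```shell
# Sketch: database size + two-thirds of repository size, in MiB (rounded up).
estimate_es_storage_mib() {
  db_mib=$1      # total size of the production database
  repos_mib=$2   # total size of all Git repositories
  # (2 * repos + 2) / 3 is the integer ceiling of 2 * repos / 3
  echo $(( db_mib + (2 * repos_mib + 2) / 3 ))
}
```

For example, a 3000 MiB database and 9000 MiB of repositories give `3000 + 6000 = 9000` MiB. On Omnibus installations the repositories live under `/var/opt/gitlab/git-data/repositories` by default, so `sudo du -sm /var/opt/gitlab/git-data/repositories` is one way to get the second number; adjust the path if you changed `git_data_dirs`.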

## Enabling Elasticsearch

In order to enable Elasticsearch, you need to have admin access. Go to
**Admin > Settings > Integrations** and find the "Elasticsearch" section.

The following Elasticsearch settings are available:

| Parameter                           | Description |
| ---------                           | ----------- |
| `Elasticsearch indexing`            | Enables/disables Elasticsearch indexing. You may want to enable indexing but disable search, for example, to give the index time to be fully built. Keep in mind that this option has no effect on existing data: it only enables/disables the background indexer that tracks data changes. Enabling it will not index your existing data; for that, use the special Rake task explained in [Adding GitLab's data to the Elasticsearch index](#adding-gitlabs-data-to-the-elasticsearch-index). |
| `Use the new repository indexer (beta)` | Perform repository indexing using [GitLab Elasticsearch Indexer](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). |
| `Search with Elasticsearch enabled` | Enables/disables using Elasticsearch in search. |
| `URL`                              | The URL to use for connecting to Elasticsearch. Use a comma-separated list to support clustering (e.g., `http://host1, https://host2:9200`). If your Elasticsearch instance is password protected, pass the `username:password` in the URL (e.g., `http://<username>:<password>@<elastic_host>:9200/`). |
| `Number of Elasticsearch shards` | Elasticsearch indexes are split into multiple shards for performance reasons. In general, larger indexes need more shards. Changes to this value do not take effect until the index is recreated. You can read more about the tradeoffs in the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#create-index-settings). |
| `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are complete copies of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value will greatly increase the total disk space required by the index. |
| `Limit namespaces and projects that can be indexed` | Enabling this will allow you to select namespaces and projects to index. All other namespaces and projects will use database search instead. Please note that if you enable this option but do not select any namespaces or projects, none will be indexed. [Read more below](#limiting-namespaces-and-projects). |
| `Using AWS hosted Elasticsearch with IAM credentials` | Sign your Elasticsearch requests using [AWS IAM authorization](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) or [AWS EC2 Instance Profile Credentials](http://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-iam-instance-profile.html#getting-started-create-iam-instance-profile-cli). The policies must be configured to allow `es:*` actions. |
| `AWS Region` | The AWS region your Elasticsearch service is located in. |
| `AWS Access Key` | The AWS access key. |
| `AWS Secret Access Key` | The AWS secret access key. |
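
When the `URL` setting lists several nodes, it can help to confirm that each one is reachable from the GitLab server before enabling indexing. A sketch with hypothetical helper names; `curl --fail` treats HTTP error responses (such as a 401 from a password-protected node queried without credentials) as unreachable.

```shell
# Sketch: split the comma-separated `URL` setting into one URL per line,
# trimming the optional space after each comma.
split_es_urls() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Probe each node's root endpoint. Note: URLs that embed credentials
# (user:password@host) are printed as-is, so avoid sharing this output.
check_es_urls() {
  split_es_urls "$1" | while read -r url; do
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "reachable: $url"
    else
      echo "unreachable: $url"
    fi
  done
}
```

For example, `check_es_urls 'http://host1, https://host2:9200'` prints one line per node.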

### Limiting namespaces and projects

If you select `Limit namespaces and projects that can be indexed`, more options will become available:
![limit namespaces and projects options](img/limit_namespaces_projects_options.png)

You can select namespaces and projects to index exclusively. Please note that if the namespace is a group, any subgroups
and the projects belonging to those subgroups will be indexed as well.

You can filter the selection dropdown by writing part of the namespace or project name you're interested in.
![limit namespace filter](img/limit_namespace_filter.png)

NOTE: **Note**:
If no namespaces or projects are selected, no Elasticsearch indexing will take place.

CAUTION: **Warning**:
If you have already indexed your instance, you will have to regenerate the index in order to delete all existing data
for filtering to work correctly. To do this, run the Rake tasks `gitlab:elastic:create_empty_index` and
`gitlab:elastic:clear_index_status`. Afterwards, removing a namespace or a project from the list will delete the data
from the Elasticsearch index as expected.
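
The two tasks run in this order: first recreate the empty index, then clear the per-project index status. A minimal wrapper, assuming an Omnibus installation by default; `reset_elastic_index` and the `RAKE` override are hypothetical conveniences, not GitLab features.

```shell
# Sketch: run the two Rake tasks from the warning above, in order.
# RAKE defaults to the Omnibus command and may hold a multi-word command.
reset_elastic_index() {
  rake_cmd=${RAKE:-sudo gitlab-rake}
  # $rake_cmd is left unquoted on purpose so "bundle exec rake" splits into words.
  $rake_cmd gitlab:elastic:create_empty_index &&
    $rake_cmd gitlab:elastic:clear_index_status
}
```

For an installation from source, running `export RAILS_ENV=production` and then `RAKE='bundle exec rake' reset_elastic_index` from the GitLab directory should be the equivalent.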

## Disabling Elasticsearch

To disable the Elasticsearch integration:

1. Navigate to **Admin > Settings > Integrations**.
1. Find the 'Elasticsearch' section and uncheck 'Search with Elasticsearch enabled'
   and 'Elasticsearch indexing'.
1. Click **Save** for the changes to take effect.
1. (Optional) Delete the existing index by running the command `sudo gitlab-rake gitlab:elastic:delete_index`.

## Adding GitLab's data to the Elasticsearch index

### Indexing small instances (database size less than 500 MiB, size of repos less than 5 GiB)

Configure Elasticsearch's host and port in **Admin > Settings > Integrations**. Then index the data using one of the following commands:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index

# Installations from source
bundle exec rake gitlab:elastic:index RAILS_ENV=production
```

After the indexing process completes, [enable Elasticsearch searching](#enabling-elasticsearch).

### Indexing large instances

WARNING: **Warning**:
Asynchronous indexing, as described in this section, generates a lot of Sidekiq jobs.
Make sure to prepare for this task by either [Horizontally Scaling](../administration/high_availability/README.md#basic-scaling)
or creating [extra Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).

Configure Elasticsearch's host and port in **Admin > Settings > Integrations**. Then create empty indexes using one of the following commands:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:create_empty_index

# Installations from source
bundle exec rake gitlab:elastic:create_empty_index RAILS_ENV=production
```

Indexing large Git repositories can take a while. To speed up the process, you
can temporarily disable auto-refreshing and replication. In our experience, this
reduces indexing time by about 20%. You can re-enable both once indexing is done.
This step is optional.

```bash
curl --request PUT localhost:9200/gitlab-production/_settings \
  --header 'Content-Type: application/json' --data '{
    "index" : {
        "refresh_interval" : "-1",
        "number_of_replicas" : 0
    } }'
```

Then enable Elasticsearch indexing and run project indexing tasks:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects

# Installations from source
bundle exec rake gitlab:elastic:index_projects RAILS_ENV=production
```

This enqueues a Sidekiq job for each project that needs to be indexed.
You can view the jobs in the admin panel (they are placed in the `elastic_indexer`
queue), or you can query the indexing status using a Rake task:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects_status

# Installations from source
bundle exec rake gitlab:elastic:index_projects_status RAILS_ENV=production

Indexing is 65.55% complete (6555/10000 projects)
```
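
Since `index_projects` only enqueues jobs, you may want to poll the status task until it reaches 100%. A sketch that parses the `Indexing is NN.NN% complete` line shown above; it relies on that exact wording (if the output differs in your version, the loop will never finish), and `index_percent` and `wait_for_indexing` are hypothetical names.

```shell
# Extract the percentage from the status task's output, e.g. "65.55".
index_percent() {
  sed -n 's/.*Indexing is \([0-9.]*\)% complete.*/\1/p'
}

# Poll the status task every 60 seconds until it reports 100% (Omnibus command).
wait_for_indexing() {
  while :; do
    pct=$(sudo gitlab-rake gitlab:elastic:index_projects_status | index_percent)
    echo "indexed: ${pct:-unknown}%"
    # awk does the decimal comparison that [ ] cannot
    if [ -n "$pct" ] && awk -v p="$pct" 'BEGIN { exit !(p >= 100) }'; then
      return 0
    fi
    sleep 60
  done
}
```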

If you want to limit the index to a range of projects, you can provide the
`ID_FROM` and `ID_TO` parameters:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000

# Installations from source
bundle exec rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000 RAILS_ENV=production
```

Here `ID_FROM` and `ID_TO` are project IDs, and both parameters are optional.
The above examples will index all projects starting with ID `1001` up to (and including) ID `2000`.
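
Because both bounds are inclusive, consecutive ranges such as `1-1000` and `1001-2000` cover every ID exactly once. A sketch that generates such batches; you supply the highest project ID and the batch size, and the helper names and `RAKE` override are hypothetical.

```shell
# Print inclusive "ID_FROM=a ID_TO=b" ranges covering 1..max_id.
print_id_batches() {
  max_id=$1
  batch_size=$2
  from=1
  while [ "$from" -le "$max_id" ]; do
    to=$((from + batch_size - 1))
    [ "$to" -gt "$max_id" ] && to=$max_id
    echo "ID_FROM=$from ID_TO=$to"
    from=$((to + 1))
  done
}

# Enqueue indexing jobs batch by batch (Omnibus command by default).
index_in_batches() {
  rake_cmd=${RAKE:-sudo gitlab-rake}
  print_id_batches "$1" "$2" | while read -r range; do
    # $range is left unquoted so it splits into the two KEY=VALUE arguments
    $rake_cmd gitlab:elastic:index_projects $range
  done
}
```

Each run only enqueues Sidekiq jobs, so this loop does not wait for a batch to finish; checking `gitlab:elastic:index_projects_status` between batches is one way to pace them.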

TIP: **Troubleshooting:**
Sometimes the project indexing jobs queued by `gitlab:elastic:index_projects`
can get interrupted. This may happen for many reasons, but it's always safe
to run the indexing task again, as it will skip repositories that have
already been indexed.

As the indexer stores the last commit SHA of every indexed repository in the
database, you can run the indexer with the special parameter `UPDATE_INDEX`, and
it will check every project repository again to make sure that every commit in
that repository is indexed. This can be useful if your index is outdated:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000

# Installations from source
bundle exec rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000 RAILS_ENV=production
```

You can also use the `gitlab:elastic:clear_index_status` Rake task to force the
indexer to "forget" all progress, so that it retries the indexing process from the
start.

The `index_projects` command enqueues jobs to index all project and wiki
repositories, and most database content. However, snippets still need to be
indexed separately. To do so, run one of these commands:

```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_snippets

# Installations from source
bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production
```

Enable replication and refreshing again after indexing (only if you previously disabled them):

```bash
curl --request PUT localhost:9200/gitlab-production/_settings \
  --header 'Content-Type: application/json' --data '{
    "index" : {
        "number_of_replicas" : 1,
        "refresh_interval" : "1s"
    } }'
```
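
After raising `number_of_replicas`, Elasticsearch has to allocate the replica shards before the index reports green. One way to wait for that is the cluster health API with `wait_for_status`; the sketch below uses a 10-minute timeout as an arbitrary assumption, and `ES_URL`, `health_status` and `wait_for_green` are hypothetical names. On a single-node cluster the index can only reach yellow while `number_of_replicas` is 1, because Elasticsearch never places a replica on the same node as its primary.

```shell
# Pull the "status" value (green, yellow or red) out of a cluster health response.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Block until the GitLab index is green or the (assumed) 10-minute timeout
# expires, then print the status that was reached.
wait_for_green() {
  curl --silent \
    "${ES_URL:-http://localhost:9200}/_cluster/health/gitlab-production?wait_for_status=green&timeout=10m" |
    health_status
}
```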

A force merge should be called after enabling the refreshing above.

For Elasticsearch 6.x, before proceeding with the force merge, the index should be in read-only mode:

```bash
curl --request PUT localhost:9200/gitlab-production/_settings \
  --header 'Content-Type: application/json' --data '{
  "settings": {
    "index.blocks.write": true
  } }'
```

Then, initiate the force merge:

```bash
curl --request POST 'http://localhost:9200/gitlab-production/_forcemerge?max_num_segments=5'
```

After this, if your index is in read-only mode, switch it back to read-write:

```bash
curl --request PUT localhost:9200/gitlab-production/_settings \
  --header 'Content-Type: application/json' --data '{
  "settings": {
    "index.blocks.write": false
  } }'
```
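
The read-only, force merge, and read-write steps can be wrapped in one function so that writes are re-enabled even if the merge request fails. A sketch assuming the same `localhost:9200` endpoint and `gitlab-production` index as the examples above; `ES_URL` and `CURL` are hypothetical overrides (for instance, `CURL=echo` prints the requests instead of sending them).

```shell
# Base URL of the GitLab index (ES_URL is a hypothetical override).
es_url() { echo "${ES_URL:-http://localhost:9200}/gitlab-production"; }

# PUT a JSON settings body to the GitLab index.
es_put_settings() {
  ${CURL:-curl} --silent --show-error --fail --request PUT "$(es_url)/_settings" \
    --header 'Content-Type: application/json' --data "$1"
}

force_merge_gitlab_index() {
  # Block writes first, as described above for Elasticsearch 6.x.
  es_put_settings '{ "settings": { "index.blocks.write": true } }' || return 1

  status=0
  ${CURL:-curl} --silent --show-error --fail --request POST \
    "$(es_url)/_forcemerge?max_num_segments=5" || status=$?

  # Always switch back to read-write, even if the force merge failed.
  es_put_settings '{ "settings": { "index.blocks.write": false } }' || status=1
  return "$status"
}
```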

Enable Elasticsearch search in **Admin > Settings > Integrations**. That's it. Enjoy it!

## GitLab Elasticsearch Rake Tasks

There are several rake tasks available to you via the command line:

- [sudo gitlab-rake gitlab:elastic:index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This is a wrapper task. It does the following:
    - `sudo gitlab-rake gitlab:elastic:create_empty_index`
    - `sudo gitlab-rake gitlab:elastic:clear_index_status`
    - `sudo gitlab-rake gitlab:elastic:index_projects`
    - `sudo gitlab-rake gitlab:elastic:index_snippets`
- [sudo gitlab-rake gitlab:elastic:index_projects](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This iterates over all projects and queues sidekiq jobs to index them in the background.
- [sudo gitlab-rake gitlab:elastic:index_projects_status](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100.
- [sudo gitlab-rake gitlab:elastic:create_empty_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This generates an empty index on the Elasticsearch side.
- [sudo gitlab-rake gitlab:elastic:clear_index_status](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This deletes all instances of IndexStatus for all projects.
- [sudo gitlab-rake gitlab:elastic:delete_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - This removes the GitLab index on the Elasticsearch instance.
- [sudo gitlab-rake gitlab:elastic:recreate_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - Does the same thing as `sudo gitlab-rake gitlab:elastic:create_empty_index`
- [sudo gitlab-rake gitlab:elastic:index_snippets](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - Performs an Elasticsearch import that indexes the snippets data.
- [sudo gitlab-rake gitlab:elastic:projects_not_indexed](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
  - Displays which projects are not indexed.

### Environment Variables

In addition to the rake tasks, there are some environment variables that can be used to modify the process:

| Environment Variable | Data Type | What it does                                                                 |
| -------------------- |:---------:| ---------------------------------------------------------------------------- |
| `UPDATE_INDEX`       | Boolean   | Tells the indexer to overwrite any existing index data (true/false).         |
| `ID_TO`              | Integer   | Tells the indexer to only index projects less than or equal to the value.    |
| `ID_FROM`            | Integer   | Tells the indexer to only index projects greater than or equal to the value. |

### Indexing a specific project

Because the `ID_TO` and `ID_FROM` environment variables use the `or equal to` comparison, you can index only one project by using both these variables with the same project ID number:

```sh
root@git:~# sudo gitlab-rake gitlab:elastic:index_projects ID_TO=33 ID_FROM=33
Indexing project repositories...I, [2019-03-04T21:27:03.083410 #3384]  INFO -- : Indexing GitLab User / test (ID=33)...
I, [2019-03-04T21:27:05.215266 #3384]  INFO -- : Indexing GitLab User / test (ID=33) is done!
```

## Elasticsearch Index Scopes

When performing a search, the GitLab index will use the following scopes:

| Scope Name       | What it searches       |
| ---------------- | ---------------------- |
| `commits`        | Commit data            |
| `projects`       | Project data (default) |
| `blobs`          | Code                   |
| `issues`         | Issue data             |
| `merge_requests` | Merge Request data     |
| `milestones`     | Milestone data         |
| `notes`          | Note data              |
| `snippets`       | Snippet data           |
| `wiki_blobs`     | Wiki contents          |

## Tuning

### Deleted documents

Whenever a change or deletion is made to an indexed GitLab object (a merge request description is changed, a file is deleted from the master branch in a repository, a project is deleted, etc.), a document in the index is deleted. However, since these are "soft" deletes, the overall number of "deleted documents", and therefore wasted space, increases. Elasticsearch does intelligent merging of segments in order to remove these deleted documents. However, depending on the amount and type of activity in your GitLab installation, it's possible to see as much as 50% wasted space in the index.

In general, we recommend simply letting Elasticsearch merge and reclaim space automatically, with the default settings. From [Lucene's Handling of Deleted Documents](https://www.elastic.co/blog/lucenes-handling-of-deleted-documents "Lucene's Handling of Deleted Documents"), _"Overall, besides perhaps decreasing the maximum segment size, it is best to leave Lucene's defaults as-is and not fret too much about when deletes are reclaimed."_

However, some larger installations may wish to tune the merge policy settings:

- Consider reducing the `index.merge.policy.max_merged_segment` size from the default 5 GB to 2 GB or 3 GB. Segments that have reached the maximum size are only merged again once at least 50% of their documents are deleted, so a smaller maximum segment size allows merging to happen more frequently.

  ```bash
  curl --request PUT http://localhost:9200/gitlab-production/_settings \
    --header 'Content-Type: application/json' --data '{
    "index" : {
      "merge.policy.max_merged_segment": "2gb"
    }
  }'
  ```

- You can also adjust `index.merge.policy.reclaim_deletes_weight`, which controls how aggressively deletions are targeted. However, this can lead to costly merge decisions, so we recommend not changing it unless you understand the tradeoffs.

  ```bash
  curl --request PUT http://localhost:9200/gitlab-production/_settings \
    --header 'Content-Type: application/json' --data '{
    "index" : {
      "merge.policy.reclaim_deletes_weight": "3.0"
    }
  }'
  ```

- Do not do a [force merge](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html "Force Merge") to remove deleted documents.  A warning in the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html "Force Merge") states that this can lead to very large segments that may never get reclaimed, and can also cause significant performance or availability issues.
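
To decide whether any of this tuning is worth it, you can first measure how much of the index is made up of deleted documents. A rough sketch using the `_cat/indices` API; the share is computed as `docs.deleted / (docs.count + docs.deleted)`, which is only approximate because these are Lucene-level document counts. The helper names and `ES_URL` are hypothetical.

```shell
# Percentage of deleted documents, given docs.count ($1) and docs.deleted ($2).
deleted_share() {
  awk -v live="$1" -v deleted="$2" 'BEGIN {
    total = live + deleted
    if (total == 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * deleted / total
  }'
}

# Fetch both counts for the GitLab index and print the deleted share.
show_deleted_share() {
  # _cat/indices prints the two requested columns separated by whitespace
  set -- $(curl --silent "${ES_URL:-http://localhost:9200}/_cat/indices/gitlab-production?h=docs.count,docs.deleted")
  echo "deleted documents: $(deleted_share "$1" "$2")%"
}
```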

## Troubleshooting

Here are some common pitfalls and how to overcome them:

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
432
433
- **How can I verify my GitLab instance is using Elasticsearch?**

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
434
  The easiest method is via the rails console (`sudo gitlab-rails console`) by running the following:
Evan Read's avatar
Evan Read включено в состав коммита
435

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
436
437
438
439
440
  ```ruby
  u = User.find_by_username('your-username')
  s = SearchService.new(u, {:search => 'search_term'})
  pp s.search_objects.class.name
  ```
Evan Read's avatar
Evan Read включено в состав коммита
441

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
442
  If you see `Elasticsearch::Model::Response::Records`, you are using Elasticsearch.
Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
443

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
444
445
- **I updated GitLab and now I can't find anything**

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
446
447
448
  We continuously make updates to our indexing strategies and aim to support
  newer versions of Elasticsearch. When indexing changes are made, it may
  be necessary for you to [reindex](#adding-gitlabs-data-to-the-elasticsearch-index) after updating GitLab.
Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
449
450
451

- **I indexed all the repositories but I can't find anything**

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
452
  Make sure you indexed all the database data [as stated above](#adding-gitlabs-data-to-the-elasticsearch-index).
Evan Read's avatar
Evan Read включено в состав коммита
453

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
454
  Beyond that, check via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html) to see if the data shows up on the Elasticsearch side.
Evan Read's avatar
Evan Read включено в состав коммита
455

Marcel Amirault's avatar
Marcel Amirault включено в состав коммита
456
  If it shows up via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html), check that it shows up via the rails console (`sudo gitlab-rails console`):

  ```ruby
  u = User.find_by_username('your-username')
  # The scope must be a quoted string; a bare `blobs` is an undefined name
  s = SearchService.new(u, {:search => 'search_term', :scope => 'blobs'})
  pp s.search_objects.to_a
  ```

  See [Elasticsearch Index Scopes](elasticsearch.md#elasticsearch-index-scopes) for more information on searching for specific types of data.

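  The Elasticsearch-side check in this item can also be scripted. The following is a minimal Ruby sketch that uses only the standard library and is not part of GitLab. The cluster URL `http://localhost:9200` and the index name `gitlab-production` are assumptions, so substitute the URL and index name your instance actually uses. It sends a URI search (the `q` parameter of the Search API) and returns the total number of hits; a count of `0` for a term you know exists suggests the data is missing on the Elasticsearch side.

  ```ruby
  require 'json'
  require 'net/http'
  require 'uri'

  # Build a Search API URI such as
  # http://localhost:9200/gitlab-production/_search?q=search_term
  def search_uri(es_url, index, query)
    base = es_url.end_with?('/') ? es_url : "#{es_url}/"
    uri = URI.join(base, "#{index}/_search")
    uri.query = URI.encode_www_form('q' => query)
    uri
  end

  # Read the total hit count from a Search API response body.
  # Elasticsearch 5.x and 6.x return hits.total as an integer; 7.x wraps it
  # in an object with a "value" key, so both shapes are accepted.
  def total_hits(body)
    total = JSON.parse(body).fetch('hits').fetch('total')
    total.is_a?(Hash) ? total.fetch('value') : total
  end

  # Send the search and return the hit count (makes a real HTTP request).
  def count_matches(es_url, index, query)
    response = Net::HTTP.get_response(search_uri(es_url, index, query))
    raise "Search failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    total_hits(response.body)
  end

  # Example, assuming a local cluster and the index name above:
  #   count_matches('http://localhost:9200', 'gitlab-production', 'search_term')
  ```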
- **I indexed all the repositories but then switched Elasticsearch servers and now I can't find anything**

  You will need to re-run all the rake tasks to re-index the database, repositories, and wikis.

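  Re-running the tasks can be wrapped in a short script. This is a hypothetical Ruby helper, not something shipped with GitLab: the task names `gitlab:elastic:index_database`, `gitlab:elastic:index_repositories`, and `gitlab:elastic:index_wikis` are assumed from the [indexing instructions](#adding-gitlabs-data-to-the-elasticsearch-index), so check them (and any additional tasks) against the instructions for your GitLab version. It runs the tasks one at a time through `sudo gitlab-rake` and stops at the first failure; `dry_run: true` only returns the commands.

  ```ruby
  # Indexing tasks to re-run, in order (names assumed; see the lead-in).
  REINDEX_TASKS = %w[
    gitlab:elastic:index_database
    gitlab:elastic:index_repositories
    gitlab:elastic:index_wikis
  ].freeze

  # Build the Omnibus command line for each task.
  def reindex_commands(tasks = REINDEX_TASKS)
    tasks.map { |task| ['sudo', 'gitlab-rake', task] }
  end

  # Run every task in sequence and raise on the first non-zero exit.
  # With dry_run: true, return the commands without running them.
  def reindex_all(dry_run: false)
    commands = reindex_commands
    return commands if dry_run

    commands.each do |cmd|
      puts "Running: #{cmd.join(' ')}"
      raise "Task failed: #{cmd.last}" unless system(*cmd)
    end
  end
  ```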
- **The indexing process is taking a very long time**

  The more data present in your GitLab instance, the longer the indexing process takes.

- **There are some projects that weren't indexed, but we don't know which ones**

  You can run `sudo gitlab-rake gitlab:elastic:projects_not_indexed` to display projects that aren't indexed.

- **No new data is added to the Elasticsearch index when I push code**

  When performing the initial indexing of blobs, we lock each project until that project finishes indexing. If an error
  occurs during this process, one or more projects can remain locked. To unlock them, run the
  `gitlab:elastic:clear_locked_projects` rake task.

- **"Can't specify parent if no parent field has been configured"**

  If you enabled Elasticsearch before GitLab 8.12 and have not rebuilt your indexes, you will get an
  exception in many different cases:

  ```text
  Elasticsearch::Transport::Transport::Errors::BadRequest([400] {
      "error": {
          "root_cause": [{
              "type": "illegal_argument_exception",
              "reason": "Can't specify parent if no parent field has been configured"
          }],
          "type": "illegal_argument_exception",
          "reason": "Can't specify parent if no parent field has been configured"
      },
      "status": 400
  }):
  ```

  This is because we changed the index mapping in GitLab 8.12, so the old indexes must be removed and rebuilt from scratch.
  See the details in the [8.11 to 8.12 update guide](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/update/8.11-to-8.12.md#11-elasticsearch-index-update-if-you-currently-use-elasticsearch).

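  If you scan logs for this failure, it can help to tell it apart from other `BadRequest` errors. The following minimal Ruby sketch (a hypothetical helper, not part of GitLab) parses the JSON body of the error shown above and checks whether the top-level `reason` or any `root_cause` entry carries the pre-8.12 mapping message.

  ```ruby
  require 'json'

  PARENT_FIELD_REASON = "Can't specify parent if no parent field has been configured"

  # True when an Elasticsearch error body is the stale-mapping error above.
  # A body that is not JSON, or has no structured "error" object, gives false.
  def stale_mapping_error?(body)
    data = JSON.parse(body)
    error = data.is_a?(Hash) ? data['error'] : nil
    return false unless error.is_a?(Hash)

    causes = Array(error['root_cause']).map { |cause| cause['reason'] if cause.is_a?(Hash) }
    ([error['reason']] + causes).include?(PARENT_FIELD_REASON)
  rescue JSON::ParserError
    false
  end
  ```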
- **Exception `Elasticsearch::Transport::Transport::Errors::BadRequest`**

  If you see this exception (as in the case above, but with a different message), check that you are running a supported Elasticsearch version and that you meet the other [requirements](#system-requirements).
  You can also check this automatically with the `sudo gitlab-rake gitlab:check` command.

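  The version part of this check can be sketched in Ruby as well. This is a minimal example, not GitLab's own check: it reads `version.number` from the response to a `GET /` request on the cluster and compares it with the range the [Version Requirements](#version-requirements) table lists for GitLab Enterprise Edition 11.5+ (Elasticsearch 5.6 - 6.x). The cluster URL is an assumption, and older GitLab versions need the bounds changed to match the table.

  ```ruby
  require 'json'
  require 'net/http'
  require 'rubygems'
  require 'uri'

  # Supported range for GitLab EE 11.5+ per the Version Requirements table:
  # 5.6 or later, but below 7.0 (that is, 5.6 - 6.x).
  MIN_ES_VERSION = Gem::Version.new('5.6')
  MAX_ES_VERSION_EXCLUSIVE = Gem::Version.new('7.0')

  # Parse version.number from the JSON body returned by GET / on the cluster.
  def es_version(root_body)
    Gem::Version.new(JSON.parse(root_body).fetch('version').fetch('number'))
  end

  def supported_es_version?(version)
    version >= MIN_ES_VERSION && version < MAX_ES_VERSION_EXCLUSIVE
  end

  # Fetch the root endpoint and report its version (makes a real HTTP request).
  def check_cluster(es_url)
    version = es_version(Net::HTTP.get(URI(es_url)))
    status = supported_es_version?(version) ? 'supported' : 'NOT supported'
    puts "Elasticsearch #{version}: #{status}"
  end

  # Example, assuming a local cluster:
  #   check_cluster('http://localhost:9200/')
  ```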
- **Exception `Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge`**

  ```text
  [413] {"Message":"Request size exceeded 10485760 bytes"}
  ```

  This exception occurs when your Elasticsearch cluster is configured to reject
  requests above a certain size (10 MiB in this case). The limit corresponds to the
  `http.max_content_length` setting in `elasticsearch.yml`. Increase that value
  and restart your Elasticsearch cluster.

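  Elasticsearch size settings such as `http.max_content_length` use binary units (`1kb` is 1024 bytes, `1mb` is 1,048,576 bytes), so the `10485760` bytes in the error above is exactly `10mb`. A minimal Ruby sketch for converting between the two notations when choosing a new value follows; the helper names are hypothetical, and it accepts only whole numbers with a `b`, `kb`, `mb`, `gb`, `tb`, or `pb` suffix.

  ```ruby
  # Elasticsearch byte-size units are powers of 1024.
  BYTE_UNITS = {
    'b' => 1,
    'kb' => 1024,
    'mb' => 1024**2,
    'gb' => 1024**3,
    'tb' => 1024**4,
    'pb' => 1024**5
  }.freeze

  # Convert a size string such as "10mb" or "100MB" into bytes.
  def to_bytes(size)
    match = /\A(\d+)\s*([kmgtp]?b)\z/i.match(size.to_s.strip)
    raise ArgumentError, "unrecognized size: #{size.inspect}" unless match

    match[1].to_i * BYTE_UNITS.fetch(match[2].downcase)
  end

  # Express a byte count, such as the one in the 413 error, in mb.
  def bytes_to_mb(bytes)
    bytes / BYTE_UNITS.fetch('mb').to_f
  end
  ```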
  AWS has [fixed limits](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html)
  for this setting ("Maximum Size of HTTP Request Payloads"), based on the size of
  the underlying instance.