Template
1
0
Fork 0
mirror of https://codeberg.org/forgejo/forgejo synced 2024-11-30 13:56:09 +01:00
Commit graph

22 commits

Author SHA1 Message Date
techknowlogick 033d92997f
Allow skipping forks and mirrors from being indexed (#23187)
This PR adds two new options to disable repo/code search indexing of
both forks and mirrors.

Related: #22842
2023-05-25 16:13:47 +08:00
wxiaoguang 6f9c278559
Rewrite queue (#24505)
# ⚠️ Breaking

Many deprecated queue config options are removed (actually, they should
have been removed in 1.18/1.19).

If you see the fatal message when starting Gitea: "Please update your
app.ini to remove deprecated config options", please follow the error
messages to remove these options from your app.ini.

Example:

```
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].ISSUE_INDEXER_QUEUE_TYPE`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [E] Removed queue option: `[indexer].UPDATE_BUFFER_LEN`. Use new options in `[queue.issue_indexer]`
2023/05/06 19:39:22 [F] Please update your app.ini to remove deprecated config options
```

Many options in `[queue]` are are dropped, including:
`WRAP_IF_NECESSARY`, `MAX_ATTEMPTS`, `TIMEOUT`, `WORKERS`,
`BLOCK_TIMEOUT`, `BOOST_TIMEOUT`, `BOOST_WORKERS`, they can be removed
from app.ini.

# The problem

The old queue package has some legacy problems:

* complexity: I doubt few people could tell how it works.
* maintainability: Too many channels and mutex/cond are mixed together,
too many different structs/interfaces depends each other.
* stability: due to the complexity & maintainability, sometimes there
are strange bugs and difficult to debug, and some code doesn't have test
(indeed some code is difficult to test because a lot of things are mixed
together).
* general applicability: although it is called "queue", its behavior is
not a well-known queue.
* scalability: it doesn't seem easy to make it work with a cluster
without breaking its behaviors.

It came from some very old code to "avoid breaking", however, its
technical debt is too heavy now. It's a good time to introduce a better
"queue" package.

# The new queue package

It keeps using old config and concept as much as possible.

* It only contains two major kinds of concepts:
    * The "base queue": channel, levelqueue, redis
* They have the same abstraction, the same interface, and they are
tested by the same testing code.
* The "WokerPoolQueue", it uses the "base queue" to provide "worker
pool" function, calls the "handler" to process the data in the base
queue.
* The new code doesn't do "PushBack"
* Think about a queue with many workers, the "PushBack" can't guarantee
the order for re-queued unhandled items, so in new code it just does
"normal push"
* The new code doesn't do "pause/resume"
* The "pause/resume" was designed to handle some handler's failure: eg:
document indexer (elasticsearch) is down
* If a queue is paused for long time, either the producers blocks or the
new items are dropped.
* The new code doesn't do such "pause/resume" trick, it's not a common
queue's behavior and it doesn't help much.
* If there are unhandled items, the "push" function just blocks for a
few seconds and then re-queue them and retry.
* The new code doesn't do "worker booster"
* Gitea's queue's handlers are light functions, the cost is only the
go-routine, so it doesn't make sense to "boost" them.
* The new code only use "max worker number" to limit the concurrent
workers.
* The new "Push" never blocks forever
* Instead of creating more and more blocking goroutines, return an error
is more friendly to the server and to the end user.

There are more details in code comments: eg: the "Flush" problem, the
strange "code.index" hanging problem, the "immediate" queue problem.

Almost ready for review.

TODO:

* [x] add some necessary comments during review
* [x] add some more tests if necessary
* [x] update documents and config options
* [x] test max worker / active worker
* [x] re-run the CI tasks to see whether any test is flaky
* [x] improve the `handleOldLengthConfiguration` to provide more
friendly messages
* [x] fine tune default config values (eg: length?)

## Code coverage:

![image](https://user-images.githubusercontent.com/2114189/236620635-55576955-f95d-4810-b12f-879026a3afdf.png)
2023-05-08 19:49:59 +08:00
techknowlogick 92c160d8e7
Add meilisearch support (#23136)
Add meilisearch support

Fixes #20665
2023-03-28 22:23:23 -04:00
Lunny Xiao d845be661f
handle deprecated settings (#22992)
Fix #22736
2023-02-20 16:18:26 -06:00
Lunny Xiao c53ad052d8
Refactor the setting to make unit test easier (#22405)
Some bugs caused by less unit tests in fundamental packages. This PR
refactor `setting` package so that create a unit test will be easier
than before.

- All `LoadFromXXX` files has been splited as two functions, one is
`InitProviderFromXXX` and `LoadCommonSettings`. The first functions will
only include the code to create or new a ini file. The second function
will load common settings.
- It also renames all functions in setting from `newXXXService` to
`loadXXXSetting` or `loadXXXFrom` to make the function name less
confusing.
- Move `XORMLog` to `SQLLog` because it's a better name for that.

Maybe we should finally move these `loadXXXSetting` into the `XXXInit`
function? Any idea?

---------

Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: delvh <dev.lh@web.de>
2023-02-20 00:12:01 +08:00
flynnnnnnnnnn e81ccc406b
Implement FSFE REUSE for golang files (#21840)
Change all license headers to comply with REUSE specification.

Fix #16132

Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
2022-11-27 18:20:29 +00:00
6543 54e9ee37a7
format with gofumpt (#18184)
* gofumpt -w -l .

* gofumpt -w -l -extra .

* Add linter

* manual fix

* change make fmt
2022-01-20 18:46:10 +01:00
Gusted 1d98d205f5
Enable deprecation error for v1.17.0 (#18341)
Co-authored-by: Andrew Thornton <art27@cantab.net>
2022-01-20 18:00:38 +01:00
luzpaz e0296b6a6d
Fix various documentation, user-facing, and source comment typos (#16367)
* Fix various doc, user-facing, and source comment typos

Found via `codespell -q 3 -S ./options/locale,./vendor -L ba,pullrequest,pullrequests,readby`
2021-07-08 13:38:13 +02:00
zeripath ffbf35b7e9
Clean-up the settings hierarchy for issue_indexer queue (#16001)
There are a couple of settings in `[indexer]` relating to the `issue_indexer` queue
which override settings in unpredictable ways. This PR adjusts this hierarchy and makes
explicit that these settings are deprecated.

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
2021-06-16 18:19:20 -04:00
zeripath c1a80b7d6a
Use filepath.ToSlash and Join in indexer defaults and queues (#15971)
As revealed by #15964 there is inconsistent use of filepath Join and path Join
for these directories. The best thing to do is to use filepath.Join but then ToSlash
them for consistency.

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: John Olheiser <john.olheiser@gmail.com>
2021-05-25 22:50:35 -04:00
zeripath 3aaf64885f
Change default queue settings to be low go-routines (#15964)
This PR suggests a change to the default configuration for queues:

* Use a common DATADIR for the queues
* Set starting workers to 0 and make boost a single worker

Signed-off-by: Andrew Thornton <art27@cantab.net>
2021-05-24 02:23:55 +03:00
zeripath 144fa5a537
Avoid setting the CONN_STR in issue indexer queue unless it is meant to be set (#13069)
Since the move to common leveldb and common redis the disk queue code (#12385)
will check the connection string before defaulting to the DATADIR.

Therefore we should ensure that the connection string is kept empty
unless it is actually set.

Unforunately the issue indexer was missed in #13025 this PR fixes this omission

Fix #13062

Signed-off-by: Andrew Thornton <art27@cantab.net>
2020-10-07 23:24:41 +01:00
Lunny Xiao 9bc69ff26e
Support elastic search for code search (#10273)
* Support elastic search for code search

* Finished elastic search implementation and add some tests

* Enable test on drone and added docs

* Add new fields to elastic search

* Fix bug

* remove unused changes

* Use indexer alias to keep the gitea indexer version

* Improve codes

* Some code improvements

* The real indexer name changed to xxx.v1

Co-authored-by: zeripath <art27@cantab.net>
2020-08-30 19:08:01 +03:00
Lauris BH 3c45cf8494
Add detected file language to code search (#10256)
Move langauge detection to separate module to be more reusable

Add option to disable vendored file exclusion from file search

Allways show all language stats for search
2020-02-20 16:53:55 -03:00
Lunny Xiao 5dbf36f356
Issue search support elasticsearch (#9428)
* Issue search support elasticsearch

* Fix lint

* Add indexer name on app.ini

* add a warnning on SearchIssuesByKeyword

* improve code
2020-02-13 14:06:17 +08:00
Lunny Xiao 89b4e0477b
Refactor code indexer (#9313)
* Refactor code indexer

* fix test

* fix test

* refactor code indexer

* fix import

* improve code

* fix typo

* fix test and make code clean

* fix lint
2019-12-23 20:31:16 +08:00
zeripath 167e8f18da
Restore Graceful Restarting & Socket Activation (#7274)
* Prevent deadlock in indexer initialisation during graceful restart

* Move from gracehttp to our own service to add graceful ssh

* Add timeout for start of indexers and make hammer time configurable

* Fix issue with re-initialization in indexer during tests

* move the code to detect use of closed to graceful

* Handle logs gracefully - add a pid suffix just before restart

* Move to using a cond and a holder for indexers

* use time.Since

* Add some comments and attribution

* update modules.txt

* Use zero to disable timeout

* Move RestartProcess to its own file

* Add cleanup routine
2019-10-15 14:39:51 +01:00
guillep2k 72f6d5c882 Restrict repository indexing by glob match (#7767)
* Restrict repository indexing by file extension

* Use REPO_EXTENSIONS_LIST_INCLUDE instead of REPO_EXTENSIONS_LIST_EXCLUDE and have a more flexible extension pattern

* Corrected to pass lint gosimple

* Add wildcard support to REPO_INDEXER_EXTENSIONS

* This reverts commit 72a650c8e4.

* Add wildcard support to REPO_INDEXER_EXTENSIONS (no make vendor)

* Simplify isIndexable() for better clarity

* Add gobwas/glob to vendors

* manually set appengine new release

* Implement better REPO_INDEXER_INCLUDE and REPO_INDEXER_EXCLUDE

* Add unit and integration tests

* Update app.ini.sample and reword config-cheat-sheet

* Add doc page and correct app.ini.sample

* Some polish on the doc

* Simplify code as suggested by @lafriks
2019-09-11 20:26:28 +03:00
Lunny Xiao e7d7dcb090 Issue indexer queue redis support (#6218)
* add redis queue

* finished indexer redis queue

* add redis vendor

* fix vet

* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md

Co-Authored-By: lunny <xiaolunwen@gmail.com>

* switch to go mod

* Update required changes for new logging func signatures
2019-04-08 12:05:15 +03:00
Lunny Xiao 477ef46251
Add more tests and docs for issue indexer, add db indexer type for searching from database (#6144)
* add more tests and docs for issue indexer, add db indexer type for searching from database

* fix typo

* fix typo

* fix lint

* improve docs
2019-02-21 13:01:28 +08:00
Lunny Xiao 830ae61456 Refactor issue indexer (#5363) 2019-02-19 09:39:39 -05:00