We need to shorten the timeout to effectively bound the
computation size. This protects against "too big" repos.
It also protects, to some extent, against overly long lines
if kept to very low values (basically so that grep is stopped
before it can run out of memory).
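As a rough illustration of how a timeout bounds the work regardless of input size, here is a minimal Go sketch. The names `runBounded`, `simulatedSearch` and `demoTimeout` are hypothetical and not part of Forgejo; the simulated search merely stands in for a grep whose cost grows with the repository.

```go
package main

import (
	"context"
	"time"
)

// runBounded runs work under a deadline, so the time spent is capped by
// timeout no matter how large the input is. Hypothetical helper.
func runBounded(timeout time.Duration, work func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return work(ctx)
}

// simulatedSearch stands in for a search whose cost grows with repo size.
// It finishes after cost, or aborts as soon as the context is done.
func simulatedSearch(cost time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(cost):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoTimeout runs a cheap and an expensive search under the same timeout.
func demoTimeout() (small, big error) {
	const timeout = 200 * time.Millisecond
	small = runBounded(timeout, simulatedSearch(time.Millisecond))
	big = runBounded(timeout, simulatedSearch(5*time.Second))
	return small, big
}
```

With a real subprocess, the same context could be handed to `exec.CommandContext`, which kills the process once the deadline passes.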
Docs-PR: forgejo/docs#812
Fix #31271.
When gogit is enabled, `IsObjectExist` calls
`repo.gogitRepo.ResolveRevision`, which is not correct: it is meant for
checking references, not objects. It can work with a commit hash, since
that is both a valid reference and a commit object, but it doesn't work
with blob objects.
So it causes #31271 because it reports that all blob objects do not
exist.
(cherry picked from commit f4d3120f9d1de6a260a5e625b3ffa6b35a069e9b)
Conflicts:
trivial resolution because go-git support was dropped https://codeberg.org/forgejo/forgejo/pulls/4941
Support compression for Actions logs to save storage space and
bandwidth. Inspired by
https://github.com/go-gitea/gitea/issues/24256#issuecomment-1521153015
The biggest challenge is that the compression format should support
[seekable](https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md).
So when users are viewing a part of the log lines, Gitea doesn't need to
download the whole compressed file and decompress it.
That means gzip cannot help here. I did some research and there aren't
many choices (bgzip and xz, for example), but I think zstd is the most
popular one. It has an implementation in Golang with
[zstd](https://github.com/klauspost/compress/tree/master/zstd) and
[zstd-seekable-format-go](https://github.com/SaveTheRbtz/zstd-seekable-format-go),
and what is better is that it has good compatibility: a seekable format
zstd file can be read by a regular zstd reader.
This PR introduces a new package `zstd` to combine and wrap the two
packages, to provide a unified and easy-to-use API.
A new setting `LOG_COMPRESSION` is added to the config. Although I
don't see any reason not to use compression, I think it's a good
idea to keep the default as `none` to be consistent with old versions.
`LOG_COMPRESSION` takes effect only for new log files. It adds `.zst` as
an extension to the file name, so Gitea can determine from the file name
whether it needs decompression when reading. Old files will
keep their format since it's not worth converting them, as they will be
cleared after #31735.
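The file-name convention described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper names are hypothetical, and the setting value `zstd` is an assumption (the description only names `none`).

```go
package main

import "strings"

// zstSuffix is the extension the description says is added to compressed logs.
const zstSuffix = ".zst"

// storedLogName returns the name a new log file would be written under.
// The value "zstd" is assumed; any other value leaves the name unchanged.
func storedLogName(name, compression string) string {
	if compression == "zstd" {
		return name + zstSuffix
	}
	return name
}

// needsDecompression decides from the file name alone whether a log must be
// decompressed when read, so old uncompressed files keep working.
func needsDecompression(name string) bool {
	return strings.HasSuffix(name, zstSuffix)
}
```

Because the decision is made per file, changing the setting later does not affect how existing logs are read.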
<img width="541" alt="image"
src="https://github.com/user-attachments/assets/e9598764-a4e0-4b68-8c2b-f769265183c9">
(cherry picked from commit 33cc5837a655ad544b936d4d040ca36d74092588)
Conflicts:
assets/go-licenses.json
go.mod
go.sum
resolved with make tidy
If a pull request review was assigned to a team, the members of the
team were not shown in the "requested_reviewers" field, so the field was
null. As a solution, I added the team members to the array.
fix #31764
(cherry picked from commit 94cca8846e7d62c8a295d70c8199d706dfa60e5c)
There is no reason to reject initial dashes in git-grep
expressions... other than the code not supporting it previously.
A new method is introduced to relax the security checks.
- When people click on the logout button, an event is sent to all
browser tabs (actually to a shared worker) to notify them of this
logout. This is done in a blocking fashion, to ensure every registered
channel (which realistically should be one for every user because of the
shared worker) for a user receives this message. While doing this, it
locks the mutex for the eventsource module.
- Codeberg is currently observing a deadlock that's caused by this
blocking behavior: a channel isn't receiving the logout event. We
currently don't have a good theory of why this happens. As a
result, the logout functionality no longer works and
people no longer receive notifications unless they refresh the page.
- This patch makes this message non-blocking, and thus
consistent with the other messages. We don't see a good reason why this
specific event needs to be blocking and the commit introducing it
doesn't offer a rationale either.
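The difference between a blocking and a non-blocking send can be sketched in Go as below. This is an illustration of the general technique (`select` with a `default` case), not the eventsource module's code; `trySend` and `broadcast` are hypothetical names.

```go
package main

// trySend delivers ev on ch without blocking. If no receiver is ready and
// the channel has no free buffer slot, the event is dropped and false is
// returned instead of waiting.
func trySend(ch chan<- string, ev string) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false
	}
}

// broadcast sends ev to every channel and reports how many accepted it.
// A stuck channel cannot stall the loop, so a mutex held around this call
// is always released.
func broadcast(chans []chan string, ev string) (delivered int) {
	for _, ch := range chans {
		if trySend(ch, ev) {
			delivered++
		}
	}
	return delivered
}
```

The trade-off is that a receiver which is not ready misses the event, which is the same behavior the other messages already have according to the description above.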
See https://codeberg.org/forgejo/discussions/issues/164 for the
rationale and discussion of this change.
Everything related to the `go-git` dependency is dropped (only a single
instance is left in a test file to test for an XSS; it requires crafting
a commit that Git itself refuses to craft). `_gogit` files have
been removed entirely, `//go:build !gogit` is removed, and `XXX_nogogit.go` files
have either been renamed or had their code merged into the
`XXX.go` file.
It is a waste of resources to scan them looking for matches
because they are never returned - they appear as empty
lines in the current format.
Notably, even if they were returned, it is unlikely that matching
in binary files makes sense when the goal is "code search".
Analogously to how it happens for MaxResultLimit.
The default of 20 is inspired by a well-known, commercial code
hosting platform.
Unbounded limits are risky because they expose Forgejo to a class
of DoS attacks where queries are crafted to take advantage of
missing bounds.
Previous arch package grouping was not well-suited for complex or multi-architecture environments. It now supports the following content:
- Support grouping by any path.
- New support for packages in `xz` format.
- Fix clean up rules
<!--start release-notes-assistant-->
## Draft release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/4903): <!--number 4903 --><!--line 0 --><!--description c3VwcG9ydCBncm91cGluZyBieSBhbnkgcGF0aCBmb3IgYXJjaCBwYWNrYWdl-->support grouping by any path for arch package<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4903
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Exploding Dragon <explodingfkl@gmail.com>
Co-committed-by: Exploding Dragon <explodingfkl@gmail.com>
- Fix "WARNING: item list for enum is not a valid JSON array, using the
old deprecated format" messages from
https://github.com/go-swagger/go-swagger in the CI.
- `CheckOAuthAccessToken` returns both user ID and additional scopes
- `grantAdditionalScopes` returns AccessTokenScope ready string (grantScopes)
compiled from requested additional scopes by the client
- `userIDFromToken` sets returned grantScopes (if any) instead of default `all`
Provide a bit more journald integration. Specifically:
- support emission of printk-style log level prefixes, documented in [`sd-daemon`(3)](https://man7.org/linux/man-pages/man3/sd-daemon.3.html#DESCRIPTION), that allow journald to automatically annotate stderr log lines with their level;
- add a new "journaldflags" item that is supposed to be used in place of "stdflags" when under journald to reduce log clutter (i. e. strip date/time info to avoid duplication, and use log level prefixes instead of textual log levels);
- detect whether stderr and/or stdout are attached to journald by parsing `$JOURNAL_STREAM` environment variable and adjust console logger defaults accordingly.
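The `$JOURNAL_STREAM` check mentioned in the last item can be sketched as follows. systemd documents the variable as the device and inode numbers of the stream, in decimal, separated by a colon. The function names are hypothetical, and obtaining the device and inode of the actual file descriptor (via `fstat`) is platform-specific and left out, so this is only the parsing half of the detection.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseJournalStream splits a JOURNAL_STREAM value of the form
// "<device>:<inode>" into its two decimal numbers.
func parseJournalStream(v string) (dev, ino uint64, err error) {
	devStr, inoStr, ok := strings.Cut(v, ":")
	if !ok {
		return 0, 0, fmt.Errorf("malformed JOURNAL_STREAM %q", v)
	}
	if dev, err = strconv.ParseUint(devStr, 10, 64); err != nil {
		return 0, 0, err
	}
	if ino, err = strconv.ParseUint(inoStr, 10, 64); err != nil {
		return 0, 0, err
	}
	return dev, ino, nil
}

// sameStream reports whether a descriptor with the given device and inode
// (as obtained from fstat by the caller) is the journald stream.
func sameStream(journalStream string, fdDev, fdIno uint64) bool {
	dev, ino, err := parseJournalStream(journalStream)
	return err == nil && dev == fdDev && ino == fdIno
}
```

Comparing against the descriptor itself matters because the variable is inherited by child processes even when their stderr has been redirected elsewhere.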
<!--start release-notes-assistant-->
## Draft release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/2869): <!--number 2869 --><!--line 0 --><!--description bG9nOiBqb3VybmFsZCBpbnRlZ3JhdGlvbg==-->log: journald integration<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/2869
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Ivan Shapovalov <intelfx@intelfx.name>
Co-committed-by: Ivan Shapovalov <intelfx@intelfx.name>
- Fixes an XSS that was introduced in
https://codeberg.org/forgejo/forgejo/pulls/1433
- This XSS allows for `href`s in anchor elements to be set to a
`javascript:` uri in the repository description, which would upon
clicking (and not upon loading) the anchor element execute the specified
javascript in that uri.
- [`AllowStandardURLs`](https://pkg.go.dev/github.com/microcosm-cc/bluemonday#Policy.AllowStandardURLs) is now called for the repository description
policy, which ensures that URIs in anchor elements are `mailto:`,
`http://` or `https://` and thereby disallowing the `javascript:` URI.
It also now allows non-relative links and sets `rel="nofollow"` on
anchor elements.
- Unit test added.
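To illustrate why restricting URL schemes closes this hole, here is a standard-library-only sketch of a scheme allow-list. It is not bluemonday's implementation and `hasAllowedScheme` is a hypothetical name; it covers only the scheme check, not relative links or the `rel="nofollow"` handling.

```go
package main

import (
	"net/url"
	"strings"
)

// hasAllowedScheme reports whether href uses one of the schemes named in
// the description (mailto, http, https). Anything else, including
// "javascript:", is rejected. URLs without a scheme are rejected too,
// which is a simplification made for this sketch.
func hasAllowedScheme(href string) bool {
	u, err := url.Parse(strings.TrimSpace(href))
	if err != nil {
		return false
	}
	switch strings.ToLower(u.Scheme) {
	case "http", "https", "mailto":
		return true
	}
	return false
}
```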
Now that my colleague just posted a wonderful blog post https://blog.datalad.org/posts/forgejo-runner-podman-deployment/ on forgejo runner, some time I will try to add that damn codespell action to work on CI here ;) meanwhile some typos managed to sneak in and this PR should address them (one change might be functional in a test -- not sure if would cause a fail or not)
### Release notes
- [ ] I do not want this change to show in the release notes.
- [ ] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be used for the release notes instead of the title.
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4857
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-committed-by: Yaroslav Halchenko <debian@onerussian.com>
These are the three conflicted changes from #4716:
* https://github.com/go-gitea/gitea/pull/31632
* https://github.com/go-gitea/gitea/pull/31688
* https://github.com/go-gitea/gitea/pull/31706
cc @earl-warren; as per discussion on https://github.com/go-gitea/gitea/pull/31632 this involves a small compatibility break (OIDC introspection requests now require a valid client ID and secret, instead of a valid OIDC token)
## Checklist
The [developer guide](https://forgejo.org/docs/next/developer/) contains information that will be helpful to first time contributors. There also are a few [conditions for merging Pull Requests in Forgejo repositories](https://codeberg.org/forgejo/governance/src/branch/main/PullRequestsAgreement.md). You are also welcome to join the [Forgejo development chatroom](https://matrix.to/#/#forgejo-development:matrix.org).
### Tests
- I added test coverage for Go changes...
- [ ] in their respective `*_test.go` for unit tests.
- [x] in the `tests/integration` directory if it involves interactions with a live Forgejo server.
### Documentation
- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [ ] I did not document these changes and I do not expect someone else to do it.
### Release notes
- [ ] I do not want this change to show in the release notes.
- [ ] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be used for the release notes instead of the title.
<!--start release-notes-assistant-->
## Draft release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Breaking features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/4724): <!--number 4724 --><!--line 0 --><!--description T0lEQyBpbnRlZ3JhdGlvbnMgdGhhdCBQT1NUIHRvIGAvbG9naW4vb2F1dGgvaW50cm9zcGVjdGAgd2l0aG91dCBzZW5kaW5nIEhUVFAgYmFzaWMgYXV0aGVudGljYXRpb24gd2lsbCBub3cgZmFpbCB3aXRoIGEgNDAxIEhUVFAgVW5hdXRob3JpemVkIGVycm9yLiBUbyBmaXggdGhlIGVycm9yLCB0aGUgY2xpZW50IG11c3QgYmVnaW4gc2VuZGluZyBIVFRQIGJhc2ljIGF1dGhlbnRpY2F0aW9uIHdpdGggYSB2YWxpZCBjbGllbnQgSUQgYW5kIHNlY3JldC4gVGhpcyBlbmRwb2ludCB3YXMgcHJldmlvdXNseSBhdXRoZW50aWNhdGVkIHZpYSB0aGUgaW50cm9zcGVjdGlvbiB0b2tlbiBpdHNlbGYsIHdoaWNoIGlzIGxlc3Mgc2VjdXJlLg==-->OIDC integrations that POST to `/login/oauth/introspect` without sending HTTP basic authentication will now fail with a 401 HTTP Unauthorized error. To fix the error, the client must begin sending HTTP basic authentication with a valid client ID and secret. This endpoint was previously authenticated via the introspection token itself, which is less secure.<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4724
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Shivaram Lingamneni <slingamn@cs.stanford.edu>
Co-committed-by: Shivaram Lingamneni <slingamn@cs.stanford.edu>
I was facing issues while writing unit tests for federation code. Mocks weren't catching all network calls, because the client was out of scope of the mocking infra. Plus, I think we can have more granular tests.
This PR puts the client behind an interface, that can be retrieved from `ctx`. Context doesn't require initialization, as it defaults to the implementation available in-tree. It may be overridden when required (like testing).
## Mechanism
1. Get client factory from `ctx` (factory contains network and crypto parameters that are needed)
2. Initialize client with sender's keys and the receiver's public key
3. Use client as before.
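The mechanism above can be sketched as follows. All type and function names here are hypothetical stand-ins rather than the identifiers used in the PR, and plain strings take the place of real keys; the point is only the pattern of a context value with an in-tree default.

```go
package main

import "context"

// Client is a hypothetical stand-in for the federation client interface.
type Client interface {
	Name() string
}

// ClientFactory builds a client from the sender's keys and the receiver's
// public key (both simplified to strings in this sketch).
type ClientFactory interface {
	NewClient(senderKey, receiverPubKey string) Client
}

type inTreeClient struct{}

func (inTreeClient) Name() string { return "in-tree" }

type inTreeFactory struct{}

func (inTreeFactory) NewClient(_, _ string) Client { return inTreeClient{} }

type stubClient struct{}

func (stubClient) Name() string { return "stub" }

// stubFactory is what a test would install to avoid real network calls.
type stubFactory struct{}

func (stubFactory) NewClient(_, _ string) Client { return stubClient{} }

type factoryKey struct{}

// WithClientFactory overrides the factory stored in ctx.
func WithClientFactory(ctx context.Context, f ClientFactory) context.Context {
	return context.WithValue(ctx, factoryKey{}, f)
}

// GetClientFactory returns the factory from ctx, falling back to the
// in-tree implementation when none was set, so no initialization is needed.
func GetClientFactory(ctx context.Context) ClientFactory {
	if f, ok := ctx.Value(factoryKey{}).(ClientFactory); ok {
		return f
	}
	return inTreeFactory{}
}

// demoFactory resolves a client with and without an override.
func demoFactory() (def, overridden string) {
	ctx := context.Background()
	def = GetClientFactory(ctx).NewClient("sender-key", "receiver-pub").Name()
	ctx = WithClientFactory(ctx, stubFactory{})
	overridden = GetClientFactory(ctx).NewClient("sender-key", "receiver-pub").Name()
	return def, overridden
}
```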
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4853
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Aravinth Manivannan <realaravinth@batsense.net>
Co-committed-by: Aravinth Manivannan <realaravinth@batsense.net>
- If you have the external issue setting enabled, any reference would
have been rendered as an external issue, however this shouldn't be
happening to references that refer to issues in other repositories.
- Unit test added.
Mastodon with `AUTHORIZED_FETCH` enabled requires the `Host` header to
be signed too, add it to the default for `setting.Federation.GetHeaders`
and `setting.Federation.PostHeaders`.
For this to work, we need to sign the request later: not immediately
after `NewRequest`, but just before sending it out with `client.Do`.
Doing so also lets us use `setting.Federation.GetHeaders` (we were using
`.PostHeaders` even for GET requests before).
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
Part of #24256.
Clear up old action logs to free up storage space.
Users will see a message indicating that the log has been cleared if
they view old tasks.
<img width="1361" alt="image"
src="https://github.com/user-attachments/assets/9f0f3a3a-bc5a-402f-90ca-49282d196c22">
Docs: https://gitea.com/gitea/docs/pulls/40
---------
Co-authored-by: silverwind <me@silverwind.io>
(cherry picked from commit 687c1182482ad9443a5911c068b317a91c91d586)
Conflicts:
custom/conf/app.example.ini
routers/web/repo/actions/view.go
trivial context conflict
Fix #31137.
Replace #31623 #31697.
When migrating LFS objects, if any object failed (for example, some
objects are lost, which is not really critical), Gitea will stop
migrating LFS immediately but treat the migration as successful.
This PR checks the error according to the [LFS api
doc](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md#successful-responses).
> LFS object error codes should match HTTP status codes where possible:
>
> - 404 - The object does not exist on the server.
> - 409 - The specified hash algorithm disagrees with the server's
acceptable options.
> - 410 - The object was removed by the owner.
> - 422 - Validation error.
If the error is `404`, it's safe to ignore it and continue migration.
Otherwise, stop the migration and mark it as failed to ensure data
integrity of LFS objects.
Maybe we should also ignore other errors (maybe `410`? I'm not sure
what the difference is between "does not exist" and "removed by the
owner"). We can add them later when users report that they have
failed to migrate LFS because of an error which should be ignored.
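The rule described above (skip on `404`, fail on anything else) can be sketched like this. It is a minimal illustration, not the code from the PR: `ObjectError` mirrors the `code` and `message` fields of the per-object error in the LFS batch response, and `checkObjectError` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
)

// ObjectError mirrors the per-object "error" entry of an LFS batch response.
type ObjectError struct {
	Code    int
	Message string
}

// checkObjectError decides what to do with one object during migration:
//   - no error: migrate the object (skip=false, err=nil)
//   - 404:      the object is missing on the server, skip it and continue
//   - other:    return an error so the whole migration is marked as failed
func checkObjectError(oid string, e *ObjectError) (skip bool, err error) {
	if e == nil {
		return false, nil
	}
	if e.Code == http.StatusNotFound {
		return true, nil
	}
	return false, fmt.Errorf("LFS object %s: [%d] %s", oid, e.Code, e.Message)
}
```

Adding `410` to the ignored codes later would only require extending the second condition.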
(cherry picked from commit 09b56fc0690317891829906d45c1d645794c63d5)
This is an implementation of a quota engine, and the API routes to
manage its settings. This does *not* contain any enforcement code: this
is just the bedrock, the engine itself.
The goal of the engine is to be flexible and future proof: to be nimble
enough to build on it further, without having to rewrite large parts of
it.
It might feel a little more complicated than necessary, because the goal
was to be able to support scenarios only very few Forgejo instances
need, scenarios the vast majority of mostly smaller instances simply do
not care about. The goal is to support both big and small, and for that,
we need a solid, flexible foundation.
There are three big parts to the engine: counting quota use, setting
limits, and evaluating whether the usage is within the limits. Sounds
simple on paper, less so in practice!
Quota counting
==============
Quota is counted based on repo ownership, whenever possible, because
repo owners are in ultimate control over the resources they use: they
can delete repos, attachments, everything, even if they don't *own*
those themselves. They can clean up, and will always have the permission
and access required to do so. Were we to count quota based on the owning
user, that could lead to situations where a user is unable to free up
space, because they uploaded a big attachment to a repo that has been
taken private since. It's both fairer and much safer to count quota
against repo owners.
This means that if user A uploads an attachment to an issue opened
against organization O, that will count towards the quota of
organization O, rather than user A.
One's quota usage stats can be queried using the `/user/quota` API
endpoint. To figure out what's eating into it, the
`/user/repos?order_by=size`, `/user/quota/attachments`,
`/user/quota/artifacts`, and `/user/quota/packages` endpoints should be
consulted. There's also `/user/quota/check?subject=<...>` to check
whether the signed-in user is within a particular quota limit.
Quotas are counted based on sizes stored in the database.
Setting quota limits
====================
There are different "subjects" one can limit usage for. At this time,
only size-based limits are implemented, which are:
- `size:all`: As the name would imply, the total size of everything
Forgejo tracks.
- `size:repos:all`: The total size of all repositories (not including
LFS).
- `size:repos:public`: The total size of all public repositories (not
including LFS).
- `size:repos:private`: The total size of all private repositories (not
including LFS).
- `size:git:all`: The total size of all git data (including all
repositories, and LFS).
- `size:git:lfs`: The size of all git LFS data (either in private or
public repos).
- `size:assets:all`: The size of all assets tracked by Forgejo.
- `size:assets:attachments:all`: The size of all kinds of attachments
tracked by Forgejo.
- `size:assets:attachments:issues`: Size of all attachments attached to
issues, including issue comments.
- `size:assets:attachments:releases`: Size of all attachments attached
to releases. This does *not* include automatically generated archives.
- `size:assets:artifacts`: Size of all Action artifacts.
- `size:assets:packages:all`: Size of all Packages.
- `size:wiki`: Wiki size
Wiki size is currently not tracked, and the engine will always deem it
within quota.
These subjects are built into Rules, which set a limit on *all* subjects
within a rule. Thus, we can create a rule that says: "1Gb limit on all
release assets, all packages, and git LFS, combined". For a rule to
stand, the total sum of all subjects must be below the rule's limit.
Rules are in turn collected into groups. A group is just a name, and a
list of rules. For a group to stand, all of its rules must stand. Thus,
if we have a group with two rules, one that sets a combined 1Gb limit on
release assets, all packages, and git LFS, and another rule that sets a
256Mb limit on packages, if the user has 512Mb of packages, the group
will not stand, because the second rule deems it over quota. Similarly,
if the user has only 128Mb of packages, but 900Mb of release assets, the
group will not stand, because the combined size of packages and release
assets is over the 1Gb limit of the first rule.
Groups themselves are collected into Group Lists. A group list stands
when *any* of the groups within stand. This allows an administrator to
set conservative defaults, but then place select users into additional
groups that increase some aspect of their limits.
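The evaluation order described above (a rule sums its subjects, a group needs all rules to stand, a group list needs any group to stand) can be sketched as follows. The types are hypothetical and are not the engine's actual API. Subjects are treated as flat keys with hypothetical shorthand names, the strict "below" comparison is taken literally from the text, and an empty group list not standing is an assumption of this sketch.

```go
package main

// MiB is used to express the limits from the example in the text.
const MiB int64 = 1 << 20

// Rule sets one limit on the combined usage of all its subjects.
type Rule struct {
	Limit    int64
	Subjects []string
}

// Stands reports whether the summed usage of the rule's subjects is below
// the limit.
func (r Rule) Stands(used map[string]int64) bool {
	var sum int64
	for _, s := range r.Subjects {
		sum += used[s]
	}
	return sum < r.Limit
}

// Group is a name and a list of rules.
type Group struct {
	Name  string
	Rules []Rule
}

// Stands reports whether every rule in the group stands.
func (g Group) Stands(used map[string]int64) bool {
	for _, r := range g.Rules {
		if !r.Stands(used) {
			return false
		}
	}
	return true
}

// GroupList is the set of groups assigned to a user.
type GroupList []Group

// Stands reports whether at least one group in the list stands.
func (l GroupList) Stands(used map[string]int64) bool {
	for _, g := range l {
		if g.Stands(used) {
			return true
		}
	}
	return false
}
```

With a group holding a 1024 MiB rule on `releases`, `packages` and `lfs` plus a 256 MiB rule on `packages`, a user with 512 MiB of packages is over quota, unless their list also contains a more generous group.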
To top it off, it is possible to set the default quota groups a user
belongs to in `app.ini`. If there's no explicit assignment, the engine
will use the default groups. This makes it possible to avoid having to
assign each and every user a list of quota groups: only users who need
a different set of groups than the defaults have to be assigned
explicitly.
If a user has any quota groups assigned to them, the default list will
not be considered for them.
The management APIs
===================
This commit contains the engine itself, its unit tests, and the quota
management APIs. It does not contain any enforcement.
The APIs are documented in-code, and in the swagger docs, and the
integration tests can serve as an example on how to use them.
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>