What We Found: 225 Vulnerabilities in 45 Open Source Projects
We did an experiment last month.
We ran kolega.dev against 45 open source projects. These weren't random GitHub repos but mature projects, actively maintained and used in real enterprise applications: Langfuse, Qdrant, NocoDB, Phase, Cloudreve, Agenta, and Weaviate, among others. We found 225 security vulnerabilities across those 45 projects. So far, maintainers and security teams have reviewed 41 of our reports and merged 37 PRs, a 90% acceptance rate.
Kolega.dev now even appears twice in the Langfuse Hall of Fame.
What we really found
The bad news for maintainers is that the weaknesses weren't complicated. They weren't novel attack techniques; frighteningly, they were patterns we see all the time in production code. The fault lies not with the projects or their maintainers, but with legacy pattern-matching SAST tools like Semgrep, the industry standard.
A double negative that messed up an auth check.
We found this check in Phase (a secrets-management tool, ironically): `if not user_id is not None`. Try to parse that at 2 a.m. The intent is to run the permission check only when a user_id exists. But because of Python's operator precedence, `not user_id is not None` parses as `not (user_id is not None)`, which is simply `user_id is None`: False whenever a user_id exists. The permission check never runs.
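A minimal reproduction of the precedence trap (the function name and values here are invented for illustration, not the actual Phase code):

```python
def guarded_action(user_id):
    # Buggy guard: Python parses `not user_id is not None` as
    # `not (user_id is not None)`, i.e. `user_id is None`.
    if not user_id is not None:
        return "permission check runs"
    return "permission check skipped"

# For any real user the guard is False, so the check is skipped:
assert guarded_action("alice") == "permission check skipped"
assert guarded_action(None) == "permission check runs"
# The intended guard was simply: `if user_id is not None:`
```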
In total, there were nine distinct authorisation bypasses in Phase, all caused by small logic mistakes like this one.
RCE in vLLM through pickle deserialisation.
The code calls torch.load on a user-supplied path without weights_only=True, which means pickle deserialisation of attacker-controlled data: remote code execution. This was in vLLM, one of the most popular LLM inference frameworks.
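To see why loading a pickle is code execution, here is a sketch using the standard pickle module directly (which torch.load falls back to when weights_only is not set); the payload class is invented, and a real exploit would invoke os.system instead of a harmless function:

```python
import pickle

class Malicious:
    """A payload whose __reduce__ tells pickle to call an arbitrary
    callable with arbitrary arguments at load time."""
    def __reduce__(self):
        # Harmless stand-in; an attacker would use os.system here.
        return (str.upper, ("attacker code ran",))

payload = pickle.dumps(Malicious())
result = pickle.loads(payload)  # the callable runs during deserialisation
assert result == "ATTACKER CODE RAN"
```

Passing weights_only=True to torch.load restricts deserialisation to tensor data and rejects payloads like this.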
JWT validation that didn't check anything.
The code imported the JWT library, called the decode function, and handled errors correctly. At first glance it looked right. But verify_signature: False in the options meant the signature was never checked, so anyone could forge tokens.
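A sketch of the bug class using PyJWT (an assumption; the key names here are invented):

```python
import jwt  # PyJWT

# An attacker signs a token with a key they made up:
forged = jwt.encode({"sub": "admin"}, "attacker-key", algorithm="HS256")

# The vulnerable decode: library imported, errors handled elsewhere,
# but verify_signature: False means the signature is never checked.
claims = jwt.decode(forged, "server-secret", algorithms=["HS256"],
                    options={"verify_signature": False})
assert claims["sub"] == "admin"  # forged token accepted

# Correct: drop the option so the library verifies; forgery now raises.
try:
    jwt.decode(forged, "server-secret", algorithms=["HS256"])
    raise AssertionError("forged token should have been rejected")
except jwt.InvalidSignatureError:
    pass
```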
An ORDER BY clause made from user input.
The WHERE clause was properly parameterised. The injection point was the ORDER BY clause, which can't take placeholder parameters and was built from user input instead.
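A sketch of the pattern with sqlite3 (table and column names invented): placeholders protect the WHERE clause, but they cannot carry identifiers, so the sort column gets interpolated; an allowlist is one common fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])

def list_users_vulnerable(exclude_name, sort_col):
    # WHERE is parameterised, but sort_col is attacker-controlled:
    # a CASE expression placed here can leak data row by row.
    sql = f"SELECT id FROM users WHERE name != ? ORDER BY {sort_col}"
    return conn.execute(sql, (exclude_name,)).fetchall()

ALLOWED_SORT_COLUMNS = {"id", "name"}

def list_users_fixed(exclude_name, sort_col):
    # Identifiers can't be bound, so validate against a fixed allowlist.
    if sort_col not in ALLOWED_SORT_COLUMNS:
        raise ValueError("invalid sort column")
    sql = f"SELECT id FROM users WHERE name != ? ORDER BY {sort_col}"
    return conn.execute(sql, (exclude_name,)).fetchall()

assert list_users_fixed("", "id") == [(1,), (2,)]
```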
Why it's hard to catch these bugs
None of these weaknesses follow simple patterns. There is no eval(user_input) or clear SQL concatenation.
- `if not user_id is not None` is a double-negative logic error.
- `user_can_access_environment(user_object)` when it should be `user_can_access_environment(user_id)` is type confusion.
- `verify_signature: False` in a config dict is a semantic misconfiguration.
- Race conditions in async code that looks like it runs in order.
The one thing they all have in common is that they are syntactically correct. The code works. It compiles. It even looks good if you skim it. The problem is with the meaning: the code doesn't do what it's supposed to.
Pattern-matching tools look for code shapes that are known to be bad. They catch eval(user_input) and SELECT * FROM users WHERE id = '" + id + "'". They can't catch "you passed a string where you needed an object," because that isn't a pattern. It's a semantic mistake, and catching it requires understanding what the code was meant to do.
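Here's a minimal sketch (all names invented) of the object-versus-id confusion failing open: a blocklist membership test silently returns False when handed the wrong type, so no syntactic rule fires.

```python
class User:
    def __init__(self, user_id):
        self.id = user_id

BLOCKED_IDS = {"u42"}

def is_blocked(user_id: str) -> bool:
    # Semantically expects an id string; any object is accepted.
    return user_id in BLOCKED_IDS

banned = User("u42")

assert is_blocked(banned.id) is True   # intended call: access denied
assert is_blocked(banned) is False     # object passed instead of its id:
                                       # membership is False, check fails open
```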
The projects
We chose projects from different groups on purpose:
LLM Infrastructure: Langfuse, vLLM, Agenta, LiteLLM, Ollama, ChromaDB
Vector Databases: Qdrant, Weaviate, Milvus
Infrastructure and DevOps: Portainer, Coolify, OpenObserve, SigNoz
Identity & Auth: Authentik, ZITADEL, Casdoor
Managing Secrets: Phase, Infisical
Data and Productivity: NocoDB, Plane, BookStack, Cloudreve
And about 20 more that we're still working on.
These aren't hard-to-find projects. Teams depend on them for infrastructure.
What the maintainers said
We follow responsible disclosure: report privately, give maintainers time to fix, then publish.
Most of the maintainers were easy to work with. Here are some answers:
NocoDB, after we reported SQL injection in their Oracle client and SSRF in attachment uploads: "Thanks again for sending the full report and suggestions for fixing the problems. We thank you for your helpful cooperation during the whole process." Several PRs merged; fixes shipped.
Langfuse, after we reported SSRF in their PostHog integration: "I just sent out a fix because it was easy to do. We put you in our Hall of Fame!" Fix merged in less than 24 hours.
Phase, after nine authorisation bypasses: all nine fixed in PRs #722-731 within a week of our report.
The numbers
| Status | Vulnerabilities | Projects |
| --- | --- | --- |
| Reviewed & Accepted | 37 | 16 |
| Reviewed & Rejected | 4 | 16 |
| Awaiting Response | 135 | 20 |
| In Progress | 49 | 7 |
| **Total** | **225** | **45** |
Acceptance rate for reviewed findings: 90.24%
This matters because we're not the ones vouching for our own results. It's the maintainers, the people who know these codebases best, confirming that the vulnerabilities are real.
What we found, by type
Authorisation and Authentication Bypasses (31)
The largest group. Usually not missing auth checks, but auth checks that exist and don't work: type confusion, logic errors, and missing IDOR validation.
Phase had 9, and Agenta had 4, which included RestrictedPython sandbox escapes that let any code run.
SSRF (12)
URL validation that checks whether a URL parses but doesn't block internal ranges or cloud metadata endpoints. We found instances in Langfuse (PostHog integration), Weaviate (header-based URL override), NocoDB (attachment uploads), and Agenta (testset import and webhook endpoints).
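A sketch of the gap, assuming a validator that only checks that the URL parses. The stronger check below handles only literal IPs; real code must also resolve hostnames and re-check the resolved address (DNS rebinding).

```python
import ipaddress
from urllib.parse import urlparse

def looks_valid(url: str) -> bool:
    # The insufficient check: "it parses, so it's fine".
    p = urlparse(url)
    return p.scheme in ("http", "https") and bool(p.hostname)

# Both pass the weak check, but the second targets cloud metadata:
assert looks_valid("https://example.com/hook")
assert looks_valid("http://169.254.169.254/latest/meta-data/")

def is_public_address(host: str) -> bool:
    # Stronger sketch: reject private, loopback, and link-local ranges.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # hostname: must be resolved and re-checked
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)

assert is_public_address("169.254.169.254") is False
```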
SQL Injection (8)
Not the obvious string concatenation. The subtle cases: partial parameterisation, ORDER BY clauses, and dynamic table names. NocoDB's Oracle client was the worst.
Race Conditions (15)
TOCTOU patterns in asynchronous code. Token-refresh races. Gaps between permission checks and the actions they guard. Hard to find because the code reads as if it runs sequentially.
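A deterministic sketch of the async TOCTOU window (the balance example is invented): both tasks pass the check before either acts, because the await between check and act lets them interleave. Holding a shared asyncio.Lock across both steps closes the window.

```python
import asyncio

balance = 100

async def withdraw_unsafe(amount):
    global balance
    if balance >= amount:        # check
        await asyncio.sleep(0)   # await point: another task runs here
        balance -= amount        # act, against possibly stale state
        return True
    return False

async def main():
    # Two concurrent withdrawals of the full balance both succeed:
    return await asyncio.gather(withdraw_unsafe(100), withdraw_unsafe(100))

results = asyncio.run(main())
assert results == [True, True]
assert balance == -100  # overdrawn: the TOCTOU bug
# Fix sketch: hold one shared asyncio.Lock across the check and the act.
```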
Timing Attacks (8)
Plain string comparison used on secrets instead of constant-time comparison. Found in Cloudreve's HMAC check.
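The fix is the standard library's constant-time comparison; a sketch (secret and message invented):

```python
import hashlib
import hmac

secret = b"server-secret"
message = b"payload"
expected = hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_naive(provided: str) -> bool:
    # Leaks timing: == bails at the first differing character,
    # letting an attacker recover the digest byte by byte.
    return provided == expected

def verify_safe(provided: str) -> bool:
    # Constant-time comparison, built for exactly this purpose.
    return hmac.compare_digest(provided, expected)

assert verify_safe(expected) is True
assert verify_safe("0" * 64) is False
```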
Code Execution (6)
Unsafe deserialisation (vLLM's torch.load), sandbox escapes (Agenta's RestrictedPython), and command injection are all examples of this.
Problems with Cryptography (10)
Predictable random number generators used for security-sensitive operations, such as tokens generated with Go's math/rand instead of crypto/rand.
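The affected projects were Go (math/rand versus crypto/rand); the same mistake in Python is the random module versus secrets, sketched here:

```python
import random
import secrets

# Predictable: Mersenne Twister is not a CSPRNG; enough observed
# outputs let an attacker reconstruct its state and predict tokens.
weak_token = "".join(random.choices("0123456789abcdef", k=32))

# Unpredictable: secrets draws from the OS CSPRNG, Python's
# analogue of Go's crypto/rand.
strong_token = secrets.token_hex(16)

assert len(weak_token) == 32
assert len(strong_token) == 32
```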
How we work
We use a two-tiered approach:
Tier 1: Standard SAST (Semgrep), SCA for dependencies, and finding secrets. Catches patterns that are known.
Tier 2: AI Deep Code Scan. Tracking the flow of data across files, checking the flow of authentication, and checking the types of data across function boundaries. Catches what pattern matching doesn't.
Both tiers feed an intensive triage step: AI with a deep semantic understanding of how the code actually works, not just its syntax. Confirmed issues become automated PRs, ready to review and merge.
The gap
Standard SAST finds one type of bug: the obvious ones. Patterns that are known, functions that are known to be bad, and clear injection. That's useful.
Semantic vulnerabilities - logic errors, type confusion, auth-flow bugs, and race conditions - don't fit any pattern, so SAST tools won't find them. Catching them requires knowing what the code is supposed to do, not just what it does.
225 vulnerabilities in 45 actively maintained projects. A 90% acceptance rate on the reports maintainers have reviewed. These are the bugs that slip through the SAST net.
All results are public
Every assessment is made public with all the technical information:
https://www.kolega.dev/security-wins/
Code locations. CWEs. PR numbers. Disclosure timelines. You can verify every claim.
If you think we made a mistake, tell us. We've been wrong before and we'll admit it.
This research is ongoing. The numbers reflect disclosures as of January 2026. We are still assessing new projects and working with maintainers on fixes.