This seems to be mostly useless AI hype. Firstly, it's quite impolite to assume all open source projects are hosted on GitHub/GitLab. That said, I uploaded sydbox.git temporarily to GitLab to have it scanned. It took 10 minutes to scan the whole project, and it found a single vulnerability, "RCE: IRC Message Command Execution Bypass", in the file dev/bot.py, which is our IRC script to run commands on the CTF server. Hilarious! Please do better :)
It's hard to evaluate such a tool. I scanned my OSS MCP server for databases at https://github.com/skanga/dbchat and it found 0 vulnerabilities. Now I'm wondering if my code is perfect :-) or the tool has issues!
Coincidentally, Semgrep's AI tool flagged a real, although very minor, issue in a C project a couple of days ago. So I tried Gecko on the same repository to see if it could detect anything else, but no. I then reverted the fix in the GitHub repo to see whether Gecko would also complain about the issue, but I believe I hit a bug in the UI: I deleted the previous project and created a new one, using the same GitHub URL of course, and although Gecko said that it started the scan, the list of scans stayed disappointingly empty.
> to see if it could detect anything else, but no
Might that be related to the fact that Gecko apparently does not support C? At least that's the impression I got from hovering the mouse cursor over the minuscule list of language icons below "Supported Languages". Not supporting C and C++ in a tool looking for security issues is a bit of a bummer, no?
We’ve limited the free tier to one scan per user, so deleting a scan and starting a new one won’t work because of that restriction.
And yes, we don't support C or C++ yet. Our focus is on detecting business logic vulnerabilities (auth bypasses, privilege escalations, IDORs) that traditional SAST tools often miss. The exploitable security issues typically found in C/C++ (mainly memory corruption) are better found through fuzzing and dynamic testing than through static analysis.
I understand it is not your focus, but fuzzing still falls short, and there is a lot AI can help with. For example, when there is a checksum, fuzzers typically can't progress, and this is "solved" by disabling the checks when building for fuzzing. AI can instead look at the source code that computes the checksum and write the code to fill it in, or use its world knowledge to recognize that the function is named sha_256 and import Python's hashlib, etc.
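To make that concrete, here is a minimal Python sketch of the kind of helper I mean; the trailing-digest layout is just an invented example, not any particular real format:

    import hashlib

    def fix_checksum(data: bytes) -> bytes:
        # Hypothetical wire format: payload followed by a 32-byte SHA-256 digest.
        # Instead of compiling the verification out of the fuzz build, the
        # harness recomputes the digest after mutation so the input still
        # passes the integrity check and reaches the real parsing code.
        if len(data) <= 32:
            return data
        payload = data[:-32]
        return payload + hashlib.sha256(payload).digest()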
Hint: we are working on this, and it can easily expand coverage in OSS-Fuzz, even for targets that have been fuzzed for a long time with enormous amounts of compute.
Although a lot of the popular attention is directed toward buffer overflows and use-after-free errors, that does not mean that C programs are free from the same business logic vulnerabilities as programs written in other languages, or even that those errors are less frequent; it's just that buffer overflows are easier to detect.
The other language I would put next on the priority list is Java, which Gecko also seems not to support. I guess Gecko is more web-oriented, which makes sense for a security tool.
Anyway, I wish you lots of success!
I imagine a cool way to get users to notice your tool would be to scan public GitHub repos with many followers and comment on the code vulnerabilities.
Yes, that's exactly what we do. Some examples: https://github.com/eosphoros-ai/DB-GPT/pull/2650, https://github.com/dagster-io/dagster/pull/30002
We just need to follow responsible disclosure first by notifying the maintainers, working with them on a fix, and making it public once it is resolved.
This is one area I expect LLMs to really shine. I've tried a few static analysis tools for security, but it feels like the cookie-cutter checks aren't effective at catching anything but the most basic vulnerabilities. Having context on the actual purpose of the code seems like a great way to provide better scans without needing a researcher for a deeper pentest.
I just started a scan on an open source project I was looking at, but I would love to see you add Elixir to the list of supported languages so that I can use this for my team's codebase!
Static analysis tools were the bane of my existence when I was the security guy at a software provider. A customer insisted on running a popular one on our 20-million-line code base. Two of us spent two weeks clearing false positives. Absolutely nothing was left.
Terence Tao wrote about "blue team" vs "red team" roles in cybersecurity and how "unreliable" AI is better suited to the red team side. I found it very insightful.
https://news.ycombinator.com/item?id=44711306
We've had a few requests for Elixir and it's definitely something we will work on.
> {"error":"EISDIR: illegal operation on a directory, read","details":"Error: EISDIR: illegal operation on a directory, read"}
which I only knew about because I had the dev tools open, and not because the UI said that it encountered an error. I don't feel that security tools get a pass on bad UI error handling.
Very interesting and cool project.
Creating an accurate call graph is difficult, especially for dynamic languages such as JavaScript or TypeScript. Academia has spent decades of effort on this, so I am wondering how your custom parser could do this much better. And I am interested in how you store dynamic typing information in Protobuf's strongly typed system.
Due to the limited context window, it is clearly infeasible to provide the entire application's source code to the model. I am wondering what kind of "context" information is generally helpful for bug detection, like the call chain?
Thanks, we use a similar approach to GitHub's stack graphs (https://github.blog/open-source/introducing-stack-graphs/) to build a graph structure with definition/reference nodes. For dynamic typing in protobuf, we use the language compiler as an intermediary to resolve dynamic types into static relationships, then encode the relationships into protobuf.
Yes, we don't feed entire codebases to the LLM. The LLM queries our indexer for symbol names and code sections (exposed functions, data flow boundaries, sanitization functions) to build up the call chain and reason about the vulnerability.
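As a rough illustration (the names below are made up, not our actual API), the retrieval is symbol-based rather than embedding-based, so the model walks real definition/reference edges:

    from dataclasses import dataclass

    @dataclass
    class Symbol:
        name: str     # fully qualified name, e.g. "api.users.update_role"
        file: str     # file containing the definition
        snippet: str  # the code section handed to the model

    def build_call_chain(edges: dict[str, Symbol], start: str) -> list[str]:
        # Follow definition -> reference edges from an exposed entry point
        # toward a potential sink; this chain (plus each symbol's snippet)
        # is what the LLM reasons over, not the whole repository.
        chain, seen = [start], {start}
        while chain[-1] in edges and edges[chain[-1]].name not in seen:
            nxt = edges[chain[-1]].name
            chain.append(nxt)
            seen.add(nxt)
        return chain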
Congrats on the launch. How do you differentiate yourself from Corgea.com? Or general purpose AI code review solutions such as Cursor BugBot / GitHub Copilot Code Reviews / CodeRabbit?
Thank you. SAST tools built on AST or call graph parsing will struggle to detect code logic vulnerabilities because their models are too simplistic. They lose the language-specific semantics in dynamically typed languages where objects change at runtime, or in microservices where calls span multiple services. So they are limited to simple pattern-based detections and miss vulnerabilities that depend on long cross-file call chains and reflected function calls. These are the types of paths that auth bypasses and privilege escalations occur in.
AI code review tools aren’t designed for security analysis at all. They work using vector search or RAG to find relevant files, which is imprecise for retrieving these code paths in high token density projects. So any reasoning the LLM does is built on incomplete or incorrect context.
Our indexer uses LSIF for compiler-accurate symbol resolution so we can reconstruct full call chains, spanning files, modules, and services, with the same accuracy as an IDE. This code reasoning, tied with the LLM's threat modelling and analysis, allows for higher fidelity outputs.
Super cool! Just tried it out and it is giving me 100% confidence for two vulnerabilities (one 9.4, one 6.5) that aren't real -- how is that confidence calculated?
The confidence score is calculated from two factors: whether the function call chain represents a valid code path (programmatic correctness) and how well it aligns with the defined threat model for what it thinks is a security vulnerability. False positives usually arise from incorrect assumptions about context, for example flagging endpoints as missing authentication when that behaviour is actually intended.
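As a rough sketch (not the exact formula we use), you can think of it as:

    def confidence(path_is_valid: bool, threat_alignment: float) -> float:
        # path_is_valid: does the reconstructed call chain form a real,
        # reachable code path?
        # threat_alignment: 0..1 score for how well the finding matches
        # the defined threat model for the project.
        if not path_is_valid:
            return 0.0
        return round(threat_alignment, 2)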
Was this an incorrect code path or an incorrect understanding of a security issue?
This is why we focus heavily on threat modelling and defining the security and business invariants that must hold. From a code level, the only context we can infer is through developer intent and data flow analysis.
Something we are working on is custom rules and allowing a user to add context when starting a scan, to improve alignment and reduce false positives.
The security issue and the POCs provided were not real: it said there was a vuln, but I double-checked and it was not an exploitable vuln.
https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-s... comes to mind.
I feel for the poor engineers who will have to triage thousands of false positives because $boss was pitched this tool (or one of the competitors) as the true™ solution to all their security problems.
OK, but that's a criticism better aimed at... every security testing tool produced previous to this one, most especially Burp, the Microsoft Word of pentesting and the single greatest source of bullshit bounty submissions for over a decade running.
I wanted to check it out, but the OAuth flow is asking for permission to write my GitHub email address and profile settings. Is this a bug? If not, what are these permissions needed for?
It also asks for permission to "act on my behalf" which I can understand would be necessary for agent-y stuff but it's not something I'm willing to hand over for a mere vuln scan.
It says "Profile (write) Manage a user's profile settings.", not write email address. The "Act on your behalf" permission is even worse. I agree with you that it should only be asking for read permissions on anything for this purpose.
It was changed
This is a bug; the email-address permissions have been descoped to read-only. Profile settings are either read/write or none, hence the former. If you're concerned about privacy, sign up using email/password.
I was similarly put off, but eventually figured out that you can just create a normal email-based login and point the tool at a publicly hosted git repository, which is nice.
Did you build your own model? If not, which model performs best so far?
No, we didn't build one; we use the main foundation models. We have evals for each part of the workflow, and different models perform better on different tasks. Overall, the majority of it uses Sonnet 4.
It reminds me of the AI bug reports in ffmpeg (was it ffmpeg?).
Was it not curl? https://arstechnica.com/gadgets/2025/05/open-source-project-...
All the vulns Gecko found were manually validated by humans and have a CVE assigned by a CNA. The issue curl had was that, because it runs a paid bug bounty program, it got an influx of AI slop reports that looked like real issues but weren't exploitable.