AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Thu Feb 05, 2026 7:15 pm
by skr
I’ve spent most of my life in forums of various sorts, and one thing I’ve noticed is how much time it takes to “stay in the loop”… or to find that one thing you vaguely remember reading 5–10 years ago. We finally live in a time where improving this is actually practical.

We’ve got 86k+ posts plus the wiki, which is awesome… but it also means finding one specific pinout / fix / scope screenshot from a 30-page thread can feel like archaeology. The built-in search often finds threads, not the exact answer with references, so the same questions naturally loop around again.

I’m wondering if it’s worth exploring an AI-assisted search tool (RAG / retrieval-augmented generation). Not a “bot that answers everything”, and definitely not replacing the forum. More like a grounded technical librarian that can:
  • search by meaning + keywords (part numbers, params, error codes),
  • quote the most relevant snippets,
  • link directly to the source posts/wiki pages (so you can verify in one click).
Some imaginary examples:

Code:

Query:
"Resolver wiring colors"

Top sources:
Forum: "Resolver wiring/pinout for <motor/inverter>" (link)
Forum: "Common resolver wiring mistakes + how to identify pairs" (link)

Answer (source-grounded):
Color codes vary a lot between harnesses, so don't trust colors alone.
Use the method described in the linked posts to identify:
- Excitation pair (REF)
- SIN pair
- COS pair
Then verify with the simple measurements the posts recommend (scope/meter checks) before powering anything.

Links:
https://.../viewtopic.php?t=...
https://.../viewtopic.php?t=...

Code:

Query:
"Pinout of connector J7 on board rev 3.2"

Result:
No exact source found that explicitly documents "J7 rev 3.2" pinout.

Closest matches:
Similar board revision thread with photos (link)
Earlier revision pinout (may differ) (link)

Note:
Tool refuses to guess without a cited source. It will only summarize/quote what it can actually find.
Important: I’m not promising I’ll build this next week. Done properly, this is a non-trivial infra + processing project. This thread is just to probe interest before anyone spends serious time on it or asks admins for anything.

That said, I think it could be done with minimal ongoing cost and without adding load to the forum (e.g., using exports rather than scraping, and keeping any indexing off-site). Details later if there’s appetite.

Questions:
  • Would a tool like this be useful to you, or is current search good enough?
  • If useful, what would you want it to do first? (e.g. “find the right thread”, “summarize long threads”, “bridge wiki+forum”, etc.)
  • Any strong objections/concerns to consider early? (privacy, accuracy, server load, community impact)
Disclaimer: I know there’s skepticism around LLMs, especially in a space where people wire up expensive and high-energy hardware. The only version of this I’d support is one that’s strictly source-grounded (quotes + links, “don’t know” when it can’t find it), i.e. an improved index, not a magic oracle.
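A minimal sketch of what "strictly source-grounded" could mean in code, assuming the retrieval step hands back snippets with a relevance score between 0 and 1. The `Snippet` type, the `min_score` threshold and the refusal wording are all hypothetical; nothing like this exists yet.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    url: str      # link to the source post or wiki page
    quote: str    # verbatim text taken from that source
    score: float  # retrieval relevance, assumed to be in [0, 1]


NOT_FOUND = "Not found in the provided sources."


def grounded_answer(snippets: list[Snippet], min_score: float = 0.5) -> str:
    """Return quotes with links, or a refusal if nothing clears the threshold."""
    hits = [s for s in snippets if s.score >= min_score]
    if not hits:
        # No sufficiently relevant source: say so instead of guessing.
        return NOT_FOUND
    hits.sort(key=lambda s: s.score, reverse=True)
    return "\n".join(f'"{s.quote}" ({s.url})' for s in hits)
```

The point of the design is that the refusal path is plain code rather than something left to a model's judgment, so "don't know" stays reliable even if the model on top is weak.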

Also worth discussing are the philosophical/psychological aspects:
Could this reduce healthy forum participation? Could it adversely affect the paid support dynamics which the Openinverter team relies on to cover costs? Are there other ways this could be "good on paper" but have ill effects?

I’m not pushing an agenda here, just starting a conversation. I’ve got enough projects already, but if there’s real interest, I could contribute time later.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Thu Feb 05, 2026 8:22 pm
by Jacobsmess
Generally I think it sounds like a good idea, but I also think the concerns raised are valid and may be best answered by moderators/admin/financiers.

One thing I'd absolutely be in favour of is an improved search. At present the search function is limited to entries of 3 letters or more, which rules out quite a few terms in the automotive world.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Thu Feb 05, 2026 8:33 pm
by johu
I don't understand all implications yet but can assist if needed. It sounds like a fancy search function?

Re paid support: there is little left. Damien quit this a while ago and I did a few months ago. Last man standing is Janosch, I think.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Thu Feb 05, 2026 9:48 pm
by skr
Jacobsmess, totally agree on the current search limitations, especially the 3+ character minimum. I often find myself trying to think of what I could add to a query to find short stuff (IQ, OC, UV, etc.) and part numbers, where exact matching matters.

Johu, yes, at the core this is basically a fancy search function. The “fancy” part is that it combines keyword search (exact tokens: params, part numbers, acronyms) with semantic search (meaning-based: “resolver wiring colors”, “precharge clicks but inverter doesnt run”) and returns the best quoted snippets + direct links. Even without any “AI answering”, that alone would be a big usability jump.
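A minimal sketch of combining the keyword and semantic result lists, assuming reciprocal rank fusion (RRF) as the merge step. The post does not name a fusion method; RRF is just one common choice, and the document ids below are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for one query, best hit first.
keyword_hits = ["post-12", "post-7", "wiki-3"]    # exact token matches
semantic_hits = ["post-7", "wiki-9", "post-12"]   # nearest embeddings

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['post-7', 'post-12', 'wiki-9', 'wiki-3']
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that keyword scores and embedding similarities live on different scales.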

Infra-wise, in theory it could run for years on a single small box, e.g. an Oracle Always Free ARM VPS, because 80k posts + wiki is not huge in storage/index terms. The only plumbing needed is a place for the indexes and a small search API to live on (hopefully some Oracle region still has free-tier Arm boxes available) and a periodic public data export, so no scraping is involved. A scheduled job on that box ingests the newest dump and refreshes the indexes. Some extra magic could probably be added by sending media assets through one of the free multimodal model endpoints for context-aware tagging and referencing (e.g. "a scope waveform shot in a reverse engineering thread shows successful spin"). IIRC the OpenRouter free endpoint had 50 free requests per day, which may be enough for daily media needs plus working backwards through the archives. All of this would probably need to be backed up somewhere automatically and periodically, so that if it sits there abandoned for years, someone has a way to recover it.
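A rough sketch of the "ingest the newest dump and refresh indexes" step, assuming the dump can be read as a mapping of post id to post text and that the previous run stored a content hash per post. Both the dump shape and the function name are hypothetical; the export format has not been agreed on.

```python
import hashlib


def plan_refresh(dump: dict[str, str], indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare a new dump against what is already indexed.

    dump:    post id -> post text from the newest export
    indexed: post id -> SHA-256 hash of the text indexed at the previous run
    Returns (ids to index or re-index, ids to delete from the index).
    """
    def digest(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    # New posts have no stored hash; edited posts have a different one.
    to_index = sorted(pid for pid, text in dump.items() if indexed.get(pid) != digest(text))
    # Posts that vanished from the export should leave the index too.
    to_delete = sorted(pid for pid in indexed if pid not in dump)
    return to_index, to_delete
```

Only touching new, edited or deleted posts keeps each scheduled run cheap, which matters on a small box where re-embedding everything would be slow.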

Where a small LLM (1–3B) helps is on top of the snippet retrieval. A “RAG-suited” tiny model could optionally:
  • produce a short summary (e.g. “most common resolver wiring pitfalls”),
  • extract actionable checklists from long threads (“measure X, then Y, then Z”),
  • merge wiki + forum context into a one-page answer while still linking every claim,
  • highlight conflicting advice (“post A says 10k, post B says 4.7k”) instead of pretending there’s one truth.
So in the end it is the user who decides on the details, by digging deeper. The search textbox would be a wise guide who has read every post and remembers it all (in theory). This is all theoretical: my LLM+RAG usage has been mostly local and I have never built a hybrid RAG search, but it sounds fun and doable on paper, within the confines of a relatively small virtual box.
The problem is that running that tiny model on the free VPS would be slow (for a single client, output at the speed of a human typing at best), but probably still usable. Multi-user sessions (even if no bots hit the endpoints) could start seeing problems beyond some point. Alternatively, a super guru that spits out whole blocks in front of your eyes would need either bring-your-own-key (adds friction) or a rate-limited key on a super low-cost inference model, where 100 queries end up costing a cent or less.

I could spend time on this when I am looking for things to do when I have to do something else, so I don't have to do what I have to do. At this moment some other projects of similar importance are in that buffer, but imo this would be a nice practice exercise to set up.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Sat Feb 07, 2026 11:14 pm
by skr
I just ran a quick Greasemonkey text scrape of the wiki (300ish pages) through the MediaWiki API (at sane human speeds) and tried various tiny models on it, with only a vector embedding database, so no actual hybrid search by query keywords. Results are pretty underwhelming, to say the least.
The models run on CPU. I used only tiny models to simulate something that could in theory run on a dedicated lowest-tier VPS. The B in a model name is the parameter count in billions; generally, the more parameters, the smarter the model. The more parameters, the less likely it is to run in a shoebox and the more it needs a circular K-shaped economy of datacenters to be sustained.

Run against the 300ish wiki pages (as .md files) in a vector DB, most of the local models returned utterly useless garbage, even with loose/tight hit guardrails and the following system prompt.

Code:

You are a source-grounded technical assistant.

You will receive a user question and retrieved context snippets from a knowledge base.

Rules:
- Use ONLY the provided snippets. Do not use outside knowledge.
- If the answer is not in the snippets, say: "Not found in the provided sources."
- For every key claim, cite the supporting snippet/source link.
- Prefer short, actionable output: bullets, steps, checks.
- If sources disagree, say so and cite both.
- Do not guess pinouts/voltages/safety-critical steps. Ask for measurements instead.

Output:
1) Answer (bullets)
2) Evidence (links)
3) If not found: suggested search keywords
Here are the tiny models, able to run on a dirt cheap or free VPS, or even a Raspberry Pi in a drawer:
[3 image attachments: image.png]
Run against the DeepSeek 3.2 endpoint on OpenRouter it became a bit more useful, but that would require actually paying for inference.
[4 image attachments: image.png]
I will try setting up an actual hybrid retrieval workflow and test again. Right now this is far from promising, most likely because I am doing it wrong :)
Maybe it becomes more useful with forum posts in the same index, however I am not too keen on trying to scrape the forum.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Thu Feb 26, 2026 10:25 pm
by hjdlsnbc
This sounds interesting. I would like to try it out. Do not replace the old search functionality; add this search in addition.
I think such a function could also be used the other way around. If you know something but do not know whether it is already described on some wiki page, you could ask the AI about the topic. When the AI has no clue what you are talking about, you know that starting a wiki page about this topic is required.
skr wrote (Thu Feb 05, 2026 9:48 pm): "and periodic public data export"
How frequently would the AI learn about new wiki entries or new forum posts, so that it could be asked about them? What are typical values at other forums with such an AI-assisted search function?

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Tue Mar 03, 2026 4:31 pm
by skr
Imo this could run on a weekly basis or along those lines. There's a sort of catch-22 here: I need to scrape the forum for this to be properly tested and verified, but I don't think scraping the forum is a good idea on live prod, even if highly rate limited.

I am now testing a similar idea on the Visforvoltage.com forum, which seems to have been on its last breath for the past few years: DB errors, 504 errors, or simply not responding most of the time. I am archiving it through the Internet Archive CDX API, so what gets hit with traffic is the Internet Archive, not a server standing on its last leg. Will see how that effort works out and what learnings can be applied here, closer to production, if Johu is willing to support this.
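A small sketch of the Internet Archive route, assuming the public Wayback CDX server endpoint and its `url`, `matchType`, `output`, `fl`, `filter`, `collapse` and `limit` parameters. The forum prefix passed in is a placeholder and the helper names are made up; this is not the actual script in use.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site_prefix: str, limit: int = 1000) -> str:
    """Build a CDX query listing archived captures under a URL prefix."""
    params = {
        "url": site_prefix,
        "matchType": "prefix",             # everything below the prefix
        "output": "json",                  # JSON rows, header row first
        "fl": "urlkey,timestamp,original",
        "filter": "statuscode:200",        # skip redirects and error pages
        "collapse": "urlkey",              # one capture per unique URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_rows: list[list[str]]) -> list[str]:
    """Turn CDX JSON rows into snapshot URLs; the first row is the header."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts, orig = header.index("timestamp"), header.index("original")
    # The "id_" suffix asks the Wayback Machine for the page as archived,
    # without its own toolbar and link rewriting.
    return [f"https://web.archive.org/web/{r[ts]}id_/{r[orig]}" for r in rows]
```

Fetching the returned snapshot URLs (slowly, with pauses) puts the load on the Internet Archive rather than on the struggling forum server.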

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Fri Mar 06, 2026 7:37 pm
by johu
What would you need me to do on the server side?

Would it help if I moved the site to a docker container and then gave you access to that via ssh?

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Sat Mar 07, 2026 7:44 pm
by skr
I first want to try out the local RAG/hybrid stuff on the Vectrix section of the visforvoltage.com forum, which I now have locally. It's 2k threads + assets and should show whether this idea is viable at all. Maybe I am delusional in thinking that this can be set up in a fairly straightforward way without pulling hair, and in a way that actually adds value.

Maybe no RAG or LLMs need to be involved, and a much better search framework is enough.

On the OI server side, imo it basically takes a cron job running a DB export script in a pre-agreed format, but let's leave that for somewhere in the future. Too early to discuss further at this point imo.

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Sat Mar 07, 2026 11:09 pm
by jrbe
I tried searching "5.3v" here a couple of days ago: nope. I tried a Google site search for the same: nope. Neither found anything in the wiki or forums, so I had to go find it manually. The Mini mainboard and Gen 3 Leaf drop-in inverter board development threads both have it.

May be a good sanity check search term to see how this newer method works.
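A toy sketch of why a term like "5.3v" can be findable, assuming a keyword index whose tokenizer keeps short tokens and does not split on the decimal point. The regex and the sample posts are made up for illustration and say nothing about how the phpBB search tokenizes.

```python
import re

# Letters/digits, optionally joined by '.', '-' or '_', so "5.3v" stays one token
# and two-letter terms such as "OC" are not dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index(posts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: token -> ids of the posts containing it."""
    index: dict[str, set[str]] = {}
    for post_id, text in posts.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(post_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of posts containing every query token (AND search)."""
    token_sets = [index.get(t, set()) for t in tokenize(query)]
    return sorted(set.intersection(*token_sets)) if token_sets else []


# Made-up sample posts.
posts = {
    "a": "The rail measures 5.3V on the board.",
    "b": "Check the OC and UV limits.",
    "c": "It draws 5 amps at 3 volts.",
}
index = build_index(posts)
print(search(index, "5.3v"))  # ['a']
print(search(index, "OC"))    # ['b']
```

A tokenizer that splits "5.3v" into "5", "3" and "v", and then drops tokens shorter than three characters, would return nothing for the same query.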

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Sun Mar 08, 2026 10:05 pm
by skr
Another problem imo is that the OI robots.txt + botwall seem to have killed the Google-ability of anything OI, or at least the quality of the results.

Even when I search Google for something I know exists on OI, the results are sometimes missing, or mostly lack metadata.

Maybe something as simple as swapping out the phpBB search backend could solve most of the internal search issues mentioned above?
Here is a guide to installing Sphinx ( https://sphinxsearch.com/ ) for phpBB: https://dannyda.com/2023/01/02/how-to-h ... -in-phpbb/

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Mon Mar 09, 2026 9:47 pm
by johu
Hmm, that's not intended. I thought Anubis lets the useful bots in.

What's with robots.txt?

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Tue Mar 10, 2026 9:35 am
by Scrappyjoe
> Would it help if I moved the site to a docker container and then gave you access to that via ssh?

Johu, it would be better if you don't give anything or anyone access to the server running the production site. Typically when exposing data from a live site you want to minimise the attack surface and load on the site - so rather put in place a regular snapshot or backup procedure to a network accessible location, like a bucket, and allow _that_ location to be scraped for training. If you'd like assistance with this I can help, I am a data engineer in my day job.

That was what skr was getting at when he said

> The only plumbing needed is a place for the indexes and a small search API to live on (hopefully any oracle region has any of the free tier Arm boxes available) and periodic public data export so no scraping is involved.

What he means is, from the openinverter end, what would need to happen is a scheduled export of _public_ data to some public location. Once that's in place, skr will just hit that export instead. I'm not sure what sort of export options phpBB has, but a cron job on your box that runs an export and then puts the tarball in a bucket would be good enough.
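A minimal sketch of the "run an export and put the tarball somewhere" step, assuming some earlier job has already written the public data into a directory. How phpBB would produce that export is left open, as is the upload to a bucket; the function name and file naming are hypothetical.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def make_export_tarball(export_dir: Path, out_dir: Path) -> Path:
    """Pack an already-exported directory of public data into a dated .tar.gz."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    out_dir.mkdir(parents=True, exist_ok=True)
    tarball = out_dir / f"public-export-{stamp}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        # Store everything under a fixed top-level name so consumers
        # do not depend on the server's directory layout.
        tar.add(export_dir, arcname="export")
    return tarball
```

A cron entry could call this on a schedule and then copy the resulting file to the public location, so the indexing side never has to touch the production server.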

Re: AI-assisted search for the forum/wiki (RAG “librarian”)

Posted: Tue Mar 10, 2026 3:17 pm
by skr
Yea, I don't want access to anything resembling the prod environment. At this point I don't want anything at all, before I validate that a bunch of data can be searched nicely on my own box, locally. Too many projects to tell when that will be.

The current robots.txt looks like a full lockdown, with User-agent: * and Disallow: /

Code:

User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: ia_archiver
Allow: /

# AI / training / agent crawlers you want out
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ChatGPT Agent
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# Default: allow normal search/indexing, or selectively disallow private paths. This needs attention.
User-agent: *
Allow: /
Something like this may allow the good ones through.

Also imo Anubis needs to whitelist at least these:
Googlebot, bingbot, DuckDuckBot, ia_archiver

But this doesn't address the bad actors slurping stuff up with spoofed user agents.
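A quick way to sanity-check robots.txt rules before deploying them, using Python's standard `urllib.robotparser`. The two rule sets below are shortened stand-ins for the lockdown and for the proposal above, and the URL is a placeholder. Real crawlers may interpret edge cases differently from this parser, so treat it as a first check only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Would this robots.txt let the given user agent fetch the URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


LOCKDOWN = "User-agent: *\nDisallow: /\n"

PROPOSED = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

URL = "https://example.org/viewtopic.php?t=1"  # placeholder URL

print(allowed(LOCKDOWN, "Googlebot", URL))  # False
print(allowed(PROPOSED, "Googlebot", URL))  # True
print(allowed(PROPOSED, "GPTBot", URL))     # False
```

As noted above, this only covers crawlers that honour robots.txt; anything sending a spoofed user agent has to be handled at the botwall.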

Somewhere along the way, the meta tags/content for a lot of the search results on Google have gone missing.

Maybe comparing posts from before and after Anubis could reveal more. A lot of stuff in the results shows up as a URL with no content. I am no SEO expert, so it's hard to say what I am looking at.

This seems abandoned, but maybe a sitemap referenced in robots.txt could also help with search results: https://www.phpbb.com/community/viewtopic.php?t=2656243