TraceroAI
Current build: RAG observability and LLMOps for tracing retrieval, inspecting context, evaluating groundedness, and debugging hallucination failures.
I am Chinmai, an Information Science undergraduate from DSCE. I like building AI that is not just impressive in a demo, but traceable, measurable, and useful when someone actually depends on it.
I keep coming back to one idea: AI should make decisions easier to inspect, not harder to trust.
I am studying Information Science at Dayananda Sagar College of Engineering, and most of my work sits in the space between AI models and real product behavior. I have built hiring evaluation infrastructure, astronaut health monitoring, an AI habit coach, and now TraceroAI for RAG observability and LLMOps.
What feels most like me: taking an AI idea, giving it a backend, a data contract, an interface, failure states, and enough evaluation that someone can reason about the output.
This is filtered from my resume and repos, so it reflects what shows up repeatedly in my projects instead of every tool I have ever touched.
If someone wants to understand how I think as an AI engineer, TraceroAI is the current build log and these are the proof-of-work systems around it.
I built SignalStack because resumes alone do not prove engineering ability. It reads real repositories, extracts evidence, and turns proof-of-work into reviewable hiring signals.
This is my IEEE-linked health AI work expanded into a full-stack monitoring system: streaming vitals, time-series storage, model-serving APIs, and dashboard-ready risk outputs.
AI Coach is where I explored personal AI as a product: goals, habits, recent activity, and a coaching layer that responds with context instead of generic motivation.
A focused interface for asking about Chinmai's projects, metrics, skills, and engineering decisions.
Click each quest to unlock how I think about reliable AI systems. Every step maps to a real build.
Trace retrieval, inspect context, and evaluate groundedness so RAG answers can be debugged instead of blindly trusted.
These are the stories I would use in an interview: the problem, the engineering choice, the result, and what I learned.
Hiring systems often over-trust keywords, resumes, and ungrounded AI summaries.
Parse repositories, select source evidence, verify authorship signals, score capability separately from confidence, and show reviewers the audit trail.
The workflow surfaces fit score, code evidence, confidence, verification status, production-readiness signals, and recruiter decisions.
A reliable AI evaluator needs retrieval, deterministic checks, scoring design, and review UX, not only an LLM prompt.
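To make "score capability separately from confidence" concrete, here is a minimal sketch. It is not the SignalStack code: the `Evidence` and `Assessment` shapes, the `assess` function, and the `full_confidence_at` parameter are all hypothetical names chosen for illustration. The point is only that the capability estimate and the amount of evidence behind it are reported as two numbers, alongside an audit trail a reviewer can open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One piece of repository evidence (hypothetical shape)."""
    path: str
    skill_score: float     # 0.0-1.0, how strong the demonstrated skill looks
    author_verified: bool  # whether an authorship check passed


@dataclass(frozen=True)
class Assessment:
    capability: float   # what the verified evidence suggests
    confidence: float   # how much verified evidence backs that estimate
    audit_trail: tuple  # evidence paths a reviewer can inspect


def assess(evidence: list[Evidence], full_confidence_at: int = 5) -> Assessment:
    """Score capability and confidence independently (illustrative rule)."""
    verified = [e for e in evidence if e.author_verified]
    if not verified:
        # No verified evidence: report nothing rather than guess.
        return Assessment(0.0, 0.0, ())
    # Capability: mean skill score over verified evidence only.
    capability = sum(e.skill_score for e in verified) / len(verified)
    # Confidence: grows with the amount of verified evidence, capped at 1.0.
    confidence = min(1.0, len(verified) / full_confidence_at)
    return Assessment(
        round(capability, 3),
        round(confidence, 3),
        tuple(e.path for e in verified),
    )
```

With one strong verified file and one unverified file, this sketch returns a high capability with low confidence, which is the case a single blended score tends to hide.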
Streaming health predictions are fragile when feature order, time windows, model artifacts, and warm-up behavior are not explicit.
Define a 144-step telemetry window, store raw/derived/context measurements, serve XGBoost and IsolationForest outputs, and expose documented API contracts.
The system returns risk labels, probabilities, anomaly scores, alert states, dominant drivers, and forecast outputs through product-facing endpoints.
The model is only useful when the data cadence, inference path, and dashboard semantics are engineered together.
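The fragility described above (feature order, window length, warm-up) can be made explicit in a small buffer. This is a sketch under assumptions, not the platform's implementation: the 144-step window comes from the text, while the `TelemetryWindow` class, the status strings, and any feature names passed to it are hypothetical.

```python
from collections import deque

WINDOW_STEPS = 144  # window length taken from the description above


class TelemetryWindow:
    """Fixed-length window with explicit feature order and warm-up state."""

    def __init__(self, feature_order, steps=WINDOW_STEPS):
        self.feature_order = tuple(feature_order)
        self.steps = steps
        # deque(maxlen=...) drops the oldest row once the window is full.
        self._rows = deque(maxlen=steps)

    def push(self, reading: dict) -> None:
        """Append one reading, always in the declared feature order."""
        missing = [name for name in self.feature_order if name not in reading]
        if missing:
            # Fail loudly instead of silently filling gaps.
            raise ValueError(f"missing features: {missing}")
        self._rows.append([float(reading[name]) for name in self.feature_order])

    @property
    def status(self) -> str:
        """'warming_up' until a full window exists, then 'ready'."""
        return "ready" if len(self._rows) == self.steps else "warming_up"

    def features(self):
        """Return the window as a list of rows, or None during warm-up."""
        if self.status != "ready":
            return None
        return list(self._rows)
```

A serving endpoint could return the `status` value alongside any prediction, so a dashboard can show a warm-up state instead of a misleading risk label.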
A generic chatbot cannot coach well if it ignores goals, habits, recent logs, intent, and safety boundaries.
Inject only relevant user context, route by intent, block obvious prompt-injection patterns, cache stable responses, and keep mobile workflows simple.
Users can sign up, manage goals, log habits, track streaks, and receive coaching grounded in their actual behavior.
Useful AI products start with clean product state and careful context selection.
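The approach above (block obvious injection patterns, route by intent, inject only relevant context) might look roughly like the sketch below. Everything here is an assumption for illustration: the regex patterns, the keyword lists, the state field names, and the `build_context` function are hypothetical, and keyword matching is a deliberately crude stand-in for a real intent classifier.

```python
import re

# Hypothetical patterns; a real filter would need a broader, tested list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]

# Crude substring routing; "log" would also match "login", for example.
INTENT_KEYWORDS = {
    "habit": ("habit", "streak", "log"),
    "goal": ("goal", "target", "plan"),
}

# Which parts of the user state each intent is allowed to see.
CONTEXT_FIELDS = {
    "habit": ("habits", "recent_logs"),
    "goal": ("goals",),
    "general": (),
}


def looks_like_injection(message: str) -> bool:
    return any(p.search(message) for p in INJECTION_PATTERNS)


def route_intent(message: str) -> str:
    lowered = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "general"


def build_context(message: str, user_state: dict) -> dict:
    """Return the intent plus only the state fields that intent needs."""
    if looks_like_injection(message):
        return {"intent": "blocked", "context": {}}
    intent = route_intent(message)
    context = {
        field: user_state[field]
        for field in CONTEXT_FIELDS[intent]
        if field in user_state
    }
    return {"intent": intent, "context": context}
```

The design choice worth noting is the allow-list in `CONTEXT_FIELDS`: a question about streaks never carries the user's goals into the prompt, which keeps the context small and relevant.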
RAG apps can fail silently when developers cannot see retrieved chunks, context assembly, groundedness, or hallucination points.
Build an LLMOps workflow for tracing retrieval, inspecting context, evaluating groundedness, and debugging hallucination failures.
The live build frames TraceroAI as a developer-facing observability layer for retrieval quality and LLM behavior.
Reliable RAG needs instrumentation and evaluation surfaces as much as better prompts.
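As one illustration of what a groundedness surface could report, the sketch below scores each answer sentence by its best lexical overlap with the retrieved chunks. This is a swapped-in, deliberately simple proxy and not TraceroAI's method: token overlap misses paraphrase and can be fooled by shared common words, and the `groundedness_report` function and its `threshold` parameter are hypothetical.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_report(answer: str, chunks: list, threshold: float = 0.5) -> dict:
    """Per-sentence support against retrieved chunks (lexical proxy only)."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    chunk_tokens = [_tokens(chunk) for chunk in chunks]
    rows = []
    for sentence in sentences:
        words = _tokens(sentence)
        best_index, best_support = None, 0.0
        for index, tokens in enumerate(chunk_tokens):
            # Fraction of the sentence's tokens that appear in this chunk.
            support = len(words & tokens) / len(words) if words else 0.0
            if support > best_support:
                best_index, best_support = index, support
        rows.append({
            "sentence": sentence,
            "best_chunk": best_index,  # which chunk to inspect first
            "support": round(best_support, 2),
            "grounded": best_support >= threshold,
        })
    grounded = sum(1 for row in rows if row["grounded"])
    score = grounded / len(rows) if rows else 0.0
    return {"score": score, "sentences": rows}
```

The per-sentence rows matter more than the overall score: they point a developer at the specific sentence with weak support and the chunk it was closest to, which is where hallucination debugging usually starts.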
I want this portfolio to show judgment, not perfection. These are the lessons that changed how I build the next version.
This is probably the biggest rule I keep coming back to. If evidence is thin, the system should lower confidence instead of sounding more polished.
The astronaut platform taught me that a model is not always ready just because an endpoint exists. Readiness, warm-up, and missing data need UI states.
In AI Coach, better answers came from sending less context, but the right context. That lesson changed how I think about prompt design.
Right now I am focused on TraceroAI: RAG observability, retrieval traces, groundedness checks, and developer-facing LLM debugging.
My current work-in-motion: an end-to-end RAG observability and LLMOps project for retrieval traces, context inspection, groundedness evaluation, and hallucination debugging.
I am hardening the coach beyond MVP: memory, personalization, rate limits, deployment readiness, and better evaluation around response quality.
These are the topics I naturally come back to while building: evidence, latency, model contracts, and human review.
How SignalStack separates capability, evidence confidence, verification, and production readiness.
What real-time prediction systems need beyond a trained model: cadence, artifacts, APIs, and monitoring.
TraceroAI's core idea: make retrieval, context, groundedness, and hallucination failures visible to developers.
The best way to evaluate me is simple: read the repos, inspect the systems, then ask me how I would make the next version stronger.
I am most interested in AI systems where reliability, evaluation, and user trust matter. Send me the problem, the constraints, and what a good answer should prove.
Bengaluru, India