Intelligent Knowledge Search
Enterprise semantic search over unstructured data — embeddings, transformer LLMs and vector databases with Azure AI Search for hybrid retrieval, enabling contextual Q&A, summarization and secure document discovery.
Agentic AI · Generative AI · Semantic Search & RAG · 18+ years in IT · 8+ years in GenAI.
From enterprise RAG and semantic search to autonomous agent workflows, I architect secure, production-grade AI on Python, FastAPI and Azure AI Search — and I'm now exploring where quantum computing meets machine learning.
My day-to-day is generative and agentic AI on enterprise data; my evenings are increasingly spent on the quantum side of machine learning.
RAG pipelines, semantic search and document intelligence that turn unstructured enterprise data into trustworthy answers.
Multi-agent systems that plan, call tools and complete workflows — built with modern orchestration frameworks.
Predictive modelling, classification and forecasting honed over a decade — the foundation everything else stands on.
Exploring quantum machine learning and variational circuits — preparing for the next shift in compute.
"Dream is not that which you see while sleeping; it is something that does not let you sleep." Dr. A.P.J. Abdul Kalam — from Gopinath's hometown, Ramanathapuram
I'm Gopinath Rathinakali — Project Manager and GenAI Lead at Infosys, Chennai. With 18+ years in IT and 8+ in generative AI, I've moved from classical analytics into building the semantic-search, RAG and agentic systems that enterprises now run on.
I work hands-on as an individual contributor with a strong ownership mindset — designing embedding and transformer pipelines, vector databases, and secure document processing with metadata filtering and role-based access — while mentoring teams on GenAI best practices.
I come from Ramanathapuram, the birthplace of Dr. A.P.J. Abdul Kalam, and that legacy of curiosity is exactly what's pulling me toward quantum machine learning next.
From the embedding model to the API boundary to the dashboard — the tools I design, build and secure with.
Flagship generative-AI builds (★) alongside the predictive-ML engagements that came before — each shown as problem, approach and outcome.
A GenAI-powered platform that automates the discovery, modelling, realization and deployment phases of delivery — compressing manual effort across the lifecycle.
Semantic-similarity search and clustering over support tickets using embeddings and LLM-based insights, with automated categorization and SOP generation.
For Johnson Controls: a predictive-maintenance solution across HVAC, fire & security and industrial refrigeration, using advanced analytics and real-time monitoring for proactive asset health.
A regression-based predictive pricing model forecasting OCTG pipe prices six months ahead, supporting proactive procurement, inventory optimization and cost control.
Extracts bank, branch, date, amount and MICR code from scanned cheques, writes to Excel and pushes to a database — replacing slow manual entry.
Estimates delivery dates across source, transhipment and destination ports. Benchmarked four regressors; RandomForestRegressor gave the strongest accuracy.
Predicts the optimal discount to win an open opportunity, plus its conversion probability — retrained on historical deal data at regular intervals.
Open to GenAI and agentic-AI roles, consulting and collaboration. The fastest way to reach me is email or a call.
The complete collection — twenty-five Tamil verses on fatherhood, his mother, his wife Maha, womanhood and the wars the world looks away from, written between the models and the metrics.
A study guide covering GenAI, LLMs, RAG, Semantic Search, ChromaDB, Hybrid Search, LangChain, LangGraph, MCP, and Agentic AI. Each section moves from fundamentals to deeper questions, with concrete examples.
Q1. What is Generative AI and how is it different from traditional (discriminative) AI?
Generative AI creates new content — text, images, code, audio — by learning the underlying distribution of training data. Discriminative AI instead learns boundaries to classify or predict labels.
In probability terms, discriminative models estimate P(y | x) (label given input), while generative models estimate P(x) or P(x | y) so they can sample new x.
Q2. What are foundation models?
Large models trained on broad, unlabeled data that can be adapted (via prompting or fine-tuning) to many downstream tasks. GPT, Claude, and Gemini are foundation models. The key idea is one model, many tasks — instead of training a separate model per task.
Q3. What are common GenAI failure modes you should design around?
Mitigation example: For a customer-support bot, ground answers in your knowledge base via RAG and instruct the model to say "I don't know" when context is missing, rather than guessing.
Q1. Explain the Transformer and why "attention" matters.
The Transformer processes all tokens in parallel and uses self-attention to let each token weigh the relevance of every other token. This captures long-range dependencies far better than RNNs and is highly parallelizable.
Example: In "The trophy didn't fit in the suitcase because it was too big," attention helps the model link "it" to "trophy" rather than "suitcase."
Q2. What is the difference between pre-training, fine-tuning, and prompting?
Rule of thumb: try prompting first, then RAG, then fine-tuning — in increasing order of cost and effort.
Q3. What are tokens, context window, and temperature?
Temperature controls randomness in sampling: 0 = (near-)deterministic/focused; higher = more creative/varied. Example: Use temperature=0 for SQL generation (you want one correct answer); use 0.8 for brainstorming taglines.
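To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax over a toy set of logits. The logits and the helper name `softmax_with_temperature` are illustrative assumptions, not any provider's API; real services typically treat temperature=0 as greedy decoding rather than dividing by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (0 is usually handled as greedy argmax)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)  # almost all mass on the top token
flat = softmax_with_temperature(logits, 2.0)   # mass spread across candidates
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

With a low temperature the top-scoring token dominates, which is why low values suit tasks with one correct answer.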
Q4. What is the difference between fine-tuning and RAG? When use which?
Fine-tuning changes how the model behaves (style, format, reasoning patterns). RAG changes what the model knows at inference time by injecting fresh facts.
Q5. What is hallucination and how do you reduce it?
When the model generates plausible but incorrect content. Reduce it by grounding with RAG, lowering temperature, asking for citations, using "answer only from context" instructions, and adding verification steps.
Q1. What is RAG and why use it?
RAG retrieves relevant documents from an external store and feeds them into the LLM prompt so answers are grounded in your data — reducing hallucination and overcoming the training cutoff without retraining.
Q2. Walk through the RAG pipeline end to end.
Example prompt assembly:
```
Context: {retrieved_chunks}

Question: {user_question}
Answer using ONLY the context above. If unknown, say you don't know.
```
Q3. Why does chunking strategy matter? What are common approaches?
Chunks that are too large dilute relevance and waste context; too small lose meaning. Common strategies: fixed-size with overlap, sentence/paragraph-based, and semantic chunking. Overlap (e.g., 50–100 tokens) preserves context across boundaries.
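Fixed-size chunking with overlap can be sketched in a few lines. This is a minimal illustration that splits on whitespace as a stand-in for a real tokenizer; the function name `chunk_with_overlap` and the tiny sizes are assumptions chosen so the output is easy to read.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the text
    return chunks

# Whitespace splitting stands in for a real tokenizer here.
tokens = "the quick brown fox jumps over the lazy dog again".split()
chunks = chunk_with_overlap(tokens, chunk_size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk repeats the last two tokens of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.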
Q4. What metrics evaluate a RAG system?
Q5. What is "lost in the middle"?
LLMs attend better to information at the beginning and end of long contexts than the middle. So ordering retrieved chunks and re-ranking matters — put the most relevant chunks at the edges, and don't stuff too many.
Q6. What is re-ranking and why add it?
Initial vector search is fast but approximate. A re-ranker (often a cross-encoder) re-scores the top candidates by jointly reading query + document, improving final ordering before sending to the LLM. Typical pattern: retrieve top-50 fast, re-rank to top-5.
Q1. What is semantic search and how does it differ from keyword search?
Keyword (lexical) search matches exact terms. Semantic search matches meaning using embeddings, so it finds results even with different wording.
Example: Query "how to reset my password." Keyword search misses a doc titled "recovering account credentials"; semantic search finds it because the meanings are close.
Q2. What are embeddings?
Dense numeric vectors representing meaning. Similar concepts land close together in vector space. Produced by embedding models (e.g., OpenAI text-embedding-3, sentence-transformers).
Example: embed("king") - embed("man") + embed("woman") ≈ embed("queen") — classic vector arithmetic showing semantic structure.
Q3. What similarity metrics are used?
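Three metrics commonly used with embeddings are cosine similarity, dot product, and Euclidean distance. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between vectors: ignores length, 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance: 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]   # orthogonal to a

print(cosine_similarity(a, b), cosine_similarity(a, c))
print(dot(a, b), euclidean_distance(a, b))
```

Note that `a` and `b` are a perfect match by cosine similarity yet sit apart by Euclidean distance; when vectors are normalized to unit length, the three metrics produce the same ranking.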
Q4. What is ANN search and why not exact search?
Exact nearest-neighbor over millions of vectors is slow. Approximate Nearest Neighbor (ANN) algorithms like HNSW trade a small amount of recall for large speed gains, enabling low-latency (typically millisecond-level) search at scale.
Q1. What is ChromaDB?
An open-source, developer-friendly vector database for storing embeddings and running similarity search. Popular in RAG prototypes for being lightweight and easy to run locally (in-memory or persistent).
Q2. What are collections, documents, embeddings, and metadata in Chroma?
Example:
```python
import chromadb

client = chromadb.Client()  # in-memory client
collection = client.create_collection("docs")

collection.add(
    documents=[
        "LangChain is a framework for LLM apps.",
        "ChromaDB stores embeddings.",
    ],
    metadatas=[{"topic": "langchain"}, {"topic": "chroma"}],
    ids=["d1", "d2"],
)

results = collection.query(
    query_texts=["What is LangChain?"],
    n_results=1,
    where={"topic": "langchain"},  # metadata filter
)
print(results["documents"])
```
Q3. What is metadata filtering and why is it useful?
It restricts the search space using attributes (e.g., where={"year": 2024}), combining structured filters with semantic search — useful for tenant isolation, recency, or document-type scoping.
Q4. In-memory vs persistent Chroma — when to use which?
In-memory is great for quick experiments and tests. PersistentClient writes to disk so embeddings survive restarts — needed for any real app.
Q1. What is hybrid search?
A combination of semantic (dense vector) search and keyword (sparse/lexical, e.g., BM25) search, merging both result sets to get the best of meaning-based recall and exact-term precision.
Q2. Why do you need it — isn't semantic search enough?
Semantic search can miss exact identifiers — product codes, names, acronyms, error codes — where literal matching matters.
Example: Searching "error E-4012". Semantic search may drift to "general error troubleshooting," while keyword search nails the exact code. Hybrid captures both.
Q3. What is BM25?
A classic ranking function scoring documents by term frequency and inverse document frequency, with length normalization. It's the standard strong baseline for keyword/lexical relevance.
Q4. How are the two result sets combined?
Often via Reciprocal Rank Fusion (RRF), which scores each document as score = Σ 1/(k + rank), summed over every result list the document appears in, or via weighted score blending (α·dense + (1−α)·sparse). RRF is popular because it needs no score calibration between the two systems.
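The RRF formula is short enough to sketch directly. The two rankings below are made up, and `k=60` is a commonly used default rather than a requirement; `reciprocal_rank_fusion` is an illustrative helper name, not a library function.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["d3", "d1", "d2"]    # hypothetical vector-search ranking
sparse = ["d1", "d4", "d3"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that rank well in both lists (`d1`, `d3`) rise to the top, and only rank positions are used, so the dense and sparse scores never need to share a scale.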
Q1. What is LangChain and what problem does it solve?
A framework for building LLM applications by composing reusable components — models, prompts, retrievers, memory, tools — so you don't hand-wire integrations and orchestration logic.
Q2. What are the core components?
Q3. What is LCEL (LangChain Expression Language)?
A declarative way to pipe components using the | operator, with built-in streaming, batching, and async support.
Example:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain = prompt | ChatOpenAI() | StrOutputParser()  # prompt -> model -> plain string
print(chain.invoke({"topic": "RAG"}))
```
Q4. How does memory work, and what are its types?
Memory persists conversation context across turns. Types include buffer memory (full history), windowed memory (last N turns), and summary memory (a running LLM-generated summary to save tokens).
Q5. What is a LangChain Agent vs a Chain?
A chain is a fixed, predetermined sequence of steps. An agent uses the LLM to decide dynamically which tools to call and in what order based on the task.
Q1. What is LangGraph and why use it over plain LangChain chains?
LangGraph models an application as a stateful graph of nodes (steps) and edges (transitions). Unlike linear chains, it natively supports cycles, branching, and persistent state — essential for agents that loop, retry, and make decisions.
Q2. What are nodes, edges, and state?
Example (skeleton):
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    question: str
    answer: str


def retrieve(state):
    # Each node returns a partial update that is merged into the state.
    return {"answer": "retrieved context..."}


def generate(state):
    return {"answer": state["answer"] + " -> final answer"}


g = StateGraph(State)
g.add_node("retrieve", retrieve)
g.add_node("generate", generate)
g.set_entry_point("retrieve")
g.add_edge("retrieve", "generate")
g.add_edge("generate", END)

app = g.compile()
print(app.invoke({"question": "What is LangGraph?", "answer": ""}))
```
Q3. What are conditional edges and why do they matter for agents?
They route execution based on a function of the state — enabling loops like "if the answer isn't good enough, go back and retrieve again." This is how you implement reflection, retries, and tool-use loops.
Q4. What is checkpointing / persistence in LangGraph?
LangGraph can save state at each step (a checkpointer), enabling memory across sessions, pause/resume, time-travel debugging, and human-in-the-loop approvals (pause, wait for a human, then continue).
Q5. When would you choose LangGraph over a simple agent?
When you need controllable, observable, multi-step or multi-agent workflows with loops, branching, durable state, and human approval gates — i.e., production-grade agentic systems rather than a one-shot chain.
Q1. What is MCP?
An open standard (introduced by Anthropic) for connecting AI applications to external tools and data sources through a consistent interface. It's often described as a "USB-C port for AI" — one protocol instead of a custom integration per tool.
Q2. What problem does MCP solve?
Without it, every app-to-tool integration is bespoke (the "M×N problem": M apps × N tools = M×N custom connectors). MCP standardizes the interface so any MCP-compatible client can talk to any MCP server, turning it into M+N.
Q3. Describe the MCP architecture.
A client-server model:
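Servers can expose three primitives:
- Resources: data the application can read as context (files, records).
- Prompts: reusable prompt templates.
- Tools: functions the model can invoke (e.g., send_email, query_db).

Q4. Give an example use case.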
A coding assistant connects via MCP to a GitHub server (read issues, create PRs), a filesystem server (read project files), and a database server (run queries) — all through the same protocol, without custom glue code for each.
Q5. How is MCP different from regular LLM function/tool calling?
Function calling is the model's ability to request a call; MCP is a standardized transport and discovery layer for exposing and connecting those tools across apps. MCP often uses function calling under the hood but adds interoperability, discovery, and reusability so tools aren't locked to one app.
Q1. What is an AI agent?
A system where an LLM acts as a reasoning engine that plans, decides which actions/tools to use, executes them, observes results, and iterates toward a goal — rather than producing a single static response.
Q2. What are the core components of an agent?
Q3. Explain the ReAct pattern.
ReAct = Reasoning + Acting. The agent interleaves thoughts, actions (tool calls), and observations in a loop until it reaches an answer.
Trace example:
```
Thought: I need today's weather in Mumbai.
Action: weather_api("Mumbai")
Observation: 31°C, humid.
Thought: I have what I need.
Answer: It's 31°C and humid in Mumbai.
```
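The thought/action/observation loop can be sketched without any framework. In this minimal illustration a scripted function stands in for the LLM, and `weather_api` returns a canned value rather than calling a real service; every name here is hypothetical.

```python
def weather_api(city):
    """Hypothetical tool: returns a canned reading instead of calling a real service."""
    return "31°C, humid"

TOOLS = {"weather_api": weather_api}

def scripted_policy(question, observations):
    """Stand-in for the LLM: picks the next step from what has been observed so far."""
    if not observations:
        return {"thought": "I need today's weather in Mumbai.",
                "action": ("weather_api", "Mumbai")}
    return {"thought": "I have what I need.",
            "answer": f"It's {observations[-1]} in Mumbai."}

def react_loop(question, policy, max_steps=5):
    observations, trace = [], []
    for _ in range(max_steps):              # step limit guards against endless loops
        step = policy(question, observations)
        trace.append(f"Thought: {step['thought']}")
        if "answer" in step:
            trace.append(f"Answer: {step['answer']}")
            return step["answer"], trace
        tool_name, arg = step["action"]
        result = TOOLS[tool_name](arg)      # act, then observe
        trace.append(f"Action: {tool_name}({arg!r})")
        trace.append(f"Observation: {result}")
        observations.append(result)
    return None, trace                      # no answer within max_steps

answer, trace = react_loop("What's the weather in Mumbai?", scripted_policy)
print("\n".join(trace))
```

In a real agent the policy would be an LLM call that returns either a tool request or a final answer; the loop, the tool registry, and the step limit stay the same.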
Q4. What is the difference between a workflow and an agent?
Use workflows when steps are predictable; use agents when the path must adapt to the situation. Prefer the simplest design that works.
Q5. What are multi-agent systems? Give a pattern.
Multiple specialized agents collaborate. Common patterns:
Example: A research assistant where a "planner" splits a question, "searcher" agents gather sources in parallel, and a "synthesizer" writes the final report.
Q6. What are the main challenges in building reliable agents?
Mitigations: step limits, validation/guardrails, human-in-the-loop approvals for risky actions, observability/tracing, and keeping the design as simple as the task allows.
Q1. What is the difference between a list, tuple, set, and dictionary?
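- List: ordered, mutable, allows duplicates, e.g. [1, 2, 2].
- Tuple: ordered, immutable, e.g. (1, 2, 3). Hashable (when its elements are), so usable as dict keys.
- Set: unordered, unique elements, e.g. {1, 2, 3}. Fast membership tests.
- Dictionary: key-value mapping, e.g. {"a": 1}.

Q2. Explain mutable vs immutable types and a common gotcha.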
Immutable (int, str, tuple, frozenset) can't change in place; mutable (list, dict, set) can. A classic gotcha is the mutable default argument:
```python
def add(item, bucket=[]):  # BAD: bucket is shared across calls
    bucket.append(item)
    return bucket

add(1)  # [1]
add(2)  # [1, 2]  <- surprise!


def add(item, bucket=None):  # FIX: create a fresh list on each call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```
Q3. What are *args and **kwargs?
*args collects extra positional arguments into a tuple; **kwargs collects extra keyword arguments into a dict.
```python
def f(*args, **kwargs):
    print(args, kwargs)

f(1, 2, x=3)  # (1, 2) {'x': 3}
```
Q4. What is a decorator? Give an example.
A function that wraps another to extend its behavior without modifying it.
```python
import time


def timer(fn):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.time()-start:.4f}s")
        return result
    return wrapper


@timer
def slow():
    time.sleep(1)
```
Q5. What is the difference between a generator and a list comprehension?
A list comprehension [x*x for x in range(1000)] builds the whole list in memory. A generator (x*x for x in range(1000)) is lazy: it yields one item at a time, saving memory for large/streaming data. yield turns a function into a generator.
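A short sketch of the difference. The size comparison uses `sys.getsizeof`, which reports container overhead in CPython and is shown only as a rough indication; `countdown` is a made-up example function.

```python
import sys

squares_list = [x * x for x in range(1000)]   # all 1000 values exist now
squares_gen = (x * x for x in range(1000))    # nothing computed yet

# The list holds 1000 references; the generator holds only its paused state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# Values are produced on demand, one per next() call.
print(next(squares_gen), next(squares_gen), next(squares_gen))

def countdown(n):
    """A generator function: `yield` hands back one value per iteration."""
    while n > 0:
        yield n
        n -= 1

remaining = list(countdown(3))
print(remaining)
```

A generator can be consumed only once; after the three `next()` calls above, iteration resumes from the fourth value.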
Q6. What is the GIL (Global Interpreter Lock)?
In CPython, the GIL allows only one thread to execute Python bytecode at a time. So threads help with I/O-bound work (waiting on network/disk) but not CPU-bound parallelism — for that, use multiprocessing or native extensions.
Q7. Explain is vs ==.
== compares values; is compares identity (same object in memory). a == b can be True while a is b is False. Use is mainly for None checks (if x is None).
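A minimal illustration of value equality versus identity:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # another name for the same object

print(a == b, a is b)   # True False
print(a == c, a is c)   # True True

x = None
print(x is None)        # True
```

Because `c` and `a` are the same object, appending to `c` also changes `a`, while `b` is unaffected.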
Q8. What are list/dict comprehensions and when are they preferred?
Concise, readable transformations: {k: v for k, v in pairs if v}. Preferred over loops for simple mapping/filtering, but avoid cramming complex logic into them.
Q1. What is FastAPI and what makes it stand out?
A modern, high-performance Python web framework for building APIs, built on Starlette (async) and Pydantic (validation). Key strengths: native async/await, automatic request validation, and auto-generated interactive docs (Swagger UI + ReDoc) from type hints.
Q2. How does FastAPI use type hints and Pydantic?
Type hints drive automatic parsing, validation, and documentation. Pydantic models define request/response schemas; invalid input returns a clear 422 error automatically.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    name: str
    price: float
    in_stock: bool = True


@app.post("/items")
def create_item(item: Item):
    return {"received": item.name, "price": item.price}
```
Here FastAPI validates the JSON body against Item before your function runs.
Q3. What is dependency injection in FastAPI?
A built-in system (Depends) to declare reusable dependencies — DB sessions, auth, config — that FastAPI resolves and injects automatically. Great for sharing logic and for testing (you can override dependencies).
```python
from fastapi import Depends


def get_db():
    db = "db_connection"
    try:
        yield db
    finally:
        pass  # close db


@app.get("/users")
def list_users(db=Depends(get_db)):
    return {"db": db}
```
Q4. Path params vs query params vs request body — how does FastAPI tell them apart?
```python
@app.get("/items/{item_id}")                    # item_id = path param
def read(item_id: int, q: str | None = None):   # q = optional query param
    ...
```
Q5. When should an endpoint be async def vs def?
Use async def when you await non-blocking I/O (async DB drivers, httpx). Use plain def for blocking/CPU work — FastAPI runs it in a threadpool so it won't block the event loop. Mixing blocking calls inside async def is a common performance bug.
Q6. How do you handle errors in FastAPI?
Raise HTTPException for expected errors, or register custom exception handlers.
```python
from fastapi import HTTPException


@app.get("/items/{item_id}")
def read(item_id: int):
    if item_id > 100:
        raise HTTPException(status_code=404, detail="Item not found")
    return {"item_id": item_id}
```
Q1. What is Flask and how does it differ from FastAPI/Django?
Flask is a lightweight micro-framework — minimal core, extend via extensions. Compared to Django (batteries-included: ORM, admin, auth out of the box) it's more flexible/less opinionated. Compared to FastAPI, it's traditionally synchronous (WSGI) and lacks built-in type-based validation and auto docs (though async is supported in recent versions).
Q2. Write a minimal Flask app with a route.
```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/hello/<name>", methods=["GET"])
def hello(name):
    return jsonify({"message": f"Hello, {name}!"})


@app.route("/data", methods=["POST"])
def data():
    body = request.get_json()
    return jsonify({"received": body}), 201


if __name__ == "__main__":
    app.run(debug=True)
```
Q3. What is the difference between @app.route methods and accessing request data?
methods=[...] declares which HTTP verbs a route accepts. Incoming data is read via the global request object: request.args (query params), request.get_json() (JSON body), request.form (form data), request.files (uploads).
Q4. What is the application context vs request context?
Flask uses context locals. The request context holds per-request data (request, session); the application context holds app-level data (current_app, g). They're pushed/popped automatically per request — relevant when doing work outside a request (e.g., in scripts or background jobs), where you must push a context manually.
Q5. What are Blueprints?
A way to organize a large app into modular components — group related routes, templates, and static files, then register them on the app. Helps structure and scale projects.
```python
from flask import Blueprint

users_bp = Blueprint("users", __name__, url_prefix="/users")


@users_bp.route("/")
def list_users():
    return "all users"

# app.register_blueprint(users_bp)
```
Q6. What is WSGI, and why does it matter for deployment?
WSGI is the standard interface between Python web apps and servers. Flask's built-in server is for development only; in production you run it behind a WSGI server like Gunicorn or uWSGI (often behind Nginx). FastAPI, by contrast, uses ASGI (e.g., Uvicorn) to support async.
Q7. How do extensions work (e.g., SQLAlchemy, Flask-Login)?
Flask keeps the core small; functionality like ORM (Flask-SQLAlchemy), auth (Flask-Login), and migrations (Flask-Migrate) come as extensions you initialize with the app — letting you pick only what you need.
These are real-world "how would you handle this" questions — interviewers want a structured approach, trade-offs, and specifics.
Approach:
- Incremental updates: detect what changed (last_modified / content hash / change-data-capture). Re-embedding everything hourly is wasteful and risky.
- Idempotent upserts: use stable IDs (doc_id + chunk_index) so updates overwrite cleanly and deletes remove stale chunks. Avoid duplicate chunks creeping in.
- Atomic alias swap: build the new index alongside the live one and switch the alias from index_v1 → index_v2 once validated. Readers always hit a consistent, fully-built index, with no half-updated state.
- Version metadata: tag chunks with valid_from / version / source_timestamp so queries can prefer fresh data and you can roll back instantly.

Key talking point: "Atomic alias swap + incremental idempotent upserts + a validation gate". Together these keep search available and consistent while the index refreshes.
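The incremental, idempotent upsert and the alias swap described above can be sketched with plain dictionaries. This is an illustration only: the dict stands in for a vector index, `fake_embed` for an embedding model, and the `doc_id:chunk_index` ID format and alias mapping are assumptions rather than any particular product's API.

```python
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def sync_document(index, doc_id, chunks, embed):
    """Upsert changed chunks under stable IDs and delete chunks that no longer exist."""
    wanted = {f"{doc_id}:{i}": chunk for i, chunk in enumerate(chunks)}
    embedded = 0
    for chunk_id, text in wanted.items():
        digest = content_hash(text)
        existing = index.get(chunk_id)
        if existing and existing["hash"] == digest:
            continue                      # unchanged: skip re-embedding
        index[chunk_id] = {"hash": digest, "vector": embed(text), "text": text}
        embedded += 1
    stale = [cid for cid in index if cid.startswith(f"{doc_id}:") and cid not in wanted]
    for cid in stale:
        del index[cid]                    # remove chunks dropped from the source
    return embedded, len(stale)

def fake_embed(text):
    return [float(len(text))]             # placeholder, not a real embedding

index_v2 = {}
first = sync_document(index_v2, "policy", ["intro", "rules", "appendix"], fake_embed)
second = sync_document(index_v2, "policy", ["intro", "rules v2"], fake_embed)
print(first, second)

aliases = {"search": "index_v1"}
aliases["search"] = "index_v2"            # one assignment: readers switch all at once
```

The second sync re-embeds only the chunk whose hash changed and deletes the one that disappeared, so running it repeatedly on the same input does no further work.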
Approach:
Pattern: history → (rewrite query) → retrieval-decision gate → [retrieve | reuse cache | answer directly] → generate.
Diagnose first (measure each stage: embedding, vector search, re-rank, LLM generation, network). Usually the LLM generation and over-retrieval dominate. Then optimize:
Key talking point: "Measure the bottleneck, then stream + retrieve less + right-size the model + cache."
Answer: Hybrid search (dense + sparse/BM25) with re-ranking — not pure semantic search.
Why: exact numbers, ticker symbols, line-item names, and codes are where keyword/lexical matching shines; semantic search alone can miss or "approximate" exact figures. Combine:
Key talking point: "Precision on exact figures → hybrid search + table-aware extraction + re-ranking + cited, verified numbers."
Q1. What do you need to run an LLM with no internet connection?
Example stack: Ollama (serves a quantized Llama model) + a local sentence-transformer embedder + Chroma (persistent) → fully offline RAG.
Q2. Why run locally instead of an API?
Data privacy/compliance (data never leaves your network), no per-token cost, no internet dependency, predictable latency, full control. Trade-offs: you manage infra, and open models may be weaker than the largest frontier models.
Q3. How do you improve accuracy of a local/offline LLM?
Q1. How do you confirm an LLM's output is good and validate it?
There's no single accuracy number like classic ML, so combine methods:
Q2. What tools have you used for evaluation?
Q3. How do you validate an agent (multi-step) specifically?
Beyond final answer: check the trajectory (did it call the right tools in a sensible order?), tool-call correctness, intermediate state, and end-state assertions. Single-step evals validate individual decisions cheaply; full-run evals validate the whole task.
| | Keyword (lexical) search | Semantic search |
|---|---|---|
| Matches on | Exact words / tokens (e.g., BM25) | Meaning via embeddings |
| Handles synonyms? | No | Yes |
| Exact codes/numbers? | Excellent | Can miss/approximate |
| Speed/cost | Very cheap | Needs embedding + vector DB |
| Typo/paraphrase tolerance | Low | High |
Example query: "How do I reset my password?"
But the reverse: query "error E-4012" — keyword search nails the exact code; semantic search may drift to generic "error troubleshooting." → This is exactly why hybrid search (both combined) is often the best of both worlds.
Q1. Regression vs Classification — what's the difference?
Memory hook: "How much / how many?" → regression. "Which class / yes-no?" → classification.
Q2. What is time series analysis and how is it different?
Time series data is ordered in time with temporal dependence (today depends on yesterday), so you can't shuffle it. You forecast future values from past patterns: trend, seasonality, and noise.
Q3. Explain ARIMA, SARIMA, and related models.
- ARIMA(p, d, q): AR + I (differencing of order d to make data stationary) + MA. Good for non-seasonal series with trend. p = AR lags, d = differencing order, q = MA lags.
- SARIMA: ARIMA plus seasonal AR, differencing, and MA terms, where m is the season length. Example: monthly sales with a yearly cycle → m = 12.

Other approaches: Exponential Smoothing (Holt-Winters), Prophet (trend + seasonality + holidays, easy to use), and ML/DL methods (XGBoost on lag features, LSTMs).
Q4. What is stationarity and why does it matter?
A stationary series has constant mean/variance over time. ARIMA assumes stationarity, achieved via differencing (the "I"). Test with the ADF (Augmented Dickey-Fuller) test.
Q5. How do you validate a time series model (and what's the catch)?
Don't use random train/test splits — that leaks the future. Use time-based splits / forward chaining (walk-forward validation). Metrics: MAE, RMSE, MAPE.
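Walk-forward validation can be sketched as an expanding training window that always tests on the periods immediately after it. The series, the naive "repeat the last value" forecast, and the helper names are illustrative assumptions; any forecasting model could take the naive forecast's place.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) with the training window expanding over time."""
    train_end = initial_train
    while train_end + horizon <= n_samples:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += horizon

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

series = [10, 12, 13, 15, 16, 18, 21, 22, 24, 27]   # made-up monthly values
fold_errors = []
for train_idx, test_idx in walk_forward_splits(len(series), initial_train=4, horizon=2):
    last_seen = series[train_idx[-1]]                # naive forecast: repeat the last value
    preds = [last_seen] * len(test_idx)
    fold_errors.append(mae([series[i] for i in test_idx], preds))
print(fold_errors)
```

Every test index is later than every training index in its fold, which is the property a random split would violate.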
| | Traditional / Predictive AI | Generative AI | Agentic AI |
|---|---|---|---|
| Goal | Predict/classify from data | Create new content | Achieve a goal autonomously |
| Output | A label or number | Text/image/code/audio | Actions + results over many steps |
| Example | Fraud detection, churn prediction | Write an email, generate an image | Book a trip end-to-end, run research |
| Core tech | Supervised/unsupervised ML | Foundation models / LLMs | LLM + tools + memory + planning loop |
| Autonomy | Low (single prediction) | Low (single response) | High (plans, decides, acts, iterates) |
| Human role | Acts on the prediction | Edits/uses the content | Sets goal, supervises/approves |
Narrative: Traditional AI predicts; Generative AI creates; Agentic AI acts. Agentic AI typically uses a GenAI model as its reasoning engine but adds tools, memory, and a control loop to take real actions toward a goal.
Q1. How does the software development lifecycle change for GenAI/Agentic systems?
Traditional SDLC is deterministic; GenAI systems are probabilistic and non-deterministic, so evaluation, data, and monitoring become first-class. A typical lifecycle:
Q2. What's extra for Agentic AI specifically?
Key talking point: "Evaluation and observability move to the center; the system is probabilistic, so you design for measurement, guardrails, and feedback loops, not just features."
Think of them as three layers of the same stack, chosen by how much control you need:
LangChain provides the high-level building blocks (e.g., create_agent). Use it to get started fast and standardize common patterns: simple chains, RAG pipelines, straightforward tool-calling agents.

Decision guide:
One-liner: LangChain = building blocks; LangGraph = the controllable runtime; Deep Agents = a batteries-included harness for hard, long-running tasks.
Q1. What's the difference between a (regular/shallow) agent and a Deep Agent?
A regular ("shallow") agent is essentially a ReAct loop: think → call a tool → observe → repeat, until done. It works well for single questions and short tasks but struggles as tasks get long — it loses track of earlier instructions, the context window fills up, and failures are hard to recover from.
A Deep Agent is built for long-horizon, complex tasks and adds infrastructure on top of that loop:
- Planning: the agent writes out a task list (write_todos tool) before acting.

Analogy: a shallow agent is a person answering one question with a calculator; a Deep Agent is a project manager who writes a plan, delegates pieces to teammates, keeps notes in shared files, and checks in for approvals. Deep Agents are the architecture behind tools like Claude Code, Deep Research, and Manus.
Q2. When would you pick a simple agent over a Deep Agent?
When the task is short and well-scoped (a couple of tool calls, one question). The Deep Agent's planning/filesystem/sub-agent machinery adds overhead and complexity you don't need — match the tool to the task.
# Part B — Classic Data Science Interview
This part answers a traditional data-science interview set (regression, classification, clustering, dimensionality reduction, NLP) and embeds the end-to-end workflow reference for each technique.
Q1. What does the interview process usually look like?
Resume screening → telephonic screen → a case study / online test (e.g., HackerRank) → one or more technical rounds → HR.
Q2. How long did your project take, and how was the time split?
Typically 3–6 months: roughly 2–3 months on business understanding, data preparation, and exploratory analysis; 1–2 months building, fine-tuning, and iterating models; and about 1 month reviewing results with the business, iterating, and preparing dashboards/decks.
Q3. What packages do you use?
numpy, pandas (data handling), matplotlib, seaborn (visualization), statsmodels (statistical models / p-values), scikit-learn (ML models & evaluation), and nltk (text/NLP).
Q4. What does the team look like?
A 5–6 member team: a lead data scientist, a business consultant, and several senior analysts / data scientists. A senior analyst typically has ~3–4 years' experience; a data scientist 5+.
Q5. How do you treat outliers?
Detect with the z-score rule (e.g., beyond ±3σ, the "6-sigma" spread) and box-and-whisker plots (points more than 1.5×IQR below the first quartile or above the third). Then cap, transform, remove, or keep depending on whether they're genuine signal or errors.
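Both detection rules fit in a few lines with the standard library. The data below is made up, and the helper names are illustrative; note that `statistics.quantiles` uses its default "exclusive" method here, so quartile values can differ slightly from other tools.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if stdev and abs(v - mean) / stdev > threshold]

def iqr_outliers(values, k=1.5):
    """Flag points more than k * IQR below Q1 or above Q3 (the box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]   # one obvious outlier
print(zscore_outliers(data))
print(iqr_outliers(data))
```

The IQR rule is usually the more robust of the two, because a large outlier inflates the mean and standard deviation that the z-score depends on.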
Q6. How do you handle missing values?
Options, in increasing sophistication: drop records (only if few and random), impute with mean/median (or mode for categoricals), or use model-driven imputation (e.g., KNN/regression imputation). Choice depends on missingness mechanism and volume.
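The simple imputation options can be sketched as below. This is an illustration under assumptions: missing values are represented as `None`, and `impute` is a hypothetical helper, not a library function. Model-driven imputation (KNN or regression) is not shown.

```python
import statistics

def impute(column, strategy="median"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # suitable for categorical columns
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]

# The median is less sensitive to the extreme value 100 than the mean.
print(impute([1, None, 3, 100], "median"))
print(impute(["a", None, "b", "a"], "mode"))
```

In a train/test setting, the fill value should be computed on the training data only and then reused on the test data, so that no information leaks from the test set.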
Q7. What algorithms do you know?
Q8. How do you deploy your models?
Work with the engineering team; hand off trained model files as pickle objects, served behind an API. A common stack was AngularJS + Flask + Python (today FastAPI is a frequent replacement for Flask).
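The pickle hand-off can be sketched as a save/load round trip. Everything here is a stand-in: the "model" is a plain dict of made-up coefficients rather than a trained estimator, and `predict`, `save_model`, and `load_model` are hypothetical helpers. In a real service the load step would typically run once at API startup.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained estimator: coefficients in a dict.
model = {"intercept": 1.5, "coefficients": {"area": 0.8, "rooms": 2.0}}

def predict(model, features):
    """Linear prediction: intercept + sum(coefficient * feature value)."""
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["coefficients"].items()
    )

def save_model(model, path):
    """Serialize the model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)

def load_model(path):
    """Load a model file. Only unpickle files you trust: loading can run code."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(model, model_path)
    restored = load_model(model_path)

print(predict(restored, {"area": 10.0, "rooms": 3.0}))
```

A pickle file is tied to the library versions used to create it, so it is worth recording those versions alongside the file when handing it to engineering.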
Q1. What is the DV (dependent variable)? A continuous number (e.g., price, demand).
Q2. What are the IDVs (independent variables)? The predictors — continuous or categorical features used to explain the DV.
Q3. How many observations / how do you split? Commonly 70% train / 30% test, plus cross-validation for a more robust estimate.
Q4. What model do you use? Ordinary Least Squares linear regression: y = a·x1 + b·x2 + … + C. For non-linear relationships, apply variable transformations to linearize:
- log(y) = a·x + C
- y = a·log(x) + C
- log(y) = a·log(x) + C

Q5. How do you check goodness of fit? R² for simple regression, Adjusted R² for multiple regression (penalizes useless predictors). IDV p-values should be low (< 0.05), meaning the variable is statistically significant.
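To make the transformation and the fit measures concrete, here is a small sketch with a hand-written one-variable OLS. The data is synthetic (generated as y = 2·e^(0.5x), an assumption for illustration), and the helper names are hypothetical. In practice statsmodels or scikit-learn would do the fitting.

```python
import math

def fit_line(x, y):
    """One-variable OLS: return (slope, intercept) minimizing squared error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y, y_hat):
    """R² = 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic exponential data: a straight line fits log(y), not y.
x = [1, 2, 3, 4, 5, 6]
y = [2 * math.exp(0.5 * xi) for xi in x]

slope_raw, icpt_raw = fit_line(x, y)
r2_raw = r_squared(y, [slope_raw * xi + icpt_raw for xi in x])

log_y = [math.log(yi) for yi in y]
slope_log, icpt_log = fit_line(x, log_y)
r2_log = r_squared(log_y, [slope_log * xi + icpt_log for xi in x])

print(f"raw fit R2={r2_raw:.3f}, log fit R2={r2_log:.3f}")
```

The two R² values are computed on different scales (y versus log y), so they are a rough indication that the transform helped rather than a strict like-for-like comparison.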
> p-value intuition (the courtroom analogy): the null hypothesis is "the variable is insignificant", like a judge assuming the accused is innocent. It's the data's job (the prosecution) to show that a result this extreme would be very unlikely if the null were true. A p-value < 0.05 is conventionally enough "evidence" to reject innocence.
Q6. What about MAPE and overfitting? Compute MAPE (also SSE/MAE) on both train and test. Overfitting = low training error but high test error. Underfitting = high training error itself. Good model = comparable train and test error.
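A short sketch of the train-versus-test comparison. The `mape` helper assumes no actual value is zero, and the numbers are invented purely to show an overfitting pattern.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent. Assumes no actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical predictions: very close on train, far off on test.
train_actual, train_pred = [100, 200, 300], [101, 198, 303]
test_actual, test_pred = [150, 250], [110, 330]

train_error = mape(train_actual, train_pred)
test_error = mape(test_actual, test_pred)
print(f"train MAPE={train_error:.1f}%, test MAPE={test_error:.1f}%")
# A large gap between the two suggests overfitting.
```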
Q7. What if an IDV is categorical? Use one-hot encoding for nominal categories (Gender, State); leave ordinal categories (severity level, floor number) as ordered numbers — no encoding needed.
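The two treatments can be sketched as follows. Both helpers are hypothetical illustrations; pandas `get_dummies` or scikit-learn encoders are the usual tools. For an ordinal variable stored as text, the order has to be supplied explicitly.

```python
def one_hot(values):
    """Return (sorted categories, rows of 0/1 indicators), one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def ordinal_encode(values, order):
    """Map ordered levels to integers 0, 1, 2, ... following the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

print(one_hot(["TN", "KA", "TN"]))
print(ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"]))
```

For linear regression, one indicator column is usually dropped to avoid perfect collinearity with the intercept (the "dummy variable trap").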
Q8. What is multicollinearity and how do you handle it? Strong correlation between IDVs (redundancy), which destabilizes coefficients. Handle with feature selection, stepwise regression, or regularization (Lasso / Ridge). (Check with correlation matrix / VIF.)
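For intuition on the VIF check, here is a sketch for the special case of exactly two predictors, where VIF reduces to 1 / (1 − r²) with r the Pearson correlation between them. With more predictors, each VIF comes from regressing one IDV on all the others, which this sketch does not cover. The data and helper names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # nearly 2 * x1, so highly redundant
x3 = [1, -1, 0, -1, 1]            # uncorrelated with x1

print(vif_two_predictors(x1, x2))  # large value signals multicollinearity
print(vif_two_predictors(x1, x3))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that the variable is redundant.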
End-to-end workflow (reference):
```
Step 0  Business understanding, data prep, quality checks, missing-value treatment
Step 1  Identify DV (continuous) and IDVs (continuous/categorical)
Step 2  Exploratory analysis
Step 3  Model building
Step 4  Model evaluation
Step 5  Go live and start predicting
```
Q1. What is the DV / how many classes? How do you handle class imbalance? The DV is categorical (2+ classes). For imbalance: downsample the majority class or oversample the minority class (e.g., SMOTE), and use imbalance-aware metrics (not just accuracy).
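A sketch of the simplest oversampling variant: randomly duplicating minority rows until every class matches the largest one. This is plain random oversampling, not SMOTE (which synthesizes new points by interpolating between neighbours). The function name and data are illustrative.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

rows = [[1], [2], [3], [4], [5], [6]]
labels = [0, 0, 0, 0, 1, 1]           # class 1 is the minority
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling should be applied to the training split only; the test set should keep the real-world class ratio so the metrics stay honest.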
Q2. What are the IDVs? The predictor features. Note: for KNN, scale the IDVs since it relies on distance.
Q3. How many observations / how do you split? 70/30 train-test, plus k-fold cross-validation (5-fold is popular) to average performance across splits.
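How k-fold rotates the held-out portion can be sketched with index lists alone. `k_fold_indices` is a hypothetical helper; scikit-learn's `KFold` is the usual choice in practice.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n rows."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal slices
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for fold, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(f"fold {fold}: train={len(train)} rows, test={len(test)} rows")
```

Each row lands in the test set exactly once, and the reported score is the average of the k per-fold scores.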
Q4. What model? KNN, Logistic Regression, Decision Tree, Random Forest, Gradient Boosting — pick the model and hyperparameters that give the best validated result.
Q5. How do you know the model is good? Use the confusion matrix to derive accuracy, sensitivity (recall/TPR), specificity, FPR, precision on both train and test — and avoid overfitting.
| Metric | Meaning |
|---|---|
| Accuracy | Overall correct predictions |
| Precision | Of predicted positives, how many were right |
| Recall / Sensitivity / TPR | Of actual positives, how many were caught |
| Specificity | Of actual negatives, how many were correctly rejected |
| FPR | False alarms among actual negatives |
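The metrics in the table all come from the four confusion-matrix counts. A minimal sketch for the binary case, with hypothetical helper names and made-up labels; it assumes each denominator is non-zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Derive the table's metrics. Assumes no denominator is zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # sensitivity / TPR
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),           # equals 1 - specificity
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(*confusion_counts(y_true, y_pred)))
```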
Q6. How do you evaluate a logistic model specifically? Lift charts, the ROC curve, and AUC (0.5 = random ranking, 1.0 = perfect; a value below 0.5 means the model ranks worse than random).
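One way to build intuition for AUC: it equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative. The sketch below computes it by brute-force pair comparison, which is fine for small data; the helper name and scores are illustrative.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```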
Q7. How do you evaluate / split a Decision Tree? Splits are chosen by Gini impurity (often called the Gini index) or Information Gain (reduction in entropy); both measure how "pure" the resulting groups are.
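Both purity measures are short formulas over class proportions. A minimal sketch with hypothetical helper names, covering a binary split into a left and right group:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)). 0 means a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
print(gini(parent), entropy(parent))
print(information_gain(parent, [0, 0], [1, 1]))  # a perfect split
```

The tree tries candidate splits and keeps the one with the lowest weighted impurity (equivalently, the highest gain).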
Q8. What if a feature is categorical? One-hot encoding (same rule as regression: encode nominal, keep ordinal as numbers).
End-to-end workflow (reference):
```
Step 0  Business understanding, data prep, quality checks, missing-value treatment
Step 1  Identify DV (categorical) and IDVs
Step 2  Exploratory analysis
Step 3  Model building
Step 4  Model evaluation
Step 5  Go live (save model as pickle for deployment)
```
Q1. How many variables? Defined by the segmentation problem (e.g., customer attributes for market/customer segmentation).
Q2. What algorithm?
Q3. How do you choose K? Combine business input with the elbow curve of within-cluster distance (where the curve bends).
Q4. What is cluster profiling? After clustering, describe each segment by its characteristics so the business can act on it (analogous to Nielsen PRIZM consumer segments).
Q5. Do you scale? Yes — normalize features (x − μ)/σ so variables on different scales are comparable (distance-based methods are scale-sensitive).
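Scaling and the elbow curve can be sketched together. This assumes K-means as the clustering algorithm and uses a bare-bones Lloyd's iteration with random initial centroids; the customer data (age, income) is invented, and real projects would normally use scikit-learn's `KMeans`.

```python
import random
import statistics

def standardize(rows):
    """Column-wise (x - mean) / std so all features are on a comparable scale."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=50, seed=0):
    """Basic Lloyd's algorithm. Returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            clusters[nearest].append(row)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = [
            [statistics.fmean(col) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(row, c) for c in centroids) for row in rows)
    return centroids, inertia

# Hypothetical customers: [age, income]. Income would dominate without scaling.
customers = [[25, 30000], [27, 32000], [26, 31000],
             [45, 80000], [47, 82000], [46, 81000],
             [65, 40000], [67, 42000], [66, 41000]]
scaled = standardize(customers)

# Elbow curve: within-cluster distance for each candidate K.
inertias = {k: kmeans(scaled, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

The "elbow" is the K after which the printed values stop dropping sharply. Because the starting centroids are random, production implementations rerun with several initializations and keep the best result.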
End-to-end workflow (reference):
```
Step 0  Business understanding, data cleaning
Step 1  Scale the data if variables aren't comparable
Step 2  Exploratory analysis (watch variables with high variance vs mean)
Step 3  Cluster: choose optimal K via business need + elbow curve
Step 4  Cluster profiling
```
Q1. How many raw variables / how many reduced? You start with D original variables and project them to K new dimensions (K << D).
Q2. What model? Principal Component Analysis (PCA): compute the covariance matrix, perform eigenvalue decomposition, then project the data — shape-wise (n × D) · (D × K) → (n × K).
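The covariance, eigen-decomposition, and projection steps can be sketched for the first component only. Power iteration is swapped in here as a simple stand-in for a full eigenvalue decomposition (which NumPy or scikit-learn would normally provide); the helper names and the toy data are assumptions.

```python
import math

def covariance_matrix(rows):
    """Return (D x D sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_component(cov, iters=200):
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v

def project(centered, component):
    """(n x D) . (D x 1) -> n scores on the component."""
    return [sum(c * w for c, w in zip(row, component)) for row in centered]

# Toy data lying exactly on the line y = 2x, so one component explains everything.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
cov, centered = covariance_matrix(rows)
eigenvalue, component = top_component(cov)
scores = project(centered, component)
print(component, eigenvalue)
```

Here D = 2 and K = 1, so the projected data has shape (4 × 1). The component points along the direction (1, 2), which is the line the data sits on.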
Q3. How many new dimensions do you keep? Look at the variance captured by each component (eigenvalues) and keep enough components to retain ~70–80% of total variance.
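Choosing K from the eigenvalues is a cumulative-sum exercise. A small sketch, with a hypothetical helper and made-up eigenvalues:

```python
def components_to_keep(eigenvalues, target=0.8):
    """Return (K, cumulative variance share) for the smallest K reaching the target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= target:
            return k, cumulative
    return len(eigenvalues), cumulative

# Hypothetical eigenvalues for 5 original variables.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
print(components_to_keep(eigenvalues, target=0.8))  # (3, 0.875)
print(components_to_keep(eigenvalues, target=0.7))  # (2, 0.75)
```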
Q4. How do you relate new dimensions back to raw variables? Via factor loadings — how strongly each original variable contributes to each component.
Q5. Why scale first? Like clustering, PCA is variance-sensitive, so standardize variables that aren't comparable.
End-to-end workflow (reference):
```
Step 0  Business understanding, data cleaning
Step 1  Scale the data if variables aren't comparable
Step 2  Exploratory analysis (watch for highly correlated variables)
Step 3  Build PCA, transform to new dimension space
Step 4  Factor analysis (interpret loadings)
Step 5  Downstream use: visualization, clustering, regression
```
Q1. What preprocessing do you do with NLTK? Tokenization (split into words/sentences), stop-word removal (drop "the", "is"), stemming (chop to a crude root, e.g., "running" → "run"), and lemmatization (reduce to a valid dictionary form using context).
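To show what each step does without requiring NLTK's corpus downloads, here is a toy pipeline in plain Python. It is not NLTK: the stop-word list is a tiny assumed sample and `crude_stem` is a deliberately naive suffix stripper (NLTK's `PorterStemmer` and `WordNetLemmatizer` are the real tools). Lemmatization is not shown, since it needs a dictionary.

```python
import re

# Tiny illustrative stop-word list; NLTK ships a much larger one.
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def crude_stem(word):
    """Naive stemmer: strip one common suffix, then collapse a doubled final consonant."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The cats are running in the park"))
```

A rule this crude will mangle many words (for example, it turns "class" into "clas"), which is exactly the kind of error that motivates lemmatization.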
Q2. How do you turn text into features? Feature extraction into a sparse matrix:
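As one illustration of a sparse text representation, here is a TF-IDF sketch that stores each document as a dict of non-zero weights. It uses the textbook form tf × log(N / df); libraries such as scikit-learn apply a smoothed variant, so their numbers will differ. The helper name and documents are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (zero weights are omitted)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
            if df[term] < n_docs  # a term in every document has weight 0
        })
    return rows

docs = ["good movie", "bad movie", "good plot"]
for row in tfidf(docs):
    print(row)
```

Rarer terms get higher weights: in the second document, "bad" (one document) outweighs "movie" (two documents).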
Q3. What downstream tasks follow?
Modern note: classic TF-IDF still works well for many tasks, but embeddings (and LLMs) now capture meaning/context far better — useful to mention as the evolution of these techniques.
| Concept | One-line definition |
|---|---|
| GenAI | AI that creates new content by modeling data distributions |
| LLM | Transformer-based model trained on text to predict tokens |
| RAG | Retrieve external docs, then generate grounded answers |
| Semantic search | Meaning-based retrieval using embeddings |
| ChromaDB | Lightweight open-source vector database |
| Hybrid search | Dense (semantic) + sparse (keyword/BM25) combined |
| LangChain | Framework to compose LLM app components |
| LangGraph | Stateful graph framework for cyclic, agentic workflows |
| MCP | Open standard to connect AI apps to tools/data |
| Agentic AI | LLM that plans, acts via tools, observes, and iterates |
| Python | General-purpose language; mind mutability, GIL, generators |
| FastAPI | Async API framework with Pydantic validation + auto docs |
| Flask | Lightweight WSGI micro-framework, extend via extensions |
| Offline LLM | Open-weights model + local runtime (Ollama) + local RAG |
| LLM validation | Golden sets, LLM-as-judge, RAGAS/LangSmith, guardrails |
| ARIMA/SARIMA | Time-series forecasting; SARIMA adds seasonality |
| Traditional/Gen/Agentic AI | Predict vs create vs act autonomously |
| Deep Agent | Agent + planning, filesystem, sub-agents for long tasks |
| Regression | Predict continuous DV; OLS, check R²/Adj R², MAPE |
| Classification | Predict categorical DV; confusion matrix, ROC/AUC |
| Clustering | Unsupervised grouping; K-means, elbow curve, scale first |
| PCA | Reduce dimensions; keep 70-80% variance, factor loadings |
| NLP/Text | NLTK preprocessing, TF-IDF, sentiment, topic modeling |
Compiled by Gopinath Rathinakali — shared to help others preparing for AI, GenAI and data-science interviews. Good luck.