AI Systems Engineer · IIT Guwahati · NE India

Okram Jimmy Singh

// Data Platform Engineer  ·  Applied AI Developer  ·  Systems Architect

Building scalable AI and data systems — real-time streaming pipelines, document intelligence, vector search engines, and LLM-powered workflows across research, government, and enterprise.

3+ Years at IITG
6 Production Systems
1 IEEE Publication
Live pipeline · 94k events/sec · latency 12ms
Core Capabilities

What I Engineer

01
End-to-End AI Pipelines
Ingestion → OCR → tokenization → embeddings → inference → post-processing → indexing & search across legal, OSINT, and R&D domains.
OCR · RAG · Embeddings · Inference
02
Real-Time Streaming
High-volume event pipelines with Kafka, Spark Streaming, PySpark, Dask, and Flink for social media analytics and OSINT at scale.
Kafka · PySpark · Flink · Dask
03
RAG & Hybrid Search
Weaviate, Milvus, Pinecone, Elasticsearch — reranking, metadata filtering, BM25+vector fusion for multilingual document retrieval.
Weaviate · Milvus · BM25 · Pinecone
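To illustrate what BM25+vector fusion can look like, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is one common way to merge a keyword ranking with a vector ranking; it is shown as an assumption, not necessarily the method used in the systems above. The function name `rrf_fuse`, the constant `k=60`, and the document IDs are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a BM25 query and a vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, it needs no normalization between BM25 scores and vector similarities, which is why it is a frequent first choice for hybrid retrieval.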
04
GPU Inference Infrastructure
On-premises GPU deployment with optimized batching, caching, and fallback pipelines for NLLB, LLMs, and OCR, accelerated via TensorRT.
TensorRT · NLLB · LLMs · GPU
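The caching and fallback idea can be sketched in a few lines. This is an illustrative pattern only, under stated assumptions: `make_cached`, `run_with_fallback`, `gpu_backend`, and `cpu_backend` are hypothetical names, and the backends are stand-ins rather than real TensorRT or NLLB calls.

```python
def make_cached(fn):
    """Memoize a single-argument backend so repeated inputs skip inference."""
    cache = {}

    def wrapper(text):
        if text not in cache:
            cache[text] = fn(text)  # only successful results are stored
        return cache[text]

    return wrapper

def run_with_fallback(text, backends):
    """Try each (name, callable) backend in order; return the first success."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(text)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))

# Hypothetical stand-ins: the primary engine fails, the fallback succeeds.
def gpu_backend(text):
    raise RuntimeError("GPU engine unavailable")

def cpu_backend(text):
    return text.upper()

backends = [
    ("gpu", make_cached(gpu_backend)),
    ("cpu_fallback", make_cached(cpu_backend)),
]
result = run_with_fallback("hello", backends)
print(result)
```

Returning the backend name alongside the output makes it easy to log how often the fallback path is taken.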
05
Document AI Platforms
PDF, DOCX, scanned-image conversion with layout-aware extraction and normalization for multilingual NLP and translation workflows.
PDF · DOCX · Layout AI · NLP
06
Microservice Architecture
Production APIs integrating OCR, NLP, retrieval, and AI automation. Distributed, event-driven, and fault-tolerant, with Airflow orchestration.
FastAPI · Docker · Airflow · REST
Selected Work

Major Projects

01 // STREAMING · OSINT
Real-Time Twitter Streaming & Analytics
Large-scale OSINT ingestion with Kafka, Spark Streaming, Cassandra, Elasticsearch. Real-time dashboards for sentiment, trends, and event detection.
Kafka · Spark · Cassandra · Elasticsearch
02 // DOCUMENT AI
Document-Level AI Translation System
OCR → extraction → NLLB/LLM inference → court-ready output. On-premises GPU pipeline with a web-based Word-style editor for legal judgment review.
NLLB · LLMs · OCR · FastAPI
03 // PLATFORM · OCR
Internal OCR & Document Conversion
PDFs, scanned images, Word docs → structured text. Layout-aware extraction for multilingual RAG and translation workflows at IITG.
OCR · PDF · RAG · Multilingual
04 // VECTOR SEARCH
Freeform Vector Search Engine
Hybrid vector + keyword search with embeddings, metadata filters, and reranking via Weaviate and Elasticsearch for multilingual retrieval.
Weaviate · BM25 · Reranking
05 // ERP · AUTOMATION
ERP Automation — IITG R&D
Frappe Framework workflows, document automation, approval pipelines for research management, HR, and asset tracking with RBAC.
Frappe · Python · RBAC
06 // OSINT · MeitY
Vishleshakee 2 — Social Analytics
MeitY-funded OSINT platform for real-time sentiment, trend detection, event monitoring. PySpark + Kafka + Hadoop + Cassandra + Elasticsearch.
PySpark · Kafka · Hadoop
Work History

Work Experience

May 2025 — Present
Current · Full-time
System Engineer
R&D Section, IIT Guwahati
  • Internal ERP using Frappe Framework — research project management, HR, and asset workflows.
  • Frappe module customization: workflow automation, role/dept permissions, reporting dashboards.
  • Backend infrastructure, containerized services (Docker), and database configuration for secure, reliable operation.
  • Automated deployment, monitoring, and maintenance scripts using Docker, Git, and shell scripting.
Jul 2024 — May 2025
Contract
Assistant Project Engineer
CLST, IIT Guwahati
  • Led AI-assisted legal translation using NLLB and LLM post-processing for Indian language judgments.
  • Translation pipelines, text classification, LLM correction, NER, document-level segmentation/tokenization.
  • On-premises deployment: FastAPI, React, Next.js, MongoDB, PostgreSQL, Redis, Docker.
  • Web-based Word-style editor for editing, reviewing, and formatting translated legal documents.
Jan 2022 — Jun 2024
Project Role
Project Associate — I
OSINT Lab, IIT Guwahati
  • Vishleshakee 2 — MeitY-funded OSINT platform: sentiment analysis, trend detection, event monitoring.
  • PySpark, Kafka, Hadoop, and Spark Streaming, with Cassandra and Elasticsearch, for real-time analytics at scale.
  • Laravel web apps + Python scripts for sentiment analysis, scraping, and API-based data ingestion.
  • Docker deployment and big data orchestration with Apache Airflow and HDFS.
Tools & Technologies

Full Tech Stack

Kafka, Apache Spark, PySpark, Flink, Airflow, Hadoop · HDFS, Dask, Cassandra, Elasticsearch, MongoDB, PostgreSQL, Redis, Weaviate, Milvus, Pinecone, HuggingFace, NLLB, OpenCV, TensorRT, FastAPI, NestJS, Frappe · ERPNext, Laravel, React, Next.js, Docker, Prometheus · Grafana, Linux · Shell, Python, Go
Get in Touch

Let's Connect

Open to AI Systems Engineering, Data Engineering, and Platform roles. Available for Frappe/ERPNext consulting and NLP/document AI projects.

B.Tech Computer Engineering — Mizoram University, 2020. Currently at IIT Guwahati, Northeast India.

Research Publication — IEEE Xplore
AI & Machine Learning · Document 10337115