entangelk — Professional Experience

★ Current work

Verified Agentic-RAG: the synthesis

My current flagship as an AI Solution Engineer, and the synthesis of the three public PoCs in my portfolio (a memory layer, a provider-neutral execution contract, and a verification gate), brought together at production scale on a real legal-domain assistant.

Verified Agentic-RAG System

Current work · Company · not public

A working verified RAG (retrieval-augmented generation) assistant for the legal domain. A live pipeline (hybrid retrieval over a Korean-law corpus, a multi-agent answer-research loop, and claim-by-claim grounding verification) turns a non-developer's vague question into a checkable task spec and lets no unverified text reach the user.

An active company project, working but still in progress; the source is not public. The retrieval-to-gate pipeline runs end-to-end on a live legal corpus, and it is described here at the architecture and decision level only.

01 Contract-first, four independent layers

The problem isn't solved by one LLM call, so I split it into four layers (Intake, Knowledge, RAG Harness, Verification/Gate) and fixed the contracts between them before building. Each layer is developed independently against mocks; the boundary objects (Task Brief, Source Ref, Gate Decision) are the single source of truth.
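A minimal sketch of what contract-first boundary objects and mocks could look like. The project source is not public, so every field name, the "ship" action, and both mock functions are hypothetical; only the three object names and the retrieve_more action come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskBrief:
    """Intake -> RAG Harness boundary object (illustrative fields)."""
    task_type: str
    question: str
    jurisdiction: Optional[str] = None  # unknown stays None, never guessed


@dataclass(frozen=True)
class SourceRef:
    """Pointer into an immutable source snapshot (illustrative fields)."""
    snapshot_id: str
    path: tuple[str, ...]  # position in the legal tree


@dataclass(frozen=True)
class GateDecision:
    """Verification/Gate -> caller boundary object (illustrative fields)."""
    action: str
    reason: str


def mock_knowledge(brief: TaskBrief) -> list[SourceRef]:
    """Mock Knowledge layer: canned refs so other layers can be built first."""
    return [SourceRef(snapshot_id="snap-0", path=("article-1", "paragraph-2"))]


def mock_gate(refs: list[SourceRef]) -> GateDecision:
    """Mock gate: asks for more retrieval when there is no evidence at all."""
    if not refs:
        return GateDecision(action="retrieve_more", reason="no evidence")
    return GateDecision(action="ship", reason="mock: evidence present")
```

Because the dataclasses are frozen, a layer cannot quietly mutate an object it received from another layer; it has to produce a new one, which keeps the boundary explicit.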

02 Build the trust foundation before the features

I build Knowledge and Verification first, as the trust foundation, before the RAG layer itself. The core invariant: unverified text never reaches the user, and any answer mutation (e.g. redacting an unsupported claim) is re-verified before it ships.
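The invariant can be illustrated with a small loop. This is a sketch under assumptions, not the project's code: `ship_answer`, the `is_grounded` callback, and the `max_rounds` limit are hypothetical names, and redaction is the only mutation shown.

```python
from typing import Callable, Optional


def ship_answer(claims: list[str],
                is_grounded: Callable[[str], bool],
                max_rounds: int = 3) -> Optional[list[str]]:
    """Return claims only after a verification pass finds nothing unsupported.

    Returns None (withhold the answer) if nothing verifiable is left or the
    round limit is reached.
    """
    current = list(claims)
    for _ in range(max_rounds):
        unsupported = [c for c in current if not is_grounded(c)]
        if not unsupported:
            # The text being returned is exactly the text that was just verified.
            return current if current else None
        # Mutation: redact unsupported claims, then loop to re-verify the result.
        current = [c for c in current if c not in unsupported]
    return None
```

The point of the loop shape is that the return statement is only reachable directly after a clean verification pass, so a mutated answer cannot skip the check.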

"Unverified text never reaches the user."

03 The vector DB is a cache, not the truth; retrieval is ontology-aware

A common RAG trap is treating the vector store as the source of truth. Here it's only a retrieval cache: final evidence is always reloaded from an immutable source snapshot via source_ref. Retrieval is ontology-aware. It returns the smallest unit plus its path in the legal tree (article / paragraph / subparagraph / item, i.e. 조/항/호/목) and reloads the surrounding context on demand, so a citation stays reproducible even if the parser or policy changes.
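A sketch of the cache-versus-snapshot split, assuming in-memory dictionaries in place of a real vector store and snapshot store. Only `source_ref` is named in the text; the `SourceRef` fields, the function names, and the placeholder strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    snapshot_id: str
    path: tuple[str, ...]  # smallest unit plus its path in the legal tree


# Immutable snapshot: the only place evidence text is read from.
SNAPSHOTS = {
    "snap-2024": {
        ("article-2",): "<full text of article 2>",
        ("article-2", "paragraph-1"): "<text of paragraph 1 as of this snapshot>",
    }
}

# Vector cache: useful for finding refs, but its text may be stale.
VECTOR_CACHE = [
    {"ref": SourceRef("snap-2024", ("article-2", "paragraph-1")),
     "chunk": "<possibly stale chunk text from an older parser>"},
]


def retrieve(query: str) -> list[SourceRef]:
    """Stand-in for similarity search: returns refs only, never cached text."""
    return [hit["ref"] for hit in VECTOR_CACHE]


def load_evidence(ref: SourceRef) -> str:
    """Reload the cited unit from the snapshot."""
    return SNAPSHOTS[ref.snapshot_id][ref.path]


def load_parent(ref: SourceRef) -> Optional[str]:
    """Reload the enclosing unit on demand (None if there is no parent entry)."""
    return SNAPSHOTS[ref.snapshot_id].get(ref.path[:-1])
```

Since `retrieve` hands back refs rather than text, a re-chunked or re-embedded cache cannot change what a citation resolves to.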

"The vector DB is a cache. The source snapshot is the truth."

04 Schema-guided intake: the schema drives the conversation

A vague non-developer request ("is this contract risky?") isn't handed straight to the RAG. A task classifier picks the task type and an LLM refines the utterance into schema slots. It refines, it doesn't reason: unknown fields stay null rather than guessed, and it asks the user only for what's genuinely missing. The resulting Task Brief is itself a verified artifact, and when intake can't be grounded within budget it escalates to a human-review queue instead of guessing.
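A sketch of how a schema could drive the next intake step. The `contract_review` schema, its slot names, and a budget counted in conversation turns are all assumptions made for illustration; the text does not say how the budget is measured.

```python
# Hypothetical schema: required slots per task type.
SCHEMAS = {"contract_review": ["contract_type", "party_role", "concern"]}


def missing_slots(task_type: str, slots: dict) -> list[str]:
    """Slots the refiner left as None, in schema order."""
    return [name for name in SCHEMAS[task_type] if slots.get(name) is None]


def next_intake_step(task_type: str, slots: dict,
                     turns_used: int, max_turns: int = 3) -> tuple[str, list[str]]:
    """Decide what intake does next; it never fills a slot by guessing."""
    missing = missing_slots(task_type, slots)
    if not missing:
        return ("brief_ready", [])
    if turns_used >= max_turns:
        return ("human_review", missing)  # budget exhausted: escalate
    return ("ask_user", missing)          # ask only for what is missing
```

For example, a request refined to `{"contract_type": "lease", "party_role": None, "concern": None}` produces a question about the two empty slots only, and nothing about the contract type.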

"The LLM refines the request; it doesn't reason about it. If it doesn't know, the value is null."

05 Agentic research, but the LLM only proposes

Answer generation is a multi-agent research loop (query planning, claim extraction, evidence linking, semantic verification, revision) with a gate that can send work back (retrieve_more / revise / ask_user) or, when the grounding budget is exhausted, park it for human review. The division of labor is strict: the LLM proposes candidates, the verification layer confirms grounding claim-by-claim, and the gate decides what ships.
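The gate's role can be sketched as a pure function over verification results. The three send-back actions are named in the text; the `SHIP` and `HUMAN_REVIEW` member names, the input counters, and the priority order among the send-back actions are assumptions.

```python
from enum import Enum


class GateAction(Enum):
    SHIP = "ship"                  # assumed name for "let it through"
    RETRIEVE_MORE = "retrieve_more"
    REVISE = "revise"
    ASK_USER = "ask_user"
    HUMAN_REVIEW = "human_review"  # assumed name for the parked state


def decide(unsupported_claims: int, missing_evidence: int, open_questions: int,
           rounds_used: int, budget: int) -> GateAction:
    """Deterministic gate: the LLM never calls this on its own output."""
    if unsupported_claims == 0 and missing_evidence == 0 and open_questions == 0:
        return GateAction.SHIP
    if rounds_used >= budget:
        return GateAction.HUMAN_REVIEW   # grounding budget exhausted
    if open_questions:
        return GateAction.ASK_USER       # only the user can resolve these
    if missing_evidence:
        return GateAction.RETRIEVE_MORE  # claims with no linked evidence yet
    return GateAction.REVISE             # evidence exists but does not support the claim
```

Keeping the decision free of model calls means the same verification results always yield the same action, which makes the gate straightforward to test.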

06 Verified the same way I verify everything

I hold this private work to the same bar as my open PoCs, and keep the same separated trails: dated verification records for each change (each a conditional verdict with a two-way regression guard, so a fix can neither bring the bug back nor start flagging valid cases); daily working logs; a single canonical contract file held in place by drift-lock tests that fail the moment the spec and the code diverge; and a full suite in the hundreds of tests. As the production flagship it accumulates far more of these records than any single PoC: dozens of verification entries, not a handful. AI does the work; the verification decides whether it is real.
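A drift-lock test can be as small as a set comparison between the contract and the code. In this sketch the contract is an inline JSON string standing in for the canonical contract file, and the `gate_actions` key, `find_drift`, and the test name are hypothetical.

```python
import json
from enum import Enum

# Stand-in for the canonical contract file; a real test would read it from disk.
CONTRACT_SPEC = json.loads('{"gate_actions": ["ask_user", "retrieve_more", "revise"]}')


class GateAction(Enum):
    RETRIEVE_MORE = "retrieve_more"
    REVISE = "revise"
    ASK_USER = "ask_user"


def find_drift(spec_values, code_values) -> dict:
    """Report values present on only one side of the spec/code boundary."""
    spec, code = set(spec_values), set(code_values)
    return {"spec_only": sorted(spec - code), "code_only": sorted(code - spec)}


def test_gate_actions_match_contract():
    drift = find_drift(CONTRACT_SPEC["gate_actions"], [a.value for a in GateAction])
    assert drift == {"spec_only": [], "code_only": []}, f"contract drift: {drift}"
```

Adding an enum member without updating the contract, or the reverse, makes this test fail with the offending value named in the message.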

How it connects: this is the production-scale version of the idea behind my public PoCs, where AI proposes, deterministic verification confirms, and a human-aware gate decides.

Production work

Professional experience

Internal Automation Tooling for a Non-Developer Team

A solo-built internal FastAPI web app that turns a planning team's repetitive manual work (monthly reporting, data crawling, notification dispatch, and a few AI assists) into self-serve automation.

Decision

I kept the internal platform as one app with a Docker-managed datastore and scheduler instead of splitting it into services it did not need. When policy blocked external data access, I reused existing internal email/SMS endpoints rather than asking the core team to build new APIs, and kept the workflow inside the closed network.

For customer-service reply drafts, the MVP auto-sent only when the model reported 70%+ confidence; everything else went to an administrator. This was a prompt-reported routing score, not a calibrated probability: a lightweight human gate, not a claim of statistical certainty.
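The routing rule fits in a few lines. This is a sketch: the 70% threshold comes from the text, while the function name, the return labels, and the choice to send missing or out-of-range scores to the administrator are assumptions.

```python
from typing import Optional

AUTO_SEND_THRESHOLD = 0.70  # the 70% cut-off described above


def route_reply(confidence: Optional[float]) -> str:
    """Route a drafted reply using the model's self-reported score.

    The score is a routing signal, not a calibrated probability, so anything
    missing or outside [0, 1] is treated as "not confident" (assumption).
    """
    if confidence is None or not (0.0 <= confidence <= 1.0):
        return "admin_review"
    return "auto_send" if confidence >= AUTO_SEND_THRESHOLD else "admin_review"
```

Defaulting malformed scores to review keeps the failure mode on the human side of the gate.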

Impact

Replaced manual collection, aggregation, and report assembly across recurring monthly work, with an estimated 10+ hours saved per month. That is a work-replacement estimate, not an instrumented time study; the durable result is that non-developers operate it themselves.

Production NLP Categorization Microservice

A live AI module matching natural-language user inputs to internal DB categories.

Decision

Used BGE-M3 retrieval plus a BGE cross-encoder reranker, then selected the category first and searched tags only inside it. That constrained the output to master data and reduced the common-tag bias of a single global top-k over all tags.
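The category-first structure can be shown independently of the models. In this sketch a toy token-overlap score stands in for the BGE-M3 retrieval and cross-encoder reranker scores, and `MASTER`, `match`, and `overlap_score` are hypothetical names with made-up example data.

```python
def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase tokens.

    A stand-in only; the real service scored with embeddings and a reranker.
    """
    return len(set(query.lower().split()) & set(text.lower().split()))


def match(query: str, master: dict, score=overlap_score, top_k: int = 2):
    """Pick the category first, then rank tags inside that category only."""
    category = max(master, key=lambda name: score(query, name))
    ranked = sorted(master[category], key=lambda tag: score(query, tag), reverse=True)
    return category, ranked[:top_k]


# Illustrative master data: category -> allowed tags.
MASTER = {
    "pet care": ["dog grooming", "cat sitting"],
    "home repair": ["plumbing fix", "roof repair"],
}
```

Every returned tag is a key from the chosen category's list, so the result cannot leave master data, and a tag that is common across categories cannot crowd out the right category's tags.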

Why

So that the core backend team could adopt it with zero restructuring, I shipped it containerized (Docker) with full Swagger docs: a plug-and-play service, not a code dependency.

Impact

Plugged into the live commercial environment without core-backend restructuring. Incremental embedding updates replaced the full refresh, cutting update time from roughly 125 seconds to roughly 2 seconds (measured at 61.5×).
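One common way to make embedding updates incremental is to re-embed only rows whose content hash changed. The text does not say how changes were detected in the service, so the hashing approach, the index layout, and the function names below are assumptions.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_update(rows: dict, index: dict, embed) -> list:
    """Re-embed only new or changed rows; drop rows that no longer exist.

    rows:  {row_id: text} as currently stored in the source table
    index: {row_id: {"hash": str, "vector": ...}}, updated in place
    embed: callable(text) -> vector (the expensive step)
    Returns the ids that were re-embedded.
    """
    changed = []
    for row_id, text in rows.items():
        digest = content_hash(text)
        entry = index.get(row_id)
        if entry is None or entry["hash"] != digest:
            index[row_id] = {"hash": digest, "vector": embed(text)}
            changed.append(row_id)
    for row_id in set(index) - set(rows):
        del index[row_id]  # row was deleted upstream
    return changed
```

The cost of an update then scales with the number of changed rows rather than the size of the table, which is the kind of gap the measured speedup reflects.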

Large-Scale Asset Generation & Review Pipeline

Generation, review, and migration of 22,068 service image assets through a resumable internal pipeline.

Pipeline

More than bulk generation: a five-stage flow (computer-vision analysis, semantic metadata classification, generation, validation, and storage) that also migrated legacy image assets rather than only producing new ones.

Decision

For a bounded internal migration I excluded a message queue and kept generation sequential: external API limits, duplicate prevention, and recovery cost mattered more than throughput. Checkpoints, generation indexes, and failure CSVs made long batches resumable without hiding partial failure.
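A sketch of a sequential, resumable batch with a checkpoint file and a failure CSV. The file formats, the function name, and the assumption that asset ids contain no whitespace are illustrative; the pipeline's actual checkpoint and index layout is not described in the text.

```python
import csv
from pathlib import Path


def run_batch(asset_ids, generate, checkpoint: Path, failures: Path) -> set:
    """Process assets one at a time, skipping ids already checkpointed.

    checkpoint: append-only text file, one completed id per line
    failures:   CSV of (asset_id, error) rows, so partial failure stays visible
    """
    done = set(checkpoint.read_text().split()) if checkpoint.exists() else set()
    for asset_id in asset_ids:      # sequential on purpose (API limits)
        if asset_id in done:
            continue                # duplicate prevention on resume
        try:
            generate(asset_id)
        except Exception as exc:
            with failures.open("a", newline="") as fh:
                csv.writer(fh).writerow([asset_id, repr(exc)])
            continue                # not checkpointed, so a rerun retries it
        done.add(asset_id)
        with checkpoint.open("a") as fh:
            fh.write(asset_id + "\n")
    return done
```

Rerunning the same call after a crash or a rate-limit failure picks up only the ids that never reached the checkpoint, while the failure CSV keeps a record of what went wrong on earlier runs.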

Why

The review dashboard took over 5 minutes to render 22,000+ entries (a client-side-rendering bottleneck); I cut that to near-instant with lazy loading and browser caching instead of a heavier rebuild.

Scaling

Compared six generation-model paths, including Flux and Imagen variants, then chose Gemini for the production batch. The choice followed the measured quality/speed/operability trade-off; model selection was an experiment, not a default assumption.

Production work, told as the decision behind it.
