
DeepFile
Software Engineer Intern (Jun 2025 – Aug 2025)
At DeepFile, I focused on diagnosing and resolving a high-priority product issue: the platform's file selection logic was returning inconsistent and sometimes irrelevant documents. I led an investigation into the indexing pipeline and semantic search stack, isolating the root causes in how embeddings were generated and how cross-encoder reranking interacted with metadata filters. After mapping the architecture end to end, I reworked key stages of the file selection flow, adjusting how we parsed, embedded, and ranked documents, and worked closely with the CTO to validate the improvements against internal QA benchmarks.

In parallel, I helped build out our SharePoint integration: adding multi-user support, securely storing per-user access tokens with Fernet encryption, and writing the logic that merges SharePoint documents into the platform's existing semantic search infrastructure. These contributions stabilized the product's core behavior, enabled enterprise client use cases, and laid the groundwork for future LLM-driven document intelligence features.
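To make the reranking work concrete, here is a minimal sketch of applying metadata filters before cross-encoder scoring, using the sentence-transformers CrossEncoder API. The model name, the candidate structure, and the allowed_types filter are illustrative assumptions, not DeepFile's actual pipeline.

from sentence_transformers import CrossEncoder

# Hypothetical first-stage candidates from the embedding retriever.
candidates = [
    {"text": "Q3 revenue summary and commentary", "meta": {"type": "report"}},
    {"text": "Team offsite photo album", "meta": {"type": "media"}},
    {"text": "Q3 board deck speaker notes", "meta": {"type": "report"}},
]

def rerank(query: str, docs: list[dict], allowed_types: set[str]) -> list[dict]:
    # Apply metadata filters BEFORE scoring: a cross-encoder runs a full
    # transformer pass per (query, doc) pair, so score only what can survive.
    filtered = [d for d in docs if d["meta"]["type"] in allowed_types]

    # Jointly score each (query, document) pair with the cross-encoder.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, d["text"]) for d in filtered])

    # Return candidates sorted by relevance score, highest first.
    ranked = sorted(zip(scores, filtered), key=lambda pair: pair[0], reverse=True)
    return [d for _, d in ranked]

print(rerank("Q3 financial results", candidates, allowed_types={"report"}))

Ordering matters here: scoring before filtering wastes compute and can surface documents the filter was meant to exclude, which is one way reranking and metadata filters can interact badly.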
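The per-user token storage mentioned above can be sketched with Fernet from the cryptography package. The TokenStore class and its in-memory dict are hypothetical stand-ins for a real database, and in production the key would come from a secrets manager rather than being generated per run.

from cryptography.fernet import Fernet

class TokenStore:
    """Encrypt per-user access tokens at rest (dict stands in for a DB)."""

    def __init__(self, key: bytes):
        self._fernet = Fernet(key)
        self._tokens: dict[str, bytes] = {}

    def save(self, user_id: str, access_token: str) -> None:
        # Fernet provides authenticated symmetric encryption.
        self._tokens[user_id] = self._fernet.encrypt(access_token.encode())

    def load(self, user_id: str) -> str:
        # decrypt() verifies integrity and raises InvalidToken on tampering.
        return self._fernet.decrypt(self._tokens[user_id]).decode()

store = TokenStore(Fernet.generate_key())  # hypothetical key handling
store.save("user-42", "example-access-token")
print(store.load("user-42"))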
React MSAL File Picker
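The component below handles the client side of the file picker: once MSAL sign-in completes, each picked item is enriched with a download URL fetched using the user's access token.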
import React, { useState } from "react";
import { useMsal, AuthenticatedTemplate, UnauthenticatedTemplate } from "@azure/msal-react";
import { openFilePickerPersonal } from "../lib/msal/personal";
import { openFilePickerOrg } from "../lib/msal/org";
import axios from "axios";
import { checkPersonalAccount, checkOrgAccount, getDownloadUrl } from "../lib/msal/helpers";
import { msalPersonalScopes, msalOrgScopes, msalConsumerAuthority, msalOrgAuthority } from "../lib/msal/constants";

// Minimal shape of an item returned by the file picker.
interface PickedItem {
  id: string;
  driveId: string;
  [key: string]: unknown;
}

function MsalContent() {
  const { instance } = useMsal();
  const [files, setFiles] = useState<PickedItem[]>([]);

  // Enrich each picked item with a download URL fetched via the user's token.
  const handlePicked = async (items: PickedItem[], token: string) => {
    const enriched = await Promise.all(
      items.map(async (i) => ({
        ...i,
        downloadUrl: await getDownloadUrl(token, i.driveId, i.id),
      }))
    );
    setFiles(enriched);
  };

  // ...rest of snippet omitted for brevity
}
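A note on the paired personal/org imports: Microsoft consumer accounts and work/school accounts authenticate against different authorities with different scopes, which is presumably why the picker flow is split into openFilePickerPersonal and openFilePickerOrg variants.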
FastAPI Upload Endpoint
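On the backend, the endpoint below receives a picked file's metadata, streams the file down from its URL, and parses .docx content in memory as a first step toward indexing.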
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
import os
import requests
from io import BytesIO
from docx import Document

router = APIRouter(prefix="/sharepoint", tags=["sharepoint"])

# Replace with your actual S3 bucket name
BUCKET_NAME = "deepfile-dev-wn"

class SharePointFileRequest(BaseModel):
    file_name: str
    url: str
    ID: str

@router.post("/uploadFile")
def upload_file(request: SharePointFileRequest):
    print(f"File Name: {request.file_name}")
    print(f"URL: {request.url}")
    print(f"ID: {request.ID}")

    # Step 1: Download the file from the provided URL (streaming)
    file_ext = os.path.splitext(request.file_name)[1]
    print(f"File extension: {file_ext}")
    with requests.get(request.url, stream=True) as response:
        if response.status_code != 200:
            raise HTTPException(
                status_code=response.status_code,
                detail=f"Failed to download file from URL: {request.url}",
            )
        content = BytesIO()
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                content.write(chunk)
    content.seek(0)

    # Load the .docx file from memory; other formats would need their own parsers
    if file_ext.lower() == ".docx":
        doc = Document(content)
        for para in doc.paragraphs:
            print(para.text)

    # Step 2: (Optional) Upload to S3 or integrate with semantic indexing
    # s3.upload_fileobj(content, BUCKET_NAME, request.file_name)

    return {"message": f"Successfully processed {request.file_name}"}
47