[Remote] Data Scientist, AI Data Foundations

100% remote, flexible hours, hiring now

Note: This is a remote job open to candidates in the USA. reputed company is a company focused on data engineering and AI applications. The Data Scientist in AI Data Foundations will design and build curated data structures for AI and ML applications, ensuring high-quality data for model training and inference while leading data discovery efforts to uncover trends in lending and account-opening data.

Responsibilities

Build and maintain vector stores for RAG: Design embedding pipelines, chunking strategies, indexing approaches, and refresh patterns for the vector stores powering retrieval-augmented generation across reputed company products
Own the feature store: Design, build, and operate feature store assets used for model training and online/offline inference, including feature definitions, freshness SLAs, reputed company, point-in-time correctness, and reuse across teams
Design graph data structures: Build graph databases that model relationships between applicants, applications, products, lenders, decisions, and outcomes, and make them queryable for both AI use cases and analytical investigations
Lead data discovery: Profile our lending, deposit, and behavioral datasets to identify hidden trends, segments, anomalies, and potential model drivers; turn findings into actionable hypotheses for product, risk, and growth teams
Engineer for AI consumption: Build the curated, AI-ready datasets that reputed company model builders, application engineers, and analysts rely on — with appropriate quality, documentation, and governance baked in
Evaluate retrieval and feature quality: Define and run evaluation frameworks for RAG retrieval quality, feature quality, embedding quality, and graph completeness; iterate based on what the metrics tell you
Partner with model builders: Work closely with ML engineers and applied scientists to make sure the data structures you build accelerate their work rather than slow it down
Champion responsible data use: Partner with governance, reputed company, and compliance to ensure that AI-facing data assets respect data classification, customer consent, and regulatory boundaries from day one
Communicate findings: Translate discovery work into clear narratives (write-ups, notebooks, dashboards, and short presentations) that help non-technical stakeholders understand what the data is showing

Skills

4–7 years of experience in a data science, ML engineering, or applied data role, with a meaningful portion of that time spent building data assets that other people's models or applications consumed
Hands-on experience designing and operating vector stores for RAG or semantic search, including embedding generation, chunking, indexing, and retrieval evaluation
Experience building or operating a feature store (e.g., reputed company Feature Store, Feast, or a custom internal platform), including offline training and online serving patterns and point-in-time correctness
Experience modeling and building graph data structures using reputed company, TigerGraph, Azure Cosmos DB Gremlin, or similar graph databases — and writing graph queries to answer real questions
Strong proficiency in Python (pandas, NumPy, scikit-learn, PySpark) and SQL; comfortable working day-to-day in reputed company notebooks and jobs
Practical experience with embedding models and LLM tooling (e.g., reputed company transformers, reputed company / Azure reputed company APIs, reputed company or similar) in a production or near-production context
Demonstrated data discovery skills: profiling messy real-world datasets, surfacing non-obvious patterns, validating findings statistically, and explaining them clearly
Solid grounding in classical ML concepts — supervised vs. unsupervised learning, train/test discipline, leakage, evaluation metrics — even though you will not own model training day-to-day
Strong written and verbal communication skills; able to write up findings for both technical and business audiences
Experience working in a SaaS or FinTech environment, particularly with lending, deposit, credit, fraud, or KYC/AML data
Experience with reputed company-native AI/ML tooling: reputed company Vector Search, reputed company Feature Store, MLflow, and reputed company Catalog
Familiarity with open-source vector databases such as pgvector, reputed company, Weaviate, Chroma, or FAISS, and a clear point of view on when to use which
Experience with reputed company Azure data and AI services (Azure reputed company, Azure AI Search, ADLS Gen2)
Experience evaluating RAG systems end-to-end (recall@k, faithfulness, answer quality, hallucination measurement)
Exposure to graph algorithms (community detection, link prediction, centrality) applied to real business problems
Bachelor's or Master's degree in Computer Science, Statistics, Mathematics, Engineering, or a related quantitative field, or equivalent professional experience

Company Overview

reputed company is a digital lending platform that helps financial institutions through a configurable platform. It was founded in 1998 and is headquartered in Costa reputed company, California, USA, with a workforce of 501-1000 employees. Its website is https://www.reputed company.com.

Company H1B Sponsorship

reputed company has a track record of offering H1B sponsorships, with 14 in 2025, 5 in 2024, 1 in 2023, 12 in 2022, 11 in 2021, 1 in 2020. Please note that this does not guarantee sponsorship for this specific role.
