AI research & product

Building AI that works with people, not just for them.

We're an artificial intelligence research and product company. Our work focuses on human-AI collaboration, model adaptability, and making frontier systems more widely understood and accessible.

Latest · Research
Interaction models: a scalable approach to human-AI collaboration

We are scientists, engineers, and builders who've created some of the most widely used AI products and open-source tools. While AI capabilities have advanced dramatically, key gaps remain — in scientific understanding, accessibility, and the ability for people to customize these systems to their own needs and values.

To bridge those gaps, we're building systems that are more widely understood, more adaptable, and genuinely capable across a broader range of human expertise.

Science is better when shared.
Scientific progress is a collective effort. We publish technical posts, papers, and code regularly — sharing improves both public understanding and our own research culture.
AI that works for everyone.
Instead of focusing solely on autonomous systems, we build multimodal systems that collaborate with people — adaptable to the full spectrum of human expertise and need.
Solid foundations matter.
Model intelligence and infrastructure quality are the cornerstones of everything we do. We build for the long haul — reliability, efficiency, and security over shortcuts.
Learning by doing.
Products and research co-evolve. Real-world deployment keeps our work grounded and guides us toward the most impactful problems — including in AI safety.
Interaction models: a scalable approach to human-AI collaboration · May 2026
Rethinking fine-tuning: adaptability without catastrophic forgetting · Apr 2026
Multimodal grounding in open-ended tasks · Mar 2026
Evaluation gaps: what benchmarks miss about real-world capability · Feb 2026

Work on hard problems with us.

We're building AI systems that push technical boundaries while delivering real value to as many people as possible. Our team combines rigorous engineering with creative exploration.

Research

Ideas we're working on, openly shared.

We publish technical posts, papers, and findings as we go — honest progress on hard problems in AI.

Interaction models: a scalable approach to human-AI collaboration
May 2026
We introduce interaction models — a framework for structuring ongoing human-AI collaboration that scales across tasks, domains, and user expertise levels without sacrificing adaptability.
Language models · Multimodal
Rethinking fine-tuning: adaptability without catastrophic forgetting
Apr 2026
A study of parameter-efficient fine-tuning strategies that preserve general capability while enabling deep specialization. We share recipes, ablations, and failure modes we encountered.
Language models · Infrastructure
Multimodal grounding in open-ended tasks
Mar 2026
How do vision-language models connect visual inputs to abstract reasoning? We examine grounding quality across tasks where the answer isn't in the image — and propose evaluation criteria.
Multimodal
Evaluation gaps: what benchmarks miss about real-world capability
Feb 2026
Standard NLP benchmarks increasingly fail to predict useful AI behavior. We diagnose the disconnect and propose a complementary evaluation methodology grounded in deployment outcomes.
Safety · Language models
Reward hacking at scale: patterns and mitigations
Jan 2026
An empirical study of reward hacking behaviors across RLHF-trained models at different scales. We document consistent failure patterns and test targeted mitigations in training.
Safety · RL
Projects

Things we've built and released.

From interactive demos to open-source tools — practical outputs of our research, built to be used, forked, and built upon.

CollabChat
An interactive demo of real-time human-AI co-writing. Explore how interaction models change the feel of collaborative text generation.
AdaptKit
Open-source library for parameter-efficient fine-tuning. Implements LoRA, IA³, and our custom hybrid adapters with one-line integration.
GroundBench
A benchmark suite for multimodal grounding evaluation. 4,200 tasks across 12 visual reasoning categories with human baselines included.
EvalScope
A lightweight tool for running, comparing, and visualizing LLM evaluations across custom task sets. Designed for research teams with limited infra.
RewardLens
Visualize and audit reward model behavior during RLHF training. Surfaces reward hacking patterns in real time across training runs.
OpenDomain-QA
A curated open-domain question-answering dataset spanning 18 professional fields, built with domain experts and released under CC-BY.
News

Updates from Thinking Innovative Lab.

Press mentions, announcements, and milestones. For research updates, see our blog.

May 21
Announcement
Thinking Innovative Lab launches out of stealth with seed funding
We're excited to share that we've raised a seed round to accelerate our research into human-AI collaboration and adaptable AI systems.
May 14
Press · TechCrunch
The labs betting that human-AI collaboration beats full autonomy
TechCrunch covers a wave of research labs, including Thinking Innovative Lab, focused on building AI that works alongside people rather than replacing them.
Apr 30
Announcement
AdaptKit v1.0 released — open-source fine-tuning for everyone
Our parameter-efficient fine-tuning library is now publicly available. It earned over 200 stars on GitHub in its first 48 hours.
Apr 12
Event · NeurIPS Workshop
Talk: rethinking evaluation for real-world AI systems
Our research lead presented findings from our evaluation gaps study at the NeurIPS Evaluation Workshop. Slides and a recording are now available.
Mar 05
Press · Wired
Why some AI researchers think benchmarks are broken
Wired features our work on evaluation gaps, alongside other researchers questioning whether standard NLP benchmarks predict real-world performance.
Careers

Work on problems that matter.

We're looking for scientists, engineers, and builders who want to shape how AI develops — working openly, rigorously, and collaboratively.

Small teams, big ownership.
Everyone owns meaningful pieces of the work. There are no large layers of management between an idea and shipping it.
Research and product, together.
We don't separate the people who think from the people who build. Good ideas come from both directions.
We publish what we learn.
Writing, sharing code, and giving talks are part of the job — not side projects. We think openly about what we're doing.
High bar, low ego.
We care deeply about quality and give direct feedback. We don't confuse confidence with certainty or seniority with correctness.
Research scientist — language models
Research · Full-time · Remote
ML engineer — training infrastructure
Engineering · Full-time · Remote
Product engineer — human-AI interfaces
Engineering · Full-time · Remote
Research engineer — multimodal systems
Research · Full-time · Remote
Product designer — AI experiences
Design · Full-time · Remote
Operations & people lead
Operations · Full-time · Remote
Fully remote
Work from anywhere. Async-friendly with no mandatory office.
Competitive equity
Early-stage equity with meaningful upside and transparent vesting.
Learning budget
Annual budget for courses, conferences, and research access.
Health coverage
Comprehensive medical, dental, and vision for you and dependants.
Flexible time off
Unlimited PTO with a minimum encouraged — we mean it.
Equipment stipend
Full home office setup covered on day one.

Don't see the right role?

We're always interested in hearing from exceptional people. Send us a note about what you're working on and why you'd like to join.

thinkinginnovativelab@gmail.com