Stockholm MLOps Summer Bash 2026 #37, June 11, 2026.
Insights from the Stockholm MLOps community
“The future AI winners won’t be the companies with the biggest models — but the ones controlling where the models run.”
— Yasaar Biladama, Eggsplain
That line captured the deeper story of the Stockholm MLOps Summer Bash.
On the surface, this was a startup showcase. Ten companies. Ten lightning talks. A rooftop gathering. A room full of founders, engineers, operators, AI leaders and practitioners.
But underneath the format, a much more important signal emerged.
This was not an event about AI models.
It was an event about what happens after the model.
Across the evening, speakers were not debating whether AI can write code, generate text, analyze images, automate workflows or support decision-making. Those questions increasingly felt settled.
Instead, the real questions were operational.
Where does AI run? Who controls it? How is it verified? How does it integrate into existing workflows? How does it adapt when the world changes? How do organizations trust the output once AI becomes part of critical systems?
The strongest signal from the Summer Bash was clear:
AI is moving from capability into operations.
Summary — Key Insights From The Event
The event showed that control is becoming a product requirement, not just an infrastructure preference.
Verification is becoming part of the AI stack.
Organizations increasingly struggle less with data volume and more with meaning, context and interpretation.
Domain expertise is becoming the moat above the model.
Runtime adaptation is becoming more important as AI moves into physical, dynamic and operational environments.
Human systems are becoming observable.
And across almost every talk, the same pattern appeared: the hardest AI problems increasingly live outside the model.
They live in workflows, definitions, trust, governance, deployment, behavior and operational control.
What This Event Was Really About
The Summer Bash was not really about ten startups.
It was about a shift in where AI value is moving.
The first phase of the AI wave was dominated by capability. Organizations wanted to know what models could do. Could they write code? Could they automate analysis? Could they generate reports? Could they understand images? Could they summarize information?
The companies at Summer Bash were operating in the next phase.
They were asking what happens when AI enters hospitals, software teams, sports organizations, cybersecurity workflows, recruiting processes, financial systems, factories and on-prem environments.
That is where the discussion changes.
A demo can be impressive without being reliable. A model can generate an answer without understanding the business definition behind it. An agent can write code without knowing how a team wants code reviewed. A system can collect data without helping anyone understand what that data means.
The evening repeatedly showed that production AI is not just about generating outputs.
It is about making those outputs usable, governed, trusted and meaningful.
Control Is Becoming A Product Requirement
“Sovereignty is the freedom to choose.”
— Yasaar Biladama, Eggsplain
One of the strongest recurring themes throughout the Summer Bash was control.
Not control as an abstract compliance word.
Control as a product requirement.
Eggsplain made this explicit by framing sovereignty as freedom of deployment. Their argument was not simply about geography or regulation. It was about giving organizations the ability to decide where AI should run depending on the sensitivity of the workload.
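The idea of choosing a deployment location per workload can be sketched in a few lines. This is an illustration only, not Eggsplain's implementation: the sensitivity tiers, the target names and the `choose_target` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    REGULATED = 3


# Hypothetical mapping from sensitivity tier to deployment target.
TARGET_BY_SENSITIVITY = {
    Sensitivity.PUBLIC: "public-cloud",
    Sensitivity.INTERNAL: "sovereign-cloud",
    Sensitivity.REGULATED: "on-prem",
}


@dataclass(frozen=True)
class Workload:
    name: str
    sensitivity: Sensitivity


def choose_target(workload: Workload) -> str:
    """Pick where a workload runs based only on its sensitivity tier."""
    return TARGET_BY_SENSITIVITY[workload.sensitivity]


if __name__ == "__main__":
    for w in (Workload("marketing-copy", Sensitivity.PUBLIC),
              Workload("patient-reports", Sensitivity.REGULATED)):
        print(w.name, "->", choose_target(w))
```

The point of the sketch is that the routing decision lives in configuration the organization owns, so changing where a workload runs does not require changing the workload itself.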
Nordan AI arrived at a similar point from a more practical deployment angle. Their work with smaller local models, on-prem systems and secure environments showed that many organizations cannot simply assume cloud deployment is available.
“We thought we would use the cloud, and then we learned we couldn’t.”
— Nordan AI
That is the reality behind many sovereign AI discussions.
It is not always ideology.
Sometimes the environment simply does not allow the easy option.
Sentaro connected control to cybersecurity. In an environment where attackers use AI to scale phishing, impersonation and social engineering, defensive systems need to understand communication context and respond at machine speed.
Ivy Interactive connected control to software development. If coding agents are creating plans, commits and pull requests, organizations need review gates, verification workflows and human approval points.
The broader insight is that AI is no longer just a capability layer.
Once AI enters production, organizations need to control where it runs, what it accesses, how it behaves, how it is reviewed and what happens when it fails.
There Is No Validator For Meaning
“It wasn’t a hallucination. Your metric definitions have no validator.”
— Carl Larsson, Yari Metrics
One of the most important observations of the evening came from revenue operations.
Carl Larsson described a situation where AI produced misleading business conclusions. The natural assumption was that the model hallucinated.
It had not.
The model was faithfully interpreting the information it had been given.
The real problem was that nobody in the organization had agreed on what key business metrics actually meant. Revenue meant one thing in one system, something slightly different in another, and something else again inside spreadsheets and reporting workflows.
AI did not create the inconsistency.
AI exposed it.
This is a major enterprise AI lesson.
Organizations have invested heavily in validating schemas, pipelines, permissions and data quality. Much less attention has been paid to validating meaning.
Humans have historically acted as the translation layer between conflicting definitions. As AI systems become more involved in reporting, forecasting and decision support, that ambiguity becomes operational risk.
Many organizations believe they have data quality problems.
Increasingly, they may discover they have semantic consistency problems.
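A validator for meaning can start very small. The sketch below is not the Yari Metrics product; it is a minimal illustration, with made-up systems and definitions, of how conflicting definitions of the same metric could be surfaced before an AI system is asked to reason over them.

```python
from collections import defaultdict

# Hypothetical definitions of the same metrics as recorded in different systems.
DEFINITIONS = [
    {"system": "crm", "metric": "revenue", "definition": "Booked contract value"},
    {"system": "billing", "metric": "revenue", "definition": "Invoiced amount excluding tax"},
    {"system": "spreadsheet", "metric": "revenue", "definition": "booked  contract value"},
    {"system": "crm", "metric": "margin", "definition": "Revenue minus cost of goods sold"},
]


def find_conflicts(definitions):
    """Return metrics whose definition text differs between systems."""
    by_metric = defaultdict(dict)
    for row in definitions:
        # Normalize case and whitespace so trivial differences are ignored.
        normalized = " ".join(row["definition"].lower().split())
        by_metric[row["metric"]][row["system"]] = normalized
    return {
        metric: systems
        for metric, systems in by_metric.items()
        if len(set(systems.values())) > 1
    }


if __name__ == "__main__":
    for metric, systems in find_conflicts(DEFINITIONS).items():
        print(f"{metric} is defined inconsistently: {systems}")
```

Comparing definition text is a crude stand-in for comparing meaning, but even this level of checking forces someone to own each definition.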
Verification Is Becoming Infrastructure
“The only way to know if you and the agent agree is to review the plan.”
— Niels Bosma, Ivy Interactive
Across the event, the most operational companies were not focused on generating outputs.
They were focused on making outputs trustworthy.
Ivy Interactive showed this clearly in software development. Their workflow does not stop when an agent writes code. It moves through planning, verification gates, builds, tests, reviews, commits and pull requests.
The core question is not whether AI can write code.
The question is whether teams can trust what it wrote.
“The question isn’t whether AI can write code. The question is whether AI can learn to write your code the way your team wants it.”
— Niels Bosma, Ivy Interactive
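A gated workflow of this kind can be sketched as an ordered list of checks that stops at the first failure. This is an assumed, simplified model rather than Ivy Interactive's actual pipeline; the `Change` fields, gate names and gate order are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    plan_approved: bool
    build_ok: bool
    tests_ok: bool
    human_reviewed: bool


# Each gate is a (name, check) pair; names and order are illustrative.
GATES: List[Tuple[str, Callable[[Change], bool]]] = [
    ("plan", lambda c: c.plan_approved),
    ("build", lambda c: c.build_ok),
    ("tests", lambda c: c.tests_ok),
    ("review", lambda c: c.human_reviewed),
]


def run_gates(change: Change) -> Tuple[bool, str]:
    """Stop at the first failing gate; return (ready_for_pr, stopped_at)."""
    for name, check in GATES:
        if not check(change):
            return False, name
    return True, "pull_request"


if __name__ == "__main__":
    print(run_gates(Change(True, True, False, True)))
    print(run_gates(Change(True, True, True, True)))
```

Putting the plan check first mirrors the point in the quote above: disagreement between the team and the agent is cheapest to catch before any code is written.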
QVoxl showed the healthcare version of the same problem. AI-generated clinical reports cannot simply sound impressive. They need to preserve diagnostic quality, fit clinical workflows, meet security expectations and be ready for clinician review.
“Hospitals require systems that are robust, secure and integrated into existing workflows from day one.”
— Sophie Schauman-Haigh, QVoxl
Sentaro brought the same pattern into cybersecurity. Their work focuses on evaluating the full communication context rather than relying on a single signal. Message intent, sender behavior, urgency, authentication signals and communication patterns all become part of the verification layer.
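Evaluating several signals together rather than one in isolation can be illustrated with a simple weighted score. The weights, signal names and threshold below are assumptions for the sketch and say nothing about how Sentaro's system actually works.

```python
# Hypothetical weights; a real system would learn or calibrate these.
WEIGHTS = {
    "suspicious_intent": 0.30,
    "unusual_sender_behavior": 0.25,
    "urgency": 0.15,
    "failed_authentication": 0.20,
    "pattern_deviation": 0.10,
}


def risk_score(signals: dict) -> float:
    """Weighted sum of signal strengths, each clamped to the range [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        strength = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * strength
    return total


def verdict(signals: dict, threshold: float = 0.5) -> str:
    """Quarantine a message when the combined risk reaches the threshold."""
    return "quarantine" if risk_score(signals) >= threshold else "deliver"


if __name__ == "__main__":
    print(verdict({"urgency": 1.0}))
    print(verdict({"suspicious_intent": 1.0,
                   "failed_authentication": 1.0,
                   "urgency": 1.0}))
```

In this sketch an urgent message alone is delivered, while urgency combined with suspicious intent and failed authentication is quarantined, which is the practical benefit of looking at context instead of a single signal.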
The broader signal is clear.
Generation is becoming easier.
Verification is becoming strategic.
The future AI stack will not only generate outputs. It will validate, review, audit, ground and govern them before they create real-world consequences.
Understanding Is Becoming More Valuable Than Measurement
“The last decade was about measuring. The next is about understanding.”
— Hampus Jildenbäck, Kin
Several startups challenged one of the oldest assumptions in technology: that more data naturally creates better decisions.
Kin made this argument in elite sports. Athletes are measured constantly through wearables, training systems and biometric signals. Yet teams still struggle to understand why performance changes.
“We measure everything now, but nobody really knows what to do with it.”
— Hampus Jildenbäck, Kin
Yari Metrics made the same point in finance and revenue operations. The issue was not lack of data. The issue was lack of shared meaning.
TeamTalks applied the same idea to human collaboration. Organizations measure software systems, customer behavior and financial outcomes, but the team itself often remains invisible.
QVoxl showed the healthcare version. Medical diagnostics already produces large amounts of measurement data. The bottleneck is turning that data into structured clinical insight and usable reports.
The shared theme is that the AI opportunity is moving beyond measurement.
The next layer of value comes from interpretation, meaning and context.
Domain Expertise Is Becoming The Moat Above The Model
“Generic solutions only get you to the surface.”
— Tom Robbins, Tavi
A clear pattern emerged across the startups.
The strongest companies were not competing on model access.
They were competing on domain understanding.
Tavi showed this in recruiting. Their agent does not simply search for candidates. It maps markets, understands company context, enriches profiles and supports outreach inside the workflow.
Tom Robbins made the point directly:
“Don’t assume you understand someone’s job until you try doing it yourself.”
— Tom Robbins, Tavi
QVoxl showed the same pattern in healthcare. Diagnostic reporting is not just text generation. It requires knowledge of clinical measurements, workflow constraints, compliance, security and diagnostic quality.
Kin showed it in elite performance.
“An athlete, by definition, is an outlier.”
— Levon Movsisyan, Kin
That single line explains why generic wellness models are not enough for elite sport. The user is not average. The system has to understand outliers.
Yari Metrics showed the same thing in finance and revenue operations. The difficult problem is not connecting to data. It is understanding how revenue, margin, pricing and product definitions actually work across systems.
The broader implication is important.
As models become more available, the moat moves upward into workflow knowledge, domain data, integration depth and operational expertise.
Runtime Is Becoming More Important Than Training
“Adaptation, not scale, is the unlock for edge AI.”
— Alexey Sidorov, Lazy Dynamics
Many AI discussions still focus on training.
Summer Bash repeatedly focused on runtime.
Lazy Dynamics made this most explicit. Their technology focuses on adaptation during inference. Physical AI systems need to adjust as environments change. Sensors drift. Conditions evolve. Hardware degrades. Real-world environments rarely stay static.
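To make the idea of adapting at inference time concrete, here is a deliberately simple sketch: an exponential moving average that tracks a drifting sensor baseline. It is a generic technique chosen for illustration, not a description of Lazy Dynamics' technology, and the `DriftTracker` class and its parameters are hypothetical.

```python
class DriftTracker:
    """Tracks a slowly drifting sensor baseline with an exponential moving average."""

    def __init__(self, initial_baseline: float, alpha: float = 0.1):
        self.baseline = initial_baseline
        self.alpha = alpha  # higher alpha adapts faster but is noisier

    def correct(self, reading: float) -> float:
        """Return the reading relative to the current baseline, then adapt."""
        corrected = reading - self.baseline
        # Move the baseline a fraction of the way toward the new reading.
        self.baseline += self.alpha * (reading - self.baseline)
        return corrected


if __name__ == "__main__":
    tracker = DriftTracker(initial_baseline=0.0, alpha=0.5)
    for reading in (10.0, 10.0, 10.0):
        print(tracker.correct(reading))
```

A sustained offset of 10 is gradually absorbed into the baseline, so the corrected value shrinks with each reading. The simplification is that every sustained offset is treated as drift; a real system would need a way to tell drift apart from a genuine change in the thing being measured.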
A model trained in the lab may fail when deployed into reality.
That observation extends beyond robotics.
Sentaro faces the same runtime challenge in cybersecurity. Attackers adapt. Communication channels change. Threats evolve. Static defenses are not enough.
Kin also depends on runtime learning. The system becomes more valuable as athletes check in over time. Day seven creates early patterns. Day ninety creates a much richer contextual model.
Ivy Interactive adds runtime verification to software development. The value is not only in generating code once. It is in continuously verifying, iterating and reviewing the work.
The broader signal is that production AI systems cannot be static.
They need feedback loops, memory, monitoring, adaptation and operational control after deployment.
Human Systems Are Becoming Observable
“The most important system in any organization remains largely invisible.”
— Sebastian Wettemark, TeamTalks
One of the more surprising themes of the event was how often the human layer appeared as the real system being modeled.
TeamTalks focused directly on this. Their premise is that teams are among the most important systems in any organization, yet they remain difficult to observe. By analyzing conversations and behavioral patterns, they try to reveal psychological safety, participation imbalance, hesitation, conflict and blind spots.
Kin approached the same idea through sports. Performance is not only physiological. It is contextual, emotional and behavioral. Voice check-ins become a way to capture what biometric data misses.
“Voice is going to be a complete game changer in unlocking your everyday context.”
— Hampus Jildenbäck, Kin
Nordan AI described adoption as a human challenge. Getting people to use AI requires leadership support, incentives, psychological safety and grassroots energy.
Yari Metrics showed how organizational ambiguity becomes technical risk. If no one owns the definition of revenue or margin, AI cannot solve that.
The organization itself becomes the bottleneck.
The larger pattern is that AI in production is not only about machines operating better.
It is also about making human systems more visible, understandable and governable.
Smaller, Specialized Systems Are Becoming More Interesting
“You can do a hell of a lot with smaller models.”
— Tim Isbister, Nordan AI
The Summer Bash repeatedly challenged the assumption that bigger is always better.
Nordan AI argued that organizations can do a lot with smaller models when they understand the constraints and design the system properly. Smaller models require more explicit prompting, better routing, stronger structure and careful evaluation, but they can operate effectively in environments where frontier models are unavailable.
“It is better to have many small agents that are very specific.”
— Tim Isbister, Nordan AI
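The "many small, specific agents behind a router" pattern can be sketched as a dispatch table. In the example below, plain functions stand in for small models, and every name (`count_words`, `find_years`, `route`) is an assumption made for illustration rather than anything Nordan AI presented.

```python
import re
from typing import Callable, Dict


def count_words(text: str) -> str:
    """Stand-in for a small agent that only counts words."""
    return f"{len(text.split())} words"


def find_years(text: str) -> str:
    """Stand-in for a small agent that only extracts four-digit years."""
    return ", ".join(re.findall(r"\b(?:19|20)\d{2}\b", text)) or "no years found"


# The router is an explicit table: one narrow agent per task name.
AGENTS: Dict[str, Callable[[str], str]] = {
    "count": count_words,
    "years": find_years,
}


def route(task: str, text: str) -> str:
    """Send the text to the agent registered for the task, or fail loudly."""
    agent = AGENTS.get(task)
    if agent is None:
        raise ValueError(f"no agent registered for task {task!r}")
    return agent(text)


if __name__ == "__main__":
    print(route("years", "Held in 2026, founded 2019"))
    print(route("count", "a b c"))
```

Failing loudly on an unknown task is a design choice: with narrow agents, an unroutable request is better surfaced than guessed at.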
Lazy Dynamics made a related point from the edge AI perspective. A smaller system that adapts in real time may be more useful than a larger static system.
Eggsplain emphasized specialized local deployments where companies do not necessarily need one giant general-purpose model. They may need a specific model for quality control, industrial monitoring or local workflows.
Tavi made the same argument at the product level. A generic agent can do many things superficially, but a specialized agent that understands a profession can create more value.
The theme is not that large models stop mattering.
It is that production AI will increasingly include many smaller, specialized, local, adaptive and domain-specific systems.
AI Is Moving From Creation To Orchestration
“Agents are sexy. Deterministic, guardrailed automation beats them for most production workflows.”
— Carl Larsson, Yari Metrics
Many talks pointed toward the same architectural transition.
The hard part is no longer simply creating an output.
The hard part is orchestrating the system around the output.
Ivy Interactive demonstrated this in software development. The important workflow includes planning, parallel agent execution, verification gates, review actions and pull requests.
Eggsplain described orchestration across deployment environments: on-premises systems, sovereign cloud, hybrid deployments, European data centers and dedicated hardware.
Tavi described orchestration across recruiting workflows: market mapping, candidate enrichment, company context, shortlists and outreach.
Nordan AI described orchestration across smaller local agents, routers, context summarization and structured tool use.
The broader shift is from single AI interactions to managed AI workflows.
That is where much of the real operational complexity now lives.
Patterns Across Talks
Although the companies came from very different markets, the same patterns kept appearing.
Eggsplain, Nordan AI and Sentaro all pointed toward control as a core requirement. Yari Metrics, QVoxl, Ivy Interactive and Sentaro all showed that verification is becoming part of production AI infrastructure. Kin, TeamTalks and Tavi showed that context and human behavior are becoming increasingly important sources of intelligence. Lazy Dynamics and Nordan AI challenged the assumption that bigger models are always better.
The most important pattern was that almost no company was competing primarily on model access.
Their advantages came from workflow knowledge, deployment control, verification systems, domain expertise, runtime adaptation and operational understanding.
The moat is moving above the model.
Signals From The Room
No formal audience polling or Mentimeter data is available for this recap, but the room still produced a clear signal.
The audience was not being sold on whether AI matters.
That debate was over.
The conversation moved quickly toward harder operational questions. How should systems be deployed? How should outputs be verified? What happens when AI enters regulated workflows? How do organizations maintain control? What should be automated, and what still requires human review?
The strongest audience signal was that grounded examples mattered most.
The Yari Metrics story about broken metric definitions, the Nordan AI lessons from local model deployment, the Ivy Interactive workflow for verification, and the QVoxl example of doctors spending half their time writing reports all worked because they described concrete operational pain.
That reflects a maturing ecosystem.
The Stockholm AI community is moving beyond generic AI excitement and into specialized production problems.
Most Memorable Quotes From The Evening
“It wasn’t a hallucination. Your metric definitions have no validator.”
— Carl Larsson, Yari Metrics
“The future AI winners won’t be the companies with the biggest models — but the ones controlling where the models run.”
— Yasaar Biladama, Eggsplain
“Verification is becoming infrastructure.”
— Stockholm MLOps Insight
“Adaptation, not scale, is the unlock for edge AI.”
— Alexey Sidorov, Lazy Dynamics
“The last decade was about measuring. The next is about understanding.”
— Hampus Jildenbäck, Kin
“The question isn’t whether AI can write code. The question is whether AI can learn to write your code the way your team wants it.”
— Niels Bosma, Ivy Interactive
“You can do a hell of a lot with smaller models.”
— Tim Isbister, Nordan AI
“Generic solutions only get you to the surface.”
— Tom Robbins, Tavi
What This Event Signals
The Stockholm MLOps Summer Bash 2026 signals that AI in production is moving into a new phase.
The first phase was dominated by capability. Organizations wanted to know what models could do.
The next phase is different.
The question is no longer only whether AI can perform a task. The question is whether AI can perform that task reliably, securely, economically and under real-world constraints.
This shift changes the center of gravity of AI infrastructure.
The important layers increasingly include deployment control, verification systems, domain-specific workflows, human approval loops, behavioral context, semantic definitions, runtime adaptation and operational governance.
It also changes the startup landscape.
The most interesting companies may not be those claiming access to the best model.
They may be the companies building the systems that make AI useful after deployment.
The strongest conclusion from the Summer Bash is that AI is growing up.
The demo era is not over, but it is no longer enough.
Production AI belongs to teams that understand systems, constraints, humans, workflows and trust.
The future of AI in production will not be defined only by what intelligence can generate.
It will be defined by whether organizations can operate that intelligence safely, meaningfully and reliably in the environments where it actually matters.
Speakers & Companies Featured
Yasaar Biladama — Eggsplain
Alexey Sidorov — Lazy Dynamics
Carl Larsson — Yari Metrics
Hampus Jildenbäck — Kin
Levon Movsisyan — Kin
Johan Malmqvist — QVoxl
Sophie Schauman-Haigh — QVoxl
Fredrik Geijer Hæggström — Sentaro
Tom Robbins — Tavi
Sebastian Wettemark — TeamTalks
Niels Bosma — Ivy Interactive
Tim Isbister — Nordan AI
Carl Carlheim-Gyllensköld — Nordan AI
Event Details
Location: IBM, Stockholm
Date: June 11, 2026
Event: Stockholm MLOps Summer Bash 2026
Format: Startup lightning talks and community gathering
Topics: Production AI, Sovereign AI, AI Verification, AI Orchestration, Human Systems, Edge AI, Healthcare AI, Cybersecurity AI, AI Coding Agents, Domain AI