AI Engineer Melbourne 2026 Keynote Livestream — Day 1 — Full Transcript (June 3, 2026)

[00:00:00] ...that I should. I'm really good at sort of summarizing themes, [00:00:29] and so I've prepared six themes that I wanted to set your palate with, I guess, for the rest of this conference. The first thing I wanted to talk about is how AI is increasingly not just models. Here are two side-by-side shots of [unclear]: the model is increasingly the harness, right? [unclear] is pivoting to [unclear] as well. And AI is also about services, data, brand, and products. So I think all these trends [00:00:59] are really good for AI engineers, because it's everything outside the model. But the second trend shows that the models are [unclear], and I think the most real-world benchmark is probably the amount of code that's written by AI. [00:01:11] This is a screenshot of Claude Code commits in GitHub over time. As of February it was about four to five percent of all code; right now it's probably sitting at around 10 percent, and towards the end of the year it's going to be around 40 to 50 percent. [Not] everyone is sufficiently appreciating that this is currently changing the entire world. [00:01:30] Claude Code and coding agents are [unclear]. Agents are also self-hosting and enabling [unclear] forks, which are really interesting, and of course there's the [unclear] moment, which I don't need to tell you about; you've heard enough about it already. [00:01:42] [unclear], which starts to mark the emergence of agents in non-verifiable domains. And lastly, I think AI's potential is extending beyond normal [00:01:54] knowledge work towards frontier science, technology, and math. [unclear] the software development lifecycle, first starting with the IDEs last year [unclear]. This is also called the Jevons paradox, and really this is my argument for why AI engineering will be the last job, why you all have job security, [00:02:12] and why you'll take over the world. So I hope over the next two days you have really fruitful discussions, and I'm looking
forward to catching up online. See ya! [00:02:19] All right, and now... Shawn really owes me one, so we'll be getting there before too long, don't you worry. All right, so to kick off the talks proper, we have George Cameron. George is an Aussie who is now in San Francisco (there seems to be a bit of a pattern there) [00:02:42] and who started something to solve their own problems a couple of years ago. There's a great interview by Shawn Wang on Latent Space with the founders of Artificial Analysis, the firm he co-founded, so if you want to hear the backstory and everything they do, I highly recommend it. [00:02:58] What they do, essentially, is burn lots of tokens to objectively measure, across all kinds of benchmarks (their own and open ones), how the models rank and how they compare. That's very valuable for the conversations we're going to have this morning, where we'll look at tokenomics and how the cost of inference is starting to feed into our product strategies. But to [00:03:27] set the scene with all the insights he has into the modern model landscape, would you please welcome George Cameron from Artificial Analysis. [00:03:54] All righty, there we go. Great, thanks for having me here. I'm George, one of the co-founders of Artificial Analysis. Very quickly about us: we're an independent AI benchmarking company. With our website, artificialanalysis.ai, we help millions of users understand what's happening in AI and choose between all the different technologies with our independent benchmarks. [00:04:09] We're also quite widely referred to in the industry: at the start of this week, Jensen referred to us for the Nemotron 3 Ultra release; last week, [00:04:12] Anthropic with Opus 4.8 and our GDPval-AA benchmark; the week prior to that, [00:04:19] Sundar with the GPT, sorry,
Gemini 3.5 Flash release. So I'm very happy to be here to share a bit about what we do and give an overview of how we see the state of AI in June 2026. [00:04:33] We benchmark across the AI stack: [00:04:37] agents, models, [00:04:47] inference providers, [00:04:57] and hardware. [00:04:58] And we benchmark not only text-focused language models but also image, video, speech, and music models. [00:05:08] To start off with, I've provided this chart, which shows the leading releases from the leading AI labs over the last few years. [00:05:23] It's a bit hectic, but I think that's probably no surprise to us in this room, because it's been a hectic few years. [00:05:31] There have been a lot of releases and a lot of progress, and [00:05:35] why I like to start with this chart is that it shows AI progress has not slowed down. [00:05:41] I think claims of a slowdown have been greatly exaggerated. I heard a lot of that [00:05:46] in late '25, but then Opus 4.6 came out, [00:05:51] agents started working over much longer horizons, and people quieted down a bit. [00:05:56] I think this chart shows that really well: [00:05:58] there are more dots on this chart than ever [00:06:02] if you look at the last three months, [00:06:04] which shows there have been more leading releases, [00:06:07] and it's going up and to the right pretty quickly, which shows progress in intelligence. [00:06:17] I'll first introduce our Artificial Analysis Intelligence Index metric. [00:06:21] This is a synthesis metric of ten benchmarks that we run to test language model intelligence, [00:06:27] which provides a high-level overview and relative comparison of the intelligence of these models. [00:06:36] We have the current set of leading models on this chart here, and I
think it's worth noting that [00:06:42] Claude Opus 4.8 last week took the mantle from GPT-5.5 as the leading language model in terms of intelligence. [00:06:52] If we could stop there, it would make our jobs a lot easier, [00:06:58] but the story is that there's still reason to use other models, considering the cost, the speed, and the other trade-offs at play among language models today. [00:07:10] First, let's look at the current state of models from a few different perspectives. [00:07:17] You can see here that it's very much a US and China story when you look at where the labs making these leading language models are based. [00:07:25] Also present on this chart are France, with Mistral; South Korea, with a number of labs and a very successful sovereign AI initiative; [00:07:36] and the United Arab Emirates. Notably missing, for us in this room, is Australia. [00:07:44] Australia doesn't have language models that are competitive with the frontier in terms of intelligence, [00:07:50] and from my perspective I don't think that's going to happen soon either. [00:07:57] Another perspective is open weights, or colloquially open source, models compared with proprietary models. [00:08:06] On the y-axis we have our Intelligence Index, [00:08:12] and on the x-axis the release date, and these lines plot the leading proprietary model [00:08:20] against the leading open weights model. Open weights has always trailed proprietary intelligence. This chart goes back to [00:08:27] mid-'23 with the Llama 2 70B release, and you can see that, yes, open weights has [00:08:35] trailed proprietary intelligence, but it has kept up, roughly three to nine months [00:08:43] behind. This was not a given; it was an open
question [00:08:50] only a few years ago, with people asking about the commercial models that could support open weights. [00:08:57] But what we've seen is that it has kept up, for a number of reasons, and our perspective is that this is going to continue, [00:09:04] because there's sufficient incentive, particularly for labs wanting to serve those [00:09:10] looking for open weights models for their flexibility, the ability to fine-tune, and other reasons. [00:09:15] Secondly, another perspective on this: if open weights is three to nine months [00:09:21] behind, then roughly Opus 4.5 or GPT-5.2 is the intelligence you can essentially get with [00:09:33] open weights models, with Kimi K2.6 or the latest DeepSeek V4 Pro. You could do a lot with Opus 4.5 or GPT-5.2, so it remains an option, and it looks like that's going to continue, based on the last few years and our house perspective. [00:09:55] We benchmark using quantitative benchmarks, but we try to ensure that those benchmarks are aligned with real-world use, [00:10:03] so you don't have models hill-climbing on a number that doesn't relate to an increase in the usability of these models [00:10:13] and the use cases we're exploring with AI now. So we like to sense-check the progress, and what we show here with our GDPval-AA [00:10:23] benchmark, which uses an OpenAI dataset that we turned into an agentic benchmark, is a comparison between [00:10:31] a leading model a year ago, Claude 4 (Claude 4 Sonnet, sorry), and now GPT-5.5. [00:10:39] You can see progress is being made in terms of real-world knowledge work output: on the left we have a useful table; on the right we have essentially additional synthesis,
an executive summary, very useful totals, and further analysis beyond what we had a year ago. [00:11:01] So agents have progressed on economically valuable tasks. We also like to look at less knowledge-work-like economically valuable tasks, and progress has been made there as well. [00:11:15] This is a music video mood board task within GDPval-AA, and here we're comparing against some open weights models. [00:11:27] On the left we have Gemma E4B. It's amazing that it even did this, but as a music video mood board (maybe I'm not creative enough) I don't think I could use the one on my right here. If we compare that with GPT-5.5, maybe I could do something with this. So I think it shows that there are still real differences between the models, [00:11:51] and that the benchmarks do correlate with real-world capabilities. [00:11:57] There are two things that are hard to reconcile in AI today. One is that we have cheaper intelligence: it's cheaper than ever to access GPT-4-level intelligence. [00:12:11] Going back to what I was saying earlier, you can get Kimi K2.6 now cheaper than you could get Opus 4.5 six months ago. [00:12:21] So you can get intelligence cheaper than ever, but we're also spending more than ever: [00:12:28] within our companies, we've upgraded to the $200-a-month Claude Code or Codex plans. Why [00:12:36] is that? I break it down into six drivers here that we see as the most critical. I'll breeze [00:12:42] through them; each of these could be its own talk. Smaller models able to achieve more intelligence is the first [00:12:48] cost-lowering driver, along with increased sparsity: a lower proportion of active parameters to total parameters. [00:12:54] When inference compute is at scale, the
number of active parameters is really the driver of cost, given how [00:13:00] you amortize hardware. Next are software efficiency gains in the inference stack: [00:13:08] in vLLM or SGLang, optimizations are happening every day, as with FlashAttention. Also, [00:13:17] closer to the model, we're exploring different quantizations. We're no longer at BF16; now we're at [00:13:23] four-bit precisions, whether that's INT4 with Moonshot models or [00:13:28] NVFP4 with other models. Essentially, we're making better efficiency trade-offs to offer [00:13:37] cheaper intelligence. Third is hardware efficiency. Hardware is costing [00:13:44] more but is also offering more, so an NVL72 system is able to offer a lower cost when serving at scale [00:13:55] than an H100 node, even though it costs more; that higher efficiency can translate into lower costs. [00:14:01] Now, on what's raising costs: first, larger models. How do you reconcile this with the first driver? It's that, [00:14:06] while not every use case needs it, there's an insatiable demand for frontier intelligence, [00:14:13] and we see that continuing. Personally, I'd love to be more intelligent, and I'd love the people I work [00:14:18] with to be more intelligent, and I think that translates to models, so there's an insatiable [00:14:23] demand for larger models, which increases cost. Next, reasoning models: labs are increasing intelligence [00:14:31] through more reasoning tokens, and since we all pay per token, that means higher cost. Lastly, agents. It used to [00:14:39] be that in ChatGPT you would ask a question, or send a request to the API, and get [00:14:46] the response back, and maybe put that into a Word document or into code. Now you have agents [00:14:52] exploring other files and checking their own work, which improves the output, but we're
[00:14:59] dealing with multiple turns for the same task. In our benchmarks we're commonly seeing 60 turns [00:15:07] for a GDPval-AA task, and 20 to 100 turns is quite sensible for many knowledge work tasks, [00:15:15] which acts as a multiplier on the cost. This is a chart showing language model inference [00:15:25] prices falling. Here we've bucketed models by their score on our Intelligence Index, [00:15:33] 0 to 10, 10 to 20, and so on, and we've looked at when that intelligence was first achieved, [00:15:40] which is the first dot of each line, and then where we are now in terms of the latest, or [00:15:47] cheapest, release that is able to reach that intelligence. What we see here is that, over [00:15:52] six- to 18-month periods, the cost of that intelligence often falls 10 to 100 times. [00:16:04] This is something we can all think about and take advantage of: if you could [00:16:09] do a task with Opus 4.5, odds are you can now do it far more cheaply. There are orders of magnitude at play; it's not [00:16:15] 2x cheaper, you can go 10x cheaper in many cases by choosing a cheaper model that has been more recently [00:16:23] released. So there's insatiable demand for increased [00:16:30] intelligence, but especially for real-world, more defined use cases, there are often 10 to 100 times [00:16:37] cheaper options at play in terms of model selection. Here's that played out again: [00:16:45] a chart of intelligence against what it costs us to run our benchmarks. On the [00:16:52] x-axis we have the cost to run our Intelligence Index, those 10 benchmarks. I don't know if we'd have been [00:17:00] able to afford it back when we started in '23 or '24, but now it's costing over US$4,000 [00:17:07] to run these 10 benchmarks for frontier models that are pushing what AI
intelligence is able to [00:17:14] achieve. You can see Opus 4.8 costs over $4,000, but there's a clear Pareto curve that one [00:17:22] should think about when selecting models for use cases, weighing cost sensitivity, and [00:17:29] the options along that Pareto curve really do span 100x or more in cost [00:17:36] differences. [00:17:40] Again, I could talk about this in a lot more detail, but this chart just shows what's underpinning [00:17:45] a lot of the efficiency gains, and that's hardware improvements. You can see that [00:17:51] a B200 node is able to offer greater output speed per query but also greater system throughput: it [00:17:59] can scale to more users at greater concurrency, which allows amortizing the cost of that hardware over [00:18:06] more users, which supports lower intelligence and inference costs. [00:18:16] Agents are a big category, and each of the [00:18:22] agent categories requires a different way of thinking. We track seven fast-growing agent categories that we see as having achieved product-market fit [00:18:30] in a sense, and as scaling fast: coding agents; general work agents (not just ChatGPT, [00:18:37] but your local Codex and your Claude Cowork), which have really reached a level of maturity, with a lot [00:18:45] more to go, but a level of maturity for real-world use; then chatbots, presentation agents, OCR, data analysis, and [00:18:54] customer support. These seven agent categories have really reached a level of maturity and product-market fit, and they're continuing to grow very quickly. [00:18:59] Double-clicking into what's probably most interesting to our AI engineering audience here, [00:19:10] that is coding agents: coding agents are not just the model; the model and the harness are both drivers [00:19:17] of performance, because when we're using coding agents we're not just using Opus
4.7 or Opus 4.8; [00:19:25] we're using Opus 4.8 in Claude Code, in Cursor CLI. [00:19:34] So, Sarah is now head of AI at Notion. We've lost her for some reason; can we get Sarah back? There she is. [00:19:41] She's the head of AI at Notion, and she was formerly at Robinhood and Google. She's the [00:19:47] engineering lead on AI modeling, reasoning, agentic orchestration, and model infrastructure as well. [00:19:53] She was recently interviewed by Shawn Wang on Latent Space. I highly recommend you just go to Latent Space to [00:19:59] keep up with everything, especially if you want to hear more of her personal [00:20:03] backstory and what they're doing at Notion. There's a bit of a theme emerging, as I [00:20:07] mentioned, around tokens: the cost of tokens and how that feeds back pressure of its [00:20:13] own into our product strategy. So to hear about what they're doing at Notion, would you please [00:20:20] welcome Sarah Sachs. Hi, hello, are we good? Yeah? Okay. Thank you for having me. If I were in the same cinema, [00:20:31] I would be giving Artificial Analysis a gigantic high five, so I'll just assume that you see me and [00:20:37] you're doing it. The reason is that I completely agree with everything that was just presented, [00:20:42] and what I want to talk about is the challenge of actually serving that and making [00:20:47] those decisions. Raise your hand if you work for a Fortune 50 company. Okay, good for you, three of you, [00:20:55] four of you. Raise your hand if you don't, and raise your hand if you're serving AI products to your [00:21:00] customers. Okay. What about us, right? What does it mean when you don't have the scale and the [00:21:06] negotiating power, but you want to deliver what's best for your customers? But first, as [00:21:11] any engineering manager knows, it's never great to take credit for everything that you're doing. This is
this is [00:21:15] just a sample of the team um that actually builds everything i'm about to talk about so i always like to [00:21:20] really include kind of just a sample of everything we're building so i want to walk through some real [00:21:27] scenarios and again i we didn't share notes so these will feel very familiar with what you just saw [00:21:33] um these are real i won't name names but you can um quickly google so exhibit a a reasoning model [00:21:39] gets upgraded but don't worry it's the same price okay um it might be the same price per token but [00:21:46] you run it on the same exact task and what happens it's three times as many output tokens okay um maybe [00:21:53] that makes it better it reasons more um but what do you do right um let's look at exhibit b [00:22:00] there's a successor model so increment your random decibel by 0.1 okay it's 40 percent more expensive [00:22:09] than its predecessor but breaking news the predecessor is being deprecated in four months [00:22:14] and you've built your whole product on top of it are you increasing your prices by 40 no hopefully not [00:22:21] so what do you do well if you're one of the four people that raised your hand it's fine your ceo gets [00:22:26] to go on bloomberg and complain about it and you probably get a great deal but what about everyone [00:22:30] else right the fortune 500 they have these dedicated ai teams they get to kind of negotiate and they [00:22:37] negotiate alone and they negotiate with incredible leverage and i i have a hypothesis that that's leaving [00:22:43] behind a lot of the market and it's not an efficient market um you're basically left today i see this all the [00:22:50] time with our customers and i see this with my partners in the industry you're left behind with no leverage [00:22:56] my job at notion is to represent that fortune 5 million because they're our customers [00:23:01] especially with usage-based pricing the accessibility of ai is determined by the price 
[00:23:06] By the price of those workflows. We're handling tens and tens of trillions of tokens a month, [00:23:13] but we also have over 100 million customers. What does that mean? It means I get to play that role [00:23:20] of negotiating at scale, but bringing it to the rest of you, and there are a couple of lessons I've learned that I [00:23:25] don't think you need those trillions of tokens a month to use. I want to share them with all of you, [00:23:30] because I think in this economy there's a lot to be gained, and I think the market's not fair to us. [00:23:36] We see this, right? For $200 a month you can consume $5,000 in compute. How is that possible? [00:23:45] Because Anthropic has its own cost of goods served. By the way, we're very close partners [00:23:50] with them, and I think for very frontier reasoning tasks this is appropriate. For really hard tasks, [00:23:56] I want to pay more, because we're not there yet. But for a lot of situations, that's not where we are. [00:24:03] So the supplier is your competitor. How many of you see yourselves, in the next 10 years, [00:24:08] competing with OpenAI or Anthropic on some product that you're building? Okay, I feel like not enough [00:24:13] of you are raising your hands, which is interesting. But this is the way it's working right now, [00:24:19] and I see it happening in two variations. In the first, people are locked in; they have no exit. [00:24:26] When you see very large applied AI companies with extremely vocal partnerships with frontier [00:24:33] labs, there's a very high chance that they're locked into that one vendor: they've taken all of their [00:24:39] spend and committed it to that one frontier lab in exchange for exorbitant discounts, but they're [00:24:44] stuck. That means that tomorrow, if an open weights or another frontier model comes out, they [00:24:52] don't have the optionality to leave, because they've committed 20
million dollars to one of them. [00:25:03] that uses the best model available, whatever it is. So, for instance, at Notion we just launched this. [00:25:27] This is our managed agents product. We believe that there's value in optionality. You'll see in this [00:25:33] example that you can use [unclear] agents, so Claude Code can write a fix, then perhaps Codex can [00:25:39] review it, and then it goes into a task list for humans to review. We can charge you sticker price, if not a [00:25:46] small discount, on these models, but the benefit you're getting is the experience around it. [00:25:53] And it's not just capability; we see this all the time now. I wrote a Twitter article [00:25:58] that a lot of people identified with a couple of months ago: I think the pot's boiling on capability [00:26:05] alone. You should always be thinking about cost per capability per second. Artificial Analysis is a [00:26:11] wonderful resource for that if you don't have your own resources. And it's happening, right? [00:26:17] We see these crazy stories; these are just headlines from the past week, where if you only [00:26:23] focus on capability and don't think about latency or cost, you've put yourself in a rough position. [00:26:29] It's pretty funny, honestly. I love these tweets, if you're on AI Twitter, which half of you are; [00:26:34] it's a funny time to be alive. But it's real, and you don't want this to be your customers. [00:26:42] It's not appropriate to put your customers in the position of whatever this guy's doing with [00:26:47] his cigar. Not all traffic is equal. We heard a little about that in the graph showing [00:26:54] where different models live. At Notion it looks like this: changing a database field, triaging an email [00:27:01] (you know, looking at this dude, let's get him on MiniMax as soon as possible), summarizing meeting [00:27:08] notes. These are all things that we
actually don't want to pay 40 percent more for when GPT-6 comes out, [00:27:14] or whatever is next. We want to either be using open weights or be RL-ing models. But for the frontier [00:27:21] tasks, like data analysis and deep research, we need to be giving that frontier to our customers. The point is [00:27:26] that our customers need us to choose for them; otherwise these will be their headlines, and your product is [00:27:33] inaccessible. The problem, again, is that this isn't how our providers are working right now. They're [00:27:40] incentivized along basically two paths. Again, I won't name names; use your critical reasoning or your [00:27:44] favorite model to figure it out. Either you are the best reasoning model, the example in all of [00:27:50] these tweets, the example of greatness, and no one really questions your price because you're the [00:27:55] first one to pass whatever benchmark you said you passed. Or, second, you're slightly worse, but that's [00:28:01] okay: you just need to be about 30 cents per million tokens cheaper and you have the rest of the market [00:28:06] to yourself. What about everyone else? We're in a duopoly right now where things aren't priced for [00:28:15] their needs. Complex tasks, though, I think are appropriately priced, and I want to keep highlighting this: we see a bifurcation, [00:28:22] at least in Notion usage, and I want you all to be investigating and thinking about it. There is a place for complex, [00:28:29] expensive reasoning models; the trap is putting everything there. And the best frontier model is [00:28:35] changing fast. This is an older slide (you saw a brand-new one from Artificial Analysis), but the point is [00:28:41] that it's changing constantly. We're changing our default model for our customers probably every three to four [00:28:47] weeks. It's a lot of work, and we have very advanced evals and teams to do it, but you shouldn't need [00:28:52] that; there are a lot of benchmarks you can be using. If you're switching
all the time, you can't be [00:28:58] locking your product into a particular provider, because if you are, the second you go from January to [00:29:03] February you're giving your users a worse experience. It's not worth the discount, I don't think. [00:29:10] I think you're putting a big risk on your company, and you're willing to fall behind. [00:29:14] Depending on the business, for that kind of saturated knowledge work land, that's fine, but if you want to [00:29:20] be offering the frontier, you shouldn't be doing this. Optionality is leverage. You should be ready to walk [00:29:31] at all times. For a durable business, you should not be locking yourself into a single provider, and you [00:29:36] should be comfortable knowing what the landscape of models is, so that you're ready to maintain your [00:29:41] margins and your business in a way that you're confident brings your users the best experience. [00:29:47] We do this with our Auto model. Today we have a model at Notion called Auto. This is our [00:29:53] model picker; it's due for a refresh, because the list of models is getting long, but we let users choose, [00:30:00] because sometimes that latency, quality, and cost trade-off is not something you can assume for your customers. [00:30:06] You'd be surprised how many people really want to spend a ton of money on email triage, [00:30:10] and how many people really don't care about how accurate research tasks are. It's a [00:30:15] phenomenal user research exercise that we could have a whole other conversation about. [00:30:19] So this is how we do it: we choose for our customers by default, [00:30:25] and 75 percent of the time they stick with that, but the other 25 percent is valid, and we give them opportunities to [00:30:32] leave. With that Auto model, we have the opportunity to actually think about the task at hand [00:30:38] and give them the quality they need at the best price and latency. So if you're setting
[00:30:45] up an email triage agent, it's very unlikely that Auto will pick Opus, because you'd get your [00:30:50] first usage-based pricing bill and say, "No way, Notion, I'm out of here." Okay, this is the playbook. All of [00:30:57] you who love taking your phone out for slides, this is the time. This will be online, but I get it. [00:31:04] Build for multi-model and be ready to switch. Understand the model providers. Evaluate on value, not [00:31:12] tokens. What does that mean? Here's an example we tweeted a couple of months ago on web search providers. [00:31:19] It might be that a certain web search provider is cheaper per token or per request, but how many requests [00:31:25] is your agent making, and how accurate are the results? You should be looking at the entire task when evaluating; [00:31:32] you should not be thinking about a particular API call. That's where they get you. Think about [00:31:38] your use case; you are the expert on quality for your use case. Switch fast, switch often. Give them [00:31:46] something back. Do you know why these evals were awesome? I love competitive dynamics in a market. [00:31:53] I love it when we can say, "Parallel, we chose you, but it's close; stay up there. Everyone else on the list, here's [00:31:59] exactly why we didn't choose you; please fix it." A rising tide lifts all ships. Position yourself [00:32:05] so that you're getting what you want from your providers. Forgo discounts for optionality; I think [00:32:11] we talked about that already. And there's a third option, which I'm sensing is the theme of the day, [00:32:18] and that's open weights for moderate tasks. We're very heavily considering it now, and we have live in [00:32:23] production a lot of open weights traffic as well as reinforcement-learned models. They're strong [00:32:28] enough to handle these workloads. I don't think they're going to be in the upper-right quadrant of [00:32:33] capability soon, but the gap is closing, and it also gives you negotiation leverage instead of complete financial
Complete financial [00:32:40] independence: choose the inference provider of your choice. I would say K2.6 was the first [00:32:47] time this really happened for us. There's been a lot since then, but K2.6 was the first moment [00:32:53] where we actually saw it compare to GPT-5.2 in quality. [00:32:59] So here's that eval. We see scores, and these are on Notion-specific tasks. Again, you own your product; [00:33:08] you get to decide what works for your product. For Notion-specific tasks that we thought the Auto model [00:33:15] needed to do, for a subset of them, we score just fine. Kimi K2.6 is great. And errors are important, [00:33:23] by the way; errors are what you end up paying for. Just as a side note, you're still paying token rates [00:33:27] on errors and retries; that's another presentation. But it's not just the token cost. We look [00:33:34] at the average number of tokens. There are some crazy things on this, like Opus 4.7 and 4.6 versus Sonnet: look at the token [00:33:41] consumption. Forget the price on Opus or Sonnet, look at the token consumption. This is why understanding [00:33:47] your whole trajectory is really important for understanding your model. [00:33:53] Philip Kiely has a great book called Inference Engineering; even if you're not an inference [00:33:58] nerd, it's good to read. We don't need to be frontier with open weight; it's just a gap in time. There's [00:34:06] product that we served six months ago that our customers love, and they don't want to randomly pay [00:34:10] more for it tomorrow. We're trusting that open weight is closing that gap, but we need to have the [00:34:17] right evals and stay on top of it to know when it's there, and that's why investing in your evals is [00:34:22] important. How do you win these negotiations? Well, the first thing you need to do to be ready is [00:34:29] think about your architecture, not just your model choice. For us, we've noticed that harness engineering and
[00:34:34] architecture decisions can account for about 3x as much change in price as model selection. [00:34:41] Sometimes that means using a native harness, like Codex and the Claude APIs; sometimes it means [00:34:46] using open source, like Pi; and sometimes it means creating your own harness, because there are particular [00:34:50] capabilities you want and you understand how prompt caching, etc., might affect [00:34:56] how you work. But what if you don't need an LLM at all? I know we said "welcome to Token [00:35:05] Town," but we're leaving. We're now departing Token Town, and I want to take us back to an old world, [00:35:12] an old world where we used CPUs to do our jobs. This is my Nano Banana-generated image, I think, of an engineer [00:35:23] trying to turn a CSV into a PDF and post it to a webhook. Why would he ever need an LLM to do that? [00:35:32] Why would you want that repeated task constantly using reasoning tokens to navigate MCPs? Well, [00:35:40] because frontier labs want you to token-max. They want you to do it in a way that lets them control their [00:35:44] capacity; that's its own economic issue. But they want you to token-max; your users don't. Your users want [00:35:50] you to outcome-max. That's why we launched our developer platform and something called Workers [00:35:55] at Notion. We believe that many tasks are better done on CPUs. Determinism is a valuable thing; state [00:36:01] machines existed. The pendulum has swung so far because it can, but that's not durable software, [00:36:08] and it's not affordable software. So we've partnered with Vercel on the capabilities to launch what we [00:36:14] call Workers, where you can actually call internal APIs and compute sandboxes, and host small code [00:36:22] as an action that your LLM calls. We've seen this decrease token cost by up to 80 percent for some [00:36:27] of our customers on repeated tasks. It's wild out there. I mean, I feel like if I go on a week's vacation
[00:36:37] and I do a good job of not looking at Twitter, it's like I hibernated for two years. I get it, [00:36:44] it's exhausting. Take breaks, take care of yourself. But it's crazy, and the market is so young, it's so opaque, [00:36:50] it's moving so fast, and you are a player in it. I find it very sad that we're all kind of rolling over, [00:36:56] but we're all in this together, and our negotiating power is so much stronger if we all advocate for ourselves. [00:37:04] And I also think that you owe it to your customers. I'm so proud, and I come to work [00:37:09] every day because of the way I think we make AI accessible and valuable for the Fortune 5 Million. [00:37:16] But not everyone uses Notion, and I want all of you to do that too. I want us to make AI something that [00:37:22] isn't a meme. I mean, in California I read a statistic that AI is more unpopular than ICE, which is our [00:37:28] immigration enforcement. It's crazy. You know, ICE is not a popular institution, so that [00:37:33] means AI is really unpopular. Why? Because we're not creating valuable work, we're not doing it at [00:37:39] the right price, and people aren't seeing the value. There are a ton of Twitter memes, but there's real work to be done, [00:37:44] and that's our obligation. I think it's a powerful technology, and I think it's our duty [00:37:51] to bring it to our customers in a way that's responsible, durable, and appropriate. I'm on Twitter [00:37:57] a lot, as you saw; please tweet at me. You can always email me. I'll be here in this hemisphere till the end [00:38:03] of the week, in Melbourne. Thank you for your time, thank you for hosting me. I love Australia. Have a great [00:38:09] time. That was unnerving; I expected to see her right there. It felt like she was in the room with [00:38:22] us. So next up we have someone whose handiwork I'm sure you will know, even if you do not know him. It was [00:38:30] less than a year ago we had him down in Melbourne.
Just on a year ago, in Melbourne, we did an [00:38:35] unconference and we had a room of about 50 people, and then almost exactly a year ago we were [00:38:41] in Cinema One or Cinema Two with maybe a couple of hundred people. Since then he's gone [00:38:47] and spoken in, what, 19 countries, all over the world, because of Ralph Wiggum. I don't [00:38:55] know if people even know about Ralph Wiggum anymore, the Simpsons character, but Geoff was [00:39:01] innovating ideas about back pressure and loops which were crazy and novel 12 months ago, and which [00:39:07] are now literally everywhere. If you're using things like [00:39:14] Codex and the goal, or Cursor grind mode, all these long-running tasks, that's an implementation of Ralph; [00:39:20] that's where it came from. But that was novel back then, so it's very exciting [00:39:24] to close this loop, so to speak, and to have him back on stage, where we've gone from 50 people [00:39:30] in a little room down at Deakin University's Downtown center to having 600 people here across a couple of [00:39:37] cinemas. So please welcome, or maybe curse, the man who brought us the Ralph loop: Geoff Huntley. [00:39:44] Thank you, John. Hello, everyone. My name is Geoff Huntley. I am here today, and I want you to agree with [00:39:54] me or disagree with me. I don't know where this is going. I'm going to say some pretty provocative [00:40:00] things, like: software development now costs less than minimum wage. I want you to think deeply about this. [00:40:06] Software has been commoditized. It's kind of similar to this iPhone in my hand: anyone can now be a [00:40:11] photographer, but that doesn't mean they're a wedding photographer. Product managers can now be software [00:40:15] developers; that doesn't mean they're software engineers, because a lot of things really changed over the [00:40:21] last year. So with that
introduction done, I'd like to say hi, Mum. And I do not work for [00:40:27] anyone; I do not represent anyone. These are my own ideas and thoughts on where things are going. You see, [00:40:34] it's been about a year, a year and a half, since I introduced the idea of long-running [00:40:39] tasks and agents. You're now using them day to day, and I've been trying to figure out where everything [00:40:45] goes from here. So here's me giving a talk at Atlassian about a week before Atlassian did their layoffs, [00:40:52] talking about how the unit economics of business have forever changed. Whoops. And I want you to think [00:40:58] deeply about this, because the economics of business have fundamentally changed. Software is now easy [00:41:05] to create; that doesn't mean you're creating the right things. Anyone is now a software developer. [00:41:12] You see, here's a meetup I went to about a hundred days ago, before I started doing a lot of [00:41:19] my world tour and talks. This is a Cursor meetup, and here is Roslyn. Roslyn is not a classical software [00:41:28] developer by any means. At this meetup there were product managers and designers, all having the [00:41:34] time of their lives, folks, and not really software engineers up there, because our skill set has been [00:41:40] commoditized. You see, in my travels, when I first started doing them about a hundred days ago, [00:41:49] I did a side quest over in Auckland. I went to the Lord of the Rings set; I was like, yeah, side quest, let's go do it. [00:41:54] And my tour guide asked, "Geoff, what do you do?" I was like, "Oh, I do AI, [00:42:01] please don't judge me." And he's like, "No, how good is AI? I'm building all these things." I'm [00:42:07] like, what does it mean for our profession when a tour guide operator is token-maxing? You see, everyone [00:42:14] is now a software developer, folks. Everyone is now a software developer. I want you to deeply understand [00:42:20] that it's been
commoditized. Anyone can now write software. Previously software was gated: either you [00:42:26] understood how to control the computer, or you got a user interface that you could click and [00:42:31] configure to get the business outcomes. But that's completely changed now; the [00:42:35] paradigms have been smashed. Everyone can now control the computer; they can now write code. Previously this [00:42:40] was gatekept, and it's kind of weird, because society's been structured around a scarcity of knowledge. [00:42:48] Software developers were gatekeeping: "No, you can't do that," "Oh, that'll take two weeks," whatever. [00:42:54] But it wasn't just that. It was accountants, lawyers, all the white-collar professions. We charge a [00:43:00] lot of money because expertise means we can charge more for that expertise. But what does it mean [00:43:08] now that we've got AI? What does it mean if someone wants to do the work of a principal [00:43:14] software engineer, and they get a skills pack and they're doing property-based testing [00:43:20] and deterministic simulation testing? I saw something yesterday: PewDiePie, believe it or not. PewDiePie [00:43:27] is writing better tests than most software engineers right here today. Go look at his GitHub: [00:43:32] he's using Antithesis with Bombadil for property-based testing. Meanwhile, you're using Playwright. What does it mean [00:43:38] when these skills that used to be scarce are now abundant, and we've got a YouTuber [00:43:48] doing better software testing than most software developers or software engineers here today? [00:43:53] Wow. Okay, so around 2024 I originally said things are going to change, [00:44:03] things are going to change, and I first wrote publications like, "Hey, no one's going [00:44:09] to be using an IDE anymore," and people were calling me mad, absolutely mad: "No, no, Geoff, I love
[00:44:15] IntelliJ, I love JetBrains, it's going to be here forever." I'm like, "No, it's cooked, it's gone." [00:44:20] And for the people in the room here, I'm sure there are a few who might still be using their [00:44:25] favorite IDE, but it's gone. It really is gone. It's been replaced by cloud-based workflows or some [00:44:31] other thing; IDEs are now really diff review tools. Okay, so we zoom time a little bit further, and we have [00:44:39] another "oh" moment in time. Now, this was Christmas. So years passed, and society is now slowly [00:44:49] having the same moment. I want you to carefully think about this: no matter how good [00:44:55] AI gets, and it's getting good on this slope-on-slope derivative, [00:45:00] it takes time, downtime, before people realize. Societal downtime. The reason people had the "oh" moment [00:45:07] wasn't that the LLMs were getting good; they were already good enough for societal disruption in 2024. [00:45:13] They were already good enough, but they required a lot of skill to get those outcomes. [00:45:18] Now they're just set-and-go; they just work. But it took the downtime, the public [00:45:24] holidays, for people to really sit down and realize that things were getting good. You see, [00:45:32] the people around me back in 2024, and even before that, the people who are really getting good [00:45:37] with AI, have been putting in deliberate, intentional practice. We treat them like musical instruments. [00:45:45] You see, musicians don't just pick up a guitar, give it a strum, go "Oh, that guitar is crap," and [00:45:51] throw it on the ground. Yet people treat AI like a calculator. But why is it on the employees themselves? We've been [00:45:56] forcing these guitars down into corporate and just going, "Please play the guitar, please play the guitar, [00:46:02] please pick up the guitar," token leaderboards, what have you. It's really just, literally, a [00:46:07] curiosity
test: will you pick up the guitar and invest in yourselves, folks? But not everyone is going [00:46:16] to be musically inclined, is my take. You see, I think there are now two classes of companies. We now have the [00:46:24] companies that are already lean, in the sense of the startups of the last year that are going, "No, we're never going to [00:46:30] hire more than 50 people in our company, apart from field engineering." Meanwhile, we've got everyone down [00:46:34] on the bottom, which is every single other company out there, and they've got to go through a J-curve [00:46:39] transformation program. That takes three or four years; normally that's fine. But here we've got [00:46:45] brand-new Clayton Christensen-style startups building at a slope-on-slope pace, able, as the [00:46:52] models get better, to come in and attack every single other company out there. So, [00:46:58] oh, this is going to get interesting. You might have seen this: "Block lays off nearly half its staff [00:47:03] because of AI." I mean, there's been a bit of back and forth: is this right, is it not [00:47:09] right? My honest take is Jack is right. [00:47:14] AI will allow us to reimagine the organizational charts within most companies. Now, I want you to think [00:47:22] about Spotify and how they produced those two videos about how they do agile: squads, [00:47:29] tribes, guilds, and all these things. All it took was that video, and every single company out [00:47:34] there bloody carbon-copied it and rammed it into their organization. It's going to take one case [00:47:39] study. What we have right now is Jack, Tobi at Shopify, and a few other executives basically [00:47:45] taking their organization like a deck of cards and throwing it up in the air, 52-card-pickup style, when [00:47:52] it might kill their company, but they might find the winning blueprint. It's only going to take one [00:47:56] company to actually get this blueprint, and everyone's
just going to copy it from that one case study. Pay [00:48:03] attention, folks. For the last five months I've been traveling around, from Australia to South Korea, [00:48:09] New Zealand, San Francisco, Europe, just kind of having conversations with venture capitalists, [00:48:16] and the question that's on every single LP's mind right now, and that they're putting pressure on their GPs about, is: why [00:48:22] does someone need to raise seed capital now? What is the point of pre-seed capital? Previously you needed [00:48:28] to raise money to be able to hire the team to build your thing, but now you can just build the [00:48:32] thing by expressing what you want built. So the disruption is not just within our profession as [00:48:38] software developers and in product; it's upstream in finance as well. What is the point of capital [00:48:44] if it's just a five-man show? This is the question that's on everyone's minds. There are answers, and [00:48:49] there are nuances; come find me and I'll go very deep into this. The question on people's minds is, [00:48:55] is software still investable? It is, but it has to be charged on the unit economics of outcomes, not [00:49:03] per seat; that's legacy SaaS. So, for no particular reason at all, every story needs a [00:49:09] punching bag, and I'm going to choose SAP Concur. [00:49:14] Now, according to LinkedIn, SAP has a fixed overhead of 6,800 people. That's 6,800 employees they have to [00:49:22] take through a J-curve people transformation program. They were built like this; [00:49:29] most companies today are built like this. The idea is you get your [00:49:35] shippers and your builders, and you add middle management, a middle layer, on top of that [00:49:39] organization. Think deeply about this, because the new companies coming to market today [00:49:45] do not want to do this, and there are companies such as Block that are willing to do some [00:49:51] wild experimentation to find out what
is next and what is different. Because the question that's on [00:49:57] every executive's mind right now is: how long does it take to transform 6,800 employees, and do I have [00:50:02] enough time to transform before my business gets disrupted by AI? Not necessarily that [00:50:09] AI disrupts you, but it's going to enable new competitors to come to market. What happens when [00:50:14] your team players say, "It's going to take too long to do the transformation. [00:50:19] I don't think this company's going to be able to do the transformation"? They quit, [00:50:23] and they just found a brand-new company. Has anyone noticed that the minimum hire at the [00:50:28] labs these days is CTOs? Like, at Workday, you're the person who's in charge of the people [00:50:36] transformation within Workday itself, and it's like, "Nah, I'm just going to get a job at a lab, and then [00:50:40] I'm going to destroy Workday in reverse fashion." So the question is, why would you transform 6,800 [00:50:46] employees if you're not jumping ship? You're thinking, why would I transform more? Because we all know [00:50:53] that smaller teams get better outcomes. Here's a story from a founder in New Zealand: "We're smaller, [00:51:00] but effectively cut two-thirds by saying we wouldn't backfill." Folks, for the developers not in the room, [00:51:07] go have a talk with people in business and finance. This is the quiet thing that's not [00:51:13] being said aloud. It's not necessarily that there are AI layoffs as such; backfills have just stopped. They've [00:51:20] been stopped for quite a while. And according to this founder, one of the best decisions was to [00:51:24] get rid of all the people who were sick of hearing about AI, the detractors. They're 20 people now, down [00:51:30] from 60, and they're getting more output than they ever have before. Notice the date: this is three years [00:51:36] ago, folks. If you're thinking about making
changes to your organization in response to AI, you're not late, [00:51:43] but you're not early. You see, this is going to be hard for people. Think about [00:51:50] all the people who have done Game of Thrones-style sociopolitical maneuvering. This is one of the most [00:51:55] disruptive parts of AI: no person is going to give away their power. It's going to turn into a kind [00:52:02] of Hunger Games as organizations go through this J-curve transformation program, as we figure out [00:52:08] whether this is even possible. Not that I'm advocating for it, but there are a lot [00:52:14] of companies working on figuring out the right design for this organizational thing, and [00:52:20] there are even companies in SF actually building products to enable this type of [00:52:26] thing. We call it the AI operating system. Whew. I don't know where this goes. I want you to think [00:52:34] deeply about it, but one thing I know for sure is that experience as a software developer today does not [00:52:39] guarantee relevance tomorrow. You see, software developers trade time and skill for money. [00:52:49] If a company is having problems adopting AI, that's a company issue, not your own issue. [00:52:56] If you're working for a company that has banned AI outright, you should quit that company. [00:53:01] Put your family unit first. Invest in yourselves, folks. If they're having problems with AI, [00:53:07] that's a company issue; the company's got to fix its own particular issues. [00:53:14] I want people to think deeply. I wrote this almost a year and a half ago, when I was a tech [00:53:20] lead at Canva. This is my own personal journey, and also from interviewing other engineers at Canva, [00:53:27] and we found that as we rolled out these tools, people fell somewhere along this spectrum. [00:53:32] This was me: I was like, "AI is not good enough, prove to me it's not hype." You start [00:53:39]
experimenting with it, slowly, slowly, slowly, and you go, "Oh, will I have a job in the future?" [00:53:45] And eventually you get past that and start building with AI, and then you're a consumer. [00:53:50] Next thing you know, you start learning how it all works under the hood, and then you're actually a [00:53:56] builder with AI. Now, for the leaders in the room, the question is: how do you actually build [00:54:04] the bridge to support your staff across this chasm? The other question, for leaders: you might be [00:54:09] noticing why there's a line in it this year, compared with when I gave this talk last year. It's simple: [00:54:15] I don't hire people on the left of the line anymore. There's a large pool of people who have been [00:54:21] curious (I call it a curiosity test) and who understand how everything works. This is the line [00:54:28] you should be looking for: a senior engineer should be able to explain how AI works under the hood. [00:54:34] If I were to ask you what a primary key is, you'd be like, "Geoff, what are you doing, why are you testing me?" [00:54:38] That used to be the most basic question you'd ask your intern or junior: "What's the primary key? [00:54:43] Here's the database, have fun, mate, don't drop it." But why is it that when I ask a software engineer [00:54:50] what an agent is, "show me one, build me one on a whiteboard," they freeze up or can't get [00:54:55] into the specifics? An agent is really simple, folks. This is the big scary boogeyman that [00:55:03] everyone's scared about: it's a while-true loop. It takes the prompt and adds it to an array, [00:55:09] you send it off for inferencing, and you look at whether it needs to execute a tool; if so, it automatically [00:55:15] copies that tool's response back in and sends it off for another turn. It's really simple. A senior software [00:55:19] engineer should be able to explain this as a sequence diagram. This is the [00:55:25] new bar,
at least from my point of view, for interviewing: see how deep they can go on this [00:55:31] knowledge, because it's a curiosity test. I have a workshop for the software engineers who haven't [00:55:36] done this: you can build your own Cursor, Claude Code, or Pi; it's 300 lines of code. It's really [00:55:44] simple. Build your own coding agent; it is so simple. It's going to be really interesting to see how this [00:55:50] all pans out, folks. I don't know where this goes. I don't think anyone does. If anyone says they know [00:55:56] for sure, they're selling you horse shit. For a lot of people, they think nothing has really [00:56:03] changed, and this scares me deeply. We're already seeing the start of the [00:56:11] anti-AI sentiment in society: people saying, "Oh, it's using too much water." No, it's a closed-loop [00:56:19] system. We're already seeing some of the outrage, so buckle up. You see, a lot [00:56:28] of people are pretending that nothing has really changed, but really AI is kind of burrowing under the [00:56:33] foundation and the safety net of many people and their family units. They wake up and have no job, [00:56:40] and it's like, "Why? AI?" But was it really AI, or was the person not investing in themselves [00:56:46] to stay relevant? A closing thought: removing waste from your systems and processes is a bigger [00:56:55] accelerator than AI itself. I have clients here in Australia, a banking one, that had one Git repo per UI [00:57:04] component in their atomic design library, one per atom. This is stupid; that is waste. If you have multiple [00:57:12] sources of truth, with stuff in Jira and other things in Google Docs and so on, [00:57:16] that's also waste. Establishing single sources of truth is really important. Also, [00:57:23] interestingly enough, this is how you figure out who you should hire as an engineering manager: you should
[00:57:28] be asking them what AI has broken in their systems and processes. Do you use agile anymore? How do you [00:57:35] not use agile? What have you changed? Where was the waste? What did you do? What are the outcomes? [00:57:40] This is your leading indicator for which engineering manager to hire. The old saying was that ideas are [00:57:47] worth nothing and execution is everything. But what does it mean if you can just [00:57:52] literally, when I want something, go to a company's website, take a screenshot of their marketing [00:57:58] material, rip a fart into my coding agent, and get that feature? I literally go handbag [00:58:04] shopping for features from SaaS products these days. So everything here has been inverted. Ideas, [00:58:12] what to build, are still very important, and that is one of the hardest questions it's ever [00:58:17] been. The idea that execution is more important? No, no, no: ideas are more important than [00:58:25] execution, because execution is now commoditized. This is going to be really hard for a lot of folks. [00:58:32] For the software engineers in the room, you might have a conversation today like, "What do you do?" [00:58:37] "Oh, I'm a Golang developer." I'm like, "Cool. Do you use Neovim? Do you use IntelliJ? Do you use VS Code?" [00:58:45] It might be quite rude, but I go, "None of that matters anymore." [00:58:49] Your Ruby doesn't matter. There's an identity that you speak to: I work for this bank, I work in this [00:58:55] tech stack, I've spent years getting experience. It doesn't matter. And this is one of the things that causes [00:59:00] people to have their "oh" moment and get really stuck, deer in the headlights: [00:59:05] all these functions, these identities of who you are, have been erased by AI, [00:59:13] not just in software but in nearly every single field. And really it comes down to: [00:59:20] have you been investing in yourself? Have
you been curious, folks? Because [00:59:27] it is so important to really invest in yourself. If you're having AI rolled out in your company, they're [00:59:33] sending you a message, and the message is: just pick up the guitar. Learn how these things work under the hood. [00:59:40] There are many engineers who have implemented some of my ideas in the last year, and they've gotten [00:59:44] instant promotions. I've had many talks over the last 100 days where this has taken place. [00:59:50] Please build an agent. Be an engineer who is curious; understand what agents [00:59:55] are, how tool calls work, and everything else like that. Don't be someone who's just a [01:00:01] complete consumer. Thank you. Thank you so much, Geoff. I've seen that talk quite a few times; I've seen [01:00:14] it evolve over the last year. Well done. He's one of quite a few Australians we'll have on [01:00:21] stage in the next couple of days who are really having an impact. George, are you going to move [01:00:25] to San Francisco? No? Right, good, that's good to see. All right, so next up we're going to [01:00:31] try the high-wire act again, and we have Igor Costa. So come in, Igor. Hello? Can we... are we having fun? All [01:00:40] right. Hi, everyone. I'm sure it's good; yeah, we'll work this one out. Here we are. [01:00:46] Igor, can you hear me? I can. Awesome. All right, so I have had the privilege of getting to know [01:00:51] Igor over the last few months, because I've been really interested in the technology he's been [01:00:57] developing. In particular, he's been doing some interesting work around evolutionary algorithms, [01:01:02] which is something I've had a kind of 40-year-long obsession with. Igor was a former [01:01:08] leader of GitHub's AI efforts behind Copilot, and over the last year or so he has founded Autohand, [01:01:18] and they're doing a lot of really interesting work in this space that he's going to explain a little [01:01:22]
bit about. He's going to talk to us about why our coding agents forget everything, and the [01:01:28] architecture of memory. So please welcome Igor Costa. Thank you. [01:01:37] That's very kind of him; I appreciate it. Hello, everyone. Thank you for watching today, and everyone who is [01:01:43] watching on YouTube later, I appreciate it. My name is Igor, as John said, and in today's talk I'm talking [01:01:50] about why agents forget things. But before I start: how many of you have at least 100 agents [01:01:57] running right now? None of you? Okay, cool. And it's fascinating; you're going to see a lot of [01:02:07] exciting people here today. But hey, Mom, I'm going to copy what Geoff said: hey, Mom, [01:02:13] hey, wifey, kids at home. So this is an advertisement, in Melbourne, right here. It says electricity [01:02:22] is a massive new thing. I'm very fascinated by history and philosophy, and I think that's [01:02:30] probably what accumulated into my whole experience in this space. And we had that little [01:02:36] character over there; I don't know if you remember it from Copilot. We also had that little head that [01:02:41] we used as a logo, and then we gave it a name. We had that in the 1880s; it took us probably 150 years to get [01:02:48] where we are today. And then it's the same thing: "Electricity gives every household the power of [01:02:54] countless workers without wages, sleep, or rest." Isn't that what we're doing right now? [01:02:59] But the problem is they're not that smart, because they're built stateless, and that's why I left [01:03:05] GitHub to build my own. I said, I think we're approaching the problem in the wrong way. And [01:03:10] it takes a lot of courage, because bootstrapping in the age of AI is not an easy game; [01:03:16] it's very expensive. So we built the first coding CLI, and the coding CLI that we did was very simple: [01:03:22] we optimized for how I can run at least 20 or
30 instances of this on a daily basis. [01:03:29] I've tested all of the frontier labs' agents out there, and they don't scale beyond [01:03:36] 20. My computer becomes completely useless; it consumes a lot of memory. I got hooked on that, [01:03:43] and I was like, I'm going to build my own. So, you know, you start a session, and halfway through the session, [01:03:49] probably 10 or 15 messages in, the agent basically forgets what you're doing, what it's doing. It's like, bloody [01:03:55] damn it, that's not what I want. And we keep adding stuff, we keep adding more stuff to it: we've [01:04:01] added context window, we increased responses, and we're treating this as normal. I'm not. [01:04:09] Take context window size: we launched Copilot in, I think, August 2021. We were the first ones [01:04:15] to put a generative AI product in the hands of people; three months later was ChatGPT. We've [01:04:21] increased from 4,000 tokens, when we were battling to feed everything into the context to be able to [01:04:27] answer 10 suggestions on autocompletion, to a million, and we've sort of plateaued, because if you [01:04:34] look at the benchmarks that other presentations showed here, we didn't really go beyond that point. [01:04:41] And context keeps growing, right? But the problem is the things we create, and the way [01:04:49] we create them: we're treating context and memory as the same thing. They're very different [01:04:55] things, or maybe they're the same thing, I don't know. I don't know the answer yet; I'm discovering, I'm [01:05:00] curious like everyone else here. So to understand it better, let's talk about the nine types. I've [01:05:07] researched this as much as I could, probably too many papers already, 75 to be honest, and there are [01:05:14] quite a lot of them. We're still treating them as working memory, episodic memory, semantic [01:05:19] memory, but
to compress this into 18 minutes and keep the presentation going, I'll focus [01:05:26] on four. So let's talk about semantic memory. We, the industry, came up with this concept of [01:05:32] AGENTS.md: you define the rules of engagement, you define how you access things, you define [01:05:38] the framework you're using. But it turns out that at least seven times out of ten, agents don't follow what [01:05:44] you write there, because that's a limitation of the models. Then we went to procedural memory, [01:05:50] asking how we can increase the quality of our training data set at scale. Skills are, in [01:05:57] my opinion, the biggest social experiment ever: putting it in everyone's hands and saying, hey, let's create a [01:06:03] skill, so we can collectively accumulate all your experiences. Memory is an accumulation of experiences, [01:06:10] and we can train on it and make our models better. But have you seen the jump between the latest models [01:06:16] released this week? Every new model brings marginal gains; they're all marginal gains, right? [01:06:22] So we're trying to scale, I think, in an accidental way; we've trivialized the word scaling very much. [01:06:30] Then there's episodic memory. I think it's only in the last two weeks or so that [01:06:37] coding agents have started to have the idea of experimentation and memory. But episodic memory is something that [01:06:43] is very peculiar in coding agents, because time is a dimension that memory always ignores. [01:06:51] When you're working by yourself with a bunch of agents, it's easy, but when you work across a group of teams [01:06:56] it's very hard. It's a hard problem to solve, because everyone has a different opinion. Different [01:07:01] opinions mean no consensus, and no consensus means drift and collapse, right? It's really hard, [01:07:07] especially when you try to do this. And if your agent starts telling you you're
absolutely right, [01:07:13] shut down your session, because you are completely lost. Then come the reflective memories. [01:07:21] That's the last one, and it's very important. I think the models are now turning toward it: [01:07:26] there's a shift in the industry toward the area where you adapt, you know, what we'll be doing [01:07:32] differently, or what happened during the session, and then what happened here. [01:07:38] There are probably three to five startups trying to solve this problem. But still, [01:07:43] memory will change all of this stuff, and maybe... that's my philosophical question. [01:07:51] I'm a great observer of nature: maybe we're approaching this in the wrong way. I believe [01:07:59] intelligence should be distributed, not concentrated in four players, not including ourselves. I don't believe in [01:08:04] that, and that's why we're building something different. So this is our coding CLI. First, [01:08:10] remember we've open-sourced the CLI. It's basically a coding agent with the same functionality as [01:08:16] Codex, Claude Code, Gemini, or Cursor Agent, but it's open and it's transparent. It's not Pi, either; Pi didn't [01:08:25] exist when we started our coding CLI. I think Mario started it three weeks later, and it's a great [01:08:31] concept as well. But we put in everything that reflected how I'd like to build things, in [01:08:38] one single agent. We implemented memory preferences, or [01:08:43] user preferences, from the very beginning. We started with a basic format, because there's nothing [01:08:51] faster than reading a file from your SSD. If someone's trying to [01:08:57] sell you an awesome KV database or vector database, you're basically adopting SAP. There's nothing [01:09:08] faster than that; I've benchmarked
this. Then we came up with this concept that [01:09:15] we open-sourced, called Commander. I think two weeks later came Charlie from Conductor. Now there are [01:09:21] probably 30 or 40 coding-agent ADEs out there, agentic development environments. [01:09:28] We put the collective memory into it. It was an experiment, and it's still an [01:09:32] experiment. Our coding CLI has been downloaded probably 300,000 times, [01:09:39] and about 150,000 people use it every day. The more you use it, the better it gets, because it accumulates [01:09:44] memory. We put in the four memories I've discussed here, and then I thought about shared [01:09:49] memory: how can I make these things a single source of truth, right? How can I make [01:09:56] agents collaborate like humans do, with consensus? Believe it or not, the same behavior [01:10:02] we've observed in humans, in organization charts, happens to agents as well: they [01:10:09] don't agree with each other. So I wrote this paper, called Agent Spawn. The idea is to [01:10:18] mimic what happens in nature: when you have a kid, you genetically pass on some of your traits, [01:10:26] aspects of physical or intellectual ability, to your kids, so they get 50/50 from you and your partner, [01:10:32] and so on. So instead of trying to load everything into the agent's context, you load the [01:10:41] task at hand, you reflect on it, and then you spin up a new version of the agent with a different opinion, so [01:10:48] you can guide them. Why did I do this? It's very simple: [01:10:55] I've been very obsessed with the idea of long-horizon coding agents running for a very, very [01:11:00] long time, more than 48 hours. It turns out there aren't a lot of people doing this, and I think probably
[01:11:09] we are still discovering a lot of stuff. And it was very simple: we wanted to detect drift, like [01:11:16] when the model, in a random way, tries to mimic the behavior of the context that was given at the [01:11:24] beginning and then gives the answer "you're absolutely right", which probably was not the right [01:11:29] thing. Then we transfer that memory to a different agent, and maybe that agent doesn't have the [01:11:35] same ability, or the same skill sets, or the same instructions that we had before. And then [01:11:42] we had another problem: collapse. Like I said, they don't come to [01:11:48] an agreement. It's a very simple problem in a nutshell, but when you see them working [01:11:55] together, it's quite hard. So how do you do this? We created a memory system that's sort of a [01:12:01] model. We abandoned the idea of large language models at the core; we only use a large language model as [01:12:07] a dependency, not as a first-class citizen. I saw Notion showing the logos of the models; it's [01:12:13] kind of cool, but probably in six months or a year's time they're not going to do that [01:12:17] anymore, because it's irrelevant. We are moving so fast. So then I said to myself, well, what's the best way [01:12:25] to validate this? Well, let's test it on the Linux kernel. So I [01:12:32] built this application, and we've been running it for more than 10 months; as you can see here, it's still running. [01:12:37] We are still trying to migrate the Linux kernel to Rust. It's been running for 10 months, and we haven't [01:12:45] been successful. I'm sorry, Linux, not today, but we are getting closer. When we started the experiment, [01:12:52] we had only about 12 percent migrated successfully. And the secret sauce behind this comes from an AI [01:12:59] lab, which is why we abandoned the LLM as the model: an AI lab in
Singapore. They came up [01:13:06] with a new architecture, the Hierarchical Reasoning Model, where the memory is the model. So instead [01:13:13] of trying to do things in separate or incompatible ways, it's one thing, [01:13:19] right? So you can work with smaller dense models based on the data sets you already have, meaning [01:13:25] your customer data, any transactional data you have, your sessions. You can do models from 20 million [01:13:32] parameters up to 2 billion parameters, which means we can train this much faster. [01:13:39] Our training cycles went from weekly checkpoints to probably three [01:13:47] to five hours, depending on the day, and we can see a progression in that. But do we really think [01:13:54] it's a case of, okay, you've solved the problem? No, we are not there yet; we are very early in the journey. But [01:14:00] there's a lot of evidence there. In the absence of evidence, [01:14:10] we use gut feeling, right, as humans, but when you have evidence in the data, you sort of know that [01:14:15] might be the way you should go. And I treat this with skepticism; I don't take it for [01:14:22] granted, like, okay, the data is telling me this, so probably we should approach it a different way. But because we're [01:14:27] now working in an agentic way, experimentation is very cheap. When I started training my own models, it [01:14:34] was probably half a million dollars, sunk right there. Nowadays it costs you 500 to train, [01:14:41] right? So the cost of ownership, the cost of intelligence, is dramatically reducing, and we believe [01:14:48] that you should own your AI. We believe the whole stack should be in your hands, [01:14:53] because that's how we build societies. So here are some of the challenges we couldn't face, or [01:14:58] couldn't fix. Memory correctness: it's a very hard
problem, and we haven't managed to sort it out yet. We [01:15:05] are still researching this area. I can't publish the paper yet, but we are close to solving this. [01:15:11] The second one is memory as a first-class citizen for training signals. Models learn, and memory stores, [01:15:19] in very separate ways. How can we make sure that when we do, for example, multi-LoRA on [01:15:26] recursive language models, they can be adapted much quicker than you'd think? [01:15:36] Yeah, that's this presentation, and that's my story. I tried to make it as quick [01:15:48] and as simple as I can. But before I finish off, since I still have a few more minutes: I love [01:15:55] the idea of how we interpret the world, and as I was walking through here, did you guys hear [01:16:00] the Aboriginal people talking to you? [01:16:06] The talk was to share a little bit from the trenches: what we experienced when we started building [01:16:34] full-duplex voice agents with Python, the sort of challenges we had, and what happened when we were scaling [01:16:40] for countries like India, where they were looking at cost-effective means of potentially calling [01:16:45] probably 10 million users per week, or even high-value customers, where even the slightest glitch or pause [01:16:52] would really ruin the voice user experience. These are hard-learned lessons, and they're [01:16:58] the reason why we chose Rust. I could literally complete this entire conversation by saying: [01:17:03] hey, essentially map what the caller says to a conversational phase, and then use that [01:17:08] phase to control how the conversation flows through, right? Pretty simple. The idea is that [01:17:15] state machines and regex go really well together. You don't need an LLM to constantly operate on [01:17:21] the transcript; you could essentially have a million-odd regex patterns [01:17:26] do what a small
capable LLM can do. This is one of those things we completely forgot. For example, [01:17:33] say you want the customer to repeat what you just said: you're collecting a debt, and you want [01:17:39] to quote a regulation and ask, do you understand? They've got to repeat it to consent. [01:17:43] These are the sorts of things you can compute with regex in a conversational flow, and it gives you ultra-low [01:17:49] latency. You don't need to waste an LLM call to do that, right? It brings a certain amount of [01:17:54] determinism to a really capable model that can handle arbitrariness. So it needs to be [01:18:00] deterministic where it must be. For example, if you want to model it as a DAG, let's say it needs to take these [01:18:06] three steps before the fourth step can be taken, you need those kinds of control loops. That's the sort of [01:18:12] capability we're building. None of the SDKs that the frontier model companies are building have this, [01:18:17] because they've not seen many customers deploy at this scale. What we're doing is [01:18:22] seeing the initial bunch of customers who are really pushing the limits of this. Eventually you'll see [01:18:27] these capabilities get absorbed back into the product, but right now it's really the wild west. [01:18:34] So I'm not saying the Python ADK is bad, but it is not suitable for a real-time hot path. If you want to run [01:18:40] production voice agents at scale with a millisecond budget, you've got to move to Rust. [01:18:46] The idea is that, for voice agents, the latency budget is the architecture. You do not have [01:18:52] any headroom to absorb delays the way a text agent does, where the model's [01:19:01] probably thinking and the user's probably reading. You don't have any UX forgiveness; it's completely unforgiving. [01:19:06] So we chose Rust and rebuilt ADK, and we were able to make a million calls a day [01:19:13]
outbound at scale, and receive over 100,000 calls a day inbound, for fairly demanding contact centers in vernacular [01:19:20] languages in India. It's a testimony to the scale the system can sustain. [01:19:27] That is it. I've shared the repo along with the crates link; please do take it for a spin. [01:19:33] I have a contributing guide in there, and I'm looking for contributors from the community. [01:19:38] We will not change the license type: it's Apache 2.0 and it will continue to be Apache 2.0. [01:19:46] Thank you. Awesome, thanks so much, Famshi. Now, normally at this time I spend a whole bunch of time [01:19:59] explaining everything that's happening, but we just have so much going on that I'm not going to [01:20:04] do that. I'm just going to say a couple of things to make sure you get the most out of the event, and [01:20:09] the key thing is probably just to look at those QR codes. So let me go through them. There's a live [01:20:15] schedule; if anything changes, we'll be updating that, and it will also have news that pops up [01:20:20] during the day if anything comes up. If you've not yet logged into attendees.webdirections.org, that's the
experience here and we've got a couple of great prizes we've got a couple of [01:21:12] segway e-scooters uh as prizes for people who do those quests uh so all the details are there if you [01:21:22] log into attendees.webranch.org and go to the quests or that link takes you straight there and as i said [01:21:27] it's really all a big part of it's about helping kind of you get the most out of this experience so huge [01:21:33] thanks to Jana for helping us put that together and then who has their own like core or agent that [01:21:39] they kind of carry around with them all right well you can bring them to the conference right so that [01:21:44] final uh qr code there uh basically we've set it up so there's a live stream of all the captions [01:21:50] uh so basically we do a real-time captioning and that's available as a live stream and also what we do [01:21:58] is we capture the each frame or every few seconds we capture a frame we give it to Gemini uh Gemini [01:22:06] then goes and converts it to structured uh content uh it'll do text descriptions it will have all the [01:22:12] text there any link becomes a link everything's available to your agent right so essentially your [01:22:17] agent can be at the conference with you i've written up what we've done there so it's all running on [01:22:23] cloudflare by the way people are really sleeping on cloudflare is an amazing developer program uh [01:22:28] platform now they are a partner of the conference but actually i've been using them for a long time [01:22:32] and incredible platform right so bring your agent now this is totally an experiment this is very much [01:22:37] alpha we've been updating it as we go people on the stream they've been saying hey it's been saying this [01:22:43] i've been feeding that back into my coding system that's been updating so it's very uh real time [01:22:49] uh it's a bit of an experiment but hopefully you'll enjoy that and finally we've got like a ton of [01:22:54] amazing 
partners; you probably saw them when you got into ZINC. So here's what's going to happen. We have some [01:22:59] food and coffee up here. We've got an hour's break, and then we're going to come back: AI engineering is in [01:23:06] this cinema, and software engineering is in Cinema 2. Then, if you have a leadership pass, the leadership [01:23:13] sessions are down in the Swinburne, just at the entrance where you came in, and we'll have [01:23:17] volunteers there. Right underneath us we have a whole bunch of catering and coffee. In particular, if you have [01:23:23] specific dietary requirements, not vegetarian things, but if you've asked for very specific ones, [01:23:28] they will all be served there. But the majority of the catering and the coffee, and all the expo, is [01:23:34] down at ZINC. We've given an hour for each of the breaks, so give it time, and if it looks too busy here, [01:23:39] please just go straight down there. We also have what we're calling the hallway track, which is going [01:23:43] to happen during the breaks and even partly during these sessions. There's a bunch of sponsors [01:23:47] who are coming to talk, but we've also got some invited speakers. All of this is in your schedule [01:23:52] on live.webdirections.org, so take advantage of that as well. And the last thing is just a little [01:23:57] encouragement to go and visit all of our wonderful partners. Some are community, some are [01:24:02] startups, some are agencies, and then some are really well-known names, including our amazing sponsors Google Cloud [01:24:08] and Google DeepMind. If you check in with all of them, one of the people who checks in with everyone [01:24:13] will win a Mac mini, so you can hopefully keep your claw safely isolated from anything else, [01:24:20] or actually, probably run lots of claws on your Mac mini anyway. So there's a whole bunch of [01:24:25] stuff there. Let's finish up now, before we head downstairs or over to ZINC, by thanking all of our
[01:24:31] wonderful speakers from this morning. That's it. We've got the wonderful volunteers in the red t-shirts; [01:24:40] they're here to help. If you've got a question, just ask them; if you need something, [01:24:44] ask them, and they'll get back to me or Rosemary or Jenny, who are taking ultimate responsibility for [01:24:49] everything. We'll solve any problem to the extent we possibly can. Thanks again, and have a great break. [01:24:54] Remember, it's about a five-minute walk up from ZINC. We'll be back here at exactly 12:30, so one hour [01:25:00] and one minute; we've done pretty well this morning. Until then, go and enjoy, and see you back in an hour.
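The voice-agent segment above argues that a conversational phase can be tracked with a state machine whose transitions are guarded by regular expressions, so that a step like repeat-to-consent, or "three steps before the fourth", runs deterministically without an LLM call. The following is a minimal Python sketch of that idea, written for brevity rather than the Rust hot path the speaker describes. The step names, the consent phrase, and `Step`, `ConversationFlow`, and `make_consent_flow` are hypothetical illustrations, not the speaker's crate or the ADK API; a production system would also have to cope with streaming, partial transcripts.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    """One conversational phase; the caller's utterance must match `pattern` to leave it."""
    name: str
    pattern: re.Pattern


@dataclass
class ConversationFlow:
    """Linear state machine: phases must be completed strictly in order (one path through a DAG)."""
    steps: list[Step]
    index: int = 0

    @property
    def phase(self) -> str:
        # Name of the phase we are waiting on, or "done" once every step has matched.
        return self.steps[self.index].name if self.index < len(self.steps) else "done"

    def on_transcript(self, utterance: str) -> bool:
        """Advance one phase if the utterance satisfies the current guard; no LLM call involved."""
        if self.index >= len(self.steps):
            return False
        if self.steps[self.index].pattern.search(utterance):
            self.index += 1
            return True
        return False


def make_consent_flow() -> ConversationFlow:
    # Hypothetical debt-collection script: identity check, repeat-back consent, then payment method.
    flags = re.IGNORECASE
    return ConversationFlow([
        Step("confirm_identity", re.compile(r"\b(yes|yeah|speaking)\b", flags)),
        # The caller must repeat the consent phrase back; a bare "yes" does not match here.
        Step("repeat_consent", re.compile(r"\bi\s+understand\s+and\s+agree\b", flags)),
        Step("choose_payment", re.compile(r"\b(card|bank\s+transfer)\b", flags)),
    ])


if __name__ == "__main__":
    flow = make_consent_flow()
    for said in ["uh yes, speaking", "pay by card", "I understand and agree.", "by card please"]:
        advanced = flow.on_transcript(said)
        print(f"{said!r}: advanced={advanced}, phase={flow.phase}")
```

Run as a script, the demo advances past the identity check, refuses to skip ahead when the caller names a payment method before consenting, and reaches `done` only after the consent phrase and the payment choice. Keeping each guard as a precompiled pattern is what makes the check cheap enough for a per-utterance hot path; the LLM is then reserved for the arbitrary parts of the conversation.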
