May 14, 2026

Taking Down Prod with Eve


If you're a platform engineer you've had to exit vi at least once. Eve is stuck with it for the rest of her social media life. She's a Sr Platform Engineer and obviously knows a ton about Kubernetes and CI/CD. Enough to make you cry laughing. We got to chat with her about what it's like to be doing s̶y̶s̶a̶d̶m̶i̶n̶ ̶d̶e̶v̶o̶p̶s̶ j̶e̶n̶k̶i̶n̶s̶ d̶e̶p̶l̶o̶y̶m̶e̶n̶t̶ platform engineering work in the age of AI and what people should stop doing.


Chapters

00:00 Introduction
02:37 Eve's Journey into Tech
05:22 CI/CD and Kubernetes
08:08 Psychological Safety in Tech
11:01 The Importance of Documentation
13:28 GitOps vs. DevOps
16:09 Platform Engineering and Developer Experience
19:01 Best Practices in CI/CD
22:09 The Role of AI in Development
24:39 Advice for New Engineers
27:19 Closing Thoughts

Welcome to Fork Around and Find Out, the podcast about building, running, and maintaining software
and systems.
Hello, everyone, and welcome back to Fork Around and Find Out.
I am Justin Garrison, your host, and with me as always is Autumn Nash.
How's it going, Autumn?
Struggling.
Like, have you been sick more lately?
Because I'm like, I've never been sick this much in my entire life this year.
Like, I've been so sick.
Even working from home.
It's the kids.
It's the schools and the kids.
It's just germ factories.
I don't even go outside, Justin.
Maybe that's the problem.
Maybe some grass would help.
Whose side are you on?
On today's show, we have Eve.
Eve's a senior platform engineer, and I think some people might know her better as "help
me exit vi" on pretty much every social platform.
Welcome to the show, Eve.
Thanks for having me.
Yeah, I've been following you since the TikTok days.
I haven't been on TikTok for a while, but when I found you on TikTok and then I found
you again on Instagram, I think your skits are hilarious and topical for me, and I just
love your insight.
So I wanted to reach out and have you on the show.
Well, I really appreciate it.
Even everything down to the user manual.
Also, your shirt is really rad.
Your shirt is really rad, too.
Thanks.
And I know you talk a bit about Kubernetes, you talk a bit about cloud, you talk a bit
about platform and CI.
Can you give us a little background on what you've done in the past, like kind of how
you came into tech and what you're doing?
Yeah.
So how I came into tech: I had a whole career path before tech where I wanted to be a diplomat,
and then I decided not to do that.
My formal education is in mathematics, so I moved back to the States from living abroad
for several years and tried to look into what can I do with a math degree that isn't becoming
an actuary or an accountant.
And I saw a YouTube video that was using machine learning to estimate coffee prices in response
to drought conditions in Brazil or something, and I was like, I want to do that.
So I went like the data analysis, data science path, which got me more into data engineering
at my first job, which was like a very small early startup.
And then the person who had run our Kubernetes cluster and who had managed our deployments
left and I stepped into that role and they were like, this is your job now.
I started my TikTok.
This is actually a fun fact is that from my TikTok, I got my next job in tech, which was
at the New York Times.
I was there for four and a half years, but it started with someone leaving a comment
on one of my TikTok videos saying, I'm hiring DevOps engineers at the New York Times, add
me on LinkedIn.
And I was like, this is fake.
Yeah.
And it wasn't.
Wow.
And then I was, um, I worked on our, like, CI/CD team.
I was the technical lead and subject matter expert for CI.
And then I was, toward the end of my tenure, the engineering lead for application delivery
over the company.
And then I left.
And for the last three months, I've been a senior platform engineer at like an AI health
tech.
Um, I guess like late stage startup, later stage.
What made you want to start doing skits on like TikTok and Instagram?
A desire for attention.
And also just that there was a lot in tech that I found challenging that I have always
had the desire to laugh at things that I find challenging or find humor in them.
So taking sort of frustrating experiences or having to learn all of the things all at
once all the time and turning that into a skit.
And then people liked it.
So I kept making them.
That's the realest answer though.
Like people will be like, like they'll come up with some like answer that you're like,
that sounds nice.
But you know what I mean?
Like that was like a real answer.
But also I don't think you know how much like, I think that really helps when you're in tech
because things can be so frustrating.
Right.
And a lot of times we're like taught to like pretend like you know everything and like
fake it before you make it and like pretend like you know all the things and it's impossible
to know all the things.
So like I think sometimes when you're having a really frustrating day like I was talking
to a bunch of college students and I was like it's like mostly failure and then one good
win and then failure and like a couple good wins you know and like when you see someone
else who's like this process is horrible or like this is like being able to relate like
sometimes like Justin will send me like an Instagram meme and I'm like you made my whole
day because I was so frustrated in that moment you know and like you just feel so seen.
So I think that's actually like I think those do a lot more for people than people think
it does you know.
Yeah.
My role puts me very close to production all the time.
And so I feel like my life is just living on the highs of fixing a really cool bug and
then the lows of causing a production incident just over and over.
So you have to find humor in it.
A circle of life right there.
How would you say, I mean, you're now like five or six years into doing this, from
probably a pile of YAML that is now your responsibility to being able to kind of guide
companies and plan things out and have a little more influence.
How have things changed for you in those areas from when you first started,
"What the heck is Kubernetes?", to now, you know, platform engineer, like building and planning
this stuff?
Yeah.
I think the most recent sort of revelation that I had is that, like, really nothing is
ever going to look like the textbook.
I think for a long time I was like, this is what CI/CD should look like, and we should
all be doing GitOps, and we should all be doing trunk-based development, and we should
all be making really frequent tiny changes that just get released to prod.
And then when I joined this company, one of the reasons they hired me was for my
CI/CD expertise and sort of, like, standardizing their pipeline deployment workflow.
And I came in and I was like, oh, we're not going to be able to get there without
a bunch of steps in between.
And people talk about incremental change a lot in software development, but just
acknowledging the incremental change between the platonic ideal of a Kubernetes cluster
with no configuration drift and the actual system. I think, going back to not everyone
can know everything:
I think a lot of companies do ask application developers to also know a lot about infrastructure
a lot about Kubernetes a lot about like their build test deploy pipelines.
And I can't come in being like well why didn't you like architect this better because they
architected it as best they could.
And I'm also just architecting it as best I can.
And I think now where I'm in a position where I'm making these decisions largely by myself
like I'm writing design docs and taking feedback and talking to folks but I'm now the one sort
of executing and moving and changing and I'm like man I hope this doesn't blow up in my
face in some way down the line.
I've been doing this long enough that I always expect it to blow up in my face. That's
kind of one of my mindset changes over 25 years of doing this: it's
all a matter of how big it's going to blow up in my face and on what timeline
it's going to blow up in my face.
Right.
Like at some point I know it's going to sound bad though but I think that sometimes because
you're a dude when it blows up in your face it's okay.
Oh yeah.
Like when you're a girl and it blows up in your face they're just like we knew you couldn't
do it.
You're just like dude like this is a normal hiccup.
One hundred percent true that yes I have the benefit of the doubt but also I try to give
other people that same leniency and benefit of the doubt.
Which is important, because I think, like, that's what the article you
brought up a few months ago said, like the safety of your work environment and being
able to, like, experiment and fail, you know? Like, I think psychological safety is
the key indicator of high-performance teams.
Yeah absolutely.
And and having that psychological safety of like this is going to fail at some points
and also it's always easy to like dump on what someone else did from the past and say
like oh why didn't you just design it better.
Right.
It's like, well, how things get designed changes, tools change, people's knowledge changes
over time. Like, all of this stuff is growing and changing.
And it's like sometimes I look at the Git commit and I'm like oh that was me.
I think that was I did do that.
Git blame gets real sometimes like but I think what Eve was saying something like I think
the longer I've been in this industry when I am interviewing with a team and you're kind
of interviewing them back or kind of just talking to people at conferences some of the
stuff that you've just mentioned just how I know that I want to work with someone when
they talk about how they you know like have done things wrong or how they've realized
that you're coming into this new environment and you're trying to figure out why things
are done this way and like how you're going to get it from point A to point B and you
know it's not going to be easy because when somebody comes in and they're like it's going
to be so easy we're going to rewrite everything I'm like okay you're one of those people like
when people come in and they don't ask why something was built a certain way or they
just think they're going to like get a wand and magically make something completely different
and I'm just like you didn't ask enough questions to be confident in that idea you know what
I mean so everything is easy that shows your expertise except and then you just end up
missing that important context of why those decisions were made or how you're going to
get from point A to point B, and then just being like, let's switch it over. Yeah,
that's never going to go wrong. Totally. Eve and I were talking in the pre-show about, like, all
the details, right? You're finding these obscure little bugs that are affecting you or your
team in various ways, that are buried in a GitHub comment or a comment in the code or
something. Like, you have to find those things and know they exist. It's like being a detective,
that's what cracks me up I was reading one of the books that your wife sent me and like
I was thinking about like being an engineer is kind of just being a detective like there's
so much context that you have to get and you have to know what context you need and how
much to get into the weeds and how much not to and it's like I think that's the fun part
for me but I was thinking that's how you know Eve's good at her job because she was like
how do I get from this to that and we know that like once you start working with production
and scale that you're like the textbook looks like a baby compared to what I just got thrown
into like I remember when the first time I saw Jenkins pipeline that looks so like ridiculous
that it was like the class that I was taking like just they were not the same beast you
know and you're just like now you really have to figure it out so I think that's how you
know that you're really good at your job because you like walked into it and you're like oh
no. Also, talking about obscure bugs: this was not an obscure bug, this was in
the documentation and I just missed it. So I created a new sort of build-once,
deploy-many style of deployment where you just promote artifacts to wherever you
want them in Argo, and Argo picks them up. And I had a team come back to me and they were like,
well, how do we do a hotfix? And I was like, oh, you don't do a hotfix. But I have to support
hotfixes, so I was like, well, you check out the commit of the last
production artifact that you built, and then you make a PR on that, and then you take that
build and you ship that. And then I caused a production incident telling teams to do
that, because it turns out that when you use, like, the github.sha in a GitHub Action
on a PR, it creates a merge commit with main and just merges whatever is at the tip of
main with this PR. You have to use, like, the PR head SHA. And I was like, how was I supposed
to know that? And then it was literally in the documentation for GitHub Actions. But
see like but be real nobody reads all of the documentation you have to be able to scan
it and know what you need and you might not have been looking for that thing you know
what I mean but yeah I was I was going through it with a co-worker and and he was sort of
like well they don't say that in the documentation then he was like and then he was like well
they could have surfaced that better it's like thanks for supporting me that that's
a good co-worker okay like you need someone that you can cry with you can that's your
your double eyes to look over your PRs and documentation and be like me too bro you know
what I really love about that is, I bet the person who wrote that line in the docs also
had the outage. They're surfacing it exactly where they think it should be because they
already broke it, and they're like, no one else will ever do this again because I wrote
it down. And no, it's not how it works. It's funny because I feel like people keep saying,
like well we don't need to build documentation anymore because it gets outdated and I'm like
like I feel like we've got like lost the art of documentation we just assume that like
it'll be there or that like AI will write it later and I'm just like I think that we
forget how important those things are like the amount of times I've heard that like with
people building things in production just like like you you still need to write things
down like I write things down for myself I will write the code and then forget about
it later so I'm just like
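
For the hotfix incident described above, here is a minimal sketch of picking the commit to build. It assumes the documented GitHub Actions behavior that on `pull_request` events `GITHUB_SHA` points at a temporary merge commit of the PR with the base branch, while the PR's own tip is in the event payload under `pull_request.head.sha`. The function names and the sample payload are hypothetical, not anything from Eve's pipeline.

```python
import json
import os


def resolve_build_sha(event_name: str, event: dict, github_sha: str) -> str:
    """Return the commit a build should be tagged with."""
    if event_name == "pull_request":
        # On pull_request events GITHUB_SHA is the merge commit of the PR
        # with the base branch, so it can pull in unreleased changes from main.
        return event["pull_request"]["head"]["sha"]
    return github_sha


def resolve_from_environment() -> str:
    """Read the values GitHub Actions exposes to a running job."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        event = json.load(fh)
    return resolve_build_sha(
        os.environ["GITHUB_EVENT_NAME"], event, os.environ["GITHUB_SHA"]
    )


# Hypothetical payload and SHAs, for illustration only.
sample_event = {"pull_request": {"head": {"sha": "abc123"}}}
pr_sha = resolve_build_sha("pull_request", sample_event, "merge999")
push_sha = resolve_build_sha("push", {}, "def456")
```

Inside a workflow job, `resolve_from_environment()` would be the entry point; the two calls at the bottom only show the PR and push cases side by side.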
Eve, I'm curious about your opinion on the current state of infrastructure deployment and sort
of especially how it relates to Kubernetes, which is kind of this weird way of doing things.
How do you feel the differences are between something that we would traditionally call
CI/CD, something we call GitOps, and something we call platform engineering? Where do you
draw the line between those things, and how do you use a definition? Based off of that
face you're making, like, it's just... So, man, I think it's hard, or harder, to do kind of, like, platform
engineering in a GitOps way, where at least when I think of GitOps I'm thinking specifically
of something like Argo CD watching and reconciling with whatever the state in Git
is. And when I think about, like, Terraform, you're not really watching
something, you're probably triggering with a merge or with a promotion of
some kind. But I would say that... oh, this is a very hard question for me.
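
The watch-and-reconcile versus trigger-on-merge distinction can be sketched as the same diff-and-apply step run on two different schedules. This is a toy model under stated assumptions: state is a flat dictionary, and `diff`, `apply`, `git_state`, and `live_state` are hypothetical names, not Argo CD or Terraform APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    key: str
    action: str  # "create", "update", or "delete"


def diff(desired: dict, actual: dict) -> list[Change]:
    """Compare declared state with live state and list what must change."""
    changes = []
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            changes.append(Change(key, "create"))
        elif key not in desired:
            changes.append(Change(key, "delete"))
        elif desired[key] != actual[key]:
            changes.append(Change(key, "update"))
    return changes


def apply(desired: dict, actual: dict) -> list[Change]:
    """Make the live state match the declared state; return what changed."""
    changes = diff(desired, actual)
    for change in changes:
        if change.action == "delete":
            del actual[change.key]
        else:
            actual[change.key] = desired[change.key]
    return changes


git_state = {"replicas": 3, "image": "app:1.4"}
live_state = {"replicas": 5, "image": "app:1.4", "debug": True}  # manual drift

# Watch-style (a GitOps controller): run apply() on every tick of a loop.
# Trigger-style (apply on merge): run apply() once per merge or promotion.
first_pass = apply(git_state, live_state)
second_pass = apply(git_state, live_state)
```

The first pass removes the manual `debug` setting and resets `replicas`; the second pass finds nothing to do. The practical difference is only how long drift survives before the next pass.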
I'm really curious. You're in the trenches doing it, and I come at it mostly from, like, an
academic definition side of things, where OpenGitOps has a, I think it's like
opengitops.dev or whatever, like they have a definition of what GitOps is, of how it was
defined, how Weaveworks and Alexis, like, defined it back in the day when they coined the term, and
it was different than CI/CD and DevOps. What was that? What's the difference between GitOps and
DevOps? I actually gave a keynote several years ago at GitOpsCon that was called "The
difference between DevOps and GitOps is writing it down," but it was a play on MythBusters: the
difference between, like, science and screwing around is writing it down. Yeah,
absolutely, absolutely. I came to ask this question. Yeah, it was just us taking MythBusters
GIFs and putting, like, text behind them and being like, "use the main branch."
No, I just think those are the best talks because it
relates something interesting to, like, what you're trying to teach. So going back to the
question, I would say that platform engineering is the foundational
building blocks that then allow app developers to do CI/CD, because I think CI/CD relies on
infrastructure being as close as possible to each other across environments. So you do want
something that reconciles, like Terraform, that watches state, so that if I go in and I
create, you know, a new ECR repo or I change some settings on something, I don't know, I can't
think of examples off the top of my head, but I go in, I change an auto scaling group in AWS,
that gets overwritten. So I'm not expecting that in staging just because I've manually
changed it in dev, and then the next time I sort of say, like, Terraform, make everything nice again
and make it as close as possible across all of these environments, so then I can have a more
predictable application deployment experience. I think that really used to bite people, when
there have been a bunch of manual changes that weren't documented anywhere, or that were a part
of, like, testing or debugging and then forgotten about and not updated, that make it
really hard to do application deployment well. And I guess infrastructure can also, I think,
reach into, like, monitoring, but I think of it mostly as sort of the building blocks
of getting, you know, Argo set up, Argo, like, app of apps or project to projects, and that sort
of configuration, your Terraform, your basics there that you can then sort of CI/CD application code
on top of that's really interesting your distinction here which if I'm understanding
what you're saying correctly like the platform engineering piece is like what the developers
the application developers are interfacing with behind that is usually something like an Argo
which has a as a closer reconciliation loop to what they're doing right it's closer to
the applications it's doing the application deployment but then the CICD the Terraform
piece is underlying the foundation for the Argo and so it's like you have these layers of
the further away from the developer you go the less frequently you have to change it necessarily
and the more, I won't say stable, but you don't necessarily need Terraform in a for loop, right?
Terraform can be: we only change this when we update ECR, we don't need this reconciled every
minute, right? But the application and the Argo, the application, the containers, however you're
defining that in Kubernetes, that is: hey, we don't waste the dev's time, so we need to make sure
that they also get that loop to get it faster, and that is where webhooks and constant polls
come into play. Yeah, I think platform engineering and platform has really,
today I think when people think of platform engineering they think a lot of like an
internal developer platform so like a developer comes to the platform team or a UI and they say
I want to make a new app and the platform team is sort of like here you go and it's all like nice
and packaged and it's like you don't have to worry about like setting up a namespace or setting up
like you just have to worry about like your application code and then we'll integrate it
with the cloud or with Kubernetes or set up all your Terraform hopefully it's documented too like
that's always, like, every platform. Like, one of the best things I did at one of my previous jobs was I
said, okay, here's an example application. I wrote the app, you deploy it on our stack, and I gave everyone
three days. I was like, you have to deploy it, the entire thing. And it was a platform team, we
had three groups. I did the infrastructure, we had a deployment team, we had a
monitoring, a CI team, and everyone knows their piece really well, but no one knew everyone
else's, and we didn't document it enough. And so I was like, here's the app, you figure out how to
containerize it, you deploy it. And not a single person out of all 12 people managed to do it in
time. And I was just like, that's eye-opening, because we assume our developers are figuring it out and we
don't even know how the whole thing goes together to deploy it in that amount of time. And it all comes
back to documentation, right? Like, you should be writing those docs to give people, like, you can
give them a package that says, here, this should be easy, but everyone has something they need special
or different, or some configuration they don't know where it exists. Yeah, I think an issue that I ran
into at my new company is that there wasn't a really good way to do rollbacks, and so a lot of
teams were doing rollbacks of their, like, Kubernetes ReplicaSets via the Argo UI. But then, because it's
Argo, it would just redeploy whatever was in Git, and it would be like, haha. And I'm like, why is that
even there in Argo? But so teams were then turning off autosync, but then they'd forget that
they turned off autosync, and they'd be like, hey, we made changes but those changes aren't showing up
in Argo. And I'm just like, oh man. You really, when you're working on a platform team, when you're
trying to focus on developer experience, you have to make the correct thing the easiest thing to do.
I think that's really important. I could write, like, a super complicated CI/CD process
for deployment and rollback that's, like, beautiful, and that I could be like, this is a textbook and
I'm gonna take this to conferences, and no one would use it. They'd all be like, we hate this,
we're doing the old thing, we'd rather face the Argo autosync issue. So you just have to make it,
whatever it is, it has to be easy to understand and easy to do, or else developers aren't going
to do it. And I don't blame them for that. I also try to do the easiest thing possible at all times.
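
The rollback trap Eve describes can be illustrated with a small simulation. This is a sketch under stated assumptions, not how Argo CD is implemented: the `App` class, its methods, and the image tags are all hypothetical, and "Git" is reduced to a list of desired images.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    git_history: list = field(default_factory=list)  # desired image per commit
    live_image: str = ""
    autosync: bool = True

    def commit(self, image: str) -> None:
        self.git_history.append(image)
        self.sync()

    def sync(self) -> None:
        """With autosync on, the live state is reset to the latest commit."""
        if self.autosync and self.git_history:
            self.live_image = self.git_history[-1]

    def ui_rollback(self, image: str) -> None:
        """Manual rollback in the cluster; Git still says otherwise."""
        self.live_image = image

    def git_rollback(self) -> None:
        """Roll back by committing the previous image, like a git revert."""
        self.commit(self.git_history[-2])


app = App()
app.commit("app:1.0")
app.commit("app:1.1")  # bad release

app.ui_rollback("app:1.0")
app.sync()  # next autosync pass
after_ui_rollback = app.live_image

app.git_rollback()
app.sync()
after_git_rollback = app.live_image

# Turning autosync off makes a UI rollback stick, but later commits
# stop showing up until someone remembers to turn it back on.
app.autosync = False
app.commit("app:1.2")
after_forgotten_autosync = app.live_image
```

In this model the UI rollback is undone by the next sync, the Git rollback persists, and a forgotten autosync switch silently hides the `app:1.2` change, which is the argument for making the Git path the easiest one.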
How much do you think that comes down to just what's the default, right? Like, the
default setting for a thing. I always heard the phrase "the tyranny of the defaults," right?
Whatever is the default setting is what is put on people, whether good or bad, and if,
you know, the default is don't have sync, no one's going to have sync, right? Even if it's the
best way you should use it. Yeah, I think that's very true. I think
there are things that I've seen, like, across my career where I sort of come in and
I'm like, well, if you want to do it this way, why are we using Argo? If you don't care about
syncing, let's just deploy to a Kubernetes cluster and we can push new Kubernetes manifests,
and you're gonna have the same result, except you're not gonna have a little Argo UI with
all of your objects. But, like, why use Argo? Or I've worked with teams who had, like, a canary
rollout, but there was no kind of testing on that, so they just had a rollout over, like, say,
three minutes, and they were like, well, whenever we have to do a rollback or a roll forward, then we
have to go through this rollout again and it really slows us down. And I was like, why do you
have this rollout? It's not doing anything functionally worthwhile that you get
with a canary, where you're testing some of the traffic and making sure there aren't issues
and then rolling back. I'm like, if it's just three minutes to have some gates on your
ReplicaSet as it rolls out, Kubernetes does that, or Argo does that automatically with a
new, like, it makes sure the new one can come up before it brings the old one down. So why do you
want canary? And they're like, oh, well, canary is the best practice, the fancy thing to
do. And I'm like, yeah, but it's not doing anything for you. Yeah, sometimes I think
why do we have this is definitely like a staff level question that's like like hey why is this
why does this exist and no one answers it like yeah you just got a promotion right there I think
that sometimes people get so in love with either the academic version of something or like the
this is how we should do it the fancy way that they don't even ask the questions like that or
just in general like sometimes people will make this really complicated version of how to solve
something, and you're like, you could have done that with a cron job and some, like, glue. Yeah, like,
I mean, Kubernetes in general, right? People deploy Kubernetes just because
it was the guide, right? They read a guide, a blog post, whatever, and they're like, oh, this said to do it.
Like, it's a WordPress blog, you can put that on a VM, like, it's okay. Sometimes I almost
think that that's why I like I enjoyed platform and release engineering more than just software
engineering because platform and release engineering is like how do we get this done
and it needs to work and sometimes like working with just software engineering is like really
theoretical and it just you go in circles for so long that I'm just like can we just build something
that works. Yeah, the bridge between code and software is super fascinating. Like, when
someone wrote something and then it reaches a customer is a very complicated mess. And then
sometimes software engineering can be so disconnected from actual customers, after being
a solutions architect or a pm you know so it's like you just see how like sometimes these jobs
and these titles make you so siloed and it's so different how you interact with code and the actual
shipped product you know what do you like I guess what do you think now I mean you're obviously
been more in infrastructure and you know you're very successful at that but how do you see like
I guess the industry different from going from like more of like mathematics and data science
to like infrastructure like is there anything that you weren't expecting or anything that you
end up liking more now that you're working with more platform and not so much like data and numbers
I don't think so I think what something from when when I got my mathematics degree is that I had to
do some computer science courses and I absolutely hated them and I was like I never want to code
ever I think because mathematics you have it's it's very clear how to get from point a to point
b which is you have your axioms and your laws and your theorems and then you're like proofs and I
think coding I went into a coding class and we were trying to like solve like trinomials or
something and I was like I don't know where to start and my professor in office hours was just
like just start somewhere and I was like I can't do that like what do you mean just start somewhere
and I do feel like platform engineering has has more of that structure still that's like closer
to mathematics than traditional like back-end development where you're just kind of like how
am I going to do this and there are algorithms and data structures available but you just kind
of like decide what you want and start going for it I do think that platform engineering has
has a little more of that like lawful structure that I liked from mathematics but I'm not I'm
not sure there's anything that has been surprising to me maybe I was I was studying for some like
data data structure and algorithms questions like when I was interviewing for a new position
and I did have a moment where I was like oh man algorithms are just math I was like I never
thought of that before this is just graph theory um that might be the only the only sort of like
wow moment of something that like probably should have been obvious to me but had a different name
so I figured they weren't the same different domains different names different terms for the
same sort of thing right like the implementation might be slightly different but yeah it is just
it's just the same thing that you were already learning you write it down to tell the computer
how to do it yeah and I learned math backwards yeah because I was like math seems scary I always
thought math seemed scary until you like actually like I learned it backwards from learning computer
like engineering concepts and then I was like this just like rules and structure it's funny you
mentioned the computer science class too because it I was at university and the only class I ever
dropped was a systems engineering class because I literally had no idea what they were talking
about it was it was specifically Unix like they were teaching Unix in the early 2000s and I went
for two weeks and I was like I haven't understood a single word this professor said like I have to
drop this class and it was just so foreign to me because I at the time I didn't even have a computer
like I had no no computer of my own I was never into programming I didn't know any of this stuff
and I was like I don't know what they're saying and so I just have to get out of this class and
maybe I'll learn it later but then coming at it from a different angle of a very practical
standpoint of like I need to get my wi-fi drivers working on Linux and I have no internet access
right now really back in the day like that was the only way I could get online no smartphone to look
things up and I'm like I have to figure this out from man pages is a different type of systems
engineering that was more practical and because I could start somewhere it made it much more
approachable for me to say, oh, I know how to do this side of it. Yeah, I didn't drop mine,
but I hated my first computer science class, because it was, like, web design and it was
Dreamweaver, and I absolutely hated it. I think, like, when you get a math degree, at some point,
if you're doing applied math, I guess not, but my degree was much more theoretical,
and I remember I took, it must have been number theory, where we learned about, like, prime number
encryption, and I was just like, when am I ever gonna need this? And then I was like, oh. Awesome.
are there any things that you see people do that are considered best practices that you're like
that's just dumb, like, don't do it? Like, you mentioned promoting artifacts from one environment to
another, and I've always known that as a, like, oh yeah, you don't have to rebuild because
the artifact's the same, you should change the SHA in your manifest or something, like promote it that
way or retag it. And I've always kind of felt like that was a weird anti-pattern, where it's like, no,
I'm going to rebuild because I'm going to go to a different repo, like I'm going to put it in a different
ECR or something. I don't want production pulling directly from the dev environment's container
registry, because I'm going to break something there, so I have to separate these things anyway.
So I'm not promoting the artifact, I'm actually rebuilding in a new environment, and I hope that
it's reproducible, but in many cases containers aren't reproducible, right? Like, you're going to
get a different update for some library or something, so it's not the same thing.
Ideally it would be, but I often see that it's more work to try to do real artifact promotion
than it is to just, like, we're just gonna let CI go and hope and pray that this is close enough
when we roll it out.
Yeah, I think it's really common to build at each environment, and I think one of the reasons that
the build once, deploy many best practice is a best practice is, one, you're deploying the exact
same thing that you tested. So you tested it on dev, you put it on staging, it's the same thing. You
test it on staging, you put it on prod, it's the same thing. So you're not getting, you know,
something flipping zeros and ones somewhere every 10th artifact or something, so it's very slightly
different. But I think also it saves time. One of the things that I hear the most from teams is that
their CI pipelines are too slow, and they're like, it takes way too long to run all our tests. And
I'm like, well, on the one hand, I have literally worked with a team who, like, part of their CI
tests was they opened an SFTP server and transferred all these files in SFTP, and I'm like, yeah.
And they're like, no, we're gonna leave that in. And I'm like, okay. But I do think that builds take
a really long time, and I've recently run into this, where I had never really spent too much time
looking closely at how teams write their Dockerfiles, but what I have found since I started doing
that is that many Dockerfiles are basically, like, one layer. And then they're like, well, maybe we
can cache so it will be faster. And I'm like, it's not gonna cache anything if it's one layer.
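
Editor's sketch: the point about single-layer Dockerfiles can be illustrated with a toy model of
layer caching. This is not Docker's actual implementation. It only assumes that each build step's
cache key depends on the step itself and on every step before it, and the step names and the
"@v1"/"@v2" suffixes (standing in for a checksum of the copied files) are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per build step; each key also covers every earlier step."""
    keys, parent = [], ""
    for step in steps:
        parent = hashlib.sha256(f"{parent}\n{step}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(previous_build, new_build):
    """List the steps of new_build whose cache key was not produced by previous_build."""
    cached = set(layer_keys(previous_build))
    return [step for step, key in zip(new_build, layer_keys(new_build)) if key not in cached]


# Hypothetical builds: one giant step versus the same work split into several steps.
one_layer_v1 = ["install deps + copy src@v1 + compile"]
one_layer_v2 = ["install deps + copy src@v2 + compile"]

split_v1 = ["base image", "copy requirements@v1", "install deps", "copy src@v1", "compile"]
split_v2 = ["base image", "copy requirements@v1", "install deps", "copy src@v2", "compile"]

# With one layer, a source change invalidates the only step, so the whole build reruns.
print(rebuilt_steps(one_layer_v1, one_layer_v2))
# With split layers, the dependency install stays cached and only the later steps rerun.
print(rebuilt_steps(split_v1, split_v2))
```

In this model, putting the slow, rarely changing steps first is what lets a cache save any time.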
The whole, like, the cache takes the layers that aren't changed, and then all the layers that you
build on top of that, it rebuilds those. That's the point. And so caching isn't gonna change
anything. But so I think it's more standard to build and write your code in a way where you're not
hard coding any environment configuration into the container, into the built artifact, so it can go
through. But I would say mostly for consistency and then secondly for time. It can be more complex
to sort of either move the image from your dev ECR to your stage ECR to your prod ECR, or have,
like, a central ECR that all environments can reach out to and grab. I don't have a strong opinion
about re-tagging. I know a lot of my colleagues who I've worked with who were, like, SRE specialists
are like, oh, we hate re-tagging because then it messes up the metrics and the tracing. I don't
really care personally, I'm not an SRE. Go somewhere else for your uptime. But I think for me it's
mostly the consistency and the time saved, that now that artifact is built. And also I think if you
aren't doing really frequent deploys, it is difficult to get back the state. You would have to, I
guess, check out the commit that you wanted to promote and then rebuild from that, rather than just
have that artifact built at the time that that commit was made, and then it's just there.
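
Editor's sketch: a minimal model of "build once, deploy many", assuming in-memory stand-ins for the
dev, stage, and prod registries. The `Registry` class and the `promote` and `release` helpers are
hypothetical names for illustration, not a real registry API. The artifact is identified by a
content digest, the same bytes move between registries, and environment configuration stays outside
the artifact.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory registry mapping a content digest to image bytes."""
    name: str
    images: dict = field(default_factory=dict)

    def push(self, image: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(image).hexdigest()
        self.images[digest] = image
        return digest


def promote(digest: str, source: Registry, target: Registry) -> str:
    """Copy the exact bytes behind digest to target, without rebuilding anything."""
    pushed = target.push(source.images[digest])
    if pushed != digest:
        raise RuntimeError("promoted image does not match the tested digest")
    return pushed


def release(digest: str, environment_config: dict) -> dict:
    """Pair the unchanged artifact with configuration supplied at deploy time."""
    return {"image": digest, "config": dict(environment_config)}


dev, stage, prod = Registry("dev"), Registry("stage"), Registry("prod")

# Built once, at the time the commit was made.
digest = dev.push(b"application image built from one commit")

# Later merges to main do not matter: the tested artifact moves forward by digest.
promote(digest, dev, stage)
promote(digest, stage, prod)
print(release(digest, {"LOG_LEVEL": "warning"}))
```

Because the digest is derived from the content, production can only receive what was tested earlier.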
And so even if you're making tons of changes on main, that's not blocking you from taking this
artifact that was built in the past and moving it through to production. Which is another issue that
I've seen frequently, is teams, you know, they're like, oh, we're gonna do a release tonight, no one
merge anything, so that our code base is where we want it to be. And then someone doesn't see Slack
and they merge something in, and they're like, oh god, now we have to run all the tests again, now
we have to revert this. It's so much more complicated than just, like, we built this artifact at
this time and now it's just there forever.
Or you forget a commit that you had to put a patch on top of something and you're like third all the
way after a 12-hour build.
Yeah, yeah, that one cherry-pick that you're like, oh, I don't think it's making it. I'm not doing
this again. Take the patch, we need it.
No, I remember, this is why it used to take, like, three days to build. Like, you can't mess that
up, because then you're missing your SLA for sure.
A lot of my content and skits where I talk about CI, I feel like I bring in, like, is your CI
opening an SSH server and SFTPing the entire internet? And I just feel like I bring that back over
and over again because I just remember this one team that I was like, why are you doing this? Which
is only seconded by a team that would stand up an entire Postgres server and test all of their
migration changes against this ephemeral Postgres server in CI and then break it all down. And they
were like, we need our builds to finish in under two minutes. And I was like, you need to stop doing
that. They were like, you can't stop doing that. And I was like, okay.
I feel like when
you, like, being a solutions architect is very much learning the best practices, you know, and then
going from that to being an engineer in production, and I was like, oh, none of this is, they're
like, you can't do that or you'll break everything. And you're like, yeah, I will give you the
diagram in the AWS blog posts that does not look like anything in reality. Oh my god, like, I
remember I got on one team and they were a data team, and everything is like, never ever test in
production with huge tables of data, and they're like, we're just gonna drop this table really
quick. And I'm like, what do you mean? Or they're like, we just made a test and we just copied a
whole Redshift table over, and I'm like, do you know how much Redshift, what do you mean? I spent,
like, the first six months of that job just horrified the whole time. It's like, you can do this?
I think I got to a point where, again, my job puts me very close to production, and teams who are
having issues with CI or need to do a quick deployment or rollback will often pull me into
incidents. And there was a point when I was so comfortable with working on production that I would
just sort of get on production and look around. And one day my manager sent me a Slack about some
changes that I was making in dev, that I was doing the proper way, and she was just like, I just
want to make sure, are you doing this on production right now? And I was like, oh, I need to take a
step back from how frequently I'm on production if people are actively like, you're doing this the
right way, right?
Do you, I feel like it's like the videos or skits that you make that
I'm imagining, like, I haven't seen all of them, but sometimes I feel like that's like engineer
therapy. Like when someone else has the same problem or they've seen something absolutely
ridiculous, like after you have a bad day or you have something go on, and then you see it and
you're like, oh my god, someone else goes through the same problem. It makes you feel so much
better.
I completely agree, and I feel like, if you were talking about the difference between DevOps and
GitOps, right, DevOps was all about making that blameless, right? DevOps was always about the
culture. It was always about the people make mistakes but we're trying our best sort of thing. And I
honestly feel like GitOps, not GitOps specifically, but just the era of GitOps, has moved us away
from caring about the people and the culture and more about just the tools and execution of things.
And even myself, as a very privileged white dude in tech, me bringing down something in 2026 is a
lot different than me bringing down something in 2014, in just the way that that's handled. And
unless I blame it on AI, it's like the only way I can escape out of this today, of like, oh well, I
told Claude to do it and, whoops, it did something wrong. That's the only thing that I think is
like, this is okay. Still, is that a problem for juniors and people getting in the industry and
people that are making their first production whoops? Like, that is a rite of passage for a lot of
people, and knowing that other people do it is reassuring that you can get better and you can still
learn from it and you can still be involved in this community and still provide value even if you've
taken down production.
Yeah, I think it's something that more senior
engineers need to do more explicitly now that, like, there are fewer, or with GitOps, I guess,
ostensibly there are fewer production outages that are happening because you're like, oops, I went
and changed this and didn't write down that I changed it. I think a blameless culture is still
really important. That's something that I screen for when I'm interviewing, as I talk about, you
know, how often are people causing incidents, what does the incident response look like, what do
your retrospectives look like. And I think it truly is just like an exposure therapy experience of
causing an issue and hopefully being in a job with mentors around who are like, oh yeah, I took down
prod in the past too, that's okay. And not, you know, whenever I post a skit that has something to
do with taking down prod or an outage that happened, inevitably there are comments being like, yup,
someone got fired today. And I think really, at the majority of companies, that's not the case,
unless it's some gross negligence thing or you purposely went in and you were like, I'm gonna delete
our Kubernetes production cluster. Then I think you might get fired. But I think, you know, with
GitOps, there needs to be a greater sort of explicit effort in telling juniors, like, hey, it's
okay, this happens all the time. And even I, like, I just caused this slight incident related to the
merge commits that were happening in CI that I wasn't expecting, and I sent a colleague a message on
Slack and I was like, this is my fault. And he just sent me back, like, blameless, with a smiley
face. And I was like, yeah, but it's my fault. Like, it is, and also I can understand that it's a
human thing that happens, and especially happens in software development. But I think it's really,
really important to try to keep the culture of the truly blameless retrospective, where you're just,
yeah.
One of the things about, like, the DevOps era
of operations was that someone pushing the key that took down production often would turn around to
say, actually, we're all to blame, because, again, the tyranny of the defaults: the easy thing to do
should be the right thing. But also this thing should be protected, right? Like, there should never
be a button that can take down production, right? Those sorts of things. Like, I've told the story
before, when I took down Disney Animation because I told every server that we had to go update
Google Chrome. Like, that should be a thing that wasn't possible. I did hit the key, it was my
fault, but also it was at a time when we said, actually, we should maybe put some more safeguards
around that. Like, maybe we shouldn't have thousands of connections go out to Google at once. We
should be able to block some of that, or at least verify, like, are you really sure about this?
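
Editor's sketch: one way the safeguards described here could look, assuming a hypothetical
`plan_fleet_command` helper. The name, the default batch size, and the confirmation threshold are
illustrative assumptions, not the tooling from the story. A fleet-wide command is refused above a
size threshold unless explicitly confirmed, and it is split into batches so thousands of hosts do
not act at once.

```python
def plan_fleet_command(hosts, batch_size=50, confirm_above=100, confirmed=False):
    """Split a fleet-wide command into batches, refusing large runs without confirmation."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(hosts) > confirm_above and not confirmed:
        # The "are you really sure about this" check: the easy path is the safe path.
        raise PermissionError(
            f"{len(hosts)} hosts targeted; pass confirmed=True if you are really sure"
        )
    # Batching limits how many hosts reach out at the same time.
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


# Hypothetical fleet of 1000 hosts.
fleet = [f"host{i}" for i in range(1000)]

try:
    plan_fleet_command(fleet)
except PermissionError as error:
    print(f"blocked: {error}")

batches = plan_fleet_command(fleet, confirmed=True)
print(len(batches), "batches of at most", max(len(batch) for batch in batches), "hosts")
```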
Those are things that have gone away, and at least I don't see them anymore.
I don't see people publicly talking about it, though. Like, I think, because if you think about it,
so I think you can't ignore the fact that we've all been through layoffs, we've all been through,
you know, like, the job market's tight, right? So there's so much more pressure. And then when you
add on the fact that every company is like, use as much AI as you can, and then use this, and you
need to be as productive as possible and turn out, like, the deadlines are tighter, people are
working with much less. There hasn't been headcount in four years for anything, you know what I
mean? So we're all learning to use AI, we're learning to do it with less, and I think it's also,
like, how long can that be sustainable? You know, when we were doing it in the first six months or
the first year, it was like, okay, hopefully it won't be this way forever. But now we're, like, four
years into doing as much as we can with no headcount and surviving, you know what I mean, on vibes
and coffee or whatever. So I think when you add all those different factors of, like, you're pushing
teams to the brink, you're incorporating AI, you're changing these processes, you want them to be 10
times more productive, and then there is no blamelessness, so, you know what I mean, it just seems
like we're just going to compound the amount of mistakes that could have been learned from, but
instead they're going to be either hidden or not talked about, or people are going to instead have
to fear getting in trouble. Like, we went back to, like, the safety in the workplace, you know what
I mean, like the mental safety, like safe environment. So I think those things are not just not
happening, being blameless, but they're almost compounded, and I think we're going to see what
happens in the fallout from that.
Every time the industry goes through a shift and people are relearning and retooling, there's always
learning pains, outages, things that we make assumptions about how they worked before. This happened
when we started moving to automation for, you know, config management. It happened when we were
starting to move to cloud, moving to containerization. AI is not different. AI is going to be the
exact same thing, I think.
But I think it's different, though, because before, we've invested billions of dollars into saying
that this fixes all your problems, it does it for you, and we can get rid of humans, right? So with
this product, people are more interested in this product being perfect than we are into, like,
giving developers grace, because right now developers are expendable, right? So they're already
like, we don't need you, we can fire you and use this thing for you. So I think when we went into it
with automation or Kubernetes or whatever, the, like, CI/CD, it was like, hey, we're going to try
this new thing and it might not work, or we might go through some bumps, but, like, we understood
where we were, right? This new thing, people are more interested in it being perfect and it solving
the world's problems than admitting it's a tool. Whether you like the tool, love the tool, hate the
tool, it's a tool, right? And it's good. And using it at this scale for this, like, increased
productivity, even if it was 100 percent perfect, there would be, like, a learning curve, right? So
the fact that we're taking it and, like, you have to admit, at this point it doesn't matter what
company it is, they are more interested in you using it and pretending like it's going to solve all
the problems than the actual, like, grace and, like, how we're going to figure this out and how do
we recommend it being used. Which is one thing I, their paper on how to use and how they're going to
incorporate it into their work and how they're going to help their developers use it, top tier. I
think Oxide is going to continue to push the industry in the right direction, and that's why their
company is so successful, because that seems like just an intelligent, logical thing to do. And I
think people are too busy just ignoring the fact that we have to learn how to work with AI and
change the way that we're being productive and kind of come up with a new process, and just saying
it's a magic box. And it's like this elephant in the room, that I think we could grow so much by
just figuring out, like, hey, how do we use this in production, how do we actually teach people how
to use it? But nobody's talking about it. It's like the elephant in the room: just use it and shut
up, and it's your fault
if it breaks.
Yeah, and I can really agree with you. Like, the shift has been the same where people are involved,
but the pressure of it today is way worse. And there was a little bit of that, I think, when we
moved to cloud, because that was on people's credit cards, and we're like, oh yeah, your Lambda
function ran wild and now my credit card got maxed out. Like, that was an outage that would happen
that caused problems for people, that was more serious than, like, oh, a server crashed, right?
Like, oh no, I have to pay this bill now. But that's more of being accountable financially, right?
Like, it's being accountable, but it's still, like, you know, shit happens. And the AI,
AI is propping up the United States economy right now. That's what I'm saying. Nobody's going to be
like, this is going to be, like, a, you know, a trial and error. People are just like, it's magic
and you must use it, you know? But I also think, like, anything is going to take time to make it a
process, and nobody's, like, we've talked about how execs love it, we've talked about how it vibe
codes some cool stuff, but nobody's talked about how you get from there to production. Nobody. It's
the elephant in the room. We're not talking about how to train juniors to use it, how to make sure
you're learning and it's not making you just dumb. Like, it could actually probably be more
profitable if we would figure out how to get from point A to point B. Like, it's not even, like,
shade. If we would figure out how to actually make it a thing. I don't know.
So you see a role for AI in this, in CI/CD and platform
space?
Yes. I mean, I think I, like many engineers do, view AI as a tool, executives notwithstanding, being
like, this is productivity magic and will replace people, and, like, the doom and gloom of
late-stage capitalism come to fruition. I view it as a tool. When I spoke at KubeCon EU several
years ago about OPA and Conftest, ChatGPT was out and was terrible, and I remember being like, write
me a Rego policy, and it gave me all this stuff that is invalid code, and I was like, hmm. And I
think now, I mean, who knows where AI will be in May when this comes out, but I think now it's,
like, Opus 4.6 just came out. I've found real value with that tool. But I do think that the
downside, we're not really united, and as a career or even at a company level, it's hard to unite in
how people are using AI, I think because there are people who really hate AI and are like, I'll
never use AI, and then people who are like, I really love AI. They're like, well, I'm just gonna
stay quiet. Or the people who really hate AI are like, well, I'm just gonna stay quiet. And so
we're, like, not, yeah, it's definitely an elephant in the room. I do think another issue is just
that AI is so verbose, and so then people are opening these PRs with, you know, again, it's the
early startup nightmare of, like, the one engineer who in the middle of the night changes 400 files
and writes, like, 10,000 lines of code, and it's just like, please, like, looks good to me, thumbs
up, approve. Yeah, I think AI is really similar. I don't know the answer or the solution or even how
to start to get there. Probably, like, speak about it more openly, I think. And as employees in
tech, like, you know.
I also think it's like,
what you just said, speaking about it openly, isn't it wild? Like, in this podcast we've talked
about GitOps, DevOps, SRE, like, all of these things. There's been some sort of process. Whether
it's the right process, the wrong process, they've argued the process to death. But, like, in normal
developer way, no one is arguing the process in which to use AI in production, and we would get
value from that, because that's how our industry works. We come up with a process, somebody else
rips your process apart, and then they name it five different things even though you're doing the
same process, and you fight it out somewhere, you know what I mean? Like, nobody's doing that. Why
are we not doing that with this thing that's supposed to change everything? Instead of deciding if
we're going to use it or not use it, you hate it or whatever, kick the tires and figure out how we
can use it in a safe way, that we can teach junior engineers, we can make it secure.
I will say the way I've been using it is, it's improved my POCs dramatically, where historically I
wrote almost every POC I ever did in Bash. Like, the first time I wrote something, it was just in
Bash because I was familiar with it.
I can't deal with you. Like, you just, like, you are like, I'm gonna pick the one thing that
everybody hates.
And I'm gonna use the one commonality that, like, I knew how to write it. And, like, but I was,
like, full-on websites and APIs in Bash. Like, I was just like, you know what, like, I don't, until
I got to a certain point, like, I would move over.
Like those JavaScript people who are obsessed with JavaScript, but with Bash. It's just wild.
I could do it, I could do it. But, like, I've been able to do so much more, not because of my
limitations but because I know roughly how the thing should look, and be able to fill in the
details. Like, I wrote a mobile app, like, I vibe coded a mobile app as a POC.
Oh no.
No, no, not in Bash. With AI.
He would try it, though, Eve. Like, if anybody would try it, he would try it. It would be, like,
JavaScript with a back end of Bash.
Yeah, but, like, I vibe coded a mobile app to see if the POC would work,
so that I could show the process and the value to get budget to pay a person, right? Like, that was
my thought process and use case here: I want something to exist, I need to show why and how it might
work, and I need to at least prove it out, but beyond just a description, right? Like, here's a
paper of how this thing's, like, no, no, I'm going to show you a video of what I think should be
possible in this thing, and I will show 10% of it, right? Like, maybe not everything works, but we
get there, right? And it took me about a day and a half to vibe code it. It cost me $40 in Claude
Code, of like, okay, this is enough sessions that, there we go, let's put this out there, record a
video, and now I can go through the process of, hey, here's the thing that I want to exist, I found
a consultancy or a freelancer, or hire on a full dev if we think it's valuable enough. Here's the
budget we want to set aside for it. We want to prove this out as something that we can support and
make it possible. And granted, they can use AI in it as well, right? They're going to use some of
that, but they're going to have much more knowledge in how it should be set up and how it can be
maintained than me just, like, let's see if this one works and get it working enough to deploy to
my phone.
But don't you think that in itself is a better value prop than, we can fire everybody and then we'll
just vibe code everything and it'll be great? Like, you still saved money by making this thing more
robust but faster. You proved that it would be something that's productive and worth spending the
money on. Why is that never the value prop when we are selling these things?
And for me, I think the promise is bigger, right? Because the promise is, in
a year I don't need to hire someone, right? And when we're on Opus 10, then it's like, okay, give me
the full app.
I've tried to, like, debug a 4,000-line, like, it's rough. Like, I think Claude's a game changer,
but it's rough. Like, yeah, but to me, like, I can go and start a proof of concept three different
ways, figure out why one way is the best way, and then it helps me to get started when I'm
deer-in-the-headlights overwhelmed with something, you know, and helps you break down and plan and
go really in depth and then go pull me sources. Then I can go read those sources. It makes you so
much more efficient, and it does, it's really helped me not to be stuck as much, you know?
I think there's a really good space for AI in incident response, of looking over all the logs
everywhere forever and being like, this might point you more toward a solution. I think that would
be an application of it that I'm sure someone is, I'm sure there's some startup somewhere.
Yeah, it's really good at logs.
Well, think about it, the human eye, right? Like, our eyes get tired watching line to line to line.
You're more likely to miss something. So if you can get something to at least narrow it down a
little bit, then you can go in there and, like, it's like the perfect use case, really. Like, it's
one of the really good use cases.
I just posted on Bluesky, like, the other day, of like, we spent trillions of dollars to make
computers read their own error messages, right? Like, that is the whole point.
I was just like, you're not wrong.
I have used AI a lot and
found a lot of value in it, actually, for soft skills, for, like, forming messages, or copy-pasting
some Slack messages and being like, is this person being, like, rude to me or am I just reading into
that? And then also being like, here's my response, can you make it, like, not rude or mean or,
like, I hate this person? And then Claude's like, you should hate this person. And I'm like, thanks.
Let me tell you, Claude will make you feel better about some things. Like, you're absolutely right.
And uses it to, like, reply to, like, her ex-husband, and I'm just like,
one time I used it when my partner and I had a fight, and I was just like, he doesn't understand.
And Claude was like, well, look at this from his perspective. And I was like, thanks, therapist
Claude.
I'm just like, I went from, like, how do people use AI as a therapist, to, like, okay, okay, I see
how you went down the, like, wrong road. Maybe I'm not gonna go through it, but, like, it's good for
taking the emotion out of it and then listing some logic for you. Like, it's not bad. It's also a
good hype man. Probably too good for a few people, but, like, yeah, a little. You have to be careful
that you're not, you know, there are real dangers there. I think we're gonna be okay, but some of
those red pill podcasters, I'm like, I can see how this would be really bad for you.
Yeah, as a woman in tech, I'm using it for the confidence boost of, like, you're not an imposter,
and they're using it,
exactly, we're like, no, you don't have that problem, please stop.
It's so real. I felt it in my soul.
We're right
about time. Eve, I'm curious, is there any advice you would give to someone coming into whatever we
want to call the platform engineering, CI/CD, DevOps, whatever we want to call this space, of
something that they should or shouldn't be doing, something they should try to look at, or even just
how to get started and have some more, I don't want to say authority, but some better designs on the
real-world implications of how we ship code to customers?
Oh, I would say there are a couple things. One is, and maybe this isn't universal, but it's
absolutely true for me, is, like, just go do something. As much as I disliked my computer science
class where my professor was like, just go do something, just go do something. Just go, like, write
a Kubernetes namespace spec and then a pod spec and use, like, Kubernetes on Docker Desktop to,
like, play. I have always learned best by doing and actually using my hands to type and make
something. I'd say the second thing is, if you are lucky enough to have someone in your life who is,
you know, a senior engineer, or you're a junior at a company, or you have friends who graduated a
couple years before you and are now working, is to get comfortable saying that you don't know, and
asking for help. I think I was very lucky in my first job that I had an incredible mentor who was
like, don't stay stuck for too long. Like, don't stay stuck for more than half an hour before
sending me a Teams message and being like, I don't know what's happening. I think it's really
valuable to get comfortable with the feeling of not knowing. And then from admitting, even if you're
just admitting to yourself, like, I don't know, you can take it to Claude and no one will ever have
to know, if you're like me and, like, I'm a very introverted person who's like, I don't know what
I'm doing. It might not know the answer.
But it gets you unstuck, though.
Yeah. But just get comfortable with not knowing, or maybe, like, get comfortable with discomfort,
because I feel like there is a lot of that in tech, which is constantly changing and you're
constantly learning new things.
Shout out to mentors like that, because they are, like, the make it or break it, especially, I
think, in women's careers, because it's so much harder to be able to come from that place of
imposter syndrome, and people kind of already assume that you don't know what you're talking about.
So I think those mentors are just really, really important.
What was your handle again, and how did you come up with it?
Help me exit vi. And again, like, I came up through, I was not, like, a dev. I was doing data
analysis, and so the first time that I had to commit something to Git, I was like, how do I get out
of this? Like, where am I? Help me. And I think it's a joke that everyone understands. I think one
of my most popular videos on YouTube is 10 ways to exit vi, which is because it shows up in Google
searches. It's just like, yeah, no, here's 10 ways you can get out.
Did you really find 10 whole ways?
Yeah. Oh, there's more than that, actually. People left comments. It was just like, I had no idea.
One time I was trying to show my partner how difficult it is to exit, and he's a data scientist, and
I was like, try to exit this. And he just, like, closed the terminal. And I had had a tab, like, a
different tab that I was actually doing something on, and I was like, oh, the joke is on me. That's
my fault.
Oh, oh no. When the joke goes wrong.
Yeah, yeah.
I think that's, like, so relatable. But also, like, do you just wonder, is it some mean gotcha that
they don't teach you Git or, like, vi in, like, school? Like, are they trying to hurt people's
feelings? Or is that just to make sure the rate of dropout is, like, not increasing? It's the
gatekeeper. Like, and then they're just like, we hired an intern and it knows both, and I'm just
like, is it like someone told them the secret? Like, that's, like, unfair.
Eve, thank you so much for coming on the show. We will have links to your handles in the show notes.
Everyone can check it out. And thanks, everyone, for listening, and we will talk to you again soon.
You're gonna get, like, a bunch of likes on your, like, Instagram, and they're gonna be, like, from
me going through all your videos and, like, relating. I was gonna go back through the backlog.
Yeah. Thank you so much.
Thank you for listening to this episode of Fork Around and Find Out. If you like this show, please
consider sharing it with a friend, a co-worker, a family member, or even an enemy. However we get
the word out about this show helps it to become sustainable for the long term. If you want to
sponsor this show, please go to fafo.fm/sponsor and reach out to us there about what you're
interested in sponsoring and how we can help. We hope your systems stay available and your
pagers stay quiet. We'll see you again next time.