Episode 201

Published on:

10th Mar 2026

201: What I ACTUALLY Do as a Data Analyst

Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!

I'm a senior data analyst with 10+ years of experience and I'm breaking down exactly what I did, what tools I used, and what problems I solved across very different industries.

💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://datacareerjumpstart.com/newsletter

🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://datacareerjumpstart.com/training

👩‍💻 Want to land a data job in less than 90 days? 👉 https://datacareerjumpstart.com/daa

👔 Ace The Interview with Confidence 👉 https://datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS

00:00 – What nobody tells you about data analyst work

01:00 – Predicting refinery outcomes with math models

04:05 – When data analytics meets machine learning

07:00 – Finding needles in millions of log files

09:23 – How one analysis ended up driving marketing & sales

🔗 CONNECT WITH AVERY

🎥 YouTube Channel

🤝 LinkedIn

📸 Instagram

🎵 TikTok

💻 Website

Mentioned in this episode:

🚀 March Cohort: Data Analyst Bootcamp (Starts March 9th)

Ready to break into data analytics? Our March cohort kicks off with a live call on March 9th at 7pm ET where you'll meet your peers and mentors on day one. Save 20% when you enroll now, plus get two free bonuses: 6 months of Data Fairy (your AI co-pilot through the bootcamp) and a bonus course, "The AI-Proof Analyst: Why Thinking Still Wins." Claim Your Spot → https://datacareerjumpstart.com/daa


Transcript
Avery Smith: I'm a senior data analyst with 10-plus years of experience. What did I do in those 10 years? What tools did I use? What problems did I solve? That is the topic of today's episode, and I'm gonna tell you everything so that you know what to expect as a data analyst in the future. I've had a really vast career where I've worked for one of the biggest oil and gas companies in the world, and I've also worked for a 10-person biotech startup that you've never heard of before. So let's get into it.

By the way, if you're new here, my name is Avery Smith and I try to share useful data content that will help you start your data career. If that's of interest to you, you gotta check out my newsletter. 30,000 other aspiring data analysts are already subscribed. Go to datacareerjumpstart.com/newsletter or find the link in the show notes down below.

So the first company I wanna talk about is ExxonMobil. What was it like being a data analyst and a data scientist at ExxonMobil? Obviously this is one of the biggest companies in the world. There's like 70,000 employees and they do a lot of different things. Now, I worked in the downstream part of the business, which basically means the refiners. These are the people that are taking oil and turning it into gasoline, essentially.

And what did we do there as data analysts? Well, we tried to make a mathematical model of every single part of the refinery, and I don't think this is, you know, groundbreaking to those who are in the oil and gas business or any sort of manufacturing business. If you can create what's called a digital twin, or like a math twin, of your process, you'll be able to experiment with the math model instead of experimenting in real life. So you can be like, well, if I twisted this temperature, or I changed this pressure, or we, you know, added this new oil, what would change? Would we make more money? Would we make less money? What would go well? What would go poorly? Instead of actually experimenting in real life, you can experiment with these simulations, with your data model, and that way you don't actually have to do it in real life.

Now, to create these models, there's lots of different ways that you can do them. I'm not getting into the nitty-gritty of, like, modeling these types of things. But when you think model, the simplest version that you can think of in your head is linear regression. And if you're not familiar with linear regression, you definitely learned it in school. It's the simple thing of y equals mx plus b. That's the simplest form. So basically you have an input, an x. Based upon your input, can you predict what the output is going to be? If it, you know, is a linear relationship, you'll be able to have the slope, that's the m, and some sort of a y-intercept, and basically guess what the output, the y, is going to be based on the x.

Now you can make that a lot more complicated. You could do multivariate linear regression, which is like y equals m1 x1 plus m2 x2 plus m3 x3. Oh, it's so confusing. But my whole point here is, like, we were doing these mathematical models, and the simplest form that you can think of is linear regression.
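To make the y = mx + b idea concrete, here is a minimal Python sketch using NumPy. The data values and variable names are made up for illustration and are not from the episode. It fits a one-input line by least squares, then fits a two-input version that also includes an intercept b.

```python
import numpy as np

# Hypothetical data: one input x and a measured output y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Simple linear regression: find slope m and intercept b for y = m*x + b.
m, b = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Guess the output y for a new input x using the fitted line."""
    return m * x_new + b

# Several inputs at once: y = m1*x1 + m2*x2 + b.
# Synthetic, noise-free data generated from known coefficients.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y_multi = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0

# Append a column of ones so the intercept is fitted alongside the slopes.
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y_multi, rcond=None)
m1, m2, b_multi = coeffs
```

Calling `predict(6.0)` then gives the model's guess for an input it has never seen, which is the "experiment on the math model" idea in miniature.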

So I created a lot of these models as a data analyst. And I also used data analytics to try to understand our simulation results better. So we'd actually run dozens, hundreds, thousands of simulations trying, you know, different things. Well, what if this pressure went up by a little bit, or this temperature went down? To actually look at a thousand different results is really hard to do. So we used data analytics to try to understand the results a little bit better.
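As a rough illustration of running many what-if simulations and then summarizing them, here is a small Python sketch. The `simulate_margin` function is a hypothetical toy formula standing in for a process model, and the temperature and pressure ranges are invented. Nothing here reflects an actual refinery model.

```python
from itertools import product
from statistics import mean

def simulate_margin(temperature, pressure):
    """Hypothetical toy model: margin peaks at temperature 350, pressure 20."""
    return 100.0 - 0.02 * (temperature - 350.0) ** 2 - 0.5 * (pressure - 20.0) ** 2

temperatures = range(300, 401, 10)  # 11 settings to try
pressures = range(10, 31, 2)        # 11 settings to try

# Run every combination of settings and keep one record per simulation.
results = [
    {"temperature": t, "pressure": p, "margin": simulate_margin(t, p)}
    for t, p in product(temperatures, pressures)
]

# Summaries that are easier to read than the raw list of runs.
best = max(results, key=lambda r: r["margin"])
avg_by_temperature = {
    t: mean(r["margin"] for r in results if r["temperature"] == t)
    for t in temperatures
}
```

The per-temperature averages are the kind of aggregate that could feed a dashboard chart instead of asking anyone to read every individual run.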

And a lot of this was done in a Power BI dashboard, so I used a lot of Power BI dashboards right there. And to do the modeling, we actually did a lot in Excel, believe it or not, and we did a lot in Python, and we even used a more proprietary software that you don't hear about a whole lot. It's from SAS. It's called JMP, J-M-P, to do our modeling.

So those are the tools that we were using at Exxon, and the problem that we were trying to solve is basically: hey, if we wanna make changes inside of our huge manufacturing system, can we actually come up with a way to test it before testing it in real life, so we can kind of know what to expect? I think that's common for, you know, manufacturing. I think that's common for any sort of time series data you might have: if you can create a model, it's useful for the company to be able to predict the future and figure out what's going to happen. A lot of the time this type of analytics is called prescriptive analytics, where you're not trying to predict what's going to happen in the future, but trying to decide, if you make these changes, how will the system basically be affected?

The next data job I wanna talk about was when I was a data analyst at this nano biotech startup, like, think 10 people when I joined the company. This company made really cool nanosensors. So think of it as almost like a Game Boy game, like from the olden days; that's like the size of this little board. And on this board there was a bunch of different sensors this, you know, chemistry company had built. And the sensors would basically react to what was in the air, and we would track how their electricity, basically, or their amperage or their current through these different sensors would change when these different chemicals in the air hit it.

So, for example, if you were holding it in the air, you know, all the lines would be kind of stagnant. But let's say you brought an orange next to it: it would basically smell the orange, and each sensor would react differently to that orange being nearby. And when you have an array of these 12 different sensors, you can basically create the equivalent of a fingerprint, but for smells. So think of it as like a smelling device that would basically take smell prints.

My job as a data analyst there was to actually look at the time series data, 'cause we'd run these experiments where you'd have basically background noise for a certain amount of time, and then you'd introduce something like an orange for maybe 30 seconds and then take the orange away. And we'd look at these time series, and we were trying to use this time series data to actually create these smell prints. And that's a very difficult thing to do. It actually, most of the time, took machine learning.
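One simple way to turn such a recording into a smell print is to compare each sensor's average reading during exposure with its average during the background period. The Python sketch below assumes that approach; the episode does not say how the prints were actually computed, and the function name, array layout, and numbers are all hypothetical.

```python
import numpy as np

def smell_print(readings, baseline_end, exposure_end):
    """Return each sensor's mean change from background to exposure.

    readings: array of shape (time_steps, n_sensors) holding sensor current.
    Rows before baseline_end are background; rows from baseline_end up to
    exposure_end were recorded while the sample (e.g. an orange) was present.
    """
    baseline = readings[:baseline_end].mean(axis=0)
    exposure = readings[baseline_end:exposure_end].mean(axis=0)
    return exposure - baseline

# Hypothetical recording: 3 sensors, 4 background rows, then 4 exposure rows.
readings = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.5, 1.0, 3.0],
    [1.5, 1.0, 3.0],
    [1.5, 1.0, 3.0],
    [1.5, 1.0, 3.0],
])

print_vector = smell_print(readings, baseline_end=4, exposure_end=8)
```

Here the first sensor rises, the second falls, and the third does not react, so the resulting vector is a compact summary of how the whole array responded.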

So once again, this is maybe a more advanced data analyst role, 'cause in most data analyst roles you're not really using machine learning. This type of machine learning is often called classification, where you're basically trying to match data to a certain category based off of its data. So for example, I could bring an apple near it, right? And the sensors would react. Maybe they'd all go down, and if I brought an orange next to it, maybe all the sensors would go up. And so you can come up with some sort of an algorithm that would be like, okay, if the sensors go down, it's an apple. If they go up, it's an orange.

Now that's really oversimplifying it, because apples and oranges, those are only two things that exist in the universe, right? There's like so many different things that exist. We were playing for a little bit bigger stakes. You can think of it like when you go through the TSA line and sometimes they, you know, swab you and they're trying to see if you have any drugs or any bombs on you. That was kind of the stakes that we were playing with in some of our use cases. So I would take this data that oftentimes, you know, was time series based. We usually had like 12 to 16 to 24 different sensors on there. And I would try to make these smell prints using classification models in machine learning.
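As a toy version of that classification idea, the Python sketch below uses a nearest-centroid rule: average the known smell prints for each label, then assign a new print to the closest average. This is one of the simplest classifiers and is only an assumed stand-in; the episode does not name the models that were used, and the training numbers are invented.

```python
import numpy as np

# Hypothetical smell prints: change in current for each of 3 sensors.
# Apple readings go down and orange readings go up, as in the example above.
training = {
    "apple": np.array([[-1.0, -0.8, -1.2], [-0.9, -1.1, -1.0]]),
    "orange": np.array([[1.0, 0.9, 1.1], [1.2, 1.0, 0.8]]),
}

# One "typical" print per label: the mean of that label's training prints.
centroids = {label: prints.mean(axis=0) for label, prints in training.items()}

def classify(smell_print):
    """Return the label whose centroid is closest to the given print."""
    return min(
        centroids,
        key=lambda label: np.linalg.norm(smell_print - centroids[label]),
    )
```

For instance, `classify(np.array([0.8, 1.1, 0.9]))` lands nearest the orange centroid. With many more substances and noisier sensors, a rule this simple would likely struggle, which fits the point that the real task was hard.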

Now, a lot of the time I was doing this in Python. Python's great for doing things in machine learning. There were even some simple algorithms that I created that were based in Excel, but they were pretty simple. The more complicated stuff, I was doing in Python at the time. Also, just because we were doing a lot of these experiments, SQL would've been really helpful. We weren't actually using SQL as much as we should have. Looking back on it, we really should have been using SQL a little bit more.

The third experience I wanna tell you about was when I was running my own data science consultancy firm, and I got hired by a cybersecurity company to help them with a few things. So obviously we live in this digital age. Cybersecurity is really important, so there's a lot of opportunity in cybersecurity. And the interesting thing about cybersecurity is a lot of the data is, like, hidden in logs, because basically anything you do online, anything you do on the internet, gets logged one way or another. Like, it's in there. They're capturing everything, but when you capture everything, you're kind of capturing nothing at the same time, because it's really hard to figure out what's the signal amongst so much noise.

And so this company in particular was basically getting a bunch of internet logs for companies in what you can consider their workspaces. So for instance, all of their Microsoft logs, all of their Google logs, if they're using Slack, their Slack logs, maybe their employee customer history. Just think of anything a company might be interested in from a cybersecurity standpoint. We were just getting a bunch of the logs. Now in these logs, there's maybe little needles in the haystack. There's maybe little gems that can be pulled out. It requires a lot of analysis to try to figure out what's in there. Just imagine you're getting like a ton of hay and you have to find this little needle.

And so my job was to go in there and try to see if there were any needles, anything that was really worth diving into and investigating more, and also just summarizing everything that was happening. This is how many logins you had on Google today. This is how many, you know, logouts you had on Microsoft. This is how many users you had from these different states. Just, like, from these giant enterprise organizations where they have thousands of employees and a bunch of things going on, how do you know everything's going okay? Are you sure that everyone is where they say they are? Are you sure you don't have any intruders, you know, people accessing stuff from a place that they probably shouldn't? Those types of things.

So we were basically taking these huge dumps of logs that weren't really important, that weren't really interesting, and aggregating them and trying to find the interesting things, and then also making sure that nothing nefarious was going on. To do that analysis, I was actually using all Python, but I could really choose what tool I wanted to.
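To give a feel for that kind of log summarizing, here is a minimal Python sketch using only the standard library. The record fields, user names, and the `expected_state` lookup are all hypothetical. Real workspace logs would need parsing first, and a real check for unusual access would be far more involved.

```python
from collections import Counter

# Hypothetical, already-parsed log records.
logs = [
    {"service": "google", "event": "login", "user": "ana", "state": "UT"},
    {"service": "google", "event": "login", "user": "ben", "state": "TX"},
    {"service": "microsoft", "event": "logout", "user": "ana", "state": "UT"},
    {"service": "google", "event": "login", "user": "ana", "state": "NY"},
]

# Hypothetical lookup of where each user is expected to be working from.
expected_state = {"ana": "UT", "ben": "TX"}

# Summary 1: how many of each event type per service.
events_per_service = Counter((r["service"], r["event"]) for r in logs)

# Summary 2: which distinct users were seen in each state.
users_by_state = {}
for r in logs:
    users_by_state.setdefault(r["state"], set()).add(r["user"])

# Possible needles: records where the user is not where they are expected.
suspicious = [r for r in logs if expected_state.get(r["user"]) != r["state"]]
```

In this made-up data the only flagged record is the login from NY, which is the sort of item worth a closer look rather than proof of an intruder.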

I just chose Python personally because I'm very comfortable in Python. I'm decently good at Python, and I can do things quickly with Python. I probably couldn't have done this as easily in, like, Excel. You probably could have done similar stuff in SQL if you wanted to. One thing I really like about Python is it can do anything; maybe not extremely well, but it can do anything. So I was doing all my analysis in Python and I was creating data visualizations in Python.

They even used a lot of the insights I found, like in terms of aggregates. They basically aggregated all of their customers' data and would publish a yearly or biannual report of cybersecurity incidents, with graphs that I was creating and some of these KPIs or metrics that I was monitoring. That way they could kind of inform the cybersecurity field and all of their customers about the trends and what we were seeing from a big-picture standpoint. And that was actually really useful, 'cause people would start to read that and be like, oh, I really like this company. I wanna work with them. And that would bring in new customers. So even though I was doing that analysis for individual customers at an individual level, that analysis actually ended up being really useful for their marketing team as well, to get more sales and more customers in the pipeline.

Now, I've actually worked for way more than just these three companies. I've probably done work for about 12, including the Utah Jazz, Harley-Davidson, and some other really big names like MIT. If you want to hear more about those, I'll be talking about them more in my newsletter. So you can subscribe at datacareerjumpstart.com/newsletter, and I'll be talking more about these experiences in the newsletter. But if you want me to talk about it on the podcast or YouTube as well, let me know in the comments down below, and maybe I'll do some future episodes on that if we get enough comments. As always, thanks for watching and I'll see you in the next one.


About the Podcast

Data Career Podcast: Helping You Land a Data Analyst Job FAST
The Data Career Podcast: helping you break into data analytics, build your data career, and develop a personal brand

About your host


Avery Smith

Avery Smith is the host of The Data Career Podcast & founder of Data Career Jumpstart, an online platform dedicated to helping individuals transition into and advance within the data analytics field. After studying chemical engineering in college, Avery pivoted his career into data, and later earned a Master's in Data Analytics from Georgia Tech. He's worked as a data analyst, data engineer, and data scientist for companies like Vaporsens, ExxonMobil, Harley-Davidson, MIT, and the Utah Jazz. Avery lives in the mountains of Utah where he enjoys running, skiing, & hiking with his wife, dog, and newborn baby.