201: What I ACTUALLY Do as a Data Analyst
Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!
I'm a senior data analyst with 10+ years of experience and I'm breaking down exactly what I did, what tools I used, and what problems I solved across very different industries.
Join 30k+ aspiring data analysts & get my tips in your inbox weekly: https://datacareerjumpstart.com/newsletter
Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training: https://datacareerjumpstart.com/training
Want to land a data job in less than 90 days? https://datacareerjumpstart.com/daa
Ace The Interview with Confidence: https://datacareerjumpstart.com/interviewsimulator
TIMESTAMPS
00:00 - What nobody tells you about data analyst work
01:00 - Predicting refinery outcomes with math models
04:05 - When data analytics meets machine learning
07:00 - Finding needles in millions of log files
09:23 - How one analysis ended up driving marketing & sales
CONNECT WITH AVERY
YouTube Channel
LinkedIn
Instagram
TikTok
Website
Mentioned in this episode:
March Cohort - Data Analyst Bootcamp (Starts March 9th)
Ready to break into data analytics? Our March cohort kicks off with a live call on March 9th at 7pm ET where you'll meet your peers and mentors on day one. Save 20% when you enroll now, plus get two free bonuses: 6 months of Data Fairy (your AI co-pilot through the bootcamp) and a bonus course, "The AI-Proof Analyst: Why Thinking Still Wins." Claim Your Spot: https://datacareerjumpstart.com/daa
Transcript
Avery Smith: I'm a senior data analyst
with 10 plus years of experience.
2
:What did I do in those 10 years?
3
:What tools did I use?
4
:What problems did I solve?
5
:That is the topic of today's episode,
and I'm gonna tell you everything
6
:so that way you know what to expect
as a data analyst in the future.
7
:I've had a really vast career where I've
worked for one of the biggest oil and
8
:gas companies in the world, and I've
also worked for a 10 person biotech
9
:startup that you've never heard of
10
:before. So let's get into it.
11
:By the way, if you're new here, my
name is Avery Smith and I try to
12
:share useful data content that will
help you start your data career.
13
:If that's of interest to you, you
gotta check out my newsletter.
14
:30,000 other aspiring data
analysts are already subscribed.
15
:Go to datacareerjumpstart.com/newsletter
16
:or find the link in the
show notes down below.
17
:So the first company I wanna
talk about is ExxonMobil.
18
:And what was it like being a data analyst
and a data scientist at ExxonMobil?
19
:Obviously this is one of the
biggest companies in the world.
20
:There's like 70,000 employees and
they do a lot of different things.
21
:Now, I worked in the downstream
22
:part of the business, which
basically means the refiners.
23
:These are the people that are taking oil
and turning it into gasoline essentially.
24
:And what do we do there as data analysts?
25
:Well, we tried to make a mathematical
model of every single part of the
26
:refinery, and I don't think this is,
you know, groundbreaking to those who
27
:are in the oil and gas business or
any sort of manufacturing business.
28
:If you can create what's called
like a digital twin or like a math
29
:twin of your process, you'll be able
to experiment with the math model
30
:instead of experimenting in real life.
31
:So you can be like, well, if I twisted
this temperature, or I changed this
32
:pressure, or we, you know, added
this new oil, what would change?
33
:Would we make more money?
34
:Would we make less money?
35
:What would go well?
36
:What would go poorly? Instead of actually
experimenting in real life, you can
37
:experiment with these simulations with
your data model, and that way you don't
38
:actually have to do it in real life.
39
:Now to create these models, there's lots
of different ways that you can do them.
40
:I'm not getting into the
nitty gritty of, like,
41
:modeling these types of things.
42
:But when you think model, the simplest
version that you can think of in
43
:your head is linear regression.
44
:And if you're not familiar
with linear regression, you
45
:learned it definitely in school.
46
:It's the simple thing
of Y equals MX plus B.
47
:That's the simplest form.
48
:So basically you have an input,
49
:an X.
50
:Based upon your input, can you
predict what the output is going to be?
51
:If it, you know is a linear relationship,
you'll be able to have the slope that's
52
:the m and some sort of a y intercept,
and basically guess what the output
53
:the Y is going to be based on the X.
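A minimal sketch of the y = mx + b idea described above, using only the Python standard library. The data points and the function names (`fit_line`, `predict`) are made up for illustration and do not come from the episode; the fit uses ordinary least squares, which is one common way to estimate the slope and intercept.

```python
# Fit y = m*x + b by ordinary least squares, standard library only.
# The data points below are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Guess the output y for a new input x."""
    return slope * x + intercept


# Hypothetical inputs (x) and outputs (y) that lie on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
print("slope:", m, "intercept:", b)
print("prediction for x = 5:", predict(5.0, m, b))
```

With real, noisy data the points would not sit exactly on a line, and the fitted slope and intercept would only approximate the relationship.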
54
:Now you can do that a
lot more complicated.
55
:You could do multivariate linear
regression, which is like y equals
56
:M1 X1 plus M2 X2 plus M3 X3.
57
:Oh, it's so confusing.
58
:But my whole point here is like we
were doing these mathematical models,
59
:and the simplest form that you
can think of is linear regression.
60
:So I created a lot of these
models as a data analyst.
61
:And I also used data analytics to try to
understand our simulation results better.
62
:So we'd actually run dozens,
hundreds, thousands of simulations
63
:trying, you know, different things.
64
:Well, what if this pressure went up by a
little bit, or this temperature went down?
65
:To actually look at a thousand
different results is really hard to do.
66
:So we used data analytics
to try to understand the
67
:results a little bit better.
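As a rough sketch of what running many what-if simulations and then summarizing them can look like: the two-input linear model, its coefficients, and the temperature and pressure values below are all invented for illustration, not real refinery numbers or the models used at the company. The example runs one simulation per combination of settings, then picks the best result and averages results by temperature.

```python
from itertools import product
from statistics import mean

# Hypothetical two-input linear model:
#   profit = m1 * temperature + m2 * pressure + b
# The coefficients are assumptions made up for this example.
M_TEMP, M_PRESSURE, INTERCEPT = 0.5, -2.0, 100.0


def simulate(temperature, pressure):
    """Stand-in for one run of the math model."""
    return M_TEMP * temperature + M_PRESSURE * pressure + INTERCEPT


temperatures = [300, 320, 340]
pressures = [10, 12, 14]

# Run one simulation per combination of settings.
results = [
    {"temperature": t, "pressure": p, "profit": simulate(t, p)}
    for t, p in product(temperatures, pressures)
]

# Summarize: which settings look best, and how does profit vary by temperature?
best = max(results, key=lambda r: r["profit"])
avg_by_temp = {
    t: mean(r["profit"] for r in results if r["temperature"] == t)
    for t in temperatures
}

print("runs:", len(results))
print("best settings:", best)
print("average profit by temperature:", avg_by_temp)
```

With thousands of runs, summaries like `best` and `avg_by_temp` are the kind of aggregates that would typically feed a dashboard rather than being read row by row.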
68
:And a lot of this was done in a
Power BI dashboard, so I used a lot
69
:of Power BI dashboards right there.
70
:And to do the modeling,
71
:we actually did a lot in Excel, believe
it or not, and we did a lot in Python
72
:and we even used a more proprietary
software that you don't hear about a whole lot.
73
:It's from SAS.
74
:It's called "Jump," JMP, to do our modeling.
75
:So those are the tools that we're using
at Exxon, and that's the problem that
76
:we're trying to solve is basically,
hey, if we wanna make changes inside of
77
:our huge manufacturing system, can we
actually come up with a way to test it
78
:before testing it in real life so we can
kind of know and expect what to happen?
79
:I think that's common for,
you know, manufacturing.
80
:I think that's common for any sort of
like time series data you might have
81
:is if you can create a model, it's
useful for the company to be able
82
:to predict the future and be able to
figure out what's going to happen.
83
:A lot of the times this type of
analytics is called prescriptive
84
:analytics, where you're actually like
trying to not predict what's going
85
:to happen in the future, but trying
to decide, if you make these changes,
86
:how will the system basically be affected?
87
:The next data job I wanna talk about was
when I was a data analyst at this nano
88
:biotech startup, like think 10 people
89
:when I joined the company. This
company made really cool nanosensors.
90
:So think of it as almost like a Game
Boy, uh, game, like from the olden days,
91
:that's like the size of this little board.
92
:And on this board there was a bunch
of different sensors this, you
93
:know, chemistry company had built.
94
:And the sensors would basically react to
what was in the air and we would track
95
:how their electricity basically,
or their, their amperage or their
96
:current, through these different
sensors would change when these
97
:different chemicals in the air hit it.
98
:So, for example, if you were holding
it in the air, you know, all the
99
:lines would be kind of stagnant.
100
:But for example, let's say you
brought an orange next to it, it
101
:would basically smell the orange.
102
:And each sensor would react differently
to that orange being nearby.
103
:And when you have, uh, an array of
these 12 different sensors, you can
104
:basically create the equivalent of
like a fingerprint, but for smells.
105
:So think of it as like the smelling device
that would basically take smell prints.
106
:My job as a data analyst there was to
actually look at the time series data,
107
:'cause we'd run these experiments where
you'd have like basically background
108
:noise for a certain amount of time
and then you'd introduce something
109
:like an orange for maybe 30 seconds
and then take the orange away.
110
:And we'd look at these time series and
we're trying to use these time series data
111
:to actually create these smell prints.
112
:And that's a very difficult thing to do.
113
:It actually most of the
time took machine learning.
114
:So once again, this is maybe a
more advanced data analyst role,
114
:'cause in most data analyst roles,
115
:you're not really using machine learning.
117
:This type of machine learning is often
called classification, where you're
118
:basically trying to match data to a
certain category based off of its data.
119
:So for example, I could bring
an apple near it, right?
120
:And the sensors would react.
121
:Maybe they'd go all down, and if
I brought an orange next to it,
122
:maybe all the sensors would go up.
123
:And so you can come up with some sort of
an algorithm that would be like, okay,
124
:if the sensors go up, it's an orange.
125
:If they go down, it's an apple.
126
:Now that's really oversimplifying
it because apples and oranges,
127
:those are only two things that
exist in the universe, right?
128
:There's like so many
different things that exist.
129
:We were playing for a little
bit bigger stakes.
130
:You can think of it like when
you go to, uh, a TSA line
131
:and sometimes they, you know, swab you
and they're trying to see if you have
132
:like any drugs or any bombs on you.
133
:That was kind of the stakes that we were
playing with in some of our use cases.
134
:So I would take this data that oftentimes,
you know, was time series based.
135
:We usually had like 12 to 16 to
24 different sensors on there.
136
:And I would try to make these
smell prints using classification
137
:models in machine learning.
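A toy sketch of classifying smell prints, assuming a nearest-centroid approach: average the known prints for each label, then assign a new print to the closest average. This is a deliberately simple stand-in and not the models actually used at the startup. The sensor readings are invented, and only 3 sensors are used to keep the example short. It follows the earlier example where an apple pushes the readings down and an orange pushes them up.

```python
import math

# Hypothetical training data: each smell print is the change in current on
# 3 sensors after the sample is introduced. All values are made up.
training = {
    "apple": [[-1.0, -0.8, -1.2], [-0.9, -1.0, -1.1]],
    "orange": [[1.1, 0.9, 1.0], [0.9, 1.2, 1.0]],
}


def centroid(prints):
    """Average the prints sensor by sensor."""
    n = len(prints)
    return [sum(column) / n for column in zip(*prints)]


# One average smell print per label
centroids = {label: centroid(prints) for label, prints in training.items()}


def classify(smell_print):
    """Return the label whose average print is closest (Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: math.dist(smell_print, centroids[label]),
    )


print(classify([-1.1, -0.7, -0.9]))  # readings went down
print(classify([1.0, 1.0, 0.8]))     # readings went up
```

With many more categories and noisier time series, a rule this simple stops working well, which is where more capable classification models come in.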
138
:Now, a lot of the time I was
doing this in Python. Python's
139
:great for doing things
in machine learning.
140
:There were even some simple
algorithms that I created that were
141
:based in Excel, but
they were pretty simple.
142
:The more complicated stuff
143
:I was doing in Python at the time.
144
:Also, just because we were doing
a lot of these experiments, SQL
145
:would've been really helpful.
146
:We weren't actually using SQL
as much as we should have.
147
:We really should have been using SQL,
148
:uh, looking back on it, a little bit more.
149
:The third experience I wanna tell you
about was when I was doing my own,
150
:uh, data science consultancy firm,
and I got hired by a cybersecurity
151
:company to help them with a few things.
152
:So obviously we live in this digital age.
153
:Cybersecurity is really
important, so there's a lot of
154
:opportunity in cybersecurity.
155
:And the interesting thing
about cybersecurity is a
156
:lot of the data is, like,
157
:hidden in logs, because basically anything
you do online, anything you do on the
158
:internet gets logged one way or another.
159
:Like it's, it's in there.
160
:They're capturing everything, but when
you capture everything, you're kind
161
:of capturing nothing at the same time
because it's really hard to figure out
162
:what's the signal amongst so much noise.
163
:And so this company in particular
was basically getting a bunch of
164
:internet logs for companies in what
you can consider their workspaces.
165
:So for instance, all of their Microsoft
logs, all of their Google logs, if
166
:they're using Slack, their Slack logs,
maybe their employee customer history.
167
:Just think of like anything
a company might be interested
168
:in from a cybersecurity standpoint.
169
:We were just getting a bunch of the logs.
170
:Now in these logs, there's maybe
little needles in the haystack.
171
:There's maybe little gems
that can be pulled out.
172
:It requires a lot of analysis to
try to figure out what's in there.
173
:Just imagine you're getting
like a ton of hay and you have
174
:to find this little needle.
175
:And so my job was to go in there and try
to see if there was any needles, anything
176
:that was like really worth diving into
and investigating more, and also just
177
:summarizing everything that was happening.
178
:This is how many logins
you had on Google today.
179
:This is how many, you know,
logouts you had on Microsoft.
180
:You know, this is how many users
you had from these different states.
181
:Just like from these giant enterprise
organizations where they have thousands of
182
:employees and a bunch of things going on.
183
:Like how do you know
everything's going okay?
184
:Are you sure that like everyone
is where they say they are?
185
:Are you sure you don't have any intruders,
you know, people accessing stuff from
186
:a place that you probably shouldn't?
187
:Those types of things.
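A small sketch of the kind of log summarizing described above, using only the Python standard library. The log records, field names, and the `expected_states` lookup are all hypothetical; real workspace logs have many more fields and far more rows. The example counts events per source, counts distinct users per state, and flags activity from a state a user is not expected to be in.

```python
from collections import Counter

# Hypothetical log records, invented for illustration.
logs = [
    {"source": "google", "event": "login", "user": "ana", "state": "UT"},
    {"source": "google", "event": "login", "user": "ben", "state": "TX"},
    {"source": "microsoft", "event": "logout", "user": "ana", "state": "UT"},
    {"source": "google", "event": "login", "user": "ana", "state": "NY"},
]

# Assumed lookup of the states each user normally works from.
expected_states = {"ana": {"UT"}, "ben": {"TX"}}

# How many logins or logouts per source?
events = Counter((record["source"], record["event"]) for record in logs)

# How many distinct users per state?
users_by_state = {}
for record in logs:
    users_by_state.setdefault(record["state"], set()).add(record["user"])
users_per_state = {state: len(users) for state, users in users_by_state.items()}

# Is anyone showing up somewhere they probably shouldn't be?
suspicious = [
    record
    for record in logs
    if record["state"] not in expected_states.get(record["user"], set())
]

print(events)
print(users_per_state)
print(suspicious)
```

A flagged record is only a candidate needle; someone still has to investigate whether it is an intruder or just an employee traveling.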
188
:So we were basically taking
189
:these huge dumps of logs that weren't
really important, that weren't really
190
:interesting, and aggregating them and
trying to find the interesting things.
191
:And then also making sure
that nothing nefarious was
192
:going on. To do that analysis,
193
:I was actually using all Python, but I
could really choose what tool I wanted to.
194
:I just chose Python personally because
I'm very comfortable in Python.
195
:I'm, I'm decently good at Python, uh,
and I can do things quickly with Python.
196
:I probably couldn't have done
this as easily, like in Excel.
197
:You probably could have done similar
stuff in SQL if you wanted to.
198
:One thing I really like about Python
is it can do anything, maybe not
199
:extremely well, but it can do anything.
200
:Um, so like I was doing all my analysis,
201
:uh, in Python and I was creating
data visualizations in Python.
202
:They even used a lot of the insights
I found, like in terms of aggregates.
203
:They basically like aggregated all of
their customers' data and would publish
204
:like a, a yearly or, or biannual
report of like cybersecurity incidents.
205
:And so they were kind of like with graphs
that I was creating with some of these
206
:KPIs or metrics that I was monitoring.
207
:That way they could kind of inform
the cybersecurity, you know, field,
208
:all of their customers, about like
what the trends were and what we were
209
:seeing from a big picture standpoint.
210
:And that was actually really useful 'cause
people would start to like read that and
211
:be like, oh, I really like this company.
212
:I wanna work with them.
213
:And that would bring in new customers.
214
:So even though like I was doing
that analysis for individual
215
:customers at an individual level,
216
:that analysis actually ended up being
really useful for their marketing
217
:team as well to get more sales and
more customers in the pipeline.
218
:Now, I've actually worked for way
more than just these three companies.
219
:I've probably done work for
about 12, including like the Utah
220
:Jazz, Harley-Davidson, and some
other really big names like MIT.
221
:If you want to hear more about
those, I'll be talking about
222
:them more in my newsletter.
223
:So you can
subscribe at datacareerjumpstart.com
224
:slash newsletter, and
I'll be talking more
225
:about these experiences in the newsletter,
but if you want me to talk about it
226
:on the podcast or YouTube as well,
let me know in the comments down below
227
:and maybe I'll do some future episodes
on that if we get enough comments.
228
:As always, thanks for watching
and I'll see you in the next one.
