New year, new plans, new to-do list, and an opportune time to start something completely new. This year I will aim to actively write content online for myself and others. So I am going to begin this journey by writing this first post with the following intents.
1. Intents
- A retrospection on my Data Science (DS) journey over the past 5 years.
- To motivate aspiring data practitioners. I think it is helpful to go through someone else’s journey.
- To share my thought processes and personal/general reflections.
2. Conundrums (2014)
I first read about big data during my 2nd-year internship at the Singapore Police Force in 2013. It was an internship on “Big Data in Policing” where I had to do a lot of reading on the subject matter. That was when I learnt of big data analytics, an up-and-coming field (at that time) that is an amalgamation of Statistics, Computer Science and Business Domain Knowledge. That piqued my interest because it contains two of my favourite things, Statistics and Programming.
Serious career planning started back in 2014 during a 3rd-year internship in Bangalore at Infosys’ Electronic City campus. All interns stayed on Infosys’ campus and it was time-consuming to travel outside. So I ended up with a lot of long nights and time to contemplate my career, which was something I constantly worried about. I realised that
Especially after going through Singapore’s education system myself, we were never taught to think for ourselves. It was always about exams, grades, fixed subjects we were all forced to do, outscoring the guy sitting next to you in class, and making your parents proud. However, in university and in life, there are no more “goals” fixed for you. Suddenly there are all kinds of decisions you have to make for yourself, like module selection, internship applications and choosing a thesis advisor. The same goes for career planning.
It will always be daunting no matter which stage of life you are at. Mired in uncertainty and self-doubt, you are never sure if you are good enough to be shortlisted for a job interview. There are just way too many moving parts, too many variables. So, using the most common mathematical way of simplifying things, I fixed all the variables except one, analysed that one on its own, and repeated this for every variable. Eventually I was able to narrow the problem down and distill it into a few angles:
- Government service or Private sector?
  - Government service: Iron rice bowl, guaranteed above-average salary, red tape, less autonomy, harder to leave.
  - Private sector: Uncertain prospects, riskier than government service, which also translates directly into more opportunities to hustle hard.
- Statistics or Actuarial Science or Programming?
  - Statistics: Natural path since that is my major.
  - Actuarial Science: A time-tested path to a stable and well-paying career as long as you continue to take certifications.
  - Programming: I enjoyed R programming a lot.
- Industry (e.g. banking, finance, telco, tech)?
  - I was pretty open-minded about the industries. It didn’t really matter to me.
- Money or Passion?
  - Money: It’s definitely important, but is it important enough for me to jeopardize long-term goals for short-term gains?
  - Passion: I was really certain about what I enjoyed doing, which was Statistics and Programming at that time (and even now).
It’s a lot easier to think of things this way.
For me, my guiding principle has always been to follow my strength, not passion. Passions are hard to find. Passions that pay well are even harder to find. Some activities like playing sports, drawing and design do not always work out in Singapore. Sometimes, honestly, you may not even be good at whatever your passion is. Luckily there is an easier solution. The book “So Good They Can’t Ignore You” talks about how skills trump passion in the quest for work you love. It is an inspiring book, and reading it affirmed my prior belief (hehe) that passion can be cultivated through mastery of a skill.
Initially, I took up Statistics not because I knew I was passionate about it (I wasn’t sure). I took it because I knew it was one of the few majors I would have the best shot at being good at, since I have always done well in Mathematics. (You will never find me in a drawing or art class voluntarily.) I digress, but the point is that I wasn’t worried that I would be diving into a career that I would not be passionate about. I could just cultivate that passion after becoming good at it. But first, you need to focus on a skill or area that you have the best shot at being good at.
So anyway, I planned and studied the career planning problem logically. I was also brutally honest with myself and recognised what my strengths and weaknesses were. After considering everything, I roughly knew what I had to do, and the next step was to decide on a path and stick to it.
3. Action (2014 - 2015)
Back then, there wasn’t a major for Data Analytics at the National University of Singapore (NUS). (They came up with Business Analytics in 2015, after I graduated.) At that time, I felt that what I had learnt at NUS still had not prepared me for a career in DS. It was never meant to, since it was only a major in Statistics. So during my India internship, I enrolled in Prof Andrew Ng’s Machine Learning MOOC on Coursera. To this day, even after taking more than 30 MOOCs since 2015, this MOOC remains the most important and impactful course for me. The fundamentals and theories of Machine Learning (ML) taught in the course continue to stay relevant to my work.
After I came back from India to continue my 4th and last year of studies, I also took up and completed all 10 courses under the Data Science Specialisation taught by Johns Hopkins University. I did this alongside my 4th-year school work. It was a beginner DS specialisation that helped boost my confidence to apply for DS jobs. At the same time, I also started to dabble in Kaggle and try out some competitions.
During the job application process, I applied very widely across multiple industries as long as they were open to hiring junior staff for DS work. Take note that you have to read the job description carefully and make sure that it is a DS role and not a BI role. A good way to do this is to look at the set of technical skills they are looking for and to ask good questions during the interview. I also applied for a couple of statistician roles for good measure. Eventually, I was offered two positions before graduation and I had the luxury of time to think it through carefully. (It is good to apply for jobs early if you are still in school.)
One of them was a statistician role at a Ministry in Singapore, while the other was a data analyst role at a local telecommunication company. I accepted the data analyst role although it paid around 23% less than the statistician role. The key motivation was the DS learning opportunity as well as the alignment with my long-term goal. It was not easy to give up the higher-paying job since I had a study loan to pay. However, I was really grateful for being offered the data analyst role because I felt I was not fully qualified for the job. (Don’t worry about that. No one expects a fresh graduate to be fully qualified anyway.) I believe I was offered the role because of the extra DS activities I was doing outside school and because my boss was willing to groom a junior staff member.
4. StarHub (2015 - 2016)
At StarHub, I was a data analyst in their relatively new data science team. There were a couple of things that I really liked about the team/role:
- Teammates: Full of talented and motivated individuals/seniors who were so willing to share their knowledge and help out whenever they could. (This is not easy to find.)
- Collaborative Environment: I can’t speak for other teams in the department, but I felt my team had a sharing and open culture. There weren’t any politics within the team and we collaborated very closely on many different projects. This is so important for a junior staff member because I got to emulate what the senior guys were doing, and that accelerated my learning manyfold.
- Technology: Within 1 year, I was exposed to a lot of technologies, which was how I picked up so much new stuff in a short period. I literally said yes to almost every task even if I didn’t know the language required. Before the team shifted their stack, I had to write code in Pig (new), Java (new), Python (new) and R. Eventually I also had to write PySpark (new) code. (I can still write in Pig and PySpark. Python became my lingua franca in place of R. These are useful DS tools. I can’t write Java anymore.)
- Mentor: I was allocated a mentor who was very smart, experienced and willing to share his knowledge. I was able to share any concern or ask any question very freely, and that helped a lot to boost my knowledge and confidence. (It is important to find a mentor.)
- Data: StarHub, a local telecommunication company, has tons of customer data. It was one of the biggest pull factors when I accepted the job. You need data to do data science. It’s as simple as that. (This is one of the key questions you need to ask during an interview.)
- Job scope: I liked that it was a startup-ish environment. I was involved in all sorts of work like ad-hoc data analysis, engineering logic for new features, designing algorithms, attending external meetings, and even presenting demos of our product to potential clients. I took full ownership of my own areas and had almost full autonomy. In hindsight, it’s amazing I was entrusted with those responsibilities, and that was mostly due to my boss.
- Boss: Also a veteran in the industry. I appreciated the trust he placed in the team to do our job, with very little micromanagement. I also liked how he was very frank with me. He was not afraid to scold me when I made a mistake (yea, one time I got scolded very badly for some mistake). He also gave constructive feedback, which I really appreciated. I believe that only bosses who give a damn will give constructive feedback.
Strictly speaking, this role was missing the ML side of DS. So at the same time, I was still actively exploring Kaggle competitions and reading other Kagglers’ scripts. (You should keep track of your long-term goal. I knew my role at StarHub lacked the ML aspects, which is why I actively kept up my ML knowledge. My Kaggle projects were a big topic during my interviews at DBS.)
5. DBS (2016 - 2018)
After 1.5 years at StarHub, I landed an opportunity for a data scientist role at DBS Bank. I would be in charge of supporting the bank’s regional consumer banking group, especially for Singapore, Hong Kong, Taiwan and China. Seeing that it was an ML role with greater responsibility, I took up the job immediately.
It was at DBS Bank where I immensely honed my skills in practical ML. I was involved in the whole end to end process of every DS project and here are some main steps:
- It always starts with Business: Understanding business requirements needs active listening and open discussions with all parties. That is only half of the job. The other half is for you to frame the business problem as a DS problem. What is the event definition or target variable? What are the success factors for this model to work, for business to want to use it? What are the timeline and major milestones?
- Exploratory Data Analysis (EDA): Pulling data from the respective databases (e.g. Hive, Teradata, SAS database). Exploration of the datasets and their relationship with the target variable. Data cleaning. Preliminary variable selection is also required because there are thousands of variables in the database.
- Model Building: Trying out different models, usually starting with the simpler models (e.g. GBM, RF, Logistic Regression) because they are much easier to explain to business.
- Model Evaluation: Evaluating the results of the models, picking the top few models and sharing the results with business.
- Present Results: Presenting results, models and variables used to business to convince them.
Practical ML is an iterative process. We meet business users very often and repeat some of these steps all the time. I have omitted a lot of details and steps for brevity. Also, I didn’t have to deploy my own models which is something I need to improve on.
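To make the variable selection and model evaluation steps above a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: screening variables by absolute correlation with the target and ranking models by hold-out AUC are common choices, not necessarily what was used at the bank, and the column names, model names and numbers are all made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def screen_variables(columns, target, top_k):
    """Keep the top_k variable names ranked by |correlation| with the target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:top_k]


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def top_models(model_scores, labels, top_n):
    """Rank candidate models by hold-out AUC; return the best top_n as (name, auc)."""
    ranked = sorted(
        ((name, auc(scores, labels)) for name, scores in model_scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical toy data: 1 = the event happened, 0 = it did not.
target = [0, 0, 1, 1, 0, 1]
columns = {
    "monthly_spend": [10, 12, 30, 28, 11, 35],
    "branch_visits": [3, 1, 2, 4, 2, 1],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
shortlist = screen_variables(columns, target, top_k=2)

# Hypothetical hold-out scores from two candidate models.
model_scores = {
    "model_a": [0.1, 0.2, 0.9, 0.8, 0.3, 0.7],
    "model_b": [0.4, 0.6, 0.5, 0.7, 0.2, 0.3],
}
best = top_models(model_scores, target, top_n=1)
print(shortlist, best)
```

In a real project the screening would run over thousands of columns pulled from the database, and the scores would come from models fitted on a separate training set.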
It was also at DBS Bank where I built my first recommender system under the mentorship of one of the seniors. I stayed in Shanghai for 2 months to build a Hybrid User-Item Based Collaborative Filtering model for their relationship managers. When I came back, I was tasked to build a more complex recommender system completely on my own as the sole contributor. (That senior left the bank shortly after I returned to SG.) It was difficult at first because no one else in the team had any idea how to help or build one. It was only after reading a lot of papers and articles that I managed to find and implement a deep learning model that outperformed the old rule-based model and fulfilled multiple business requirements.
It was at DBS Bank that I realised the importance of alignment between business and DS. Without business, there would probably be no need for DS in the first place.
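For readers who have not seen one before, a hybrid of user-based and item-based collaborative filtering can be sketched roughly as below. This is a generic textbook-style blend written in plain Python, not the model built at the bank: the customers, products, ratings and the `alpha` blending weight are all hypothetical, and cosine similarity is just one common choice.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (missing = 0)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def weighted_average(pairs):
    """Similarity-weighted average of (similarity, rating) pairs, or None."""
    total = sum(sim for sim, _ in pairs)
    if total == 0:
        return None  # no similar neighbours to learn from
    return sum(sim * rating for sim, rating in pairs) / total


def user_based(ratings, user, item):
    """Predict from other users who rated this item, weighted by user similarity."""
    pairs = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    return weighted_average(pairs)


def item_based(ratings, user, item):
    """Predict from this user's other ratings, weighted by item similarity."""
    by_item = transpose(ratings)
    target_column = by_item.get(item, {})
    pairs = [
        (cosine(target_column, by_item[other]), rating)
        for other, rating in ratings[user].items()
        if other != item
    ]
    return weighted_average(pairs)


def hybrid(ratings, user, item, alpha=0.5):
    """Blend both predictions; fall back to whichever one is available."""
    from_users = user_based(ratings, user, item)
    from_items = item_based(ratings, user, item)
    if from_users is None or from_items is None:
        return from_users if from_users is not None else from_items
    return alpha * from_users + (1 - alpha) * from_items


# Hypothetical interest scores (1 to 5) of customers for products.
ratings = {
    "cust_a": {"card": 5, "loan": 3, "fund": 4},
    "cust_b": {"card": 4, "loan": 2},
    "cust_c": {"card": 1, "fund": 2, "loan": 5},
}
score = hybrid(ratings, "cust_b", "fund")
print(round(score, 2))
```

Setting `alpha` closer to 1 leans on similar customers, while setting it closer to 0 leans on similar products. In practice the weight would be tuned on held-out data.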
6. Summing Up
That was a pretty detailed recollection of how I got started and most of the things I’ve done in the past 4-5 years. I am definitely still not a very seasoned data practitioner (I don’t like the term data scientist) and there are just SO MANY areas to research and read up on. I still spend around 10-20 hours every week reading up on and picking up new DS knowledge.
Thanks for reading the post. Hope it was informative and helped in some way!
7. Links
Prof Andrew Ng’s Machine Learning MOOC on Coursera
Data Science Specialisation taught by Johns Hopkins University
Bachelor of Science in Business Analytics
So Good They Can’t Ignore You: Why Skills Trump Passion in the Quest for Work You Love