With the recent rise of data science and how crucial it has become to organizations, do you really have to learn it?
People who know me from my writing, from meeting me in person, or from attending my talks or classes would know that I didn’t actually become a data scientist by choice. In fact, I became one through serendipity.
I started with a degree in IT, followed by a master's by research in social media and a Ph.D. in text mining. I was doing all this well before the terms data science and big data were coined.
So, what is big data?
Big data is something of a buzzword. Many of us are now handling data of a kind we have never encountered before. Big data is usually explained through the 3V’s: velocity, volume, and variety. These have been discussed many times from the business angle. For instance, a retail chain can have real-time data flowing in from all of its outlets, letting it monitor stock levels. That is velocity. A bank now handles all the online transactions that used to be recorded in a passbook. That is volume. Variety is a tricky one: an insurance or credit card company can now gauge the reliability of a customer not only from their spending profile and income, but also from other information garnered from social media profiles such as Facebook and Instagram.
You might say those examples are for businesses. On a personal level, however, we are also living in an era of information explosion, where we get overloaded with information every day. This information could come from the news, blogs, or social media, but in fact, you are a content generator yourself.
Let’s take another look at the 3V’s mentioned earlier. On a personal level, we are more concerned than ever about storing information on our devices, making sure they have enough memory to keep videos of our kids, recordings of the concerts we go to, and photos of food. We are also concerned about information coming in too quickly: we always have more than enough emails to read from the newsletters we subscribe to, on top of our work and family emails. We are also reading articles that our friends share on Twitter, Facebook, Google Plus and more.
Generally, big data is the data generated by our everyday actions, whether it is used for business advantage or on a personal level.
What is Data Science Then?
Big data and data science are like twins: whenever someone talks about one, the other gets mentioned too. But data science is not the same as big data. Data science covers a wide range of topics, from basic statistics to big data tools, as shown by Swami Chandrasekaran in his famous data science curriculum roadmap.
In general, we can say that data science is a new field that has emerged in the past few years to address the challenges posed by the explosion of data. The roadmap more or less captures everything it involves.
What about being a Data Scientist?
According to Michael Hochster, Director of Research at Pandora, there are two types of data scientists: the Type A (Analysis) data scientist and the Type B (Building) data scientist.
A Type A data scientist is primarily an analytical person, someone who is familiar with the methods used by statisticians (and is probably one themselves). Their primary concern is to make sense of data, but in a fairly static way. Type A data scientists count as data scientists because they know things beyond what statisticians typically cover, from data cleaning and dealing with datasets to visualization and so on.
Type B data scientists share some common knowledge with Type A data scientists in terms of statistical skill, but they come from a strong programming background and are usually great coders or software engineers. Type B data scientists can program better than Type A data scientists, and they build models that end up being used in a product. Type A data scientists, on the other hand, can be better at modeling, reasoning, research and experimental design.
I personally fall somewhere in between Type A and B, due to my background. I definitely started off as a Type B data scientist, as my background is in networking, databases, data structures and algorithms. Plus, I have been working as a software engineer and CTO for quite some time. However, due to my Ph.D. training, I slowly became a researcher. As a researcher, building a workable product is often not enough, even if the product works well. We are “forced” into a philosophical mode of thinking where we need to understand why we are building the product and where we are heading. What impact and contribution can the product make for the human race, or at least for the field itself?
If you ask me, I consider myself a mix of A and B, with B dominant.
So do you really have to learn data science?
The short answer is, no.
Data science is not for everyone, especially if you don’t need it for your job immediately. However, it is still good to have a certain level of exposure, so that you at least know what data science is and which actual problems can be solved using it.
Who knows when what you’ve learned will come in handy, right? The best way to start learning data science is to browse online resources: take the Johns Hopkins data science MOOC to gain a basic understanding of the field, or join the Harvard CS109 course. If you have a strong interest in a specialization such as machine learning, you may also look into the Stanford machine learning course by Andrew Ng, the co-founder of Coursera and now the Chief Scientist of Baidu.
If you have a data science or big data question that needs an immediate answer, you can always count on me to help you. Drop me a message on my contact page or leave a comment below.
Also, tell me what you think about data science below. Do you think everyone should learn it, no matter the work they do or industry they are in? Comment below. 🙂