So You Want to Be a Data Scientist…


Over the past 10 years, Internet access has reached an all-time high, helped by faster connection speeds that let users reach more websites more quickly and cheaply.

Almost every aspect of our daily lives takes place online, from keeping in touch with loved ones on Facebook to banking to restaurant and hotel reservations. Our dependence on the Internet, from surfing the news to searching for products, has resulted in a considerable amount of data that is a gold mine for marketers, health care providers, insurance companies and many other businesses and nonprofits.

The rapid and broad expansion of Internet use has left us with a tremendous amount of data that tells us a lot about ourselves. The smart way of dealing with this data is to analyze and mine it to extract important knowledge, the kind that can help us think and act smarter in the future.

Websites like Facebook, Amazon and Yelp! deal with massive amounts of consumer data that help advertisers and businesses make smarter decisions regarding the services they offer.

Extracting knowledge from huge data sets is part of the job description of a Data Scientist. Data Scientists analyze large data sets and build models that can identify consumer trends.

For instance, a Data Scientist who works for a hypermarket can discover patterns in consumer purchasing behavior of the form "if you buy product X, you will likely buy product Y". Customers who buy milk, for example, may be very likely to buy cookies with it.

The Data Scientist informs top management of this trend, and the cookies are then placed next to the milk section to encourage sales.
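To make the milk and cookies example concrete, here is a minimal sketch in Python of how such a pattern could be measured. The baskets are invented for illustration, and the helper names (`count_containing`, `support`, `confidence`) are my own, not part of any library. Support is the share of baskets that contain an item set; confidence is the share of milk baskets that also contain cookies.

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"milk", "cookies", "bread"},
    {"milk", "cookies"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "cookies", "eggs"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain every item in itemset."""
    return count_containing(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    base = count_containing(antecedent, baskets)
    if base == 0:
        return 0.0
    return count_containing(antecedent | consequent, baskets) / base


milk_support = support({"milk"}, baskets)
milk_to_cookies = confidence({"milk"}, {"cookies"}, baskets)
print(f"support(milk) = {milk_support:.2f}")
print(f"confidence(milk -> cookies) = {milk_to_cookies:.2f}")
```

On this toy data, three of the four milk baskets also contain cookies. Real market-basket analysis runs the same kind of counting over millions of receipts and many product pairs, usually with a dedicated algorithm rather than a loop like this one.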

Recommendation systems, like those on Amazon or eBay, are another example. They rely on models built by Data Scientists that use each customer's purchasing history to suggest products the customer might buy.
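As a rough illustration of the idea, the sketch below suggests products based on what customers with overlapping purchase histories have bought. This is a toy approach under my own assumptions, not how Amazon or eBay actually work; the customer names, the products and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories, invented purely for illustration.
histories = {
    "ana": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse"},
    "cara": {"laptop", "keyboard", "monitor"},
    "dan": {"mouse", "mousepad"},
}


def recommend(customer, histories, top_n=2):
    """Suggest items bought by customers whose histories overlap with this one."""
    owned = histories[customer]
    scores = Counter()
    for other, items in histories.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # number of shared purchases
        if overlap == 0:
            continue
        # Each item the customer lacks is weighted by how similar the other shopper is.
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ben", histories))
```

Production recommenders use far richer signals and models, but the core idea is the same: a customer's history is compared with everyone else's to rank products they have not bought yet.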

Data Scientists analyze all types of data, not just consumer data. Health care analytics is an important field in which Data Scientists analyze large amounts of health care data to determine how certain factors affect the well-being of patients, such as how appointment wait times affect their overall health.

As exciting as the Data Scientist role seems, it requires lengthy study and practice. You don’t have to be an engineer or a statistician to become a Data Scientist, but you typically need to go through three phases.

Phase 1: Research

Start by reading introductory data mining books to understand the algorithms behind data analysis.

Phase 2: Learn a Data Analysis Programming Language

Learn how to use languages and tools such as Python (with its analytics libraries), R and Weka. Experience with Python's NLTK library is a plus. Learning an enterprise data mining tool such as SAS Enterprise Miner is also really beneficial.

Phase 3: Application

Start applying what you have learned to many different types of data. You can find online data repositories that provide free data for you to use and experiment with. Websites like Kaggle are a great training ground for Data Scientists.

I’m convinced that the demand for Data Scientists will increase as we move to the cloud. As more and more applications move there, more data that needs to be analyzed will come along with them, and so more Data Science jobs will be created.


