This is the first in a series of contributions from Wizeline's very own Data Science team! Learn more about the authors below, and be sure to check back every other week for posts that aim to demystify the complex, fascinating field of Data Science.
“Data Science is the science of learning from data, with all that this entails.” This definition from renowned statistician David Donoho is broad, but it is spot on.
At a high level, Data Science (DS) is the set of approaches for taking data, analyzing it, and enabling people to learn from the resulting insights. This “learning” can mean different things: descriptive methods help us understand the past, while predictive methods estimate what will happen next (think of Netflix's recommendation algorithms). DS has become a widely adopted way to drive business value, and its range of applications keeps growing.
At its core, Data Science is a scientifically driven approach to data analysis. That may sound redundant, but it’s worth highlighting that DS is closer to a science than, say, project management or similar business practices. It includes the following tasks, listed in no particular order of importance, and the mix varies by project:
- Data Collection: Collection or creation of data from one or more sources, e.g. manual input, web scraping or the output of a new experiment
- Data Preparation: Conversion of the recorded data into something actionable and meaningful (this is also referred to as Data Munging)
- Data Exploration: Analysis of data to summarize its salient aspects, which in turn leads to the formulation of hypotheses
- Data Modeling: Application of prediction, forecasting, and related algorithms from machine learning, deep learning, statistics, mathematics, and related fields to help us understand new observations
- Data Visualization: Telling stories about the data, specifically the outcome of the analysis
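As a rough illustration of how these tasks can fit together, here is a minimal Python sketch that uses only the standard library. The dataset (hours studied versus exam score) and the function names `prepare`, `explore`, `fit_line` and `visualize` are invented for this example. A real project would likely rely on dedicated libraries and much more data.

```python
from statistics import mean

# Data Collection: hypothetical raw records (hours studied, exam score),
# invented for illustration; the strings mimic manual input.
raw = [("1", "52"), ("2", "55"), ("3", ""), ("4", "61"), ("5", "64")]


def prepare(records):
    """Data Preparation: drop incomplete rows and convert text to numbers."""
    return [(float(x), float(y)) for x, y in records if x and y]


def explore(rows):
    """Data Exploration: summarize a few salient aspects of the data."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}


def fit_line(rows):
    """Data Modeling: ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def visualize(rows, scale=2):
    """Data Visualization: a plain-text bar chart, one bar per observation."""
    return "\n".join(f"{x:>4.0f} | {'#' * int(y / scale)}" for x, y in rows)


rows = prepare(raw)
summary = explore(rows)
slope, intercept = fit_line(rows)

print(summary)
print(f"predicted score for 6 hours: {slope * 6 + intercept:.1f}")
print(visualize(rows))
```

Each function maps to one task from the list above. The incomplete third record is dropped during preparation, which is why the later steps only see four observations.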
Data Science is often thrown around as a catch-all buzzword. Some definitions reduce it to “a new form of business intelligence” that incorporates big data, or to a rebranding of analytics, statistics, or data mining. Each of these areas of knowledge plays a role in DS, but may or may not come into play within a given project. DS is not exclusively any one of those.
A non-exhaustive list of the most common fields within Data Science includes computer science, statistics, modeling, data analytics, mathematics, and domain knowledge. Although you may have read that big data is a characteristic of DS, it is not a requirement, since all data presents challenges regardless of its volume. It is true that for most analyses, the more observations the better, but this should be assessed on a project-by-project basis.
Data Science is by no means new: its core ideas were proposed more than 50 years ago. John Tukey's 1962 publication ‘The Future of Data Analysis’ argued for a discipline broader than classical statistics, and later proposals conceptualized Data Science as a direct evolution of statistics with six basic areas, including computing with data, assessment of DS tools and techniques, and theory to support the tools generated. Although DS is now a relatively well-known field, it will require much attention, knowledge and talent for progress to be made.
Stay tuned for our next post: Why do you need Data Science?
About the authors:
Ana has held Data Science and Engineering roles at Accenture and Tec de Monterrey. She earned an M.Sc. in Big Data Science from Queen Mary University of London, and her main interests are machine learning, network analysis and user behavior modeling.
Juan spent several years at HP Labs as a Research Assistant. He earned an M.Sc. in Statistics and Operational Research, with distinction, at the University of Edinburgh. He is currently a lecturer in the Industrial Engineering department of ITESM.