The business and IT worlds are drowning in a vast sea of data. Data is everywhere. It is rushing at us from every direction like a 50-foot tsunami wave about to consume us. So, what do we do? How do we survive? Can we make sense of all this data, and how do we derive value from it?
The answer lies in the hands of competent data scientists. Modern wizards of data, these experts are masters of several diverse disciplines: math and statistics, programming and database management, industry subject matter knowledge, and communications and data visualization. They can conjure up answers to well-formed questions and cleanse both structured and unstructured data sources with ease.
To master data, we first need to understand data. We must seek answers to questions and explore data looking for patterns, trends, and aberrations within the data sets. We need to embrace the data and manage it by twisting and squeezing it until the tiniest drops of information drip out of it. Once we have wrung out some droplets of information, we combine these pearls with other extractions to form sets of knowledge. If we are lucky, we can combine several pools of knowledge to formulate a pond of wisdom.
But what is data science?
Data science is detective work. You pose a question and then tease the data to see if it will provide an answer. Often there is more than one question. Formulating the questions to ask of the data is the hard part.
A data scientist must be curious by nature. They must ask “why?” about everything, not just about the data, but in all aspects of life.
Once you have your data sets, pose your questions, analyze the angles, and massage the data, you get some answers. These answers may not be what you expected, nor may they even be useful. It is an iterative process of trial and error. Being a good data science detective takes patience and persistence to try over and over to learn something new.
You look for cause and effect, and you ponder correlations and connections between data sets. Relationships between data sets often yield the best understanding. Data can come from legacy sources that were stored for future use. It can come from external sources such as weather and statistical data. And it can be real-time data that flows from systems and devices. Data can be purchased or obtained for free from government agencies.
This is data science.
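To make the detective work concrete, here is a minimal sketch of pondering a correlation between two data sets. Everything in it is an illustrative assumption: the dates, the temperatures, the iced-drink sales figures, and the helper name `pearson` are all hypothetical. It joins an internal data set to an external weather data set on the dates they share, then computes the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical external data: daily high temperature in degrees Celsius.
temperature_c = {
    "2024-07-01": 22, "2024-07-02": 25, "2024-07-03": 30,
    "2024-07-04": 28, "2024-07-05": 19,
}
# Hypothetical internal data: iced-drink units sold per day.
drink_sales = {
    "2024-07-01": 110, "2024-07-02": 135, "2024-07-03": 190,
    "2024-07-05": 90, "2024-07-06": 120,
}

# Relate the two data sets by keeping only the dates present in both.
shared_dates = sorted(temperature_c.keys() & drink_sales.keys())
temps = [temperature_c[d] for d in shared_dates]
sales = [drink_sales[d] for d in shared_dates]

r = pearson(temps, sales)
print(f"{len(shared_dates)} shared days, correlation r = {r:.2f}")
```

A value of r near 1 or -1 hints at a strong linear relationship, but it does not establish cause and effect. That still takes the questioning and iteration described above.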
The role of a data scientist is possible today because we have access to:
- massive volumes of data
- data from a vast diversity of sources
- an abundance of algorithms to test the data
- software tools to apply these algorithms
- cloud access to use compute, storage, and analytical resources that we could never afford on our own
- low-cost storage to host the data
There’s never been a better time to be a data scientist.
In data science, we classify data by the ‘four Vs’. These four Vs are: Volume, Variety, Velocity, and Veracity. A fifth V may be Value. After all, it is the value that we seek from the data that drives all of our data science efforts in the first place.
If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive. None of those tasks is as straightforward as it is with other, established organizational roles. Start with the fact that university degree programs in data science are scarce. Some more progressive universities are now starting undergraduate and graduate programs aimed at data science, but they are few and just beginning, so it will take time for their impact to be realized. There is also little consensus on where the role fits in an organization, how data scientists can add the most value, and how their performance should be measured.
The most important characteristic of a data scientist is curiosity. You need to be curious and question why things happen, what things mean, what might happen next, and whether the findings are important or can provide meaningful value.
Data scientists realize that they face technical limitations, but they don’t allow that to bog down their search for novel solutions. As they make discoveries, they communicate what they’ve learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling. They advise executives and product managers on the implications of the data for products, processes, and decisions.
Given the nascent state of their trade, it often falls to data scientists to fashion their own tools and even conduct academic-style research. Yahoo, one of the firms that employed a group of data scientists early on, was instrumental in developing Hadoop. Facebook’s data team created the language Hive for programming Hadoop projects. Many other data scientists, especially at data-driven companies such as IBM, Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and refined the tool kit.
What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. Most of all, the best data scientists are master storytellers. They can communicate the insights that they concocted from the data and share them freely with the senior leadership who sit around the table. The combination is extremely powerful, and rare.
Some people call data science “glorified statistics”, while others call it “the new electricity”. But the more I delve into this, the more I see data scientists as masterful magicians creating magic from the rawest of data, which is often just a confused mess to the rest of us. They pull out insights like a rabbit from a hat and help to craft strategies and corporate directions. They feed wisdom to the senior leadership team much like Merlin guided King Arthur.
So, are data scientists sexy, or is the job sexy? Many believe that thinking, awareness, knowledge, technical skills, IT, and general ‘nerd’ capabilities are the new definition of sex appeal. Well, that is my story anyway, and I will stick to it. Being old, grey haired, balding, and fat is not as appealing as I thought it would be. The data was wrong in this case, so this new hypothesis is now formed. lol Perhaps you should decide on your own.
But this is a hot job right now, and if I were starting my career over again today, there is no doubt that I would dream of being a data scientist.
About the Author:
Michael Martin has more than 35 years of experience in systems design for broadband networks, optical fibre, wireless and digital communications technologies.
He is a Senior Executive with IBM Canada’s Office of the CTO, Global Services. Over the past 14 years with IBM, he has worked in the GBS Global Center of Competency for Energy and Utilities and the GTS Global Center of Excellence for Energy and Utilities. He was previously a founding partner and President of MICAN Communications and before that was President of Comlink Systems Limited and Ensat Broadcast Services, Inc., both divisions of Cygnal Technologies Corporation (CYN: TSX).
Martin currently serves on the Board of Directors for TeraGo Inc (TGO: TSX) and previously served on the Board of Directors for Avante Logixx Inc. (XX: TSX.V).
He serves as a Member, SCC ISO-IEC JTC 1/SC-41 – Internet of Things and related technologies, ISO – International Organization for Standardization, and as a member of the NIST SP 500-325 Fog Computing Conceptual Model, National Institute of Standards and Technology.
He served on the Board of Governors of the University of Ontario Institute of Technology (UOIT) and on the Board of Advisers of five different Colleges in Ontario. For 16 years he served on the Board of the Society of Motion Picture and Television Engineers (SMPTE), Toronto Section.
He holds three master’s degrees, in business (MBA), communication (MA), and education (MEd). As well, he has diplomas and certifications in business, computer programming, internetworking, project management, media, photography, and communication technology.