This article is about one of the most popular and debated topics in the world of technology: Data Analysis. First, I will familiarize the audience with the basic concepts of Data Analysis, and then present the subtle link between Data Analysis and Big Data.
What is Data?
To begin with, data is facts and figures, stored in the form of 0s and 1s, on which one can perform predefined operations in order to derive meaning or draw conclusions. Data is collected from various sources and analyzed to find solutions, answer questions, test theories, and so on.
What is Analysis?
In simple terms, analysis refers to breaking a whole into its elementary or fundamental parts for closer scrutiny.
Applied to data, it is the process of examining and transforming raw data with the goal of retrieving valuable information and drawing conclusions.
What is Data Analysis?
Data analysis is the process of searching, exploring, refining, converting, and modeling raw data, transforming it into useful and meaningful information that supports decision-making. Organizations and tech giants spend a considerable amount of time and money on this process of refining and analyzing data to obtain actionable insights.
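To make these stages concrete, here is a minimal Python sketch that runs a handful of made-up monthly sales records through searching, refining, converting, and modeling. The records, their "month,category,value" layout, and the function names are hypothetical, and the "model" is just an ordinary least-squares trend line; real analyses work with far richer data and tools.

```python
# Hypothetical raw input: "month,category,value" text rows, some of them messy.
raw_records = [
    "2023-01,sales,120",
    "2023-02,sales,135",
    "2023-02,sales,135",   # exact duplicate
    "2023-03,sales,",      # missing value
    "2023-04,sales,160",
    "2023-04,returns,7",   # a different category
]


def search(records, category):
    """Keep only the records relevant to the question being asked."""
    return [r for r in records if r.split(",")[1] == category]


def refine(records):
    """Drop exact duplicates and rows with a missing value, preserving order."""
    seen, cleaned = set(), []
    for r in records:
        if r in seen or r.endswith(","):
            continue
        seen.add(r)
        cleaned.append(r)
    return cleaned


def convert(records):
    """Turn text rows into numeric (month_number, value) pairs."""
    pairs = []
    for r in records:
        month, _, value = r.split(",")
        pairs.append((int(month.split("-")[1]), float(value)))
    return pairs


def model(pairs):
    """Fit an ordinary least-squares line: value = slope * month + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = model(convert(refine(search(raw_records, "sales"))))
print(f"trend: about {slope:.1f} more sales per month")  # trend: about 13.2 more sales per month
```

Each stage takes the previous stage's output, so any one of them can be replaced with something more sophisticated without touching the others.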
The Link between Data Analysis and Big Data
So what is Big Data? The simplest meaning that comes to mind is that data that is ‘big’ is known as Big Data. You know what? You are not that wrong when you say that. The only thing you need to add is the manner in which the data is big or huge. That’s all. To learn about Big Data in more detail, please read the article (the link).
Big Data refers to extremely large data sets that can be analyzed, studied, or examined computationally or statistically to expose patterns, trends, and associations, particularly those relating to human behavior and interactions. Processing and refining such a colossal volume of data requires various techniques, and this process is termed Data Analysis.
Concepts Associated with Big Data
The three concepts most commonly associated with Big Data are volume, variety, and velocity.
- Volume: The term Big Data itself implies a gigantic size. The size of the data plays an important role in how the data is assessed. Hence, volume (size) is what characterizes data as Big Data.
- Variety: The next facet of Big Data is its variety. Heterogeneous data from varied sources such as emails, videos, and photographs has more variety in it. As a result, the data holds more facts and information even when its volume is not that huge. This diversity of unstructured information also falls under the category of Big Data.
- Velocity: The next characteristic is velocity, which refers to the speed at which data is generated. This type of Big Data may flow in from sources such as commercial activity, application logs, social media sites, sensors, and mobile devices. The flow of data is enormous and incessant.
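The three Vs are qualitative ideas, but a rough number can be put on each. The following Python sketch assumes a hypothetical Record type with a source label, a payload size, and an arrival timestamp; it reports volume as total bytes, variety as the number of distinct sources, and velocity as records per second over the observed time span. These particular measures are illustrative choices, not standard definitions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str        # hypothetical label, e.g. "email", "video", "sensor"
    size_bytes: int    # size of the payload in bytes
    timestamp: float   # arrival time in seconds


def profile(batch):
    """Summarize a non-empty batch of records along the three Vs."""
    volume = sum(r.size_bytes for r in batch)
    variety = len({r.source for r in batch})
    times = [r.timestamp for r in batch]
    span = max(times) - min(times)
    # Records per second across the observed span (undefined if all arrive at once).
    velocity = len(batch) / span if span > 0 else float("inf")
    return {"volume_bytes": volume, "variety_sources": variety, "velocity_per_s": velocity}


batch = [
    Record("email", 2_000, 0.0),
    Record("video", 5_000_000, 0.5),
    Record("photo", 300_000, 1.0),
    Record("sensor", 64, 2.0),
]
print(profile(batch))  # {'volume_bytes': 5302064, 'variety_sources': 4, 'velocity_per_s': 2.0}
```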
Fun Fact: Did you know that 10^21 bytes equal 1 zettabyte? In other words, one billion terabytes make up a zettabyte.
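The arithmetic behind the fun fact is easy to check. This short Python sketch assumes decimal (SI) prefixes, where each step up from KB to MB to GB and so on is a factor of 1000; binary prefixes such as KiB and ZiB use powers of 1024 instead and give slightly larger numbers.

```python
# Decimal (SI) byte units: each prefix is 1000 times the one before it.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(amount, unit):
    """Convert an amount in the given decimal unit to bytes."""
    return amount * 1000 ** UNITS.index(unit)


zettabyte = to_bytes(1, "ZB")
print(zettabyte == 10**21)                # True
print(zettabyte // to_bytes(1, "TB"))     # 1000000000, i.e. one billion terabytes
```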
Observing such figures, one can easily see why the term Big Data was coined in the first place and why such a tremendous amount of data needs to be organized and sorted. So, let’s move on to how that is done.
Classifications of Big Data:
Classifying and arranging such large chunks of data is necessary. Big Data is generally found in three forms: structured, unstructured, and semi-structured.
Any type of data that can be stored, retrieved, and managed in a methodical way is labeled structured data. Because the data type of each field is determined in advance, such data is easy to keep in a fixed format.
For example, the employee data of an organization can be considered structured data (the name, date of birth, address, etc. of each employee are stored in an organized manner).
Unstructured data is, fundamentally, the opposite of structured data. Any data that does not have a fixed format or cannot be stored in a methodical manner is known as unstructured data. Besides its gigantic size, such free-form data is very challenging to process and sort. Examples include files containing text, images, audio, or video.
Semi-structured data, as you may have guessed by now, falls right between structured data and unstructured data. In other words, it can contain both organized and unorganized data. Data represented in an XML file, with tagged elements such as <name>, <age>, and <profession>, is an example of semi-structured data.
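The three forms can be seen side by side in a small Python sketch that uses only the standard library. The employee CSV, the XML records, and the free-text note are invented sample data: the CSV rows all share the same fixed fields, the XML records are tagged but may omit elements, and the note has no fields at all.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed set of fields.
employees_csv = "name,dob,address\nAlice,1990-04-12,London\nBob,1985-11-02,Paris\n"
employees = list(csv.DictReader(io.StringIO(employees_csv)))

# Semi-structured: fields are tagged, but a record may include or omit elements.
people_xml = """
<people>
  <person><name>Alice</name><age>33</age><profession>Engineer</profession></person>
  <person><name>Bob</name><profession>Doctor</profession></person>
</people>
""".strip()
root = ET.fromstring(people_xml)
people = [{child.tag: child.text for child in person} for person in root.findall("person")]

# Unstructured: free text with no fields; meaning must be extracted by further processing.
note = "Met Alice at the conference; she mentioned a new office in London."
words = note.split()

print(employees[1]["address"])   # Paris
print(people[1].get("age"))      # None, because Bob's record has no <age> element
print(len(words))                # 12
```

Note how the second XML record simply lacks an <age> element, something a fixed-format table would not allow without an explicit empty value.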
Data Analysis has found practical uses and applications in almost every field. Let us have a look at some.
Data Analysis in Health Care:
A huge amount of medical data, stored over a long period of time, helps in studying disease patterns, their symptoms, causes, and remedial therapies. In turn, this helps doctors and medical scientists reshape the whole paradigm of health care with improved techniques and faster cures.
Data Analysis in Online Security:
Believe it or not, data analysis has even found a way to help us secure our online data. It is now used to identify suspicious behavior on the internet, to find ways to prevent cyber attacks, and to reduce crime by monitoring the real-time actions of users.
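One simple technique for identifying suspicious behavior is rate-based monitoring: flag an account when too many risky events happen within a short time window. The Python sketch below applies this idea to failed logins. The class name, the 60-second window, and the threshold of 5 failures are hypothetical values chosen for illustration; production systems combine many more signals.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # hypothetical length of the sliding window
MAX_FAILURES = 5      # hypothetical number of failures tolerated per window


class FailedLoginMonitor:
    """Flag a user with more than max_failures failed logins inside a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, max_failures=MAX_FAILURES):
        self.window = window
        self.max_failures = max_failures
        self.failures = defaultdict(deque)  # user -> timestamps of recent failed logins

    def record_failure(self, user, timestamp):
        """Record one failed login; timestamps per user must arrive in non-decreasing order."""
        recent = self.failures[user]
        recent.append(timestamp)
        # Discard failures that are now older than the window.
        while timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_failures  # True means "suspicious"


monitor = FailedLoginMonitor()
burst = [monitor.record_failure("mallory", t) for t in range(0, 12, 2)]
spread = [monitor.record_failure("alice", t) for t in (0, 100, 200)]
print(burst)   # [False, False, False, False, False, True]
print(spread)  # [False, False, False]
```

A deque is used so that expiring old timestamps from the front of the window is cheap, which matters when events stream in continuously.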
Data Analysis Model
Now that we know what data analysis is, let’s explore the vital steps involved in analyzing any data. Gwen Shapira, a renowned Solutions Architect at Cloudera and an Oracle ACE Director, summarizes six vital steps of data analysis which, she says, hold regardless of the data involved and will work across organizations:
- Deciding on the objectives: Set goals that help the data scientists find the direction in which the data is moving.
- Identify business levers: Analyze the raw, unprocessed data with the right analysis technique to achieve the desired targets.
- Data collection: Collect data from varied yet reliable sources (the data has to be authentic). The data must have volume, variety, and velocity. This is the most crucial step.
- Data cleaning: The next crucial step is cleaning the data. Data accuracy must never be lost while refining the data; analysts must always keep this in mind. The processing should also yield something meaningful that helps in policy making and revenue generation.
- Grow a data science team: This team should include people from various professional backgrounds, such as scientists, analysts, developers, and testers, to gain the maximum possible output.
- Optimize and repeat: This step gives you the chance to keep perfecting whatever you are doing. Always strive to do better in the next cycle so as to maintain consistency and improve performance.
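For the data cleaning step, one way to avoid losing accuracy is to reject problematic rows rather than guess at corrected values, and to record why each row was rejected so the decision can be audited later. The Python sketch below assumes hypothetical rows with id and amount fields; it never modifies the values it keeps.

```python
def clean(rows, required=("id", "amount")):
    """Split rows into (kept, rejected); kept rows are passed through unmodified."""
    kept, rejected, seen_ids = [], [], set()
    for row in rows:
        missing = [field for field in required if row.get(field) in (None, "")]
        if missing:
            rejected.append((row, "missing " + ", ".join(missing)))
        elif row["id"] in seen_ids:
            rejected.append((row, "duplicate id"))
        else:
            seen_ids.add(row["id"])
            kept.append(row)
    return kept, rejected


rows = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": ""},     # missing value: rejected, not guessed
    {"id": 1, "amount": 250.0},  # duplicate of the first row
    {"id": 3, "amount": 99.5},
]
kept, rejected = clean(rows)
print([row["id"] for row in kept])            # [1, 3]
for row, reason in rejected:
    print(row["id"], reason)                  # prints "2 missing amount", then "1 duplicate id"
```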
Benefits and Challenges
Data analysis gives enterprises a way to process the data they need to make better decisions. This helps them serve their clientele better, increase productivity, and generate more profit. Hence, the benefits of data analysis and processing are considerable. Among other things, these techniques help in crafting more effective advertising campaigns and in attaining a better understanding of customers.
A Career in Data Analytics
Career opportunities in data analytics are increasing swiftly as large companies need more and more data analysts. Anyone thinking of a career in data analysis must be proficient enough to do in-depth research into the business. One must possess quantitative as well as qualitative interpretation skills and, of course, the curiosity to learn new things. One should be able to approach a problem critically, break it down, find meaning in it, and solve it logically.