Data science can be understood as a multidisciplinary field concerned with acquiring knowledge from data. It is called multidisciplinary because it draws on techniques and theories from several disciplines: software engineering, programming, mathematics, statistics, and domain knowledge.
So, data science combines software engineering and statistics with domain expertise for end-to-end analysis of data (typically large data sets) and for solving analytically complex problems. End-to-end analysis means that we deal not only with the creation of data but also with processing and storing it, and with handling scenarios where the data may be small, big, or unstructured. This type of analysis is done by data scientists.
A data scientist is someone who works in the field of data science, making predictions using statistics and machine learning. Data scientists also develop data products. In data science, the focus is more on data than on coding.
In our daily life, we come across a huge flow of data, such as traffic on the road, people interacting with each other, and much more. Whatever we are doing at present creates new data. We also come across unstructured data: feeds on social sites, digital images, videos, audio files, etc. Such data is not organized in a structured manner and has no fixed format.
Earlier, we were not using this data; the data flow was simply not utilized, and we couldn't do predictive analysis based on it. Now, data science enables us to make decisions and predictions from present data, and to act on the present situation by taking suitable actions. One thing to remember: a typical business setting usually deals only with structured data, whereas data science also deals with unstructured data.
How is knowledge acquired from data?
A data scientist works at the level of the raw data to acquire knowledge from it. First, data exploration and investigation are done in order to find leads, which are basically patterns within the data. The data can be structured or unstructured. Finding these patterns requires analytics: the data scientist digs meaning out of the given data, and this digging requires both machine learning and statistics. After that, actions are taken based on what was found.
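To make the exploration step a little more concrete, here is a minimal sketch of how simple statistics can surface a lead in raw data. The daily visit counts, the function name `find_unusual_days`, and the threshold of two standard deviations are all assumptions made up for illustration; real exploration would use whatever data and methods fit the problem.

```python
from statistics import mean, stdev

# Hypothetical raw data: number of visits recorded per day (made up for illustration).
daily_visits = [120, 115, 130, 125, 118, 122, 310, 119, 127, 121]

def find_unusual_days(values, threshold=2.0):
    """Return (index, value) pairs that lie more than `threshold`
    standard deviations away from the mean of `values`."""
    avg = mean(values)
    spread = stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - avg) > threshold * spread]

# Days that stand out from the rest are candidate leads worth investigating.
unusual = find_unusual_days(daily_visits)
print(unusual)
```

A day that stands out this way is only a starting point; domain knowledge is still needed to decide whether it means something.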
Why do we need Data Science?
· Pattern discovery: Consider the example of the XYZ sweet shop, which sees a huge number of purchases during October and November because of the festive season. Furthermore, some specific kinds of sweets are purchased more during those months. So there is a hidden pattern here that we can extract from the shop's data. We can acquire knowledge from such patterns and take actions that improve business outcomes. Pattern discovery is basically monitoring what users are interested in and then making decisions accordingly.
· Product recommendation: For example, products that are searched for frequently on an e-commerce site can be monitored to give recommendations to new users. Similarly, if we are connected to person A on a social site and person A is connected to person B, we may get a recommendation to connect with person B. Prediction can also be used for early detection: if somebody is trying to get unauthorized access to our data, this can be detected by studying the pattern of their visits to our network.
· Market analysis: By using data science we can identify target customers, that is, who needs our products or services. Marketing strategies can then be built on this basis. For example, it is relevant to promote books or courses to a student.
· Prediction: Prediction in data science builds on pattern discovery. Sports are a good example. During a cricket match, we see a lot of data on our television screens: runs scored by each player, overs, wickets, and more. Using such data, predictions are made about how the team is likely to play or how the match may unfold.
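The sweet shop example from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales figures, the sweet names, the helper functions, and the rule that a "peak" month sells more than 1.5 times the monthly average are all invented here, not taken from any real shop.

```python
from collections import defaultdict

# Hypothetical sales records for the XYZ sweet shop: (month, sweet, units sold).
sales = [
    ("Sep", "laddu", 40), ("Sep", "barfi", 35),
    ("Oct", "laddu", 150), ("Oct", "barfi", 90), ("Oct", "jalebi", 60),
    ("Nov", "laddu", 170), ("Nov", "barfi", 80), ("Nov", "jalebi", 50),
    ("Dec", "laddu", 45), ("Dec", "barfi", 30),
]

def units_per_month(records):
    """Total units sold in each month."""
    totals = defaultdict(int)
    for month, _sweet, units in records:
        totals[month] += units
    return dict(totals)

def peak_months(totals, factor=1.5):
    """Months whose total exceeds `factor` times the average monthly total."""
    average = sum(totals.values()) / len(totals)
    return [month for month, total in totals.items() if total > factor * average]

def top_sweet(records, months):
    """The sweet with the most units sold across the given months."""
    totals = defaultdict(int)
    for month, sweet, units in records:
        if month in months:
            totals[sweet] += units
    return max(totals, key=totals.get)

monthly = units_per_month(sales)
peaks = peak_months(monthly)
print(peaks, top_sweet(sales, peaks))
```

With a pattern like this in hand, the shop could, for instance, stock more of its best-selling sweet ahead of the festive months.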