Data Gathering
Data plays an important role in almost every field, whether it is trade, social science or any other area of science and study. Every field uses information, and therefore we need to collect data. Data gathering is the process of collecting data in a way that allows it to be evaluated further. After the data is gathered, data scientists apply descriptive and analytic methods to it in order to assess its quality and content.
The method of gathering data varies depending on the nature of the information and the objective of the user.
This article covers two things:
· What are the types of data collected by the user?
· What are the sources where we can find these datasets?
Let us first look at the types of data that a user gathers and finds suitable to work with. There are basically two types of data:
1. Quantitative data: This type of data is measurable because it takes the form of quantities or numbers. It is expressed as size, amount, price etc.
2. Qualitative data: This type of data is descriptive rather than numerical. It deals with qualities or characteristics, such as a color, a category or an opinion.
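The distinction between the two types can be illustrated with a minimal Python sketch. The records, the field names and the helper `split_fields` are all hypothetical, invented only for this illustration, and the rule used here (a field is quantitative when every value in it is a number) is a simplification rather than a general test.

```python
from numbers import Number

# Hypothetical survey records, invented purely for illustration.
records = [
    {"price": 12.5, "quantity": 3, "color": "red", "feedback": "easy to use"},
    {"price": 8.0, "quantity": 10, "color": "blue", "feedback": "too small"},
]

def split_fields(rows):
    """Group field names by whether every value in that field is numeric."""
    quantitative, qualitative = [], []
    for field in rows[0]:
        values = [row[field] for row in rows]
        # bool counts as a Number in Python, so exclude it explicitly.
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            quantitative.append(field)
        else:
            qualitative.append(field)
    return quantitative, qualitative

quantitative, qualitative = split_fields(records)
print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```

In real datasets a numeric-looking field can still be qualitative (a postal code, for example), so a check like this is only a starting point.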
The most fundamental and common source of data is an organization’s own internal databases, including its data warehouse. Any data that lies inside an organization is termed internal data. Internal data is often the most valuable information because it has already been collected, refined and pre-processed, so it is ready for analysis.
Let us talk about the most common sources of datasets.
1. Relational database: This model stores data in the form of tables, from which we can retrieve data easily without having to reorganize the tables.
2. Transactional database: This is a type of database in which records can be inserted, updated and deleted as transactions. If a transaction does not complete, its changes can be rolled back.
3. Data warehouse: A data warehouse is a collection of data integrated from various sources across an organization. It supports and guides management decisions. Suppose an organization, XYZ, runs a business and sees data indicating that a particular age group makes more purchases than other age groups during a certain period of time. That finding is a report drawn from the warehouse.
4. Log files: A log file keeps a record of processes, messages and communication between various software applications and the operating system. There are many kinds of applications, such as web applications and Windows applications, and all of them generate log files. Log files often also contain the IP address of the source from which a request was generated.
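The rollback behavior mentioned for transactional databases can be sketched with Python’s built-in sqlite3 module. This is a minimal illustration under stated assumptions: the `accounts` table, the names in it and the `transfer` helper are hypothetical, and the database lives only in memory.

```python
import sqlite3

# In-memory database; the table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo every step if any step fails."""
    try:
        # Credit the receiver first, then debit the sender.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        # The CHECK constraint rejects a negative balance.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The transaction is incomplete, so the credit is rolled back too.
        conn.rollback()
        return False

def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
print(ok, failed, balances(conn))
```

The second transfer credits bob before the debit on alice violates the constraint, so the rollback is what keeps the table consistent.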
There are many sources of data available at no cost. Some are in the public domain and some are copyrighted, so be sure to check the license before using a dataset. Every dataset is unique in its own way. Here are some of the sources where we can find freely available datasets:
KDNuggets: http://www.kdnuggets.com
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/index.php
Wikidata: http://meta.wikimedia.org/wiki/wikidata
Kaggle: http://www.kaggle.com
We can download or clone datasets from these sources. Apart from these, there are many other such sources.
Gathering of data involves adding to data holdings. We can:
1. Collect new data.
2. Use our own previously collected data.
3. Reuse someone else’s data or acquire data from the internet.
4. Purchase data.
Why do we need to collect or gather data?
The name itself suggests why we need to gather data. Data gathering is concerned with collecting data accurately so that it can be used to answer queries, test hypotheses and evaluate results. The reasons for gathering data are as follows:
1. It helps improve the decision-making process: In almost every field, one of the most vital decisions concerns the allocation and usage of resources. If appropriate data is not collected, it is quite difficult to choose how to use resources efficiently. Decision making becomes easier and smoother when there is data driving it. Without data, there is nothing to convert into the useful information that helps in making decisions.
2. Data gathering helps in finding answers: This can be understood through the example of a child who sees a butterfly for the first time. The child is mesmerized by the beauty of the butterfly and becomes curious about what it is. To find the answer, the child describes its features to his parents and is told that it is a butterfly. The child has found an answer through the data he acquired. Similarly, we get answers to our queries through data. When an organization wants insights about the market so that it can devise new strategies, it gathers data and acts accordingly.
3. Data gathering improves the quality of results: Once data is gathered, many insights can be drawn from it. These insights help in carrying out research and monitoring progress. This lets us make improvements, and hence the results are better.
Why do we need certain techniques in data gathering?
1. Techniques and methodology help ensure the accuracy and integrity of the data being collected. Using an appropriate method for collecting data results in better quality data, that is, data that is more reliable and contains fewer errors. This in turn helps in generating quality information.
2. Data gathering can sometimes be expensive, and it can also cost us time. Therefore, we need to make sure we make the best of it.