In data science and big data you’ll come across many different types of data, and each of them tends to require different tools and techniques. The main categories of data are these:
- Structured data
- Unstructured data
- Natural language data
- Machine-generated data
- Graph-based data
- Audio, video, and image data
- Streaming data
Let’s explore all these interesting data types.
1. Structured data
- Structured data is data that depends on a data model and resides in a fixed field within a record.
- Because of this, it’s often easy to store structured data in tables within databases or Excel files.
- SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases.
- You may also come across structured data that is hard to store in a traditional relational database.
- One example is hierarchical data, such as a family tree.
- The world isn’t made up of structured data, though; structure is imposed on it by humans and machines. More often, data comes unstructured.
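To make this concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `employees` and `people` tables, their columns, and all rows are made-up examples. Because every employee record has the same defined fields, a question such as "how many people work in each department, and what do they earn on average?" becomes one short SQL query. The family tree, by contrast, has to be flattened into a hypothetical `parent_id` column, and even listing one person’s descendants needs a recursive query.

```python
import sqlite3

# An in-memory SQLite database stands in for a real relational database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: every record has the same defined fields (the data model).
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
    [("Alice", "Sales", 52000.0), ("Bob", "Sales", 48000.0), ("Carol", "Engineering", 75000.0)],
)

# Because the structure is fixed, an aggregate question is a single SQL query.
rows = conn.execute(
    "SELECT department, COUNT(*), AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
for department, headcount, avg_salary in rows:
    print(f"{department}: {headcount} employee(s), average salary {avg_salary:.0f}")

# Hierarchical data (a hypothetical family tree) must be flattened into a parent_id column...
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO people (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Ben", 1), (3, "Cleo", 2)],
)

# ...and walking down the generations needs a recursive query.
descendants = conn.execute(
    """
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM people WHERE id = 1
        UNION ALL
        SELECT p.id, p.name, t.depth + 1 FROM people AS p JOIN tree AS t ON p.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth
    """
).fetchall()
print(descendants)
```

The adjacency-list layout with a recursive query is only one common workaround; hierarchical data is often a better fit for document or graph stores.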
2. Unstructured data
- Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or varying.
- One example of unstructured data is your regular email (figure 1.2).
- Although email contains structured elements such as the sender, title, and body text, it’s a challenge to, for example, find the number of people who have written an email complaint about a specific employee, because there are so many ways to refer to a person.
- The thousands of different languages and dialects out there further complicate this.
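The sketch below, which uses Python’s standard `email` module, shows the contrast on a made-up complaint email. The headers come out with a single lookup, but counting mentions of a hypothetical employee "Jane Doe" in the body depends entirely on which spellings you thought to search for.

```python
import re
from email import message_from_string

# A hypothetical complaint email: structured headers, free-text body.
raw_email = """From: customer@example.com
To: support@example.com
Subject: Complaint about my last visit

I was helped by Jane last Tuesday. Ms. Doe was rude and J. Doe never called back.
"""

msg = message_from_string(raw_email)

# The structured elements are a simple lookup away.
sender = msg["From"]
subject = msg["Subject"]
body = msg.get_payload()

# Searching the body for the employee's full name finds nothing,
# although the email refers to her three times.
exact_mentions = body.count("Jane Doe")

# A hand-written list of name variants (hypothetical) catches these three,
# but no fixed list anticipates every nickname, typo, or language.
VARIANTS = re.compile(r"\b(?:Jane Doe|Ms\. Doe|J\. Doe|Jane)\b")
variant_mentions = len(VARIANTS.findall(body))

print(sender, "|", subject)
print("exact:", exact_mentions, "variants:", variant_mentions)
```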
3. Natural language data
- Natural language is a special type of unstructured data; it’s challenging to process because it requires knowledge of specific data science techniques and linguistics.
- The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don’t generalize well to other domains.
- The concept of meaning itself is questionable here: two people listening to the same conversation won’t necessarily take away the same meaning.
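A toy illustration of the domain problem, assuming a hypothetical word list rather than a real trained model: a lexicon that suits movie reviews, where an unpredictable plot is praise, scores a complaint about unpredictable car steering as positive.

```python
import re

# Hypothetical word weights, as if learned from movie reviews,
# where an unpredictable plot counts as praise.
MOVIE_LEXICON = {"gripping": 1, "unpredictable": 1, "boring": -1, "predictable": -1}

def sentiment(text, lexicon):
    """Sum the weights of known words: > 0 reads as positive, < 0 as negative."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

movie_score = sentiment("A gripping, unpredictable thriller.", MOVIE_LEXICON)
car_score = sentiment("The steering is unpredictable.", MOVIE_LEXICON)

print(movie_score)  # 2: positive, as intended
print(car_score)    # 1: also positive, although the car reviewer is complaining
```

Real sentiment models are far more sophisticated than a word list, but they can fail in the same way when the meaning of words shifts between domains.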
4. Machine-generated data
- Machine-generated data is information that’s automatically created by a computer, process, application, or other machine without human intervention.
- Machine-generated data is becoming a major data resource and will continue to do so.
- Because of its high volume and speed, analyzing machine data requires highly scalable tools.
- Examples of machine data are web server logs, call detail records, network event logs, and telemetry.
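As a small-scale sketch of working with machine data, the Python code below parses made-up web server log lines in the Common Log Format and counts HTTP status codes. It reads the lines through a generator, so they are processed one at a time instead of being loaded all at once. At real production volumes this kind of job is usually spread across a cluster, which this single-machine example does not attempt.

```python
import re
from collections import Counter

# Hypothetical log lines in the Common Log Format used by many web servers.
log_lines = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.5 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.5 - - [10/Oct/2023:13:55:41 +0000] "POST /login HTTP/1.1" 200 128',
    "malformed line",
]

# Named groups pull out the fields of each line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse(lines):
    """Yield one dict per well-formed line, skipping lines that don't match."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()

status_counts = Counter(entry["status"] for entry in parse(log_lines))
print(status_counts)
```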
5. Graph-based data
Graph-based data represents entities and their relationships as nodes and edges in a graph. This makes it a powerful tool for modeling complex relationships between entities, such as social networks, financial transactions, and knowledge graphs.
For example, in a social network, people are represented as nodes and their friendships are represented as edges. This allows us to analyze things like the spread of information, the formation of communities, and the influence of individuals.
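A minimal sketch of the social network example in plain Python, assuming a small made-up friendship graph stored as an adjacency list. Breadth-first search measures how many hops a piece of news needs to reach each person from a starting node, and the number of friends (the node’s degree) serves as the simplest proxy for influence. Real analyses would typically use a graph library or a graph database.

```python
from collections import deque

# A hypothetical friendship network: each person maps to the set of their friends.
friends = {
    "Ann": {"Ben", "Cleo"},
    "Ben": {"Ann", "Dan"},
    "Cleo": {"Ann", "Dan"},
    "Dan": {"Ben", "Cleo", "Eve"},
    "Eve": {"Dan"},
}

def spread(graph, source):
    """Breadth-first search: number of hops a message needs to reach each person."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops

# Degree (number of friends) is the simplest measure of an individual's influence.
degree = {person: len(fs) for person, fs in friends.items()}
most_connected = max(degree, key=degree.get)
reach = spread(friends, "Ann")

print(reach)
print(most_connected)
```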
6. Audio, video, and image data
Audio, video, and images are collectively known as multimedia data. This type of data is characterized by its rich and complex nature, and it can be challenging to store, process, and analyze. However, it also has the potential to provide valuable insights that other types of data cannot.
Here are some examples of how multimedia data is used:
- Computer vision: Analyzing images and videos to understand the content, such as identifying objects, people, and actions.
- Speech recognition: Converting spoken language into text.
- Natural language processing: Understanding the meaning of text and speech.
- Medical imaging: Analyzing medical images to diagnose diseases.
- Entertainment: Creating movies, games, and other forms of entertainment.
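Part of what makes multimedia data hard to handle is its size: to a program, an image is just a grid of numbers, and a single 1920 × 1080 color photo already holds 1920 × 1080 × 3 = 6,220,800 of them. The sketch below uses a made-up 4 × 6 grayscale image and looks for sharp brightness jumps between neighbouring pixels, a basic step in edge detection for computer vision. The `threshold` value is an arbitrary assumption; production systems rely on dedicated image libraries and learned models.

```python
# A hypothetical 4x6 grayscale image: each value is a pixel's brightness (0 = black, 255 = white).
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

def horizontal_gradient(img):
    """Difference between each pixel and its right-hand neighbour, row by row."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

def edge_columns(img, threshold=100):
    """Left-pixel indices where the brightness jump exceeds the threshold in every row."""
    grad = horizontal_gradient(img)
    return [x for x in range(len(grad[0])) if all(abs(row[x]) > threshold for row in grad)]

edges = edge_columns(image)
print(edges)  # [2]: a vertical edge between columns 2 and 3
```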
7. Streaming data
Streaming data is generated continuously and in real time: instead of being loaded into a data store in batches, it flows into the system as events happen. This type of data is becoming increasingly common due to the growth of the Internet of Things (IoT) and other sources, such as sensors, that generate data constantly.
Here are some examples of how streaming data is used:
- Fraud detection: Analyzing financial transactions in real time to identify fraudulent activity.
- Traffic monitoring: Monitoring traffic flows in real time to optimize traffic management.
- Social media analysis: Analyzing social media posts in real time to understand public opinion and trends.
- Industrial automation: Monitoring and controlling industrial processes in real time.
- Scientific research: Collecting and analyzing data from scientific experiments in real time.
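The fraud detection use case can be sketched as a detector that handles each transaction the moment it arrives and keeps only a small sliding window of recent amounts in memory, so the stream itself can be unbounded. The rule (flag any amount more than three times the recent average) and the `window` and `factor` values are illustrative assumptions, not a real fraud model.

```python
from collections import deque

class SlidingWindowDetector:
    """Flag an amount far larger than the average of the last `window` amounts."""

    def __init__(self, window=5, factor=3.0):
        self.recent = deque(maxlen=window)  # the oldest amount is dropped automatically
        self.factor = factor

    def process(self, amount):
        """Handle one event as it arrives; return True if it looks suspicious."""
        window_full = len(self.recent) == self.recent.maxlen
        suspicious = window_full and amount > self.factor * (sum(self.recent) / len(self.recent))
        self.recent.append(amount)
        return suspicious

# A hypothetical stream of transaction amounts.
stream = [20, 25, 18, 22, 30, 24, 400, 21]
detector = SlidingWindowDetector()
flags = [detector.process(amount) for amount in stream]
print(flags)
```

In this run only the 400 is flagged. The flagged amount still enters the window and inflates the average for the next few events; a real system might exclude flagged values instead.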
Reference:
Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science,” Manning Publications, 2016.