
How should sensor data be handled?

Updated: 2021-09-07

Why sensor data must be cleaned, and why cleaning it affects every aspect of system design.

Sensors are the bridge between the digital world and the physical world. Getting valuable data out of them, however, is not easy. In fact, many designers new to the Internet of Things underestimate just how messy raw sensor data can be.

Convincing customers that the flood of "wrong" data they see is not caused by sensor failure has become daily work at MbientLab, an Internet of Things motion-sensor company. The data looks wrong because key data-cleansing steps are missing from the system designs that integrate these sensors.

"I deal with these complaints every day," MbientLab CEO Laura Kassovic said in a recent talk, warning engineers that they must properly appreciate how hard it is to train IoT devices with machine learning. "Over the years, tools and hardware have made great strides, but there has been little improvement in the basic understanding of data processing."

"I welcome people using sensors to solve problems and study complex topics," Kassovic said. "It is a brave, fascinating, open-minded thing to do, but it is also very difficult. Many people use the wrong methods, fail to solve the actual problem, and then blame the failure on our sensors. It leaves me feeling helpless. Understand this: the sensor does not lie, it has no bias, and the sensor data is always correct. Users abuse or misinterpret the sensor data, and then pin the blame on the sensor!"

In fact, sensors are not always easy to use, and not all of the data they generate is valuable. The key is to figure out which data is valuable, separate it out, and discard the rest as garbage.

According to Aart de Geus, chairman and co-CEO of Synopsys, "Most sensor data is not the key to system value. But there are exceptions, such as an artificial eye. Other devices can be classified as artificial-intelligence devices, such as watches that take various measurements. What can those measurements reveal? Can they predict the onset of a heart attack? If so, the value of that data is very high; how much would you be willing to pay for it? If it warns you a minute ahead, you have time to say your last words to your wife: 'Thank you, I love you.' With an hour's warning, you can call the emergency center. With hours of warning, the value and the risk of the data change again."

Across applications, data appears in many different forms. Data considered clean in one application scenario may require further cleanup in another. Some cleanup can be done locally, while other data must be cleaned in the data center.

"Suppose you have a facial-recognition application that admits only certain authorized employees into a building," de Geus said. "You update the AI network in the edge device every month to ensure it can recognize all the faces. With so many people entering the building at any time, this is not easy. But the security level here is not especially high, and not all of the data needs updating at every moment."

In other applications, the data must be cleaned in real time. A recent tragic case illustrates the stakes. On October 29, a Boeing 737 MAX 8 operated by Indonesia's Lion Air crashed, killing everyone on board. The investigation appears to be focusing on a sensor: recovered black-box data shows that the aircraft's two angle-of-attack (AOA) sensors gave inconsistent readings during the flight. Clearly, half of that data was wrong, which would have been enough to mislead the aircraft's anti-stall system into pitching the nose down until the aircraft crashed.

It is still too early to judge exactly what happened in this accident. "It is probably not just a sensor problem; there are many links in this data-processing chain," said Mahesh Chowdhary, director of ST's strategic platforms and IoT Center of Excellence. "First there is a sensing part, then a connectivity part, and finally a compute part. Algorithms look at the sensor data and determine the attitude of the aircraft. Multiple functions must work in concert to provide information about the aircraft's orientation."

Of the vast amounts of data sensors provide, not all is useful, and even data we believe is valuable may be contaminated or inaccurate. From seemingly simple IoT systems to complex safety-critical systems, when a sensor system design fails, can we simply blame the data, especially contaminated "dirty" data? How do you judge whether the sensor is bad, the data is wrong, or the logic of the algorithm or the firmware reading the data has failed? To pinpoint the real cause of failure, we must first agree on what "dirty data" actually is.

"This is an ambiguous area. Is the sensor working properly? Well, it may simply not work the way you think it does. Is it a user-induced error or a sensor failure? I find that the current definition of dirty data is a very vague concept. Sometimes the sensor is working properly, but a flaw in the user's system keeps the system from working," said Robert Pohlen, a product line director at TT Electronics, a company that designs sensors and helps customers build a variety of sensor-based systems.
Data processing path

To understand the difference between clean data and dirty data, it helps to look at how data gets from point A to point B.

In short, raw sensor data requires back-end processing. The base sensor converts a signal from one form of energy into an analog or digital signal, and may or may not require an external power supply. The original conversion starts from real-world analog phenomena: force, heat, light, magnetism, and sound. After conversion, the signal continues along the signal path inside the sensor or on the printed circuit board. If necessary, the analog signal is conditioned, amplified, and converted into a digital signal. The data is then sent to a microprocessor or other computing unit, where algorithms filter out further noise and extract the relevant information in whatever form the application requires.
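As a rough illustration of the last stage of that path, here is a minimal sketch of a processor-side algorithm filtering noise out of digitized samples. The sample values and window size are hypothetical, chosen only to show the idea:

```python
def moving_average(samples, window=4):
    """Smooth digitized sensor samples with a simple moving average."""
    if len(samples) < window:
        return list(samples)
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]

# Hypothetical digitized readings, with one obvious noise spike at 35.0.
raw = [20.1, 20.4, 19.8, 35.0, 20.2, 20.0, 19.9]
smoothed = moving_average(raw, window=4)
```

A moving average is only the simplest choice; real designs often use low-pass, median, or Kalman filters depending on the noise characteristics.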

Computing architects are only beginning to investigate how to do this data processing efficiently. Some data needs to be preprocessed on the edge device, while other data is sent to more powerful servers for cleanup.
"Edge computing will play a huge role," said Robert Blake, president and CEO of Achronix. "The basic building blocks are all there. We need to figure out how to move sensor data in any format efficiently, and how to design the memory hierarchy involved in that data movement so that the best computing performance can be achieved. In a word: how do we improve the computational efficiency of sensor-data processing?"

Some data demands immediate action, while other data is used to identify trends over time. Extracting the actionable data is critical, and so is discarding data that has lost its value. This extraction and removal work becomes even harder when multiple data types are involved, and in some cases several data types must be combined to model the physical world or to determine whether someone needs immediate medical attention.

The data may also start out clean and become dirty after an update or a virus intrusion. Rambus researcher Helena Handschuh said: "Globally, every component needs to be as secure as possible, so you want to build trust up from the hardware. Once a component boots securely, the data it communicates has a certain degree of credibility. Some systems may also contain insecure, unknown components, which calls for intrusion detection and software analysis of the data to see whether any data or components have been compromised. In a car, we want to detect the parts that give abnormal or strange data. That is not only a component-safety issue but also a personal-safety issue."

Dirty data must be cleaned up, but where and how it got dirty determines the next step. Designers need to consider from the outset whether the sensor itself can produce dirty data. "Solving sensor problems requires a lot of expertise," Kassovic said. "It requires designers to understand sensors at the hardware level, understand the data extracted from the sensors, and have experience in software and algorithm development."

For example, at the data-understanding level, do not confuse accelerometer data with GPS data. "The accelerometer only measures the acceleration of the body," she said. "Most people can't understand why it can't replace GPS, which gives the absolute position of the body in space. Each application is unique enough to require a unique method to extract the correct final data most reliably. Many users believe the data from the sensors should look exactly like their university textbooks, but this is not the case. Real-world sensor data is not perfect. Open your physics, engineering, or computer science textbook and you will see that it is full of perfect motion curves. But when you collect data from the real world, the actual curves look very different from the perfect curves in the book. The real world is full of noise and errors."
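Kassovic's point can be shown with a toy simulation. The curve, the noise level, and the random seed below are all invented for illustration: an ideal "textbook" motion curve never matches the noisy curve a real sensor would report, and the gap can be quantified as an RMS error:

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# An ideal "textbook" sinusoidal motion curve, 100 samples over one period.
ideal = [math.sin(2 * math.pi * t / 100) for t in range(100)]

# The same curve as a real sensor might report it, with Gaussian noise added.
noisy = [x + random.gauss(0, 0.05) for x in ideal]

# Root-mean-square error between the textbook curve and the "measured" one.
rms_error = math.sqrt(
    sum((a - b) ** 2 for a, b in zip(ideal, noisy)) / len(ideal)
)
```

With a noise standard deviation of 0.05, the RMS error hovers near 0.05: small, but never zero, which is exactly the gap between the textbook curve and the measured one.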

Each application is unique enough to require a unique approach to extract the correct final data most reliably.
Understanding the data

So, how should dirty data be dealt with? The first step is to understand and interpret the data the sensor outputs. Sensor accuracy is often relative, not absolute, and real-world sensor readings are never perfect.
Sensor manufacturers focus on the basic issues of noise, filters, and algorithms, and provide system designers with appropriate tools. Some system designers and platform vendors approach the problem from the system customer's perspective, focusing on whether the data in their databases is valid, and provide monitoring tools to help identify erroneous data.

"I find dirty data on the analog side of the signal chain; the digital side is clean," said TT Electronics' Pohlen. "Noise comes from many different sources. You can pick up electrical noise in the wiring harness, and components with degraded performance can also generate electrical noise."
In Pohlen's eyes, noise caused by an external influence on the actual sensing mechanism is not dirty data. "Take an optical sensor: if there is an ambient light source, you can't call the reading dirty data just because it is not what you really wanted to measure. The sensor is correctly measuring the brightness, natural light source or not."

Uncalibrated sensors typically produce more dirty data than calibrated ones. "The dirty data we usually refer to is basically uncalibrated raw sensor data, as well as data with a lot of noise on the signal," said ST's Chowdhary. "In addition to the physical component that senses a signal through some phenomenon, for example measuring Coriolis acceleration to detect the rotation of equipment, people, or mobile phones, the system contains signal-conditioning units. These signal-conditioning modules can operate under different conditions, including a low-power mode that minimizes the sensor's current consumption. But in low-power mode the noise in the sensor data increases; evidently, the more power spent on signal conditioning, the cleaner the data."

"Considering all these different levels, we can define dirty data as uncalibrated sensor output, plus sensor data affected by noise, whether that noise comes from the signal-conditioning module or from external interference," says Chowdhary. He also counts external disturbances, such as a magnetometer affected by a stray magnetic field, as dirty data.

Even within the same production batch, individual sensors vary because of manufacturing tolerances. Once deployed in the field, sensors can be damaged: ground crews can damage aircraft sensors, even critical angle-of-attack sensors. Sensors also age and drift, so they need to be recalibrated periodically.
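What calibration actually does can be sketched in a few lines. This assumes a simple linear sensor response, and the raw counts and reference values below are hypothetical; real calibration procedures may need more reference points or nonlinear models:

```python
def make_calibrator(raw_lo, raw_hi, ref_lo, ref_hi):
    """Two-point calibration: return a function mapping raw counts
    to physical units, given two known reference measurements."""
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)

    def calibrate(raw):
        return ref_lo + (raw - raw_lo) * gain

    return calibrate

# Hypothetical: raw counts 120 and 880 were observed while the sensor
# was exposed to known reference values 0.0 and 100.0.
to_units = make_calibrator(120, 880, 0.0, 100.0)
```

Periodic recalibration amounts to repeating the reference measurements and rebuilding this mapping, which compensates for both batch-to-batch variation and drift with age.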

Data can also be understood from a business perspective.

"In a network of sensor-based devices, dirty data may stem from one problem or several. The problem may be jumps in the time series, incorrect measurement by the sensor unit itself, date/time not being calibrated in time, improper associations between sensors, incorrect aggregation of cross-domain data points, and so on. Data may also be considered dirty simply because it does not meet business objectives, is unstable, or cannot be used," said Parikh, director of product marketing at Liaison Technologies, a company that helps bring usable data onto a platform for business use.

Others offer their own definitions of dirty data. "Dirty data is data that your device reports in the correct format but that is nonetheless invalid in some way; we can't even explain it," said James Branigan, co-founder of IoT system integrator Bright Wolf. "You can read it, but you will find that some of it is actually completely invalid."

In the industrial and consumer Internet of Things alike, the risk of dirty data is that it can pollute a company's databases, trigger dangerous downstream behavior, and waste money. "Dirty data is a problem because in all these IoT systems, you look for value in the data by running programmatic analysis on the input, and the analysis results feed back, to some extent, into the enterprise systems," Branigan said. "Once the data has been analyzed and fed back, interesting things happen. But if you build the analysis on a bad assumption, dirty data, then garbage in inevitably means garbage out. Dirty data can do you real damage, because this effectively invalid data can cause automated operations to go wrong, resulting in real economic costs."
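One common defense against this garbage-in, garbage-out chain is a plausibility gate in front of the analytics: readings outside the physically possible range are rejected before they can trigger automated actions. A minimal sketch, with hypothetical limits for an imagined industrial temperature sensor:

```python
# Hypothetical plausible range for an industrial temperature sensor, in C.
PLAUSIBLE_RANGE = (-40.0, 125.0)

def validate(readings, lo=PLAUSIBLE_RANGE[0], hi=PLAUSIBLE_RANGE[1]):
    """Split readings into physically plausible values and rejects."""
    accepted = [r for r in readings if lo <= r <= hi]
    rejected = [r for r in readings if not (lo <= r <= hi)]
    return accepted, rejected

# 999.0 and -80.0 are outside anything the sensor could really measure.
good, bad = validate([22.5, 23.0, 999.0, -80.0, 24.1])
```

Rejected readings are usually logged rather than silently dropped, since a burst of implausible values is itself a signal that the sensor or its link is failing.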

Branigan identifies three kinds of dirty data. "The first is physical failure of the sensor. It can neither detect changes in the environment nor detect its own fault, yet it still provides you with well-formed data that is complete garbage. The second comes from software errors in the firmware running on the device; even newer firmware versions may produce well-formed but completely wrong data. The third kind is really awful: you need to understand the specific machine's operation to know how to interpret the incoming data. If you don't understand this, you will interpret wrong data as valid, while the rest of the system gives it a different interpretation."
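Branigan's first failure mode, a physically dead sensor that still emits well-formed data, can sometimes be caught with a crude flatline check: a live sensor's readings always wander a little, while a dead one repeats the same value. The window size and tolerance below are hypothetical, to be tuned per sensor:

```python
def looks_stuck(samples, window=10, tolerance=1e-6):
    """Return True if the last `window` samples are effectively constant,
    which suggests the sensor has stopped responding to its environment."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough history to judge yet
    return max(recent) - min(recent) <= tolerance

# A healthy sensor drifts slightly; a dead one repeats the same value.
healthy = [20.0, 20.3, 19.9, 20.1, 20.2, 20.0, 19.8, 20.4, 20.1, 20.0]
stuck = [20.0] * 10
```

This check cannot catch his second and third failure modes, where the data is well-formed and varying but wrong; those require firmware validation and domain knowledge of the specific machine.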
So, can dirty data be washed out?

Data cleaning tools

There are many tools to help clean data. "There are a lot of great tools out there, like the popular Matlab, LabVIEW, and Python. Our own MetaWear APIs can help implement data filters in all the major programming languages. I usually advise customers to use the tools they are most familiar with rather than forcibly selling our own APIs. Python is a great tool with many machine learning libraries that are open source, easy to use, and well documented," said MbientLab's Kassovic. MbientLab also uses Bosch's FusionLab, since it sells Bosch sensors alongside its own.

MEMS market leader Bosch Sensortec also provides sensors and accompanying software libraries that help sensors detect, interpret, monitor, perceive, and predict intent, writes Marcellino Gemelli, who is responsible for business development of the MEMS product portfolio. STMicroelectronics offers libraries, drivers, and sensor-configuration tools, as well as microcontrollers that help simplify design.
Finding professionals with the right expertise is not easy. "You can't send a software engineer to do a firmware engineer's job," Kassovic said.

From a business perspective, having data scientists clean data by hand takes too much time. "Now all kinds of machines are constantly generating data, which may produce new levels of dirty data more complex than human-generated dirty data, and that will be the focus of dirty-data cleaning," Branigan said. "There are a lot of data cleaning tools in the big data market, but they all center on the data scientist. For a relatively static data set, a data scientist cleans it, analyzes it, and finds something interesting. That approach keeps up with the pace at which humans generate data, but it is hard or even impossible for it to keep up with the pace at which machines generate data. What you ultimately need is an automated system that takes real-time data from devices, analyzes it as a stream, and outputs the results into the enterprise business systems to automate business operations."

Moving sensors to digital outputs may help. "Digital communication is definitely good. With the sensors you collect high-quality data from, is the noise coming from the analog side? I see a natural trend toward digital in the sensor industry, because you can build in error-checking functions. A digital system has a certain noise margin; if noise appears in a digital channel, who cares? The data is either 1 or 0, and it is very hard to flip. You can add a verification mechanism to the data transmission, and if verification fails, you can throw the data away," Pohlen said.
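The verification mechanism Pohlen describes can be sketched with a standard CRC-32 check from Python's standard library. The frame layout here (a little-endian float payload followed by a 4-byte checksum) is an assumption for illustration, not any particular sensor protocol:

```python
import struct
import zlib

def pack_frame(value: float) -> bytes:
    """Sender side: payload plus CRC-32 of the payload."""
    payload = struct.pack("<f", value)
    crc = zlib.crc32(payload)
    return payload + struct.pack("<I", crc)

def unpack_frame(frame: bytes):
    """Receiver side: recompute the CRC and discard bad frames."""
    payload, (crc,) = frame[:-4], struct.unpack("<I", frame[-4:])
    if zlib.crc32(payload) != crc:
        return None  # verification failed: throw the data away
    return struct.unpack("<f", payload)[0]

frame = pack_frame(21.5)
# Simulate noise on the link by corrupting the first byte.
corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]
```

A CRC catches corruption on the link; as Handschuh noted earlier, protecting against tampering rather than noise requires cryptographic integrity checks instead.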

"Although raw data may be filtered, compensated, and corrected, in most cases the user's options are limited," Marcellino Gemelli, who is responsible for business development of Bosch's MEMS sensor portfolio, pointed out in a recent article.

"The first step in overcoming these challenges is to implement and integrate the appropriate cleaning tools," said Parikh of Liaison Technologies. "These cleaning tools not only address data quality; from a project perspective, they also verify data-source identity, credibility, and time series. Each project has its own unique requirements. Implementers can apply some common techniques, but they must be prepared to customize extensively as needed to achieve the business goals."

Liaison Technologies provides services such as data cleaning, filtering, management, and deduplication testing. "One of the key functions we provide is tracking the lineage of the data: tracing the chain from the original source of the data to the cleaned, structured data."

For safety-critical systems, redundancy is an excellent but expensive solution. TT Electronics' Pohlen said, "Everyone wants to achieve a higher ASIL rating, but are they willing to commit to providing more sensing capability? Likewise, the ASIL rating comes down to whether the data is correct and how the back end interprets that data. Unless you can do some kind of self-diagnosis in the sensor, the best approach is redundancy."
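The redundancy approach can be sketched in a few lines. With only two channels, like the paired angle-of-attack sensors discussed earlier, you can detect disagreement but not tell which unit is wrong; a third channel lets you outvote a single bad unit. The disagreement threshold below is hypothetical:

```python
def cross_check(a, b, max_delta=2.0):
    """Two-channel redundancy: flag disagreement, else average.
    With two sensors you can detect a fault but not locate it."""
    if abs(a - b) > max_delta:
        return None  # channels disagree: do not act on either reading
    return (a + b) / 2

def majority_vote(a, b, c):
    """Three-channel redundancy: the median outvotes one bad channel."""
    return sorted([a, b, c])[1]
```

This is why the expense matters: doubling the sensors only detects a fault, while tripling them is what allows the system to keep operating through one.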


