Two out of five homes worldwide have at least one smart device that is vulnerable to cyber-attacks, according to an assessment by Duke University. In the U.S., the Biden Administration has unveiled a new plan in the form of a label that helps consumers gauge whether a device is secure and protected from bad actors trying to spy on them or sell their data.
The White House has announced plans to roll out voluntary labelling for Internet-connected devices like refrigerators, thermostats and baby monitors that meet certain cybersecurity standards, such as requiring data de-identification. Many smart home devices collect data, and that data is often packaged and sold on; in many cases, it can be traced back to identifiable individuals.
This touches on the notion of data democracy, which can be defined as the “methodological framework of values and actions that benefit and minimize any harm to the public or the typical user.”
But does a voluntary code go far enough? And even then, what is needed for ‘true’ de-identification?
De-identification is the general term for the process of removing personal information from a record or data set. De-identification does not reduce the risk of re-identification of a data set to zero. Rather, the process produces data sets for which the risk of re-identification is very small.
When applied to metadata or general data about identification, the process is also known as data anonymization. Common strategies for de-identification include deleting or masking personal identifiers, such as personal name, and suppressing or generalizing quasi-identifiers, such as date of birth.
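As a minimal sketch of these two strategies (the record and field names below are hypothetical, chosen purely for illustration), deleting the personal name and generalizing a date of birth to a birth year might look like this in Python:

```python
# Hypothetical device record; field names are illustrative only.
record = {
    "name": "Jane Example",         # personal identifier
    "date_of_birth": "1984-06-17",  # quasi-identifier
    "steps_today": 9421,            # non-identifying measurement
}

def deidentify(rec):
    out = dict(rec)
    out.pop("name", None)                             # delete the personal identifier
    out["birth_year"] = out.pop("date_of_birth")[:4]  # generalize DOB to year only
    return out

print(deidentify(record))  # {'steps_today': 9421, 'birth_year': '1984'}
```

Even after steps like these, the remaining quasi-identifiers are what keep the risk of re-identification above zero, which is why the stronger methods listed below matter.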
Although de-identification can be robust, the Biden plans are relatively weak: the agreement between industry and the consumer is a voluntary one, and even within a de-identification framework there will be differences in the overall security and privacy arrangements applied to a given set of data.
For more privacy-focused firms, what are the most robust methods of de-identification? These include the following (a short code sketch follows the list):
- Omission: the simplest method; simply do not include the field in the dataset (e.g. full names).
- Rounding or grouping: binning numeric or categorical data into larger groups (e.g. ages or occupations).
- Random noise addition: adding or subtracting random amounts from numeric or geographical data (e.g. dates or sample locations).
- Pseudonymization: replacing real names with a temporary ID, deleting or masking the personal identifier so that records can no longer be directly linked back to an individual.
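A compact sketch of how these four techniques could be combined is below; the record, the field names and the salt value are assumptions for illustration, not part of any standard or product:

```python
import hashlib
import random

# Hypothetical raw record from a smart device; all field names are illustrative.
raw = {
    "full_name": "Jane Example",
    "age": 37,
    "event_day": 112,    # day offset of a reading relative to the start of collection
    "heart_rate": 72,
}

SALT = "replace-with-a-secret-kept-off-the-published-dataset"

def deidentify(rec):
    out = {}
    # Omission + pseudonymization: the full name is never copied across;
    # a salted hash stands in as a temporary ID instead.
    out["pseudo_id"] = hashlib.sha256((SALT + rec["full_name"]).encode()).hexdigest()[:12]
    # Rounding/grouping: bin the exact age into a ten-year band.
    decade = (rec["age"] // 10) * 10
    out["age_band"] = f"{decade}-{decade + 9}"
    # Random noise addition: jitter the day offset by up to +/- 3 days.
    out["event_day"] = rec["event_day"] + random.randint(-3, 3)
    out["heart_rate"] = rec["heart_rate"]  # non-identifying measurement kept as-is
    return out

print(deidentify(raw))
```

Keeping the salt (or any pseudonym lookup table) separate from the published data is what prevents the temporary IDs from being trivially reversed.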
Essential to the success of such measures is consumer awareness. The more consumers know, and the more they seek out products with de-identification built in, the more likely it is that strong privacy practices will become the norm.
The average household in the U.S. now has more than 20 devices connected to the Internet, each of which is potentially collecting and sharing data. For instance, fitness trackers measure your steps and monitor the quality of your sleep; smart lights track your phone’s location and turn on as soon as you pull into the driveway; video doorbells collect images of who is at the door, even when you are not home.
To achieve true data democracy, the data collected by the big technology companies needs to be freely and consciously given by consumers through consent, or withheld by default. If data is to be given, it must be rendered fully anonymous, and for this de-identification is essential.
