FFilter - tool for screening open data for private/sensitive information before publication

June 1, 2012 - 19:30 -- Dmitry Kachaev

One of the obstacles and concerns organizations face before publishing open data is the unintentional disclosure of private information about individuals or of other sensitive information.

 

 

Example: 

Unintentional disclosure during the publication of open data can include publishing people's names, addresses, bank account numbers, credit card numbers, etc.
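As a rough illustration of how one of these categories could be checked, the sketch below looks for credit card numbers. It assumes card numbers appear as 13 to 19 digits, optionally separated by spaces or hyphens, and uses the Luhn checksum to discard digit runs that cannot be valid card numbers. The function names `luhn_valid` and `find_card_numbers` are hypothetical and not part of any existing tool.

```python
import re

# Assumption: a candidate is 13 to 19 digits, with optional single
# spaces or hyphens between digits.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return substrings of text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(match.group())
    return hits
```

A checksum step like this reduces false positives from phone numbers, IDs, and other long digit runs, but it cannot prove that a matching number is a real card, so flagged values would still need human review.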

Extra Credit: 

One of the most widely used tools for working with data is MS Excel. It would be great to have a native plug-in that can run such checks and filtering directly from the Excel application.

 

Because data comes in many formats, the tool needs to be flexible in how data is loaded. Support for widely used formats such as XML, JSON, CSV/Excel, ESRI Shapefiles, RDBMS/SQL, etc. will make the tool easier to use.
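One possible way to handle several formats is to convert each into the same stream of (location, value) pairs, so that the scanning logic does not need to know where the data came from. The sketch below covers only CSV and JSON using the Python standard library. The loader names and the `LOADERS` registry are hypothetical, and the other formats listed above would need their own loaders.

```python
import csv
import io
import json


def iter_csv_cells(text):
    """Yield (location, value) for every cell in CSV text."""
    reader = csv.reader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=1):
        for col_no, value in enumerate(row, start=1):
            yield (f"row {row_no}, col {col_no}", value)


def iter_json_cells(text):
    """Yield (path, value) for every leaf value in a JSON document."""
    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                yield from walk(child, f"{path}.{key}")
        elif isinstance(node, list):
            for i, child in enumerate(node):
                yield from walk(child, f"{path}[{i}]")
        else:
            # Leaf values are converted to strings for text scanning.
            yield (path, str(node))
    yield from walk(json.loads(text), "$")


# Hypothetical registry: a new format is supported by adding a loader here.
LOADERS = {"csv": iter_csv_cells, "json": iter_json_cells}


def iter_cells(text, fmt):
    """Return an iterator of (location, value) pairs for the given format."""
    return LOADERS[fmt](text)
```

Keeping the location alongside each value lets a report point reviewers to the exact cell or field that was flagged.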

 

Next Steps and Sustainability: 

The solution should be a simple tool or application that can take open data in any machine-readable format and scan it for the presence of private or sensitive data.

 

Preferably, the tool should be highly configurable so that filtering and search can be tuned for different problems (e.g. people-related data, bank account and credit card data, special catchwords, etc.).
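The configurability described above could look roughly like the following sketch, in which checks are grouped into named categories that a user switches on, plus a user-supplied list of catchwords. The category names, the patterns, and the functions `build_rules` and `scan` are all assumptions made for illustration. The patterns are deliberately simple and would need tuning for real data.

```python
import re

# Hypothetical built-in categories; each maps to one simple pattern.
RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_like": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def build_rules(categories, catchwords=()):
    """Select built-in categories and add an optional catchword rule."""
    rules = {name: RULES[name] for name in categories}
    if catchwords:
        alternation = "|".join(re.escape(word) for word in catchwords)
        rules["catchword"] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return rules


def scan(values, rules):
    """Return (index, rule name, matched text) for every hit in values."""
    findings = []
    for index, value in enumerate(values):
        for name, pattern in rules.items():
            for match in pattern.finditer(value):
                findings.append((index, name, match.group()))
    return findings
```

For example, `build_rules(["email"], catchwords=["confidential"])` would flag e-mail addresses and the word "confidential" in any letter case, while leaving card-like digit runs unreported because that category was not selected.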

Qualitative Impact: 
If the solution is established, it may prevent potential leaks and eliminate some of the risks of publishing open data. This will lead to more organizations publishing open data and will streamline the data publication and review process.
Quantitative Impact: 
This solution will have a direct impact on organizations, and on the teams inside them, that work on opening data. Indirectly, it will lead to more data being available to various data users: citizens, journalists, developers, and other organizations that may be interested in the data.