Data that can assist in tracking corruption is often not clean, whether because of mistakes made during manual entry, the low quality of an original source, or the scanning of non-digital documents. This calls for an online platform that can import generic data sets, make them editable and taggable, and track the full change history to preserve integrity.
There is no online platform available that solves this problem at the scale of hundreds of thousands of records: one that lets users easily import a large data set and then share permissions with a large group of trusted volunteers who help clean up that data. Integrity is important, so every change, and the source of that change (its author or authors), needs to be recorded and tracked for every field and row. Finally, the platform also needs a tagging facility so that users can easily tag rows for various uses, such as marking categorization issues, either by the data itself or by the type of information to be extracted.
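The requirements above can be sketched as a minimal data model: every edit appends an immutable change record (author, timestamp, old and new value) instead of overwriting history, and rows carry free-form tags. All class, field, and tag names here are hypothetical illustrations, not part of any existing platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of the audit model described above: each edit to a
# field is stored as an immutable change record, preserving full history.
@dataclass(frozen=True)
class ChangeRecord:
    row_id: int
    field_name: str
    old_value: str
    new_value: str
    author: str          # who made the edit, for integrity tracking
    timestamp: datetime

@dataclass
class Row:
    row_id: int
    values: dict                                  # current field values
    tags: set = field(default_factory=set)        # e.g. {"politician-name"}
    history: list = field(default_factory=list)   # every ChangeRecord so far

    def edit(self, field_name, new_value, author):
        # Log the change before applying it, so nothing is ever lost.
        self.history.append(ChangeRecord(
            self.row_id, field_name,
            self.values.get(field_name, ""), new_value,
            author, datetime.now(timezone.utc)))
        self.values[field_name] = new_value

# A volunteer corrects a misspelled department name and tags the row.
row = Row(1, {"department": "Min. of Wrks"})
row.edit("department", "Ministry of Works", author="volunteer_01")
row.tags.add("government-project")
```

The key design choice is that `history` is append-only: the current value can always be reconstructed, and every correction remains attributable to a specific volunteer.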
We need it to be user-friendly enough that ordinary users can import the data (often spreadsheets or database table dumps) and manage communities of users who clean up this information. They should also be able to easily export the cleaned-up data for others to use. A solution to this problem can help the government clean up data before releasing it to the public. Citizens can also use it to clean and organize data to improve transparency or to research corruption issues.
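The import-and-export round trip described above can be illustrated with a short sketch using Python's standard `csv` module. The file contents, column names, and correction shown are assumptions for illustration only.

```python
import csv
import io

# Hypothetical example of a spreadsheet/table dump as a CSV string.
raw_csv = "department,project\nMin. of Wrks,Highway A\n"

# Import: read the generic dump into editable dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# A volunteer's correction to a misspelled department name.
rows[0]["department"] = "Ministry of Works"

# Export: write the cleaned data back out for others to use.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["department", "project"])
writer.writeheader()
writer.writerows(rows)
cleaned_csv = out.getvalue()
```

In a real platform the same round trip would run against uploaded files and a database rather than in-memory strings, but the shape of the workflow (import, edit, export) is the same.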
Some types of information we can extract from an unclean database to detect corruption include classifications of government departments, corrected misspellings, and tags (e.g. names of politicians, or government versus private-sector projects). If we can clean up this data to be more accurate, while preserving integrity by logging all records of changes, we will be able to extract and map out corruption and abuse of power in Malaysian construction projects.