We’ve previously shared our ideas for democratising data, specifically Hansard, on Twitter, but here we want to go into a little more depth about how we’re trying to achieve this. This may get technical, but we’ll break it down as best we can.
Finding meaning in unstructured text is one of the big real-world challenges. A vast share of what humans have produced, and are still producing, is in the form of text, whether that is the latest bestselling novel, company reports, tweets, or debates in the Houses of Parliament.
Search “Brexit problem” on the Hansard website, get “Light Pollution”…
Search engines are great at finding keywords in documents, and there has been a move towards entity linking, for example with Google’s Knowledge Graph, which shows summary snippets when you search for a person or place. But if you want to search for something very specific, for example “How many of those on the starting grid of the Australian Grand Prix finished?”, you’re often out of luck.
With plain keyword search, no knowledge is extracted from the text itself. To a machine, it’s just bytes and characters.
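To make this concrete, here is a minimal sketch of plain keyword search, assuming a simple inverted index that matches documents containing any of the query words. The debate titles and snippets are invented for illustration, and we are not claiming this is how the Hansard website’s search is implemented; it only shows how token matching alone can surface an unrelated debate for “Brexit problem” while missing the one that is actually about leaving the EU.

```python
import re
from collections import defaultdict

# Hypothetical debate snippets, invented for illustration (not real Hansard text).
DEBATES = {
    "Light Pollution": "Light pollution is a growing problem for rural communities.",
    "Exiting the European Union": "The withdrawal agreement raises questions about the Irish border.",
    "Fisheries": "Fishing quotas after Brexit must be agreed with our neighbours.",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document titles containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, body in docs.items():
        for token in tokenize(body):
            index[token].add(title)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return titles containing ANY query token, ranked by number of matching tokens, then title."""
    hits: dict[str, int] = defaultdict(int)
    for token in set(tokenize(query)):
        for title in index.get(token, ()):
            hits[title] += 1
    return sorted(hits, key=lambda t: (-hits[t], t))


index = build_index(DEBATES)
print(keyword_search(index, "Brexit problem"))  # → ['Fisheries', 'Light Pollution']
```

The debate on exiting the European Union never appears, simply because it happens not to contain either query word; the machine has no notion that it is the most relevant result.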
Enter Natural Language Processing (NLP)
In recent years, there has been a push towards using neural networks and statistical modelling techniques to make sense of text. Many of these techniques are freely available as open-source tools, and it is by leveraging them that myPolitico are attempting to make Hansard more meaningful.
When you search “Brexit problem” (it’s just an example), you should see relevant results: not only those that contain the words “Brexit” and “problem”, but results based on the knowledge contained within the debates. We could go a step further and build our own knowledge graph from the debates, then parse natural-language questions into a query language understood by graph databases such as Neo4j (Cypher, in Neo4j’s case), returning results, or even direct answers, that way. Exciting stuff!
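As a rough illustration of the question-to-query idea, here is a minimal sketch assuming a tiny, hypothetical graph of (subject, relation, object) triples and a single hand-written question template. It translates “Who spoke about Brexit?” into a parameterised Cypher query of the kind Neo4j accepts, and also answers the same pattern directly over the in-memory triples so the example runs without a database. The node labels (`Person`, `Debate`, `Topic`), relation names (`SPOKE_IN`, `ABOUT`) and MP names are all invented for illustration; this is not our pipeline, which is still being designed.

```python
import re
from typing import Optional

# Hypothetical (subject, relation, object) triples, as an NLP pipeline might
# extract them from debate transcripts. Names, relations and schema are invented.
TRIPLES = [
    ("MP A", "SPOKE_IN", "Fisheries debate"),
    ("MP B", "SPOKE_IN", "Fisheries debate"),
    ("MP C", "SPOKE_IN", "Light Pollution debate"),
    ("Fisheries debate", "ABOUT", "Brexit"),
    ("Light Pollution debate", "ABOUT", "Light pollution"),
]

# One hand-written question template; a real system would need a proper parser.
WHO_SPOKE_ABOUT = re.compile(r"^who spoke about (?P<topic>.+?)\??$", re.IGNORECASE)


def to_cypher(question: str) -> Optional[tuple[str, dict[str, str]]]:
    """Translate a supported question into (Cypher query, parameters), or None."""
    match = WHO_SPOKE_ABOUT.match(question.strip())
    if match is None:
        return None
    # Assumes nodes in Neo4j carry hypothetical Person/Debate/Topic labels.
    query = (
        "MATCH (p:Person)-[:SPOKE_IN]->(d:Debate)-[:ABOUT]->(t:Topic {name: $topic}) "
        "RETURN DISTINCT p.name"
    )
    return query, {"topic": match.group("topic")}


def answer_locally(topic: str, triples=TRIPLES) -> list[str]:
    """Evaluate the same pattern (exact topic match) over the in-memory triples."""
    debates = {s for s, r, o in triples if r == "ABOUT" and o == topic}
    return sorted({s for s, r, o in triples if r == "SPOKE_IN" and o in debates})


query, params = to_cypher("Who spoke about Brexit?")
print(query, params)
print(answer_locally(params["topic"]))  # → ['MP A', 'MP B']
```

One regular expression per question type obviously does not scale; turning arbitrary questions into queries is exactly where the NLP techniques described above come in. Passing the topic as a query parameter (`$topic`) rather than pasting user text into the Cypher string also keeps the query safe from injection.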
This is a hard problem, and one the AI world is far from having solved, but we can still make an impact by bringing these technologies closer to the public where they are most needed: understanding what happens in our political system.
If Brexit has highlighted one thing above all, it’s that the electorate wants greater transparency and access to information. The tech is there, but the solutions aren’t being implemented.
Over the next few weeks we’ll detail each part of the planned pipeline (which isn’t finalised yet), and we hope you enjoy coming along on this journey with us.