I seem to have a long list of things I want to blog about; hopefully I'll actually manage to get down to it properly this week!
Anyway, to start: another of my (obviously not remotely weekly) 100 papers in AI.
100 Current Papers in Artificial Intelligence, Automated Reasoning and Agent Programming. Number 6
Vivi Nastase and Michael Strube, Transforming Wikipedia into a Large Scale Multilingual Concept Network, Artificial Intelligence (2012) (In Press)
DOI: 10.1016/j.artint.2012.06.008
Open Access?: Not that I can find.
Knowledge acquisition isn't really my field but this paper caught my eye largely, I confess, because it had "Wikipedia" in the title.
It's widely recognised that a fundamental component of any intelligent system is going to be some general knowledge. Researchers have been looking into the problems of acquiring, representing and then using such a knowledge base pretty much since Artificial Intelligence was dreamed up in the 1950s.
This paper clearly isn't the first to suggest that Wikipedia could be used as part of this process, though I'm not knowledgeable enough to really know how original its proposals are.
The paper suggests, though, that Wikipedia's infoboxes and categories can be used to structure the data extracted from it - for instance, to deduce facts such as "Brian May is a member of Queen (band)" and "Annie Hall was directed by Woody Allen".
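To make that concrete, here's a rough sketch (my own, not the paper's actual algorithm) of what turning an infobox into such facts might look like in Python. The dict representation of the infobox and the infobox_to_facts function are purely illustrative:

```python
# A minimal sketch, assuming an infobox has already been parsed
# into a simple dict of field -> value. The field name is used
# as the relation label. Not the authors' actual algorithm.

def infobox_to_facts(page_title, infobox):
    """Map each infobox field to a (subject, relation, object) triple."""
    facts = []
    for field, value in infobox.items():
        facts.append((page_title, field, value))
    return facts

# Toy examples using the facts mentioned above.
print(infobox_to_facts("Brian May", {"member of": "Queen (band)"}))
print(infobox_to_facts("Annie Hall", {"directed by": "Woody Allen"}))
# [('Brian May', 'member of', 'Queen (band)')]
# [('Annie Hall', 'directed by', 'Woody Allen')]
```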
It presents algorithms for mining Wikipedia's categories and infoboxes in order to create such facts and organise them as a concept network (i.e. turning relationships, like "is a member of", into lines in a graph, and objects, like Brian May, into nodes where the lines meet up). It is then possible to do further processing on these concept networks, and to run comparisons between networks from different language versions of Wikipedia to produce a multilingual concept network.
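Again purely as illustration, here's a toy version of that concept-network idea using the networkx library (any graph representation would do). The graph structure and the cross-language alignment mapping below are my assumptions, not WikiNet's actual format:

```python
# A toy concept network: concepts become nodes, relations become
# labelled edges. My own sketch, not WikiNet's representation.
import networkx as nx

graph = nx.MultiDiGraph()
for subj, rel, obj in [
    ("Brian May", "member of", "Queen (band)"),
    ("Annie Hall", "directed by", "Woody Allen"),
]:
    graph.add_edge(subj, obj, relation=rel)

# Facts from another language edition could be merged into the same
# graph once their concepts are aligned, e.g. via Wikipedia's
# cross-language links (the mapping here is hypothetical).
de_to_en = {"Brian May (Musiker)": "Brian May"}  # assumed alignment
graph.add_edge(de_to_en["Brian May (Musiker)"], "Queen (band)",
               relation="member of")

for u, v, data in graph.edges(data=True):
    print(u, "--[{}]-->".format(data["relation"]), v)
```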
The resulting resource, WikiNet, is available for download, as is a visualisation and application-building tool. WikiNet was compared against a number of similar knowledge bases, the most famous of which is WordNet, a large lexical database of English. Obviously WikiNet is multilingual, which WordNet isn't, and it can be built and updated rapidly; however, it lacks WordNet's coverage.
That said, it doesn't seem, per se, to be any less error-prone than most other ways of doing it - at least in the absence of reliable natural language processing (and for that you probably need a good knowledge base to start out with - a bit chicken and egg really).