| Feature Name | Description |
|---|---|
| Creator Features | |
| Creator Name* | Username of article creator |
| Creator Days on Site | How long has article creator had a WP account? |
| Creator Num Edits | How many edits has the article creator made sitewide? |
| Creator Status | Creator’s account status (open or blocked) |
| Num other pages | Number of other kept pages created by article creator |
| Userpage | Does article creator have a userpage? |
| Page Features | |
| Title* | Article title |
| Createdate* | Date article was created |
| File Size! | Size of the entire file, including all revisions |
| Has Talk Page! | Does the article have an accompanying talk page? |
| Topic Features | |
| Num links to here | Number of other Wikipedia articles linking to the article |
| Num links in from Web | Number of external web pages linking to article |
| Pageviews** | Number of visits page has had |
| Num Hits | Number of search engine results for the page title |
| Article Features | |
| Num categories | Number of WP categories the article belongs to |
| Num images | Number of images in the article |
| Num references | Number of references in the article |
| Num sections | Number of sections in the article |
| Num out Wikilinks | Number of links in the article to other WP pages |
| Infobox* | What kind of infobox does the article contain? |
| Total Size in Bytes | Length, in bytes, of the final version of the article |
| Revision Features | |
| Num Revisions | Number of revisions to the article |
| Num registered edits | Number of edits made by registered users |
| Num anonymous edits | Number of edits made by anonymous users |
| Num unique Editors | Number of unique users who edited the article |
| Time to Delete | Number of days between article creation and proposal for deletion |
| More than half anon | Boolean; are more than half of the edits made by anonymous users? |
| Has main editor | Boolean; has one user created more than 50% of the content? |
| Creator is main editor | Boolean; is the article creator the main editor? |
| Likelihood Autobio | String similarity between article title and creator’s username |
| Text* | Bag of all words in the final version of the article |
| Language Features | |
| Normalized noun count | Number of nouns in article, normalized by article length |
| Normalized verb count | Number of verbs in article, normalized by article length |
| Normalized adjective count | Number of adjectives in article, normalized by article length |
| Normalized adverb count | Number of adverbs in article, normalized by article length |
| FK reading level | Flesch-Kincaid reading level of the article |
| SMOG reading level | SMOG reading level index |
| CL level | Coleman-Liau reading level index |
| Level avg | Average of 5 reading level indexes |
| FK reading ease | Flesch-Kincaid reading ease measure of the article |

* Not used for classification because these features are extremely sparse.
** Only used in the New dataset because of a MediaWiki software error that miscounted pageviews in December 2011.
! Not used in the Original dataset classification because feature selection found that they reduced accuracy.
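
Several of the Revision Features in the table are derived from an article's edit history rather than read off directly. The Python sketch below shows one way they could be computed. It is illustrative only: the `Revision` record and its field names are hypothetical, "content" is approximated by net bytes added per revision, and the string similarity uses `difflib.SequenceMatcher` because the table does not say which similarity measure was actually used.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Revision:
    """Hypothetical record for one revision; the field names are assumptions."""
    user: str         # username, or IP address for an anonymous edit
    is_anon: bool     # True if the edit was made without an account
    bytes_added: int  # net size change of the revision (may be negative)


def more_than_half_anon(revs: List[Revision]) -> bool:
    """True if strictly more than half of the edits are anonymous."""
    anon = sum(1 for r in revs if r.is_anon)
    return anon * 2 > len(revs)


def main_editor(revs: List[Revision]) -> Optional[str]:
    """Return the user who added more than 50% of the content, if any.

    Assumption: content share is measured by positive bytes added.
    """
    added: Counter = Counter()
    for r in revs:
        if r.bytes_added > 0:
            added[r.user] += r.bytes_added
    total = sum(added.values())
    if total == 0:
        return None
    user, amount = added.most_common(1)[0]
    return user if amount * 2 > total else None


def likelihood_autobio(title: str, creator: str) -> float:
    """Case-insensitive similarity in [0, 1] between title and username."""
    return SequenceMatcher(None, title.lower(), creator.lower()).ratio()


def revision_features(title: str, creator: str,
                      revs: List[Revision]) -> Dict[str, object]:
    """Collect the derived revision features into one dictionary."""
    editor = main_editor(revs)
    anon_edits = sum(1 for r in revs if r.is_anon)
    return {
        "num_revisions": len(revs),
        "num_registered_edits": len(revs) - anon_edits,
        "num_anonymous_edits": anon_edits,
        "num_unique_editors": len({r.user for r in revs}),
        "more_than_half_anon": more_than_half_anon(revs),
        "has_main_editor": editor is not None,
        "creator_is_main_editor": editor is not None and editor == creator,
        "likelihood_autobio": likelihood_autobio(title, creator),
    }
```

Called as `revision_features("John Smith", "Jsmith", revs)`, the function returns one dictionary per article that can be merged with the other feature groups. Measuring the main editor by edit count instead of bytes added would be an equally plausible reading of the table's description.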