Wikipedia talk:Categorization

Small categories

I saw a question from Naraht at the help desk on whether there is a minimum number of entries in a category and whether there are exceptions. I know that WP:SMALLCAT is a former guideline; is there any current guidance? TSventon (talk) 15:42, 24 March 2026 (UTC)

TSventon Thank you for bumping it over here. I've dealt with two somewhat similar situations in the last 3 months.
  1. Category:Historically segregated African-American schools in the United States. I've only created state level categories in Category:Historically segregated African-American schools in the United States by state or territory if the state has three or more, and left them in the parent cat if there were one or two.
  2. Category:RuPaul's Drag Race episodes. I had only been creating the season-by-season subcats if there were more than three entries that weren't just redirects. Another editor created season cats for the seasons that only had one, and I wasn't sure enough of WP:SMALLCAT to undo it.
So, advice please. Naraht (talk) 16:59, 24 March 2026 (UTC)
WP:NARROW. It's really just a question of whether it enhances navigation, or is merely overcategorisation. Other questions to ask yourself: Is this a topic that a well-referenced article could be written about? Is the topic broad enough for such a grouping to have value, or so narrow as to hinder navigation? Also, do any lists or navigation boxes already exist for this?
I hope this helps. - jc37 13:06, 28 March 2026 (UTC)
@Naraht: TSventon (talk) 13:12, 28 March 2026 (UTC)
Jc37, TSventon For the Drag Race episodes, there is an article (multiple pages) for each of the seasons so far, so based on that, I'm good with the addition of categories even if only one article has been created out of the 8-14 episodes. They are folded into an extensive navbox as well; I don't know if that helps or hurts. Unfortunately, the test of whether a well-referenced article could be written not only makes splitting Category:Historically segregated African-American schools in the United States into states look like a worse idea, it also makes a number of more established categories look less appropriate, for example a category of the historically black colleges in a state. If it doesn't make sense to have a single article covering both the private historically black colleges/universities in Georgia and the state-created historically black colleges/universities, does that make them less appropriate to have as a category? Naraht (talk) 20:58, 29 March 2026 (UTC)
@Naraht: When it comes to diffusing large categories, it's more about navigation if the subcategories fit into an existing scheme, such as separating American categories by state or territory. The problem arises when users do that for all categories even when there aren't many articles to diffuse, or when they create a category like "X by Y" with only one subcategory in it. There's no hard and fast rule; it's just a judgement call. If a categorisation scheme exists and seems useful, I don't think it matters whether there could be an article written about every category in the tree. Mclay1 (talk) 00:20, 30 April 2026 (UTC)
Mclay1 Thanks. For the Historically segregated AA schools in the US, it is a situation where there are 200 and let's say Georgia has 17, Mississippi has 15, etc., but Massachusetts and Connecticut only have one or two. I know this is probably a situation where this could be discussed if anyone brought up a CFD, but I thought I'd ask here in advance. Naraht (talk) 14:42, 30 April 2026 (UTC)
@Naraht: In a situation like that where a category can be fully diffused but some subcategories will be underpopulated, opinions will be divided on whether those small categories are useful and should exist or not. Personally, I wouldn't create a category for just one article. If it were brought to CfD, the majority consensus would likely be to upmerge them. Mclay1 (talk) 14:54, 30 April 2026 (UTC)
Fair enough. Naraht (talk) 15:11, 30 April 2026 (UTC)

Should singing synthesis sample libraries be considered "software"?

Hello, why are Wikipedia's articles on sample libraries for performance-sampling-based singing synthesizers dubbed "software"? Some examples are in these lists:

List of Vocaloid products

Category:Singing software synthesizers

Most singing synthesis systems are based on performance-sampling techniques. Basically, they use a library of recordings of singing and process those recordings to produce their synthesis. An overview of this and a practical example is shown in this paper by one of the pre-eminent figures in the field: https://repositori.upf.edu/items/e5fd582d-da51-4b24-a376-44289967d116?locale=en (unfortunately this seems to be down right now, so I have uploaded a copy here: https://smallpdf.com/file#s=d72ae8f3-2328-4b4d-8c5f-183f04c5c54d)

I don't think these libraries differ significantly from other types of audio sampling libraries, and I think they are comparable to things like brush packs for drawing programs. Usually, software is considered to be executable code, or at least something that controls a given system in an expressive enough way within a given domain.

Wikipedia's own definition of software, from Software: "Software consists of computer programs that instruct the execution of a computer. Software also includes design documents and specifications."

I feel that these articles should instead describe them as something along the lines of "singing synthesis sample libraries" or "vocal synthesis libraries". ~2026-23457-68 (talk) 18:29, 15 April 2026 (UTC)

Computer program, the sequence of instructions, is what you're thinking of here. It is a subset of software. The lead at Software indicates that it also includes design documents and specifications, and I'd argue that it also includes certain data used by the computer program: certainly things like error messages, the icon and other GUI elements. In summary, it's arguable, and drawing the line will not be trivial. ~Kvng (talk) 15:52, 24 April 2026 (UTC)

"All other punctuation marks (and punctuation-like diacritics) should be removed."

I'm confused about the part of the guideline about removal of "punctuation-like diacritics". It was added in Special:Diff/1311854344 by User:Mclay1 on 17 September 2025, but I don't see anything resembling this guideline in the prior version: Special:Permalink/1311292815. The relevant part of the guideline in the prior version has no mention of diacritics:

Hyphens, apostrophes and periods/full stops are the only punctuation marks that should be kept in sort values. The only exception is the apostrophe in names beginning with O', which should be removed. For example, Eugene O'Neill is sorted {{DEFAULTSORT:Oneill, Eugene}}. All other punctuation marks should be removed. (Commas can be added when re-ordering words, as in the previous example.)

The only relevant discussion at the time that I see in the talk page archives is Wikipedia talk:Categorization/Archive 19#"O'" rule for first names about the last name "O'Donel", for which the guideline is clear both before and after Mclay1's edit. Side note: it is also mentioned in Wikipedia:Categorization/Sorting names#Other exceptions.

The text of the guidelines (as of Special:Permalink/1350613034) implies that all diacritics are handled automatically by MediaWiki in its collation algorithm. The passage reads: "In English Wikipedia, sort order merges (ignores) case and diacritics. For example, "Baé", "Båf", "BaG" would be sorted in that order." Its footnote adds: "In 2016, English Wikipedia's category collation was changed to "uca-default", which is based on the Unicode collation algorithm (UCA). The most noticeable difference is that UCA groups characters with diacritics with their non-diacritic versions."
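As a rough illustration of the merging behaviour quoted above, the following Python sketch approximates it by decomposing each character, dropping combining marks, and case-folding. This is only an approximation under stated assumptions: it is not MediaWiki's "uca-default" collation, and the function name `approximate_primary_key` is made up for this example.

```python
import unicodedata


def approximate_primary_key(text: str) -> str:
    """Approximate a case- and diacritic-insensitive sort key.

    Hypothetical helper: decomposes characters (NFD), drops combining
    marks such as accents and rings, then case-folds. Real UCA
    collation is considerably more involved than this.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# The three example strings from the guideline, deliberately out of order.
names = ["BaG", "Båf", "Baé"]
print(sorted(names, key=approximate_primary_key))
# prints ['Baé', 'Båf', 'BaG']
```

Note that this only handles letters that decompose into a base letter plus a combining mark (é, å, ü, Ç); letters such as ð, Þ and Æ have no such decomposition and are left unchanged by this sketch.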

Part of my confusion is related to letters of other Latin-based alphabets which don't have "punctuation-like diacritics". As a concrete example, the section Wikipedia:Categorization/Sorting names § Sort by surname which says (as of Special:Permalink/1335883955):

[...] on English Wikipedia, the DEFAULTSORT value is Western order, overridden for Icelandic categories, where the sort key is as the name is written. Arnaldur Indriðason is sorted {{DEFAULTSORT:Indridason, Arnaldur}}, while the Icelandic category of photographers is done, [[Category:Icelandic photographers|Arnaldur Indridason]]. For the listas= parameter in project templates on article talk pages use the DEFAULTSORT value (since it mainly categorises in non-Icelandic categories)

but the whole guideline page doesn't mention the replacement of any alphabet with English. In the Icelandic example the only replacement is "ð" → "d", but the common practice seems to be to replace all letters specific to the Icelandic alphabet with their English alphabet analogues. For example, Þórhildur Sunna Ævarsdóttir has Aevarsdottir, Thorhildur Sunna and Thorhildur Sunna Aevarsdottir as category sort keys. The discussion from the footnote doesn't seem to mention it either. —⁠andrybak (talk) 16:11, 25 April 2026 (UTC)

@Andrybak: I added the clarification about punctuation-like diacritics to make clear they are included in the guideline for the removal of punctuation as opposed to the guideline explaining that diacritics on letters will be ignored. I was referring to characters such as ʻ in Hawaiʻi (island). I don't know if there's a better term for those characters. Mclay1 (talk) 16:24, 25 April 2026 (UTC)
Supposedly that specific character is a letter, but I think it's fair to say most English-speakers would expect Hawaii and Hawaiʻi to be sorted together. Mclay1 (talk) 16:27, 25 April 2026 (UTC)
I agree with this example of modifier letter turned comma – this is indeed what the readers would expect.
However, as a reader of the guideline page, I assumed that "punctuation-like diacritics [...] should be removed" meant that ü (U with diaeresis) should be converted into u (letter) via removal of the diaeresis and that Ç (C with cedilla) should be converted into C (letter). Diaeresis and cedilla are examples of diacritics that resemble punctuation. This is what my confusion is about.
Because names are an easy example of other alphabets being used in article titles, I tried checking what is written in WP:NAMESORT, but it doesn't mention letter/alphabet/diacritics conversion/replacement at all. —⁠andrybak (talk) 16:45, 25 April 2026 (UTC)
I'll try to clarify that then. Although, since diacritics attached to letters are ignored anyway, removing them in sort keys is fine and has historically been the general practice. In terms of replacing non-English letters, that's an interesting point that we should investigate. Mclay1 (talk) 17:32, 25 April 2026 (UTC)
In the Icelandic example the only replacement is "ð" → "d" - this is wrong. The Icelandic letter concerned is called eth, and it's pronounced just like the hard "th" in English words like "this", "that" and "the other". This is in contrast to þ, or thorn, which is the soft "th" in English words like, well, "thorn", "thank" or "think". --Redrose64 🌹 (talk) 20:49, 25 April 2026 (UTC)
While eth is the phonetic equivalent of th, it is commonly transliterated as D because it was derived from D. For example, Sigurd is the Anglicisation of Sigurðr. That's standard, not just on Wikipedia. Mclay1 (talk) 05:18, 26 April 2026 (UTC)
I've tested the sorting of some characters. Æ automatically sorts as AE, and ð automatically sorts as D, so replacing those characters in sort keys is not necessary (but is also fine to do for clarity). However, Þ sorts after Z, so it probably should be replaced by Th. ʻ sorts in between H and I for some reason, so I think it should be removed to avoid confusion. Mclay1 (talk) 05:34, 26 April 2026 (UTC)
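The rules discussed in this thread can be sketched as a small Python helper, purely for illustration. It keeps hyphens, apostrophes and periods (plus the commas used when re-ordering words), drops other punctuation, removes ʻ, replaces Þ with Th, and applies the O' exception. The function name `clean_sort_key` and the constants are hypothetical, and the use of Unicode "P" categories to decide what counts as punctuation is an assumption of this sketch, not something the guideline specifies.

```python
import re
import unicodedata

# Punctuation the guideline says to keep, plus the comma used when
# re-ordering words (e.g. "Oneill, Eugene").
KEPT_PUNCTUATION = {"-", "'", ".", ","}

# Per the test results above: thorn sorts after Z, so replace it.
# (ð and Æ were reported to sort correctly on their own.)
REPLACEMENTS = {"Þ": "Th", "þ": "th"}

# U+02BB MODIFIER LETTER TURNED COMMA, as in Hawaiʻi.
OKINA = "\u02bb"


def clean_sort_key(key: str) -> str:
    """Hypothetical helper applying the sort-key rules discussed here."""
    out = []
    for ch in key:
        if ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        elif ch == OKINA:
            continue  # punctuation-like character: remove
        elif unicodedata.category(ch).startswith("P") and ch not in KEPT_PUNCTUATION:
            continue  # any other punctuation mark: remove
        else:
            out.append(ch)
    cleaned = "".join(out)
    # O' exception: drop the apostrophe and lowercase the next letter.
    return re.sub(r"\bO'(\w)", lambda m: "O" + m.group(1).lower(), cleaned)


print(clean_sort_key("O'Neill, Eugene"))
print(clean_sort_key("Hawaiʻi"))
```

Diacritics attached to letters are deliberately left alone here, since the collation is described above as ignoring them anyway.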

Discrepancy between categories and lists of Czech films (discussion at WikiProject Film)

I've started a discussion at WikiProject Film about the fact that categories and lists of Czech films are defining "Czech" differently, and what to do about it. I'd love some input from people more familiar with categorization.

Wikipedia talk:WikiProject Film#Discrepancy between categories and lists of Czech films. dylansan (talk) 12:52, 8 May 2026 (UTC)

Discussion about history of instructions at WP:CFD/S

Please join the discussion at Wikipedia talk:Categories for discussion#Reconstructing history of instructions of Wikipedia:Categories for discussion/Speedy. —⁠andrybak (talk) 16:30, 16 May 2026 (UTC)
