Learning curves for characters and words.
<p>The green curves correspond to HSK word lists for levels 1 to 4 (shorter curve) and 1 to 6 (longer curve). The yellow curves correspond to word lists generated from two levels of beginner readers. All curves were created using the OLS character decompositions.</p>
Learning curve parameters.
<p>The number of characters learned <i>N</i>, final learning efficiency Ξ<sub><i>f</i></sub>, and integral learning efficiency ⟨Ξ⟩ for reference cumulative learning costs of <i>C</i><sub>0</sub> = 500 and <i>C</i><sub>0</sub> = 1500. The Yan et al. algorithm was optimized up to a cumulative learning cost of <i>C</i><sub>0</sub> = 4000.</p>
Measures of learning efficiency.
<p>The curves <i>A</i> and <i>B</i> represent two different learning curves. For each curve, the final learning efficiency Ξ<sub><i>f</i></sub> is the cumulative usage frequency at a specific cumulative learning cost <i>C</i><sub>0</sub>, and the integral learning efficiency ⟨Ξ⟩ is the average cumulative usage frequency between the origin and <i>C</i><sub>0</sub>. Curve <i>A</i> has higher Ξ<sub><i>f</i></sub> but lower ⟨Ξ⟩. Illustrated values for ⟨Ξ⟩ are approximate.</p>
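<p>The two measures described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the learning curve is given as sampled points, passes through the origin, and is linearly interpolated between points. The function name <code>learning_efficiencies</code> and its parameters are hypothetical.</p>

```python
from bisect import bisect_right


def learning_efficiencies(costs, freqs, c0):
    """Return (final, integral) learning efficiency at reference cost c0.

    costs: cumulative learning cost after each character (strictly increasing)
    freqs: cumulative usage frequency after each character
    Assumption: the curve starts at the origin and is piecewise linear.
    """
    xs = [0.0] + list(costs)
    ys = [0.0] + list(freqs)
    if not 0 < c0 <= xs[-1]:
        raise ValueError("c0 must lie within the sampled cost range")

    # Index of the last sampled point whose cost does not exceed c0.
    k = bisect_right(xs, c0) - 1

    # Final efficiency: the curve's value at c0 (interpolated if needed).
    if xs[k] == c0:
        final = ys[k]
    else:
        t = (c0 - xs[k]) / (xs[k + 1] - xs[k])
        final = ys[k] + t * (ys[k + 1] - ys[k])

    # Integral efficiency: area under the curve from 0 to c0, divided by c0.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(k))
    area += (c0 - xs[k]) * (ys[k] + final) / 2
    return final, area / c0


# Two made-up curves: A ends higher, but B rises earlier.
final_a, integral_a = learning_efficiencies([2, 4], [0.1, 0.8], 4)
final_b, integral_b = learning_efficiencies([2, 4], [0.5, 0.7], 4)
```

<p>With these made-up points, curve A has the higher final value while curve B has the higher average, which is the situation the caption describes.</p>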
The first 85 characters of our optimized learning order.
<p>Taken together, these characters have a cumulative usage frequency of 0.42.</p>
A network where our algorithm does not return the optimal character order.
<p>A hypothetical network where the integral learning efficiency of the order generated by the algorithm is lower than that of another possible order. Letters represent Chinese characters (for example, E is a compound character formed from primitives A and B) and the numbers are centralities. Both orders have identical final learning efficiencies.</p>
Usage frequencies for the first 85 characters.
<p>The gray, green and blue bars correspond to the black, green and blue curves in Fig 8. Dark bars represent primitives and light bars represent compounds.</p>
Usage frequency versus number of unique components for the 1000 most common Chinese characters.
<p>This plot shows the weak relationship between character usage frequency and complexity, the latter represented by the number of unique components used to construct the character. Usage frequency is normalized to 1.0 over the whole usage frequency data set, which encompasses more characters than shown in this plot. The six characters illustrated are the most common in each column. Note that the number of unique components is not the same as visual complexity: the characters ζ and 说 have similar visual complexity (they are composed of similar numbers of strokes) but ζ is conceptually simpler, being, in the OLS character decomposition, composed of two relatively complex primitive components ζ and ζ, compared with the four from which 说 is composed.</p>
Illustration of the topological sort algorithm.
<p>The ordered list is processed from low to high centrality (right to left in the figure). Once η is reached, its components are checked in turn. η½ is found to lie to the right of η and so is repositioned to its left. Likewise εΊ is found to the right of η and is similarly repositioned. εΊ is positioned to the right of η½ because it has lower centrality. The centralities used in this figure are for illustrative purposes only.</p>
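<p>The repositioning step described above can be sketched as follows. This is a rough reading of the caption, not the authors' code: it assumes the component relation is acyclic, starts from a list sorted by decreasing centrality, and moves any component found to the right of a character to just before it, keeping moved components in decreasing order of centrality. The function name <code>order_with_prerequisites</code>, the Latin letters, and the centrality values are hypothetical stand-ins.</p>

```python
def order_with_prerequisites(centrality, components):
    """Sort by decreasing centrality, then move components ahead of compounds.

    centrality: dict mapping each character to its centrality
    components: dict mapping a compound to the list of its direct components
    Assumption: the component relation contains no cycles.
    """
    # Highest centrality on the left, lowest on the right.
    order = sorted(centrality, key=lambda ch: -centrality[ch])

    i = len(order) - 1  # start at the low-centrality (right-hand) end
    while i >= 0:
        ch = order[i]
        # Components that currently sit to the right of the character.
        late = [c for c in components.get(ch, ()) if order.index(c) > i]
        # Among moved components, lower centrality ends up further right.
        late.sort(key=lambda c: -centrality[c])
        for c in late:
            order.remove(c)  # removed items are right of i, so i is unchanged
        order[i:i] = late    # place them immediately left of the character
        # The character is now at i + len(late); continue with its left
        # neighbour, so that moved components are themselves re-checked.
        i += len(late) - 1
    return order


# Illustrative values only: C is a compound of D and E.
example = order_with_prerequisites(
    {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1},
    {"C": ["D", "E"]},
)
```

<p>In this example D and E both move to the left of C, and E stays to the right of D because its centrality is lower, matching the behaviour the caption describes.</p>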
Structural decomposition of the character 照.
<p>Primitive characters appear as characters in their own right whereas primitive components do not. The primitive component 灬 is an abbreviated form of the primitive character 火. The parameter <i>r</i> is the SUBTLEX-CH usage frequency rank of the character. Pronunciations are given in pinyin romanization. Note that each character is assigned only a single meaning even though most actually possess a range of broadly related meanings.</p>
Measures of character clustering.
<p>The top panel shows the average distance, in number of characters, to the closest preceding component. The bottom panel shows the average distance, in number of characters, to another character which shares a component. Curves were generated with a fixed cumulative learning cost of <i>C</i><sub>0</sub> = 4000. Averages below 250 characters are not shown because in this region the averages fluctuate wildly.</p>
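<p>The two clustering measures can be illustrated with a short sketch. It is an interpretation of the caption rather than the authors' procedure: distances are differences in list position, the second measure is assumed to use the nearest sharing character in either direction, and a single mean is returned instead of a curve. Both function names and the toy data are hypothetical.</p>

```python
def mean_distance_to_preceding_component(order, components):
    """Mean gap between a character and its nearest earlier component."""
    pos = {ch: i for i, ch in enumerate(order)}
    dists = []
    for ch in order:
        earlier = [pos[c] for c in components.get(ch, ())
                   if c in pos and pos[c] < pos[ch]]
        if earlier:
            dists.append(pos[ch] - max(earlier))
    return sum(dists) / len(dists) if dists else None


def mean_distance_to_sharing_character(order, components):
    """Mean gap between a character and the nearest one sharing a component.

    Assumption: "nearest" is taken in either direction along the list.
    """
    pos = {ch: i for i, ch in enumerate(order)}
    users = {}  # component -> positions of the characters that contain it
    for ch in order:
        for c in set(components.get(ch, ())):
            users.setdefault(c, []).append(pos[ch])
    dists = []
    for ch in order:
        gaps = [abs(p - pos[ch])
                for c in set(components.get(ch, ()))
                for p in users[c] if p != pos[ch]]
        if gaps:
            dists.append(min(gaps))
    return sum(dists) / len(dists) if dists else None


# Toy data: E is built from A and B, F from A and C.
toy_order = ["A", "B", "E", "C", "F"]
toy_components = {"E": ["A", "B"], "F": ["A", "C"]}
d_component = mean_distance_to_preceding_component(toy_order, toy_components)
d_sharing = mean_distance_to_sharing_character(toy_order, toy_components)
```

<p>Smaller values of either measure indicate that related characters sit closer together in the learning order.</p>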