Word | example_english_corpora |
Definition | Examples of corpora. These examples are from corpora and from sources on the web. Any opinions in the examples do not represent the opinion of the Cambridge Dictionary editors or of Cambridge University Press or its licensors. Total instances annotated in both training and test corpora. The method uses a measure of the specificity of a terminology candidate with respect to the target domain via comparative analysis across different corpora. My methods are predominantly quantitative, and my primary data comprise structured electronic corpora, which offer a solid basis for chronological comparisons. The research reported in the second paper is motivated by the task of automatic knowledge acquisition from large corpora. Note that this cautious measure may lead to underestimating the errors in the corpora. Like all other research designs, using large corpora and concordance programs opens up specific potential but also has limitations. Next, a set of textual corpora was compiled from which example sentences could be gathered. However, the overall rates of orthography influence are low in all three corpora. Thus, all predictor variables were based on adult corpora for uniformity across variables. A third limitation is that the computation of all predictor variables was based on adult corpora. Recent approaches to discourse processing, on the other hand, that allow robust parsing require extensive corpora investigations. Table 2 gives some statistics about the corpora, the participants, and the scores of the best performing systems. As extensive controlled annotated corpora were not still available at the time of the experiments, resources have been manually derived for them. Anomalies are detected using heuristics which we developed after analyzing our corpora. With regard to grammar checking, the number of unnecessary interventions is obviously considerably higher than the correct ones for the corrected corpora.
Figure 7 shows the effect of using various corpora as training data for statistical language models used in the recognition of the test data. We found 543 offers-in-repairs in the present corpora. The topic of religion is even more revealing, because it represents the same proportion (3.6-3.8%) of topics discussed in both corpora. The fifth section is dedicated to the use of written corpora: textual analysis and historical sociolinguistics, the latter being treated in more depth (82-91). In summary, then, the three corpora are very similar in terms of size and types of texts. All three corpora consist of roughly one million words, containing 500 texts of approximately 2,000 words each, distributed across 15 text categories. The present study analyzes data from several large corpora which are now available. There is therefore no guarantee whatsoever that they are not going to be contradicted in other corpora. Evidence gleaned from reference corpora becomes the foil against which usage in individual texts can be judged. Using corpora to explore linguistic variation, 49-72. Many methods have been proposed to align bilingual corpora. Because of errors and misprints, real text corpora contain a fair number of invalid dates. Like atlas, some of these were originally built for processing speech corpora and have been extended for handling text. Similarly, corpus based approaches suffer from a lack of multilingual training corpora. Reference works and corpora can be in one, two, or more languages. Speech recognizers are generally evaluated by comparing their performance on pre-recorded and manually transcribed corpora. Table 3 shows the average per sentence frequency of each type of constraint for the three training corpora. The main problem is then the practical impossibility to compute all combinations for even relatively small corpora. A set of 3 048 types is common to all three corpora, accounting for around 75% of the tokens of each. 
Although the numbers of bigrams for the news corpora were not of a different order, the vocabulary sizes in the personal documents were clearly smaller. In addition, we believe that the performance of the system will improve with the number of examples and larger corpora will achieve better results. Here, a word frequency library based on the corpora is set up statistically. From these corpora, we can extract various information about the language such as subcategorization, the type of sentences, and the usage of words. The total size of the two corpora was 128,294 words. The existence of our two corpora (1972-74 and 1989-93) enables us to look at this problem. Table 1 lists the sizes of these corpora. Table 2 summarizes some style and content features of the available corpora. In one analysis, raw neighbourhood counts for each of the five corpora were examined and compared. In all of the corpora analysed, as word length increases, neighbourhood density decreases. Table 1 shows summary data for the children's corpora and the input samples used in these analyses. However, this influence is probably limited because an elevated rate of this error category was also observed in other more inclusive corpora. Table 2 summarizes the relevant figures for the three suffixes in the three corpora. However, the andless construction and the corresponding and construction do not occur frequently enough in the relevant corpora to provide any reliable indications. The data in the article come from large electronic corpora. Many online dictionaries are now backed by corpora. The expansion of spoken corpora to embrace a wider range of language varieties is also raising new issues for pedagogical modeling. The coming decades in linguistics in general will almost certainly be dominated by the analysis of corpora. The same corpora were then searched for child initiations of 'happen' and its variants.
Therefore, frequency counts in the children's corpora had to be scaled to be comparable to adult frequency counts. About thirty percent of the utterances has a discontinuous finite predicate, and simple finite utterances make up the lion's share (around 70 %) of the corpora. We also cannot exclude the possibility that there may be discrepancies in the representativeness of the corpora underlying our estimates. Table 1 shows the corpora used and the number and age range of children in each corpus. Thus, for three of the four corpora considered, the corrected scaling factors more conservatively estimate frequency. Excluding this corpus, which is unique in its tendency for non-convex learning graphs, 90% of the curves in the remaining corpora are convex. If feasible, future replication studies should consider even larger corpora. Unfortunately, of the few sense-annotated corpora currently available, virtually all are tagged collections of a single ambiguous word such as line or tank. Second, the process avoids some problems that arise in using exhaustively annotated corpora for evaluation. The corpus-based bootstrapping algorithm illustrates how text corpora can be exploited to acquire semantic information semi-automatically, without the need for special resources. Of particular importance, it reduces the need for bilingual linguists and bilingual corpora during the development of the transfer engine. The use of corpora has resulted in systems that can cope with unrestricted text and use sophisticated techniques for lexical acquisition. Treebanks: building and using parsed corpora, pp. 5-22. Using this extraction method, we were able to extract 8.3 million examples in total from our three corpora. In the sixth chapter, automatic term acquisition from texts in corpora is considered. Lexical information is extracted from large corpora on the basis of the syntactic relations. 
While automatic projectivity verification is possible only for dependency banks, the problems discussed in this article are not unique to that type of corpora. Another issue that needs to be tackled at some point in the future is the use of unannotated corpora. However, the integration of corpora into general language learning and teaching practice has so far been disappointing. The first person occurred in all the sections of both corpora, with significant differences of use across the sections. Figure 2 presents the relative frequencies of object-verb and verb-object order in the two corpora as a function of developmental phase. In order to account for variations in the size of corpora, the following analyses are based on rates per 1000 utterances. Calculations on other constructions and other corpora are needed to confirm this figure. The corpora were then divided into utterances, based on the presence of a perceptible pause separating it from other locutions of the same speaker. However, an exploratory comparison of counterfactuals in the three corpora did not reveal any noteworthy patterns. Isolated examples were found in the other three corpora. Allatostatins and allatotropins: is the regulation of corpora allata activity their primary function? On the corpora they examined, an ideal system under these experimental conditions would only achieve an accuracy of around 60 percent. Chapter 6 describes existing annotated corpora, but more importantly explains how expensive and difficult they are to obtain. Both of these corpora were parsed twice: once with tags, and once without tags but with a morphological analyzer. To get a handle on this, we have looked at two other database-oriented corpora, one from the same domain and one from a different domain. However, constructions like these do not show up in our corpora of spoken negotiative dialogue. The trends described here were also observed when we tested different user corpora.
At the same time, test texts were cut from different parts of the user corpora and used to measure the efficiency of adaptation. Open-source corpora: using the net to fish for linguistic data. The traditional construction with from is rare, but evenly spread throughout all four corpora. In this article, we use the examples as they appear in the corpora. The data for this study come mainly from electronic corpora, both diachronic and synchronic. Differing linguistic data in corpora may also make the extent to which variationist studies are comparable problematic. Still, this seems to be an area where more research is required and larger corpora need to be consulted. The contributors discuss grammar, syntax, lexis, speech, dialects, specialized corpora and software. On the contrary, both corpora include considerable variety. Evidently, corpora have increased in size as resources have expanded and techniques become more refined. The first remedy involves the use of larger corpora which will become increasingly available in the next few years. Natural language corpora have been available in computerised form for over thirty years. The rarity of such corpora reflects the effort required in the analysis. With such large corpora it is believed that improved language models may be created. |