<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2018.611021</article-id><article-id pub-id-type="publisher-id">JCC-88778</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Falcon: A Novel Chinese Short Text Classification Method
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>Haiming</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Huang</surname><given-names>Haining</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Cao</surname><given-names>Xiang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Qian</surname><given-names>Jingu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai, China</addr-line></aff><pub-date pub-type="epub"><day>07</day><month>11</month><year>2018</year></pub-date><volume>06</volume><issue>11</issue><fpage>216</fpage><lpage>226</lpage><history><date date-type="received"><day>15</day><month>September</month><year>2018</year></date><date date-type="rev-recd"><day>24</day><month>November</month><year>2018</year></date><date date-type="accepted"><day>27</day><month>November</month><year>2018</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
Short text classification remains an active research topic in natural language processing, with persistent problems of sparse features, high-dimensional text data, and difficult feature representation. To represent text directly, we propose a simple variant that employs low-dimensional one-hot vectors. We then propose a Densenet-based model for short text classification, in which concat and average shuffle operations between Resnet and Densenet provide feature diversity and feature reuse, enlarging the feature selection available for short text. Finally, we evaluate Falcon on several benchmarks. In our experiments, Falcon obtained significant improvements over state-of-the-art models on most benchmarks, most notably in error rate in the first experiment. Overall, Falcon is an efficient and economical model that requires less computation to achieve high performance.
 
</p></abstract><kwd-group><kwd>Short Text Classification</kwd><kwd>Word Vector Representation</kwd><kwd>One-Hot</kwd><kwd>Densenet Networks</kwd><kwd>Convolutional Neural Networks</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Short text classification is the task of automatically assigning predefined categories to short documents written in natural language. Several types of text categorization have been studied, each dealing with different types of documents and categories. With the popularity of social media, short text classification has become an essential component of many applications, such as topic categorization to detect the topics under discussion, information filtering, and sentiment classification to determine the sentiment of product or movie reviews [<xref ref-type="bibr" rid="scirp.88778-ref1">1</xref>].</p><p>Compared with general Chinese short text, electric power complaint text has the following special characteristics:</p><p>・ The text relates to the electric power domain and contains a large amount of specialized electrical vocabulary.</p><p>・ The distinguishing characteristics of the text are not obvious, so more marginal information is needed for analysis.</p><p>・ The text contains many symbols and numbers.</p><p>Traditional techniques also perform worse on short text messages than on longer texts.</p><p>We encountered the following obstacles when applying a Densenet-based method to electric power complaint text:</p><p>・ Sparse features: Machine learning methods classify or predict on the basis of features, so text analysis constructs various features that characterize the text. 
The effectiveness of text classification therefore depends on these features, which are sparse in short text.</p><p>・ Difficult feature representation: Sentence modeling aims to represent sentences as meaningful features for tasks such as text classification. The shorter and simpler the text, the harder it is to represent.</p><p>To solve these problems, we propose Falcon, an approach that combines Residual Networks (Resnet) and Densely Connected Convolutional Networks (Densenet). To the best of our knowledge, no Densenet-based work on short text classification has been proposed to date. We also introduce several changes to data processing to accelerate model training [<xref ref-type="bibr" rid="scirp.88778-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref5">5</xref>]. The main contributions of our work are as follows:</p><p>・ Feature extraction: We design the combined model to meet the needs of short text classification and to make feature extraction easier.</p><p>・ Dimension reduction: We use one-hot vectors, PCA dimensionality reduction, a vocabulary dictionary, and vector-id matching to avoid additional computation. This matters because the concat operation inevitably increases the time complexity of the model.</p><p>・ Marginal feature flow: We propose, for the first time, an architecture that combines Densenet with Resnet for text classification, in order to increase the transfer of marginal features.</p><p>We apply the proposed model to short text classification and achieve superior performance on various benchmarks.</p><p>The rest of this paper is organized as follows. Section 2 discusses related work on sentence representation models and feature flow. Section 3 reviews the preliminaries. Section 4 presents Falcon. 
Section 5 describes the experiments and analyzes the performance of Falcon. Finally, Section 6 concludes the paper, followed by the acknowledgements.</p></sec><sec id="s2"><title>2. Related Work</title><p>In this section, we first give an overview of current learning models for feature representation. We then review Densenet and channel shuffle, which form the basis of Densenet-based networks.</p><p>Models in feature representation. In many recent works on sentence representation, neural network models are built either on input word sequences or on transformed syntactic parse trees [<xref ref-type="bibr" rid="scirp.88778-ref4">4</xref>]. Among them, Convolutional Neural Networks (CNNs) have achieved notable results. This line of work began with Yoon Kim [<xref ref-type="bibr" rid="scirp.88778-ref3">3</xref>], who adopted a CNN with a simple architecture (also called TextCNN) for sentence classification. The text matrix is convolved with multiple filters of varying window sizes to obtain multiple features [<xref ref-type="bibr" rid="scirp.88778-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref7">7</xref>]. Although TextCNN performs well in many NLP tasks, one of its biggest problems is its fixed filter size.</p><p>It has also been shown that higher-level modeling on <inline-formula><tex-math>x_l</tex-math></inline-formula> can increase the variation in the input, which should make it more efficient to learn marginal features between different layers [<xref ref-type="bibr" rid="scirp.88778-ref8">8</xref>]. For example, Kaiming He’s team at MSRA [<xref ref-type="bibr" rid="scirp.88778-ref4">4</xref>] obtained considerable improvements in deeper neural networks with a residual learning framework. To further improve the information flow between layers, Gao Huang et al. [<xref ref-type="bibr" rid="scirp.88778-ref6">6</xref>] introduced Densenet, which adds direct connections from each layer to all subsequent layers. Xiangyu Zhang et al. 
[<xref ref-type="bibr" rid="scirp.88778-ref9">9</xref>] proposed a new architecture that uses pointwise group convolution and channel shuffle to greatly reduce computation cost while maintaining accuracy.</p><p>Siwei Lai et al. [<xref ref-type="bibr" rid="scirp.88778-ref9">9</xref>] put forward another model that mixes the properties of TextRNN and CNN. It applies a recurrent structure to capture contextual information and obtains semantic vectors by convolving a context vector composed of the word itself and its left-side and right-side contexts. A disadvantage is its long training time. Ying Wen et al. [<xref ref-type="bibr" rid="scirp.88778-ref7">7</xref>] improved the model of Siwei Lai [<xref ref-type="bibr" rid="scirp.88778-ref10">10</xref>] by adding a highway layer.</p><p>Dense connection. The densely connected networks proposed by Gao Huang et al. [<xref ref-type="bibr" rid="scirp.88778-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref6">6</xref>] consist of multiple dense blocks, each of which contains multiple layers. Each layer produces <inline-formula><tex-math>k</tex-math></inline-formula> feature-maps, where <inline-formula><tex-math>k</tex-math></inline-formula> is referred to as the growth rate of the network. Densenets require fewer parameters than traditional convolutional networks because there is no need to relearn redundant feature-maps. Besides better parameter efficiency, another major advantage of Densenets is the improved flow of information and gradients throughout the network, which makes them easy to train.</p><p>Channel Shuffle. Xiangyu Zhang et al. [<xref ref-type="bibr" rid="scirp.88778-ref11">11</xref>] introduced two new operations into the CNN architecture: pointwise group convolution and channel shuffle. However, although this design achieves significant efficiency improvements for classification, the efficiency gain is less favorable when higher classification accuracy is required.</p><p>In summary, after analyzing the above models, we found that none of them solves the problems we encountered. 
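Hence, we propose the approach described in the next section to address these challenges. We adopt a variant of Densenet to increase the flow of marginal features and reduce network complexity, with the aim of achieving strong performance on short text classification.</p></sec><sec id="s3"><title>3. Preliminary</title><p>Before presenting our model, we review three state-of-the-art applications of convolutional neural networks to text data.</p><p>Consider a group of word vectors over a vocabulary <inline-formula><tex-math>V</tex-math></inline-formula> that is passed through a convolutional network. The network comprises <inline-formula><tex-math>L</tex-math></inline-formula> layers, each of which implements a non-linear transformation <inline-formula><tex-math>H_l(\cdot)</tex-math></inline-formula>, where <inline-formula><tex-math>l</tex-math></inline-formula> indexes the layer. <inline-formula><tex-math>H_l(\cdot)</tex-math></inline-formula> can be a composite function of operations; in our case it is the Rectified Linear Unit (ReLU). We denote the output of the <inline-formula><tex-math>l</tex-math></inline-formula>-th layer as <inline-formula><tex-math>x_l</tex-math></inline-formula>.</p><p>Resnet. Traditional convolutional feed-forward networks connect the output of the <inline-formula><tex-math>l</tex-math></inline-formula>-th layer as input to the <inline-formula><tex-math>(l+1)</tex-math></inline-formula>-th layer, which gives the layer transition <inline-formula><tex-math>x_l = H_l(x_{l-1})</tex-math></inline-formula>. ResNets [<xref ref-type="bibr" rid="scirp.88778-ref11">11</xref>] add a skip-connection that bypasses the non-linear transformation with an identity function:</p><p><disp-formula><label>(1)</label><tex-math>x_l = H_l(x_{l-1}) + x_{l-1}</tex-math></disp-formula></p><p>On the one hand, Resnet passes the signal from one layer to the next via identity connections.</p><p>On the other hand, Resnet [<xref ref-type="bibr" rid="scirp.88778-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref7">7</xref>] makes this information preservation explicit through additive identity transformations.</p><p>Resnet aims to solve the problem of transmitting feature combinations or shallow information over long distances across different levels.</p><p>Densenet. To further improve the information flow between layers, Densenet introduces direct connections from any layer to all subsequent layers, so that each layer receives the feature-maps of all preceding layers:</p><p><disp-formula><label>(2)</label><tex-math>x_l = H_l([x_0, x_1, \ldots, x_{l-1}])</tex-math></disp-formula></p><p>where <inline-formula><tex-math>[x_0, x_1, \ldots, x_{l-1}]</tex-math></inline-formula> refers to the concatenation of the feature-maps <inline-formula><tex-math>x_k</tex-math></inline-formula>, <inline-formula><tex-math>k \in [0, l-1]</tex-math></inline-formula>. For ease of implementation, Densenet concatenates all of the preceding inputs into a single tensor.</p><p>Crucially, in contrast to Resnet, Densenet never combines features through summation before they are passed into a layer; instead, it combines features by concatenating them. Hence, the <inline-formula><tex-math>l</tex-math></inline-formula>-th layer has <inline-formula><tex-math>l</tex-math></inline-formula> inputs, consisting of the feature-maps of all preceding convolutional blocks, and its own feature-maps are passed on to all <inline-formula><tex-math>L-l</tex-math></inline-formula> subsequent layers. This introduces <inline-formula><tex-math>L(L+1)/2</tex-math></inline-formula> connections in an <inline-formula><tex-math>L</tex-math></inline-formula>-layer network, instead of just <inline-formula><tex-math>L</tex-math></inline-formula> as in traditional architectures.</p></sec><sec id="s4"><title>4. Our Proposed Model</title><p>Compressed dimension. Since the dimensionality of region vectors determines the dimensionality of weight vectors, high-dimensional region vectors mean more parameters to learn. A region vector that covers <inline-formula><tex-math>p</tex-math></inline-formula> words, each one-hot encoded over the vocabulary <inline-formula><tex-math>V</tex-math></inline-formula>, has dimensionality <inline-formula><tex-math>p|V|</tex-math></inline-formula>. If <inline-formula><tex-math>p|V|</tex-math></inline-formula> is too large, the model becomes too complex (with respect to the amount of available training data), and training becomes prohibitively expensive even with efficient handling of sparse data. Therefore, one has to lower the dimensionality by reducing the vocabulary size <inline-formula><tex-math>|V|</tex-math></inline-formula> and the region size <inline-formula><tex-math>p</tex-math></inline-formula>, which may or may not be desirable, depending on the nature of the task.</p><p>With this representation, there are fewer parameters to learn. The expressiveness of our alternative (which loses word order only within small regions) lies somewhere between one-hot representation and word2vec.</p><p>Concat. A concat operation can be implemented simply by adding one more concat layer on top of the existing networks after the 1 &#215; 1 convolutional layer, as defined in Equation (3):</p><p><disp-formula><label>(3)</label><tex-math>C_k = R_k \oplus D_k</tex-math></disp-formula></p><p><disp-formula><label>(4)</label><tex-math>x_l = H_l([x_0, x_1, \ldots, x_{l-1}])</tex-math></disp-formula></p><p>where <inline-formula><tex-math>R_k</tex-math></inline-formula> and <inline-formula><tex-math>D_k</tex-math></inline-formula> denote the information extracted at the <inline-formula><tex-math>k</tex-math></inline-formula>-th step by Resnet and Densenet, respectively, and <inline-formula><tex-math>\oplus</tex-math></inline-formula> denotes the concat operator.</p><p>Average Channel Shuffle. 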
The basic idea of the average shuffle algorithm is as follows: an index <inline-formula><tex-math>i</tex-math></inline-formula> scans the original data from front to back and copies it, and at each step a random index <inline-formula><tex-math>j</tex-math></inline-formula> is drawn from <inline-formula><tex-math>[0, i]</tex-math></inline-formula>; the effect is equivalent to exchanging the values at positions <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula> in the copied data. Its advantage is minimal time consumption, since each element is visited only once. The operation enables cross-group information flow for group convolution layers. The pseudocode of Average Shuffle is shown in Algorithm 1.</p><p>The schematic architecture of the model is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, in which different colors represent different features; the concat and average shuffle operations are applied to the categories of feature-maps and to the flow of global feature information.</p></sec><sec id="s5"><title>5. Experiments and Evaluation</title><p>In this section, we first validate the effectiveness of the proposed model on the THUCNews dataset and then apply it to the EPCT dataset. Hardware environment: 4 GB RAM and an Nvidia GeForce GTX 970M GPU with 3 GB of memory. Software environment: all experiments were conducted on 64-bit Windows 7 Professional with a simple integrated experimental environment (Anaconda 3 (64-bit) + Python 3.6 + Spyder) and the TensorFlow 1.1.0 framework.</p><sec id="s5_1"><title>5.1. Datasets and Data Preprocessing</title><p>The summary statistics of the THUNews and EPCT datasets are given in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>We divided each dataset into training, validation, and test sets in the ratio 8:1:1; that is, 80% of the data were used to train word2vec and the classifier. When constructing the word vector model, the word vector size was set to 50 (i.e., each word was represented as a 50-dimensional vector). In word vector representation, each word is mapped to a vector in a real-valued vector space. Because every word is represented as a numerical vector, we can compute the relevancy between words. 
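</p><p>The 8:1:1 split and the word-relevancy computation described above can be sketched as follows. This is an illustrative sketch, not the authors’ code: the paper does not state how the split was randomized or which relevancy measure was used, so cosine similarity, the hypothetical helpers split_8_1_1 and cosine_relevancy, the fixed random seed, and the random 50-dimensional stand-in vectors are all assumptions. It requires NumPy 1.17 or later for numpy.random.default_rng.</p><preformat>
```python
import numpy as np


def split_8_1_1(n_samples: int, seed: int = 0):
    """Randomly split sample indices into train/validation/test sets (8:1:1).

    Hypothetical helper: the paper states the ratio, not the procedure.
    """
    rng = np.random.default_rng(seed)        # fixed seed only for reproducibility
    indices = rng.permutation(n_samples)     # shuffled 0 .. n_samples - 1
    n_train = n_samples * 8 // 10            # 80% for word2vec and the classifier
    n_val = n_samples // 10                  # 10% for validation
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]         # the remaining samples for testing
    return train, val, test


def cosine_relevancy(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity of two word vectors (assumed relevancy measure).

    1.0 means the same direction, 0.0 orthogonal, -1.0 opposite.
    """
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    # EPCT has 5000 samples according to Table 1
    train, val, test = split_8_1_1(5000)
    print(len(train), len(val), len(test))   # 4000 500 500

    # Random stand-ins for trained 50-dimensional word2vec vectors
    rng = np.random.default_rng(1)
    w1, w2 = rng.normal(size=50), rng.normal(size=50)
    print(cosine_relevancy(w1, w2))          # some value in [-1, 1]
```
</preformat><p>Cosine similarity is assumed here because it compares only the direction of two vectors and ignores their length, which is a common choice for comparing word2vec embeddings.</p><p>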
Continuous word vector representation techniques have been proposed in [<xref ref-type="bibr" rid="scirp.88778-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref14">14</xref>]. The two proposed models, “continuous bag-of-words” and “continuous skip-gram”, each capture an aspect of word meaning. Both models are implemented in word2vec [<xref ref-type="bibr" rid="scirp.88778-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.88778-ref17">17</xref>], a tool that produces word vector representations for a text collection. The whole preprocessing workflow is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Description of datasets</title></caption><table><thead><tr><th align="center" valign="middle" >Dataset</th><th align="center" valign="middle" >Classes</th><th align="center" valign="middle" >Number of samples</th><th align="center" valign="middle" >Average length</th></tr></thead><tbody><tr><td align="center" valign="middle" >THUNews</td><td align="center" valign="middle" >20</td><td align="center" valign="middle" >20,000</td><td align="center" valign="middle" >264</td></tr><tr><td align="center" valign="middle" >EPCT</td><td align="center" valign="middle" >7</td><td align="center" valign="middle" >5000</td><td align="center" valign="middle" >93</td></tr></tbody></table></table-wrap></sec><sec id="s5_2"><title>5.2. Performance Results</title><p>The concat operation used in Equation (3) is not valid when the size of the feature-maps changes, so changes in feature-map size are confined to transition layers. Our experiments use convolutional layers with filter sizes 3 &#215; 3 (group convolution) and 1 &#215; 1, and each transition layer consists of a batch normalization layer and a 1 &#215; 1 convolutional layer followed by a 2 &#215; 2 average pooling layer.</p><p>Several parameters of our model are summarized in <xref ref-type="table" rid="table2">Table 2</xref>.</p><sec id="s5_2_1"><title>5.2.1. One-Hot Vs. Word2vec</title><p><xref ref-type="table" rid="table3">Table 3</xref> shows the error rates of our proposed model compared with the baseline methods. 
The first thing to note is that our model outperforms the baseline methods on all datasets, which demonstrates the effectiveness of our approach.</p><p>Looking into the details, our model outperforms all the baseline methods on this task, which indicates that in this setting the merit of having fewer parameters outweighs the benefit of keeping word order within each region.</p></sec><sec id="s5_2_2"><title>5.2.2. Concat Vs. No Concat on EPCT</title><p>We first perform a set of experiments to compare the F1-scores of Resnet, Densenet-BC, and the Concat model; the results are listed in <xref ref-type="table" rid="table4">Table 4</xref>. All results in the Concat model column are obtained using the concat operation. The Concat model matches or exceeds both other models in every category, with large margins in “complaint”, “opinion”, and “suggestion”.</p><p>Finally, the results with training sets of various sizes on THUNews and EPCT are shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p></sec><sec id="s5_2_3"><title>5.2.3. Comparison with State-of-the-Art Models</title><p>On the THUNews and EPCT datasets, the best error rates we obtained were 8.6% and 7.5%, respectively, both better than those of the other methods. We also find that our model achieves good performance within a short training time, as shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>. 
Since excellent performance has been reported for these models on short text classification, we presume that they are optimized for short sentences rather than for text categorization in general.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Parameters in our model</title></caption><table><thead><tr><th align="center" valign="middle" >Parameter</th><th align="center" valign="middle" >Value</th></tr></thead><tbody><tr><td align="center" valign="middle" >embedding dim</td><td align="center" valign="middle" >64</td></tr><tr><td align="center" valign="middle" >seq length</td><td align="center" valign="middle" >600</td></tr><tr><td align="center" valign="middle" >vocab size</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >hidden dim</td><td align="center" valign="middle" >128</td></tr><tr><td align="center" valign="middle" >batch size</td><td align="center" valign="middle" >64</td></tr><tr><td align="center" valign="middle" >num epochs</td><td align="center" valign="middle" >10</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>Error rate (%) comparison with baseline methods. 
Short text classification on THUNews (2 K training documents) and EPCT (0.5 K training documents); the most frequent word vectors were used.</title></caption><table><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >THUNews</th><th align="center" valign="middle" >EPCT</th></tr></thead><tbody><tr><td align="center" valign="middle" >one-hot + CNN</td><td align="center" valign="middle" >11.47</td><td align="center" valign="middle" >9.50</td></tr><tr><td align="center" valign="middle" >word2vec + CNN</td><td align="center" valign="middle" >8.46</td><td align="center" valign="middle" >8.21</td></tr><tr><td align="center" valign="middle" >one-hot + Densenet</td><td align="center" valign="middle" >8.34</td><td align="center" valign="middle" >7.92</td></tr><tr><td align="center" valign="middle" >word2vec + Densenet</td><td align="center" valign="middle" >8.21</td><td align="center" valign="middle" >7.75</td></tr><tr><td align="center" valign="middle" >Falcon (ours)</td><td align="center" valign="middle" >8.06</td><td align="center" valign="middle" >7.63</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>F1-scores of models with and without concat on EPCT (a representative benchmark)</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="2" >Category</th><th align="center" valign="middle" colspan="3" >Model</th></tr><tr><th align="center" valign="middle" >Densenet</th><th align="center" valign="middle" >Resnet</th><th align="center" valign="middle" >Concat model</th></tr></thead><tbody><tr><td align="center" valign="middle" >praise</td><td align="center" valign="middle" >0.76</td><td align="center" valign="middle" >0.60</td><td align="center" valign="middle" >0.76</td></tr><tr><td align="center" valign="middle" >fault repair</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.86</td><td align="center" valign="middle" >0.93</td></tr><tr><td align="center" valign="middle" >suggestion</td><td align="center" valign="middle" >0.83</td><td align="center" valign="middle" >0.75</td><td align="center" valign="middle" >0.89</td></tr><tr><td align="center" valign="middle" >report</td><td align="center" valign="middle" >0.76</td><td align="center" valign="middle" >0.62</td><td align="center" valign="middle" >0.76</td></tr><tr><td align="center" valign="middle" >complaint</td><td align="center" valign="middle" >0.75</td><td align="center" valign="middle" >0.69</td><td align="center" valign="middle" >0.89</td></tr><tr><td align="center" valign="middle" >business consultation</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.88</td><td align="center" valign="middle" >0.97</td></tr><tr><td align="center" valign="middle" >opinion</td><td align="center" valign="middle" >0.65</td><td align="center" valign="middle" >0.59</td><td align="center" valign="middle" >0.72</td></tr></tbody></table></table-wrap></sec></sec></sec><sec id="s6"><title>6. Conclusions</title><p>In this paper, we proposed, for the first time, a short text classification method based on Densenet networks to address electric power complaint text. We also improved the short text classification model and made a comprehensive comparison of classification efficiency. 
The model provides a new approach and line of thought for the study of other short Chinese texts.</p><p>In addition, comparing our results with the actual classification results of the State Grid showed them to be highly consistent. We attribute this to the improved feature learning method in this model: just as in reading, the richer the content of the book, the more knowledge one gains.</p><p>Although some improvements have been achieved, future work includes developing a more efficient short text classification model that further enlarges short text features.</p></sec><sec id="s7"><title>Acknowledgements</title><p>We thank all of the people who contributed to this paper, the donors of the THUNews dataset, and the anonymous reviewers for their helpful feedback and suggestions. This work is also supported by the National Natural Science Foundation of China (Nos. 61272437 and 71203137).</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s9"><title>Cite this paper</title><p>Li, H.M., Huang, H.N., Cao, X. and Qian, J.G. (2018) Falcon: A Novel Chinese Short Text Classification Method. Journal of Computer and Communications, 6, 216-226. https://doi.org/10.4236/jcc.2018.611021</p></sec></body><back><ref-list><title>References</title><ref id="scirp.88778-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Aggarwal, C.C. and Zhai, C.X. (2012) A Survey of Text Classification Algorithms. Mining Text Data. Springer US, 163-222. 
https://doi.org/10.1007/978-1-4614-3223-4_6</mixed-citation></ref><ref id="scirp.88778-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Cho, K., Van Merrienboer, B., Gulcehre, C., et al. (2014) Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation.  
arXiv:1406.1078 [cs.CL]</mixed-citation></ref><ref id="scirp.88778-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">He, K., Zhang, X., Ren, S., et al. (2015) Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv:1502.01852 [cs.CV]</mixed-citation></ref><ref id="scirp.88778-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Huang, G., Liu, S., van der Maaten, L., et al. (2017) CondenseNet: An Efficient DenseNet Using Learned Group Convolutions. arXiv:1711.09224 [cs.CV]</mixed-citation></ref><ref id="scirp.88778-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Huang, G., Liu, Z., van der Maaten, L., et al. (2017) Densely Connected Convolutional Networks. IEEE Conference on Computer Vision and Pattern Recognition, 1, 2261-2269. https://doi.org/10.1109/CVPR.2017.243</mixed-citation></ref><ref id="scirp.88778-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Ioffe, S. and Szegedy, C. (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on Machine Learning, PMLR, 37, 448-456.</mixed-citation></ref><ref id="scirp.88778-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Iyyer, M., Enns, P., Boyd-Graber, J., et al. (2014) Political Ideology Detection Using Recursive Neural Networks. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1, 1113-1122. 
https://doi.org/10.3115/v1/P14-1105</mixed-citation></ref><ref id="scirp.88778-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Kalchbrenner, N., Grefenstette, E. and Blunsom, P. (2014) A Convolutional Neural Network for Modelling Sentences. arXiv:1404.2188 [cs.CL]</mixed-citation></ref><ref id="scirp.88778-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. International Conference on Neural Information Processing Systems, Curran Associates Inc., 1097-1105.</mixed-citation></ref><ref id="scirp.88778-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Chen, G., Ye, D., Xing, Z., et al. (2017) Ensemble Application of Convolutional and Recurrent Neural Networks for Multi-Label Text Categorization. 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, 14-19 May 2017, 2377-2383. https://doi.org/10.1109/IJCNN.2017.7966144</mixed-citation></ref><ref id="scirp.88778-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Liao, Q. and Poggio, T. (2016) Bridging the Gaps between Residual Learning, Recurrent Neural Networks and Visual Cortex.</mixed-citation></ref><ref id="scirp.88778-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Mikolov, T., Chen, K., Corrado, G., et al. (2013) Efficient Estimation of Word Representations in Vector Space.</mixed-citation></ref><ref id="scirp.88778-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Mikolov, T., Sutskever, I., Chen, K., et al. (2013) Distributed Representations of Words and Phrases and Their Compositionality. 
International Conference on Neural Information Processing Systems, 3111-3119.</mixed-citation></ref><ref id="scirp.88778-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Mikolov, T., Yih, W.T. and Zweig, G. (2013) Linguistic Regularities in Continuous Space Word Representations.</mixed-citation></ref><ref id="scirp.88778-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Lee, J.Y. and Dernoncourt, F. (2016) Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks. 515-520.</mixed-citation></ref><ref id="scirp.88778-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, X., Zhou, X., Lin, M., et al. (2017) ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.</mixed-citation></ref><ref id="scirp.88778-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, Y. and Wallace, B. (2015) A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification.</mixed-citation></ref></ref-list></back></article>