
Limitations of shallow networks representing finite mappings

  • Special Issue: EANN 2017
  • Journal: Neural Computing and Applications

Abstract

Limitations of the capabilities of shallow networks to efficiently compute real-valued functions on finite domains are investigated. Efficiency is studied in terms of network sparsity and its approximate measures. It is shown that when a dictionary of computational units is not sufficiently large, the computation of almost any uniformly randomly chosen function is either a well-conditioned task performed by a large network or an ill-conditioned task performed by a network of moderate size. The probabilistic results are complemented by a concrete example of a class of functions which cannot be computed efficiently by shallow perceptron networks. The class is constructed using pseudo-noise sequences, which have many features of random sequences but can be generated using special polynomials. Connections to the No Free Lunch Theorem and to the central paradox of coding theory are discussed.
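To make the role of pseudo-noise sequences concrete, the sketch below is an illustration under stated assumptions, not the paper's construction: it generates a maximum-length binary sequence from a linear recurrence over GF(2) whose characteristic polynomial is primitive. The function name pn_sequence, the polynomial x^4 + x + 1, and the seed are choices made only for this example.

```python
# Illustrative sketch (not taken from the paper): a pseudo-noise
# (maximum-length) binary sequence from a linear recurrence over GF(2)
# with a primitive characteristic polynomial.

def pn_sequence(coeffs, seed, length):
    """Return `length` bits of b[t] = c1*b[t-1] XOR ... XOR cn*b[t-n].

    coeffs -- [c1, ..., cn]: coefficients of the characteristic polynomial
              x^n + c1*x^(n-1) + ... + cn over GF(2); if it is primitive,
              any nonzero seed gives the maximal period 2^n - 1
    seed   -- first n bits of the sequence, not all zero
    """
    if len(seed) != len(coeffs) or not any(seed):
        raise ValueError("seed must have length n and be nonzero")
    bits = list(seed)
    while len(bits) < length:
        t = len(bits)
        # next bit: GF(2) inner product of coeffs with the last n bits
        bits.append(sum(c * bits[t - i] for i, c in enumerate(coeffs, start=1)) % 2)
    return bits[:length]

# x^4 + x + 1 is primitive over GF(2), i.e. (c1, c2, c3, c4) = (0, 0, 1, 1),
# so the recurrence b[t] = b[t-3] XOR b[t-4] has period 2^4 - 1 = 15.
bits = pn_sequence(coeffs=[0, 0, 1, 1], seed=[1, 0, 0, 0], length=30)
print(bits[:15])
print(bits[15:])                       # identical to the first 15 bits: period 15
print(sum(bits[:15]))                  # 8 ones vs. 7 zeros, the near-balance of m-sequences
print([1 - 2 * b for b in bits[:15]])  # the corresponding {-1, 1}-valued sequence
```

Mapped to {-1, 1}, such sequences are nearly balanced and have almost flat periodic autocorrelation, so they share many statistical features of random sequences while being produced by a short deterministic recurrence; this is the random-like yet efficiently generated behaviour the abstract refers to.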




Acknowledgements

This work was partially supported by the Czech Grant Foundation Grant GA15-18108S and institutional support of the Institute of Computer Science RVO 67985807.

Author information

Corresponding author

Correspondence to Věra Kůrková.

Ethics declarations

Conflict of interest

The author declares that she has no conflict of interest.

Additional information

This work was partially supported by the Czech Grant Foundation Grants GA15-18108S, GA18-23827S and institutional support of the Institute of Computer Science RVO 67985807.

About this article

Cite this article

Kůrková, V. Limitations of shallow networks representing finite mappings. Neural Comput & Applic 31, 1783–1792 (2019). https://doi.org/10.1007/s00521-018-3680-1
