Python替换行符导致文字丢失

Python 替换行符导致文字丢失:

从网页爬取一段文字:

Consider the half-eigenvalue problem [公式] a.e.

[公式], where [公式], [公式], [公式] for [公式], and [公式] and [公式] are indefinite integrable weights in the Lebesgue

space [公式]. We characterize the spectra structure under periodic, antiperiodic, Dirichlet, and Neumann boundary conditions, respectively. Furthermore, all these half-eigenvalues are continuous in [公式], where [公式] denotes the weak topology in [公式] space. The Dirichlet and the Neumann half-eigenvalues are continuously Fréchet differentiable in [公式], where [公式] is the [公式] norm of [公式].

我使用 python 字符串 replace()方法将 \n 替换为空,出现了这样的情况:

space [公式]. We characterize the spectra structure under periodic, antiperiodic, Dirichlet, and Neumann boundary conditions, respectively. Furthermore, all these half-eigenvalues are continuous in [公式], where [公式] denotes the weak topology in [公式] space. The Dirichlet and the Neumann half-eigenvalues are continuously Fréchet differentiable in [公式], where [公式] is the [公式] norm of [公式].

对,没错,上面两行数据莫名消失了!!

我仅仅是这么做了而已:

content.replace('\n', '')

到底出了什么问题呢?

答案超乎我的意料——

我们加一个方括号,这样就能查看编码了,于是:

print([content])

它的打印结果是:

['Consider the half-eigenvalue problem [公式] a.e. \r\n[公式], where [公式], [公式], [公式] for [公式], and [公式] and [公式] are indefinite integrable weights in the Lebesgue\r\nspace [公式]. We characterize the spectra structure under periodic, antiperiodic, Dirichlet, and Neumann boundary conditions, respectively. Furthermore, all these half-eigenvalues are continuous in [公式], where [公式] denotes the weak topology in [公式] space. The Dirichlet and the Neumann half-eigenvalues are continuously Fréchet differentiable in [公式], where [公式] is the [公式] norm of [公式].']

居然含有\r(极有可能是 windows 操作系统特有的问题)我们如何还原呢?当然是把\r 也替换掉

print(content.replace('\r','').replace('\n',''))

这样就没有问题了,至于为什么不直接写为 replace(‘\r\n’,’’),你猜有没有可能有些字符串中没有\r 但有些有,所以分开写不会报错